oVirt 3.6 and self-hosted engine: clarification on datacenter input

Tried the latest version with http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm, which used ovirt-hosted-engine-setup-1.3.0-1.el7.centos.noarch and vdsm-4.17.10-0.el7.centos.noarch. I presume it is the latest RC available. After install, inside the self-hosted engine webadmin portal I see:

oVirt Engine Version: 3.6.0.1-1.el7.centos

During storage configuration I chose:

--== STORAGE CONFIGURATION ==--

During customization use CTRL-D to abort.
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]:
Please specify the full shared storage connection path to use (example: host:/path): ovc71.localdomain.local:/SHE_DOMAIN
[ INFO ] Installing on first host
Please provide storage domain name. [hosted_storage]: she_sdomain
Local storage datacenter name is an internal name and currently will not be shown in engine's admin UI.
Please enter local datacenter name [hosted_datacenter]: she_datacenter

I don't understand the meaning of the sentence above:

Local storage datacenter name is an internal name and currently will not be shown in engine's admin UI.

How is the chosen "she_datacenter" name related to the "Default" datacenter where the hypervisor is put? Do I have to manually create it (I don't see this she_datacenter in the webadmin portal)?

Also, I know there is an open bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1269768

But it seems I'm not able to import the storage domain... In events, when I import, I have this sequence:

Storage Domain she_sdomain was added by admin@internal
VDSM ovc71.localdomain.local command failed: Cannot acquire host id: (u'9f1ec45d-0c32-4bfc-8b67-372d6f204fd1', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
Failed to attach Storage Domains to Data Center Default. (User: admin@internal)
Failed to attach Storage Domain she_sdomain to Data Center Default. (User: admin@internal)

What should be the flow to work around the bug? Do I actually have to attach it to the "Default" datacenter, or what? Is it expected to be fixed before 3.6?

Thanks,
Gianluca
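As an aside (not from the thread itself): before running hosted-engine --deploy against an NFSv3 export like the one above, it can help to verify from the host that the share really mounts with NFSv3 and is writable by vdsm (uid/gid 36), since "cannot acquire host id" sanlock errors like the one reported later are frequently caused by wrong ownership or export options. A minimal sketch, assuming the export path shown above and a scratch mount point chosen only for the test:

# mount the export the same way vdsm will (NFSv3)
mkdir -p /tmp/she_check
mount -t nfs -o vers=3 ovc71.localdomain.local:/SHE_DOMAIN /tmp/she_check

# the exported directory should be owned by vdsm:kvm (36:36)
ls -ln /tmp/she_check

# check that the vdsm user can actually create and remove a file there
sudo -u vdsm touch /tmp/she_check/write_test
sudo -u vdsm rm /tmp/she_check/write_test

umount /tmp/she_check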

On Mon, Oct 26, 2015 at 6:26 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Tried the latest version with http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm, which used ovirt-hosted-engine-setup-1.3.0-1.el7.centos.noarch and vdsm-4.17.10-0.el7.centos.noarch.
I presume it is the latest RC available.
After install, inside the self-hosted engine webadmin portal I see:
oVirt Engine Version: 3.6.0.1-1.el7.centos
During storage configuration I chose:
--== STORAGE CONFIGURATION ==--
During customization use CTRL-D to abort.
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]:
Please specify the full shared storage connection path to use (example: host:/path): ovc71.localdomain.local:/SHE_DOMAIN
[ INFO ] Installing on first host
Please provide storage domain name. [hosted_storage]: she_sdomain
Local storage datacenter name is an internal name and currently will not be shown in engine's admin UI.
Please enter local datacenter name [hosted_datacenter]: she_datacenter
I don't understand the meaning of the sentence above:
Local storage datacenter name is an internal name and currently will not be shown in engine's admin UI.
It's just an internal label. I think we can simply remove that question, always using the default value, and nothing will change.
How is the chosen "she_datacenter" name related to the "Default" datacenter where the hypervisor is put? Do I have to manually create it (I don't see this she_datacenter in the webadmin portal)?
Also, I know there is an open bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1269768
But it seems I'm not able to import the storage domain... In events, when I import, I have this sequence:
Storage Domain she_sdomain was added by admin@internal
VDSM ovc71.localdomain.local command failed: Cannot acquire host id: (u'9f1ec45d-0c32-4bfc-8b67-372d6f204fd1', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
Failed to attach Storage Domains to Data Center Default. (User: admin@internal)
Failed to attach Storage Domain she_sdomain to Data Center Default. (User: admin@internal)
What should be the flow to work around the bug? Do I actually have to attach it to the "Default" datacenter, or what? Is it expected to be fixed before 3.6?
Postponing to 3.6.1 since it was not identified as a blocker. You can try adding the first additional storage domain for other VMs. The datacenter should come up, and at that point you can try importing the hosted-engine storage domain. You cannot add other VMs to that storage domain, nor will you be able to once auto-import works.
Thanks, Gianluca
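For reference, a minimal sketch of how the additional NFS data domain suggested above could be exported from the same host before adding it through the webadmin portal. The /NFS_DOMAIN path matches the df output later in the thread; the export options are only the commonly recommended ones for oVirt NFS domains, not something stated in this thread:

# on the NFS server (here the hypervisor itself)
mkdir -p /NFS_DOMAIN
chown 36:36 /NFS_DOMAIN      # vdsm:kvm
chmod 0755 /NFS_DOMAIN

# export it; anonuid/anongid map squashed clients to vdsm:kvm
echo '/NFS_DOMAIN *(rw,anonuid=36,anongid=36,all_squash)' >> /etc/exports
exportfs -rav
showmount -e localhost

Once the export answers, it can be added in the webadmin portal as a new NFS data domain and attached to the Default datacenter.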

On Tue, Oct 27, 2015 at 5:06 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
I don't understand the meaning of the sentence above:
Local storage datacenter name is an internal name and currently will not be shown in engine's admin UI.
It's just an internal label. I think we can simply remove that question, always using the default value, and nothing will change.
Probably better.
How is the chosen "she_datacenter" name related to the "Default" datacenter where the hypervisor is put? Do I have to manually create it (I don't see this she_datacenter in the webadmin portal)?
Also, I know there is an open bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1269768
But it seems I'm not able to import the storage domain... In events, when I import, I have this sequence:
Storage Domain she_sdomain was added by admin@internal
VDSM ovc71.localdomain.local command failed: Cannot acquire host id: (u'9f1ec45d-0c32-4bfc-8b67-372d6f204fd1', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
Failed to attach Storage Domains to Data Center Default. (User: admin@internal)
Failed to attach Storage Domain she_sdomain to Data Center Default. (User: admin@internal)
What should be the flow to work around the bug? Do I actually have to attach it to the "Default" datacenter, or what? Is it expected to be fixed before 3.6?
Postponing to 3.6.1 since it was not identified as a blocker.
But is this a regression from 3.5.x, or did this problem also exist in all the 3.5 versions where the self-hosted engine was in place?
You can try adding the first additional storage domain for other VMs. The datacenter should come up, and at that point you can try importing the hosted-engine storage domain. You cannot add other VMs to that storage domain, nor will you be able to once auto-import works.
So I was indeed able to add a separate data NFS domain and to attach it to the default DC, which then came up as active. Then I tried to import/attach the self-hosted engine domain as well; it went into locked state, but then the self-hosted engine VM itself went down (no qemu process on the hypervisor). In /var/log/libvirt/qemu/HostedEngine.log on the hypervisor I can see:

2015-10-28 13:59:02.233+0000: shutting down

Expected? What now, to have the self-hosted engine come up again and see what happened? Any logs on the hypervisor to check?

In /var/log/sanlock.log:

2015-10-28 14:57:14+0100 854 [829]: s4 lockspace 3662a51f-39de-4533-97fe-d49bf98e2d43:1:/rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN/3662a51f-39de-4533-97fe-d49bf98e2d43/dom_md/ids:0
2015-10-28 14:57:34+0100 874 [829]: s4:r3 resource 3662a51f-39de-4533-97fe-d49bf98e2d43:SDM:/rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN/3662a51f-39de-4533-97fe-d49bf98e2d43/dom_md/leases:1048576 for 4,17,1698
2015-10-28 14:57:35+0100 875 [825]: s4 host 1 1 854 1bfba2b1-2353-4d4e-9000-f97585b54df1.ovc71.loca
2015-10-28 14:57:35+0100 875 [825]: s4 host 250 1 0 1bfba2b1-2353-4d4e-9000-f97585b54df1.ovc71.loca
2015-10-28 14:59:00+0100 960 [830]: s1:r4 resource 9f1ec45d-0c32-4bfc-8b67-372d6f204fd1:SDM:/rhev/data-center/mnt/ovc71.localdomain.local:_SHE__DOMAIN/9f1ec45d-0c32-4bfc-8b67-372d6f204fd1/dom_md/leases:1048576 for 4,17,1698
2015-10-28 14:59:02+0100 962 [825]: s1 kill 3341 sig 9 count 1
2015-10-28 14:59:02+0100 962 [825]: dead 3341 ci 2 count 1
2015-10-28 14:59:08+0100 968 [830]: s5 lockspace 9f1ec45d-0c32-4bfc-8b67-372d6f204fd1:1:/rhev/data-center/mnt/ovc71.localdomain.local:_SHE__DOMAIN/9f1ec45d-0c32-4bfc-8b67-372d6f204fd1/dom_md/ids:0
2015-10-28 14:59:30+0100 990 [825]: s5 host 1 4 968 1bfba2b1-2353-4d4e-9000-f97585b54df1.ovc71.loca
2015-10-28 14:59:30+0100 990 [825]: s5 host 250 1 0 aa89bb89-20a1-414b-8ee3-0430fdc330f8.ovc71.loca

In /var/log/vdsm/vdsm.log:

Thread-1247::DEBUG::2015-10-28 14:59:00,043::task::993::Storage.TaskManager.Task::(_decref) Task=`56dd2372-f454-4188-8bf3-ab543d677c14`::ref 0 aborting False
Thread-1247::ERROR::2015-10-28 14:59:00,096::API::1847::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1827, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
    .format(sd_type, options, e))
RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '9f1ec45d-0c32-4bfc-8b67-372d6f204fd1'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Thread-1247::INFO::2015-10-28 14:59:00,112::xmlrpc::92::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:42165 stopped
Thread-1248::DEBUG::2015-10-28 14:59:00,137::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'StoragePool.connectStorageServer' in bridge with {u'connectionParams': [{u'id': u'189c29a5-6830-453c-aca3-7d82f2382dd8', u'connection': u'ovc71.localdomain.local:/SHE_DOMAIN', u'iqn': u'', u'user': u'', u'protocol_version': u'3', u'tpgt': u'1', u'password': '********', u'port': u''}], u'storagepoolID': u'00000000-0000-0000-0000-000000000000', u'domainType': 1}
Thread-1248::DEBUG::2015-10-28 14:59:00,138::task::595::Storage.TaskManager.Task::(_updateState) Task=`9ca908a0-45e2-41d5-802c-dc0bd2414a69`::moving from state init -> state preparing
Thread-1248::INFO::2015-10-28 14:59:00,139::logUtils::48::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=1, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'id': u'189c29a5-6830-453c-aca3-7d82f2382dd8', u'connection': u'ovc71.localdomain.local:/SHE_DOMAIN', u'iqn': u'', u'user': u'', u'protocol_version': u'3', u'tpgt': u'1', u'password': '********', u'port': u''}], options=None)
Thread-1248::DEBUG::2015-10-28 14:59:00,142::fileUtils::143::Storage.fileUtils::(createdir) Creating directory: /rhev/data-center/mnt/ovc71.localdomain.local:_SHE__DOMAIN mode: None
Thread-1248::DEBUG::2015-10-28 14:59:00,143::mount::229::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /usr/bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 ovc71.localdomain.local:/SHE_DOMAIN /rhev/data-center/mnt/ovc71.localdomain.local:_SHE__DOMAIN (cwd None)
Thread-1248::DEBUG::2015-10-28 14:59:00,199::hsm::2405::Storage.HSM::(__prefetchDomains) nfs local path: /rhev/data-center/mnt/ovc71.localdomain.local:_SHE__DOMAIN
Thread-1248::DEBUG::2015-10-28 14:59:00,201::hsm::2429::Storage.HSM::(__prefetchDomains) Found SD uuids: (u'9f1ec45d-0c32-4bfc-8b67-372d6f204fd1',)
Thread-1248::DEBUG::2015-10-28 14:59:00,202::hsm::2489::Storage.HSM::(connectStorageServer) knownSDs: {9f1ec45d-0c32-4bfc-8b67-372d6f204fd1: storage.nfsSD.findDomain, 3662a51f-39de-4533-97fe-d49bf98e2d43: storage.nfsSD.findDomain}
Thread-1248::INFO::2015-10-28 14:59:00,202::logUtils::51::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': u'189c29a5-6830-453c-aca3-7d82f2382dd8'}]}
Thread-1248::DEBUG::2015-10-28 14:59:00,202::task::1191::Storage.TaskManager.Task::(prepare) Task=`9ca908a0-45e2-41d5-802c-dc0bd2414a69`::finished: {'statuslist': [{'status': 0, 'id': u'189c29a5-6830-453c-aca3-7d82f2382dd8'}]}
Thread-1248::DEBUG::2015-10-28 14:59:00,202::task::595::Storage.TaskManager.Task::(_updateState) Task=`9ca908a0-45e2-41d5-802c-dc0bd2414a69`::moving from state preparing -> state finished
Thread-1248::DEBUG::2015-10-28 14:59:00,203::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1248::DEBUG::2015-10-28 14:59:00,203::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1248::DEBUG::2015-10-28 14:59:00,203::task::993::Storage.TaskManager.Task::(_decref) Task=`9ca908a0-45e2-41d5-802c-dc0bd2414a69`::ref 0 aborting False
Thread-1248::DEBUG::2015-10-28 14:59:00,203::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'StoragePool.connectStorageServer' in bridge with [{'status': 0, 'id': u'189c29a5-6830-453c-aca3-7d82f2382dd8'}]
Thread-1249::DEBUG::2015-10-28 14:59:00,218::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'StorageDomain.getInfo' in bridge with {u'storagedomainID': u'9f1ec45d-0c32-4bfc-8b67-372d6f204fd1'}

Current filesystem layout on the hypervisor, but still without any qemu process for the hosted engine:

[root@ovc71 log]# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/centos-root               27G  2.6G   24G  10% /
devtmpfs                             4.9G     0  4.9G   0% /dev
tmpfs                                4.9G  4.0K  4.9G   1% /dev/shm
tmpfs                                4.9G  8.6M  4.9G   1% /run
tmpfs                                4.9G     0  4.9G   0% /sys/fs/cgroup
/dev/mapper/OVIRT_DOMAIN-NFS_DOMAIN   20G   36M   20G   1% /NFS_DOMAIN
/dev/mapper/OVIRT_DOMAIN-SHE_DOMAIN   25G  2.9G   23G  12% /SHE_DOMAIN
/dev/mapper/OVIRT_DOMAIN-ISO_DOMAIN  5.0G   33M  5.0G   1% /ISO_DOMAIN
/dev/sda1                            497M  130M  368M  27% /boot
ovc71.localdomain.local:/NFS_DOMAIN   20G   35M   20G   1% /rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN
ovc71.localdomain.local:/SHE_DOMAIN   25G  2.9G   23G  12% /rhev/data-center/mnt/ovc71.localdomain.local:_SHE__DOMAIN
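A few diagnostics that may help narrow down the "Cannot acquire host id ... Invalid argument" sanlock failure seen above. The paths come from the logs in this message; the commands are standard sanlock and coreutils tools rather than anything prescribed in the thread:

# lockspaces and resources sanlock currently holds
sanlock client status

# more detail than /var/log/sanlock.log
sanlock client log_dump | tail -n 100

# the ids/leases files of the hosted-engine domain must be readable and writable,
# and the dom_md directory owned by vdsm:kvm (36:36)
ls -ln /rhev/data-center/mnt/ovc71.localdomain.local:_SHE__DOMAIN/9f1ec45d-0c32-4bfc-8b67-372d6f204fd1/dom_md/

# sanlock itself runs as the sanlock user; it must be able to reach those files
id sanlock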

On 28-10-2015 15:12, Gianluca Cecchi wrote:
On Tue, Oct 27, 2015 at 5:06 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
I don't understand the meaning of the sentence above:
Local storage datacenter name is an internal name and currently will not be shown in engine's admin UI.
It's just an internal label. I think we can simply remove that question, always using the default value, and nothing will change.
Probably better.
How is the chosen "she_datacenter" name related to the "Default" datacenter where the hypervisor is put? Do I have to manually create it (I don't see this she_datacenter in the webadmin portal)?
Also, I know there is an open bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1269768
But it seems I'm not able to import the storage domain... In events, when I import, I have this sequence:
Storage Domain she_sdomain was added by admin@internal
VDSM ovc71.localdomain.local command failed: Cannot acquire host id: (u'9f1ec45d-0c32-4bfc-8b67-372d6f204fd1', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
Failed to attach Storage Domains to Data Center Default. (User: admin@internal)
Failed to attach Storage Domain she_sdomain to Data Center Default. (User: admin@internal)
I have a hosted-engine installation on my laptop (F22) and get the exact same error message about sanlock when activating the hosted-engine storage domain. Situation as follows:
- export an NFS share from local disk for the hosted-engine VM
- export an NFS share from local disk for a regular data domain
- export an NFS share from local disk for an iso/export domain

Hosted-engine setup (CentOS 7) completes without problems, using the web UI works too, and importing the hosted-engine storage domain is also OK, but then activating it gives the dreaded sanlock error. I have started with a clean slate a couple of times, but it happens every time. A regular ovirt-engine setup from the same repository works fine.

Time permitting I'll do a clean-slate install and make available all the logs.

Joop

On Wed, Oct 28, 2015 at 3:12 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
So I was indeed able to add a separate data NFS domain and to attach it to the default DC, which then came up as active. Then I tried to import/attach the self-hosted engine domain as well; it went into locked state, but then the self-hosted engine VM itself went down (no qemu process on the hypervisor). In /var/log/libvirt/qemu/HostedEngine.log on the hypervisor I can see
2015-10-28 13:59:02.233+0000: shutting down
Expected?
What now, to have the self-hosted engine come up again and see what happened? Any logs on the hypervisor to check?
Update: after about 10 minutes the self-hosted engine VM was powered on again automatically and the default datacenter came up again. For about 1-2 minutes the vdsm process went to 100% CPU, then it stabilized. But the self-hosted engine storage domain appears as in this screenshot:

https://drive.google.com/file/d/0BwoPbcrMv8mvSTVqTEhYVkJEVWc/view?usp=sharin...

What I see in the events of the self-hosted engine storage domain is:

The Hosted Engine Storage Domain doesn't no exist. It should be imported into the setup.

Gianluca

On Wed, Oct 28, 2015 at 3:12 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Tue, Oct 27, 2015 at 5:06 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
I don't understand the meaning of the sentence above:
Local storage datacenter name is an internal name and currently will not be shown in engine's admin UI.
It's just an internal label. I think we can simply remove that question, always using the default value, and nothing will change.
Probably better.
How is the chosen "she_datacenter" name related to the "Default" datacenter where the hypervisor is put? Do I have to manually create it (I don't see this she_datacenter in the webadmin portal)?
Also, I know there is an open bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1269768
But it seems I'm not able to import the storage domain... In events, when I import, I have this sequence:
Storage Domain she_sdomain was added by admin@internal
VDSM ovc71.localdomain.local command failed: Cannot acquire host id: (u'9f1ec45d-0c32-4bfc-8b67-372d6f204fd1', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
Failed to attach Storage Domains to Data Center Default. (User: admin@internal)
Failed to attach Storage Domain she_sdomain to Data Center Default. (User: admin@internal)
What should be the flow to work around the bug? Do I actually have to attach it to the "Default" datacenter, or what? Is it expected to be fixed before 3.6?
Postponing to 3.6.1 since it was not identified as a blocker.
But is this a regression from 3.5.x, or did this problem also exist in all the 3.5 versions where the self-hosted engine was in place?
It's not a regression, because the hosted-engine storage domain wasn't visible in 3.5 either. Once again, even if you see it in the engine you cannot use it for anything apart from the engine VM itself; you still have to add another storage domain for regular VMs.
You can try adding the first additional storage domain for other VMs. The datacenter should come up, and at that point you can try importing the hosted-engine storage domain. You cannot add other VMs to that storage domain, nor will you be able to once auto-import works.
So I was indeed able to add a separate data NFS domain and to attach it to the default DC, which then came up as active. Then I tried to import/attach the self-hosted engine domain as well; it went into locked state, but then the self-hosted engine VM itself went down (no qemu process on the hypervisor). In /var/log/libvirt/qemu/HostedEngine.log of the hypervisor I can see
2015-10-28 13:59:02.233+0000: shutting down
Expected?
What now, to have the self-hosted engine come up again and see what happened? Any logs on the hypervisor to check?

On Wed, Oct 28, 2015 at 4:10 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
It's not a regression, because the hosted-engine storage domain wasn't visible in 3.5 either. Once again, even if you see it in the engine you cannot use it for anything apart from the engine VM itself; you still have to add another storage domain for regular VMs.
Understood. But I'm also not able to connect to the self-hosted engine VM itself via SPICE, so in case of problems with the engine you are not able to connect to it via webadmin (that is OK), but I don't see any way to understand its state in order to debug/resolve problems...

Are there any command line commands to run to see the status of the self-hosted engine VM?

Joop, are you able to access your self-hosted engine console? Is it VNC or SPICE?

On the hypervisor, in /etc/pki/vdsm/certs/:

[root@ovc71 certs]# ll
total 16
-rw-r--r--. 1 root kvm 1415 Oct 26 16:17 cacert.pem
-rw-------. 1 vdsm kvm 1131 Oct 26 14:43 cacert.pem.20151026161748
-rw-r--r--. 1 root kvm 1623 Oct 26 16:17 vdsmcert.pem
-rw-------. 1 vdsm kvm 1249 Oct 26 14:43 vdsmcert.pem.20151026161748

During install I was able to connect via

remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://localhost?tls-port=5900 --spice-host-subject="C=EN, L=Test, O=Test, CN=Test"

using the file that was then renamed to ca-cert.pem.20151026161748:

[root@ovc71 certs]# openssl x509 -in /etc/pki/vdsm/libvirt-spice/ca-cert.pem.20151026161748 -noout -text | grep Subject
        Subject: C=EN, L=Test, O=Test, CN=TestCA
        Subject Public Key Info:
            X509v3 Subject Key Identifier:

But I'm not able to connect based on the current certificate:

[root@ovc71 certs]# openssl x509 -in /etc/pki/vdsm/libvirt-spice/ca-cert.pem -noout -text | grep Subject
        Subject: C=US, O=localdomain.local, CN=shengine.localdomain.local.37976
        Subject Public Key Info:
            X509v3 Subject Key Identifier:

[root@ovc71 certs]# hosted-engine --add-console-password
Enter password:
code = 0
message = 'Done'

Also from the hypervisor itself:

[root@ovc71 ~]# remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://ovc71.localdomain.local?tls-port=5900 --spice-host-subject="C=US, O=localdomain.local, CN=shengine.localdomain.local.37976"

** (remote-viewer:7992): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-QzfEVK7OiG: Connection refused
GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
(/usr/bin/remote-viewer:7992): Spice-Warning **: ssl_verify.c:492:openssl_verify: ssl: subject 'C=US, O=localdomain.local, CN=shengine.localdomain.local.37976' verification failed
(/usr/bin/remote-viewer:7992): Spice-Warning **: ssl_verify.c:494:openssl_verify: ssl: verification failed
(remote-viewer:7992): GSpice-WARNING **: main-1:0: SSL_connect: error:00000001:lib(0):func(0):reason(1)

The error in the remote-viewer window:

Unable to connect to the graphic server spice://ovc71.localdomain.local?tls-port=5900
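On the question above about command line tools: a sketch of the commands normally available on a hosted-engine hypervisor for checking the engine VM without the webadmin portal (all from the ovirt-hosted-engine-setup/-ha packages; exact output may differ on 3.6):

# HA state of all hosts plus engine VM health, score and maintenance flags
hosted-engine --vm-status

# is the engine actually answering on its health page?
hosted-engine --check-liveliness

# start or stop the engine VM by hand if the HA agent does not
hosted-engine --vm-start
hosted-engine --vm-shutdown

# is there a HostedEngine qemu process at all? (read-only libvirt access)
virsh -r list --all

# HA agent and broker logs on the hypervisor
less /var/log/ovirt-hosted-engine-ha/agent.log
less /var/log/ovirt-hosted-engine-ha/broker.log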

On 28-10-2015 17:00, Gianluca Cecchi wrote:
On Wed, Oct 28, 2015 at 4:10 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
It's not a regression, because the hosted-engine storage domain wasn't visible in 3.5 either. Once again, even if you see it in the engine you cannot use it for anything apart from the engine VM itself; you still have to add another storage domain for regular VMs.
Understood. But I'm also not able to connect to the self-hosted engine VM itself via SPICE, so in case of problems with the engine you are not able to connect to it via webadmin (that is OK), but I don't see any way to understand its state in order to debug/resolve problems...
Are there any command line commands to run to see the status of the self-hosted engine VM?
Joop, are you able to access your self-hosted engine console? Is it VNC or SPICE?
I'll have time to try coming weekend/next week and let you know how it goes.

Joop

Hi, just adding some info: I have the same problem, no engine VM in the web UI (NFSv4 as storage), and when adding the hosted_storage the engine VM crashes entirely. But I think it's more than that; I think the whole DC goes down. One time I had two VMs started plus the engine VM (global maintenance was activated); I did an update on the engine VM, then tried to import the hosted_storage, and as always the engine VM went down, but so did the two other VMs.

2015-10-29 22:09 GMT+01:00 Joop <jvdwege@xs4all.nl>:
On 28-10-2015 17:00, Gianluca Cecchi wrote:
On Wed, Oct 28, 2015 at 4:10 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
It's not a regression, because the hosted-engine storage domain wasn't visible in 3.5 either. Once again, even if you see it in the engine you cannot use it for anything apart from the engine VM itself; you still have to add another storage domain for regular VMs.
Understood. But I'm also not able to connect to the self-hosted engine VM itself via SPICE, so in case of problems with the engine you are not able to connect to it via webadmin (that is OK), but I don't see any way to understand its state in order to debug/resolve problems...
Are there any command line commands to run to see the status of the self-hosted engine VM?
Joop, are you able to access your self-hosted engine console? Is it VNC or SPICE?
I'll have time to try coming weekend/next week and let you know how it goes.
Joop
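wodel mentions above having global maintenance activated before updating the engine VM; for reference, a short sketch of how that mode is normally toggled from the hypervisor (standard hosted-engine options, shown here as a reminder rather than quoted from the thread):

# stop the HA agents from restarting/migrating the engine VM (e.g. before engine upgrades)
hosted-engine --set-maintenance --mode=global

# ... update or reboot the engine VM ...

# hand control back to the HA agents
hosted-engine --set-maintenance --mode=none

# confirm the maintenance flag in the status output
hosted-engine --vm-status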
participants (4)
- Gianluca Cecchi
- Joop
- Simone Tiraboschi
- wodel youchi