[ovirt-users] The Hosted Engine Storage Domain doesn't exist. It should be imported into the setup.

Simone Tiraboschi stirabos at redhat.com
Wed Mar 22 10:59:57 UTC 2017


On Tue, Mar 21, 2017 at 9:24 AM, Paolo Margara <paolo.margara at polito.it>
wrote:

> Hi Simone,
>
> I'll respond inline
>
> On 20/03/2017 11:59, Simone Tiraboschi wrote:
>
>
>
> On Mon, Mar 20, 2017 at 11:15 AM, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
>>
>> On Mon, Mar 20, 2017 at 10:12 AM, Paolo Margara <paolo.margara at polito.it>
>> wrote:
>>
>>> Hi Yedidyah,
>>>
>>> On 19/03/2017 11:55, Yedidyah Bar David wrote:
>>> > On Sat, Mar 18, 2017 at 12:25 PM, Paolo Margara <
>>> paolo.margara at polito.it> wrote:
>>> >> Hi list,
>>> >>
>>> >> I'm working on a system running oVirt 3.6 and the Engine is
>>> >> repeatedly reporting the warning "The Hosted Engine Storage Domain
>>> >> doesn't exist. It should be imported into the setup." in the Events
>>> >> tab of the Admin Portal.
>>> >>
>>> >> I've read on the list that the Hosted Engine Storage Domain should be
>>> >> imported automatically into the setup during the upgrade to 3.6 (the
>>> >> original setup was on 3.5), but this did not happen, although the
>>> >> HostedEngine is correctly visible in the VM tab after the upgrade.
>>> > Was the upgrade to 3.6 successful and clean?
>>> The upgrade from 3.5 to 3.6 was successful, as were all subsequent minor
>>> release upgrades. I rechecked the upgrade logs and haven't seen any
>>> relevant error.
>>> One additional piece of information: I'm currently running on CentOS 7,
>>> and the original setup was also on this release.
>>> >
>>> >> The Hosted Engine Storage Domain is on a dedicated gluster volume,
>>> >> but since, if I remember correctly, oVirt 3.5 did not support gluster
>>> >> as a backend for the HostedEngine at that time, I had installed the
>>> >> engine using gluster's NFS server with 'localhost:/hosted-engine' as
>>> >> the mount point.
>>> >>
>>> >> Currently, on every node I can see the following lines in the
>>> >> ovirt-hosted-engine-ha agent log:
>>> >>
>>> >> MainThread::INFO::2017-03-17 14:04:17,773::hosted_engine::462::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 3400)
>>> >> MainThread::INFO::2017-03-17 14:04:17,774::hosted_engine::467::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host virtnode-0-1 (id: 2, score: 3400)
>>> >> MainThread::INFO::2017-03-17 14:04:27,956::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
>>> >> MainThread::INFO::2017-03-17 14:04:28,055::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
>>> >> MainThread::INFO::2017-03-17 14:04:28,078::storage_server::218::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>> >> MainThread::INFO::2017-03-17 14:04:28,278::storage_server::222::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>> >> MainThread::INFO::2017-03-17 14:04:28,398::storage_server::230::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
>>> >> MainThread::INFO::2017-03-17 14:04:28,822::hosted_engine::685::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images
>>> >> MainThread::INFO::2017-03-17 14:04:28,822::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
>>> >> MainThread::INFO::2017-03-17 14:04:29,308::hosted_engine::688::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Reloading vm.conf from the shared storage domain
>>> >> MainThread::INFO::2017-03-17 14:04:29,309::config::206::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
>>> >> MainThread::WARNING::2017-03-17 14:04:29,567::ovf_store::104::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
>>> >> MainThread::ERROR::2017-03-17 14:04:29,691::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
>>> > This is normal in your current state.
>>> >
>>> >> ...and the following lines in the engine.log file inside the Hosted
>>> >> Engine:
>>> >>
>>> >> 2017-03-16 07:36:28,087 INFO
>>> >> [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand]
>>> >> (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock Acquired to object
>>> >> 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
>>> >> 2017-03-16 07:36:28,115 WARN
>>> >> [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand]
>>> >> (org.ovirt.thread.pool-8-thread-38) [236d315c] CanDoAction of action
>>> >> 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons:
>>> >> VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_NOT_EXIST
>>> > That's the thing to debug. Did you check vdsm logs on the hosts, near
>>> > the time this happens?
>>> A few moments before, I saw the following lines in the vdsm.log of the
>>> host that runs the hosted engine and is the SPM, but I see the same
>>> lines on the other nodes as well:
>>>
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,412::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state init -> state preparing
>>> Thread-1746094::INFO::2017-03-16 07:36:00,413::logUtils::48::dispatcher::(wrapper) Run and protect: getImagesList(sdUUID='3b5db584-5d21-41dc-8f8d-712ce9423a27', options=None)
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,413::resourceManager::199::Storage.ResourceManager.Request::(__init__) ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '3313' at 'getImagesList'
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,413::resourceManager::545::Storage.ResourceManager::(registerResource) Trying to register resource 'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' for lock type 'shared'
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,414::resourceManager::604::Storage.ResourceManager::(registerResource) Resource 'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' is free. Now locking as 'shared' (1 active user)
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,414::resourceManager::239::Storage.ResourceManager.Request::(grant) ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Granted request
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,414::task::827::Storage.TaskManager.Task::(resourceAcquired) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_resourcesAcquired: Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27 (shared)
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,414::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting False
>>> Thread-1746094::ERROR::2017-03-16 07:36:00,415::task::866::Storage.TaskManager.Task::(_setError) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Unexpected error
>>> Traceback (most recent call last):
>>>   File "/usr/share/vdsm/storage/task.py", line 873, in _run
>>>     return fn(*args, **kargs)
>>>   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
>>>     res = f(*args, **kwargs)
>>>   File "/usr/share/vdsm/storage/hsm.py", line 3315, in getImagesList
>>>     images = dom.getAllImages()
>>>   File "/usr/share/vdsm/storage/fileSD.py", line 373, in getAllImages
>>>     self.getPools()[0],
>>> IndexError: list index out of range
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,415::task::885::Storage.TaskManager.Task::(_run) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._run: ae5af1a1-207c-432d-acfa-f3e03e014ee6 ('3b5db584-5d21-41dc-8f8d-712ce9423a27',) {} failed - stopping task
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,415::task::1246::Storage.TaskManager.Task::(stop) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::stopping in state preparing (force False)
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting True
>>> Thread-1746094::INFO::2017-03-16 07:36:00,416::task::1171::Storage.TaskManager.Task::(prepare) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::aborting: Task is aborted: u'list index out of range' - code 100
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::1176::Storage.TaskManager.Task::(prepare) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Prepare: aborted: list index out of range
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 0 aborting True
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::928::Storage.TaskManager.Task::(_doAbort) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._doAbort: force False
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,416::resourceManager::980::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state preparing -> state aborting
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::550::Storage.TaskManager.Task::(__state_aborting) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_aborting: recover policy none
>>> Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state aborting -> state failed
>>>
>>> After that I ran a simple query on the storage domains using vdsClient
>>> and got the following information:
>>>
>>> # vdsClient -s 0 getStorageDomainsList
>>> 3b5db584-5d21-41dc-8f8d-712ce9423a27
>>> 0966f366-b5ae-49e8-b05e-bee1895c2d54
>>> 35223b83-e0bd-4c8d-91a9-8c6b85336e7d
>>> 2c3994e3-1f93-4f2a-8a0a-0b5d388a2be7
>>> # vdsClient -s 0 getStorageDomainInfo 3b5db584-5d21-41dc-8f8d-712ce9423a27
>>>     uuid = 3b5db584-5d21-41dc-8f8d-712ce9423a27
>>>     version = 3
>>>     role = Regular
>>>     remotePath = localhost:/hosted-engine
>>>
>>
>> Your issue is probably here: by design, all the hosts of a single
>> datacenter should be able to see all the storage domains, including the
>> hosted-engine one, but if you try to mount it as
>> localhost:/hosted-engine this will not be possible.
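>>
>> Just to make this concrete, you can check what is currently recorded on
>> each host (a hedged sketch; the conf path is the standard one written by
>> hosted-engine setup):
>>
>>     # grep -E '^(domainType|storage)' /etc/ovirt-hosted-engine/hosted-engine.conf
>>
>> If that still shows storage=localhost:/hosted-engine, the recorded path
>> is only meaningful on the local host, which is exactly the problem
>> described above.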
>>
>>
>>>     type = NFS
>>>     class = Data
>>>     pool = []
>>>     name = default
>>> # vdsClient -s 0 getImagesList 3b5db584-5d21-41dc-8f8d-712ce9423a27
>>> list index out of range
>>>
>>> All other storage domains have the pool attribute defined; could this
>>> be the issue? How can I assign the Hosted Engine Storage Domain to a
>>> pool?
>>>
>>
>> This will be the result of the auto import process once feasible.
>>
>>
>>> >
>>> >> 2017-03-16 07:36:28,116 INFO
>>> >> [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand]
>>> >> (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock freed to object
>>> >> 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
>>> >>
>>> >> How can I safely import the Hosted Engine Storage Domain into my
>>> >> setup?
>>> >> In this situation, is it safe to upgrade to oVirt 4.0?
>>>
>>
> This could be really tricky: upgrading a hosted-engine environment that
> was deployed on 3.5 in a hyperconverged setup, but mounted over NFS on a
> localhost loopback, all the way to 4.1 is far outside the paths we
> tested, so I think you could hit a few surprises there.
>
> In 4.1 the expected configuration under /etc/ovirt-hosted-engine/hosted-engine.conf
> includes:
> domainType=glusterfs
> storage=<FIRST_HOST_ADDR>:/path
> mnt_options=backup-volfile-servers=<SECOND_HOST_ADDR>:<THIRD_HOST_ADDR>
> But these require more recent vdsm and ovirt-hosted-engine-ha versions.
> Then you also have to configure your engine to have both virt and gluster
> on the same cluster.
> Nothing is going to do them automatically for you on upgrades.
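>
> To find the addresses to plug in there you can inspect the volume from
> one of the nodes (just a sketch; I'm assuming the gluster volume is
> really named hosted-engine, as the localhost:/hosted-engine path
> suggests):
>
>     # gluster volume info hosted-engine
>     # gluster volume status hosted-engine
>
> The brick hosts listed there are what you would use as <FIRST_HOST_ADDR>
> and in the backup-volfile-servers mount option.
>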
> I see two options here:
> 1. easy, but with substantial downtime: shut down your whole DC, start
> from scratch with gdeploy from 4.1 to configure a new gluster volume and
> a new engine over there, and once you have the new engine import your
> existing storage domain and restart your VMs;
> 2. a lot trickier: try to reach the 4.1 state manually, editing
> /etc/ovirt-hosted-engine/hosted-engine.conf and so on and upgrading
> everything to 4.1; this could be pretty risky because you are on a path
> we never tested, since hyperconverged hosted-engine wasn't released at
> 3.5.
>
> Understood, definitely bad news for me. But I'm currently running oVirt
> 3.6.7 which, if I remember correctly, supports a hyperconverged setup;
> isn't it possible to fix this issue with my current version? I've
> installed vdsm 4.17.32-1.el7 with the vdsm-gluster package and
> ovirt-hosted-engine-ha 1.3.5.7-1.el7.centos on CentOS 7.2.1511, and my
> engine is already configured to have both virt and gluster on the same
> cluster. Can't I put the cluster in maintenance, stop the hosted-engine,
> stop ovirt-hosted-engine-ha, edit hosted-engine.conf to change
> domainType, storage and mnt_options, and then restart
> ovirt-hosted-engine-ha and the hosted-engine?
>

Support for custom mount options was introduced here:
https://gerrit.ovirt.org/#/c/57787/
It should be available since ovirt-hosted-engine-ha-1.3.5.6, so you
already have it on 3.6.7.
In the meantime we have had a lot of improvements for the hyperconverged
scenario, which is why I'm strongly suggesting that you upgrade to 4.1.
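
Just to give an idea of what the manual route would involve, here is a
very rough, untested sketch of the steps you listed (treat it as an
outline rather than a validated procedure; the addresses and the volume
name are placeholders):

    # put the whole setup in global maintenance first
    hosted-engine --set-maintenance --mode=global

    # on each host, stop the HA services
    systemctl stop ovirt-ha-agent ovirt-ha-broker

    # on each host, edit /etc/ovirt-hosted-engine/hosted-engine.conf:
    #   domainType=glusterfs
    #   storage=<FIRST_HOST_ADDR>:/hosted-engine
    #   mnt_options=backup-volfile-servers=<SECOND_HOST_ADDR>:<THIRD_HOST_ADDR>

    # on each host, start the HA services again
    systemctl start ovirt-ha-broker ovirt-ha-agent

    # once the agents report a sane state
    hosted-engine --set-maintenance --mode=none

But again, since the hyperconverged 3.5 starting point was never a tested
path, keep a copy of the original conf files so you can roll back.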


>
>>
>>> > I'd first try to solve this.
>>> >
>>> > What OS do you have on your hosts? Are they all upgraded to 3.6?
>>> >
>>> > See also:
>>> > https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/
>>> >
>>> > Best,
>>>
>>> Greetings,
>>>     Paolo
>>>
>> Greetings,     Paolo
>