Hi Simone,
I'll respond inline.

I understood; definitely bad news for me. But I'm currently running oVirt 3.6.7 which, if I remember correctly, supports a hyperconverged setup. Isn't it possible to fix this issue on my current version? I've installed vdsm 4.17.32-1.el7 with the vdsm-gluster package and ovirt-hosted-engine-ha 1.3.5.7-1.el7.centos on CentOS 7.2.1511, and my engine is already configured to have both virt and gluster on the same cluster. Couldn't I put the cluster in maintenance, stop the hosted engine, stop ovirt-hosted-engine-ha, edit hosted-engine.conf (changing domainType, storage and mnt_options), and then restart ovirt-hosted-engine-ha and the hosted engine?
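[For reference, a rough sketch of the sequence described above, on the stated assumption that the oVirt 3.6 hosted-engine tooling and HA service names are in use; this path is untested here, and the gluster host addresses (virtnode-0-0, virtnode-0-2; only virtnode-0-1 appears earlier in this thread) are hypothetical placeholders:]

  # Sketch only; run on every HA host unless noted otherwise.
  hosted-engine --set-maintenance --mode=global   # freeze HA state transitions (any one host)
  hosted-engine --vm-shutdown                     # cleanly stop the engine VM (one host)
  systemctl stop ovirt-ha-agent ovirt-ha-broker   # stop the HA services (every host)

  # Edit /etc/ovirt-hosted-engine/hosted-engine.conf on every host, e.g.:
  #   domainType=glusterfs
  #   storage=virtnode-0-0:/hosted-engine          # hypothetical first gluster host
  #   mnt_options=backup-volfile-servers=virtnode-0-1:virtnode-0-2

  systemctl start ovirt-ha-agent ovirt-ha-broker  # restart the HA services (every host)
  hosted-engine --set-maintenance --mode=none     # let HA start the engine again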
On 20/03/2017 11:59, Simone Tiraboschi wrote:
On Mon, Mar 20, 2017 at 11:15 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Mon, Mar 20, 2017 at 10:12 AM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi Yedidyah,
On 19/03/2017 11:55, Yedidyah Bar David wrote:
> On Sat, Mar 18, 2017 at 12:25 PM, Paolo Margara <paolo.margara@polito.it> wrote:
>> Hi list,
>>
>> I'm working on a system running oVirt 3.6, and the Engine is repeatedly
>> reporting the warning "The Hosted Engine Storage Domain doesn't exist.
>> It should be imported into the setup." in the Events tab of the Admin
>> Portal.
>>
>> I've read on the list that the Hosted Engine Storage Domain should be
>> imported automatically into the setup during the upgrade to 3.6
>> (the original setup was on 3.5), but this did not happen, even though
>> the HostedEngine VM is correctly visible in the VM tab after the upgrade.
> Was the upgrade to 3.6 successful and clean?
The upgrade from 3.5 to 3.6 was successful, as were all the subsequent minor
release upgrades. I rechecked the upgrade logs and haven't seen any
relevant error.
One additional piece of information: I'm currently running on CentOS 7,
and the original setup was also on this release.
>> The Hosted Engine Storage Domain is on a dedicated gluster volume, but
>> considering that, if I remember correctly, oVirt 3.5 did not support
>> gluster as a backend for the HostedEngine at that time, I had installed
>> the engine using gluster's NFS server, with 'localhost:/hosted-engine'
>> as the mount point.
>>
>> Currently, on every node, I can see the following lines in the log of
>> the ovirt-hosted-engine-ha agent:
>>
>> MainThread::INFO::2017-03-17
>> 14:04:17,773::hosted_engine::462::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>> Current state EngineUp (score: 3400)
>> MainThread::INFO::2017-03-17
>> 14:04:17,774::hosted_engine::467::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>> Best remote host virtnode-0-1 (id: 2, score: 3400)
>> MainThread::INFO::2017-03-17
>> 14:04:27,956::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>> Initializing VDSM
>> MainThread::INFO::2017-03-17
>> 14:04:28,055::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>> Connecting the storage
>> MainThread::INFO::2017-03-17
>> 14:04:28,078::storage_server::218::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> Connecting storage server
>> MainThread::INFO::2017-03-17
>> 14:04:28,278::storage_server::222::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> Connecting storage server
>> MainThread::INFO::2017-03-17
>> 14:04:28,398::storage_server::230::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> Refreshing the storage domain
>> MainThread::INFO::2017-03-17
>> 14:04:28,822::hosted_engine::685::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>> Preparing images
>> MainThread::INFO::2017-03-17
>> 14:04:28,822::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images)
>> Preparing images
>> MainThread::INFO::2017-03-17
>> 14:04:29,308::hosted_engine::688::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>> Reloading vm.conf from the shared storage domain
>> MainThread::INFO::2017-03-17
>> 14:04:29,309::config::206::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file)
>> Trying to get a fresher copy of vm configuration from the OVF_STORE
>> MainThread::WARNING::2017-03-17
>> 14:04:29,567::ovf_store::104::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
>> Unable to find OVF_STORE
>> MainThread::ERROR::2017-03-17
>> 14:04:29,691::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file)
>> Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
> This is normal at your current state.
>
>> ...and the following lines in the engine.log logfile inside the Hosted
>> Engine:
>>
>> 2017-03-16 07:36:28,087 INFO
>> [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand]
>> (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock Acquired to object
>> 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
>> 2017-03-16 07:36:28,115 WARN
>> [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand]
>> (org.ovirt.thread.pool-8-thread-38) [236d315c] CanDoAction of action
>> 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons:
>> VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_NOT_EXIST
> That's the thing to debug. Did you check vdsm logs on the hosts, near
> the time this happens?
Some moments before that event, I saw the following lines in the vdsm.log of
the host that runs the hosted engine and is also the SPM, but I see the same
lines on the other nodes as well:
Thread-1746094::DEBUG::2017-03-16
07:36:00,412::task::595::Storage.TaskManager.Task::(_updateState)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state init ->
state preparing
Thread-1746094::INFO::2017-03-16
07:36:00,413::logUtils::48::dispatcher::(wrapper) Run and protect:
getImagesList(sdUUID='3b5db584-5d21-41dc-8f8d-712ce9423a27', options=None)
Thread-1746094::DEBUG::2017-03-16
07:36:00,413::resourceManager::199::Storage.ResourceManager.Request::(__init__)
ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Request
was made in '/usr/share/vdsm/storage/hsm.py' line '3313' at 'getImagesList'
Thread-1746094::DEBUG::2017-03-16
07:36:00,413::resourceManager::545::Storage.ResourceManager::(registerResource)
Trying to register resource
'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' for lock type 'shared'
Thread-1746094::DEBUG::2017-03-16
07:36:00,414::resourceManager::604::Storage.ResourceManager::(registerResource)
Resource 'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' is free. Now
locking as 'shared' (1 active user)
Thread-1746094::DEBUG::2017-03-16
07:36:00,414::resourceManager::239::Storage.ResourceManager.Request::(grant)
ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Granted
request
Thread-1746094::DEBUG::2017-03-16
07:36:00,414::task::827::Storage.TaskManager.Task::(resourceAcquired)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_resourcesAcquired:
Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27 (shared)
Thread-1746094::DEBUG::2017-03-16
07:36:00,414::task::993::Storage.TaskManager.Task::(_decref)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting False
Thread-1746094::ERROR::2017-03-16
07:36:00,415::task::866::Storage.TaskManager.Task::(_setError)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3315, in getImagesList
    images = dom.getAllImages()
  File "/usr/share/vdsm/storage/fileSD.py", line 373, in getAllImages
    self.getPools()[0],
IndexError: list index out of range
Thread-1746094::DEBUG::2017-03-16
07:36:00,415::task::885::Storage.TaskManager.Task::(_run)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._run:
ae5af1a1-207c-432d-acfa-f3e03e014ee6
('3b5db584-5d21-41dc-8f8d-712ce9423a27',) {} failed - stopping task
Thread-1746094::DEBUG::2017-03-16
07:36:00,415::task::1246::Storage.TaskManager.Task::(stop)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::stopping in state preparing
(force False)
Thread-1746094::DEBUG::2017-03-16
07:36:00,416::task::993::Storage.TaskManager.Task::(_decref)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting True
Thread-1746094::INFO::2017-03-16
07:36:00,416::task::1171::Storage.TaskManager.Task::(prepare)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::aborting: Task is aborted:
u'list index out of range' - code 100
Thread-1746094::DEBUG::2017-03-16
07:36:00,416::task::1176::Storage.TaskManager.Task::(prepare)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Prepare: aborted: list
index out of range
Thread-1746094::DEBUG::2017-03-16
07:36:00,416::task::993::Storage.TaskManager.Task::(_decref)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 0 aborting True
Thread-1746094::DEBUG::2017-03-16
07:36:00,416::task::928::Storage.TaskManager.Task::(_doAbort)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._doAbort: force False
Thread-1746094::DEBUG::2017-03-16
07:36:00,416::resourceManager::980::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-1746094::DEBUG::2017-03-16
07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state preparing
-> state aborting
Thread-1746094::DEBUG::2017-03-16
07:36:00,417::task::550::Storage.TaskManager.Task::(__state_aborting)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_aborting: recover policy none
Thread-1746094::DEBUG::2017-03-16
07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState)
Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state aborting
-> state failed
After that, I ran a simple query on the storage domains using vdsClient and
got the following information:
# vdsClient -s 0 getStorageDomainsList
3b5db584-5d21-41dc-8f8d-712ce9423a27
0966f366-b5ae-49e8-b05e-bee1895c2d54
35223b83-e0bd-4c8d-91a9-8c6b85336e7d
2c3994e3-1f93-4f2a-8a0a-0b5d388a2be7
# vdsClient -s 0 getStorageDomainInfo 3b5db584-5d21-41dc-8f8d-712ce9423a27
uuid = 3b5db584-5d21-41dc-8f8d-712ce9423a27
version = 3
role = Regular
remotePath = localhost:/hosted-engine
Your issue is probably here: by design, all the hosts of a single datacenter should be able to see all the storage domains, including the hosted-engine one, but if you mount it as localhost:/hosted-engine this will not be possible.

type = NFS
class = Data
pool = []
name = default
# vdsClient -s 0 getImagesList 3b5db584-5d21-41dc-8f8d-712ce9423a27
list index out of range
All the other storage domains have the pool attribute defined; could this be
the issue? (The IndexError in the traceback above comes from getAllImages()
calling self.getPools()[0] on that empty pool list.) How can I assign the
Hosted Engine Storage Domain to a pool?
This will be the result of the auto-import process, once it becomes feasible.
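[As a side note on the localhost:/hosted-engine point above: one way to check whether the volume is reachable from every host under a real, shared address might be the following; virtnode-0-0 is a hypothetical stand-in for the host exporting the volume over gluster's NFS server:]

  # Run on each host in the datacenter.
  showmount -e virtnode-0-0                               # should list /hosted-engine
  mkdir -p /mnt/he-test
  mount -t nfs virtnode-0-0:/hosted-engine /mnt/he-test   # temporary test mount
  ls /mnt/he-test                                         # expect the storage-domain UUID directory
  umount /mnt/he-test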
>> 2017-03-16 07:36:28,116 INFO
>> [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand]
>> (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock freed to object
>> 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
>>
>> How can I safely import the Hosted Engine Storage Domain into my setup?
>> In this situation, is it safe to upgrade to oVirt 4.0?
This could be really tricky: upgrading a hosted-engine environment that was deployed at 3.5 on a hyperconverged setup, but mounted over NFS on a localhost loopback, all the way to 4.1 is far outside the paths we have tested, so I think you could hit a few surprises there.
In 4.1 the expected configuration under /etc/ovirt-hosted-engine/hosted-engine.conf includes:

domainType=glusterfs
storage=<FIRST_HOST_ADDR>:/path
mnt_options=backup-volfile-servers=<SECOND_HOST_ADDR>:<THIRD_HOST_ADDR>

But these require more recent vdsm and ovirt-hosted-engine-ha versions (see the version check sketched below). Then you also have to configure your engine to have both virt and gluster on the same cluster. Nothing is going to do this automatically for you on upgrade.

I see two options here:
1. Easy, but with substantial downtime: shut down your whole DC, start from scratch with gdeploy from 4.1 to configure a new gluster volume and a new engine over there; once you have the new engine, import your existing storage domain and restart your VMs.
2. A lot trickier: try to reach 4.1 status by manually editing /etc/ovirt-hosted-engine/hosted-engine.conf and so on, and by upgrading everything to 4.1; this could be pretty risky because you would be on a path we never tested, since hyperconverged hosted-engine wasn't released at 3.5.
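[A minimal sanity check before attempting option 2, assuming the stock oVirt package names and the gluster volume name used earlier in this thread; the minimum required versions are not pinned here:]

  # Verify installed versions on each host before trying the manual path.
  rpm -q vdsm vdsm-gluster ovirt-hosted-engine-ha ovirt-hosted-engine-setup

  # And confirm the volume itself is healthy before touching the configuration.
  gluster volume info hosted-engine
  gluster volume status hosted-engine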
Greetings,
Paolo

> I'd first try to solve this.
>
> What OS do you have on your hosts? Are they all upgraded to 3.6?
>
> See also:
>
> https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/
>
> Best,

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users