The Hosted Engine Storage Domain doesn't exist. It should be imported into the setup.

Hi list,

I'm working on a system running oVirt 3.6 and the Engine is repeatedly reporting the warning "The Hosted Engine Storage Domain doesn't exist. It should be imported into the setup." in the Events tab of the Admin Portal.

I've read on this list that the Hosted Engine Storage Domain should be imported into the setup automatically during the upgrade to 3.6 (the original setup was on 3.5), but this did not happen, although the HostedEngine VM is correctly visible in the VMs tab after the upgrade.

The Hosted Engine Storage Domain is on a dedicated gluster volume but, since oVirt 3.5, if I remember correctly, did not yet support gluster as a backend for the HostedEngine, I had installed the engine using gluster's NFS server with 'localhost:/hosted-engine' as the mount point.

Currently, on every node I can read the following lines in the log of the ovirt-hosted-engine-ha agent:

MainThread::INFO::2017-03-17 14:04:17,773::hosted_engine::462::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 3400)
MainThread::INFO::2017-03-17 14:04:17,774::hosted_engine::467::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host virtnode-0-1 (id: 2, score: 3400)
MainThread::INFO::2017-03-17 14:04:27,956::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2017-03-17 14:04:28,055::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2017-03-17 14:04:28,078::storage_server::218::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2017-03-17 14:04:28,278::storage_server::222::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2017-03-17 14:04:28,398::storage_server::230::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::INFO::2017-03-17 14:04:28,822::hosted_engine::685::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images
MainThread::INFO::2017-03-17 14:04:28,822::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
MainThread::INFO::2017-03-17 14:04:29,308::hosted_engine::688::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Reloading vm.conf from the shared storage domain
MainThread::INFO::2017-03-17 14:04:29,309::config::206::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::WARNING::2017-03-17 14:04:29,567::ovf_store::104::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
MainThread::ERROR::2017-03-17 14:04:29,691::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf

...and the following lines in the engine.log inside the Hosted Engine:

2017-03-16 07:36:28,087 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
2017-03-16 07:36:28,115 WARN [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] CanDoAction of action 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_NOT_EXIST
2017-03-16 07:36:28,116 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'

How can I safely import the Hosted Engine Storage Domain into my setup? In this situation, is it safe to upgrade to oVirt 4.0?

Greetings,
Paolo

On Sat, Mar 18, 2017 at 12:25 PM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi list,
I'm working on a system running oVirt 3.6 and the Engine is repeatedly reporting the warning "The Hosted Engine Storage Domain doesn't exist. It should be imported into the setup." in the Events tab of the Admin Portal.
I've read on this list that the Hosted Engine Storage Domain should be imported into the setup automatically during the upgrade to 3.6 (the original setup was on 3.5), but this did not happen, although the HostedEngine VM is correctly visible in the VMs tab after the upgrade.
Was the upgrade to 3.6 successful and clean?
The Hosted Engine Storage Domain is on a dedicated gluster volume but, since oVirt 3.5, if I remember correctly, did not yet support gluster as a backend for the HostedEngine, I had installed the engine using gluster's NFS server with 'localhost:/hosted-engine' as the mount point.
Currently, on every node I can read the following lines in the log of the ovirt-hosted-engine-ha agent:
[...]
MainThread::WARNING::2017-03-17 14:04:29,567::ovf_store::104::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
MainThread::ERROR::2017-03-17 14:04:29,691::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
This is normal at your current state.
...and the following lines in the engine.log inside the Hosted Engine:
2017-03-16 07:36:28,087 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
2017-03-16 07:36:28,115 WARN [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] CanDoAction of action 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_NOT_EXIST
That's the thing to debug. Did you check vdsm logs on the hosts, near the time this happens?
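For instance, something like this on each host might help locate it (a rough sketch, assuming the default vdsm log location):

    # find storage errors logged around the failed import attempt
    grep -n 'ERROR' /var/log/vdsm/vdsm.log | grep '07:36'
    # then read the context around any match
    grep -B 15 -A 15 'Unexpected error' /var/log/vdsm/vdsm.log | less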
2017-03-16 07:36:28,116 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
How can I safely import the Hosted Engine Storage Domain into my setup? In this situation, is it safe to upgrade to oVirt 4.0?
I'd first try to solve this.
What OS do you have on your hosts? Are they all upgraded to 3.6?
See also:
https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/
Best,
Greetings, Paolo
-- Didi

Hi Yedidyah,

On 19/03/2017 11:55, Yedidyah Bar David wrote:
On Sat, Mar 18, 2017 at 12:25 PM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi list,
I'm working on a system running oVirt 3.6 and the Engine is repeatedly reporting the warning "The Hosted Engine Storage Domain doesn't exist. It should be imported into the setup." in the Events tab of the Admin Portal.
I've read on this list that the Hosted Engine Storage Domain should be imported into the setup automatically during the upgrade to 3.6 (the original setup was on 3.5), but this did not happen, although the HostedEngine VM is correctly visible in the VMs tab after the upgrade.
Was the upgrade to 3.6 successful and clean?
The upgrade from 3.5 to 3.6 was successful, as were all the subsequent minor release upgrades. I have rechecked the upgrade logs and haven't seen any relevant error. One additional piece of information: I'm currently running on CentOS 7, and the original setup was also on this release.
The Hosted Engine Storage Domain is on a dedicated gluster volume but, since oVirt 3.5, if I remember correctly, did not yet support gluster as a backend for the HostedEngine, I had installed the engine using gluster's NFS server with 'localhost:/hosted-engine' as the mount point.
Currently, on every node I can read the following lines in the log of the ovirt-hosted-engine-ha agent:
[...]
MainThread::WARNING::2017-03-17 14:04:29,567::ovf_store::104::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
MainThread::ERROR::2017-03-17 14:04:29,691::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
This is normal at your current state.
...and the following lines in the engine.log inside the Hosted Engine:
2017-03-16 07:36:28,087 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
2017-03-16 07:36:28,115 WARN [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] CanDoAction of action 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_NOT_EXIST
That's the thing to debug. Did you check vdsm logs on the hosts, near the time this happens?
Some moments before, I saw the following lines in the vdsm.log of the host that runs the hosted engine and is also the SPM, but I see the same lines on the other nodes as well:
Thread-1746094::DEBUG::2017-03-16 07:36:00,412::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state init -> state preparing
Thread-1746094::INFO::2017-03-16 07:36:00,413::logUtils::48::dispatcher::(wrapper) Run and protect: getImagesList(sdUUID='3b5db584-5d21-41dc-8f8d-712ce9423a27', options=None)
Thread-1746094::DEBUG::2017-03-16 07:36:00,413::resourceManager::199::Storage.ResourceManager.Request::(__init__) ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '3313' at 'getImagesList'
Thread-1746094::DEBUG::2017-03-16 07:36:00,413::resourceManager::545::Storage.ResourceManager::(registerResource) Trying to register resource 'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' for lock type 'shared'
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::resourceManager::604::Storage.ResourceManager::(registerResource) Resource 'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' is free. Now locking as 'shared' (1 active user)
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::resourceManager::239::Storage.ResourceManager.Request::(grant) ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Granted request
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::task::827::Storage.TaskManager.Task::(resourceAcquired) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_resourcesAcquired: Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27 (shared)
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting False
Thread-1746094::ERROR::2017-03-16 07:36:00,415::task::866::Storage.TaskManager.Task::(_setError) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3315, in getImagesList
    images = dom.getAllImages()
  File "/usr/share/vdsm/storage/fileSD.py", line 373, in getAllImages
    self.getPools()[0],
IndexError: list index out of range
Thread-1746094::DEBUG::2017-03-16 07:36:00,415::task::885::Storage.TaskManager.Task::(_run) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._run: ae5af1a1-207c-432d-acfa-f3e03e014ee6 ('3b5db584-5d21-41dc-8f8d-712ce9423a27',) {} failed - stopping task
Thread-1746094::DEBUG::2017-03-16 07:36:00,415::task::1246::Storage.TaskManager.Task::(stop) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::stopping in state preparing (force False)
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting True
Thread-1746094::INFO::2017-03-16 07:36:00,416::task::1171::Storage.TaskManager.Task::(prepare) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::aborting: Task is aborted: u'list index out of range' - code 100
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::1176::Storage.TaskManager.Task::(prepare) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Prepare: aborted: list index out of range
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 0 aborting True
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::928::Storage.TaskManager.Task::(_doAbort) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._doAbort: force False
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::resourceManager::980::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state preparing -> state aborting
Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::550::Storage.TaskManager.Task::(__state_aborting) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_aborting: recover policy none
Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state aborting -> state failed

After that, I tried to execute a simple query on the storage domains using vdsClient and got the following information:

# vdsClient -s 0 getStorageDomainsList
3b5db584-5d21-41dc-8f8d-712ce9423a27
0966f366-b5ae-49e8-b05e-bee1895c2d54
35223b83-e0bd-4c8d-91a9-8c6b85336e7d
2c3994e3-1f93-4f2a-8a0a-0b5d388a2be7
# vdsClient -s 0 getStorageDomainInfo 3b5db584-5d21-41dc-8f8d-712ce9423a27
    uuid = 3b5db584-5d21-41dc-8f8d-712ce9423a27
    version = 3
    role = Regular
    remotePath = localhost:/hosted-engine
    type = NFS
    class = Data
    pool = []
    name = default
# vdsClient -s 0 getImagesList 3b5db584-5d21-41dc-8f8d-712ce9423a27
list index out of range

All the other storage domains have the pool attribute defined; could this be the issue? How can I assign the Hosted Engine Storage Domain to a pool?
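For reference, the failing line in the traceback (self.getPools()[0] in fileSD.py) takes the first element of the domain's pool list, so an empty pool attribute like the one above is exactly what becomes 'list index out of range'. A rough way to compare the pool attribute across all the domains from a host, just reusing the vdsClient calls above, could be:

    # print the uuid and pool attribute of every storage domain
    for sd in $(vdsClient -s 0 getStorageDomainsList); do
        echo "== $sd"
        vdsClient -s 0 getStorageDomainInfo "$sd" | grep -E 'uuid|pool'
    done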
2017-03-16 07:36:28,116 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
How can I safely import the Hosted Engine Storage Domain into my setup? In this situation, is it safe to upgrade to oVirt 4.0?
I'd first try to solve this.
What OS do you have on your hosts? Are they all upgraded to 3.6?
See also:
https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/
Best,
Greetings, Paolo
Greetings, Paolo

On Mon, Mar 20, 2017 at 10:12 AM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi Yedidyah,
On 19/03/2017 11:55, Yedidyah Bar David wrote:
On Sat, Mar 18, 2017 at 12:25 PM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi list,
I'm working on a system running oVirt 3.6 and the Engine is repeatedly reporting the warning "The Hosted Engine Storage Domain doesn't exist. It should be imported into the setup." in the Events tab of the Admin Portal.
[...]
That's the thing to debug. Did you check vdsm logs on the hosts, near the time this happens?
Some moments before, I saw the following lines in the vdsm.log of the host that runs the hosted engine and is also the SPM, but I see the same lines on the other nodes as well:
[...]
After that, I tried to execute a simple query on the storage domains using vdsClient and got the following information:
# vdsClient -s 0 getStorageDomainsList
3b5db584-5d21-41dc-8f8d-712ce9423a27
0966f366-b5ae-49e8-b05e-bee1895c2d54
35223b83-e0bd-4c8d-91a9-8c6b85336e7d
2c3994e3-1f93-4f2a-8a0a-0b5d388a2be7
# vdsClient -s 0 getStorageDomainInfo 3b5db584-5d21-41dc-8f8d-712ce9423a27
    uuid = 3b5db584-5d21-41dc-8f8d-712ce9423a27
    version = 3
    role = Regular
    remotePath = localhost:/hosted-engine
Your issue is probably here: by design, all the hosts of a single datacenter should be able to see all the storage domains, including the hosted-engine one, but if you mount it as localhost:/hosted-engine this will not be possible, because each host resolves 'localhost' to itself.
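A quick way to verify what each host is actually using (assuming the default location of the hosted-engine configuration) could be:

    # the storage endpoint the HA services were configured with
    grep -E '^(storage|domainType|mnt_options)=' /etc/ovirt-hosted-engine/hosted-engine.conf
    # what is actually mounted right now
    mount | grep hosted-engine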
    type = NFS
    class = Data
    pool = []
    name = default
# vdsClient -s 0 getImagesList 3b5db584-5d21-41dc-8f8d-712ce9423a27
list index out of range
All the other storage domains have the pool attribute defined; could this be the issue? How can I assign the Hosted Engine Storage Domain to a pool?
This will be the result of the auto import process once feasible.
2017-03-16 07:36:28,116 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
How can I safely import the Hosted Engine Storage Domain into my setup? In this situation, is it safe to upgrade to oVirt 4.0?
I'd first try to solve this.
What OS do you have on your hosts? Are they all upgraded to 3.6?
See also:
https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/
Best,
Greetings, Paolo
Greetings, Paolo

On Mon, Mar 20, 2017 at 11:15 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Mon, Mar 20, 2017 at 10:12 AM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi Yedidyah,
On 19/03/2017 11:55, Yedidyah Bar David wrote:
On Sat, Mar 18, 2017 at 12:25 PM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi list,
I'm working on a system running oVirt 3.6 and the Engine is repeatedly reporting the warning "The Hosted Engine Storage Domain doesn't exist. It should be imported into the setup." in the Events tab of the Admin Portal.
[...]
After that, I tried to execute a simple query on the storage domains using vdsClient and got the following information:
# vdsClient -s 0 getStorageDomainsList
3b5db584-5d21-41dc-8f8d-712ce9423a27
0966f366-b5ae-49e8-b05e-bee1895c2d54
35223b83-e0bd-4c8d-91a9-8c6b85336e7d
2c3994e3-1f93-4f2a-8a0a-0b5d388a2be7
# vdsClient -s 0 getStorageDomainInfo 3b5db584-5d21-41dc-8f8d-712ce9423a27
    uuid = 3b5db584-5d21-41dc-8f8d-712ce9423a27
    version = 3
    role = Regular
    remotePath = localhost:/hosted-engine
Your issue is probably here: by design, all the hosts of a single datacenter should be able to see all the storage domains, including the hosted-engine one, but if you mount it as localhost:/hosted-engine this will not be possible, because each host resolves 'localhost' to itself.
    type = NFS
    class = Data
    pool = []
    name = default
# vdsClient -s 0 getImagesList 3b5db584-5d21-41dc-8f8d-712ce9423a27
list index out of range
All the other storage domains have the pool attribute defined; could this be the issue? How can I assign the Hosted Engine Storage Domain to a pool?
This will be the result of the auto import process once feasible.
2017-03-16 07:36:28,116 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
How can I safely import the Hosted Engine Storage Domain into my setup? In this situation, is it safe to upgrade to oVirt 4.0?
This could be really tricky: upgrading a hosted-engine env deployed at 3.5 on a hyperconverged setup, but mounted over NFS on a localhost loopback mount, up to 4.1 is by far out of the paths we tested, so I think you could hit a few surprises there.

In 4.1 the expected configuration under /etc/ovirt-hosted-engine/hosted-engine.conf includes:

domainType=glusterfs
storage=<FIRST_HOST_ADDR>:/path
mnt_options=backup-volfile-servers=<SECOND_HOST_ADDR>:<THIRD_HOST_ADDR>

But this requires more recent vdsm and ovirt-hosted-engine-ha versions. Then you also have to configure your engine to have both virt and gluster on the same cluster. Nothing is going to do this automatically for you on upgrade.

I see two options here:
1. Easy, but with a substantial downtime: shut down your whole DC, start from scratch with gdeploy from 4.1 to configure a new gluster volume and a new engine over there; once you have the new engine, import your existing storage domains and restart your VMs.
2. A lot trickier: try to reach the 4.1 status by manually editing /etc/ovirt-hosted-engine/hosted-engine.conf and so on, upgrading everything to 4.1; this could be pretty risky because you are on a path we never tested, since hyperconverged hosted-engine at 3.5 wasn't released.
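For example, with three hypothetical host addresses (the gluster1/2/3.example.com names below are placeholders), that stanza would look something like:

    domainType=glusterfs
    storage=gluster1.example.com:/hosted-engine
    mnt_options=backup-volfile-servers=gluster2.example.com:gluster3.example.com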
I'd first try to solve this.
What OS do you have on your hosts? Are they all upgraded to 3.6?
See also:
https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/
Best,
Greetings, Paolo
Greetings, Paolo

Hi Simone,

I'll respond inline.

On 20/03/2017 11:59, Simone Tiraboschi wrote:
On Mon, Mar 20, 2017 at 11:15 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Mon, Mar 20, 2017 at 10:12 AM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi Yedidyah,
On 19/03/2017 11:55, Yedidyah Bar David wrote:
> On Sat, Mar 18, 2017 at 12:25 PM, Paolo Margara <paolo.margara@polito.it> wrote:
>> Hi list,
>>
>> I'm working on a system running oVirt 3.6 and the Engine is repeatedly reporting the warning "The Hosted Engine Storage Domain doesn't exist. It should be imported into the setup." in the Events tab of the Admin Portal.
[...]
>> 2017-03-16 07:36:28,116 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
>>
>> How can I safely import the Hosted Engine Storage Domain into my setup?
>> In this situation, is it safe to upgrade to oVirt 4.0?
This could be really tricky: upgrading a hosted-engine env deployed at 3.5 on a hyperconverged setup, but mounted over NFS on a localhost loopback mount, up to 4.1 is by far out of the paths we tested, so I think you could hit a few surprises there.
In 4.1 the expected configuration under /etc/ovirt-hosted-engine/hosted-engine.conf includes:
domainType=glusterfs
storage=<FIRST_HOST_ADDR>:/path
mnt_options=backup-volfile-servers=<SECOND_HOST_ADDR>:<THIRD_HOST_ADDR>
But this requires more recent vdsm and ovirt-hosted-engine-ha versions. Then you also have to configure your engine to have both virt and gluster on the same cluster. Nothing is going to do this automatically for you on upgrade.
I see two options here:
1. Easy, but with a substantial downtime: shut down your whole DC, start from scratch with gdeploy from 4.1 to configure a new gluster volume and a new engine over there; once you have the new engine, import your existing storage domains and restart your VMs.
2. A lot trickier: try to reach the 4.1 status by manually editing /etc/ovirt-hosted-engine/hosted-engine.conf and so on, upgrading everything to 4.1; this could be pretty risky because you are on a path we never tested, since hyperconverged hosted-engine at 3.5 wasn't released.
I understand; that's definitely bad news for me. But I'm currently running oVirt 3.6.7, which, if I remember correctly, supports hyperconverged setups. Isn't it possible to fix this issue with my current version? I have vdsm 4.17.32-1.el7 installed with the vdsm-gluster package and ovirt-hosted-engine-ha 1.3.5.7-1.el7.centos on CentOS 7.2.1511, and my engine is already configured to have both virt and gluster on the same cluster. Couldn't I put the cluster in maintenance, stop the hosted engine, stop ovirt-hosted-engine-ha, edit hosted-engine.conf (changing domainType, storage and mnt_options), and then restart ovirt-hosted-engine-ha and the hosted engine?
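For clarity, the sequence I have in mind on each host is roughly the following (just a sketch of the idea, not a tested procedure; the placeholders are the same as in your example):

    # enter global maintenance and shut the engine VM down cleanly
    hosted-engine --set-maintenance --mode=global
    hosted-engine --vm-shutdown
    # stop the HA services on every host
    systemctl stop ovirt-ha-agent ovirt-ha-broker
    # edit /etc/ovirt-hosted-engine/hosted-engine.conf on every host:
    #   domainType=glusterfs
    #   storage=<FIRST_HOST_ADDR>:/hosted-engine
    #   mnt_options=backup-volfile-servers=<SECOND_HOST_ADDR>:<THIRD_HOST_ADDR>
    # restart the HA services and exit maintenance
    systemctl start ovirt-ha-broker ovirt-ha-agent
    hosted-engine --set-maintenance --mode=none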
> I'd first try to solve this.
>
> What OS do you have on your hosts? Are they all upgraded to 3.6?
>
> See also:
>
> https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/
>
> Best,
Greetings,
Paolo
Now<br> locking as 'shared' (1 active user)<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,414::resourceManager:<wbr>:239::Storage.ResourceManager.<wbr>Request::(grant)<br> ResName=`Storage.3b5db584-5d21<wbr>-41dc-8f8d-712ce9423a27`ReqID=<wbr>`8ea3c7f3-8ccd-4127-96b1-ec97a<wbr>3c7b8d4`::Granted<br> request<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,414::task::827::Stora<wbr>ge.TaskManager.Task::(resource<wbr>Acquired)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::_resourcesAcqui<wbr>red:<br> Storage.3b5db584-5d21-41dc-8f8<wbr>d-712ce9423a27 (shared)<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,414::task::993::Stora<wbr>ge.TaskManager.Task::(_decref)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::ref 1 aborting False<br> Thread-1746094::ERROR::2017-03<wbr>-16<br> 07:36:00,415::task::866::Stora<wbr>ge.TaskManager.Task::(_setErro<wbr>r)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::Unexpected error<br> Traceback (most recent call last):<br> File "/usr/share/vdsm/storage/task.<wbr>py", line 873, in _run<br> return fn(*args, **kargs)<br> File "/usr/share/vdsm/logUtils.py", line 49, in wrapper<br> res = f(*args, **kwargs)<br> File "/usr/share/vdsm/storage/hsm.p<wbr>y", line 3315, in getImagesList<br> images = dom.getAllImages()<br> File "/usr/share/vdsm/storage/fileS<wbr>D.py", line 373, in getAllImages<br> self.getPools()[0],<br> IndexError: list index out of range<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,415::task::885::Stora<wbr>ge.TaskManager.Task::(_run)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::Task._run:<br> ae5af1a1-207c-432d-acfa-f3e03e<wbr>014ee6<br> ('3b5db584-5d21-41dc-8f8d-712c<wbr>e9423a27',) {} failed - stopping task<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,415::task::1246::Stor<wbr>age.TaskManager.Task::(stop)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::stopping in state preparing<br> (force False)<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,416::task::993::Stora<wbr>ge.TaskManager.Task::(_decref)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::ref 1 aborting True<br> Thread-1746094::<a class="moz-txt-link-freetext" href="INFO::2017-03">INFO::2017-03</a>-<wbr>16<br> 07:36:00,416::task::1171::Stor<wbr>age.TaskManager.Task::(prepare<wbr>)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::aborting: Task is aborted:<br> u'list index out of range' - code 100<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,416::task::1176::Stor<wbr>age.TaskManager.Task::(prepare<wbr>)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::Prepare: aborted: list<br> index out of range<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,416::task::993::Stora<wbr>ge.TaskManager.Task::(_decref)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::ref 0 aborting True<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,416::task::928::Stora<wbr>ge.TaskManager.Task::(_doAbort<wbr>)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::Task._doAbort: force False<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,416::resourceManager:<wbr>:980::Storage.ResourceManager.<wbr>Owner::(cancelAll)<br> Owner.cancelAll requests {}<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,417::task::595::Stora<wbr>ge.TaskManager.Task::(_updateS<wbr>tate)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::moving from state preparing<br> -> state aborting<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 
07:36:00,417::task::550::Stora<wbr>ge.TaskManager.Task::(__state_<wbr>aborting)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::_aborting: recover policy none<br> Thread-1746094::DEBUG::2017-03<wbr>-16<br> 07:36:00,417::task::595::Stora<wbr>ge.TaskManager.Task::(_updateS<wbr>tate)<br> Task=`ae5af1a1-207c-432d-acfa-<wbr>f3e03e014ee6`::moving from state aborting<br> -> state failed<br> <br> After that I tried to execute a simple query on storage domains using<br> vdsClient and I got the following information:<br> <br> # vdsClient -s 0 getStorageDomainsList<br> 3b5db584-5d21-41dc-8f8d-712ce9<wbr>423a27<br> 0966f366-b5ae-49e8-b05e-bee189<wbr>5c2d54<br> 35223b83-e0bd-4c8d-91a9-8c6b85<wbr>336e7d<br> 2c3994e3-1f93-4f2a-8a0a-0b5d38<wbr>8a2be7<br> # vdsClient -s 0 getStorageDomainInfo 3b5db584-5d21-41dc-8f8d-712ce9<wbr>423a27<br> uuid = 3b5db584-5d21-41dc-8f8d-712ce9<wbr>423a27<br> version = 3<br> role = Regular<br> remotePath = localhost:/hosted-engine<br> </blockquote> <div><br> </div> </div> </div> <div>Your issue is probably here: by design all the hosts of a single datacenter should be able to see all the storage domains including the hosted-engine one but if try to mount it as localhost:/hosted-engine this will not be possible.</div> <span class="gmail-m_-4888476689598940363gmail-"> <div> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> type = NFS<br> class = Data<br> pool = []<br> name = default<br> # vdsClient -s 0 getImagesList 3b5db584-5d21-41dc-8f8d-712ce9<wbr>423a27<br> list index out of range<br> <br> All other storage domains have the pool attribute defined, could be this<br> the issue? How can I assign to a pool the Hosted Engine Storage Domain?<br> </blockquote> <div><br> </div> </span> <div>This will be the result of the auto import process once feasible.</div> <span class="gmail-m_-4888476689598940363gmail-"> <div> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div class="gmail-m_-4888476689598940363gmail-m_7393503524263847303gmail-HOEnZb"> <div class="gmail-m_-4888476689598940363gmail-m_7393503524263847303gmail-h5">><br> >> 2017-03-16 07:36:28,116 INFO<br> >> [org.ovirt.engine.core.bll.Imp<wbr>ortHostedEngineStorageDomainCo<wbr>mmand]<br> >> (org.ovirt.thread.pool-8-threa<wbr>d-38) [236d315c] Lock freed to object<br> >> 'EngineLock:{exclusiveLocks='[<wbr>]', sharedLocks='null'}'<br> >><br> >> How can I safely import the Hosted Engine Storage Domain into my setup?<br> >> In this situation is safe to upgrade to oVirt 4.0?<br> </div> </div> </blockquote> </span></div> </div> </div> </blockquote> <div><br> </div> <div>This could be really tricky because I think that upgrading from an hosted-engine env from 3.5 deployed on hyperconverged env but mounted on NFS over a localhost loopback mount to 4.1 is something that is by far out of the paths we tested so I think you can hit a few surprises there.</div> <div><br> </div> <div>In 4.1 the expected configuration under /etc/ovirt-hosted-engine/<wbr>hosted-engine.conf includes:</div> <div>domainType=glusterfs<br> </div> <div>storage=<FIRST_HOST_ADDR>:/<wbr>path<br> </div> <div>mnt_options=<span class="gmail-il" style="white-space:pre-wrap">backup</span><span style="color:rgb(85,85,85);white-space:pre-wrap">-</span><span class="gmail-il" style="white-space:pre-wrap">volfile</span><span style="color:rgb(85,85,85);white-space:pre-wrap">-</span><wbr 
style="color:rgb(85,85,85);white-space:pre-wrap"><span class="gmail-il" style="white-space:pre-wrap">servers</span><span style="color:rgb(85,85,85);white-space:pre-wrap">=</span><SECOND_HOST_ADDR><span style="color:rgb(85,85,85);white-space:pre-wrap">:</span><THIRD_HOST_ADDR> </div><div> </div><div>But these requires more recent vdsm and ovirt-hosted-engine-ha versions.</div><div>Then you also have to configure your engine to have both virt and gluster on the same cluster.</div><div>Nothing is going to do them automatically for you on upgrades.</div><div> </div>I see two options here: 1. easy, but with a substantial downtime: shutdown your whole DC, start from scratch with gdeploy from 4.1 to configure a new gluster volume and anew engine over there, once you have a new engine import your existing storage domain and restart your VMs</div><div class="gmail_quote">2. a lot trickier, try to reach 4.1 status manually editing /etc/ovirt-hosted-engine/hosted-engine.conf and so on and upgrading everything to 4.1; this could be pretty risky because you are on a path we never tested since hyperconverged hosted-engine at 3.5 wasn't released.</div></div></div></blockquote>I understood, definitely bad news for me. But currently I'm running oVirt 3.6.7 that, if I remember correctly, supports hyperconverged setup, it's not possible to fix this issue with my current version? I've installed vdsm 4.17.32-1.el7 with the vdsm-gluster package and ovirt-hosted-engine-ha 1.3.5.7-1.el7.centos on CentOS 7.2.1511 and my engine it's already configured to have both virt and gluster on the same cluster. I cannot put the cluster in maintenance, stop the hosted-engine, stop ovirt-hosted-engine-ha, edit hosted-engine.conf by changing domainType, storage and mnt_options and restart aovirt-hosted-engine-ha and the hosted-engine? <blockquote cite="mid:CAN8-ONr2sC7nKPnDKh1j+rGRQ+kmKfODS6AJF9Wk3kYa8ujOcA@mail.gmail.com" type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-m_-4888476689598940363gmail-"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-m_-4888476689598940363gmail-m_7393503524263847303gmail-HOEnZb"><div class="gmail-m_-4888476689598940363gmail-m_7393503524263847303gmail-h5"> > I'd first try to solve this. > > What OS do you have on your hosts? Are they all upgraded to 3.6? 
> > See also: > > <a moz-do-not-send="true" href="https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/" rel="noreferrer" target="_blank">https://www.ovirt.org/document<wbr>ation/how-to/hosted-engine-hos<wbr>t-OS-upgrade/</a> > > Best, > >> >> Greetings, >> Paolo >> >> ______________________________<wbr>_________________ >> Users mailing list >> <a moz-do-not-send="true" href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a> >> <a moz-do-not-send="true" href="http://lists.ovirt.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman<wbr>/listinfo/users</a> > > Greetings, Paolo ______________________________<wbr>_________________ Users mailing list <a moz-do-not-send="true" href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a> <a moz-do-not-send="true" href="http://lists.ovirt.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman<wbr>/listinfo/users</a> </div></div></blockquote></span></div> </div></div></blockquote></div></div></div></blockquote>Greetings, Paolo </body></html> --------------17C4B33F916F521113721D9F--

On Tue, Mar 21, 2017 at 9:24 AM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi Simone,
I'll respond inline
On 20/03/2017 11:59, Simone Tiraboschi wrote:
On Mon, Mar 20, 2017 at 11:15 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Mon, Mar 20, 2017 at 10:12 AM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi Yedidyah,
On 19/03/2017 11:55, Yedidyah Bar David wrote:
On Sat, Mar 18, 2017 at 12:25 PM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi list,
I'm working on a system running on oVirt 3.6 and the Engine is repeatedly reporting the warning "The Hosted Engine Storage Domain doesn't exist. It should be imported into the setup." in the Events tab of the Admin Portal.
I've read on the list that the Hosted Engine Storage Domain should be imported automatically into the setup during the upgrade to 3.6 (the original setup was on 3.5), but this did not happen, although the HostedEngine is correctly visible in the VM tab after the upgrade. Was the upgrade to 3.6 successful and clean? The upgrade from 3.5 to 3.6 was successful, as was every subsequent minor release upgrade. I rechecked the upgrade logs and I haven't seen any relevant error. One additional piece of information: I'm currently running on CentOS 7, and the original setup was also on this release.
The Hosted Engine Storage Domain is on a dedicated gluster volume, but since, if I remember correctly, oVirt 3.5 did not at the time support gluster as a backend for the HostedEngine, I had installed the engine using gluster's NFS server, with 'localhost:/hosted-engine' as the mount point.
Currently, on every node, I can see the following lines in the ovirt-hosted-engine-ha agent log:
MainThread::INFO::2017-03-17 14:04:17,773::hosted_engine::462::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 3400)
MainThread::INFO::2017-03-17 14:04:17,774::hosted_engine::467::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host virtnode-0-1 (id: 2, score: 3400)
MainThread::INFO::2017-03-17 14:04:27,956::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2017-03-17 14:04:28,055::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2017-03-17 14:04:28,078::storage_server::218::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2017-03-17 14:04:28,278::storage_server::222::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2017-03-17 14:04:28,398::storage_server::230::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::INFO::2017-03-17 14:04:28,822::hosted_engine::685::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images
MainThread::INFO::2017-03-17 14:04:28,822::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
MainThread::INFO::2017-03-17 14:04:29,308::hosted_engine::688::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Reloading vm.conf from the shared storage domain
MainThread::INFO::2017-03-17 14:04:29,309::config::206::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::WARNING::2017-03-17 14:04:29,567::ovf_store::104::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
MainThread::ERROR::2017-03-17 14:04:29,691::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf

This is normal at your current state.
...and the following lines in the engine.log logfile inside the Hosted Engine:
2017-03-16 07:36:28,087 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
2017-03-16 07:36:28,115 WARN [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] CanDoAction of action 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_NOT_EXIST

That's the thing to debug. Did you check the vdsm logs on the hosts, near the time this happens?

Some moments before, I saw the following lines in the vdsm.log of the host that runs the hosted engine and that is the SPM, but I see the same lines on the other nodes too:
Thread-1746094::DEBUG::2017-03-16 07:36:00,412::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state init -> state preparing
Thread-1746094::INFO::2017-03-16 07:36:00,413::logUtils::48::dispatcher::(wrapper) Run and protect: getImagesList(sdUUID='3b5db584-5d21-41dc-8f8d-712ce9423a27', options=None)
Thread-1746094::DEBUG::2017-03-16 07:36:00,413::resourceManager::199::Storage.ResourceManager.Request::(__init__) ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '3313' at 'getImagesList'
Thread-1746094::DEBUG::2017-03-16 07:36:00,413::resourceManager::545::Storage.ResourceManager::(registerResource) Trying to register resource 'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' for lock type 'shared'
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::resourceManager::604::Storage.ResourceManager::(registerResource) Resource 'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' is free. Now locking as 'shared' (1 active user)
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::resourceManager::239::Storage.ResourceManager.Request::(grant) ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Granted request
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::task::827::Storage.TaskManager.Task::(resourceAcquired) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_resourcesAcquired: Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27 (shared)
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting False
Thread-1746094::ERROR::2017-03-16 07:36:00,415::task::866::Storage.TaskManager.Task::(_setError) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3315, in getImagesList
    images = dom.getAllImages()
  File "/usr/share/vdsm/storage/fileSD.py", line 373, in getAllImages
    self.getPools()[0],
IndexError: list index out of range
Thread-1746094::DEBUG::2017-03-16 07:36:00,415::task::885::Storage.TaskManager.Task::(_run) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._run: ae5af1a1-207c-432d-acfa-f3e03e014ee6 ('3b5db584-5d21-41dc-8f8d-712ce9423a27',) {} failed - stopping task
Thread-1746094::DEBUG::2017-03-16 07:36:00,415::task::1246::Storage.TaskManager.Task::(stop) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::stopping in state preparing (force False)
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting True
Thread-1746094::INFO::2017-03-16 07:36:00,416::task::1171::Storage.TaskManager.Task::(prepare) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::aborting: Task is aborted: u'list index out of range' - code 100
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::1176::Storage.TaskManager.Task::(prepare) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Prepare: aborted: list index out of range
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 0 aborting True
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::928::Storage.TaskManager.Task::(_doAbort) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._doAbort: force False
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::resourceManager::980::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state preparing -> state aborting
Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::550::Storage.TaskManager.Task::(__state_aborting) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_aborting: recover policy none
Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state aborting -> state failed
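(For reference, a minimal Python sketch of why an empty pool list produces exactly this error; it simplifies vdsm's fileSD.getAllImages, which the traceback shows indexing getPools()[0], and is not vdsm code:)

    class UnattachedFileDomain(object):
        """Toy stand-in for a vdsm file storage domain (sketch only)."""

        def __init__(self, pools):
            self._pools = pools  # UUIDs of the pools this domain is attached to

        def getPools(self):
            return self._pools

        def getAllImages(self):
            # vdsm derives the image paths from the first attached pool;
            # a domain never imported into a pool has pool = [], so [0] fails.
            first_pool = self.getPools()[0]
            return '/rhev/data-center/%s/...' % first_pool

    try:
        UnattachedFileDomain(pools=[]).getAllImages()
    except IndexError as exc:
        print('IndexError: %s' % exc)  # -> 'list index out of range', as in vdsm.log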
After that I tried to execute a simple query on storage domains using vdsClient and I got the following information:
# vdsClient -s 0 getStorageDomainsList
3b5db584-5d21-41dc-8f8d-712ce9423a27
0966f366-b5ae-49e8-b05e-bee1895c2d54
35223b83-e0bd-4c8d-91a9-8c6b85336e7d
2c3994e3-1f93-4f2a-8a0a-0b5d388a2be7
# vdsClient -s 0 getStorageDomainInfo 3b5db584-5d21-41dc-8f8d-712ce9423a27
uuid = 3b5db584-5d21-41dc-8f8d-712ce9423a27
version = 3
role = Regular
remotePath = localhost:/hosted-engine
Your issue is probably here: by design, all the hosts of a single datacenter should be able to see all the storage domains, including the hosted-engine one, but if you mount it as localhost:/hosted-engine this will not be possible, since 'localhost' resolves to a different machine on every host.
type = NFS
class = Data
pool = []
name = default
# vdsClient -s 0 getImagesList 3b5db584-5d21-41dc-8f8d-712ce9423a27
list index out of range
All the other storage domains have the pool attribute defined; could this be the issue? How can I assign the Hosted Engine Storage Domain to a pool?
This will be the result of the auto import process once feasible.
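(A hedged way to confirm Simone's localhost diagnosis from the hosts' side; this assumes the 3.5-era setup wrote the loopback path into hosted-engine.conf, which is what the remotePath above suggests:)

    # On each host, check what the HA agent has been told to mount;
    # a storage=localhost:/... value means every host mounts its own
    # local gluster NFS export instead of one shared server address.
    grep -E '^(domainType|storage|mnt_options)' /etc/ovirt-hosted-engine/hosted-engine.conf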
2017-03-16 07:36:28,116 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
How can I safely import the Hosted Engine Storage Domain into my setup? In this situation, is it safe to upgrade to oVirt 4.0?
This could be really tricky, because upgrading a hosted-engine environment that was deployed at 3.5 on a hyperconverged setup, but mounted over NFS on a localhost loopback, all the way to 4.1 is by far out of the paths we tested, so I think you could hit a few surprises there.
In 4.1 the expected configuration under /etc/ovirt-hosted-engine/hosted-engine.conf includes:

domainType=glusterfs
storage=<FIRST_HOST_ADDR>:/path
mnt_options=backup-volfile-servers=<SECOND_HOST_ADDR>:<THIRD_HOST_ADDR>

But this requires more recent vdsm and ovirt-hosted-engine-ha versions. Then you also have to configure your engine to have both virt and gluster on the same cluster. Nothing is going to do this automatically for you on upgrade.

I see two options here:
1. Easy, but with a substantial downtime: shut down your whole DC, start from scratch with gdeploy from 4.1 to configure a new gluster volume and a new engine over there; once you have the new engine, import your existing storage domain and restart your VMs.
2. A lot trickier: try to reach 4.1 status by manually editing /etc/ovirt-hosted-engine/hosted-engine.conf and so on, and by upgrading everything to 4.1; this could be pretty risky, because you are on a path we never tested, since hyperconverged hosted-engine at 3.5 wasn't released.
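(For illustration only, here is what that excerpt could look like on a three-host setup; the host names below are hypothetical placeholders, while the key names and value format come straight from Simone's message:)

    # /etc/ovirt-hosted-engine/hosted-engine.conf (illustrative excerpt)
    domainType=glusterfs
    storage=host1.example.com:/hosted-engine
    mnt_options=backup-volfile-servers=host2.example.com:host3.example.com

With backup-volfile-servers, a host can still fetch the gluster volume file from the second or third address when the first one is down, which is what makes this path usable from every host in the datacenter, unlike the localhost: loopback mount.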
I understood, definitely bad news for me. But currently I'm running oVirt 3.6.7 which, if I remember correctly, supports the hyperconverged setup; isn't it possible to fix this issue with my current version? I've installed vdsm 4.17.32-1.el7 with the vdsm-gluster package and ovirt-hosted-engine-ha 1.3.5.7-1.el7.centos on CentOS 7.2.1511, and my engine is already configured to have both virt and gluster on the same cluster. Can't I put the cluster in maintenance, stop the hosted-engine, stop ovirt-hosted-engine-ha, edit hosted-engine.conf by changing domainType, storage and mnt_options, and then restart ovirt-hosted-engine-ha and the hosted-engine?
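(Spelled out, the sequence Paolo proposes would look roughly like the following; the hosted-engine CLI and the ovirt-ha-agent/ovirt-ha-broker service names are standard, but this exact procedure is untested on this upgrade path, as Simone warns:)

    # Untested sketch of the proposed manual switch to a native gluster mount.
    hosted-engine --set-maintenance --mode=global   # freeze HA actions cluster-wide
    hosted-engine --vm-shutdown                     # on the host running the engine VM
    systemctl stop ovirt-ha-agent ovirt-ha-broker   # on every host
    # edit /etc/ovirt-hosted-engine/hosted-engine.conf on every host:
    #   domainType, storage and mnt_options as in Simone's example above
    systemctl start ovirt-ha-broker ovirt-ha-agent  # on every host
    hosted-engine --set-maintenance --mode=none
    hosted-engine --vm-start                        # if the agent does not start it itself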
Support for custom mount options was introduced in https://gerrit.ovirt.org/#/c/57787/, so it should be available since ovirt-hosted-engine-ha-1.3.5.6, and you already have it on 3.6.7. In the meantime we have made a lot of improvements for the hyperconverged scenario; that's why I'm strongly suggesting you upgrade to 4.1.
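(A quick way to verify that the installed packages meet the 1.3.5.6 threshold Simone mentions; on Paolo's hosts this should report the 1.3.5.7 and 4.17.32 builds quoted above:)

    rpm -q ovirt-hosted-engine-ha vdsm vdsm-gluster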
I'd first try to solve this.

What OS do you have on your hosts? Are they all upgraded to 3.6?

See also:
https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/

Best,
Greetings, Paolo
participants (3)
- Paolo Margara
- Simone Tiraboschi
- Yedidyah Bar David