On Mon, Mar 20, 2017 at 10:12 AM, Paolo Margara <paolo.margara@polito.it> wrote:
Hi Yedidyah,

On 19/03/2017 11:55, Yedidyah Bar David wrote:
> On Sat, Mar 18, 2017 at 12:25 PM, Paolo Margara <paolo.margara@polito.it> wrote:
>> Hi list,
>>
>> I'm working on a system running oVirt 3.6, and the Engine repeatedly
>> reports the warning "The Hosted Engine Storage Domain doesn't exist. It
>> should be imported into the setup." in the Events tab of the Admin
>> Portal.
>>
>> I've read on the list that the Hosted Engine Storage Domain should be
>> imported automatically into the setup during the upgrade to 3.6 (the
>> original setup was on 3.5), but this did not happen, even though the
>> HostedEngine VM is correctly visible in the VM tab after the upgrade.
> Was the upgrade to 3.6 successful and clean?
The upgrade from 3.5 to 3.6 was successful, as were all the subsequent
minor release upgrades. I rechecked the upgrade logs and haven't seen any
relevant errors.
One additional piece of information: I'm currently running CentOS 7, and
the original setup was also on this release.
>
>> The Hosted Engine Storage Domain is on a dedicated gluster volume.
>> If I remember correctly, oVirt 3.5 did not support gluster as a backend
>> for the HostedEngine at that time, so I installed the engine using
>> gluster's NFS server, with 'localhost:/hosted-engine' as the mount point.
>>
>> Currently, on every node, I can see the following lines in the
>> ovirt-hosted-engine-ha agent log:
>>
>> MainThread::INFO::2017-03-17 14:04:17,773::hosted_engine::462::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 3400)
>> MainThread::INFO::2017-03-17 14:04:17,774::hosted_engine::467::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host virtnode-0-1 (id: 2, score: 3400)
>> MainThread::INFO::2017-03-17 14:04:27,956::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
>> MainThread::INFO::2017-03-17 14:04:28,055::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
>> MainThread::INFO::2017-03-17 14:04:28,078::storage_server::218::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>> MainThread::INFO::2017-03-17 14:04:28,278::storage_server::222::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>> MainThread::INFO::2017-03-17 14:04:28,398::storage_server::230::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
>> MainThread::INFO::2017-03-17 14:04:28,822::hosted_engine::685::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images
>> MainThread::INFO::2017-03-17 14:04:28,822::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
>> MainThread::INFO::2017-03-17 14:04:29,308::hosted_engine::688::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Reloading vm.conf from the shared storage domain
>> MainThread::INFO::2017-03-17 14:04:29,309::config::206::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
>> MainThread::WARNING::2017-03-17 14:04:29,567::ovf_store::104::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
>> MainThread::ERROR::2017-03-17 14:04:29,691::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
> This is expected in your current state.
>
>> ...and the following lines in engine.log inside the Hosted Engine:
>>
>> 2017-03-16 07:36:28,087 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
>> 2017-03-16 07:36:28,115 WARN [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] CanDoAction of action 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_NOT_EXIST
> That's the thing to debug. Did you check the vdsm logs on the hosts
> around the time this happens?
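One way to do this check, sketched in Python under the assumption that each vdsm log entry carries a `YYYY-MM-DD HH:MM:SS,mmm` timestamp between `::` separators, as in the excerpts in this thread: keep only the entries within a window around the engine warning (07:36:28 on 2017-03-16 here). The function name `entries_near` and the 60-second default window are arbitrary, hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# vdsm entries look like: Thread-N::LEVEL::YYYY-MM-DD HH:MM:SS,mmm::module::...
TIMESTAMP_RE = re.compile(r"::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}::")


def entries_near(lines, event_time, window=timedelta(seconds=60)):
    """Yield log lines whose timestamp lies within +/- window of event_time.

    Continuation lines without a timestamp (e.g. traceback lines) inherit
    the timestamp of the preceding entry; lines before the first
    timestamped entry are skipped.
    """
    current = None
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match:
            # Milliseconds are dropped; second resolution is enough here.
            current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if current is not None and abs(current - event_time) <= window:
            yield line.rstrip("\n")
```

For example, `for line in entries_near(open("/var/log/vdsm/vdsm.log"), datetime(2017, 3, 16, 7, 36, 28)): print(line)`, assuming vdsm logs to its usual `/var/log/vdsm/vdsm.log` path on the host.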
A few moments before, I saw the following lines in the vdsm.log of the
host that runs the hosted engine and is also the SPM; I see the same
lines on the other nodes as well:

Thread-1746094::DEBUG::2017-03-16 07:36:00,412::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state init -> state preparing
Thread-1746094::INFO::2017-03-16 07:36:00,413::logUtils::48::dispatcher::(wrapper) Run and protect: getImagesList(sdUUID='3b5db584-5d21-41dc-8f8d-712ce9423a27', options=None)
Thread-1746094::DEBUG::2017-03-16 07:36:00,413::resourceManager::199::Storage.ResourceManager.Request::(__init__) ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '3313' at 'getImagesList'
Thread-1746094::DEBUG::2017-03-16 07:36:00,413::resourceManager::545::Storage.ResourceManager::(registerResource) Trying to register resource 'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' for lock type 'shared'
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::resourceManager::604::Storage.ResourceManager::(registerResource) Resource 'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' is free. Now locking as 'shared' (1 active user)
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::resourceManager::239::Storage.ResourceManager.Request::(grant) ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Granted request
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::task::827::Storage.TaskManager.Task::(resourceAcquired) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_resourcesAcquired: Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27 (shared)
Thread-1746094::DEBUG::2017-03-16 07:36:00,414::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting False
Thread-1746094::ERROR::2017-03-16 07:36:00,415::task::866::Storage.TaskManager.Task::(_setError) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3315, in getImagesList
    images = dom.getAllImages()
  File "/usr/share/vdsm/storage/fileSD.py", line 373, in getAllImages
    self.getPools()[0],
IndexError: list index out of range
Thread-1746094::DEBUG::2017-03-16 07:36:00,415::task::885::Storage.TaskManager.Task::(_run) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._run: ae5af1a1-207c-432d-acfa-f3e03e014ee6 ('3b5db584-5d21-41dc-8f8d-712ce9423a27',) {} failed - stopping task
Thread-1746094::DEBUG::2017-03-16 07:36:00,415::task::1246::Storage.TaskManager.Task::(stop) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::stopping in state preparing (force False)
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting True
Thread-1746094::INFO::2017-03-16 07:36:00,416::task::1171::Storage.TaskManager.Task::(prepare) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::aborting: Task is aborted: u'list index out of range' - code 100
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::1176::Storage.TaskManager.Task::(prepare) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Prepare: aborted: list index out of range
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 0 aborting True
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::928::Storage.TaskManager.Task::(_doAbort) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._doAbort: force False
Thread-1746094::DEBUG::2017-03-16 07:36:00,416::resourceManager::980::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state preparing -> state aborting
Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::550::Storage.TaskManager.Task::(__state_aborting) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_aborting: recover policy none
Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state aborting -> state failed
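To make the traceback above concrete, here is a minimal Python sketch of the failing access pattern. It assumes only what the traceback shows: `getAllImages()` in `fileSD.py` evaluates `self.getPools()[0]`, so a domain whose pool list is empty raises `IndexError`. The `FileStorageDomain` class, the `get_images_list` helper and the `"pool-1"` UUID are hypothetical stand-ins, not vdsm's actual code.

```python
class FileStorageDomain(object):
    """Hypothetical stand-in for a file-based (NFS) storage domain."""

    def __init__(self, sd_uuid, pools):
        self.sd_uuid = sd_uuid
        # UUIDs of the storage pools (data centers) this domain is attached to
        self._pools = list(pools)

    def getPools(self):
        return self._pools

    def getAllImages(self):
        # Same access pattern as fileSD.py line 373 in the traceback:
        # an unattached domain has an empty pool list, so [0] raises IndexError.
        pool_uuid = self.getPools()[0]
        return "images of %s in pool %s" % (self.sd_uuid, pool_uuid)


def get_images_list(dom):
    """Return the image listing, or the IndexError message on failure."""
    try:
        return dom.getAllImages()
    except IndexError as exc:
        # str() of this IndexError is the same text vdsClient printed.
        return str(exc)


if __name__ == "__main__":
    attached = FileStorageDomain("0966f366-b5ae-49e8-b05e-bee1895c2d54", ["pool-1"])
    unattached = FileStorageDomain("3b5db584-5d21-41dc-8f8d-712ce9423a27", [])
    print(get_images_list(attached))
    print(get_images_list(unattached))
```

The point of the sketch is that the error is not a storage I/O failure: the domain simply has no pool to index into.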

After that, I ran a simple query on the storage domains using vdsClient
and got the following information:

# vdsClient -s 0 getStorageDomainsList
3b5db584-5d21-41dc-8f8d-712ce9423a27
0966f366-b5ae-49e8-b05e-bee1895c2d54
35223b83-e0bd-4c8d-91a9-8c6b85336e7d
2c3994e3-1f93-4f2a-8a0a-0b5d388a2be7
# vdsClient -s 0 getStorageDomainInfo 3b5db584-5d21-41dc-8f8d-712ce9423a27
    uuid = 3b5db584-5d21-41dc-8f8d-712ce9423a27
    version = 3
    role = Regular
    remotePath = localhost:/hosted-engine

Your issue is probably here: by design, all the hosts in a single datacenter should be able to see all the storage domains, including the hosted-engine one, but if you mount it as localhost:/hosted-engine this is not possible.
 
    type = NFS
    class = Data
    pool = []
    name = default
# vdsClient -s 0 getImagesList 3b5db584-5d21-41dc-8f8d-712ce9423a27
list index out of range
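A minimal Python sketch, assuming the `key = value` layout shown above for `vdsClient -s 0 getStorageDomainInfo`, that parses the output and flags the two conditions discussed in this thread: an empty `pool` (the domain is not attached to any data center) and an NFS `remotePath` whose host is a loopback address, which the other hosts cannot reach. The names `parse_domain_info`, `domain_problems` and `LOOPBACK_HOSTS` are hypothetical, and the output is passed in as text rather than fetched from vdsClient.

```python
# Hostnames treated as loopback; 127.x.x.x addresses are also checked below.
LOOPBACK_HOSTS = ("localhost", "127.0.0.1")


def parse_domain_info(text):
    """Parse 'key = value' lines from getStorageDomainInfo output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    return info


def domain_problems(info):
    """Return human-readable problems found in the parsed domain info."""
    problems = []
    # 'pool = []' means the domain is not attached to any storage pool.
    if "pool" in info and info["pool"].strip("[] ") == "":
        problems.append("not attached to any storage pool")
    # An NFS remotePath has the form 'host:/export'; a loopback host only
    # resolves to the storage on the machine doing the mount.
    host = info.get("remotePath", "").split(":/", 1)[0]
    if host in LOOPBACK_HOSTS or host.startswith("127."):
        problems.append("remotePath uses a loopback address: %s" % info["remotePath"])
    return problems


SAMPLE = """\
    uuid = 3b5db584-5d21-41dc-8f8d-712ce9423a27
    version = 3
    role = Regular
    remotePath = localhost:/hosted-engine
    type = NFS
    class = Data
    pool = []
    name = default
"""

if __name__ == "__main__":
    for problem in domain_problems(parse_domain_info(SAMPLE)):
        print(problem)
```

On a host, the text could come from `subprocess.check_output(["vdsClient", "-s", "0", "getStorageDomainInfo", sd_uuid])`, decoded to a string before parsing; that part is untested here.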

All the other storage domains have the pool attribute defined; could this
be the issue? How can I assign the Hosted Engine Storage Domain to a pool?

This will be the result of the auto-import process, once it becomes feasible.
 
>
>> 2017-03-16 07:36:28,116 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
>>
>> How can I safely import the Hosted Engine Storage Domain into my setup?
>> In this situation, is it safe to upgrade to oVirt 4.0?
> I'd first try to solve this.
>
> What OS do you have on your hosts? Are they all upgraded to 3.6?
>
> See also:
>
> https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/
>
> Best,
>
>>
>> Greetings,
>>     Paolo
>>
>> _______________________________________________
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>
>
Greetings,
    Paolo