<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 20, 2017 at 10:12 AM, Paolo Margara <span dir="ltr"><<a href="mailto:paolo.margara@polito.it" target="_blank">paolo.margara@polito.it</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Yedidyah,<br>
<span class="gmail-"><br>
Il 19/03/2017 11:55, Yedidyah Bar David ha scritto:<br>
>> On Sat, Mar 18, 2017 at 12:25 PM, Paolo Margara <paolo.margara@polito.it> wrote:
>>> Hi list,
>>>
>>> I'm working on a system running oVirt 3.6, and the Engine is
>>> repeatedly reporting the warning "The Hosted Engine Storage Domain
>>> doesn't exist. It should be imported into the setup." in the Events
>>> tab of the Admin Portal.
>>>
>>> I've read on the list that the Hosted Engine Storage Domain should
>>> have been imported automatically into the setup during the upgrade to
>>> 3.6 (the original setup was on 3.5), but this did not happen, even
>>> though the HostedEngine VM is correctly visible in the VM tab after
>>> the upgrade.
>> Was the upgrade to 3.6 successful and clean?
> The upgrade from 3.5 to 3.6 was successful, as was every subsequent
> minor release upgrade. I have rechecked the upgrade logs and haven't
> seen any relevant error.
> One additional piece of information: I'm currently running on CentOS 7,
> and the original setup was also on this release.
<div><div class="gmail-h5">><br>
>> The Hosted Engine Storage Domain is on a dedicated gluster volume but<br>
>> considering that, if I remember correctly, oVirt 3.5 at that time did<br>
>> not support gluster as a backend for the HostedEngine at that time I had<br>
>> installed the engine using gluster's NFS server using<br>
>> 'localhost:/hosted-engine' as a mount point.<br>
>><br>
>> Currently on every nodes I can read into the log of the<br>
>> ovirt-hosted-engine-ha agent the following lines:<br>
>><br>
>>> MainThread::INFO::2017-03-17 14:04:17,773::hosted_engine::462::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 3400)
>>> MainThread::INFO::2017-03-17 14:04:17,774::hosted_engine::467::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host virtnode-0-1 (id: 2, score: 3400)
>>> MainThread::INFO::2017-03-17 14:04:27,956::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
>>> MainThread::INFO::2017-03-17 14:04:28,055::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
>>> MainThread::INFO::2017-03-17 14:04:28,078::storage_server::218::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>> MainThread::INFO::2017-03-17 14:04:28,278::storage_server::222::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>> MainThread::INFO::2017-03-17 14:04:28,398::storage_server::230::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
>>> MainThread::INFO::2017-03-17 14:04:28,822::hosted_engine::685::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images
>>> MainThread::INFO::2017-03-17 14:04:28,822::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
>>> MainThread::INFO::2017-03-17 14:04:29,308::hosted_engine::688::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Reloading vm.conf from the shared storage domain
>>> MainThread::INFO::2017-03-17 14:04:29,309::config::206::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
>>> MainThread::WARNING::2017-03-17 14:04:29,567::ovf_store::104::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
>>> MainThread::ERROR::2017-03-17 14:04:29,691::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
>> This is normal in your current state.
>>
>>> ...and the following lines in the engine.log file inside the Hosted
>>> Engine:
>>>
>>> 2017-03-16 07:36:28,087 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
>>> 2017-03-16 07:36:28,115 WARN [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-38) [236d315c] CanDoAction of action 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_NOT_EXIST
>> That's the thing to debug. Did you check vdsm logs on the hosts, near
>> the time this happens?
> A few moments earlier I saw the following lines in the vdsm.log of the
> host that runs the hosted engine and is also the SPM, but I see the
> same lines on the other nodes as well:
>
> Thread-1746094::DEBUG::2017-03-16 07:36:00,412::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state init -> state preparing
> Thread-1746094::INFO::2017-03-16 07:36:00,413::logUtils::48::dispatcher::(wrapper) Run and protect: getImagesList(sdUUID='3b5db584-5d21-41dc-8f8d-712ce9423a27', options=None)
> Thread-1746094::DEBUG::2017-03-16 07:36:00,413::resourceManager::199::Storage.ResourceManager.Request::(__init__) ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '3313' at 'getImagesList'
> Thread-1746094::DEBUG::2017-03-16 07:36:00,413::resourceManager::545::Storage.ResourceManager::(registerResource) Trying to register resource 'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' for lock type 'shared'
> Thread-1746094::DEBUG::2017-03-16 07:36:00,414::resourceManager::604::Storage.ResourceManager::(registerResource) Resource 'Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27' is free. Now locking as 'shared' (1 active user)
> Thread-1746094::DEBUG::2017-03-16 07:36:00,414::resourceManager::239::Storage.ResourceManager.Request::(grant) ResName=`Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27`ReqID=`8ea3c7f3-8ccd-4127-96b1-ec97a3c7b8d4`::Granted request
> Thread-1746094::DEBUG::2017-03-16 07:36:00,414::task::827::Storage.TaskManager.Task::(resourceAcquired) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_resourcesAcquired: Storage.3b5db584-5d21-41dc-8f8d-712ce9423a27 (shared)
> Thread-1746094::DEBUG::2017-03-16 07:36:00,414::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting False
> Thread-1746094::ERROR::2017-03-16 07:36:00,415::task::866::Storage.TaskManager.Task::(_setError) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Unexpected error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 873, in _run
>     return fn(*args, **kargs)
>   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 3315, in getImagesList
>     images = dom.getAllImages()
>   File "/usr/share/vdsm/storage/fileSD.py", line 373, in getAllImages
>     self.getPools()[0],
> IndexError: list index out of range
> Thread-1746094::DEBUG::2017-03-16 07:36:00,415::task::885::Storage.TaskManager.Task::(_run) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._run: ae5af1a1-207c-432d-acfa-f3e03e014ee6 ('3b5db584-5d21-41dc-8f8d-712ce9423a27',) {} failed - stopping task
> Thread-1746094::DEBUG::2017-03-16 07:36:00,415::task::1246::Storage.TaskManager.Task::(stop) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::stopping in state preparing (force False)
> Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 1 aborting True
> Thread-1746094::INFO::2017-03-16 07:36:00,416::task::1171::Storage.TaskManager.Task::(prepare) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::aborting: Task is aborted: u'list index out of range' - code 100
> Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::1176::Storage.TaskManager.Task::(prepare) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Prepare: aborted: list index out of range
> Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::993::Storage.TaskManager.Task::(_decref) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::ref 0 aborting True
> Thread-1746094::DEBUG::2017-03-16 07:36:00,416::task::928::Storage.TaskManager.Task::(_doAbort) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::Task._doAbort: force False
> Thread-1746094::DEBUG::2017-03-16 07:36:00,416::resourceManager::980::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
> Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state preparing -> state aborting
> Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::550::Storage.TaskManager.Task::(__state_aborting) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::_aborting: recover policy none
> Thread-1746094::DEBUG::2017-03-16 07:36:00,417::task::595::Storage.TaskManager.Task::(_updateState) Task=`ae5af1a1-207c-432d-acfa-f3e03e014ee6`::moving from state aborting -> state failed
>
> After that I tried to execute a simple query on the storage domains
> using vdsClient, and I got the following information:
>
> # vdsClient -s 0 getStorageDomainsList
> 3b5db584-5d21-41dc-8f8d-712ce9423a27
> 0966f366-b5ae-49e8-b05e-bee1895c2d54
> 35223b83-e0bd-4c8d-91a9-8c6b85336e7d
> 2c3994e3-1f93-4f2a-8a0a-0b5d388a2be7
> # vdsClient -s 0 getStorageDomainInfo 3b5db584-5d21-41dc-8f8d-712ce9423a27
> uuid = 3b5db584-5d21-41dc-8f8d-712ce9423a27
> version = 3
> role = Regular
> remotePath = localhost:/hosted-engine

Your issue is probably here: by design, all the hosts of a single
datacenter should be able to see all the storage domains, including the
hosted-engine one, but if you mount it as localhost:/hosted-engine this
will not be possible.
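To illustrate the point with a quick sketch (the hostnames are only
examples, one taken from your agent log and the other a guess): a mount
source of "localhost:/hosted-engine" is evaluated independently by each
host, so the "same" remotePath actually names a different server on
every node.

    # illustrative Python sketch, not actual oVirt code
    remote_path = "localhost:/hosted-engine"

    for host in ("virtnode-0-0", "virtnode-0-1"):  # example hostnames
        server, export = remote_path.split(":", 1)
        # "localhost" resolves to the host doing the mount,
        # not to one shared storage server:
        actual_server = host if server == "localhost" else server
        print("%s mounts %s:%s" % (host, actual_server, export))
    # => virtnode-0-0 mounts virtnode-0-0:/hosted-engine
    # => virtnode-0-1 mounts virtnode-0-1:/hosted-engine

That only works as long as every host also exports the volume locally,
and it leaves the engine with no single address it can import as a
shared storage domain.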
> type = NFS
> class = Data
> pool = []
> name = default
> # vdsClient -s 0 getImagesList 3b5db584-5d21-41dc-8f8d-712ce9423a27
> list index out of range
>
> All the other storage domains have the pool attribute defined; could
> this be the issue? How can I assign the Hosted Engine Storage Domain
> to a pool?

This will be the result of the auto-import process, once it is feasible.
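Incidentally, the "list index out of range" you get from getImagesList
and the empty pool attribute are the same symptom. A minimal sketch of
the failing step (simplified from what fileSD.py's getAllImages() does
around the line shown in your traceback; this is not the actual vdsm
code):

    # the image lookup indexes the first pool the domain is attached to
    def get_all_images(pool_uuids):
        pool_id = pool_uuids[0]  # IndexError when the domain has no pool
        return "/rhev/data-center/%s/..." % pool_id

    get_all_images([])  # pool = [] -> IndexError: list index out of range

Once the domain is successfully imported and attached to the
datacenter's pool, that list should no longer be empty.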
<div class="gmail-HOEnZb"><div class="gmail-h5">><br>
>> 2017-03-16 07:36:28,116 INFO<br>
>> [org.ovirt.engine.core.bll.<wbr>ImportHostedEngineStorageDomai<wbr>nCommand]<br>
>> (org.ovirt.thread.pool-8-<wbr>thread-38) [236d315c] Lock freed to object<br>
>> 'EngineLock:{exclusiveLocks='[<wbr>]', sharedLocks='null'}'<br>
>><br>
>> How can I safely import the Hosted Engine Storage Domain into my setup?<br>
>> In this situation is safe to upgrade to oVirt 4.0?<br>
>> I'd first try to solve this.
>>
>> What OS do you have on your hosts? Are they all upgraded to 3.6?
>>
>> See also:
>>
>> https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/
>>
>> Best,
>>
>>>
>>> Greetings,
>>> Paolo
> Greetings,
> Paolo
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users