On Tue, Sep 6, 2016 at 4:25 PM, VONDRA Alain <AVONDRA@unicef.fr> wrote:

I’ve just reinstalled the host and I have the same issue. Here are the ERROR messages from the vdsm logs:

 

Thread-43::ERROR::2016-09-06 16:02:54,399::hsm::2551::Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer

Thread-43::ERROR::2016-09-06 16:02:54,453::hsm::2551::Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer

Thread-43::ERROR::2016-09-06 16:02:54,475::hsm::2551::Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer

Thread-51::ERROR::2016-09-06 16:05:38,319::hsm::2453::Storage.HSM::(connectStorageServer) Could not connect to storageServer

Thread-52::ERROR::2016-09-06 16:05:38,636::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain cc9ab4b2-9880-427b-8f3b-61f03e520cbc

Thread-52::ERROR::2016-09-06 16:05:38,637::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain cc9ab4b2-9880-427b-8f3b-61f03e520cbc

Thread-52::ERROR::2016-09-06 16:05:38,756::sdc::143::Storage.StorageDomainCache::(_findDomain) domain cc9ab4b2-9880-427b-8f3b-61f03e520cbc not found

Thread-52::ERROR::2016-09-06 16:05:38,769::task::866::Storage.TaskManager.Task::(_setError) Task=`1cfc5c30-fc82-44d6-a296-223fa9426417`::Unexpected error

Thread-52::ERROR::2016-09-06 16:05:38,788::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Cannot find master domain: u'spUUID=00000002-0002-0002-0002-000000000193, msdUUID=cc9ab4b2-9880-427b-8f3b-61f03e520cbc'", 'code': 304}}

 

This SD is present on host1 (CentOS 6.8):

[root@unc-srv-hyp1  ~]$ ll /rhev/data-center/

00000002-0002-0002-0002-000000000193/ mnt/

 

But not on host2 (CentOS 7.2):

[root@unc-srv-hyp2 ~]# ll /rhev/data-center/

total 4

drwxr-xr-x. 5 vdsm kvm 4096  6 sept. 16:03 mnt
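For reference, whether host2 can see the block storage at all can be checked with a few standard commands (a diagnostic sketch; the VG name is the storage-domain UUID from the logs above, and it assumes vdsm has already attempted the iSCSI login):

```shell
# Is there an active iSCSI session to the storage array?
iscsiadm -m session

# Are multipath devices built on top of those sessions?
multipath -ll

# Can LVM see the storage-domain VG at all?
vgs cc9ab4b2-9880-427b-8f3b-61f03e520cbc
```

If `vgs` fails on host2 while the same VG is visible on host1, the LUN is simply not reaching host2.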

 

Here is a more complete excerpt of vdsm.log:

Thread-88::DEBUG::2016-09-06 16:22:03,057::iscsi::424::Storage.ISCSI::(rescan) Performing SCSI scan, this will take up to 30 seconds

Thread-88::DEBUG::2016-09-06 16:22:03,057::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m session -R (cwd None)

Thread-88::DEBUG::2016-09-06 16:22:03,078::misc::751::Storage.SamplingMethod::(__call__) Returning last result

Thread-88::DEBUG::2016-09-06 16:22:03,079::misc::741::Storage.SamplingMethod::(__call__) Trying to enter sampling method (storage.hba.rescan)

Thread-88::DEBUG::2016-09-06 16:22:03,079::misc::743::Storage.SamplingMethod::(__call__) Got in to sampling method

Thread-88::DEBUG::2016-09-06 16:22:03,080::hba::53::Storage.HBA::(rescan) Starting scan

Thread-88::DEBUG::2016-09-06 16:22:03,080::utils::755::Storage.HBA::(execCmd) /usr/bin/sudo -n /usr/libexec/vdsm/fc-scan (cwd None)

Thread-88::DEBUG::2016-09-06 16:22:03,134::hba::66::Storage.HBA::(rescan) Scan finished

Thread-88::DEBUG::2016-09-06 16:22:03,134::misc::751::Storage.SamplingMethod::(__call__) Returning last result

Thread-88::DEBUG::2016-09-06 16:22:03,135::multipath::131::Storage.Misc.excCmd::(rescan) /usr/bin/sudo -n /sbin/multipath (cwd None)

Thread-88::DEBUG::2016-09-06 16:22:03,201::multipath::131::Storage.Misc.excCmd::(rescan) SUCCESS: <err> = ''; <rc> = 0

Thread-88::DEBUG::2016-09-06 16:22:03,202::utils::755::root::(execCmd) /sbin/udevadm settle --timeout=5 (cwd None)

Thread-88::DEBUG::2016-09-06 16:22:03,227::utils::775::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0

Thread-88::DEBUG::2016-09-06 16:22:03,228::lvm::498::Storage.OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex

Thread-88::DEBUG::2016-09-06 16:22:03,228::lvm::500::Storage.OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex

Thread-88::DEBUG::2016-09-06 16:22:03,229::lvm::509::Storage.OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' got the operation mutex

Thread-88::DEBUG::2016-09-06 16:22:03,229::lvm::511::Storage.OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex

Thread-88::DEBUG::2016-09-06 16:22:03,229::lvm::529::Storage.OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex

Thread-88::DEBUG::2016-09-06 16:22:03,230::lvm::531::Storage.OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex

Thread-88::DEBUG::2016-09-06 16:22:03,230::misc::751::Storage.SamplingMethod::(__call__) Returning last result

Thread-88::ERROR::2016-09-06 16:22:03,230::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain cc9ab4b2-9880-427b-8f3b-61f03e520cbc

Thread-88::ERROR::2016-09-06 16:22:03,231::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain cc9ab4b2-9880-427b-8f3b-61f03e520cbc

Thread-88::DEBUG::2016-09-06 16:22:03,231::lvm::371::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex

Thread-88::DEBUG::2016-09-06 16:22:03,233::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name cc9ab4b2-9880-427b-8f3b-61f03e520cbc (cwd None)

Thread-88::DEBUG::2016-09-06 16:22:03,275::lvm::291::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n  Volume group "cc9ab4b2-9880-427b-8f3b-61f03e520cbc" not found\n  Cannot process volume group cc9ab4b2-9880-427b-8f3b-61f03e520cbc\n'; <rc> = 5

Thread-88::WARNING::2016-09-06 16:22:03,277::lvm::376::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!', '  Volume group "cc9ab4b2-9880-427b-8f3b-61f03e520cbc" not found', '  Cannot process volume group cc9ab4b2-9880-427b-8f3b-61f03e520cbc']

Thread-88::DEBUG::2016-09-06 16:22:03,277::lvm::416::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex

Thread-90::DEBUG::2016-09-06 16:22:03,293::__init__::318::IOProcessClient::(_run) Starting IOProcess...

Thread-91::DEBUG::2016-09-06 16:22:03,314::__init__::318::IOProcessClient::(_run) Starting IOProcess...

Thread-88::ERROR::2016-09-06 16:22:03,334::sdc::143::Storage.StorageDomainCache::(_findDomain) domain cc9ab4b2-9880-427b-8f3b-61f03e520cbc not found

Traceback (most recent call last):

  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain

    dom = findMethod(sdUUID)

  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain

    raise se.StorageDomainDoesNotExist(sdUUID)

StorageDomainDoesNotExist: Storage domain does not exist: (u'cc9ab4b2-9880-427b-8f3b-61f03e520cbc',)

Thread-88::DEBUG::2016-09-06 16:22:03,335::resourceManager::616::Storage.ResourceManager::(releaseResource) Trying to release resource 'Storage.00000002-0002-0002-0002-000000000193'

Thread-88::DEBUG::2016-09-06 16:22:03,336::resourceManager::635::Storage.ResourceManager::(releaseResource) Released resource 'Storage.00000002-0002-0002-0002-000000000193' (0 active users)

Thread-88::DEBUG::2016-09-06 16:22:03,336::resourceManager::641::Storage.ResourceManager::(releaseResource) Resource 'Storage.00000002-0002-0002-0002-000000000193' is free, finding out if anyone is waiting for it.

Thread-88::DEBUG::2016-09-06 16:22:03,337::resourceManager::649::Storage.ResourceManager::(releaseResource) No one is waiting for resource 'Storage.00000002-0002-0002-0002-000000000193', Clearing records.

Thread-88::DEBUG::2016-09-06 16:22:03,337::resourceManager::616::Storage.ResourceManager::(releaseResource) Trying to release resource 'Storage.HsmDomainMonitorLock'

Thread-88::DEBUG::2016-09-06 16:22:03,338::resourceManager::635::Storage.ResourceManager::(releaseResource) Released resource 'Storage.HsmDomainMonitorLock' (0 active users)

Thread-88::DEBUG::2016-09-06 16:22:03,338::resourceManager::641::Storage.ResourceManager::(releaseResource) Resource 'Storage.HsmDomainMonitorLock' is free, finding out if anyone is waiting for it.

Thread-88::DEBUG::2016-09-06 16:22:03,338::resourceManager::649::Storage.ResourceManager::(releaseResource) No one is waiting for resource 'Storage.HsmDomainMonitorLock', Clearing records.

Thread-88::ERROR::2016-09-06 16:22:03,338::task::866::Storage.TaskManager.Task::(_setError) Task=`4898233c-f6c6-45fa-a2ee-42e63e189063`::Unexpected error

Traceback (most recent call last):

  File "/usr/share/vdsm/storage/task.py", line 873, in _run

    return fn(*args, **kargs)

  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper

    res = f(*args, **kwargs)

  File "/usr/share/vdsm/storage/hsm.py", line 1039, in connectStoragePool

    spUUID, hostID, msdUUID, masterVersion, domainsMap)

  File "/usr/share/vdsm/storage/hsm.py", line 1104, in _connectStoragePool

    res = pool.connect(hostID, msdUUID, masterVersion)

  File "/usr/share/vdsm/storage/sp.py", line 637, in connect

    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)

  File "/usr/share/vdsm/storage/sp.py", line 1179, in __rebuild

    self.setMasterDomain(msdUUID, masterVersion)

  File "/usr/share/vdsm/storage/sp.py", line 1390, in setMasterDomain

    raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)

StoragePoolMasterNotFound: Cannot find master domain: u'spUUID=00000002-0002-0002-0002-000000000193, msdUUID=cc9ab4b2-9880-427b-8f3b-61f03e520cbc'

Thread-88::DEBUG::2016-09-06 16:22:03,339::task::885::Storage.TaskManager.Task::(_run) Task=`4898233c-f6c6-45fa-a2ee-42e63e189063`::Task._run: 4898233c-f6c6-45fa-a2ee-42e63e189063 (u'00000002-0002-0002-0002-000000000193', 2, u'cc9ab4b2-9880-427b-8f3b-61f03e520cbc', 7, {u'015799ac-ec66-4a9d-a8d6-6e9ec980972d': u'active', u'2fcd37ce-cb88-4026-88df-d4d472b41ecf': u'active', u'76a1fed9-2e60-4b3e-9f00-efca8acd133d': u'active', u'7e40772a-fe94-4fb2-94c4-6198bed04a6a': u'active', u'cb4c84c1-489a-433f-999a-f1aeec9d62cf': u'active', u'ea05d014-f8f0-4f1d-906e-2e93c8907d7d': u'active', u'cc9ab4b2-9880-427b-8f3b-61f03e520cbc': u'active'}) {} failed - stopping task

Thread-88::DEBUG::2016-09-06 16:22:03,339::task::1217::Storage.TaskManager.Task::(stop) Task=`4898233c-f6c6-45fa-a2ee-42e63e189063`::stopping in state preparing (force False)

Thread-88::DEBUG::2016-09-06 16:22:03,340::task::993::Storage.TaskManager.Task::(_decref) Task=`4898233c-f6c6-45fa-a2ee-42e63e189063`::ref 1 aborting True

Thread-88::INFO::2016-09-06 16:22:03,340::task::1171::Storage.TaskManager.Task::(prepare) Task=`4898233c-f6c6-45fa-a2ee-42e63e189063`::aborting: Task is aborted: 'Cannot find master domain' - code 304

Thread-88::DEBUG::2016-09-06 16:22:03,340::task::1176::Storage.TaskManager.Task::(prepare) Task=`4898233c-f6c6-45fa-a2ee-42e63e189063`::Prepare: aborted: Cannot find master domain

Thread-88::DEBUG::2016-09-06 16:22:03,341::task::993::Storage.TaskManager.Task::(_decref) Task=`4898233c-f6c6-45fa-a2ee-42e63e189063`::ref 0 aborting True

Thread-88::DEBUG::2016-09-06 16:22:03,341::task::928::Storage.TaskManager.Task::(_doAbort) Task=`4898233c-f6c6-45fa-a2ee-42e63e189063`::Task._doAbort: force False

Thread-88::DEBUG::2016-09-06 16:22:03,341::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}

Thread-88::DEBUG::2016-09-06 16:22:03,341::task::595::Storage.TaskManager.Task::(_updateState) Task=`4898233c-f6c6-45fa-a2ee-42e63e189063`::moving from state preparing -> state aborting

Thread-88::DEBUG::2016-09-06 16:22:03,342::task::550::Storage.TaskManager.Task::(__state_aborting) Task=`4898233c-f6c6-45fa-a2ee-42e63e189063`::_aborting: recover policy none

Thread-88::DEBUG::2016-09-06 16:22:03,342::task::595::Storage.TaskManager.Task::(_updateState) Task=`4898233c-f6c6-45fa-a2ee-42e63e189063`::moving from state aborting -> state failed

Thread-88::DEBUG::2016-09-06 16:22:03,342::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}

Thread-88::DEBUG::2016-09-06 16:22:03,342::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}

Thread-88::ERROR::2016-09-06 16:22:03,343::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Cannot find master domain: u'spUUID=00000002-0002-0002-0002-000000000193, msdUUID=cc9ab4b2-9880-427b-8f3b-61f03e520cbc'", 'code': 304}}

 

I’ve restarted and enabled lvm2-lvmetad.service as advised in the log, without success.
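For what it’s worth, the opposite of the log’s advice may be what is needed here: vdsm runs its LVM commands with `use_lvmetad=0` (visible in the vgs command line above), so on EL7 hosts lvmetad is normally expected to be disabled rather than enabled. A sketch, assuming the stock lvm.conf location and formatting:

```shell
# Tell LVM not to use the metadata daemon (vdsm already passes use_lvmetad=0 itself)
sed -i 's/use_lvmetad = 1/use_lvmetad = 0/' /etc/lvm/lvm.conf

# Stop and mask the daemon and its socket so socket activation cannot restart it
systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
systemctl mask lvm2-lvmetad.service lvm2-lvmetad.socket
```

This removes the lvmetad warning, though by itself it does not explain why the VG is invisible.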

 

 

From engine.log:

2016-09-06 16:20:02,971 INFO  [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (DefaultQuartzScheduler_Worker-26) [6f596728] Running command: InitVdsOnUpCommand internal: true. Entities affected :  ID: 00000002-0002-0002-0002-000000000193 Type: StoragePool

2016-09-06 16:20:02,982 INFO  [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (DefaultQuartzScheduler_Worker-26) [7d7c3300] Running command: ConnectHostToStoragePoolServersCommand internal: true. Entities affected :  ID: 00000002-0002-0002-0002-000000000193 Type: StoragePool

2016-09-06 16:20:03,035 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (DefaultQuartzScheduler_Worker-26) [7d7c3300] START, ConnectStorageServerVDSCommand(HostName = unc-srv-hyp2, HostId = 5bf45c09-41d4-4125-a4bd-81af2a100db8, storagePoolId = 00000002-0002-0002-0002-000000000193, storageType = NFS, connectionList = [{ id: 2ceca65a-90a0-4daf-82a3-366de490a71e, connection: unc-srv-oman.cfu.local:/var/lib/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 76a3f706-e7d4-49cd-9e0a-da6061b6b2d6, connection: unc-srv-hyp1.cfu.local:/exports/import_domain, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: ea46418b-68b3-4f9b-9316-b1d57f17ecbc, connection: unc-srv-oman.cfu.local:/data/Master, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 315222a9

2016-09-06 16:20:03,080 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (DefaultQuartzScheduler_Worker-26) [7d7c3300] FINISH, ConnectStorageServerVDSCommand, return: {ea46418b-68b3-4f9b-9316-b1d57f17ecbc=0, 76a3f706-e7d4-49cd-9e0a-da6061b6b2d6=0, 2ceca65a-90a0-4daf-82a3-366de490a71e=0}, log id: 315222a9

2016-09-06 16:20:03,115 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (DefaultQuartzScheduler_Worker-26) [7d7c3300] START, ConnectStorageServerVDSCommand(HostName = unc-srv-hyp2, HostId = 5bf45c09-41d4-4125-a4bd-81af2a100db8, storagePoolId = 00000002-0002-0002-0002-000000000193, storageType = ISCSI, connectionList = [{ id: 3efad56e-a86d-4682-b799-51f42713cda6, connection: 192.168.4.1, iqn: iqn.1984-05.com.dell:powervault.md3600i.690b11c0005592a90000000051c3efc7, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: f9c9ff3

2016-09-06 16:22:03,952 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (DefaultQuartzScheduler_Worker-26) [7d7c3300] FINISH, ConnectStorageServerVDSCommand, return: {3efad56e-a86d-4682-b799-51f42713cda6=465}, log id: f9c9ff3

2016-09-06 16:22:03,976 INFO  [org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] (DefaultQuartzScheduler_Worker-26) [7d7c3300] The lun with id xu5AAG-1FHh-Qxbx-ZXIe-55J0-vscw-kb6fdp was reported as problematic !

2016-09-06 16:22:03,992 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-26) [7d7c3300] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: The error message for connection 192.168.4.1 iqn.1984-05.com.dell:powervault.md3600i.690b11c0005592a90000000051c3efc7 (LUN mpathc) returned by VDSM was: Failed to setup iSCSI subsystem

2016-09-06 16:22:03,997 ERROR [org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] (DefaultQuartzScheduler_Worker-26) [7d7c3300] The connection with details 192.168.4.1 iqn.1984-05.com.dell:powervault.md3600i.690b11c0005592a90000000051c3efc7 (LUN mpathc) failed because of error code 465 and error message is: failed to setup iscsi subsystem


^^^
Thanks, the issue is here.
Can you please ensure that your host can properly access that iSCSI LUN?

Can you please check VDSM logs for that time frame?
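Something like the following could verify basic reachability and login from host2 (portal IP and target IQN are taken from the engine log above):

```shell
# Can the host discover targets on the portal at all?
iscsiadm -m discovery -t sendtargets -p 192.168.4.1

# Try to log in to the specific target reported in the log
iscsiadm -m node \
         -T iqn.1984-05.com.dell:powervault.md3600i.690b11c0005592a90000000051c3efc7 \
         -p 192.168.4.1 --login

# Confirm a session is established
iscsiadm -m session
```

If discovery or login fails here, the problem is in networking or array-side host mappings (e.g. the new host's initiator IQN not being authorized on the MD3600i), not in oVirt.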
 

2016-09-06 16:22:03,998 INFO  [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (DefaultQuartzScheduler_Worker-26) [7d7c3300] Host unc-srv-hyp2 storage connection was failed

2016-09-06 16:22:04,004 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-26) [7d7c3300] Correlation ID: 7d7c3300, Call Stack: null, Custom Event ID: -1, Message: Failed to connect Host unc-srv-hyp2 to Storage Servers

2016-09-06 16:22:04,040 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.pool-8-thread-20) START, ConnectStoragePoolVDSCommand(HostName = unc-srv-hyp2, HostId = 5bf45c09-41d4-4125-a4bd-81af2a100db8, vdsId = 5bf45c09-41d4-4125-a4bd-81af2a100db8, storagePoolId = 00000002-0002-0002-0002-000000000193, masterVersion = 7), log id: 321780b7

2016-09-06 16:22:04,346 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.pool-8-thread-20) Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value

StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=304, mMessage=Cannot find master domain: u'spUUID=00000002-0002-0002-0002-000000000193, msdUUID=cc9ab4b2-9880-427b-8f3b-61f03e520cbc']]

2016-09-06 16:22:04,347 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.pool-8-thread-20) HostName = unc-srv-hyp2

2016-09-06 16:22:04,348 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.pool-8-thread-20) Command ConnectStoragePoolVDSCommand(HostName = unc-srv-hyp2, HostId = 5bf45c09-41d4-4125-a4bd-81af2a100db8, vdsId = 5bf45c09-41d4-4125-a4bd-81af2a100db8, storagePoolId = 00000002-0002-0002-0002-000000000193, masterVersion = 7) execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: u'spUUID=00000002-0002-0002-0002-000000000193, msdUUID=cc9ab4b2-9880-427b-8f3b-61f03e520cbc'

2016-09-06 16:22:04,350 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.pool-8-thread-20) FINISH, ConnectStoragePoolVDSCommand, log id: 321780b7

2016-09-06 16:22:04,351 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (org.ovirt.thread.pool-8-thread-20) Could not connect host unc-srv-hyp2 to pool UNICEF with the message: null

2016-09-06 16:22:04,399 INFO  [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (DefaultQuartzScheduler_Worker-26) [60403257] Running command: SetNonOperationalVdsCommand internal: true. Entities affected :  ID: 5bf45c09-41d4-4125-a4bd-81af2a100db8 Type: VDS

2016-09-06 16:22:04,426 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (DefaultQuartzScheduler_Worker-26) [60403257] START, SetVdsStatusVDSCommand(HostName = unc-srv-hyp2, HostId = 5bf45c09-41d4-4125-a4bd-81af2a100db8, status=NonOperational, nonOperationalReason=STORAGE_DOMAIN_UNREACHABLE, stopSpmFailureLogged=false), log id: 7f47d8b

2016-09-06 16:22:04,435 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (DefaultQuartzScheduler_Worker-26) [60403257] FINISH, SetVdsStatusVDSCommand, log id: 7f47d8b

2016-09-06 16:22:04,448 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-26) [60403257] Correlation ID: 60403257, Job ID: 4203a573-20f2-45e5-b251-e4e09f1ed951, Call Stack: null, Custom Event ID: -1, Message: Host unc-srv-hyp2 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center UNICEF. Setting Host state to Non-Operational.

2016-09-06 16:22:04,484 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-26) [60403257] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Power Management test failed for Host unc-srv-hyp2.There is no other host in the data center that can be used to test the power management settings.

2016-09-06 16:22:04,488 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-26) [60403257] Correlation ID: 6f596728, Call Stack: null, Custom Event ID: -1, Message: Failed to connect Host unc-srv-hyp2 to Storage Pool UNICEF

2016-09-06 16:22:04,528 INFO  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (DefaultQuartzScheduler_Worker-26) [709cb459] Running command: HandleVdsVersionCommand internal: true. Entities affected :  ID: 5bf45c09-41d4-4125-a4bd-81af2a100db8 Type: VDS

2016-09-06 16:22:04,530 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-26) [709cb459] Host 5bf45c09-41d4-4125-a4bd-81af2a100db8 : unc-srv-hyp2 is already in NonOperational status for reason STORAGE_DOMAIN_UNREACHABLE. SetNonOperationalVds command is skipped.

 

 


Alain VONDRA   
Chargé d'exploitation des Systèmes d'Information       
Direction Administrative et Financière
+33 1 44 39 77 76    

UNICEF France
3 rue Duguay Trouin  75006 PARIS

www.unicef.fr

 


From: Simone Tiraboschi [mailto:stirabos@redhat.com]
Sent: Monday, September 5, 2016 5:21 PM
To: VONDRA Alain <AVONDRA@unicef.fr>
Cc: Yedidyah Bar David <didi@redhat.com>; users <users@ovirt.org>
Subject: Re: [ovirt-users] HELP Upgrade hypervisors from CentOS 6.8 to CentOS 7

 

 

 

On Mon, Sep 5, 2016 at 5:09 PM, VONDRA Alain <AVONDRA@unicef.fr> wrote:

To summarize the situation as clearly as I can.

I have :

- 2 hypervisors (physical) running CentOS 6.8: HYP1 and HYP2

- 1 oVirt 3.5 manager (physical, not hosted-engine) running CentOS 7.2

And to be able to upgrade to oVirt 3.6 or later, I need to upgrade the two hypervisors to CentOS 7.2.

So I began with HYP2: I removed it from the cluster and reinstalled it with CentOS 7.2, but when I wanted to join it to a new cluster as shown in the doc, HYP2 remained unresponsive.

 

We need vdsm logs to understand the issue here.

Could you please reproduce it?

 

So I restored HYP2 to CentOS 6.8 and reinstalled it in the previous cluster, and everything works well, but I really want to upgrade to CentOS 7.2 to stay up to date, because I will soon be more than two versions behind.

 

 



 


From: Yedidyah Bar David [mailto:didi@redhat.com]
Sent: Monday, September 5, 2016 3:53 PM
To: VONDRA Alain <AVONDRA@unicef.fr>
Cc: Simone Tiraboschi <stirabos@redhat.com>; users <users@ovirt.org>
Subject: Re: [ovirt-users] HELP Upgrade hypervisors from CentOS 6.8 to CentOS 7

 

On Mon, Sep 5, 2016 at 4:34 PM, VONDRA Alain <AVONDRA@unicef.fr> wrote:

As you say, "this should" work, but has anybody actually tried this operation?

 

When I wrote [1], it worked for me.

If something does not work for you, I think it's best to open a bug and attach all relevant logs, including hosted-engine-setup and vdsm logs from the host and all engine logs from the engine vm.

I can't tell from your log snippet why your new host failed to attach to the storage domain. If it's reproducible, please check/post vdsm logs.

[1] https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/

 

Is there any other option to upgrade the hypervisors?

 

Even if there is another option, the fact that you can't add a host is probably problematic in itself, no? What if you actually need to add a host?

Did you try adding an el6 host, and did it work?

Best,

 

 

 



 


From: Yedidyah Bar David [mailto:didi@redhat.com]
Sent: Monday, September 5, 2016 12:22 PM
To: VONDRA Alain <AVONDRA@unicef.fr>
Cc: Simone Tiraboschi <stirabos@redhat.com>; users <users@ovirt.org>
Subject: Re: [ovirt-users] HELP Upgrade hypervisors from CentOS 6.8 to CentOS 7

 

On Mon, Sep 5, 2016 at 1:11 PM, VONDRA Alain <AVONDRA@unicef.fr> wrote:

Unfortunately, I didn’t save them; I had to roll back the host quickly before the weekend.

All I can say is that everything seemed to work well during the installation of the host: all the networks were connected to the SAN, but the host didn’t want to come UP, remaining unresponsive with the message below from the oVirt engine.

Can you assure me that installing a physical hypervisor with CentOS 7 is possible if I put it in a different cluster from the other host running CentOS 6.8?

 

Yes, this should work.

 

Thanks

 



 


From: Yedidyah Bar David [mailto:didi@redhat.com]
Sent: Sunday, September 4, 2016 8:50 AM
To: VONDRA Alain <AVONDRA@unicef.fr>
Cc: Simone Tiraboschi <stirabos@redhat.com>; users <users@ovirt.org>
Subject: Re: [ovirt-users] HELP Upgrade hypervisors from CentOS 6.8 to CentOS 7

 

On Fri, Sep 2, 2016 at 7:31 PM, VONDRA Alain <AVONDRA@unicef.fr> wrote:

Hi,

I’ve followed this doc and hoped to find a solution with it, so I didn’t use the hosted-engine --deploy command; I added a new host, and at the end of the installation the host stays unresponsive because it cannot be attached to the same storage volume.

Engine log :

2016-09-02 16:57:01,780 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (org.ovirt.thread.pool-8-thread-28) Could not connect host unc-srv-hyp2 to pool UNICEF with the message: null

2016-09-02 17:00:01,634 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.pool-8-thread-3) Command ConnectStoragePoolVDSCommand(HostName = unc-srv-hyp2, HostId = ee1c57ce-1c77-47b1-b466-7bf99382dd77, vdsId = ee1c57ce-1c77-47b1-b466-7bf99382dd77, storagePoolId = 00000002-0002-0002-0002-000000000193, masterVersion = 7) execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: u'spUUID=00000002-0002-0002-0002-000000000193, msdUUID=cc9ab4b2-9880-427b-8f3b-61f03e520cbc'

 

Please check/post vdsm logs from the host. Thanks.

 

 

 

 



 


From: Simone Tiraboschi [mailto:stirabos@redhat.com]
Sent: Friday, September 2, 2016 6:20 PM
To: VONDRA Alain <AVONDRA@unicef.fr>
Cc: Nir Soffer <nsoffer@redhat.com>; users <users@ovirt.org>
Subject: Re: [ovirt-users] HELP Upgrade hypervisors from CentOS 6.8 to CentOS 7

 

 

 

On Fri, Sep 2, 2016 at 5:21 PM, VONDRA Alain <AVONDRA@unicef.fr> wrote:

Hi,
I'd like to upgrade my oVirt environment from 3.5 to 3.6 and maybe 4. Currently the oVirt manager is at version 3.5, installed on CentOS 7.2, and the two hypervisors are installed on CentOS 6.8.
I need to upgrade the hosts anyway to be able to move to 3.6.
I've tried to upgrade the first host, but of course I got errors saying that it is not possible to mix different OSes in the same cluster; I've also tried to create another cluster for this host, without success.
What is the best way to upgrade cleanly and safely?
Thank you in advance for your advice

 

Follow this:

 

Simply ignore the steps that refer to the engine VM if your engine is on a physical system.

 



 


_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users




--

Didi



