[ovirt-users] Hosted engine FCP SAN can not activate data domain

Fred Rolland frolland at redhat.com
Tue Apr 25 12:17:52 UTC 2017


Hi,

Do you see the LUN on the host?
Can you share the output of pvs and lvs?
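
The vdsm.log excerpt quoted below already names the PV that LVM cannot
find. Here is a minimal sketch for pulling that UUID out of the log, so
it can be compared with the pv_uuid column of
`pvs -o pv_name,pv_uuid,vg_name`. The function name and the regex are
illustrative assumptions, not part of vdsm or LVM:

```python
import re

# Assumption: the UUID is printed as seven dash-separated alphanumeric
# groups (6-4-4-4-4-4-6), as in the lvm warnings captured in vdsm.log.
MISSING_PV = re.compile(
    r"Couldn't find device with uuid\s+"
    r"([0-9A-Za-z]{6}(?:-[0-9A-Za-z]{4}){5}-[0-9A-Za-z]{6})"
)


def missing_pv_uuids(log_text):
    """Return each PV UUID that LVM reported as missing, once, in order seen."""
    seen = []
    for uuid in MISSING_PV.findall(log_text):
        if uuid not in seen:
            seen.append(uuid)
    return seen
```

It could be called as
`missing_pv_uuids(open("/var/log/vdsm/vdsm.log").read())`. Run it
against the log file itself, not against a copy re-wrapped by a mail
client, since the pattern expects only whitespace between "uuid" and
the ID.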

Thanks,

Fred

On Mon, Apr 24, 2017 at 1:05 PM, Jens Oechsler <joe at avaleo.net> wrote:

> Hello
> I have a problem with oVirt Hosted Engine Setup version:
> 4.0.5.5-1.el7.centos.
> Setup is using FCP SAN for data and engine.
> Cluster has worked fine for a while. It has two hosts with VMs running.
> I recently extended the storage with an additional LUN. This LUN now
> seems to be gone from the data domain, and one VM is paused, which I
> assume is because it has data on that device.
>
> Got these errors in events:
>
> Apr 24, 2017 10:26:05 AM
> Failed to activate Storage Domain SD (Data Center DC) by
> admin at internal-authz
> Apr 10, 2017 3:38:08 PM
> Status of host cl01 was set to Up.
> Apr 10, 2017 3:38:03 PM
> Host cl01 does not enforce SELinux. Current status: DISABLED
> Apr 10, 2017 3:37:58 PM
> Host cl01 is initializing. Message: Recovering from crash or Initializing
> Apr 10, 2017 3:37:58 PM
> VDSM cl01 command failed: Recovering from crash or Initializing
> Apr 10, 2017 3:37:46 PM
> Failed to Reconstruct Master Domain for Data Center DC.
> Apr 10, 2017 3:37:46 PM
> Host cl01 is not responding. Host cannot be fenced automatically
> because power management for the host is disabled.
> Apr 10, 2017 3:37:46 PM
> VDSM cl01 command failed: Broken pipe
> Apr 10, 2017 3:37:46 PM
> VDSM cl01 command failed: Broken pipe
> Apr 10, 2017 3:32:45 PM
> Invalid status on Data Center DC. Setting Data Center status to Non
> Responsive (On host cl01, Error: General Exception).
> Apr 10, 2017 3:32:45 PM
> VDSM cl01 command failed: [Errno 19] Could not find dm device named
> `[unknown]`
> Apr 7, 2017 1:28:04 PM
> VM HostedEngine is down with error. Exit message: resource busy:
> Failed to acquire lock: error -243.
> Apr 7, 2017 1:28:02 PM
> Storage Pool Manager runs on Host cl01 (Address: cl01).
> Apr 7, 2017 1:27:59 PM
> Invalid status on Data Center DC. Setting status to Non Responsive.
> Apr 7, 2017 1:27:53 PM
> Host cl02 does not enforce SELinux. Current status: DISABLED
> Apr 7, 2017 1:27:52 PM
> Host cl01 does not enforce SELinux. Current status: DISABLED
> Apr 7, 2017 1:27:49 PM
> Affinity Rules Enforcement Manager started.
> Apr 7, 2017 1:27:34 PM
> ETL Service Started
> Apr 7, 2017 1:26:01 PM
> ETL Service Stopped
> Apr 3, 2017 1:22:54 PM
> Shutdown of VM HostedEngine failed.
> Apr 3, 2017 1:22:52 PM
> Storage Pool Manager runs on Host cl01 (Address: cl01).
> Apr 3, 2017 1:22:49 PM
> Invalid status on Data Center DC. Setting status to Non Responsive.
>
>
> Master data domain is inactive.
>
>
> vdsm.log:
>
> jsonrpc.Executor/5::INFO::2017-04-20
> 07:01:26,796::lvm::1226::Storage.LVM::(activateLVs) Refreshing lvs:
> vg=bd616961-6da7-4eb0-939e-330b0a3fea6e lvs=['ids']
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:26,796::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
> --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm lvchange --config '
> devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
> write_cache_state=0 disable_after_error_count=3 filter = [
> '\''a|/dev/mapper/360050768018182b6c00000000000099e|[unknown]|'\'',
> '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1
> wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50
> retain_days = 0 } ' --refresh
> bd616961-6da7-4eb0-939e-330b0a3fea6e/ids (cwd None)
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:26,880::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = "
> WARNING: Not using lvmetad because config setting use_lvmetad=0.\n
> WARNING: To avoid corruption, rescan devices to make changes
>  visible (pvscan --cache).\n  Couldn't find device with uuid
> jDB9VW-bNqY-UIKc-XxXp-xnyK-ZTlt-7Cpa1U.\n"; <rc> = 0
> jsonrpc.Executor/5::INFO::2017-04-20
> 07:01:26,881::lvm::1226::Storage.LVM::(activateLVs) Refreshing lvs:
> vg=bd616961-6da7-4eb0-939e-330b0a3fea6e lvs=['leases']
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:26,881::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
> --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm lvchange --config '
> devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
> write_cache_state=0 disable_after_error_count=3 filter = [
> '\''a|/dev/mapper/360050768018182b6c00000000000099e|[unknown]|'\'',
> '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1
> wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50
> retain_days = 0 } ' --refresh
> bd616961-6da7-4eb0-939e-330b0a3fea6e/leases (cwd None)
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:26,973::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = "
> WARNING: Not using lvmetad because config setting use_lvmetad=0.\n
> WARNING: To avoid corruption, rescan devices to make changes
>  visible (pvscan --cache).\n  Couldn't find device with uuid
> jDB9VW-bNqY-UIKc-XxXp-xnyK-ZTlt-7Cpa1U.\n"; <rc> = 0
> jsonrpc.Executor/5::INFO::2017-04-20
> 07:01:26,973::lvm::1226::Storage.LVM::(activateLVs) Refreshing lvs:
> vg=bd616961-6da7-4eb0-939e-330b0a3fea6e lvs=['metadata', 'leases',
> 'ids', 'inbox', 'outbox', 'master']
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:26,974::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
> --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm lvchange --config '
> devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
> write_cache_state=0 disable_after_error_count=3 filter = [
> '\''a|/dev/mapper/360050768018182b6c00000000000099e|[unknown]|'\'',
> '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1
> wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50
> retain_days = 0 } ' --refresh
> bd616961-6da7-4eb0-939e-330b0a3fea6e/metadata
> bd616961-6da7-4eb0-939e-330b0a3fea6e/leases
> bd616961-6da7-4eb0-939e-330b0a3fea6e/ids
> bd616961-6da7-4eb0-939e-330b0a3fea6e/inbox
> bd616961-6da7-4eb0-939e-330b0a3fea6e/outbox
> bd616961-6da7-4eb0-939e-330b0a3fea6e/master (cwd None)
> Reactor thread::INFO::2017-04-20
> 07:01:27,069::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept)
> Accepting connection from ::1:44692
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:27,070::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = "
> WARNING: Not using lvmetad because config setting use_lvmetad=0.\n
> WARNING: To avoid corruption, rescan devices to make changes
>  visible (pvscan --cache).\n  Couldn't find device with uuid
> jDB9VW-bNqY-UIKc-XxXp-xnyK-ZTlt-7Cpa1U.\n"; <rc> = 0
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:27,070::sp::662::Storage.StoragePool::(_stopWatchingDomainsState)
> Stop watching domains state
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:27,070::resourceManager::628::Storage.ResourceManager::(releaseResource)
> Trying to release resource
> 'Storage.58493e81-01dc-01d8-0390-000000000032'
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:27,071::resourceManager::647::Storage.ResourceManager::(releaseResource)
> Released resource 'Storage.58493e81-01dc-01d8-0390-000000000032' (0
> active users)
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:27,071::resourceManager::653::Storage.ResourceManager::(releaseResource)
> Resource 'Storage.58493e81-01dc-01d8-0390-000000000032' is free,
> finding out if anyone is waiting for it.
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:27,071::resourceManager::661::Storage.ResourceManager::(releaseResource)
> No one is waiting for resource
> 'Storage.58493e81-01dc-01d8-0390-000000000032', Clearing records.
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:27,071::resourceManager::628::Storage.ResourceManager::(releaseResource)
> Trying to release resource 'Storage.HsmDomainMonitorLock'
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:27,071::resourceManager::647::Storage.ResourceManager::(releaseResource)
> Released resource 'Storage.HsmDomainMonitorLock' (0 active users)
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:27,071::resourceManager::653::Storage.ResourceManager::(releaseResource)
> Resource 'Storage.HsmDomainMonitorLock' is free, finding out if anyone
> is waiting for it.
> jsonrpc.Executor/5::DEBUG::2017-04-20
> 07:01:27,071::resourceManager::661::Storage.ResourceManager::(releaseResource)
> No one is waiting for resource 'Storage.HsmDomainMonitorLock',
> Clearing records.
> jsonrpc.Executor/5::ERROR::2017-04-20
> 07:01:27,072::task::868::Storage.TaskManager.Task::(_setError)
> Task=`15122a21-4fb7-45bf-9a9a-4b97f27bc1e1`::Unexpected error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 875, in _run
>     return fn(*args, **kargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 988, in connectStoragePool
>     spUUID, hostID, msdUUID, masterVersion, domainsMap)
>   File "/usr/share/vdsm/storage/hsm.py", line 1053, in _connectStoragePool
>     res = pool.connect(hostID, msdUUID, masterVersion)
>   File "/usr/share/vdsm/storage/sp.py", line 646, in connect
>     self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
>   File "/usr/share/vdsm/storage/sp.py", line 1219, in __rebuild
>     self.setMasterDomain(msdUUID, masterVersion)
>   File "/usr/share/vdsm/storage/sp.py", line 1427, in setMasterDomain
>     domain = sdCache.produce(msdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 101, in produce
>     domain.getRealDomain()
>   File "/usr/share/vdsm/storage/sdc.py", line 53, in getRealDomain
>     return self._cache._realProduce(self._sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 125, in _realProduce
>     domain = self._findDomain(sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
>     dom = findMethod(sdUUID)
>   File "/usr/share/vdsm/storage/blockSD.py", line 1441, in findDomain
>     return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
>   File "/usr/share/vdsm/storage/blockSD.py", line 814, in __init__
>     lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
>   File "/usr/share/vdsm/storage/lvm.py", line 1056, in checkVGBlockSizes
>     _checkpvsblksize(pvs, vgBlkSize)
>   File "/usr/share/vdsm/storage/lvm.py", line 1033, in _checkpvsblksize
>     pvBlkSize = _getpvblksize(pv)
>   File "/usr/share/vdsm/storage/lvm.py", line 1027, in _getpvblksize
>     dev = devicemapper.getDmId(os.path.basename(pv))
>   File "/usr/share/vdsm/storage/devicemapper.py", line 40, in getDmId
>     deviceMultipathName)
> OSError: [Errno 19] Could not find dm device named `[unknown]`
>
>
> Any input on how to diagnose or troubleshoot this would be appreciated.
>
> --
> Best Regards
>
> Jens Oechsler