[ovirt-users] Hosted engine FCP SAN can not activate data domain
Fred Rolland
frolland at redhat.com
Sun Apr 30 08:50:22 UTC 2017
Hi,
Can you provide the vdsm and engine logs?
Thanks,
Fred
On Wed, Apr 26, 2017 at 5:30 PM, Jens Oechsler <joe at avaleo.net> wrote:
> Greetings,
>
> Is there any way to get the oVirt Data Center described below active again?
>
> On Tue, Apr 25, 2017 at 4:11 PM, Jens Oechsler <joe at avaleo.net> wrote:
> > Hi,
> >
> > The LUN is not in the pvs output, but I found it in the lsblk output,
> > apparently without any partitions on it.
> >
> > $ sudo pvs
> >   PV                                            VG                                   Fmt  Attr PSize   PFree
> >   /dev/mapper/360050768018182b6c000000000000990 data                                 lvm2 a--  200.00g 180.00g
> >   /dev/mapper/360050768018182b6c000000000000998 9f10e00f-ae39-46a0-86da-8b157c6de7bc lvm2 a--  499.62g 484.50g
> >   /dev/sda2                                     system                               lvm2 a--  278.78g 208.41g
> >
> > $ sudo lvs
> >   LV                                   VG                                   Attr       LSize    Pool Origin Data% Meta% Move Log Cpy%Sync Convert
> >   34a9328f-87fe-4190-96e9-a3580b0734fc 9f10e00f-ae39-46a0-86da-8b157c6de7bc -wi-a-----    1.00g
> >   506ff043-1058-448c-bbab-5c864adb2bfc 9f10e00f-ae39-46a0-86da-8b157c6de7bc -wi-a-----   10.00g
> >   65449c88-bc28-4275-bbbb-5fc75b692cbc 9f10e00f-ae39-46a0-86da-8b157c6de7bc -wi-a-----  128.00m
> >   e2ee95ce-8105-4a20-8e1f-9f6dfa16bf59 9f10e00f-ae39-46a0-86da-8b157c6de7bc -wi-ao----  128.00m
> >   ids                                  9f10e00f-ae39-46a0-86da-8b157c6de7bc -wi-ao----  128.00m
> >   inbox                                9f10e00f-ae39-46a0-86da-8b157c6de7bc -wi-a-----  128.00m
> >   leases                               9f10e00f-ae39-46a0-86da-8b157c6de7bc -wi-a-----    2.00g
> >   master                               9f10e00f-ae39-46a0-86da-8b157c6de7bc -wi-a-----    1.00g
> >   metadata                             9f10e00f-ae39-46a0-86da-8b157c6de7bc -wi-a-----  512.00m
> >   outbox                               9f10e00f-ae39-46a0-86da-8b157c6de7bc -wi-a-----  128.00m
> >   data                                 data                                 -wi-ao----   20.00g
> >   home                                 system                               -wi-ao---- 1000.00m
> >   prod                                 system                               -wi-ao----    4.88g
> >   root                                 system                               -wi-ao----    7.81g
> >   swap                                 system                               -wi-ao----    4.00g
> >   swap7                                system                               -wi-ao----   20.00g
> >   tmp                                  system                               -wi-ao----    4.88g
> >   var                                  system                               -wi-ao----   27.81g
> >
> > $ sudo lsblk
> > <output trimmed>
> > sdq                                  65:0   0 500G 0 disk
> > └─360050768018182b6c0000000000009d7 253:33 0 500G 0 mpath
> >
> > Data domain was made with one 500 GB LUN and extended with 500 GB more.
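
The comparison described above (a multipath device that lsblk shows but pvs does not list as a PV) can be scripted. This is only a rough sketch, not an oVirt or vdsm tool. It assumes the text comes from `pvs --noheadings -o pv_name` and `lsblk -rno NAME,TYPE`; the function names are made up for illustration, and the sample lsblk lines for the 990 and 998 devices are an assumption, since the lsblk output in this thread was trimmed.

```python
import re


def pv_wwids(pvs_text):
    """Return the names that appear as /dev/mapper/<name> in pvs output."""
    return set(re.findall(r"/dev/mapper/(\S+)", pvs_text))


def mpath_wwids(lsblk_text):
    """Return device names whose TYPE column is 'mpath'.

    Expects `lsblk -rno NAME,TYPE` output: one "name type" pair per line.
    A multipath device is listed once per path, so a set removes repeats.
    """
    names = set()
    for line in lsblk_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "mpath":
            names.add(fields[0])
    return names


def unused_mpath_devices(pvs_text, lsblk_text):
    """Multipath devices visible to the host that LVM does not list as PVs."""
    return sorted(mpath_wwids(lsblk_text) - pv_wwids(pvs_text))


# PV names copied from the pvs output in this thread.
PVS_SAMPLE = """\
  /dev/mapper/360050768018182b6c000000000000990
  /dev/mapper/360050768018182b6c000000000000998
  /dev/sda2
"""

# Illustrative only: the 990/998 lines are assumed, the real output was trimmed.
LSBLK_SAMPLE = """\
sda disk
sda2 part
sdq disk
360050768018182b6c0000000000009d7 mpath
360050768018182b6c000000000000990 mpath
360050768018182b6c000000000000998 mpath
"""

if __name__ == "__main__":
    print(unused_mpath_devices(PVS_SAMPLE, LSBLK_SAMPLE))
```

With these samples the only device reported is 360050768018182b6c0000000000009d7, the 500G mpath device from the lsblk output. Whether that device is the LUN that was added to the data domain still has to be confirmed on the storage side.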
> >
> > On Tue, Apr 25, 2017 at 2:17 PM, Fred Rolland <frolland at redhat.com> wrote:
> >> Hi,
> >>
> >> Do you see the LUN on the host?
> >> Can you share the pvs and lvs output?
> >>
> >> Thanks,
> >>
> >> Fred
> >>
> >> On Mon, Apr 24, 2017 at 1:05 PM, Jens Oechsler <joe at avaleo.net> wrote:
> >>>
> >>> Hello
> >>> I have a problem with oVirt Hosted Engine Setup version:
> >>> 4.0.5.5-1.el7.centos.
> >>> The setup uses FCP SAN for data and engine.
> >>> The cluster has worked fine for a while. It has two hosts with VMs running.
> >>> I recently extended storage with an additional LUN. This LUN seems to
> >>> be gone from the data domain, and one VM is paused, which I assume has
> >>> data on that device.
> >>>
> >>> Got these errors in events:
> >>>
> >>> Apr 24, 2017 10:26:05 AM
> >>> Failed to activate Storage Domain SD (Data Center DC) by
> >>> admin at internal-authz
> >>> Apr 10, 2017 3:38:08 PM
> >>> Status of host cl01 was set to Up.
> >>> Apr 10, 2017 3:38:03 PM
> >>> Host cl01 does not enforce SELinux. Current status: DISABLED
> >>> Apr 10, 2017 3:37:58 PM
> >>> Host cl01 is initializing. Message: Recovering from crash or Initializing
> >>> Apr 10, 2017 3:37:58 PM
> >>> VDSM cl01 command failed: Recovering from crash or Initializing
> >>> Apr 10, 2017 3:37:46 PM
> >>> Failed to Reconstruct Master Domain for Data Center DC.
> >>> Apr 10, 2017 3:37:46 PM
> >>> Host cl01 is not responding. Host cannot be fenced automatically
> >>> because power management for the host is disabled.
> >>> Apr 10, 2017 3:37:46 PM
> >>> VDSM cl01 command failed: Broken pipe
> >>> Apr 10, 2017 3:37:46 PM
> >>> VDSM cl01 command failed: Broken pipe
> >>> Apr 10, 2017 3:32:45 PM
> >>> Invalid status on Data Center DC. Setting Data Center status to Non
> >>> Responsive (On host cl01, Error: General Exception).
> >>> Apr 10, 2017 3:32:45 PM
> >>> VDSM cl01 command failed: [Errno 19] Could not find dm device named
> >>> `[unknown]`
> >>> Apr 7, 2017 1:28:04 PM
> >>> VM HostedEngine is down with error. Exit message: resource busy:
> >>> Failed to acquire lock: error -243.
> >>> Apr 7, 2017 1:28:02 PM
> >>> Storage Pool Manager runs on Host cl01 (Address: cl01).
> >>> Apr 7, 2017 1:27:59 PM
> >>> Invalid status on Data Center DC. Setting status to Non Responsive.
> >>> Apr 7, 2017 1:27:53 PM
> >>> Host cl02 does not enforce SELinux. Current status: DISABLED
> >>> Apr 7, 2017 1:27:52 PM
> >>> Host cl01 does not enforce SELinux. Current status: DISABLED
> >>> Apr 7, 2017 1:27:49 PM
> >>> Affinity Rules Enforcement Manager started.
> >>> Apr 7, 2017 1:27:34 PM
> >>> ETL Service Started
> >>> Apr 7, 2017 1:26:01 PM
> >>> ETL Service Stopped
> >>> Apr 3, 2017 1:22:54 PM
> >>> Shutdown of VM HostedEngine failed.
> >>> Apr 3, 2017 1:22:52 PM
> >>> Storage Pool Manager runs on Host cl01 (Address: cl01).
> >>> Apr 3, 2017 1:22:49 PM
> >>> Invalid status on Data Center DC. Setting status to Non Responsive.
> >>>
> >>>
> >>> Master data domain is inactive.
> >>>
> >>>
> >>> vdsm.log:
> >>>
> >>> jsonrpc.Executor/5::INFO::2017-04-20
> >>> 07:01:26,796::lvm::1226::Storage.LVM::(activateLVs) Refreshing lvs:
> >>> vg=bd616961-6da7-4eb0-939e-330b0a3fea6e lvs=['ids']
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:26,796::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
> >>> --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm lvchange --config '
> >>> devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
> >>> write_cache_state=0 disable_after_error_count=3 filter = [
> >>> '\''a|/dev/mapper/360050768018182b6c00000000000099e|[unknown]|'\'',
> >>> '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1
> >>> wait_for_locks=1 use_lvmetad=0 }
> >>> backup { retain_min = 50 retain_days = 0 } ' --refresh
> >>> bd616961-6da7-4eb0-939e-330b0a3fea6e/ids (cwd None)
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:26,880::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = "
> >>> WARNING: Not using lvmetad because config setting use_lvmetad=0.\n
> >>> WARNING: To avoid corruption, rescan devices to make changes
> >>> visible (pvscan --cache).\n Couldn't find device with uuid
> >>> jDB9VW-bNqY-UIKc-XxXp-xnyK-ZTlt-7Cpa1U.\n"; <rc> = 0
> >>> jsonrpc.Executor/5::INFO::2017-04-20
> >>> 07:01:26,881::lvm::1226::Storage.LVM::(activateLVs) Refreshing lvs:
> >>> vg=bd616961-6da7-4eb0-939e-330b0a3fea6e lvs=['leases']
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:26,881::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
> >>> --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm lvchange --config '
> >>> devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
> >>> write_cache_state=0 disable_after_error_count=3 filter = [
> >>> '\''a|/dev/mapper/360050768018182b6c00000000000099e|[unknown]|'\'',
> >>> '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1
> >>> wait_for_locks=1 use_lvmetad=0 }
> >>> backup { retain_min = 50 retain_days = 0 } ' --refresh
> >>> bd616961-6da7-4eb0-939e-330b0a3fea6e/leases (cwd None)
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:26,973::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = "
> >>> WARNING: Not using lvmetad because config setting use_lvmetad=0.\n
> >>> WARNING: To avoid corruption, rescan devices to make changes
> >>> visible (pvscan --cache).\n Couldn't find device with uuid
> >>> jDB9VW-bNqY-UIKc-XxXp-xnyK-ZTlt-7Cpa1U.\n"; <rc> = 0
> >>> jsonrpc.Executor/5::INFO::2017-04-20
> >>> 07:01:26,973::lvm::1226::Storage.LVM::(activateLVs) Refreshing lvs:
> >>> vg=bd616961-6da7-4eb0-939e-330b0a3fea6e lvs=['metadata', 'leases',
> >>> 'ids', 'inbox', 'outbox', 'master']
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:26,974::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
> >>> --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm lvchange --config '
> >>> devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
> >>> write_cache_state=0 disable_after_error_count=3 filter = [
> >>> '\''a|/dev/mapper/360050768018182b6c00000000000099e|[unknown]|'\'',
> >>> '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1
> >>> wait_for_locks=1 use_lvmetad=0 }
> >>> backup { retain_min = 50 retain_days = 0 } ' --refresh
> >>> bd616961-6da7-4eb0-939e-330b0a3fea6e/metadata
> >>> bd616961-6da7-4eb0-939e-330b0a3fea6e/leases
> >>> bd616961-6da7-4eb0-939e-330b0a3fea6e/ids
> >>> bd616961-6da7-4eb0-939e-330b0a3fea6e/inbox
> >>> bd616961-6da7-4eb0-939e-330b0a3fea6e/outbox
> >>> bd616961-6da7-4eb0-939e-330b0a3fea6e/master (cwd None)
> >>> Reactor thread::INFO::2017-04-20
> >>> 07:01:27,069::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept)
> >>> Accepting connection from ::1:44692
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:27,070::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = "
> >>> WARNING: Not using lvmetad because config setting use_lvmetad=0.\n
> >>> WARNING: To avoid corruption, rescan devices to make changes
> >>> visible (pvscan --cache).\n Couldn't find device with uuid
> >>> jDB9VW-bNqY-UIKc-XxXp-xnyK-ZTlt-7Cpa1U.\n"; <rc> = 0
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:27,070::sp::662::Storage.StoragePool::(_stopWatchingDomainsState)
> >>> Stop watching domains state
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:27,070::resourceManager::628::Storage.ResourceManager::(releaseResource)
> >>> Trying to release resource
> >>> 'Storage.58493e81-01dc-01d8-0390-000000000032'
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:27,071::resourceManager::647::Storage.ResourceManager::(releaseResource)
> >>> Released resource 'Storage.58493e81-01dc-01d8-0390-000000000032' (0
> >>> active users)
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:27,071::resourceManager::653::Storage.ResourceManager::(releaseResource)
> >>> Resource 'Storage.58493e81-01dc-01d8-0390-000000000032' is free,
> >>> finding out if anyone is waiting for it.
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:27,071::resourceManager::661::Storage.ResourceManager::(releaseResource)
> >>> No one is waiting for resource
> >>> 'Storage.58493e81-01dc-01d8-0390-000000000032', Clearing records.
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:27,071::resourceManager::628::Storage.ResourceManager::(releaseResource)
> >>> Trying to release resource 'Storage.HsmDomainMonitorLock'
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:27,071::resourceManager::647::Storage.ResourceManager::(releaseResource)
> >>> Released resource 'Storage.HsmDomainMonitorLock' (0 active users)
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:27,071::resourceManager::653::Storage.ResourceManager::(releaseResource)
> >>> Resource 'Storage.HsmDomainMonitorLock' is free, finding out if anyone
> >>> is waiting for it.
> >>> jsonrpc.Executor/5::DEBUG::2017-04-20
> >>> 07:01:27,071::resourceManager::661::Storage.ResourceManager::(releaseResource)
> >>> No one is waiting for resource 'Storage.HsmDomainMonitorLock',
> >>> Clearing records.
> >>> jsonrpc.Executor/5::ERROR::2017-04-20
> >>> 07:01:27,072::task::868::Storage.TaskManager.Task::(_setError)
> >>> Task=`15122a21-4fb7-45bf-9a9a-4b97f27bc1e1`::Unexpected error
> >>> Traceback (most recent call last):
> >>> File "/usr/share/vdsm/storage/task.py", line 875, in _run
> >>> return fn(*args, **kargs)
> >>> File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
> >>> res = f(*args, **kwargs)
> >>> File "/usr/share/vdsm/storage/hsm.py", line 988, in connectStoragePool
> >>> spUUID, hostID, msdUUID, masterVersion, domainsMap)
> >>> File "/usr/share/vdsm/storage/hsm.py", line 1053, in _connectStoragePool
> >>> res = pool.connect(hostID, msdUUID, masterVersion)
> >>> File "/usr/share/vdsm/storage/sp.py", line 646, in connect
> >>> self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
> >>> File "/usr/share/vdsm/storage/sp.py", line 1219, in __rebuild
> >>> self.setMasterDomain(msdUUID, masterVersion)
> >>> File "/usr/share/vdsm/storage/sp.py", line 1427, in setMasterDomain
> >>> domain = sdCache.produce(msdUUID)
> >>> File "/usr/share/vdsm/storage/sdc.py", line 101, in produce
> >>> domain.getRealDomain()
> >>> File "/usr/share/vdsm/storage/sdc.py", line 53, in getRealDomain
> >>> return self._cache._realProduce(self._sdUUID)
> >>> File "/usr/share/vdsm/storage/sdc.py", line 125, in _realProduce
> >>> domain = self._findDomain(sdUUID)
> >>> File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
> >>> dom = findMethod(sdUUID)
> >>> File "/usr/share/vdsm/storage/blockSD.py", line 1441, in findDomain
> >>> return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
> >>> File "/usr/share/vdsm/storage/blockSD.py", line 814, in __init__
> >>> lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
> >>> File "/usr/share/vdsm/storage/lvm.py", line 1056, in checkVGBlockSizes
> >>> _checkpvsblksize(pvs, vgBlkSize)
> >>> File "/usr/share/vdsm/storage/lvm.py", line 1033, in _checkpvsblksize
> >>> pvBlkSize = _getpvblksize(pv)
> >>> File "/usr/share/vdsm/storage/lvm.py", line 1027, in _getpvblksize
> >>> dev = devicemapper.getDmId(os.path.basename(pv))
> >>> File "/usr/share/vdsm/storage/devicemapper.py", line 40, in getDmId
> >>> deviceMultipathName)
> >>> OSError: [Errno 19] Could not find dm device named `[unknown]`
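
The traceback ends in devicemapper.getDmId being called with os.path.basename(pv), and the name it fails on is `[unknown]`, the same placeholder that appears in the LVM filter for the missing PV. The sketch below is a simplified stand-in and not the vdsm source. It assumes the lookup is a stat of the name under /dev/mapper; `get_dm_id` and `describe_failure` are hypothetical names used only to show how a placeholder PV name turns into this exact error.

```python
import errno
import os


def get_dm_id(name, mapper_dir="/dev/mapper"):
    """Hypothetical stand-in for the lookup: map a mapper name to 'dm-N'."""
    path = os.path.join(mapper_dir, name)
    try:
        st = os.stat(path)
    except OSError:
        # No such node under /dev/mapper: report ENODEV (errno 19 on Linux).
        raise OSError(errno.ENODEV,
                      "Could not find dm device named `%s`" % name)
    return "dm-%d" % os.minor(st.st_rdev)


def describe_failure(pv_name):
    """Return the error text for a PV name that cannot be resolved."""
    try:
        return get_dm_id(os.path.basename(pv_name))
    except OSError as e:
        return str(e)


if __name__ == "__main__":
    # A missing PV carries a placeholder instead of a device path.
    print(describe_failure("[unknown]"))
```

Under that assumption the error says nothing about which LUN is missing. The PV uuid in the "Couldn't find device with uuid" warning earlier in the log is the more useful identifier to follow up on.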
> >>>
> >>>
> >>> Any input on how to diagnose or troubleshoot this would be appreciated.
> >>>
> >>> --
> >>> Best Regards
> >>>
> >>> Jens Oechsler
> >>> _______________________________________________
> >>> Users mailing list
> >>> Users at ovirt.org
> >>> http://lists.ovirt.org/mailman/listinfo/users
> >>
> >>
> >
> >
> >
> > --
> > Med Venlig Hilsen / Best Regards
> >
> > Jens Oechsler
> > System administrator
> > KMD Nexus
> > +45 51 82 62 13
>
>
>
> --
> Med Venlig Hilsen / Best Regards
>
> Jens Oechsler
> System administrator
> KMD Nexus
> +45 51 82 62 13
>