[ovirt-users] One host failed to attach one of Storage Domains after reboot of all hosts
Giuseppe Berellini
Giuseppe.Berellini at ptvgroup.com
Thu Feb 25 11:10:17 UTC 2016
Hi,
At the beginning of February I successfully installed oVirt 3.6.2 (with hosted engine) on 3 hosts, which use a single storage server with GlusterFS.
2 hosts (with Intel CPUs) are HA hosts and host the engine; the 3rd host (AMD CPU) was added as a host from the oVirt web administration panel, without hosted engine deployment (I don't want the engine running on this host).
About 10 days ago I tried to reboot my oVirt environment (i.e. going to global maintenance, shutting down the engine, turning off all the hosts, starting them again, then setting maintenance mode to "none").
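In terms of commands, that was roughly the following (a sketch from memory, so the exact invocations may have differed slightly). On one of the HA hosts:
# hosted-engine --set-maintenance --mode=global
# hosted-engine --vm-shutdown
Then I powered off all three hosts, powered them back on and, again on an HA host:
# hosted-engine --set-maintenance --mode=none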
After the reboot, everything was fine with the Intel hosts and the hosted engine, but the AMD host (the one without HA) was not operational.
I tried to activate it, but it failed with the following error:
"Host failed to attach one of Storage Domains attached to it."
If I log into my AMD host and check the logs, I see that the storage domain that is not mounted is the one used by the hosted engine (which could be correct, since this host won't run the hosted engine).
From /var/log/vdsm/vdsm.log:
Thread-29::DEBUG::2016-02-25 11:44:01,157::monitor::322::Storage.Monitor::(_produceDomain) Producing domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::ERROR::2016-02-25 11:44:01,158::sdc::139::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::ERROR::2016-02-25 11:44:01,158::sdc::156::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::DEBUG::2016-02-25 11:44:01,159::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-29::DEBUG::2016-02-25 11:44:01,159::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-63 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd (cwd None)
Thread-29::DEBUG::2016-02-25 11:44:01,223::lvm::290::Storage.Misc.excCmd::(cmd) FAILED: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n Volume group "6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd" not found\n Cannot process volume group 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd\n'; <rc> = 5
Thread-29::WARNING::2016-02-25 11:44:01,225::lvm::375::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!', ' Volume group "6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd" not found', ' Cannot process volume group 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd']
Thread-29::DEBUG::2016-02-25 11:44:01,225::lvm::415::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-29::ERROR::2016-02-25 11:44:01,245::sdc::145::Storage.StorageDomainCache::(_findDomain) domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd not found
Traceback (most recent call last):
File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
dom = findMethod(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 173, in _findUnfetchedDomain
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)
Thread-29::ERROR::2016-02-25 11:44:01,246::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Traceback (most recent call last):
File "/usr/share/vdsm/storage/monitor.py", line 264, in _monitorDomain
self._produceDomain()
File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 767, in wrapper
value = meth(self, *a, **kw)
File "/usr/share/vdsm/storage/monitor.py", line 323, in _produceDomain
self.domain = sdCache.produce(self.sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 100, in produce
domain.getRealDomain()
File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
return self._cache._realProduce(self._sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 124, in _realProduce
domain = self._findDomain(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
dom = findMethod(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 173, in _findUnfetchedDomain
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,292::task::595::Storage.TaskManager.Task::(_updateState) Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::moving from state init -> state preparing
jsonrpc.Executor/0::INFO::2016-02-25 11:44:03,293::logUtils::48::dispatcher::(wrapper) Run and protect: repoStats(options=None)
jsonrpc.Executor/0::INFO::2016-02-25 11:44:03,293::logUtils::51::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'5f7991ba-fdf8-4b40-9974-c7adcd4da879': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00056349', 'lastCheck': '7.7', 'valid': True}, u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd': {'code': 358, 'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '2.0', 'valid': False}, u'5efea9c7-c4ec-44d4-a283-060d4c83303c': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000561865', 'lastCheck': '8.4', 'valid': True}, u'e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000227759', 'lastCheck': '8.7', 'valid': True}}
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,294::task::1191::Storage.TaskManager.Task::(prepare) Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::finished: {u'5f7991ba-fdf8-4b40-9974-c7adcd4da879': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00056349', 'lastCheck': '7.7', 'valid': True}, u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd': {'code': 358, 'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '2.0', 'valid': False}, u'5efea9c7-c4ec-44d4-a283-060d4c83303c': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000561865', 'lastCheck': '8.4', 'valid': True}, u'e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000227759', 'lastCheck': '8.7', 'valid': True}}
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,294::task::595::Storage.TaskManager.Task::(_updateState) Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::moving from state preparing -> state finished
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,294::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,295::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,295::task::993::Storage.TaskManager.Task::(_decref) Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::ref 0 aborting False
Thread-30::DEBUG::2016-02-25 11:44:04,603::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset --cpu-list 0-63 /usr/bin/dd if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie/e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-30::DEBUG::2016-02-25 11:44:04,630::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n336 bytes (336 B) copied, 0.000286148 s, 1.2 MB/s\n'; <rc> = 0
Thread-31::DEBUG::2016-02-25 11:44:04,925::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset --cpu-list 0-63 /usr/bin/dd if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-31::DEBUG::2016-02-25 11:44:04,950::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n339 bytes (339 B) copied, 0.0005884 s, 576 kB/s\n'; <rc> = 0
Thread-28::DEBUG::2016-02-25 11:44:05,583::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset --cpu-list 0-63 /usr/bin/dd if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines/5f7991ba-fdf8-4b40-9974-c7adcd4da879/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-28::DEBUG::2016-02-25 11:44:05,606::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n482 bytes (482 B) copied, 0.000637557 s, 756 kB/s\n'; <rc> = 0
Other commands (executed on the host with the problem) that probably give useful information:
# vdsClient -s 0 getConnectedStoragePoolsList
00000001-0001-0001-0001-00000000020e
# vdsClient -s 0 getStoragePoolInfo 00000001-0001-0001-0001-00000000020e
name = No Description
isoprefix = /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/images/11111111-1111-1111-1111-111111111111
pool_status = connected
lver = 6
spm_id = 2
master_uuid = 5f7991ba-fdf8-4b40-9974-c7adcd4da879
version = 3
domains = 5f7991ba-fdf8-4b40-9974-c7adcd4da879:Active,6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd:Active,5efea9c7-c4ec-44d4-a283-060d4c83303c:Active,e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a:Active
type = GLUSTERFS
master_ver = 1
5f7991ba-fdf8-4b40-9974-c7adcd4da879 = {'status': 'Active', 'diskfree': '6374172262400', 'isoprefix': '', 'alerts': [], 'disktotal': '6995436371968', 'version': 3}
6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd = {'status': 'Active', 'isoprefix': '', 'alerts': [], 'version': -1}
e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a = {'status': 'Active', 'diskfree': '224145833984', 'isoprefix': '', 'alerts': [], 'disktotal': '236317179904', 'version': 3}
5efea9c7-c4ec-44d4-a283-060d4c83303c = {'status': 'Active', 'diskfree': '6374172262400', 'isoprefix': '/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/images/11111111-1111-1111-1111-111111111111', 'alerts': [], 'disktotal': '6995436371968', 'version': 0}
# vdsClient -s 0 getStorageDomainInfo 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Storage domain does not exist: ('6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)
If I run this last command on one of the working hosts:
# vdsClient -s 0 getStorageDomainInfo 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
uuid = 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
version = 3
role = Regular
remotePath = srv-stor-01-ib0:/ovirtengine
type = GLUSTERFS
class = Data
pool = ['00000001-0001-0001-0001-00000000020e']
name = hosted_storage
(please note: this is the storage domain used for my hosted engine)
If I run "mount" on my AMD host (the one with the problem):
# mount
...
srv-stor-01-ib0:/virtualmachines on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/isodomain on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/ssd-pcie on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=13185796k,mode=700)
If I run "mount" on one of the Intel hosts (currently working):
# mount
...
srv-stor-01-ib0:/ovirtengine on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ovirtengine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/virtualmachines on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/isodomain on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/ssd-pcie on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=3272288k,mode=700)
The only difference in "mount" is that the hosted-engine storage domain is not mounted on the host that should not run the engine. The other domains are mounted correctly.
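If it would help, I can also try mounting the hosted-engine Gluster volume manually on the AMD host, just to check whether that volume is reachable at all from that machine (I have not tried this yet; /mnt/test below is just a hypothetical mount point, the volume path is the one shown by the working host):
# mkdir -p /mnt/test
# mount -t glusterfs srv-stor-01-ib0:/ovirtengine /mnt/test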
What could I do to solve this issue?
Best regards,
Giuseppe
--
Giuseppe Berellini
PTV SISTeMA
www.sistemaits.com
facebook.com/sistemaits
linkedin.com/SISTeMA