[ovirt-users] Re: One host failed to attach one of Storage Domains after reboot of all hosts

Giuseppe Berellini Giuseppe.Berellini at ptvgroup.com
Thu Feb 25 08:02:56 EST 2016


Hi,

about one hour ago my AMD host came back up, after being down for more than 10 days.
Apart from checking the logs (which I don't think contributed to solving the problem), the only change I made was enabling the NFS share on my ISO domain.
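For reference, on the Gluster side enabling and verifying the NFS export of the ISO domain volume would be something like the following (just a sketch; the volume name is the one from my setup, and I am not certain this step is really what made the difference):

# gluster volume set isodomain nfs.disable off
# showmount -e srv-stor-01-ib0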

I'm still not able to understand how enabling NFS could have helped.
I would be really happy to better understand what happened! :-)
If anyone has ideas or explanations to share, you are most welcome! :-)

Best regards,
        Giuseppe

--
Giuseppe Berellini
PTV SISTeMA
www.sistemaits.com<http://www.sistemaits.com/>
facebook.com/sistemaits<https://www.facebook.com/sistemaits>
linkedin.com/SISTeMA<https://www.linkedin.com/company/sistema-soluzioni-per-l-ingegneria-dei-sistemi-di-trasporto-e-l-infomobilit-s-r-l->

From: users-bounces at ovirt.org [mailto:users-bounces at ovirt.org] On behalf of Giuseppe Berellini
Sent: Thursday, 25 February 2016 12:10
To: users at ovirt.org
Subject: [ovirt-users] One host failed to attach one of Storage Domains after reboot of all hosts

Hi,

At the beginning of February I successfully installed oVirt 3.6.2 (with hosted engine) on 3 hosts, which use a single storage server with GlusterFS.
Two hosts (with Intel CPUs) are HA hosts and run the hosted engine; the third host (AMD CPU) was added from the oVirt web administration panel without hosted-engine deployment (I don't want the engine running on this host).

About 10 days ago I rebooted my oVirt environment (i.e. going to global maintenance, shutting down the engine, turning off all the hosts, starting them again, then setting maintenance mode to "none").
After the reboot, everything was fine with the Intel hosts and the hosted engine, but the AMD host (the one without HA) was not operational.
I tried to activate it, but it failed with the following error:
        "Host failed to attach one of Storage Domains attached to it."

If I log into my AMD host and check the logs, I see that the storage domain which is not mounted is the hosted engine's (which might even be expected, since this host won't run the hosted engine).

From /var/log/vdsm/vdsm.log:

Thread-29::DEBUG::2016-02-25 11:44:01,157::monitor::322::Storage.Monitor::(_produceDomain) Producing domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::ERROR::2016-02-25 11:44:01,158::sdc::139::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::ERROR::2016-02-25 11:44:01,158::sdc::156::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::DEBUG::2016-02-25 11:44:01,159::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-29::DEBUG::2016-02-25 11:44:01,159::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-63 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd (cwd None)
Thread-29::DEBUG::2016-02-25 11:44:01,223::lvm::290::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n  Volume group "6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd" not found\n  Cannot process volume group 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd\n'; <rc> = 5
Thread-29::WARNING::2016-02-25 11:44:01,225::lvm::375::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!', '  Volume group "6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd" not found', '  Cannot process volume group 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd']
Thread-29::DEBUG::2016-02-25 11:44:01,225::lvm::415::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-29::ERROR::2016-02-25 11:44:01,245::sdc::145::Storage.StorageDomainCache::(_findDomain) domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 173, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)
Thread-29::ERROR::2016-02-25 11:44:01,246::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 264, in _monitorDomain
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 767, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/share/vdsm/storage/monitor.py", line 323, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 100, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 124, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 173, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,292::task::595::Storage.TaskManager.Task::(_updateState) Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::moving from state init -> state preparing
jsonrpc.Executor/0::INFO::2016-02-25 11:44:03,293::logUtils::48::dispatcher::(wrapper) Run and protect: repoStats(options=None)
jsonrpc.Executor/0::INFO::2016-02-25 11:44:03,293::logUtils::51::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'5f7991ba-fdf8-4b40-9974-c7adcd4da879': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00056349', 'lastCheck': '7.7', 'valid': True}, u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd': {'code': 358, 'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '2.0', 'valid': False}, u'5efea9c7-c4ec-44d4-a283-060d4c83303c': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000561865', 'lastCheck': '8.4', 'valid': True}, u'e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000227759', 'lastCheck': '8.7', 'valid': True}}
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,294::task::1191::Storage.TaskManager.Task::(prepare) Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::finished: {u'5f7991ba-fdf8-4b40-9974-c7adcd4da879': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00056349', 'lastCheck': '7.7', 'valid': True}, u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd': {'code': 358, 'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '2.0', 'valid': False}, u'5efea9c7-c4ec-44d4-a283-060d4c83303c': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000561865', 'lastCheck': '8.4', 'valid': True}, u'e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000227759', 'lastCheck': '8.7', 'valid': True}}
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,294::task::595::Storage.TaskManager.Task::(_updateState) Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::moving from state preparing -> state finished
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,294::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,295::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
jsonrpc.Executor/0::DEBUG::2016-02-25 11:44:03,295::task::993::Storage.TaskManager.Task::(_decref) Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::ref 0 aborting False
Thread-30::DEBUG::2016-02-25 11:44:04,603::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset --cpu-list 0-63 /usr/bin/dd if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie/e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-30::DEBUG::2016-02-25 11:44:04,630::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n336 bytes (336 B) copied, 0.000286148 s, 1.2 MB/s\n'; <rc> = 0
Thread-31::DEBUG::2016-02-25 11:44:04,925::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset --cpu-list 0-63 /usr/bin/dd if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-31::DEBUG::2016-02-25 11:44:04,950::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n339 bytes (339 B) copied, 0.0005884 s, 576 kB/s\n'; <rc> = 0
Thread-28::DEBUG::2016-02-25 11:44:05,583::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset --cpu-list 0-63 /usr/bin/dd if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines/5f7991ba-fdf8-4b40-9974-c7adcd4da879/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-28::DEBUG::2016-02-25 11:44:05,606::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n482 bytes (482 B) copied, 0.000637557 s, 756 kB/s\n'; <rc> = 0



Other commands (executed on the host with the problem) which may provide useful information:
# vdsClient -s 0 getConnectedStoragePoolsList
00000001-0001-0001-0001-00000000020e

# vdsClient -s 0 getStoragePoolInfo 00000001-0001-0001-0001-00000000020e
        name = No Description
        isoprefix = /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/images/11111111-1111-1111-1111-111111111111
        pool_status = connected
        lver = 6
        spm_id = 2
        master_uuid = 5f7991ba-fdf8-4b40-9974-c7adcd4da879
        version = 3
        domains = 5f7991ba-fdf8-4b40-9974-c7adcd4da879:Active,6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd:Active,5efea9c7-c4ec-44d4-a283-060d4c83303c:Active,e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a:Active
        type = GLUSTERFS
        master_ver = 1
        5f7991ba-fdf8-4b40-9974-c7adcd4da879 = {'status': 'Active', 'diskfree': '6374172262400', 'isoprefix': '', 'alerts': [], 'disktotal': '6995436371968', 'version': 3}
        6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd = {'status': 'Active', 'isoprefix': '', 'alerts': [], 'version': -1}
        e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a = {'status': 'Active', 'diskfree': '224145833984', 'isoprefix': '', 'alerts': [], 'disktotal': '236317179904', 'version': 3}
        5efea9c7-c4ec-44d4-a283-060d4c83303c = {'status': 'Active', 'diskfree': '6374172262400', 'isoprefix': '/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/images/11111111-1111-1111-1111-111111111111', 'alerts': [], 'disktotal': '6995436371968', 'version': 0}

# vdsClient -s 0 getStorageDomainInfo 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Storage domain does not exist: ('6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)

If I run this last command on one of the working hosts:
# vdsClient -s 0 getStorageDomainInfo 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
        uuid = 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
        version = 3
        role = Regular
        remotePath = srv-stor-01-ib0:/ovirtengine
        type = GLUSTERFS
        class = Data
        pool = ['00000001-0001-0001-0001-00000000020e']
        name = hosted_storage

(please note: this is the storage domain used for my hosted engine)



If I run "mount" on my AMD host (the one with the problem):
# mount
...
srv-stor-01-ib0:/virtualmachines on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/isodomain on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/ssd-pcie on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=13185796k,mode=700)

If I run "mount" on one of the Intel hosts (currently working):
# mount
...
srv-stor-01-ib0:/ovirtengine on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ovirtengine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/virtualmachines on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/isodomain on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/ssd-pcie on /rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=3272288k,mode=700)

The only difference in the "mount" output is that the hosted-engine storage domain is not mounted on the host which should not run the engine. The other domains are mounted correctly.
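To rule out a basic connectivity or permission problem, I suppose I could also try mounting the engine volume by hand on the AMD host, something like the following (the mount point is just an example):

# mkdir -p /mnt/engine-test
# mount -t glusterfs srv-stor-01-ib0:/ovirtengine /mnt/engine-test
# ls /mnt/engine-test
# umount /mnt/engine-test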

What could I do to solve this issue?

Best regards,
        Giuseppe


--
Giuseppe Berellini
PTV SISTeMA
www.sistemaits.com<http://www.sistemaits.com/>
facebook.com/sistemaits<https://www.facebook.com/sistemaits>
linkedin.com/SISTeMA<https://www.linkedin.com/company/sistema-soluzioni-per-l-ingegneria-dei-sistemi-di-trasporto-e-l-infomobilit-s-r-l->
