[ovirt-users] Storage domain issue

Nathanaël Blanchet blanchet at abes.fr
Mon Mar 23 16:13:40 UTC 2015


Thank you for reporting this issue, because I am hitting exactly the same 
thing: FC storage domain, and from time to time many of my hosts (15 of 
them) become unavailable without any apparent action on them.
The error message is: storage domain is unavailable. This is a disaster 
when power management is enabled, because the hosts all reboot at the 
same time and all the VMs go down without migrating.
It has happened to me twice; the second time was less painful because I 
had deactivated power management.
It looks like a serious issue, because the hosts stay reachable and the 
LUN still looks fine when running an lvs command.
The workaround in this case is to restart the engine (restarting vdsm 
does not help); after that, all the hosts come back up.
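
For reference, the restart itself is nothing special (just a sketch; the 
service names assume the default el6 packaging, adjust if yours differ):

    # on the engine VM
    service ovirt-engine restart

    # restarting vdsm on the hosts did not help in my case, but for reference:
    # service vdsmd restart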

  * el6 engine on a separate KVM VM
  * both el7 and el6 hosts are involved
  * oVirt 3.5.1 and vdsm 4.16.10-8
  * 2 FC datacenters on two remote sites, managed by the same engine; both
    are impacted


On 23/03/2015 16:54, Jonas Israelsson wrote:
> Greetings.
>
> Running oVirt 3.5 with a mix of NFS and FC Storage.
>
> Engine running on a separate KVM VM and the node installed from a pre-3.5 
> ovirt-node image, "ovirt-node-iso-3.5.0.ovirt35.20140912.el6 (Edited)".
>
> I had some problems with my FC storage where the LUNs became 
> unavailable to my oVirt host for a while. Everything is now up and running 
> and those LUNs are accessible by the host again. The NFS domains come 
> back online, but the FC domains do not.
>
> Thread-22::DEBUG::2015-03-23 
> 14:53:02,706::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n 
> /sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] 
> ignore_suspended_devices=1 write_cache_state=0 
> disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 
> '\''r|.*|'\'' ] }  global {  locking_type=1 prioritise_write_locks=1  
> wait_for_locks=1  use_lvmetad=0 } backup {  retain_min = 50  
> retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' 
> --ignoreskippedcluster -o 
> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 
> 29f9b165-3674-4384-a1d4-7aa87d923d56 (cwd None)
>
> Thread-24::DEBUG::2015-03-23 
> 14:53:02,981::lvm::290::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  
> Volume group "29f9b165-3674-4384-a1d4-7aa87d923d56" not found\n  
> Skipping volume group 29f9b165-3674-4384-a1d4-7aa87d923d56\n'; <rc> = 5
>
> Thread-24::WARNING::2015-03-23 
> 14:53:02,986::lvm::372::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] 
> ['  Volume group "29f9b165-3674-4384-a1d4-7aa87d923d56" not found', '  
> Skipping volume group 29f9b165-3674-4384-a1d4-7aa87d923d56']
>
>
> Running the command above manually does indeed give the same output:
>
> # /sbin/lvm vgs --config ' devices { preferred_names = 
> ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 
> disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 
> '\''r|.*|'\'' ] }  global {  locking_type=1 prioritise_write_locks=1  
> wait_for_locks=1  use_lvmetad=0 } backup {  retain_min = 50  
> retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' 
> --ignoreskippedcluster -o 
> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 
> 29f9b165-3674-4384-a1d4-7aa87d923d56
>
>   Volume group "29f9b165-3674-4384-a1d4-7aa87d923d56" not found
>   Skipping volume group 29f9b165-3674-4384-a1d4-7aa87d923d56
>
> What puzzles me is that those volume groups do exist.
>
> lvm vgs
>   VG                                   #PV #LV #SN Attr   VSize VFree
>   22cf06d1-faca-4e17-ac78-d38b7fc300b1   1  13   0 wz--n- 999.62g 986.50g
>   29f9b165-3674-4384-a1d4-7aa87d923d56   1   8   0 wz--n-  99.62g 95.50g
>   HostVG                                 1   4   0 wz--n-  13.77g 52.00m
>
>
>   --- Volume group ---
>   VG Name               29f9b165-3674-4384-a1d4-7aa87d923d56
>   System ID
>   Format                lvm2
>   Metadata Areas        2
>   Metadata Sequence No  20
>   VG Access             read/write
>   VG Status             resizable
>   MAX LV                0
>   Cur LV                8
>   Open LV               0
>   Max PV                0
>   Cur PV                1
>   Act PV                1
>   VG Size               99.62 GiB
>   PE Size               128.00 MiB
>   Total PE              797
>   Alloc PE / Size       33 / 4.12 GiB
>   Free  PE / Size       764 / 95.50 GiB
>   VG UUID               aAoOcw-d9YB-y9gP-Tp4M-S0UE-Aqpx-y6Z2Uk
>
> lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] 
> ignore_suspended_devices=1 write_cache_state=0 
> disable_after_error_count=3 obtain_device_list_from_udev=0 } global {  
> locking_type=1  prioritise_write_locks=1 wait_for_locks=1 
> use_lvmetad=0 }  backup {  retain_min = 50 retain_days = 0 } ' 
> --noheadings --units b --nosuffix --separator '|' 
> --ignoreskippedcluster -o 
> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 
> 29f9b165-3674-4384-a1d4-7aa87d923d56
>
>
> aAoOcw-d9YB-y9gP-Tp4M-S0UE-Aqpx-y6Z2Uk|29f9b165-3674-4384-a1d4-7aa87d923d56|wz--n-|106971529216|102542344192|134217728|797|764|MDT_LEASETIMESEC=60,MDT_CLASS=Data,MDT_VERSION=3,MDT_SDUUID=29f9b165-3674-4384-a1d4-7aa87d923d56,MDT_PV0=pv:36001405c94d80be2ed0482c91a1841b8&44&uuid:muHcYl-sobG-3LyY-jjfg-3fGf-1cHO-uDk7da&44&pestart:0&44&pecount:797&44&mapoffset:0,MDT_LEASERETRIES=3,MDT_VGUUID=aAoOcw-d9YB-y9gP-Tp4M-S0UE-Aqpx-y6Z2Uk,MDT_IOOPTIMEOUTSEC=10,MDT_LOCKRENEWALINTERVALSEC=5,MDT_PHYBLKSIZE=512,MDT_LOGBLKSIZE=512,MDT_TYPE=FCP,MDT_LOCKPOLICY=,MDT_DESCRIPTION=Master,RHAT_storage_domain,MDT_POOL_SPM_ID=-1,MDT_POOL_DESCRIPTION=Elementary,MDT_POOL_SPM_LVER=-1,MDT_POOL_UUID=8c3c5df9-e8ff-4313-99c9-385b6c7d896b,MDT_MASTER_VERSION=10,MDT_POOL_DOMAINS=22cf06d1-faca-4e17-ac78-d38b7fc300b1:Active&44&c434ab5a-9d21-42eb-ba1b-dbd716ba3ed1:Active&44&96e62d18-652d-401a-b4b5-b54ecefa331c:Active&44&29f9b165-3674-4384-a1d4-7aa87d923d56:Active&44&1a0d3e5a-d2ad-4829-8ebd-ad3ff5463062:Active,MDT__SH 
>
> A_CKSUM=7ea9af890755d96563cb7a736f8e3f46ea986f67,MDT_ROLE=Regular|134217728|67103744|8|1|/dev/sda 
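>
> As far as I can tell, the only difference between the failing command vdsm 
> runs and the working one just above is the filter = [ 'r|.*|' ] clause, 
> which rejects every device. A rough way to check whether the PV is still 
> visible through the multipath devices (just a sketch, the accept pattern 
> is a guess for my setup) would be:
>
> /sbin/lvm vgs --config 'devices { filter = [ "a|^/dev/mapper/.*|", "r|.*|" ] } global { use_lvmetad=0 }' 29f9b165-3674-4384-a1d4-7aa87d923d56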
>
>
>
> [root@patty vdsm]# vdsClient -s 0 getStorageDomainsList   (returns only 
> the NFS domains)
> c434ab5a-9d21-42eb-ba1b-dbd716ba3ed1
> 1a0d3e5a-d2ad-4829-8ebd-ad3ff5463062
> a8fd9df0-48f2-40a2-88d4-7bf47fef9b07
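>
> If it helps, vdsm's own view can also be queried directly with vdsClient 
> (a sketch; I have not double-checked the exact verb names on 3.5):
>
> vdsClient -s 0 getDeviceList
> vdsClient -s 0 getVGInfo aAoOcw-d9YB-y9gP-Tp4M-S0UE-Aqpx-y6Z2Uk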
>
>
> engine=# select id,storage,storage_name,storage_domain_type from 
> storage_domain_static ;
>                   id                  |                storage                 |      storage_name      | storage_domain_type
> --------------------------------------+----------------------------------------+------------------------+---------------------
>  072fbaa1-08f3-4a40-9f34-a5ca22dd1d74 | ceab03af-7220-4d42-8f5c-9b557f5d29af   | ovirt-image-repository |                   4
>  1a0d3e5a-d2ad-4829-8ebd-ad3ff5463062 | 6564a0b2-2f92-48de-b986-e92de7e28885   | ISO                    |                   2
>  c434ab5a-9d21-42eb-ba1b-dbd716ba3ed1 | bb54b2b8-00a2-4b84-a886-d76dd70c3cb0   | Export                 |                   3
>  22cf06d1-faca-4e17-ac78-d38b7fc300b1 | e43eRZ-HACv-YscJ-KNZh-HVwe-tAd2-0oGNHh | Hinken                 |                   1   <---- 'GONE'
>  29f9b165-3674-4384-a1d4-7aa87d923d56 | aAoOcw-d9YB-y9gP-Tp4M-S0UE-Aqpx-y6Z2Uk | Master                 |                   1   <---- 'GONE'
>  a8fd9df0-48f2-40a2-88d4-7bf47fef9b07 | 0299ca61-d68e-4282-b6c3-f6e14aef2688   | NFS-DATA               |                   0
>
> When manually trying to activate one of the above domains, the 
> following is written to engine.log:
>
> 2015-03-23 16:37:27,193 INFO 
> [org.ovirt.engine.core.bll.storage.SyncLunsInfoForBlockStorageDomainCommand] 
> (org.ovirt.thread.pool-8-thread-42) [5f2bcbf9] Running command: 
> SyncLunsInfoForBlockStorageDomainCommand internal: true. Entities 
> affected :  ID: 29f9b165-3674-4384-a1d4-7aa87d923d56 Type: Storage
> 2015-03-23 16:37:27,202 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] 
> (org.ovirt.thread.pool-8-thread-42) [5f2bcbf9] START, 
> GetVGInfoVDSCommand(HostName = patty.elemementary.se, HostId = 
> 38792a69-76f3-46d8-8620-9d4b9a5ec21f, 
> VGID=aAoOcw-d9YB-y9gP-Tp4M-S0UE-Aqpx-y6Z2Uk), log id: 6e6f6792
> 2015-03-23 16:37:27,404 ERROR 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] 
> (org.ovirt.thread.pool-8-thread-28) [3258de6d] Failed in GetVGInfoVDS 
> method
> 2015-03-23 16:37:27,404 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] 
> (org.ovirt.thread.pool-8-thread-28) [3258de6d] Command 
> org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand return 
> value
>
> OneVGReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=506, 
> mMessage=Volume Group does not exist: (u'vg_uuid: 
> aAoOcw-d9YB-y9gP-Tp4M-S0UE-Aqpx-y6Z2Uk',)]]
>
> 2015-03-23 16:37:27,406 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] 
> (org.ovirt.thread.pool-8-thread-28) [3258de6d] HostName = 
> patty.elemementary.se
> 2015-03-23 16:37:27,407 ERROR 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] 
> (org.ovirt.thread.pool-8-thread-28) [3258de6d] Command 
> GetVGInfoVDSCommand(HostName = patty.elemementary.se, HostId = 
> 38792a69-76f3-46d8-8620-9d4b9a5ec21f, 
> VGID=aAoOcw-d9YB-y9gP-Tp4M-S0UE-Aqpx-y6Z2Uk) execution failed. 
> Exception: VDSErrorException: VDSGenericException: VDSErrorException: 
> Failed to GetVGInfoVDS, error = Volume Group does not exist: 
> (u'vg_uuid: aAoOcw-d9YB-y9gP-Tp4M-S0UE-Aqpx-y6Z2Uk',), code = 506
> 2015-03-23 16:37:27,409 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] 
> (org.ovirt.thread.pool-8-thread-28) [3258de6d] FINISH, 
> GetVGInfoVDSCommand, log id: 2edb7c0d
> 2015-03-23 16:37:27,410 ERROR 
> [org.ovirt.engine.core.bll.storage.SyncLunsInfoForBlockStorageDomainCommand] 
> (org.ovirt.thread.pool-8-thread-28) [3258de6d] Command 
> org.ovirt.engine.core.bll.storage.SyncLunsInfoForBlockStorageDomainCommand 
> throw Vdc Bll exception. With error message VdcBLLException: 
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: 
> VDSGenericException: VDSErrorException: Failed to GetVGInfoVDS, error 
> = Volume Group does not exist: (u'vg_uuid: 
> aAoOcw-d9YB-y9gP-Tp4M-S0UE-Aqpx-y6Z2Uk',), code = 506 (Failed with 
> error VolumeGroupDoesNotExist and code 506)
> 2015-03-23 16:37:27,413 INFO 
> [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand] 
> (org.ovirt.thread.pool-8-thread-28) [3258de6d] START, 
> ActivateStorageDomainVDSCommand( storagePoolId = 
> 8c3c5df9-e8ff-4313-99c9-385b6c7d896b, ignoreFailoverLimit = false, 
> storageDomainId = 29f9b165-3674-4384-a1d4-7aa87d923d56), log id: 795253ee
> 2015-03-23 16:37:27,482 ERROR 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] 
> (org.ovirt.thread.pool-8-thread-42) [5f2bcbf9] Failed in GetVGInfoVDS 
> method
> 2015-03-23 16:37:27,482 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] 
> (org.ovirt.thread.pool-8-thread-42) [5f2bcbf9] Command 
> org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand return 
> value
> OneVGReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=506, 
> mMessage=Volume Group does not exist: (u'vg_uuid: 
> aAoOcw-d9YB-y9gP-Tp4M-S0UE-Aqpx-y6Z2Uk',)]]
>
>
> Could someone (pretty please with sugar on top) point me in the right 
> direction?
>
> Brgds Jonas
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
