[ovirt-users] Hosts lost the communication with FC Storage Domains

Anantha Raghava raghav at exzatechconsulting.com
Fri Dec 9 05:07:21 UTC 2016


Hello Nir,

Thanks for analysing the log.

We have upgraded to 4.0.5, re-installed one of the hosts afresh and 
removed all the other hosts from the cluster. We created a new LUN and 
added it as the new master domain. Now, when we try to import the other 
existing domains, no storage LUN is visible. This may be because the 
volumes now carry NTFS instead of LVM2 metadata: the partition table 
has been replaced with an NTFS one and the OVF_STORE is gone.
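
To confirm that the LVM2 labels really were overwritten by NTFS, I am 
checking the first sectors of the LUNs directly. Something along these 
lines should show it (the /dev/mapper path below is only a placeholder 
for the actual LUN WWID):

# whatever signature the block layer sees on the LUN now
blkid /dev/mapper/<lun-wwid>        # an intact PV reports TYPE="LVM2_member"

# if the LUN was formatted as NTFS directly, its boot sector carries the
# OEM string "NTFS" at offset 3 of sector 0
dd if=/dev/mapper/<lun-wwid> bs=512 count=1 2>/dev/null | hexdump -C | head -n 2

# an intact LVM PV keeps its "LABELONE ... LVM2 001" label in sector 1
dd if=/dev/mapper/<lun-wwid> bs=512 skip=1 count=1 2>/dev/null | hexdump -C | head -n 4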

The same chassis also had a blade running Windows Server 2016 as its 
bare-metal OS. We are now doing a post-mortem on how these volumes were 
suddenly converted to NTFS at 4:58 PM IST on 3rd December 2016 (the 
timestamp is taken from the storage logs). Did someone attach these 
LUNs to the Windows Server and format them, ignoring the warning? In 
any case, since nothing else can be done with the old data, we tried to 
add the same existing LUN as a new domain, and that fails too: it will 
not format the LUN as required to turn it into a storage domain. Why 
would that be?
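
My assumption, not verified, is that the leftover NTFS/partition 
signatures on the LUN are what the new-domain creation trips over. 
Listing them is harmless; clearing them is destructive and only makes 
sense once we have completely given up on recovering the old data (the 
device path is again a placeholder):

# read-only: list every filesystem / partition-table signature still on the LUN
wipefs /dev/mapper/<lun-wwid>

# DESTRUCTIVE: erases all of the signatures listed above; run only after
# abandoning any recovery attempt
# wipefs -a /dev/mapper/<lun-wwid>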

Even if someone attached the existing LUNs, ignored the warning and 
formatted them without realising the danger, how did the running VMs 
continue to work until we shut them down? I can understand the refusal 
to start a VM that was shut down, or the failure to migrate a VM from 
one host to another, but what happens to the data in those running VMs?
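
My own guess is that QEMU keeps the logical volumes' device-mapper 
mappings open, and those mappings stay loaded in the kernel even after 
the LVM metadata at the start of the PV is gone, so I/O to 
already-active LVs keeps working. That does not mean the data was safe: 
any blocks the NTFS format actually wrote would still have been 
overwritten underneath the VMs. A rough way to reproduce the behaviour 
on a scratch loop device, as root, with made-up names:

truncate -s 200M /tmp/pv.img
losetup /dev/loop7 /tmp/pv.img     # assumes /dev/loop7 is free
pvcreate /dev/loop7
vgcreate testvg /dev/loop7
lvcreate -n testlv -L 100M testvg

# simulate the accidental reformat by wiping the start of the PV
dd if=/dev/zero of=/dev/loop7 bs=1M count=1

# now fails with "volume group not found" (run 'pvscan --cache' first if
# lvmetad is in use)
vgs testvg

# still works: the device-mapper mapping is already loaded in the kernel
dd if=/dev/testvg/testlv of=/dev/null bs=1M count=10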

++looping Yaniv Kaul

-- 

Thanks & Regards,


Anantha Raghava

eXza Technology Consulting & Services


Do not print this e-mail unless required. Save Paper & trees.

On Friday 09 December 2016 04:08 AM, Nir Soffer wrote:
> On Wed, Dec 7, 2016 at 2:00 PM, Anantha Raghava
> <raghav at exzatechconsulting.com> wrote:
>> Hello,
>>
>> No luck with this? Awaiting an urgent response. I have also attached the
>> vdsm and supervdsm logs from one of the hosts.
>>
>> Please provide guidance to solve this issue.
>>
>> --
>>
>> Thanks & Regards,
>>
>>
>> Anantha Raghava eXza Technology Consulting & Services Ph: +91-9538849179,
>> E-mail: raghav at exzatechconsulting.com
>>
>> Do not print this e-mail unless required. Save Paper & trees.
>>
>> On Monday 05 December 2016 11:16 AM, Anantha Raghava wrote:
>>
>> Hi,
>>
>> We have a single cluster with 6 nodes in a single DC and have added 4 FC
>> storage domains. Everything was working fine all along: migrations,
>> creation of new VMs and so on. Now, all of a sudden, we see the error
>> message "vdsm is unable to communicate with Master domain ......." and all
>> storage domains, including the DC, are down. All hosts are up and all VMs
>> are running without any issues, but migrations have stopped, we cannot
>> create new VMs and we cannot start a VM that has been shut down.
>>
>> Can someone help us troubleshoot the issue?
> According to your log, vdsm cannot access the master domain:
>
> Thread-35::ERROR::2016-12-07
> 17:18:10,354::sdc::146::Storage.StorageDomainCache::(_findDomain)
> domain 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3 not found
> Traceback (most recent call last):
>    File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
>      dom = findMethod(sdUUID)
>    File "/usr/share/vdsm/storage/blockSD.py", line 1441, in findDomain
>      return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
>    File "/usr/share/vdsm/storage/blockSD.py", line 1404, in findDomainPath
>      raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> (u'6d25efc2-b056-4c43-9a82-82f0c8a5ebc3',)
> Thread-35::ERROR::2016-12-07
> 17:18:10,354::monitor::425::Storage.Monitor::(_checkDomainStatus)
> Error checking domain 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3
> Traceback (most recent call last):
>    File "/usr/share/vdsm/storage/monitor.py", line 406, in _checkDomainStatus
>      self.domain.selftest()
>    File "/usr/share/vdsm/storage/sdc.py", line 50, in __getattr__
>      return getattr(self.getRealDomain(), attrName)
>    File "/usr/share/vdsm/storage/sdc.py", line 53, in getRealDomain
>      return self._cache._realProduce(self._sdUUID)
>    File "/usr/share/vdsm/storage/sdc.py", line 125, in _realProduce
>      domain = self._findDomain(sdUUID)
>    File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
>      dom = findMethod(sdUUID)
>    File "/usr/share/vdsm/storage/blockSD.py", line 1441, in findDomain
>      return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
>    File "/usr/share/vdsm/storage/blockSD.py", line 1404, in findDomainPath
>      raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> (u'6d25efc2-b056-4c43-9a82-82f0c8a5ebc3',)
>
> Thread-35::DEBUG::2016-12-07
> 17:18:10,279::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
> --cpu-list 0-31 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices
> { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
> write_cache_state=0 disable_after_error_count=3 filter = [
> '\''a|/dev/mapper/36005076300808e51e80000000000002c|/dev/mapper/36005076300808e51e80000000000002d|/dev/mapper/36005076300808e51e80000000000002e|/dev/mapper/36005076300808e51e80000000000002f|/dev/mapper/36005076300808e51e800000000000030|/dev/mapper/36005076300808e51e800000000000031|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1
> prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {
> retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix
> --separator '|' --ignoreskippedcluster -o
> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
> 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3 (cwd None)
> Thread-35::DEBUG::2016-12-07
> 17:18:10,351::lvm::288::Storage.Misc.excCmd::(cmd) FAILED: <err> = '
> WARNING: lvmetad is running but disabled. Restart lvmetad before
> enabling it!\n  Volume group "6d25efc2-b056-4c43-9a82-82f0c8a5ebc3"
> not found\n  Cannot process volume group
> 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3\n'; <rc> = 5
> Thread-35::WARNING::2016-12-07
> 17:18:10,354::lvm::376::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 []
> ['  WARNING: lvmetad is running but disabled. Restart lvmetad before
> enabling it!', '  Volume group "6d25efc2-b056-4c43-9a82-82f0c8a5ebc3"
> not found', '  Cannot process volume group
> 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3']
>
> But the monitoring system is accessing this domain just fine:
>
> Thread-12::DEBUG::2016-12-07
> 17:18:13,048::check::296::storage.check::(_start_process) START check
> '/dev/6d25efc2-b056-4c43-9a82-82f0c8a5ebc3/metadata'
> cmd=['/usr/bin/taskset', '--cpu-list',
>   '0-31', '/usr/bin/dd',
> 'if=/dev/6d25efc2-b056-4c43-9a82-82f0c8a5ebc3/metadata',
> 'of=/dev/null', 'bs=4096', 'count=1', 'iflag=direct'] delay=0.00
> Thread-12::DEBUG::2016-12-07
> 17:18:13,069::check::327::storage.check::(_check_completed) FINISH
> check '/dev/6d25efc2-b056-4c43-9a82-82f0c8a5ebc3/metadata' rc=0
> err=bytearray(b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB)
> copied, 0.000367523 s, 11.1 MB/s\n') elapsed=0.02
>
> I suggest filing a bug about this.
>
> I would try restarting vdsm; maybe there is some issue with the vdsm lvm cache.
>
> It can also be useful to see the output of:
>
> pvscan --cache
> vgs -vvvv 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3
> vgs -o name,pv_name -vvvv 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3
>
>> By the way, we are running oVirt Version 4.0.1.
> Running 4.0.1 is not a good idea; you should upgrade to the latest version.
>
> Cheers,
> Nir
>
>> --
>>
>> Thanks & Regards,
>>
>>
>> Anantha Raghava eXza Technology Consulting & Services
>>
>> Do not print this e-mail unless required. Save Paper & trees.
>>
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
