Hello Nir,
LVM backup & restore... risky, but we have no other options, so let's try
this. The VMs contain some sensitive data that cannot exist outside these
VMs, and even the VM snapshots are on the same storage. Unfortunately, we
did not take a volume snapshot at the storage level :-(
We will make an attempt & post the results.
--
Thanks & Regards,
Anantha Raghava
eXza Technology Consulting & Services
Do not print this e-mail unless required. Save Paper & trees.
On Saturday 10 December 2016 12:40 AM, Nir Soffer wrote:
On Fri, Dec 9, 2016 at 7:07 AM, Anantha Raghava
<raghav(a)exzatechconsulting.com> wrote:
> Hello Nir,
>
> Thanks for analysing the log.
>
> We upgraded to 4.0.5, re-installed one of the hosts afresh, and removed all
> other hosts from the cluster. We created a new LUN and added it as the new
> master domain. Now, when we try to import the other existing domains, no
> storage LUN is visible. Maybe it is because the volumes now carry an NTFS
> format instead of LVM2: the partition table was replaced with an NTFS one
> and the OVF_STORE is deleted.
:-)
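One way to check what is actually on a LUN now is to look at the on-disk
signatures. A minimal sketch (not tested against real hardware; the image
file below is only a stand-in for a real /dev/mapper path, which you would
read directly instead of creating):

```shell
# An NTFS filesystem writes the OEM id "NTFS    " at byte offset 3 of
# sector 0, while an LVM2 pv usually stores the label "LABELONE" at the
# start of sector 1 (offset 512). Reading a few bytes tells them apart.
dev=/tmp/fake-lun.img   # stand-in for e.g. /dev/mapper/36005076300808e51e80000000000002c

# Build a fake image carrying an NTFS signature, for illustration only:
dd if=/dev/zero of="$dev" bs=512 count=4 2>/dev/null
printf 'NTFS    ' | dd of="$dev" bs=1 seek=3 conv=notrunc 2>/dev/null

ntfs_sig=$(dd if="$dev" bs=1 skip=3 count=8 2>/dev/null)
lvm_sig=$(dd if="$dev" bs=1 skip=512 count=8 2>/dev/null)

if [ "$ntfs_sig" = "NTFS    " ]; then
    echo "NTFS signature present"
elif [ "$lvm_sig" = "LABELONE" ]; then
    echo "LVM2 pv label present"
fi
```

On a real host, `blkid` on the multipath devices should report the same
thing with less typing.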
> The same chassis had a blade running Windows Server 2016 (bare-metal OS) as
> well. We are now doing a postmortem on how these volumes were converted to
> NTFS all of a sudden at 4:58 PM IST on 3rd December 2016 (this timing is
> extracted from the storage logs). Did someone add these LUNs to the Windows
> Server and format them, ignoring the warning? However, now that nothing
> else can be done, we tried to add the same existing LUN as a new domain,
> and it fails to add: it will not format the LUN as needed to add it as a
> storage domain. We wonder why this is.
>
> Even if someone added the existing LUNs, ignoring the warning and without
> realising the danger, how did the running VMs continue to work until we
> shut them down? I can understand the refusal to start a shut-down VM, or
> the failure to migrate from one host to another, but what happens to the
> data in those running VMs?
I guess that the lv data was not affected; only the lvm metadata was destroyed.
When an lv is active, you have access to certain segments on the pvs
backing the lv, so deleting the lvm metadata should not affect running vms.
The data of the vms is probably still on storage and can probably be restored
using the lvm backup information. You should be able to create a device
mapper device mapped to the same segments on storage as the original
lvs and restore the data, but I have never tried this.
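For example, the table for a single linear segment could be computed from
the numbers in the lvm backup file like this (a sketch only; the extent
numbers below are made up, and I have not tried this on a real setup):

```shell
# From /etc/lvm/backup/<vg> you can read, for each lv segment:
# start_extent, extent_count, the backing pv, and the pv's pe_start.
# A device-mapper "linear" target maps lv sectors straight to pv sectors:
#   pv_sector = pe_start + start_extent * extent_size

extent_size=8192    # sectors (4 MiB), "extent_size" in the backup file
pe_start=2048       # sectors, "pe_start" in the pv section
pv=/dev/mapper/36005076300808e51e80000000000002c   # hypothetical pv

start_extent=100    # hypothetical single linear segment
extent_count=50

lv_sectors=$((extent_count * extent_size))
pv_offset=$((pe_start + start_extent * extent_size))

# dmsetup linear table format: <start> <length> linear <device> <offset>
table="0 $lv_sectors linear $pv $pv_offset"
echo "$table"
# prints: 0 409600 linear /dev/mapper/36005076300808e51e80000000000002c 821248
```

The resulting table could then be activated (as root, and only after
verifying the numbers) with `dmsetup create restored-lv --table "$table"`,
giving read access to the old lv contents without touching lvm metadata.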
> ++looping Yaniv Kaul
>
> --
>
> Thanks & Regards,
>
>
> Anantha Raghava eXza Technology Consulting & Services
>
> Do not print this e-mail unless required. Save Paper & trees.
>
> On Friday 09 December 2016 04:08 AM, Nir Soffer wrote:
>
> On Wed, Dec 7, 2016 at 2:00 PM, Anantha Raghava
> <raghav(a)exzatechconsulting.com> wrote:
>
> Hello,
>
> No luck with this? Awaiting an urgent response. Also attached are the vdsm
> and supervdsm logs from one of the hosts.
>
> Please provide guidance to solve this issue.
>
> --
>
> Thanks & Regards,
>
>
> Anantha Raghava eXza Technology Consulting & Services Ph: +91-9538849179,
> E-mail: raghav(a)exzatechconsulting.com
>
> Do not print this e-mail unless required. Save Paper & trees.
>
> On Monday 05 December 2016 11:16 AM, Anantha Raghava wrote:
>
> Hi,
>
> We have a single cluster with 6 nodes in a single DC, with 4 FC storage
> domains added. Everything was working fine: migrations, creation of new
> VMs, and so on. Now, all of a sudden, we see the error message "vdsm is
> unable to communicate with Master domain ......." and all storage domains,
> including the DC, are down. All hosts are up and all VMs are running
> without any issues, but migrations have stopped, we cannot create new VMs,
> and we cannot start a shut-down VM.
>
> Can someone help us troubleshoot the issue?
>
> According to your log, vdsm cannot access the master domain:
>
> Thread-35::ERROR::2016-12-07
> 17:18:10,354::sdc::146::Storage.StorageDomainCache::(_findDomain)
> domain 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3 not found
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
> dom = findMethod(sdUUID)
> File "/usr/share/vdsm/storage/blockSD.py", line 1441, in findDomain
> return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
> File "/usr/share/vdsm/storage/blockSD.py", line 1404, in findDomainPath
> raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> (u'6d25efc2-b056-4c43-9a82-82f0c8a5ebc3',)
> Thread-35::ERROR::2016-12-07
> 17:18:10,354::monitor::425::Storage.Monitor::(_checkDomainStatus)
> Error checking domain 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/monitor.py", line 406, in _checkDomainStatus
> self.domain.selftest()
> File "/usr/share/vdsm/storage/sdc.py", line 50, in __getattr__
> return getattr(self.getRealDomain(), attrName)
> File "/usr/share/vdsm/storage/sdc.py", line 53, in getRealDomain
> return self._cache._realProduce(self._sdUUID)
> File "/usr/share/vdsm/storage/sdc.py", line 125, in _realProduce
> domain = self._findDomain(sdUUID)
> File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
> dom = findMethod(sdUUID)
> File "/usr/share/vdsm/storage/blockSD.py", line 1441, in findDomain
> return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
> File "/usr/share/vdsm/storage/blockSD.py", line 1404, in findDomainPath
> raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> (u'6d25efc2-b056-4c43-9a82-82f0c8a5ebc3',)
>
> Thread-35::DEBUG::2016-12-07
> 17:18:10,279::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
> --cpu-list 0-31 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices
> { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
> write_cache_state=0 disable_after_error_count=3 filter = [
> '\''a|/dev/mapper/36005076300808e51e80000000000002c|/dev/mapper/36005076300808e51e80000000000002d|/dev/mapper/36005076300808e51e80000000000002e|/dev/mapper/36005076300808e51e80000000000002f|/dev/mapper/36005076300808e51e800000000000030|/dev/mapper/36005076300808e51e800000000000031|'\'', '\''r|.*|'\'' ] }
> global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1
> use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } '
> --noheadings --units b --nosuffix --separator '|'
> --ignoreskippedcluster -o
> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
> 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3 (cwd None)
> Thread-35::DEBUG::2016-12-07
> 17:18:10,351::lvm::288::Storage.Misc.excCmd::(cmd) FAILED: <err> = '
> WARNING: lvmetad is running but disabled. Restart lvmetad before
> enabling it!\n Volume group "6d25efc2-b056-4c43-9a82-82f0c8a5ebc3" not
> found\n Cannot process volume group
> 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3\n'; <rc> = 5
> Thread-35::WARNING::2016-12-07
> 17:18:10,354::lvm::376::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 []
> [' WARNING: lvmetad is running but disabled. Restart lvmetad before
> enabling it!', ' Volume group "6d25efc2-b056-4c43-9a82-82f0c8a5ebc3" not
> found', ' Cannot process volume group
> 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3']
>
> But the monitoring system is accessing this domain just fine:
>
> Thread-12::DEBUG::2016-12-07
> 17:18:13,048::check::296::storage.check::(_start_process) START check
> '/dev/6d25efc2-b056-4c43-9a82-82f0c8a5ebc3/metadata'
> cmd=['/usr/bin/taskset', '--cpu-list',
> '0-31', '/usr/bin/dd',
> 'if=/dev/6d25efc2-b056-4c43-9a82-82f0c8a5ebc3/metadata',
> 'of=/dev/null', 'bs=4096', 'count=1', 'iflag=direct'] delay=0.00
> Thread-12::DEBUG::2016-12-07
> 17:18:13,069::check::327::storage.check::(_check_completed) FINISH
> check '/dev/6d25efc2-b056-4c43-9a82-82f0c8a5ebc3/metadata' rc=0
> err=bytearray(b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB)
> copied, 0.000367523 s, 11.1 MB/s\n') elapsed=0.02
>
> I suggest filing a bug about this.
>
> I would try restarting vdsm; maybe there is some issue with the vdsm lvm cache.
>
> It can also be useful to see the output of:
>
> pvscan --cache
> vgs -vvvv 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3
> vgs -o name,pv_name -vvvv 6d25efc2-b056-4c43-9a82-82f0c8a5ebc3
>
> By the way, we are running oVirt Version 4.0.1.
>
> Running 4.0.1 is not a good idea; you should upgrade to the latest version.
>
> Cheers,
> Nir
>
> --
>
> Thanks & Regards,
>
>
> Anantha Raghava eXza Technology Consulting & Services
>
> Do not print this e-mail unless required. Save Paper & trees.
>
>
>
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
>
> http://lists.ovirt.org/mailman/listinfo/users
>
>