<div dir="ltr">I am seeing the errors appear even without any VM activity:<div><br></div><div><div>Apr 10 23:24:24 bufferoverflow vdsm TaskManager.Task ERROR Task=`04370dcb-a823-4485-8d7e-b4cfc75905a0`::Unexpected error</div>
Apr 10 23:24:24 bufferoverflow vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Unknown pool id, pool not connected: ('5849b030-626e-47cb-ad90-3ce782d831b3',)", 'code': 309}}
Apr 10 23:24:24 bufferoverflow vdsm TaskManager.Task ERROR Task=`354009d2-e7c1-4558-b947-0e3a19ab5490`::Unexpected error
Apr 10 23:24:24 bufferoverflow vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Unknown pool id, pool not connected: ('5849b030-626e-47cb-ad90-3ce782d831b3',)", 'code': 309}}
Apr 10 23:24:25 bufferoverflow kernel: [ 9136.829062] ata1: hard resetting link
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.135381] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.146797] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.146805] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d70), AE_NOT_FOUND (20120711/psparse-536)
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.156747] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.156755] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d70), AE_NOT_FOUND (20120711/psparse-536)
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.157350] ata1.00: configured for UDMA/133
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.157355] ata1: EH complete
Apr 10 23:24:29 bufferoverflow kernel: [ 9140.856010] device-mapper: table: 253:0: multipath: error getting device
Apr 10 23:24:29 bufferoverflow kernel: [ 9140.856013] device-mapper: ioctl: error adding target to table
Apr 10 23:24:29 bufferoverflow kernel: [ 9140.856534] device-mapper: table: 253:0: multipath: error getting device
Apr 10 23:24:29 bufferoverflow kernel: [ 9140.856536] device-mapper: ioctl: error adding target to table
Apr 10 23:24:29 bufferoverflow multipathd: dm-0: remove map (uevent)
Apr 10 23:24:29 bufferoverflow multipathd: dm-0: remove map (uevent)
Apr 10 23:24:29 bufferoverflow multipathd: dm-0: remove map (uevent)
Apr 10 23:24:29 bufferoverflow multipathd: dm-0: remove map (uevent)
Apr 10 23:24:29 bufferoverflow vdsm Storage.LVM WARNING lvm vgs failed: 5 [] [' Volume group "1083422e-a5db-41b6-b667-b9ef1ef244f0" not found']
Apr 10 23:24:29 bufferoverflow vdsm TaskManager.Task ERROR Task=`0248943a-6acd-496b-932e-b236920932f0`::Unexpected error
Apr 10 23:24:29 bufferoverflow vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Cannot find master domain: 'spUUID=5849b030-626e-47cb-ad90-3ce782d831b3, msdUUID=1083422e-a5db-41b6-b667-b9ef1ef244f0'", 'code': 304}}
Apr 10 23:24:29 bufferoverflow vdsm Storage.HSM WARNING disconnect sp: 5849b030-626e-47cb-ad90-3ce782d831b3 failed. Known pools {}
Apr 10 23:24:29 bufferoverflow vdsm Storage.LVM WARNING lvm vgs failed: 5 [] [' Volume group "a8286508-db45-40d7-8645-e573f6bacdc7" not found']
Apr 10 23:24:29 bufferoverflow vdsm TaskManager.Task ERROR Task=`c1ea06bf-8046-4cb4-88c7-00337051d713`::Unexpected error
Apr 10 23:24:29 bufferoverflow vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Storage domain does not exist: ('a8286508-db45-40d7-8645-e573f6bacdc7',)", 'code': 358}}

I have attached the full logs.

I suspect this is some Fedora <--> SSD bug that may be unrelated to ovirt.
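As a quick sanity check on that suspicion, the drive's own SMART state can be
compared against the ata link resets above. A minimal sketch, assuming the SSD
is /dev/sda and smartmontools is installed (adjust the device node as needed):

    # SMART health summary plus the drive's error log for the suspected SSD
    # (assumed to be /dev/sda; run as root)
    smartctl -H -l error /dev/sda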
I'm going to work around that for now by moving my VM's storage to a magnetic
disk on another server.

Yuval Meir
<div class="gmail_extra">
<br><br><div class="gmail_quote">On Tue, Apr 9, 2013 at 11:31 AM, Federico Simoncelli <span dir="ltr"><<a href="mailto:fsimonce@redhat.com" target="_blank">fsimonce@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">----- Original Message -----<br>
> From: "Yuval M" <<a href="mailto:yuvalme@gmail.com">yuvalme@gmail.com</a>><br>
> To: "Dan Kenigsberg" <<a href="mailto:danken@redhat.com">danken@redhat.com</a>><br>
</div><div class="im">> Cc: <a href="mailto:users@ovirt.org">users@ovirt.org</a>, "Nezer Zaidenberg" <<a href="mailto:nzaidenberg@mac.com">nzaidenberg@mac.com</a>><br>
> Sent: Friday, March 29, <a href="tel:2013" value="+9722013">2013</a> 2:19:43 PM<br>
> Subject: Re: [Users] VM crashes and doesn't recover<br>
><br>
</div><div class="im">> Any ideas on what can cause that storage crash?<br>
> could it be related to using a SSD?<br>
<br>
What the logs say is that the I/O on the storage domain is failing (both
the oop timeouts and the sanlock log), and this triggers the VDSM restart.

> On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
> > I am running vdsm from packages as my interest is in developing for the
> > I noticed that when the storage domain crashes I can't even do "df -h"
> > (hangs)

This is also consistent with the domain being unreachable.

The dmesg log that you attached doesn't contain timestamps, so it's hard to
correlate it with the rest.
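A timestamped copy is easy to capture; as a rough sketch (assuming the dmesg
from util-linux, which supports -T):

    # kernel ring buffer with human-readable timestamps
    dmesg -T > dmesg-timestamped.txt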

If you want, you can try to reproduce the issue and resubmit the logs:

/var/log/vdsm/vdsm.log
/var/log/sanlock.log
/var/log/messages

(Maybe also state the exact time when the issue begins to appear.)

In the logs I noticed that you're using only one NFS domain, and I think that
the SSD (on the storage side) shouldn't be a problem. When you experience such
a failure, are you able to read/write from/to the SSD on the machine that is
serving the share? (If it's the same machine, check using the "real" path where
it's mounted, not the NFS share.)
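For example, something along these lines (just a sketch; /srv/nfs/ovirt is a
placeholder for wherever the domain is actually exported):

    # write and then read back a small test file on the underlying filesystem,
    # bypassing the NFS mount; oflag/iflag=direct avoids the page cache
    dd if=/dev/zero of=/srv/nfs/ovirt/io-test bs=1M count=100 oflag=direct
    dd if=/srv/nfs/ovirt/io-test of=/dev/null bs=1M iflag=direct
    rm /srv/nfs/ovirt/io-test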
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Federico<br>
</font></span></blockquote></div><br></div>