[Users] VM crashes and doesn't recover

Yuval M yuvalme at gmail.com
Sun Mar 31 16:47:21 EDT 2013


Attached are the sanlock, vdsm, and dmesg logs.
I recreated the crash with clean logfiles.

Yuval
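
For anyone skimming syslog-style output like the /var/log/messages excerpt quoted later in this thread, here is a minimal sketch that tallies lines per reporting source (vdsm, sanlock, kernel, and so on). The regular expression and the function name `tally_by_source` are illustrative assumptions, not part of vdsm or sanlock, and the pattern only covers the line shapes seen in the excerpt.

```python
import re
from collections import Counter

# Assumed line shape (taken from the quoted excerpt, not from any spec):
#   "Mar 24 19:57:44 bufferoverflow vdsm SuperVdsmProxy WARNING ..."
#   "Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 ..."
#   "Mar 24 19:57:58 bufferoverflow kernel: [ 7402.688177] ata1: ..."
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"  # timestamp
    r"(?P<host>\S+)\s+"                                      # host name
    r"(?P<tag>[^\s:\[]+)(?:\[\d+\])?:?\s+"                   # source, optional [pid] and colon
    r"(?P<msg>.*)$"                                          # rest of the message
)


def tally_by_source(lines):
    """Count log lines per source tag; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("tag")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < /var/log/messages
    for tag, count in tally_by_source(sys.stdin).most_common():
        print(f"{count:6d}  {tag}")
```

A burst of lines from one source just before the storage domain goes down (for example the repeated "Connect to svdsm failed" warnings) then stands out without reading every line.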


On Sun, Mar 31, 2013 at 10:55 PM, Ayal Baron <abaron at redhat.com> wrote:

> Can you attach the sanlock log and the full vdsm log? (compress it if it's
> too big and not xz yet)
> Thanks.
>
> ----- Original Message -----
> > Any ideas on what can cause that storage crash?
> > Could it be related to using an SSD?
> >
> > Thanks,
> >
> > Yuval Meir
> >
> >
> > On Wed, Mar 27, 2013 at 6:08 PM, Yuval M < yuvalme at gmail.com > wrote:
> >
> >
> >
> > Still getting crashes with the patch:
> > # rpm -q vdsm
> > vdsm-4.10.3-0.281.git97db188.fc18.x86_64
> >
> > Attached are excerpts from vdsm.log and from dmesg.
> >
> > Yuval
> >
> >
> > On Wed, Mar 27, 2013 at 11:02 AM, Dan Kenigsberg < danken at redhat.com > wrote:
> >
> >
> >
> > On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
> > > I am running vdsm from packages, as my interest is in developing for the
> > > engine and not vdsm.
> > > I updated the vdsm package in an attempt to solve this; now I have:
> > > # rpm -q vdsm
> > > vdsm-4.10.3-10.fc18.x86_64
> >
> > I'm afraid that this build still does not have the patch mentioned
> > earlier.
> >
> > >
> > > I noticed that when the storage domain crashes I can't even do "df -h"
> > > (hangs)
> >
> > That's expected, since the master domain is still mounted (due to that
> > patch missing), but unreachable.
> >
> > Would you be kind enough to try out my little patch, so that we can make
> > some progress in investigating the bug?
> >
> >
> > > I'm also getting some errors in /var/log/messages:
> > >
> > > Mar 24 19:57:44 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:45 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:46 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:47 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:48 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:49 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:50 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
> > > Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
> > > Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
> > > Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
> > > Mar 24 19:57:51 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:52 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:53 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:54 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:55 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
> > > Mar 24 19:57:55 bufferoverflow vdsm Storage.Misc ERROR Panic: Couldn't connect to supervdsm
> > > Mar 24 19:57:55 bufferoverflow respawn: slave '/usr/share/vdsm/vdsm' died, respawning slave
> > > Mar 24 19:57:55 bufferoverflow vdsm fileUtils WARNING Dir /rhev/data-center/mnt already exists
> > > Mar 24 19:57:58 bufferoverflow vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.
> > > Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device: '{'device': u'unix', 'alias': u'channel0', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'1'}}' found
> > > Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device: '{'device': u'unix', 'alias': u'channel1', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'2'}}' found
> > > Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::_readPauseCode unsupported by libvirt vm
> > > Mar 24 19:57:58 bufferoverflow kernel: [ 7402.688177] ata1: hard resetting link
> > > Mar 24 19:57:59 bufferoverflow kernel: [ 7402.994510] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005510] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
> > > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005517] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d48), AE_NOT_FOUND (20120711/psparse-536)
> > > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015485] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
> > > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015493] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d48), AE_NOT_FOUND (20120711/psparse-536)
> > > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016061] ata1.00: configured for UDMA/133
> > > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016066] ata1: EH complete
> > > Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
> > > Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
> > > Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
> > > Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
> > > Mar 24 19:58:01 bufferoverflow kernel: [ 7405.714145] device-mapper: table: 253:0: multipath: error getting device
> > > Mar 24 19:58:01 bufferoverflow kernel: [ 7405.714148] device-mapper: ioctl: error adding target to table
> > > Mar 24 19:58:01 bufferoverflow kernel: [ 7405.715051] device-mapper: table: 253:0: multipath: error getting device
> > > Mar 24 19:58:01 bufferoverflow kernel: [ 7405.715053] device-mapper: ioctl: error adding target to table
> > >
> > > ata1 is a 500GB SSD (the only SATA device on the system apart from a DVD
> > > drive).
> > >
> > > Yuval
> > >
> > >
> > > On Sun, Mar 24, 2013 at 2:52 PM, Dan Kenigsberg < danken at redhat.com >
> > > wrote:
> > >
> > > > On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
> > > > > Hello,
> > > > >
> > > > > I am using oVirt 3.2 on Fedora 18:
> > > > > [wil at bufferoverflow ~]$ rpm -q vdsm
> > > > > vdsm-4.10.3-7.fc18.x86_64
> > > > >
> > > > > (the engine is built from sources).
> > > > >
> > > > > I seem to have hit this bug:
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=922515
> > > >
> > > > This bug is only one part of the problem, but it's nasty enough that I
> > > > have just proposed a fix for it on the ovirt-3.2 branch of vdsm:
> > > > http://gerrit.ovirt.org/13303
> > > >
> > > > Could you test whether, with this fix, vdsm relinquishes its SPM role
> > > > and recovers as operational?
> > > >
> > > > >
> > > > > in the following configuration:
> > > > > Single host (no migrations)
> > > > > Created a VM, installed an OS inside (Fedora 18)
> > > > > Stopped the VM.
> > > > > Created a template from it.
> > > > > Created an additional VM from the template using thin provisioning.
> > > > > Started the second VM.
> > > > >
> > > > > In addition to the errors in the logs, the storage domains (both data
> > > > > and ISO) crashed, i.e., went to "unknown" and "inactive" states,
> > > > > respectively (see the attached engine.log).
> > > > >
> > > > > I attached the VDSM and engine logs.
> > > > >
> > > > > Is there a way to work around this problem?
> > > > > It happens repeatedly.
> > > > >
> > > > > Yuval Meir
> > > >
> >
> >
> >
> >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs20130331.zip
Type: application/zip
Size: 195442 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20130331/6b436694/attachment-0001.zip>
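
Earlier in the thread, "df -h" is reported to hang while the master domain is still mounted but unreachable. As a hedged sketch (not something vdsm provides), a mount point can be probed from a child process with a timeout, so that a stuck filesystem blocks the child rather than the shell or script doing the check. The function name `probe_path` and the 5-second default are assumptions chosen for illustration; the sketch assumes a Unix-like host where `os.statvfs` is available.

```python
import subprocess
import sys


def probe_path(path, timeout=5.0):
    """Return 'ok', 'error', or 'timeout' for a statvfs() call on path.

    The call runs in a separate Python process so that a hung mount
    cannot block the caller.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return_code = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # A child stuck in uninterruptible I/O may not die right away,
        # so send the kill and return without waiting for it.
        child.kill()
        return "timeout"
    return "ok" if return_code == 0 else "error"


if __name__ == "__main__":
    # Usage: python probe.py /rhev/data-center/mnt
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    print(target, probe_path(target))
```

A "timeout" result for a path under /rhev/data-center/mnt would be consistent with the unreachable-but-mounted state described above, though it does not by itself identify the cause.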

