[Users] VM crashes and doesn't recover

Yuval M yuvalme at gmail.com
Sun Mar 24 19:50:02 UTC 2013


I am running vdsm from packages, since my interest is in developing for the
engine and not vdsm.
I updated the vdsm package in an attempt to solve this; I now have:
# rpm -q vdsm
vdsm-4.10.3-10.fc18.x86_64

I noticed that when the storage domain crashes, I can't even run "df -h"
(it hangs).
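
For anyone who wants to check for this kind of hang without wedging their
shell, here is a minimal sketch (my own, not something vdsm does). It runs a
filesystem call in a helper thread and gives up after a timeout. The function
name mount_responds and the default timeout are assumptions made for
illustration only.

```python
import os
import threading


def mount_responds(path, timeout=5.0, probe=os.statvfs):
    """Return True if probe(path) returns within `timeout` seconds.

    A call that returns with an error still counts as "responding";
    only a call that is still blocked when the timeout expires is
    reported as hung.
    """
    def worker():
        try:
            probe(path)
        except OSError:
            # The call came back, so the mount is not hung.
            pass

    # Daemon thread: it does not keep the interpreter alive on its own.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    return not thread.is_alive()
```

For example, mount_responds("/rhev/data-center/mnt") would return False if the
call is still blocked after five seconds. A thread blocked inside the kernel
cannot be cancelled, so this only reports the hang; it does not clear it.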
I'm also getting some errors in /var/log/messages:

Mar 24 19:57:44 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:45 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:46 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:47 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:48 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:49 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:50 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
[4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
[4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
[4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
[4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
Mar 24 19:57:51 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:52 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:53 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:54 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:55 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:55 bufferoverflow vdsm Storage.Misc ERROR Panic: Couldn't
connect to supervdsm
Mar 24 19:57:55 bufferoverflow respawn: slave '/usr/share/vdsm/vdsm' died,
respawning slave
Mar 24 19:57:55 bufferoverflow vdsm fileUtils WARNING Dir
/rhev/data-center/mnt already exists
Mar 24 19:57:58 bufferoverflow vdsm vds WARNING Unable to load the json rpc
server module. Please make sure it is installed.
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING
vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device:
'{'device': u'unix', 'alias': u'channel0', 'type': u'channel', 'address':
{u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port':
u'1'}}' found
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING
vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device:
'{'device': u'unix', 'alias': u'channel1', 'type': u'channel', 'address':
{u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port':
u'2'}}' found
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING
vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::_readPauseCode unsupported by
libvirt vm
Mar 24 19:57:58 bufferoverflow kernel: [ 7402.688177] ata1: hard resetting
link
Mar 24 19:57:59 bufferoverflow kernel: [ 7402.994510] ata1: SATA link up
6.0 Gbps (SStatus 133 SControl 300)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005510] ACPI Error: [DSSP]
Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005517] ACPI Error: Method
parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d48),
AE_NOT_FOUND (20120711/psparse-536)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015485] ACPI Error: [DSSP]
Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015493] ACPI Error: Method
parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d48),
AE_NOT_FOUND (20120711/psparse-536)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016061] ata1.00: configured
for UDMA/133
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016066] ata1: EH complete
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422
[4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422
[4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422
[4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422
[4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.714145] device-mapper: table:
253:0: multipath: error getting device
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.714148] device-mapper: ioctl:
error adding target to table
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.715051] device-mapper: table:
253:0: multipath: error getting device
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.715053] device-mapper: ioctl:
error adding target to table

ata1 is a 500GB SSD (the only SATA device on the system apart from a DVD drive).
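
Since the excerpt above is dominated by repeats of a few messages, a small
script can make it easier to read. The following is a rough sketch under my own
assumptions: every record starts with the "Mon DD HH:MM:SS host" prefix seen
above, and any line without that prefix is a continuation produced by mail
wrapping. The names unwrap and count_messages are hypothetical helpers, not
part of vdsm.

```python
import re
from collections import Counter

# Classic syslog prefix as it appears in the excerpt: "Mar 24 19:57:44 host ".
SYSLOG_PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ ")


def unwrap(lines):
    """Join wrapped continuation lines back onto the record they belong to."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if SYSLOG_PREFIX.match(line) or not records:
            records.append(line)
        else:
            # Assumption: a line without a syslog prefix continues the previous record.
            records[-1] += " " + line
    return records


def count_messages(lines):
    """Count records by message text, ignoring the timestamp and host."""
    counts = Counter()
    for record in unwrap(lines):
        message = SYSLOG_PREFIX.sub("", record, count=1)
        counts[message] += 1
    return counts
```

Passing the lines of a log file to count_messages and then calling
most_common(5) on the result lists the five most frequent messages. Records
that embed their own timestamp in the message body (the sanlock lines, for
instance) will not collapse into one entry with this approach.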

Yuval


On Sun, Mar 24, 2013 at 2:52 PM, Dan Kenigsberg <danken at redhat.com> wrote:

> On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
> > Hello,
> >
> > I am using Ovirt 3.2 on Fedora 18:
> > [wil at bufferoverflow ~]$ rpm -q vdsm
> > vdsm-4.10.3-7.fc18.x86_64
> >
> > (the engine is built from sources).
> >
> > I seem to have hit this bug:
> > https://bugzilla.redhat.com/show_bug.cgi?id=922515
>
> This bug is only one part of the problem, but it's nasty enough that I
> have just suggested it as a fix to the ovirt-3.2 branch of vdsm:
> http://gerrit.ovirt.org/13303
>
> Could you test whether, with this patch applied, vdsm relinquishes its SPM
> role and recovers as operational?
>
> >
> > in the following configuration:
> > Single host (no migrations)
> > Created a VM, installed an OS inside (Fedora18)
> > stopped the VM.
> > created template from it.
> > Created an additional VM from the template using thin provision.
> > Started the second VM.
> >
> > In addition to the errors in the logs, the storage domains (both data and
> > ISO) crashed, i.e. went to "unknown" and "inactive" states respectively.
> > (see the attached engine.log)
> >
> > I attached the VDSM and engine logs.
> >
> > is there a way to work around this problem?
> > It happens repeatedly.
> >
> > Yuval Meir
>