[Users] VM crashes and doesn't recover
Limor Gavish
lgavish at gmail.com
Thu Mar 28 16:36:22 UTC 2013
Concerning the following error in dmesg:
[ 2235.638814] device-mapper: table: 253:0: multipath: error getting device
[ 2235.638816] device-mapper: ioctl: error adding target to table
I tried to debug it, but multipath itself reports some problems:
[wil at bufferoverflow vdsm]$ sudo multipath -l
Mar 28 18:28:19 | multipath.conf +5, invalid keyword: getuid_callout
Mar 28 18:28:19 | multipath.conf +18, invalid keyword: getuid_callout
[wil at bufferoverflow vdsm]$ sudo multipath -F
Mar 28 18:28:30 | multipath.conf +5, invalid keyword: getuid_callout
Mar 28 18:28:30 | multipath.conf +18, invalid keyword: getuid_callout
[wil at bufferoverflow vdsm]$ sudo multipath -v2
Mar 28 18:28:35 | multipath.conf +5, invalid keyword: getuid_callout
Mar 28 18:28:35 | multipath.conf +18, invalid keyword: getuid_callout
Mar 28 18:28:35 | sda: rport id not found
Mar 28 18:28:35 | Corsair_Force_GS_130579140000977000C3: ignoring map
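For anyone triaging similar output, here is a minimal sketch that pulls the "invalid keyword" warnings out of the multipath output and collapses the repeats. The function name and the regular expression are illustrative assumptions based only on the message format shown above; they are not part of multipath-tools.

```python
import re

# Matches warnings of the form seen above, e.g.
#   "Mar 28 18:28:19 | multipath.conf +5, invalid keyword: getuid_callout"
# The pattern is an assumption derived from these sample lines only.
WARNING_RE = re.compile(
    r"\|\s*(?P<file>\S+)\s+\+(?P<line>\d+),\s*invalid keyword:\s*(?P<keyword>\S+)"
)


def invalid_keywords(output):
    """Return a sorted list of unique (file, line, keyword) tuples."""
    found = set()
    for text in output.splitlines():
        match = WARNING_RE.search(text)
        if match:
            found.add(
                (match.group("file"), int(match.group("line")), match.group("keyword"))
            )
    return sorted(found)


sample = """\
Mar 28 18:28:35 | multipath.conf +5, invalid keyword: getuid_callout
Mar 28 18:28:35 | multipath.conf +18, invalid keyword: getuid_callout
Mar 28 18:28:35 | sda: rport id not found
Mar 28 18:28:35 | Corsair_Force_GS_130579140000977000C3: ignoring map
"""

for file_name, line_no, keyword in invalid_keywords(sample):
    print(f"{file_name}:{line_no}: {keyword}")
```

All three commands repeat the same two warnings, so the distinct complaints come down to lines 5 and 18 of multipath.conf.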
Any idea whether those multipath errors are related to the storage crash?
Here is the multipath.conf:
[wil at bufferoverflow vdsm]$ sudo cat /etc/multipath.conf
# RHEV REVISION 1.0

defaults {
    polling_interval        5
    getuid_callout          "/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
    no_path_retry           fail
    user_friendly_names     no
    flush_on_last_del       yes
    fast_io_fail_tmo        5
    dev_loss_tmo            30
    max_fds                 4096
}

devices {
device {
    vendor                  "HITACHI"
    product                 "DF.*"
    getuid_callout          "/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
}
device {
    vendor                  "COMPELNT"
    product                 "Compellent Vol"
    no_path_retry           fail
}
}
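As a cross-check, the sketch below looks for the line numbers on which a given keyword appears in the configuration. It assumes that each getuid_callout value sits on a single line in the real file and that the sections are separated by single blank lines (the mail wrapping hides both); keyword_lines is a hypothetical helper, not a multipath-tools command.

```python
def keyword_lines(config_text, keyword):
    """Return 1-based line numbers whose first token equals keyword."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        tokens = line.split()
        if tokens and tokens[0] == keyword:
            hits.append(number)
    return hits


# Copy of the configuration above, under the layout assumptions stated
# in the lead-in (one line per getuid_callout, single blank separators).
config = """\
# RHEV REVISION 1.0

defaults {
    polling_interval 5
    getuid_callout "/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
    no_path_retry fail
    user_friendly_names no
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    max_fds 4096
}

devices {
device {
    vendor "HITACHI"
    product "DF.*"
    getuid_callout "/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
}
device {
    vendor "COMPELNT"
    product "Compellent Vol"
    no_path_retry fail
}
}
"""

print(keyword_lines(config, "getuid_callout"))
```

Under those assumptions the keyword falls on lines 5 and 18, the same positions that multipath flags as "+5" and "+18". That suggests the installed multipath-tools does not recognize getuid_callout; which keyword (if any) replaces it is worth confirming in multipath.conf(5) on the host.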
Thanks,
Limor G
On Wed, Mar 27, 2013 at 6:08 PM, Yuval M <yuvalme at gmail.com> wrote:
> Still getting crashes with the patch:
> # rpm -q vdsm
> vdsm-4.10.3-0.281.git97db188.fc18.x86_64
>
> attached excerpts from vdsm.log and from dmesg.
>
> Yuval
>
>
> On Wed, Mar 27, 2013 at 11:02 AM, Dan Kenigsberg <danken at redhat.com> wrote:
>
>> On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
>> > I am running vdsm from packages as my interest is in developing for the
>> > engine and not vdsm.
>> > I updated the vdsm package in an attempt to solve this, now I have:
>> > # rpm -q vdsm
>> > vdsm-4.10.3-10.fc18.x86_64
>>
>> I'm afraid that this build still does not have the patch mentioned
>> earlier.
>>
>> >
>> > I noticed that when the storage domain crashes I can't even do "df -h"
>> > (hangs)
>>
>> That's expected, since the master domain is still mounted (due to that
>> patch missing), but unreachable.
>>
>> Would you be kind enough to try out my little patch, so that we can make
>> some progress in tracking down the bug?
>>
>>
>> > I'm also getting some errors in /var/log/messages:
>> >
>> > Mar 24 19:57:44 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:45 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:46 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:47 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:48 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:49 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:50 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
>> > Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
>> > Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
>> > Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
>> > Mar 24 19:57:51 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:52 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:53 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:54 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:55 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
>> > Mar 24 19:57:55 bufferoverflow vdsm Storage.Misc ERROR Panic: Couldn't connect to supervdsm
>> > Mar 24 19:57:55 bufferoverflow respawn: slave '/usr/share/vdsm/vdsm' died, respawning slave
>> > Mar 24 19:57:55 bufferoverflow vdsm fileUtils WARNING Dir /rhev/data-center/mnt already exists
>> > Mar 24 19:57:58 bufferoverflow vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.
>> > Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device: '{'device': u'unix', 'alias': u'channel0', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'1'}}' found
>> > Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device: '{'device': u'unix', 'alias': u'channel1', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'2'}}' found
>> > Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::_readPauseCode unsupported by libvirt vm
>> > Mar 24 19:57:58 bufferoverflow kernel: [ 7402.688177] ata1: hard resetting link
>> > Mar 24 19:57:59 bufferoverflow kernel: [ 7402.994510] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>> > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005510] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
>> > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005517] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d48), AE_NOT_FOUND (20120711/psparse-536)
>> > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015485] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
>> > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015493] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d48), AE_NOT_FOUND (20120711/psparse-536)
>> > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016061] ata1.00: configured for UDMA/133
>> > Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016066] ata1: EH complete
>> > Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
>> > Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
>> > Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
>> > Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
>> > Mar 24 19:58:01 bufferoverflow kernel: [ 7405.714145] device-mapper: table: 253:0: multipath: error getting device
>> > Mar 24 19:58:01 bufferoverflow kernel: [ 7405.714148] device-mapper: ioctl: error adding target to table
>> > Mar 24 19:58:01 bufferoverflow kernel: [ 7405.715051] device-mapper: table: 253:0: multipath: error getting device
>> > Mar 24 19:58:01 bufferoverflow kernel: [ 7405.715053] device-mapper: ioctl: error adding target to table
>> >
>> > ata1 is a 500GB SSD. (only SATA device on the system except a DVD drive)
>> >
>> > Yuval
>> >
>> >
>> > On Sun, Mar 24, 2013 at 2:52 PM, Dan Kenigsberg <danken at redhat.com> wrote:
>> >
>> > > On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
>> > > > Hello,
>> > > >
>> > > > I am using Ovirt 3.2 on Fedora 18:
>> > > > [wil at bufferoverflow ~]$ rpm -q vdsm
>> > > > vdsm-4.10.3-7.fc18.x86_64
>> > > >
>> > > > (the engine is built from sources).
>> > > >
>> > > > I seem to have hit this bug:
>> > > > https://bugzilla.redhat.com/show_bug.cgi?id=922515
>> > >
>> > > This bug is only one part of the problem, but it's nasty enough that I
>> > > have just suggested it as a fix to the ovirt-3.2 branch of vdsm:
>> > > http://gerrit.ovirt.org/13303
>> > >
>> > > Could you test whether, with this patch, vdsm relinquishes its SPM role
>> > > and recovers as operational?
>> > >
>> > > >
>> > > > in the following configuration:
>> > > > Single host (no migrations)
>> > > > Created a VM, installed an OS inside (Fedora18)
>> > > > stopped the VM.
>> > > > created template from it.
>> > > > Created an additional VM from the template using thin provision.
>> > > > Started the second VM.
>> > > >
>> > > > In addition to the errors in the logs, the storage domains (both data
>> > > > and ISO) crashed, i.e. went to "unknown" and "inactive" states,
>> > > > respectively (see the attached engine.log).
>> > > >
>> > > > I attached the VDSM and engine logs.
>> > > >
>> > > > is there a way to work around this problem?
>> > > > It happens repeatedly.
>> > > >
>> > > > Yuval Meir
>> > >
>>
>
>