[Users] VM crashes and doesn't recover

Hello,

I am using oVirt 3.2 on Fedora 18:
[wil@bufferoverflow ~]$ rpm -q vdsm
vdsm-4.10.3-7.fc18.x86_64
(the engine is built from sources).

I seem to have hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=922515
in the following configuration:
Single host (no migrations)
Created a VM, installed an OS inside (Fedora 18)
Stopped the VM.
Created a template from it.
Created an additional VM from the template using thin provisioning.
Started the second VM.

In addition to the errors in the logs, the storage domains (both data and ISO) crashed, i.e. went to "unknown" and "inactive" states respectively (see the attached engine.log).
I attached the VDSM and engine logs.

Is there a way to work around this problem? It happens repeatedly.

Yuval Meir
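A rough sketch of how to inspect the domain states directly on the host, assuming the vdsm-cli package is installed; the UUID is the data-domain UUID that shows up later in the thread's logs:

# list the storage domains vdsm knows about, then query the one that went invalid
vdsClient -s 0 getStorageDomainsList
vdsClient -s 0 getStorageDomainInfo 1083422e-a5db-41b6-b667-b9ef1ef244f0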

From the VDSM log, it seems that the master storage domain was not responding.
Thread-23::DEBUG::2013-03-22 18:50:20,263::domainMonitor::216::Storage.DomainMonitorThread::(_monitorDomain) Domain 1083422e-a5db-41b6-b667-b9ef1ef244f0 changed its status to Invalid
....
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 186, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/nfsSD.py", line 108, in selftest
    fileSD.FileStorageDomain.selftest(self)
  File "/usr/share/vdsm/storage/fileSD.py", line 480, in selftest
    self.oop.os.statvfs(self.domaindir)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 280, in callCrabRPCFunction
    *args, **kwargs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 180, in callCrabRPCFunction
    rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 146, in _recvAll
    raise Timeout()
Timeout
.....
I also see a sanlock issue, but I think that is because the storage could not be reached:
ReleaseHostIdFailure: Cannot release host id: ('1083422e-a5db-41b6-b667-b9ef1ef244f0', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
Can you check whether iptables is running on your host, and if so, whether it is blocking the storage server by any chance? Can you try to manually mount this NFS export and see if it works? Is it possible the storage server has connectivity issues?
Regards,
Maor
On 03/22/2013 08:24 PM, Limor Gavish wrote:
Hello,
I am using Ovirt 3.2 on Fedora 18: [wil@bufferoverflow ~]$ rpm -q vdsm vdsm-4.10.3-7.fc18.x86_64
(the engine is built from sources).
I seem to have hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=922515
in the following configuration: Single host (no migrations) Created a VM, installed an OS inside (Fedora18) stopped the VM. created template from it. Created an additional VM from the template using thin provision. Started the second VM.
in addition to the errors in the logs the storage domains (both data and ISO) crashed, i.e went to "unknown" and "inactive" states respectively. (see the attached engine.log)
I attached the VDSM and engine logs.
is there a way to work around this problem? It happens repeatedly.
Yuval Meir
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
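A minimal sketch of the checks suggested above; the export path is a placeholder, not taken from this thread, and the iptables unit name assumes the iptables-services package:

# is iptables active, and does it accept NFS-related traffic (rpcbind 111, nfs 2049)?
sudo systemctl status iptables.service
sudo iptables -L -n -v | grep -E '111|2049'
# try mounting the data-domain export by hand, then clean up
sudo mkdir -p /mnt/nfs-test
sudo mount -t nfs bufferoverflow:/path/to/data-export /mnt/nfs-test
df -h /mnt/nfs-test
sudo umount /mnt/nfs-test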

sanlock is at the latest version (this solved another problem we had a few days ago):
$ rpm -q sanlock
sanlock-2.6-7.fc18.x86_64
The storage is on the same machine as the engine and vdsm. iptables is up, but there is a rule to allow all localhost traffic.
On Sun, Mar 24, 2013 at 11:34 AM, Maor Lipchuk <mlipchuk@redhat.com> wrote:
From the VDSM log, it seems that the master storage domain was not responding.
I'm also see a san lock issue, but I think that is because the storage could not be reached: ReleaseHostIdFailure: Cannot release host id: ('1083422e-a5db-41b6-b667-b9ef1ef244f0', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
Can you try to see if the ip tables are running on your host, and if so, please check if it is blocking the storage server by any chance? Can you try to manually mount this NFS and see if it works? Is it possible the storage server got connectivity issues?
Regards, Maor
On 03/22/2013 08:24 PM, Limor Gavish wrote:
Hello,
I am using Ovirt 3.2 on Fedora 18: [wil@bufferoverflow ~]$ rpm -q vdsm vdsm-4.10.3-7.fc18.x86_64
(the engine is built from sources).
I seem to have hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=922515
in the following configuration: Single host (no migrations) Created a VM, installed an OS inside (Fedora18) stopped the VM. created template from it. Created an additional VM from the template using thin provision. Started the second VM.
in addition to the errors in the logs the storage domains (both data and ISO) crashed, i.e went to "unknown" and "inactive" states respectively. (see the attached engine.log)
I attached the VDSM and engine logs.
is there a way to work around this problem? It happens repeatedly.
Yuval Meir
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
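Since sanlock keeps coming up, a small sketch of how to look at its view of the lockspaces, using commands from the sanlock package itself:

# show the lockspaces and resources sanlock currently holds
sudo sanlock client status
# dump sanlock's internal debug log if more detail is needed
sudo sanlock client log_dump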

https://bugzilla.redhat.com/show_bug.cgi?id=890365
Try restarting the vdsm service. You had a problem with the storage and vdsm did not recover properly.
On 03/24/2013 11:40 AM, Yuval M wrote:
sanlock is at the latest version (this solved another problem we had a few days ago):
$ rpm -q sanlock sanlock-2.6-7.fc18.x86_64
the storage is on the same machine as the engine and vdsm. iptables is up but there is a rule to allow all localhost traffic.
On Sun, Mar 24, 2013 at 11:34 AM, Maor Lipchuk <mlipchuk@redhat.com <mailto:mlipchuk@redhat.com>> wrote:
From the VDSM log, it seems that the master storage domain was not responding.
I'm also see a san lock issue, but I think that is because the storage could not be reached: ReleaseHostIdFailure: Cannot release host id: ('1083422e-a5db-41b6-b667-b9ef1ef244f0', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
Can you try to see if the ip tables are running on your host, and if so, please check if it is blocking the storage server by any chance? Can you try to manually mount this NFS and see if it works? Is it possible the storage server got connectivity issues?
Regards, Maor
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
-- Dafna Ron
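A sketch of the restart suggested above, assuming the systemd unit is named vdsmd as on the stock Fedora 18 packages:

sudo systemctl restart vdsmd.service
sudo systemctl status vdsmd.service
# recent vdsm messages from the journal
sudo journalctl -u vdsmd.service -n 100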

On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
Hello,
I am using Ovirt 3.2 on Fedora 18: [wil@bufferoverflow ~]$ rpm -q vdsm vdsm-4.10.3-7.fc18.x86_64
(the engine is built from sources).
I seem to have hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=922515
This bug is only one part of the problem, but it's nasty enough that I have just suggested it as a fix to the ovirt-3.2 branch of vdsm: http://gerrit.ovirt.org/13303
Could you test whether, with it, vdsm relinquishes its SPM role and recovers as operational?
in the following configuration: Single host (no migrations) Created a VM, installed an OS inside (Fedora18) stopped the VM. created template from it. Created an additional VM from the template using thin provision. Started the second VM.
in addition to the errors in the logs the storage domains (both data and ISO) crashed, i.e went to "unknown" and "inactive" states respectively. (see the attached engine.log)
I attached the VDSM and engine logs.
is there a way to work around this problem? It happens repeatedly.
Yuval Meir
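One way to pull that gerrit change into a local vdsm tree for testing; the clone URL and the patchset number are assumptions, so adjust them to whatever gerrit shows for change 13303:

git clone http://gerrit.ovirt.org/p/vdsm.git
cd vdsm
git fetch http://gerrit.ovirt.org/vdsm refs/changes/03/13303/1
git checkout FETCH_HEAD
# then rebuild and reinstall vdsm the way the project's README describes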

I am running vdsm from packages, as my interest is in developing for the engine and not vdsm. I updated the vdsm package in an attempt to solve this; now I have:
# rpm -q vdsm
vdsm-4.10.3-10.fc18.x86_64
I noticed that when the storage domain crashes I can't even run "df -h" (it hangs).
I'm also getting some errors in /var/log/messages:
Mar 24 19:57:44 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:45 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:46 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:47 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:48 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:49 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:50 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
Mar 24 19:57:51 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:52 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:53 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:54 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:55 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory
Mar 24 19:57:55 bufferoverflow vdsm Storage.Misc ERROR Panic: Couldn't connect to supervdsm
Mar 24 19:57:55 bufferoverflow respawn: slave '/usr/share/vdsm/vdsm' died, respawning slave
Mar 24 19:57:55 bufferoverflow vdsm fileUtils WARNING Dir /rhev/data-center/mnt already exists
Mar 24 19:57:58 bufferoverflow vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device: '{'device': u'unix', 'alias': u'channel0', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'1'}}' found
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device: '{'device': u'unix', 'alias': u'channel1', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'2'}}' found
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::_readPauseCode unsupported by libvirt vm
Mar 24 19:57:58 bufferoverflow kernel: [ 7402.688177] ata1: hard resetting link
Mar 24 19:57:59 bufferoverflow kernel: [ 7402.994510] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005510] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005517] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d48), AE_NOT_FOUND (20120711/psparse-536)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015485] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015493] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d48), AE_NOT_FOUND (20120711/psparse-536)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016061] ata1.00: configured for UDMA/133
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016066] ata1: EH complete
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.714145] device-mapper: table: 253:0: multipath: error getting device
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.714148] device-mapper: ioctl: error adding target to table
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.715051] device-mapper: table: 253:0: multipath: error getting device
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.715053] device-mapper: ioctl: error adding target to table
ata1 is a 500GB SSD (the only SATA device on the system except a DVD drive).
Yuval
On Sun, Mar 24, 2013 at 2:52 PM, Dan Kenigsberg <danken@redhat.com> wrote:
On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
Hello,
I am using Ovirt 3.2 on Fedora 18: [wil@bufferoverflow ~]$ rpm -q vdsm vdsm-4.10.3-7.fc18.x86_64
(the engine is built from sources).
I seem to have hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=922515
This bug is only one part of the problem, but it's nasty enough that I have just suggested it as a fix to the ovirt-3.2 branch of vdsm: http://gerrit.ovirt.org/13303
Could you test if with it, vdsm relinquishes its spm role, and recovers as operational?
in the following configuration: Single host (no migrations) Created a VM, installed an OS inside (Fedora18) stopped the VM. created template from it. Created an additional VM from the template using thin provision. Started the second VM.
in addition to the errors in the logs the storage domains (both data and ISO) crashed, i.e went to "unknown" and "inactive" states respectively. (see the attached engine.log)
I attached the VDSM and engine logs.
is there a way to work around this problem? It happens repeatedly.
Yuval Meir
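The "Connect to svdsm failed" lines above suggest supervdsm itself is gone; a quick sketch for checking that, with the socket path being an assumption for this vdsm version:

ps aux | grep -i [s]upervdsm
ls -l /var/run/vdsm/svdsm.sock
sudo systemctl status vdsmd.service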

On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
I am running vdsm from packages as my interest is in developing for the engine and not vdsm. I updated the vdsm package in an attempt to solve this, now I have: # rpm -q vdsm vdsm-4.10.3-10.fc18.x86_64
I'm afraid that this build still does not have the patch mentioned earlier.
I noticed that when the storage domain crashes I can't even do "df -h" (hangs)
That's to be expected, since the master domain is still mounted (because that patch is missing) but unreachable.
Would you be so kind as to try out my little patch, in order to advance the research into this bug a bit?
Yuval
On Sun, Mar 24, 2013 at 2:52 PM, Dan Kenigsberg <danken@redhat.com> wrote:
On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
Hello,
I am using Ovirt 3.2 on Fedora 18: [wil@bufferoverflow ~]$ rpm -q vdsm vdsm-4.10.3-7.fc18.x86_64
(the engine is built from sources).
I seem to have hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=922515
This bug is only one part of the problem, but it's nasty enough that I have just suggested it as a fix to the ovirt-3.2 branch of vdsm: http://gerrit.ovirt.org/13303
Could you test if with it, vdsm relinquishes its spm role, and recovers as operational?
in the following configuration: Single host (no migrations) Created a VM, installed an OS inside (Fedora18) stopped the VM. created template from it. Created an additional VM from the template using thin provision. Started the second VM.
in addition to the errors in the logs the storage domains (both data and ISO) crashed, i.e went to "unknown" and "inactive" states respectively. (see the attached engine.log)
I attached the VDSM and engine logs.
is there a way to work around this problem? It happens repeatedly.
Yuval Meir
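Relating to the hanging "df -h": a sketch of how to poke at a stuck NFS mount without blocking the shell (the mount point under /rhev/data-center/mnt is a placeholder):

# list NFS mounts without statting them
findmnt -t nfs,nfs4
# bounded check of the suspect mount instead of a plain df
timeout 10 stat -f /rhev/data-center/mnt/<server:_export>
# last resort once vdsm is stopped: lazily detach the dead mount
sudo umount -l /rhev/data-center/mnt/<server:_export>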

Still getting crashes with the patch:
# rpm -q vdsm
vdsm-4.10.3-0.281.git97db188.fc18.x86_64
Attached are excerpts from vdsm.log and from dmesg.
Yuval
On Wed, Mar 27, 2013 at 11:02 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
I am running vdsm from packages as my interest is in developing for the engine and not vdsm. I updated the vdsm package in an attempt to solve this, now I have: # rpm -q vdsm vdsm-4.10.3-10.fc18.x86_64
I'm afraid that this build still does not have the patch mentioned earlier.
I noticed that when the storage domain crashes I can't even do "df -h" (hangs)
That's expectable, since the master domain is still mounted (due to that patch missing), but unreachable.
Would you be kind to try out my little patch, in order to advance a bit in the research to solve the bug?
Yuval
On Sun, Mar 24, 2013 at 2:52 PM, Dan Kenigsberg <danken@redhat.com> wrote:
On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
Hello,
I am using Ovirt 3.2 on Fedora 18: [wil@bufferoverflow ~]$ rpm -q vdsm vdsm-4.10.3-7.fc18.x86_64
(the engine is built from sources).
I seem to have hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=922515
This bug is only one part of the problem, but it's nasty enough that I have just suggested it as a fix to the ovirt-3.2 branch of vdsm: http://gerrit.ovirt.org/13303
Could you test if with it, vdsm relinquishes its spm role, and recovers as operational?
in the following configuration: Single host (no migrations) Created a VM, installed an OS inside (Fedora18) stopped the VM. created template from it. Created an additional VM from the template using thin provision. Started the second VM.
in addition to the errors in the logs the storage domains (both data and ISO) crashed, i.e went to "unknown" and "inactive" states respectively. (see the attached engine.log)
I attached the VDSM and engine logs.
is there a way to work around this problem? It happens repeatedly.
Yuval Meir
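For pulling excerpts like the ones attached, a small sketch using the domain UUID from the earlier logs (the log path is the stock vdsm location):

grep -n '1083422e-a5db-41b6-b667-b9ef1ef244f0' /var/log/vdsm/vdsm.log | tail -n 20
dmesg | grep -iE 'ata1|device-mapper' | tail -n 40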

Concerning the following error in dmesg:
[ 2235.638814] device-mapper: table: 253:0: multipath: error getting device
[ 2235.638816] device-mapper: ioctl: error adding target to table
I tried to debug it, but multipath gives me some problems:
[wil@bufferoverflow vdsm]$ sudo multipath -l
Mar 28 18:28:19 | multipath.conf +5, invalid keyword: getuid_callout
Mar 28 18:28:19 | multipath.conf +18, invalid keyword: getuid_callout
[wil@bufferoverflow vdsm]$ sudo multipath -F
Mar 28 18:28:30 | multipath.conf +5, invalid keyword: getuid_callout
Mar 28 18:28:30 | multipath.conf +18, invalid keyword: getuid_callout
[wil@bufferoverflow vdsm]$ sudo multipath -v2
Mar 28 18:28:35 | multipath.conf +5, invalid keyword: getuid_callout
Mar 28 18:28:35 | multipath.conf +18, invalid keyword: getuid_callout
Mar 28 18:28:35 | sda: rport id not found
Mar 28 18:28:35 | Corsair_Force_GS_130579140000977000C3: ignoring map
Any idea if those multipath errors are related to the storage crash?
Here is the multipath.conf:
[wil@bufferoverflow vdsm]$ sudo cat /etc/multipath.conf
# RHEV REVISION 1.0

defaults {
    polling_interval        5
    getuid_callout          "/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
    no_path_retry           fail
    user_friendly_names     no
    flush_on_last_del       yes
    fast_io_fail_tmo        5
    dev_loss_tmo            30
    max_fds                 4096
}

devices {
device {
    vendor                  "HITACHI"
    product                 "DF.*"
    getuid_callout          "/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
}
device {
    vendor                  "COMPELNT"
    product                 "Compellent Vol"
    no_path_retry           fail
}
}

Thanks,
Limor G
On Wed, Mar 27, 2013 at 6:08 PM, Yuval M <yuvalme@gmail.com> wrote:
Still getting crashes with the patch: # rpm -q vdsm vdsm-4.10.3-0.281.git97db188.fc18.x86_64
attached excerpts from vdsm.log and from dmesg.
Yuval
On Wed, Mar 27, 2013 at 11:02 AM, Dan Kenigsberg <danken@redhat.com>wrote:
On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
I am running vdsm from packages as my interest is in developing for the engine and not vdsm. I updated the vdsm package in an attempt to solve this, now I have: # rpm -q vdsm vdsm-4.10.3-10.fc18.x86_64
I'm afraid that this build still does not have the patch mentioned earlier.
I noticed that when the storage domain crashes I can't even do "df -h" (hangs)
That's expectable, since the master domain is still mounted (due to that patch missing), but unreachable.
Would you be kind to try out my little patch, in order to advance a bit in the research to solve the bug?
Yuval
On Sun, Mar 24, 2013 at 2:52 PM, Dan Kenigsberg <danken@redhat.com> wrote:
On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
Hello,
I am using Ovirt 3.2 on Fedora 18: [wil@bufferoverflow ~]$ rpm -q vdsm vdsm-4.10.3-7.fc18.x86_64
(the engine is built from sources).
I seem to have hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=922515
This bug is only one part of the problem, but it's nasty enough that I have just suggested it as a fix to the ovirt-3.2 branch of vdsm: http://gerrit.ovirt.org/13303
Could you test if with it, vdsm relinquishes its spm role, and recovers as operational?
in the following configuration: Single host (no migrations) Created a VM, installed an OS inside (Fedora18) stopped the VM. created template from it. Created an additional VM from the template using thin provision. Started the second VM.
in addition to the errors in the logs the storage domains (both data and ISO) crashed, i.e went to "unknown" and "inactive" states respectively. (see the attached engine.log)
I attached the VDSM and engine logs.
is there a way to work around this problem? It happens repeatedly.
Yuval Meir
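About the "invalid keyword: getuid_callout" warnings quoted above: newer multipath-tools builds appear to have dropped that keyword in favour of uid_attribute, so one hedged way to silence the parser and re-check is:

# comment out the deprecated keyword (the warnings point at lines 5 and 18)
sudo sed -i.bak '/getuid_callout/s/^/# /' /etc/multipath.conf
# re-run to confirm the warnings disappear
sudo multipath -v2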

Any ideas on what can cause that storage crash? Could it be related to using an SSD?
Thanks,
Yuval Meir
On Wed, Mar 27, 2013 at 6:08 PM, Yuval M <yuvalme@gmail.com> wrote:
Still getting crashes with the patch: # rpm -q vdsm vdsm-4.10.3-0.281.git97db188.fc18.x86_64
attached excerpts from vdsm.log and from dmesg.
Yuval
On Wed, Mar 27, 2013 at 11:02 AM, Dan Kenigsberg <danken@redhat.com>wrote:
On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
I am running vdsm from packages as my interest is in developing for the engine and not vdsm. I updated the vdsm package in an attempt to solve this, now I have: # rpm -q vdsm vdsm-4.10.3-10.fc18.x86_64
I'm afraid that this build still does not have the patch mentioned earlier.
I noticed that when the storage domain crashes I can't even do "df -h" (hangs)
That's expectable, since the master domain is still mounted (due to that patch missing), but unreachable.
Would you be kind to try out my little patch, in order to advance a bit in the research to solve the bug?
Yuval
On Sun, Mar 24, 2013 at 2:52 PM, Dan Kenigsberg <danken@redhat.com> wrote:
On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
Hello,
I am using Ovirt 3.2 on Fedora 18: [wil@bufferoverflow ~]$ rpm -q vdsm vdsm-4.10.3-7.fc18.x86_64
(the engine is built from sources).
I seem to have hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=922515
This bug is only one part of the problem, but it's nasty enough that I have just suggested it as a fix to the ovirt-3.2 branch of vdsm: http://gerrit.ovirt.org/13303
Could you test if with it, vdsm relinquishes its spm role, and recovers as operational?
in the following configuration: Single host (no migrations) Created a VM, installed an OS inside (Fedora18) stopped the VM. created template from it. Created an additional VM from the template using thin provision. Started the second VM.
in addition to the errors in the logs the storage domains (both data and ISO) crashed, i.e went to "unknown" and "inactive" states respectively. (see the attached engine.log)
I attached the VDSM and engine logs.
is there a way to work around this problem? It happens repeatedly.
Yuval Meir
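On the SSD question above, a sketch of basic drive-health checks; smartmontools may need to be installed, and /dev/sda is assumed to be the SSD behind ata1:

sudo smartctl -H /dev/sda
sudo smartctl -a /dev/sda | tail -n 40
dmesg | grep -iE 'ata1|SATA link|device-mapper' | tail -n 50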

Can you attach the sanlock log and the full vdsm log? (Compress them if they're too big and not xz-compressed yet.)
Thanks.
----- Original Message -----
Any ideas on what can cause that storage crash? could it be related to using a SSD?
Thanks,
Yuval Meir
On Wed, Mar 27, 2013 at 6:08 PM, Yuval M < yuvalme@gmail.com > wrote:
Still getting crashes with the patch: # rpm -q vdsm vdsm-4.10.3-0.281.git97db188.fc18.x86_64
attached excerpts from vdsm.log and from dmesg.
Yuval
On Wed, Mar 27, 2013 at 11:02 AM, Dan Kenigsberg < danken@redhat.com > wrote:
On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
I am running vdsm from packages as my interest is in developing for the engine and not vdsm. I updated the vdsm package in an attempt to solve this, now I have: # rpm -q vdsm vdsm-4.10.3-10.fc18.x86_64
I'm afraid that this build still does not have the patch mentioned earlier.
I noticed that when the storage domain crashes I can't even do "df -h" (hangs)
That's expectable, since the master domain is still mounted (due to that patch missing), but unreachable.
Would you be kind to try out my little patch, in order to advance a bit in the research to solve the bug?
Yuval
On Sun, Mar 24, 2013 at 2:52 PM, Dan Kenigsberg < danken@redhat.com > wrote:
On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
Hello,
I am using Ovirt 3.2 on Fedora 18: [wil@bufferoverflow ~]$ rpm -q vdsm vdsm-4.10.3-7.fc18.x86_64
(the engine is built from sources).
I seem to have hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=922515
This bug is only one part of the problem, but it's nasty enough that I have just suggested it as a fix to the ovirt-3.2 branch of vdsm: http://gerrit.ovirt.org/13303
Could you test if with it, vdsm relinquishes its spm role, and recovers as operational?
in the following configuration: Single host (no migrations) Created a VM, installed an OS inside (Fedora18) stopped the VM. created template from it. Created an additional VM from the template using thin provision. Started the second VM.
in addition to the errors in the logs the storage domains (both data and ISO) crashed, i.e went to "unknown" and "inactive" states respectively. (see the attached engine.log)
I attached the VDSM and engine logs.
is there a way to work around this problem? It happens repeatedly.
Yuval Meir
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
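A one-line sketch for bundling the requested logs with xz before attaching; the paths assume the default vdsm and sanlock log locations:

tar cJf storage-crash-logs.tar.xz /var/log/vdsm/vdsm.log /var/log/sanlock.log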

Attached are the sanlock, vdsm and dmesg logs. I recreated the crash with clean log files.
Yuval
On Sun, Mar 31, 2013 at 10:55 PM, Ayal Baron <abaron@redhat.com> wrote:
Can you attach the sanlock log and the full vdsm log? (compress it if it's too big and not xz yet) Thanks.
Any ideas on what can cause that storage crash? could it be related to using a SSD?
Thanks,
Yuval Meir
On Wed, Mar 27, 2013 at 6:08 PM, Yuval M < yuvalme@gmail.com > wrote:
Still getting crashes with the patch: # rpm -q vdsm vdsm-4.10.3-0.281.git97db188.fc18.x86_64
attached excerpts from vdsm.log and from dmesg.
Yuval
On Wed, Mar 27, 2013 at 11:02 AM, Dan Kenigsberg < danken@redhat.com > wrote:
On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
I am running vdsm from packages as my interest is in developing for the engine and not vdsm. I updated the vdsm package in an attempt to solve this, now I have: # rpm -q vdsm vdsm-4.10.3-10.fc18.x86_64
I'm afraid that this build still does not have the patch mentioned earlier.
I noticed that when the storage domain crashes I can't even do "df -h" (hangs)
That's expectable, since the master domain is still mounted (due to that patch missing), but unreachable.
Would you be kind to try out my little patch, in order to advance a bit in the research to solve the bug?
I'm also getting some errors in /var/log/messages:
Mar 24 19 :57:44 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19 :57:45 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19 :57:46 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19 :57:47 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19 :57:48 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19 :57:49 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19 :57:50 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19 :57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759 ]: 1083422e close_task_aio 0 0x7ff3740008c0 busy Mar 24 19 :57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759 ]: 1083422e close_task_aio 1 0x7ff374000910 busy Mar 24 19 :57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759 ]: 1083422e close_task_aio 2 0x7ff374000960 busy Mar 24 19 :57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412 [4759 ]: 1083422e close_task_aio 3 0x7ff3740009b0 busy Mar 24 19:57:51 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19:57:52 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19:57:53 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19:57:54 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19:57:55 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm failed [Errno 2] No such file or directory Mar 24 19:57:55 bufferoverflow vdsm Storage.Misc ERROR Panic: Couldn't connect to supervdsm Mar 24 19:57:55 bufferoverflow respawn: slave '/usr/share/vdsm/vdsm' died, respawning slave Mar 24 19:57:55 bufferoverflow vdsm fileUtils WARNING Dir /rhev/data-center/mnt already exists Mar 24 19:57:58 bufferoverflow vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed. 
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device: '{'device': u'unix', 'alias': u'channel0', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'1'}}' found
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device: '{'device': u'unix', 'alias': u'channel1', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'2'}}' found
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::_readPauseCode unsupported by libvirt vm
Mar 24 19:57:58 bufferoverflow kernel: [ 7402.688177] ata1: hard resetting link
Mar 24 19:57:59 bufferoverflow kernel: [ 7402.994510] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005510] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005517] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d48), AE_NOT_FOUND (20120711/psparse-536)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015485] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015493] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d48), AE_NOT_FOUND (20120711/psparse-536)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016061] ata1.00: configured for UDMA/133
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016066] ata1: EH complete
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422 [4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.714145] device-mapper: table: 253:0: multipath: error getting device
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.714148] device-mapper: ioctl: error adding target to table
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.715051] device-mapper: table: 253:0: multipath: error getting device
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.715053] device-mapper: ioctl: error adding target to table
ata1 is a 500GB SSD (the only SATA device on the system apart from a DVD drive).
Yuval
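One aside on the repeated "Connect to svdsm failed [Errno 2]" lines above: ENOENT there just means the supervdsm control socket file is missing while vdsm is being respawned. If you want to check supervdsm by hand, a small probe like the one below works; the socket path is an assumption, so verify it against your vdsm installation:

    # Check whether supervdsm is reachable on its UNIX control socket.
    # SVDSM_SOCK is an assumed location -- confirm it on your host.
    import os
    import socket

    SVDSM_SOCK = "/var/run/vdsm/svdsm.sock"  # assumption

    if not os.path.exists(SVDSM_SOCK):
        print("socket file missing -- matches the [Errno 2] in the log")
    else:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.settimeout(5)
        try:
            s.connect(SVDSM_SOCK)
            print("supervdsm is accepting connections")
        except socket.error as exc:
            print("socket exists but connect failed: %s" % exc)
        finally:
            s.close()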
On Sun, Mar 24, 2013 at 2:52 PM, Dan Kenigsberg <danken@redhat.com> wrote:
On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
Hello,
I am using Ovirt 3.2 on Fedora 18: [wil@bufferoverflow ~]$ rpm -q vdsm vdsm-4.10.3-7.fc18.x86_64
(the engine is built from sources).
I seem to have hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=922515
This bug is only one part of the problem, but it's nasty enough that I have just proposed a fix for it on the ovirt-3.2 branch of vdsm: http://gerrit.ovirt.org/13303
Could you test whether, with it applied, vdsm relinquishes its SPM role and recovers to an operational state?
in the following configuration: Single host (no migrations) Created a VM, installed an OS inside (Fedora18) stopped the VM. created template from it. Created an additional VM from the template using thin provision. Started the second VM.
in addition to the errors in the logs the storage domains (both data and ISO) crashed, i.e went to "unknown" and "inactive" states respectively. (see the attached engine.log)
I attached the VDSM and engine logs.
is there a way to work around this problem? It happens repeatedly.
Yuval Meir
I am seeing the errors appear even without any VM activity:
Apr 10 23:24:24 bufferoverflow vdsm TaskManager.Task ERROR Task=`04370dcb-a823-4485-8d7e-b4cfc75905a0`::Unexpected error
Apr 10 23:24:24 bufferoverflow vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Unknown pool id, pool not connected: ('5849b030-626e-47cb-ad90-3ce782d831b3',)", 'code': 309}}
Apr 10 23:24:24 bufferoverflow vdsm TaskManager.Task ERROR Task=`354009d2-e7c1-4558-b947-0e3a19ab5490`::Unexpected error
Apr 10 23:24:24 bufferoverflow vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Unknown pool id, pool not connected: ('5849b030-626e-47cb-ad90-3ce782d831b3',)", 'code': 309}}
Apr 10 23:24:25 bufferoverflow kernel: [ 9136.829062] ata1: hard resetting link
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.135381] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.146797] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.146805] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d70), AE_NOT_FOUND (20120711/psparse-536)
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.156747] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.156755] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880407c74d70), AE_NOT_FOUND (20120711/psparse-536)
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.157350] ata1.00: configured for UDMA/133
Apr 10 23:24:25 bufferoverflow kernel: [ 9137.157355] ata1: EH complete
Apr 10 23:24:29 bufferoverflow kernel: [ 9140.856010] device-mapper: table: 253:0: multipath: error getting device
Apr 10 23:24:29 bufferoverflow kernel: [ 9140.856013] device-mapper: ioctl: error adding target to table
Apr 10 23:24:29 bufferoverflow kernel: [ 9140.856534] device-mapper: table: 253:0: multipath: error getting device
Apr 10 23:24:29 bufferoverflow kernel: [ 9140.856536] device-mapper: ioctl: error adding target to table
Apr 10 23:24:29 bufferoverflow multipathd: dm-0: remove map (uevent)
Apr 10 23:24:29 bufferoverflow multipathd: dm-0: remove map (uevent)
Apr 10 23:24:29 bufferoverflow multipathd: dm-0: remove map (uevent)
Apr 10 23:24:29 bufferoverflow multipathd: dm-0: remove map (uevent)
Apr 10 23:24:29 bufferoverflow vdsm Storage.LVM WARNING lvm vgs failed: 5 [] [' Volume group "1083422e-a5db-41b6-b667-b9ef1ef244f0" not found']
Apr 10 23:24:29 bufferoverflow vdsm TaskManager.Task ERROR Task=`0248943a-6acd-496b-932e-b236920932f0`::Unexpected error
Apr 10 23:24:29 bufferoverflow vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Cannot find master domain: 'spUUID=5849b030-626e-47cb-ad90-3ce782d831b3, msdUUID=1083422e-a5db-41b6-b667-b9ef1ef244f0'", 'code': 304}}
Apr 10 23:24:29 bufferoverflow vdsm Storage.HSM WARNING disconnect sp: 5849b030-626e-47cb-ad90-3ce782d831b3 failed. Known pools {}
Apr 10 23:24:29 bufferoverflow vdsm Storage.LVM WARNING lvm vgs failed: 5 [] [' Volume group "a8286508-db45-40d7-8645-e573f6bacdc7" not found']
Apr 10 23:24:29 bufferoverflow vdsm TaskManager.Task ERROR Task=`c1ea06bf-8046-4cb4-88c7-00337051d713`::Unexpected error
Apr 10 23:24:29 bufferoverflow vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Storage domain does not exist: ('a8286508-db45-40d7-8645-e573f6bacdc7',)", 'code': 358}}
I have attached the full logs.
I suspect this is some Fedora <--> SSD bug that may be unrelated to oVirt. I'm going to work around it for now by moving my VM's storage to a magnetic disk on another server. Yuval Meir On Tue, Apr 9, 2013 at 11:31 AM, Federico Simoncelli <fsimonce@redhat.com> wrote:
----- Original Message -----
From: "Yuval M" <yuvalme@gmail.com> To: "Dan Kenigsberg" <danken@redhat.com> Cc: users@ovirt.org, "Nezer Zaidenberg" <nzaidenberg@mac.com> Sent: Friday, March 29, 2013 2:19:43 PM Subject: Re: [Users] VM crashes and doesn't recover
Any ideas on what can cause that storage crash? Could it be related to using an SSD?
What the logs say is that the I/O on the storage domain is failing (both the oop timeouts and the sanlock log point to that), and this is what triggers the VDSM restart.
On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
I am running vdsm from packages as my interest is in developing for the engine and not vdsm. [...] I noticed that when the storage domain crashes I can't even do "df -h" (hangs)
This is also consistent with the unreachable domain.
The dmesg log that you attached doesn't contain timestamps so it's hard to correlate with the rest.
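For what it's worth, the "[ 7405.714145 ]"-style prefixes are seconds since boot, so on the same boot they can be turned into approximate wall-clock times by adding them to the boot time (newer dmesg can also print human-readable times with -T, if your util-linux supports it). A rough sketch, not part of any oVirt tooling:

    # Convert a dmesg "[ seconds.micros ]" offset into an approximate wall-clock
    # time by adding it to the boot time. Only valid for the current boot, and
    # approximate (suspend and clock adjustments skew it).
    import re
    import time

    def boot_time():
        with open("/proc/uptime") as f:
            uptime = float(f.read().split()[0])
        return time.time() - uptime

    def dmesg_to_wallclock(line, bt):
        m = re.search(r"\[\s*(\d+\.\d+)\]", line)
        if m is None:
            return None
        return time.strftime("%Y-%m-%d %H:%M:%S",
                             time.localtime(bt + float(m.group(1))))

    if __name__ == "__main__":
        bt = boot_time()
        print(dmesg_to_wallclock("[ 7405.714145] device-mapper: table: 253:0: ...", bt))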
If you want you can try to reproduce the issue and resubmit the logs:
/var/log/vdsm/vdsm.log
/var/log/sanlock.log
/var/log/messages
(Maybe also state the exact time at which the issue begins to appear.)
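A tiny helper along these lines can bundle those three files into one archive before attaching them; it uses gzip to stay compatible with the stock Python, and it needs to run as root so /var/log/messages is readable:

    # Bundle the logs listed above into one timestamped tarball for attaching.
    import os
    import tarfile
    import time

    LOGS = [
        "/var/log/vdsm/vdsm.log",
        "/var/log/sanlock.log",
        "/var/log/messages",
    ]

    def collect(dest_dir="/tmp"):
        name = os.path.join(dest_dir,
                            "ovirt-logs-%s.tar.gz" % time.strftime("%Y%m%d-%H%M%S"))
        with tarfile.open(name, "w:gz") as tar:
            for path in LOGS:
                if os.path.exists(path):  # skip anything missing on this host
                    tar.add(path)
        return name

    if __name__ == "__main__":
        print("wrote %s" % collect())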
In the logs I noticed that you're using only one NFS domain, and I think that the SSD (on the storage side) shouldn't be a problem. When you experience such failure are you able to read/write from/to the SSD on machine that is serving the share? (If it's the same machine check that using the "real" path where it's mounted, not the nfs share)
-- Federico
participants (7)
- Ayal Baron
- Dafna Ron
- Dan Kenigsberg
- Federico Simoncelli
- Limor Gavish
- Maor Lipchuk
- Yuval M