[Users] VM crashes and doesn't recover

Maor Lipchuk mlipchuk at redhat.com
Sun Mar 24 09:34:13 UTC 2013


From the VDSM log, it seems that the master storage domain was not
responding.

Thread-23::DEBUG::2013-03-22 18:50:20,263::domainMonitor::216::Storage.DomainMonitorThread::(_monitorDomain) Domain 1083422e-a5db-41b6-b667-b9ef1ef244f0 changed its status to Invalid
....
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 186, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/nfsSD.py", line 108, in selftest
    fileSD.FileStorageDomain.selftest(self)
  File "/usr/share/vdsm/storage/fileSD.py", line 480, in selftest
    self.oop.os.statvfs(self.domaindir)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 280, in callCrabRPCFunction
    *args, **kwargs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 180, in callCrabRPCFunction
    rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
    rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 146, in _recvAll
    raise Timeout()
Timeout
.....
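To pull every status transition like the one above out of a long vdsm.log (typically under /var/log/vdsm/), a minimal Python sketch can match the `::`-separated layout. The field layout is inferred only from the single DEBUG line quoted here, so treat the regular expression as an assumption, not VDSM's documented log format; `parse_status_change` and `StatusChange` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from one quoted line only (not a documented format):
# <thread>::<LEVEL>::<date time,ms>::<module>::<lineno>::<logger>::(<function>) <message>
LOG_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::(?P<logger>[^:]+)::"
    r"\((?P<func>[^)]*)\) (?P<message>.*)$"
)
# Message text of the domain monitor's status-change line, as quoted above.
STATUS_RE = re.compile(
    r"Domain (?P<uuid>[0-9a-f-]{36}) changed its status to (?P<status>\w+)"
)


class StatusChange(NamedTuple):
    timestamp: str
    domain: str
    status: str


def parse_status_change(line: str) -> Optional[StatusChange]:
    """Return the status change recorded on `line`, or None if it is not one."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    s = STATUS_RE.search(m.group("message"))
    if s is None:
        return None
    return StatusChange(m.group("timestamp"), s.group("uuid"), s.group("status"))


if __name__ == "__main__":
    sample = (
        "Thread-23::DEBUG::2013-03-22 18:50:20,263::domainMonitor::216::"
        "Storage.DomainMonitorThread::(_monitorDomain) "
        "Domain 1083422e-a5db-41b6-b667-b9ef1ef244f0 changed its status to Invalid"
    )
    print(parse_status_change(sample))
```

Running it over each line of the log and keeping the non-None results gives a timeline of when each domain went Invalid and whether it ever came back.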

I also see a sanlock issue, but I think that is because the storage
could not be reached:
ReleaseHostIdFailure: Cannot release host id: ('1083422e-a5db-41b6-b667-b9ef1ef244f0', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))

Can you check whether iptables is running on your host and, if so,
whether it is blocking the storage server?
Can you try mounting this NFS export manually and see if it works?
Is it possible the storage server had connectivity issues?
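For the firewall and connectivity questions, one quick check from the host is whether a TCP connection to the NFS server completes. This sketch assumes the server uses the standard NFS port 2049; NFSv3 also relies on rpcbind (port 111) and mountd, whose port may vary, so "open" here does not by itself prove the mount will work. `probe_tcp` is a hypothetical helper.

```python
import socket

NFS_PORT = 2049  # standard NFS port (assumption: the server uses the default)


def probe_tcp(host: str, port: int = NFS_PORT, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as open, refused, timeout, or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"  # often what silently dropped packets look like
    except ConnectionRefusedError:
        return "refused"  # nothing listening, or a reject rule answering
    except OSError as exc:
        return f"error: {exc}"  # e.g. no route to host, name resolution failure
```

Calling `probe_tcp("<storage server>")` before and after temporarily stopping iptables (and comparing the results) is one way to see whether the local firewall is involved.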


Regards,
Maor

On 03/22/2013 08:24 PM, Limor Gavish wrote:
> Hello,
> 
> I am using oVirt 3.2 on Fedora 18:
> [wil at bufferoverflow ~]$ rpm -q vdsm
> vdsm-4.10.3-7.fc18.x86_64
> 
> (the engine is built from sources).
> 
> I seem to have hit this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=922515
> 
> in the following configuration:
> Single host (no migrations).
> Created a VM and installed an OS inside it (Fedora 18).
> Stopped the VM.
> Created a template from it.
> Created an additional VM from the template using thin provisioning.
> Started the second VM.
> 
> In addition to the errors in the logs, the storage domains (both data and
> ISO) crashed, i.e. went to "unknown" and "inactive" states respectively
> (see the attached engine.log).
> 
> I attached the VDSM and engine logs.
> 
> Is there a way to work around this problem?
> It happens repeatedly.
> 
> Yuval Meir
> 
