[Users] VM crashes and doesn't recover
Dafna Ron
dron at redhat.com
Sun Mar 24 09:56:33 UTC 2013
https://bugzilla.redhat.com/show_bug.cgi?id=890365
Try restarting the vdsm service; you had a problem with the storage, and vdsm did not recover properly.
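On Fedora 18 vdsm runs under systemd, so the restart would look roughly like this; a sketch assuming the usual unit name (vdsmd) and default log path, neither of which is confirmed in this thread:

```shell
# Restart vdsm and check that it comes back up cleanly.
# (vdsmd is the usual systemd unit name; adjust if yours differs.)
sudo systemctl restart vdsmd
sudo systemctl status vdsmd

# Then watch the vdsm log to confirm the domain monitor recovers.
tail -n 50 /var/log/vdsm/vdsm.log
```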
On 03/24/2013 11:40 AM, Yuval M wrote:
> sanlock is at the latest version (this solved another problem we had a
> few days ago):
>
> $ rpm -q sanlock
> sanlock-2.6-7.fc18.x86_64
>
> the storage is on the same machine as the engine and vdsm.
> iptables is up but there is a rule to allow all localhost traffic.
>
>
> On Sun, Mar 24, 2013 at 11:34 AM, Maor Lipchuk <mlipchuk at redhat.com
> <mailto:mlipchuk at redhat.com>> wrote:
>
> From the VDSM log, it seems that the master storage domain was not
> responding.
>
> Thread-23::DEBUG::2013-03-22
> 18:50:20,263::domainMonitor::216::Storage.DomainMonitorThread::(_monitorDomain)
> Domain 1083422e-a5db-41b6-b667-b9ef1ef244f0 changed its status to
> Invalid
> ....
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/domainMonitor.py", line 186, in
> _monitorDomain
> self.domain.selftest()
> File "/usr/share/vdsm/storage/nfsSD.py", line 108, in selftest
> fileSD.FileStorageDomain.selftest(self)
> File "/usr/share/vdsm/storage/fileSD.py", line 480, in selftest
> self.oop.os.statvfs(self.domaindir)
> File "/usr/share/vdsm/storage/remoteFileHandler.py", line 280, in
> callCrabRPCFunction
> *args, **kwargs)
> File "/usr/share/vdsm/storage/remoteFileHandler.py", line 180, in
> callCrabRPCFunction
> rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
> File "/usr/share/vdsm/storage/remoteFileHandler.py", line 146,
> in _recvAll
> raise Timeout()
> Timeout
> .....
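The selftest that times out above boils down to a statvfs() call on the storage domain directory, and `stat -f` issues the same call from the shell. A hedged way to reproduce the check by hand; the mount path below is a placeholder following vdsm's usual layout, not taken from the logs:

```shell
# stat -f performs the same statvfs() call as vdsm's selftest.
# Replace the path with the actual storage domain mount point.
DOMAIN_DIR=/rhev/data-center/mnt/<server:_export>/1083422e-a5db-41b6-b667-b9ef1ef244f0
stat -f "$DOMAIN_DIR"
# If this hangs instead of returning immediately, the NFS server is
# not responding, which matches the Timeout in the traceback above.
```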
>
>     I also see a sanlock issue, but I think that is because the storage
>     could not be reached:
> ReleaseHostIdFailure: Cannot release host id:
> ('1083422e-a5db-41b6-b667-b9ef1ef244f0', SanlockException(16, 'Sanlock
> lockspace remove failure', 'Device or resource busy'))
>
>     Can you check whether iptables is running on your host, and if so,
>     whether it is by any chance blocking the storage server?
>     Can you try to mount this NFS export manually and see if it works?
>     Is it possible the storage server has connectivity issues?
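The checks suggested above can be run roughly as follows; the export path and mount point are placeholders (the thread only says the storage is on the same machine, hence localhost), and note that NFSv3 also needs the portmapper and mountd ports open, not just 2049:

```shell
# 1. Is iptables active, and do any rules look like they could block NFS?
sudo iptables -L -n -v

# 2. Try the mount by hand (export path and mount point are examples).
sudo mkdir -p /mnt/nfs-check
sudo mount -t nfs localhost:/path/to/export /mnt/nfs-check

# 3. If the mount works, simple filesystem queries should return fast:
df -h /mnt/nfs-check
sudo umount /mnt/nfs-check
```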
>
>
> Regards,
> Maor
>
> On 03/22/2013 08:24 PM, Limor Gavish wrote:
> > Hello,
> >
> > I am using Ovirt 3.2 on Fedora 18:
> > [wil at bufferoverflow ~]$ rpm -q vdsm
> > vdsm-4.10.3-7.fc18.x86_64
> >
> > (the engine is built from sources).
> >
> > I seem to have hit this bug:
> > https://bugzilla.redhat.com/show_bug.cgi?id=922515
> >
> > in the following configuration:
> > Single host (no migrations)
> > Created a VM, installed an OS inside (Fedora18)
> > stopped the VM.
> > created template from it.
> > Created an additional VM from the template using thin provision.
> > Started the second VM.
> >
>     > In addition to the errors in the logs, the storage domains (both
>     > data and ISO) crashed, i.e., went to "unknown" and "inactive"
>     > states respectively.
> > (see the attached engine.log)
> >
> > I attached the VDSM and engine logs.
> >
> > is there a way to work around this problem?
> > It happens repeatedly.
> >
> > Yuval Meir
> >
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users at ovirt.org <mailto:Users at ovirt.org>
> > http://lists.ovirt.org/mailman/listinfo/users
> >
>
--
Dafna Ron