[Users] VM crashes and doesn't recover
Dafna Ron
dron at redhat.com
Sun Mar 24 09:56:33 UTC 2013
https://bugzilla.redhat.com/show_bug.cgi?id=890365
Try restarting the vdsm service; you had a problem with the storage, and vdsm did not recover properly.
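On Fedora 18 vdsm runs under systemd, so the restart would look roughly like this; a sketch assuming the usual unit name (vdsmd) and default log path, neither of which is confirmed in this thread:

```shell
# Restart vdsm and check that it comes back up cleanly.
# (vdsmd is the usual systemd unit name; adjust if yours differs.)
sudo systemctl restart vdsmd
sudo systemctl status vdsmd

# Then watch the vdsm log to confirm the domain monitor recovers.
tail -n 50 /var/log/vdsm/vdsm.log
```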
On 03/24/2013 11:40 AM, Yuval M wrote:
> sanlock is at the latest version (this solved another problem we had a
> few days ago):
>
> $ rpm -q sanlock
> sanlock-2.6-7.fc18.x86_64
>
> the storage is on the same machine as the engine and vdsm.
> iptables is up but there is a rule to allow all localhost traffic.
>
>
> On Sun, Mar 24, 2013 at 11:34 AM, Maor Lipchuk <mlipchuk at redhat.com
> <mailto:mlipchuk at redhat.com>> wrote:
>
> From the VDSM log, it seems that the master storage domain was not
> responding.
>
> Thread-23::DEBUG::2013-03-22
> 18:50:20,263::domainMonitor::216::Storage.DomainMonitorThread::(_monitorDomain)
> Domain 1083422e-a5db-41b6-b667-b9ef1ef244f0 changed its status to
> Invalid
> ....
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/domainMonitor.py", line 186, in
> _monitorDomain
> self.domain.selftest()
> File "/usr/share/vdsm/storage/nfsSD.py", line 108, in selftest
> fileSD.FileStorageDomain.selftest(self)
> File "/usr/share/vdsm/storage/fileSD.py", line 480, in selftest
> self.oop.os.statvfs(self.domaindir)
> File "/usr/share/vdsm/storage/remoteFileHandler.py", line 280, in
> callCrabRPCFunction
> *args, **kwargs)
> File "/usr/share/vdsm/storage/remoteFileHandler.py", line 180, in
> callCrabRPCFunction
> rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
> File "/usr/share/vdsm/storage/remoteFileHandler.py", line 146,
> in _recvAll
> raise Timeout()
> Timeout
> .....
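The selftest that times out above boils down to a statvfs() call on the storage domain directory, and `stat -f` issues the same call from the shell. A hedged way to reproduce the check by hand; the mount path below is a placeholder following vdsm's usual layout, not taken from the logs:

```shell
# stat -f performs the same statvfs() call as vdsm's selftest.
# Replace the path with the actual storage domain mount point.
DOMAIN_DIR=/rhev/data-center/mnt/<server:_export>/1083422e-a5db-41b6-b667-b9ef1ef244f0
stat -f "$DOMAIN_DIR"
# If this hangs instead of returning immediately, the NFS server is
# not responding, which matches the Timeout in the traceback above.
```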
>
>     I also see a sanlock issue, but I think that is because the storage
>     could not be reached:
> ReleaseHostIdFailure: Cannot release host id:
> ('1083422e-a5db-41b6-b667-b9ef1ef244f0', SanlockException(16, 'Sanlock
> lockspace remove failure', 'Device or resource busy'))
>
>     Can you check whether iptables is running on your host, and if so,
>     whether it is by any chance blocking the storage server?
>     Can you try to mount this NFS export manually and see if it works?
>     Is it possible the storage server has connectivity issues?
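The checks suggested above can be run roughly as follows; the export path and mount point are placeholders (the thread only says the storage is on the same machine, hence localhost), and note that NFSv3 also needs the portmapper and mountd ports open, not just 2049:

```shell
# 1. Is iptables active, and do any rules look like they could block NFS?
sudo iptables -L -n -v

# 2. Try the mount by hand (export path and mount point are examples).
sudo mkdir -p /mnt/nfs-check
sudo mount -t nfs localhost:/path/to/export /mnt/nfs-check

# 3. If the mount works, simple filesystem queries should return fast:
df -h /mnt/nfs-check
sudo umount /mnt/nfs-check
```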
>
>
> Regards,
> Maor
>
> On 03/22/2013 08:24 PM, Limor Gavish wrote:
> > Hello,
> >
> > I am using Ovirt 3.2 on Fedora 18:
> > [wil at bufferoverflow ~]$ rpm -q vdsm
> > vdsm-4.10.3-7.fc18.x86_64
> >
> > (the engine is built from sources).
> >
> > I seem to have hit this bug:
> > https://bugzilla.redhat.com/show_bug.cgi?id=922515
> >
> > in the following configuration:
> > Single host (no migrations)
> > Created a VM, installed an OS inside (Fedora18)
> > stopped the VM.
> > created template from it.
> > Created an additional VM from the template using thin provision.
> > Started the second VM.
> >
>     > In addition to the errors in the logs, the storage domains (both
>     > data and ISO) crashed, i.e., went to "unknown" and "inactive"
>     > states respectively.
> > (see the attached engine.log)
> >
> > I attached the VDSM and engine logs.
> >
> > is there a way to work around this problem?
> > It happens repeatedly.
> >
> > Yuval Meir
> >
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users at ovirt.org <mailto:Users at ovirt.org>
> > http://lists.ovirt.org/mailman/listinfo/users
> >
>
--
Dafna Ron