[ovirt-users] VM get stuck randomly
Christophe TREFOIS
christophe.trefois at uni.lu
Thu Mar 24 05:45:34 EDT 2016
Hi,
We finally upgraded to 3.6.3 across the whole data center and will now see if this issue reappears.
The upgrade went quite smoothly, first from 3.5.4 to 3.5.6 and then to 3.6.3.
Thank you,
--
Christophe
> -----Original Message-----
> From: Nir Soffer [mailto:nsoffer at redhat.com]
> Sent: Sunday, 13 March 2016 12:51
> To: Christophe TREFOIS <christophe.trefois at uni.lu>
> Cc: users <users at ovirt.org>
> Subject: Re: [ovirt-users] VM get stuck randomly
>
> On Sun, Mar 13, 2016 at 9:46 AM, Christophe TREFOIS
> <christophe.trefois at uni.lu> wrote:
> > Dear all,
> >
> > For a couple of weeks now I have had a problem where, at random, one VM
> > (not always the same one) becomes completely unresponsive.
> > We find this out because our Icinga server complains that the host is down.
> >
> > Upon inspection, we find we can't open a console to the VM, nor can we
> > log in.
> >
> > In the oVirt engine, the VM still looks "up". The only odd thing is that
> > RAM usage shows 0% and CPU usage shows 100% or 75%, depending on the
> > number of cores.
> > The only way to recover is to force a shutdown of the VM by issuing the
> > shutdown command twice from the engine.
> >
> > Could you please help me to start debugging this?
> > I can provide any logs, but I'm not sure which ones, because I couldn't
> > see anything with ERROR in the vdsm logs on the host.
>
> I would inspect this VM on the host when it happens.
>
> What is vdsm's CPU usage? What is the CPU usage of the qemu process for
> this VM?
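>
> Something along these lines (a rough, untested sketch) could collect those
> numbers. It assumes the qemu command line contains "-name <vm-name>", as
> vdsm/libvirt normally set it, and that ps is available on the host:
>
> #!/usr/bin/env python
> # Untested sketch: print the CPU usage of vdsm and of the qemu process
> # running a given VM, as reported by ps.
> import subprocess
> import sys
>
> def show_cpu(match):
>     out = subprocess.check_output(["ps", "-eo", "pid,pcpu,args"],
>                                   universal_newlines=True)
>     for line in out.splitlines()[1:]:          # skip the ps header line
>         pid, pcpu, args = line.strip().split(None, 2)
>         if match in args:
>             print("%s  %s%%  %s" % (pid, pcpu, args[:60]))
>
> if __name__ == "__main__":
>     vm_name = sys.argv[1]                      # e.g. python vm_cpu.py myvm
>     show_cpu("vdsm")                           # vdsm processes
>     show_cpu("-name %s" % vm_name)             # qemu process for this VM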
>
> strace output of this qemu process (all threads) or a core dump can help
> the qemu developers understand this issue.
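>
> For example, something like this untested sketch: attach strace to all
> threads of the stuck qemu process for about a minute, then take a core
> dump with gcore. It assumes strace, gdb (which provides gcore) and the
> coreutils timeout command are installed on the host, and that <pid> is
> the qemu process id found above:
>
> #!/usr/bin/env python
> # Untested sketch: strace all threads of a stuck qemu process, then take
> # a core dump with gcore (the process keeps running).
> import subprocess
> import sys
>
> pid = sys.argv[1]               # qemu pid, e.g. python qemu_debug.py 12345
>
> # Capture ~60 seconds of syscalls: -f attaches to all threads of the
> # process, -tt adds timestamps; timeout stops strace after a minute.
> subprocess.call(["timeout", "60", "strace", "-f", "-tt", "-p", pid,
>                  "-o", "/tmp/qemu-%s.strace" % pid])
>
> # gcore (from gdb) writes /tmp/qemu-<pid>.<pid> without killing the process.
> subprocess.call(["gcore", "-o", "/tmp/qemu-%s" % pid, pid])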
>
> >
> > The host is running
> >
> > OS Version: RHEL - 7 - 1.1503.el7.centos.2.8
> > Kernel Version: 3.10.0 - 229.14.1.el7.x86_64
> > KVM Version: 2.1.2 - 23.el7_1.8.1
> > LIBVIRT Version: libvirt-1.2.8-16.el7_1.4
> > VDSM Version: vdsm-4.16.26-0.el7.centos
> > SPICE Version: 0.12.4 - 9.el7_1.3
> > GlusterFS Version: glusterfs-3.7.5-1.el7
>
> You are running old versions and are missing a lot of fixes. Nothing
> specific to your problem, but it lowers the chances of getting a working
> system.
>
> It would be nice if you could upgrade to ovirt-3.6 and report whether it
> made any difference, or at least to the latest ovirt-3.5.
>
> > We use a locally exported gluster volume as the storage domain (i.e. the
> > storage is on the same machine, exposed via gluster). No replica.
> > We run around 50 VMs on that host.
>
> Why use gluster for this? Do you plan to add more gluster servers in the
> future?
>
> Nir