On Sun, Mar 13, 2016 at 1:14 PM, Christophe TREFOIS <christophe.trefois(a)uni.lu> wrote:
Hi Yaniv,
See my answers / questions below under [CT].
*From:* Yaniv Kaul [mailto:ykaul@redhat.com]
*Sent:* Sunday, March 13, 2016 12:08
*To:* Christophe TREFOIS <christophe.trefois(a)uni.lu>
*Cc:* users <users(a)ovirt.org>
*Subject:* Re: [ovirt-users] VM get stuck randomly
On Sun, Mar 13, 2016 at 9:46 AM, Christophe TREFOIS <christophe.trefois(a)uni.lu> wrote:
Dear all,
I have had a problem for a couple of weeks now where, at random, one VM
(not always the same one) becomes completely unresponsive.
We find this out because our Icinga server complains that the host is down.
Upon inspection, we find we can’t open a console to the VM, nor can we
log in.
I assume 3.6's console feature, or is it Spice/VNC?
*[CT]*
This is 3.5, VNC/Spice yes. Sometimes we can connect, but there’s no way
to do anything, e.g. type or interact at all.
In the oVirt engine, the VM looks “up”. The only weird thing is that RAM
usage shows 0% and CPU usage shows 100% or 75%, depending on the number
of cores.
Any chance there's really something bad going on within the VM? Anything
in its journal or /var/log/messages or ... depending on the OS?
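If it helps, something along these lines run inside the guest would surface
hung-task / soft-lockup / OOM messages - just a rough sketch, the log paths
and patterns are guesses, adjust them to the guest's distro:

    # scan the guest's logs for signs that its kernel stalled
    # (paths and patterns are guesses -- adjust for the guest's distro)
    import io, os, re

    SUSPECT = re.compile(r'hung_task|soft lockup|blocked for more than'
                         r'|Out of memory|I/O error', re.I)

    for path in ('/var/log/syslog', '/var/log/messages', '/var/log/kern.log'):
        if not os.path.exists(path):
            continue
        with io.open(path, errors='replace') as f:
            for lineno, line in enumerate(f, 1):
                if SUSPECT.search(line):
                    print('%s:%d: %s' % (path, lineno, line.rstrip()))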
Y.
*[CT]*
*It is possible. It seems to be mostly VMs with Ubuntu 14.04 and the latest
kernels. I read somewhere, though I can’t find it now, that there’s perhaps
a bug in the 3.x kernel with regard to libvirt / vdsm. But my knowledge is
too limited to even know where to begin the investigation :)*
*In the VM logs, we just see normal VM activity, then nothing, and then,
after the VM was rebooted, a couple of lines of repeating ^@^@^@ (null)
characters. But nothing else really.*
*Initially we thought it was a bug with aufs on Docker, but the machines
getting stuck now don’t run either of those.*
*From your answer, I deduce that if vdsm, libvirt, or the SPM saw a
problem with storage / memory / CPU, it would suspend the VM and report
that to ovirt-engine?*
*Since that is not happening, you think it could be related to the inside
of the VM rather than to the oVirt environment, correct?*
Either that, or to libvirt/QEMU.
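One quick host-side check, just a sketch using the libvirt Python bindings,
is to ask libvirt what state it thinks the domain is in and why - a guest
paused on an I/O error would show up here even if the engine still paints
it green ('myvm' below is a placeholder for the stuck guest's name):

    # ask libvirt what state the domain is in and why
    # (sketch; run on the host as root, replace 'myvm' with the VM name)
    import libvirt

    STATES = {libvirt.VIR_DOMAIN_RUNNING: 'running',
              libvirt.VIR_DOMAIN_BLOCKED: 'blocked',
              libvirt.VIR_DOMAIN_PAUSED:  'paused',
              libvirt.VIR_DOMAIN_SHUTOFF: 'shut off'}

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('myvm')
    state, reason = dom.state()   # returns (state, reason)
    print('state=%s reason=%d' % (STATES.get(state, state), reason))
    # for a paused domain, reason == libvirt.VIR_DOMAIN_PAUSED_IOERROR
    # means QEMU froze the guest after a storage error
    conn.close()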
I suggest, if possible, upgrading the components first to newer versions
(as Nir suggested).
Y.
*Thank you for your help (especially on a Sunday) :)*
The only way to recover is to force the VM off by issuing shutdown twice
from the engine.
Could you please help me to start debugging this?
I can provide any logs, but I’m not sure which ones, because I couldn’t
see anything with ERROR in the vdsm logs on the host.
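I can run something like the sketch below against /var/log/vdsm/vdsm.log
(the default path here) and send whatever it finds - the patterns are just
my guesses at the interesting lines:

    # pull warnings and libvirt-related lines out of vdsm.log
    # (sketch; default log path assumed, patterns are guesses)
    import io, re

    INTERESTING = re.compile(r'WARN|ERROR|Traceback|libvirtEventLoop'
                             r'|abnormal vm stop', re.I)

    with io.open('/var/log/vdsm/vdsm.log', errors='replace') as f:
        for line in f:
            if INTERESTING.search(line):
                print(line.rstrip())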
The host is running
OS Version: RHEL - 7 - 1.1503.el7.centos.2.8
Kernel Version: 3.10.0 - 229.14.1.el7.x86_64
KVM Version: 2.1.2 - 23.el7_1.8.1
LIBVIRT Version: libvirt-1.2.8-16.el7_1.4
VDSM Version: vdsm-4.16.26-0.el7.centos
SPICE Version: 0.12.4 - 9.el7_1.3
GlusterFS Version: glusterfs-3.7.5-1.el7
We use a locally exported Gluster volume as the storage domain (i.e., the
storage is on the same machine, exposed via Gluster). No replica.
We run around 50 VMs on that host.
Thank you for your help in this,
—
Christophe
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users