Re: [ovirt-users] VM get stuck randomly

24 Mar 2016

Dear list,

An Ubuntu 14.04 VM got stuck again on the latest 3.6.4 with all patches applied.

Do you have any advice for me now, to try and figure out what could be wrong?

Is anybody else facing issues with Ubuntu 14.04 and kernel 3.13.0-79-generic?

Thank you,

--
Christophe

Dr Christophe Trefois, Dipl.-Ing.  
Technical Specialist / Post-Doc

UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
Campus Belval | House of Biomedicine  
6, avenue du Swing 
L-4367 Belvaux  
T: +352 46 66 44 6124 
F: +352 46 66 44 6949  
http://www.uni.lu/lcsb

...
On 24 Mar 2016, at 10:45, Christophe TREFOIS <christophe.trefois@uni.lu> wrote:
Hi,
We finally upgraded to 3.6.3 across the whole data center and will now see if this issue reappears.
The upgrade went quite smoothly, first from 3.5.4 to 3.5.6 and then to 3.6.3.
Thank you,
--
Christophe
...
-----Original Message-----
From: Nir Soffer [mailto:nsoffer@redhat.com]
Sent: Sunday 13 March 2016 12:51
To: Christophe TREFOIS <christophe.trefois@uni.lu>
Cc: users <users@ovirt.org>
Subject: Re: [ovirt-users] VM get stuck randomly
...
On Sun, Mar 13, 2016 at 9:46 AM, Christophe TREFOIS
<christophe.trefois@uni.lu> wrote:
Dear all,
I have had a problem for a couple of weeks, where randomly 1 VM (not always
the same) becomes completely unresponsive.
...
We find this out because our Icinga server complains that the host is down.
Upon inspection, we find we can’t open a console to the VM, nor can we
log in.
In the oVirt engine, the VM shows as “up”. The only weird thing is that RAM
usage shows 0% and CPU usage shows 100% or 75% depending on the number of
cores.
The only way to recover is to force a shutdown of the VM by issuing shutdown
twice from the engine.
Could you please help me to start debugging this?
I can provide any logs, but I’m not sure which ones, because I couldn’t see
anything with ERROR in the vdsm logs on the host.
I would inspect this VM on the host when it happens.
What is the vdsm CPU usage? What is the CPU usage of the qemu process for
this VM?
strace output of this qemu process (all threads) or a core dump can help
qemu developers to understand this issue.
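As a rough illustration of the check suggested above, here is a minimal Python 3 sketch that samples per-thread CPU usage of one process from /proc on the host. It is not part of vdsm or oVirt; the function names (parse_stat, thread_ticks, cpu_percent) and the 5-second interval are assumptions made for this example, and you need to look up the PID of the stuck VM's qemu process yourself.

```python
import os
import sys
import time


def parse_stat(stat_line):
    """Return (comm, state, utime + stime in clock ticks) from a /proc stat line."""
    # comm is wrapped in parentheses and may contain spaces, so split on the
    # last ')' instead of splitting the whole line on whitespace.
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # fields[0] is the state (stat field 3); utime and stime are fields 14, 15.
    state = fields[0]
    utime, stime = int(fields[11]), int(fields[12])
    return comm, state, utime + stime


def thread_ticks(pid):
    """Map each thread ID of pid to (comm, state, cpu ticks)."""
    result = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                result[int(tid)] = parse_stat(f.read())
        except OSError:
            # The thread exited between listdir() and open().
            continue
    return result


def cpu_percent(before, after, interval, hz):
    """Per-thread CPU percentage between two thread_ticks() samples."""
    usage = {}
    for tid, (comm, state, ticks) in after.items():
        if tid in before:
            delta = ticks - before[tid][2]
            usage[tid] = (comm, state, 100.0 * delta / (hz * interval))
    return usage


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])          # PID of the qemu process for the stuck VM
    interval = 5.0                  # seconds between samples (arbitrary choice)
    hz = os.sysconf("SC_CLK_TCK")   # clock ticks per second
    first = thread_ticks(pid)
    time.sleep(interval)
    second = thread_ticks(pid)
    report = cpu_percent(first, second, interval, hz)
    for tid, (comm, state, pct) in sorted(report.items()):
        print("%7d %-16s %s %6.1f%%" % (tid, comm, state, pct))
```

Run it on the host with the qemu PID as the only argument. A thread that stays near 100% in state R across several runs is a reasonable one to look at more closely with strace.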
...
The host is running
OS Version:         RHEL - 7 - 1.1503.el7.centos.2.8
Kernel Version:     3.10.0 - 229.14.1.el7.x86_64
KVM Version:        2.1.2 - 23.el7_1.8.1
LIBVIRT Version:    libvirt-1.2.8-16.el7_1.4
VDSM Version:       vdsm-4.16.26-0.el7.centos
SPICE Version:      0.12.4 - 9.el7_1.3
GlusterFS Version:  glusterfs-3.7.5-1.el7
You are running old versions, missing a lot of fixes. Nothing specific to your
problem, but this lowers the chance of getting a working system.
It would be nice if you could upgrade to ovirt-3.6 and report whether it made
any difference.
Or at least to the latest ovirt-3.5.
...
We use a locally exported gluster as storage domain (i.e., storage is on the
same machine, exposed via gluster). No replica.
We run around 50 VMs on that host.
Why use gluster for this? Do you plan to add more gluster servers in the
future?
Nir

Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users