[ovirt-users] Some VMs in status "not responding" in oVirt interface

Christian Hailer christian at hailer.eu
Wed Sep 9 11:24:14 UTC 2015


Hello,

Unfortunately I still have this problem.

Last week I checked all the hardware components. The server is an HP DL580 Gen8 with 128 GB RAM and 4 TB of storage. The firmware of all components is up to date. I ran a full check of all hard drives, CPUs, etc., and no problems were detected.

Last night 3 VMs stopped responding again, so I had to reboot the server this morning to regain access. A few minutes ago 2 VMs stopped responding.

The logs only show that the VMs aren’t responding anymore, nothing else. Does anybody have an idea how I can debug this issue any further?

OS: CentOS Linux release 7.1.1503

>rpm -qa|grep ovirt
ovirt-iso-uploader-3.5.2-1.el7.centos.noarch
ovirt-engine-setup-3.5.4.2-1.el7.centos.noarch
ovirt-guest-tools-iso-3.5-7.noarch
ovirt-log-collector-3.5.4-2.el7.centos.noarch
ovirt-engine-userportal-3.5.4.2-1.el7.centos.noarch
ovirt-engine-cli-3.5.0.6-1.el7.centos.noarch
ovirt-engine-tools-3.5.4.2-1.el7.centos.noarch
ovirt-release35-005-1.noarch
ovirt-engine-lib-3.5.4.2-1.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.5.4.2-1.el7.centos.noarch
ovirt-host-deploy-java-1.3.2-1.el7.centos.noarch
ovirt-engine-extensions-api-impl-3.5.4.2-1.el7.centos.noarch
ovirt-engine-webadmin-portal-3.5.4.2-1.el7.centos.noarch
ovirt-engine-restapi-3.5.4.2-1.el7.centos.noarch
ovirt-engine-setup-base-3.5.4.2-1.el7.centos.noarch
ovirt-engine-backend-3.5.4.2-1.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.5.4.2-1.el7.centos.noarch
ovirt-host-deploy-1.3.2-1.el7.centos.noarch
ovirt-engine-websocket-proxy-3.5.4.2-1.el7.centos.noarch
ovirt-engine-dbscripts-3.5.4.2-1.el7.centos.noarch
ovirt-engine-jboss-as-7.1.1-1.el7.x86_64
ovirt-engine-sdk-python-3.5.4.0-1.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.5.4.2-1.el7.centos.noarch
ovirt-image-uploader-3.5.1-1.el7.centos.noarch
ovirt-engine-3.5.4.2-1.el7.centos.noarch

>rpm -qa|grep vdsm
vdsm-python-4.16.26-0.el7.centos.noarch
vdsm-jsonrpc-java-1.0.15-1.el7.noarch
vdsm-jsonrpc-4.16.26-0.el7.centos.noarch
vdsm-yajsonrpc-4.16.26-0.el7.centos.noarch
vdsm-xmlrpc-4.16.26-0.el7.centos.noarch
vdsm-cli-4.16.26-0.el7.centos.noarch
vdsm-4.16.26-0.el7.centos.x86_64
vdsm-python-zombiereaper-4.16.26-0.el7.centos.noarch

>rpm -qa|grep kvm
qemu-kvm-ev-2.1.2-23.el7_1.8.1.x86_64
qemu-kvm-common-ev-2.1.2-23.el7_1.8.1.x86_64
libvirt-daemon-kvm-1.2.8-16.el7_1.3.x86_64
qemu-kvm-tools-ev-2.1.2-23.el7_1.8.1.x86_64

>uname -a
Linux ovirt 3.10.0-229.11.1.el7.x86_64 #1 SMP Thu Aug 6 01:06:18 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Any feedback is much appreciated!

Best regards,
Christian

From: users-bounces at ovirt.org [mailto:users-bounces at ovirt.org] On Behalf Of Christian Hailer
Sent: Saturday, 29 August 2015 22:48
To: users at ovirt.org
Subject: [ovirt-users] Some VMs in status "not responding" in oVirt interface

Hello,

Last Wednesday I wanted to update my oVirt 3.5 hypervisor. It is a single CentOS 7 server, so I started by suspending the VMs in order to set the oVirt engine host to maintenance mode. While the VMs were being suspended, the server crashed with a kernel panic.

After restarting the server I installed the updates via yum and restarted the server again. Afterwards, all the VMs could be started again. Some hours later my monitoring system registered some unresponsive hosts. I had a look in the oVirt interface: 3 of the VMs were in the state “not responding”, marked by a question mark.

I tried to shut down the VMs, but oVirt wasn’t able to do so. I tried to reset the status in the database with the SQL statement

update vm_dynamic set status = 0 where vm_guid = (select vm_guid from vm_static where vm_name = 'MYVMNAME');

but that didn’t help either. Only rebooting the whole hypervisor helped; afterwards everything worked again, but only for a few hours. Then one of the VMs entered the “not responding” state again, and again only a reboot helped. Yesterday it happened again:

2015-08-28 17:44:22,664 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-60) [4ef90b12] VM DC 0f3d1f06-e516-48ce-aa6f-7273c33d3491 moved from Up --> NotResponding

2015-08-28 17:44:22,692 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-60) [4ef90b12] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM DC is not responding.
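
To see which VMs are affected and when, the state transitions could be pulled out of the engine log with a small script. The following is only a sketch: the regular expression is modelled on the single "moved from Up --> NotResponding" line quoted above, so other oVirt versions may format the message differently, and the function names `find_transitions` and `count_by_vm` are made up for this example.

```python
import re
from collections import Counter

# Pattern modelled on the one engine.log line quoted in this mail (assumption:
# other releases may word or order the fields differently).
TRANSITION = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+\w+\s+.*?"
    r"VM (?P<name>.+?) "
    r"(?P<guid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) "
    r"moved from (?P<old>\w+) --> (?P<new>\w+)"
)


def find_transitions(lines, new_state="NotResponding"):
    """Yield (timestamp, vm_name, vm_guid, old_state) for each line that
    reports a VM moving into new_state."""
    for line in lines:
        match = TRANSITION.search(line)
        if match and match.group("new") == new_state:
            yield (
                match.group("ts"),
                match.group("name"),
                match.group("guid"),
                match.group("old"),
            )


def count_by_vm(lines, new_state="NotResponding"):
    """Count how often each VM name entered new_state."""
    return Counter(name for _, name, _, _ in find_transitions(lines, new_state))
```

It can be fed any iterable of log lines, for example an open file handle on the engine log (on my installation that would be /var/log/ovirt-engine/engine.log, adjust as needed). Comparing the timestamps it returns with the host's own logs for the same minutes might show what else happened at that moment.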

 

Does anybody know what I can do? Where should I have a look? Hints are greatly appreciated!

Thanks,
Christian
