
Hello Daniel,

this is exactly what I experienced in the past weeks. I switched the NIC and the HDD from e1000 and IDE to VirtIO NIC and VirtIO disk for all Windows Server 2012 R2 VMs; they have been running for 2 days now without problems.

Additionally, 2 of my CentOS VMs stopped responding today, and this was a bit scary: the VM itself was running and I could connect to the console (I had intentionally logged in as root yesterday and not logged out, so I could have a look today at what the problem was). The network was down (pinging anything was unsuccessful), and every action that involved reading from or writing to the hard disk hung immediately. So I reset the VM (oVirt wasn't able to shut it down, so I had to kill the process) and had a look at /var/log/messages after booting up. The last entry was from this night at 03:01, cron.daily. Nothing else until I rebooted this morning at 08:30. The VM is and always was configured with both a VirtIO NIC and a VirtIO disk. So what could be the reason that the VM couldn't access the hard disk anymore?

Best regards,
Christian
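PS: Next time one of the CentOS guests hangs like that, it might be worth reading the kernel ring buffer from the already-open root console before resetting the VM; dmesg only reads from memory, so it usually still works even while all disk I/O is stuck. A rough sketch (the exact messages depend on the guest kernel, so the grep patterns are just guesses):

# On the hung guest's console; dmesg reads the in-memory kernel ring buffer,
# so it does not depend on the stuck disk.
dmesg | grep -iE 'blocked for more than|hung_task|i/o error|virtio' | tail -n 40

# Processes stuck in uninterruptible sleep (state "D") are usually the ones
# waiting on the dead disk:
ps axo pid,stat,wchan:32,comm | awk '$2 ~ /D/'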
-----Original Message-----
From: Daniel Helgenberger [mailto:daniel.helgenberger@m-box.de]
Sent: Tuesday, 15 September 2015 16:15
To: Christian Hailer <christian@hailer.eu>; users@ovirt.org; ydary@redhat.com
Subject: Re: [ovirt-users] Some VMs in status "not responding" in oVirt interface

Hello,

I do not want to hijack the thread, but maybe my issue is related? It might have started with oVirt 3.5.3, but I cannot tell for sure.

For me, one VM (foreman) is affected; this is the second time in 14 days. I can confirm this, as I also lose any network connection to the VM and the ability to connect a console. Also, the only thing which 'fixes' the issue right now is 'kill -9 <pid of qemu-kvm process>' (a sketch for locating that PID follows the log excerpt below).

As far as I can tell, the VM became unresponsive at around Sep 15 12:30:01; the engine logged this at 12:34. Nothing obvious in the VDSM logs (see attached). Below is the relevant engine.log part.

Versions:
ovirt-engine-3.5.4.2-1.el7.centos.noarch
vdsm-4.16.26-0.el7.centos
libvirt-1.2.8-16.el7_1.3

engine.log (12:00 - 13:00):

2015-09-15 12:03:47,949 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-56) [264d502a] HA reservation status for cluster Default is OK
2015-09-15 12:08:02,708 INFO [org.ovirt.engine.core.bll.OvfDataUpdater] (DefaultQuartzScheduler_Worker-89) [2e7bf56e] Attempting to update VMs/Templates Ovf.
2015-09-15 12:08:02,709 INFO [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand] (DefaultQuartzScheduler_Worker-89) [5e9f4ba6] Running command: ProcessOvfUpdateForStoragePoolCommand internal: true. Entities affected : ID: 00000002-0002-0002-0002-000000000088 Type: l
2015-09-15 12:08:02,780 INFO [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand] (DefaultQuartzScheduler_Worker-89) [5e9f4ba6] Lock freed to object EngineLock [exclusiveLocks= key: 00000002-0002-0002-0002-000000000088 value: OVF_UPDATE
2015-09-15 12:08:47,997 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-21) [3fc854a2] HA reservation status for cluster Default is OK
2015-09-15 12:13:06,998 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand] (org.ovirt.thread.pool-8-thread-48) [50221cdc] START, GetFileStatsVDSCommand( storagePoolId = 00000002-0002-0002-0002-000000000088, ignoreFailoverLimit = false), log id: 1503968
2015-09-15 12:13:07,137 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand] (org.ovirt.thread.pool-8-thread-48) [50221cdc] FINISH, GetFileStatsVDSCommand, return: {pfSense-2.0-RELEASE-i386.iso={status=0, ctime=1432286887.0, size=115709952}, Fedora-15-i686-Live8
2015-09-15 12:13:07,178 INFO [org.ovirt.engine.core.bll.IsoDomainListSyncronizer] (org.ovirt.thread.pool-8-thread-48) [50221cdc] Finished automatic refresh process for ISO file type with success, for storage domain id 84dcb2fc-fb63-442f-aa77-3e84dc7d5a72.
2015-09-15 12:13:48,043 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-87) [4fa1bb16] HA reservation status for cluster Default is OK
2015-09-15 12:18:48,088 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-44) [6345e698] HA reservation status for cluster Default is OK
2015-09-15 12:23:48,137 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-13) HA reservation status for cluster Default is OK
2015-09-15 12:28:48,183 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-76) [154c91d5] HA reservation status for cluster Default is OK
2015-09-15 12:33:48,229 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-36) [27c73ac6] HA reservation status for cluster Default is OK
2015-09-15 12:34:49,432 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-41) [5f2a4b68] VM foreman 8b57ff1d-2800-48ad-b267-fd8e9e2f6fb2 moved from Up --> NotResponding
2015-09-15 12:34:49,578 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-41) [5f2a4b68] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM foreman is not responding.
2015-09-15 12:38:48,273 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-10) [7a800766] HA reservation status for cluster Default is OK
2015-09-15 12:43:48,320 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-42) [440f1c40] HA reservation status for cluster Default is OK
2015-09-15 12:48:48,366 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-70) HA reservation status for cluster Default is OK
2015-09-15 12:53:48,412 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-12) [50221cdc] HA reservation status for cluster Default is OK
2015-09-15 12:58:48,459 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-3) HA reservation status for cluster Default is OK
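For reference, the qemu-kvm process of a single VM can be located roughly like this; the VM name usually appears in the qemu-kvm arguments as '-name <vmname>' ('foreman' below is just the affected VM here), and a read-only virsh connection should work on a vdsm host without the SASL credentials. A sketch, not a definitive procedure:

# On the hypervisor: the qemu-kvm command line contains "-name <vmname>",
# so the PID can be grepped from the process list (the bracket trick keeps
# the grep itself out of the result).
ps -eo pid,args | grep '[q]emu-kvm' | grep -- '-name foreman'

# Cross-check against libvirt; the read-only connection normally needs no credentials.
virsh -r list --all

# Last resort only, when the guest cannot be shut down or destroyed cleanly:
kill -9 <pid>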
On 29.08.2015 22:48, Christian Hailer wrote:

Hello,
last Wednesday I wanted to update my oVirt 3.5 hypervisor. It is a single CentOS 7 server, so I started by suspending the VMs in order to set the oVirt engine host to maintenance mode. During the process of suspending the VMs the server crashed with a kernel panic…
After restarting the server I installed the updates via yum and restarted the server again. Afterwards, all the VMs could be started again. Some hours later my monitoring system registered some unresponsive hosts, so I had a look in the oVirt interface: 3 of the VMs were in the state “not responding”, marked by a question mark.
I tried to shut down the VMs, but oVirt wasn’t able to do so. I tried to reset the status in the database with the following SQL statement (how it can be run on the engine host is sketched below, after the log lines)
update vm_dynamic set status = 0 where vm_guid = (select vm_guid from vm_static where vm_name = 'MYVMNAME');
but that didn’t help, either. Only rebooting the whole hypervisor helped… afterwards everything worked again, but only for a few hours; then one of the VMs entered the “not responding” state again, and again only a reboot helped. Yesterday it happened again:
2015-08-28 17:44:22,664 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-60) [4ef90b12] VM DC 0f3d1f06-e516-48ce-aa6f-7273c33d3491 moved from Up --> NotResponding
2015-08-28 17:44:22,692 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-60) [4ef90b12] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM DC is not responding.
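For reference, such a statement is typically run directly on the engine host against the engine database; a minimal sketch, assuming the default database name and local postgres superuser created by engine-setup (the name 'engine' may differ on other installations):

# On the oVirt engine host; 'engine' is the default database name from engine-setup.
su - postgres
psql -d engine
-- then, inside psql:
update vm_dynamic set status = 0 where vm_guid = (select vm_guid from vm_static where vm_name = 'MYVMNAME');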
Does anybody know what I can do? Where should I have a look? Hints are greatly appreciated!
Thanks,
Christian
--
Daniel Helgenberger
m box bewegtbild GmbH
P: +49/30/2408781-22
F: +49/30/2408781-10
ACKERSTR. 19
D-10115 BERLIN
www.m-box.de
www.monkeymen.tv
Managing directors: Martin Retschitzegger / Michaela Göllner
Commercial register: Amtsgericht Charlottenburg / HRB 112767