Hello Daniel,
this is exactly what I experienced over the past few weeks.
I switched the NIC and the HDD from e1000 and IDE to VirtIO NIC and VirtIO disk for all
Windows Server 2012 R2 VMs; they have been running for two days now without problems.
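In case anyone wants to double-check the same thing, the disk bus and NIC model can be verified from the hypervisor; this is just a quick sketch ('myvm' is a placeholder name, and it assumes a read-only virsh connection works on the host):

    # both the disk target and the interface model should report 'virtio'
    virsh -r dumpxml myvm | grep -E "<target dev|<model type"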
Additionally, two of my CentOS VMs stopped responding today, and this was a bit scary: the VM
itself was running and I could connect to the console (I had intentionally logged in as root
yesterday and not logged out, so I could have a look today at what the problem was).
The network was down (pinging anything was unsuccessful), and every attempt to read from
or write to the hard disk hung immediately. So I reset the VM (oVirt wasn't able
to shut it down, so I had to kill the process) and had a look at /var/log/messages after
booting up.
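For the record, this is roughly how I dealt with the stuck process on the hypervisor (generic sketch, 'myvm' is a placeholder for the VM name):

    # find the qemu-kvm process that belongs to the stuck VM
    ps -ef | grep '[q]emu-kvm' | grep myvm
    # oVirt/libvirt could not stop it cleanly, so force-kill it
    kill -9 <PID>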
The last entry was from last night at 03:01 (cron.daily). There was nothing else until I rebooted
this morning at 08:30.
The VM is, and always was, configured with both a VirtIO NIC and a VirtIO disk.
So what could be the reason that the VM couldn't access the hard disk anymore?
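For completeness, this is what I still want to check on the hypervisor side the next time it happens; just a sketch, and the log paths are the CentOS 7 / oVirt defaults as far as I know:

    # storage / monitoring errors seen by VDSM and sanlock on the hypervisor
    grep -iE "error|timeout" /var/log/vdsm/vdsm.log | tail -n 50
    grep -i "renewal" /var/log/sanlock.log | tail -n 20
    # blocked I/O on the host itself
    dmesg | grep -iE "blocked for more than|I/O error"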
Best regards, Christian
-----Original Message-----
From: Daniel Helgenberger [mailto:daniel.helgenberger@m-box.de]
Sent: Tuesday, September 15, 2015 16:15
To: Christian Hailer <christian(a)hailer.eu>; users(a)ovirt.org; ydary(a)redhat.com
Subject: Re: [ovirt-users] Some VMs in status "not responding" in oVirt interface
Hello,
I do not want to hijack the thread, but maybe my issue is related?
It might have started with oVirt 3.5.3, but I cannot tell for sure.
For me, one VM (foreman) is affected, for the second time in 14 days. I can confirm this, as I
also lose any network connection to the VM and the ability to connect to a console.
Also, the only thing which currently 'fixes' the issue is 'kill -9 <pid of
qemu-kvm process>'.
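One thing that might be worth capturing before the kill (generic sketch, not something I have from this incident): whether the qemu-kvm process is stuck in uninterruptible sleep ('D' state), which would point at blocked storage I/O:

    # STAT column: 'D' means uninterruptible sleep, usually blocked on I/O
    ps -C qemu-kvm -o pid,stat,wchan:32,args
    # read-only view of the domain states as libvirt sees them
    virsh -r list --all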
As far as I can tell, the VM became unresponsive at around Sep 15 12:30:01; the engine logged
this at 12:34. There is nothing obvious in the VDSM logs (see attached).
Below is the relevant part of engine.log.
Versions:
ovirt-engine-3.5.4.2-1.el7.centos.noarch
vdsm-4.16.26-0.el7.centos
libvirt-1.2.8-16.el7_1.3
engine.log (12:00 - 13:00):
2015-09-15 12:03:47,949 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-56) [264d502a] HA reservation status for cluster Default is
OK
2015-09-15 12:08:02,708 INFO [org.ovirt.engine.core.bll.OvfDataUpdater]
(DefaultQuartzScheduler_Worker-89) [2e7bf56e] Attempting to update VMs/Templates Ovf.
2015-09-15 12:08:02,709 INFO
[org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand]
(DefaultQuartzScheduler_Worker-89)
[5e9f4ba6] Running command: ProcessOvfUpdateForStoragePoolCommand internal: true. Entities
affected : ID:
00000002-0002-0002-0002-000000000088 Type: l
2015-09-15 12:08:02,780 INFO
[org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand]
(DefaultQuartzScheduler_Worker-89)
[5e9f4ba6] Lock freed to object EngineLock [exclusiveLocks= key:
00000002-0002-0002-0002-000000000088 value: OVF_UPDATE
2015-09-15 12:08:47,997 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-21) [3fc854a2] HA reservation status for cluster Default is
OK
2015-09-15 12:13:06,998 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand]
(org.ovirt.thread.pool-8-thread-48)
[50221cdc] START, GetFileStatsVDSCommand( storagePoolId =
00000002-0002-0002-0002-000000000088, ignoreFailoverLimit = false), log id: 1503968
2015-09-15 12:13:07,137 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand]
(org.ovirt.thread.pool-8-thread-48)
[50221cdc] FINISH, GetFileStatsVDSCommand, return:
{pfSense-2.0-RELEASE-i386.iso={status=0, ctime=1432286887.0, size=115709952},
Fedora-15-i686-Live8
2015-09-15 12:13:07,178 INFO [org.ovirt.engine.core.bll.IsoDomainListSyncronizer]
(org.ovirt.thread.pool-8-thread-48) [50221cdc] Finished automatic refresh process for ISO
file type with success, for storage domain id 84dcb2fc-fb63-442f-aa77-3e84dc7d5a72.
2015-09-15 12:13:48,043 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-87) [4fa1bb16] HA reservation status for cluster Default is
OK
2015-09-15 12:18:48,088 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-44) [6345e698] HA reservation status for cluster Default is
OK
2015-09-15 12:23:48,137 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-13) HA reservation status for cluster Default is OK
2015-09-15 12:28:48,183 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-76) [154c91d5] HA reservation status for cluster Default is
OK
2015-09-15 12:33:48,229 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-36) [27c73ac6] HA reservation status for cluster Default is
OK
2015-09-15 12:34:49,432 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
(DefaultQuartzScheduler_Worker-41) [5f2a4b68] VM foreman
8b57ff1d-2800-48ad-b267-fd8e9e2f6fb2 moved from Up --> NotResponding
2015-09-15 12:34:49,578 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler_Worker-41)
[5f2a4b68] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM
foreman is not responding.
2015-09-15 12:38:48,273 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-10) [7a800766] HA reservation status for cluster Default is
OK
2015-09-15 12:43:48,320 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-42) [440f1c40] HA reservation status for cluster Default is
OK
2015-09-15 12:48:48,366 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-70) HA reservation status for cluster Default is OK
2015-09-15 12:53:48,412 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-12) [50221cdc] HA reservation status for cluster Default is
OK
2015-09-15 12:58:48,459 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler_Worker-3) HA reservation status for cluster Default is OK
On 29.08.2015 22:48, Christian Hailer wrote:
Hello,
last Wednesday I wanted to update my oVirt 3.5 hypervisor. It is a
single CentOS 7 server, so I started by suspending the VMs in order to set the oVirt
engine host to maintenance mode. During the process of suspending the
VMs the server crashed with a kernel panic…
After restarting the server I installed the updates via yum and
restarted the server again. Afterwards, all the VMs could be started
again. Some hours later my monitoring system registered some
unresponsive hosts; I had a look at the oVirt interface, and 3 of the VMs
were in the state “not responding”, marked by a question mark.
I tried to shut down the VMs, but oVirt wasn’t able to do so. I tried
to reset the status in the database with the SQL statement
update vm_dynamic set status = 0 where vm_guid = (select vm_guid from
vm_static where vm_name = 'MYVMNAME');
but that didn’t help, either. Only rebooting the whole hypervisor
helped… afterwards everything worked again, but only for a few hours;
then one of the VMs entered the “not responding” state again, and again only a reboot
helped.
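For anyone wanting to try the same, the statement has to be run against the engine database on the engine host; on a default setup that would look roughly like this ('engine' is the default database name, 'MYVMNAME' a placeholder):

    # on the oVirt engine host
    su - postgres -c "psql -d engine"
    # then run the update statement above; the current value can be checked with
    engine=# select status from vm_dynamic where vm_guid =
             (select vm_guid from vm_static where vm_name = 'MYVMNAME');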
Yesterday it happened again:
2015-08-28 17:44:22,664 INFO
[org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
(DefaultQuartzScheduler_Worker-60) [4ef90b12] VM DC
0f3d1f06-e516-48ce-aa6f-7273c33d3491 moved from Up --> NotResponding
2015-08-28 17:44:22,692 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler_Worker-60) [4ef90b12] Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VM DC is not responding.
Does anybody know what I can do? Where should I have a look? Hints are
greatly appreciated!
Thanks,
Christian
--
Daniel Helgenberger
m box bewegtbild GmbH
P: +49/30/2408781-22
F: +49/30/2408781-10
ACKERSTR. 19
D-10115 BERLIN
www.m-box.de www.monkeymen.tv
Managing Directors: Martin Retschitzegger / Michaela Göllner
Commercial Register: Amtsgericht Charlottenburg / HRB 112767