Hello Christian,
just a quick follow-up:
Do you still see the issue? It stopped for me after I removed the live snapshots.
On 17.09.2015 07:39, Christian Hailer wrote:
Hi,
just to get it straight: most of my VMs had one or more existing snapshots. Do you think
this is currently a problem? If I understand correctly, Markus's BZ only concerns the
short period of time while a snapshot is being removed, but my VMs stopped responding in the
middle of the night without any interaction...
I deleted all the snapshots, just in case :) My system has been running fine for nearly three
days now. I'm not quite sure, but I think it helped that I changed the HDD and NIC of
the Windows 2012 VMs to VirtIO devices...
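To pin down when a VM dropped out without any interaction, the engine log can be searched for the state-change entries. The following is only a minimal sketch: it assumes each log entry sits on a single line and follows the "VM <name> <guid> moved from Up --> NotResponding" format quoted further down in this thread. The pattern is inferred from those quoted lines, and the names `TRANSITION`, `Transition` and `find_not_responding` are made up for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches engine.log entries of the form quoted in this thread, e.g.
#   2015-09-15 12:34:49,432 INFO [...] (...) [5f2a4b68] VM foreman <guid> moved from Up --> NotResponding
# The format is an assumption based on those quoted lines only.
TRANSITION = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+\w+\s+.*?"
    r"\bVM (?P<name>\S+) (?P<guid>[0-9a-f-]{36}) "
    r"moved from (?P<old>\w+) --> (?P<new>\w+)"
)


class Transition(NamedTuple):
    timestamp: str
    vm_name: str
    vm_guid: str
    old_state: str
    new_state: str


def find_not_responding(lines: Iterable[str]) -> Iterator[Transition]:
    """Yield every state change into NotResponding found in the given lines."""
    for line in lines:
        m = TRANSITION.search(line)
        if m and m.group("new") == "NotResponding":
            yield Transition(
                m.group("ts"),
                m.group("name"),
                m.group("guid"),
                m.group("old"),
                m.group("new"),
            )
```

Passing an open file handle for engine.log to `find_not_responding` would list the timestamp and VM name of each event, which can then be compared with cron jobs, backups or snapshot operations running at the same time.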
Best regards, Christian
-----Original Message-----
From: Daniel Helgenberger [mailto:daniel.helgenberger@m-box.de]
Sent: Tuesday, 15 September 2015 22:24
To: Markus Stockhausen <stockhausen(a)collogia.de>; Christian Hailer <christian(a)hailer.eu>
Cc: ydary(a)redhat.com; users(a)ovirt.org
Subject: Re: AW: [ovirt-users] Some VMs in status "not responding" in oVirt interface
On 15.09.2015 21:31, Markus Stockhausen wrote:
> Hi Christian,
>
> I am thinking of a package like this one:
>
> qemu-debuginfo.x86_64 2:2.1.3-10.fc21
>
> That allows gdb to show information about backtrace symbols. See
> comment 12 of https://bugzilla.redhat.com/show_bug.cgi?id=1262251
> It makes the error search much simpler, especially if qemu hangs.
Markus, thanks for the BZ. I think I am seeing the same issue. My VM is currently
the only one with a live snapshot, and it (puppetmaster) does a lot of I/O.
Christian, maybe BZ 1262251 also applies to you?
I'll go ahead and delete the live snapshot. If I see this issue again, I will submit
the trace to your BZ.
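For anyone who wants to collect such a trace from a hung qemu process without typing into an interactive gdb session, here is a rough sketch. It assumes gdb and the matching qemu debuginfo package are installed and that it runs with enough privileges to attach to the process; the function names are hypothetical. Note that the process is stopped while gdb is attached.

```python
import subprocess
from typing import List


def gdb_backtrace_command(pid: int) -> List[str]:
    """Build a non-interactive gdb call that prints a backtrace of every thread."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return [
        "gdb",
        "-p", str(pid),                # attach to the running qemu process
        "-batch",                      # run the -ex commands, then exit and detach
        "-ex", "thread apply all bt",  # backtrace of all threads
    ]


def capture_backtrace(pid: int, timeout: float = 120.0) -> str:
    """Run gdb against the given process and return its combined output."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
        check=False,  # return whatever gdb printed, even on a non-zero exit
    )
    return result.stdout
```

The text returned by `capture_backtrace` can be saved to a file and attached to the bug report. Capturing all threads rather than only the current one is a choice made here because a hang may sit in a worker thread.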
>
> Markus
>
> **********************************
> From: Christian Hailer [christian(a)hailer.eu]
> Sent: Tuesday, 15 September 2015 21:24
> To: Markus Stockhausen; 'Daniel Helgenberger'
> Cc: ydary(a)redhat.com; users(a)ovirt.org
> Subject: AW: [ovirt-users] Some VMs in status "not responding" in oVirt interface
>
> Hi Markus,
>
> gdb is available on CentOS 7, but what do you mean by qemu-debug? I installed
> qemu-kvm-tools; maybe this is the counterpart for CentOS?
>
> qemu-kvm-tools.x86_64 : KVM debugging and diagnostics tools
> qemu-kvm-tools-ev.x86_64 : KVM debugging and diagnostics tools
> qemu-kvm-tools-rhev.x86_64 : KVM debugging and diagnostics tools
>
> Regards, Christian
>
> From: Markus Stockhausen [mailto:stockhausen@collogia.de]
> Sent: Tuesday, 15 September 2015 20:40
> To: Daniel Helgenberger <daniel.helgenberger(a)m-box.de>
> Cc: Christian Hailer <christian(a)hailer.eu>; ydary(a)redhat.com; users(a)ovirt.org
> Subject: Re: [ovirt-users] Some VMs in status "not responding" in oVirt interface
>
> Do you have a chance to install qemu-debug? If yes I would try a backtrace.
> gdb -p <qemu-pid>
>
> # bt
> Markus
>
>
> On 15.09.2015 at 4:15 PM, Daniel Helgenberger <daniel.helgenberger(a)m-box.de> wrote:
>
> Hello,
>
> I do not want to hijack the thread, but maybe my issue is related?
>
> It might have started with oVirt 3.5.3, but I cannot tell for sure.
>
> For me, one VM (foreman) is affected, for the second time in 14 days. I
> can confirm this, as I also lose any network connection to the VM and
> the ability to connect a console.
>
> Also, the only thing which 'fixes' the issue right now is
> 'kill -9 <pid of qemu-kvm process>'.
>
> As far as I can tell, the VM became unresponsive at around Sep 15
> 12:30:01; the engine logged this at 12:34. Nothing obvious in the VDSM logs
> (see attached).
>
> Below is the engine.log part.
>
> Versions:
> ovirt-engine-3.5.4.2-1.el7.centos.noarch
> vdsm-4.16.26-0.el7.centos
> libvirt-1.2.8-16.el7_1.3
>
> engine.log (12:00 - 13:00):
>
> 2015-09-15 12:03:47,949 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-56) [264d502a] HA reservation status for cluster Default is OK
> 2015-09-15 12:08:02,708 INFO [org.ovirt.engine.core.bll.OvfDataUpdater] (DefaultQuartzScheduler_Worker-89) [2e7bf56e] Attempting to update VMs/Templates Ovf.
> 2015-09-15 12:08:02,709 INFO [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand] (DefaultQuartzScheduler_Worker-89) [5e9f4ba6] Running command: ProcessOvfUpdateForStoragePoolCommand internal: true. Entities affected : ID: 00000002-0002-0002-0002-000000000088 Type: l
> 2015-09-15 12:08:02,780 INFO [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand] (DefaultQuartzScheduler_Worker-89) [5e9f4ba6] Lock freed to object EngineLock [exclusiveLocks= key: 00000002-0002-0002-0002-000000000088 value: OVF_UPDATE
> 2015-09-15 12:08:47,997 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-21) [3fc854a2] HA reservation status for cluster Default is OK
> 2015-09-15 12:13:06,998 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand] (org.ovirt.thread.pool-8-thread-48) [50221cdc] START, GetFileStatsVDSCommand( storagePoolId = 00000002-0002-0002-0002-000000000088, ignoreFailoverLimit = false), log id: 1503968
> 2015-09-15 12:13:07,137 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand] (org.ovirt.thread.pool-8-thread-48) [50221cdc] FINISH, GetFileStatsVDSCommand, return: {pfSense-2.0-RELEASE-i386.iso={status=0, ctime=1432286887.0, size=115709952}, Fedora-15-i686-Live8
> 2015-09-15 12:13:07,178 INFO [org.ovirt.engine.core.bll.IsoDomainListSyncronizer] (org.ovirt.thread.pool-8-thread-48) [50221cdc] Finished automatic refresh process for ISO file type with success, for storage domain id 84dcb2fc-fb63-442f-aa77-3e84dc7d5a72.
> 2015-09-15 12:13:48,043 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-87) [4fa1bb16] HA reservation status for cluster Default is OK
> 2015-09-15 12:18:48,088 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-44) [6345e698] HA reservation status for cluster Default is OK
> 2015-09-15 12:23:48,137 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-13) HA reservation status for cluster Default is OK
> 2015-09-15 12:28:48,183 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-76) [154c91d5] HA reservation status for cluster Default is OK
> 2015-09-15 12:33:48,229 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-36) [27c73ac6] HA reservation status for cluster Default is OK
> 2015-09-15 12:34:49,432 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-41) [5f2a4b68] VM foreman 8b57ff1d-2800-48ad-b267-fd8e9e2f6fb2 moved from Up --> NotResponding
> 2015-09-15 12:34:49,578 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-41) [5f2a4b68] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM foreman is not responding.
> 2015-09-15 12:38:48,273 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-10) [7a800766] HA reservation status for cluster Default is OK
> 2015-09-15 12:43:48,320 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-42) [440f1c40] HA reservation status for cluster Default is OK
> 2015-09-15 12:48:48,366 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-70) HA reservation status for cluster Default is OK
> 2015-09-15 12:53:48,412 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-12) [50221cdc] HA reservation status for cluster Default is OK
> 2015-09-15 12:58:48,459 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-3) HA reservation status for cluster Default is OK
>
> On 29.08.2015 22:48, Christian Hailer wrote:
>> Hello,
>>
>> last Wednesday I wanted to update my oVirt 3.5 hypervisor. It is a single CentOS
>> 7 server, so I started by suspending the VMs in order to set the oVirt engine
>> host to maintenance mode. During the process of suspending the VMs the server
>> crashed, kernel panic…
>>
>> After restarting the server I installed the updates via yum and restarted the
>> server again. Afterwards, all the VMs could be started again. Some hours later
>> my monitoring system registered some unresponsive hosts. I had a look in the
>> oVirt interface: 3 of the VMs were in the state “not responding”, marked by a
>> question mark.
>>
>> I tried to shut down the VMs, but oVirt wasn’t able to do so. I tried to reset
>> the status in the database with the SQL statement
>>
>> update vm_dynamic set status = 0 where vm_guid = (select vm_guid from vm_static
>> where vm_name = 'MYVMNAME');
>>
>> but that didn’t help, either. Only rebooting the whole hypervisor helped…
>> afterwards everything worked again. But only for a few hours, then one of the
>> VMs entered the “not responding” state again… again only a reboot helped.
>> Yesterday it happened again:
>>
>> 2015-08-28 17:44:22,664 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-60) [4ef90b12] VM DC 0f3d1f06-e516-48ce-aa6f-7273c33d3491 moved from Up --> NotResponding
>>
>> 2015-08-28 17:44:22,692 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-60) [4ef90b12] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM DC is not responding.
>>
>> Does anybody know what I can do? Where should I have a look? Hints are greatly
>> appreciated!
>>
>> Thanks,
>> Christian
>
--
Daniel Helgenberger
m box bewegtbild GmbH
P: +49/30/2408781-22
F: +49/30/2408781-10
ACKERSTR. 19
D-10115 BERLIN
www.m-box.de www.monkeymen.tv
Managing Directors: Martin Retschitzegger / Michaela Göllner
Commercial Register: Amtsgericht Charlottenburg / HRB 112767