
Hi Christian,

I am thinking of a package like this one:

qemu-debuginfo.x86_64 2:2.1.3-10.fc21

It allows gdb to show symbol information in backtraces. See
comment 12 of https://bugzilla.redhat.com/show_bug.cgi?id=1262251
It makes the error search much simpler, especially if qemu hangs.

Markus

**********************************
From: Christian Hailer [christian@hailer.eu]
Sent: Tuesday, 15 September 2015 21:24
To: Markus Stockhausen; 'Daniel Helgenberger'
Cc: ydary@redhat.com; users@ovirt.org
Subject: AW: [ovirt-users] Some VMs in status "not responding" in oVirt interface

Hi Markus,

gdb is available on CentOS 7, but what do you mean by qemu-debug? I installed
qemu-kvm-tools, maybe this is the equivalent for CentOS?

qemu-kvm-tools.x86_64 : KVM debugging and diagnostics tools
qemu-kvm-tools-ev.x86_64 : KVM debugging and diagnostics tools
qemu-kvm-tools-rhev.x86_64 : KVM debugging and diagnostics tools

Regards, Christian

From: Markus Stockhausen [mailto:stockhausen@collogia.de]
Sent: Tuesday, 15 September 2015 20:40
To: Daniel Helgenberger <daniel.helgenberger@m-box.de>
Cc: Christian Hailer <christian@hailer.eu>; ydary@redhat.com; users@ovirt.org
Subject: Re: [ovirt-users] Some VMs in status "not responding" in oVirt interface

Do you have a chance to install qemu-debug? If yes I would try a backtrace.

gdb -p <qemu-pid>
# bt

Markus

On 15.09.2015 at 4:15 PM, Daniel Helgenberger <daniel.helgenberger@m-box.de> wrote:

Hello,

I do not want to hijack the thread but maybe my issue is related?

It might have started with ovirt 3.5.3; but I cannot tell for sure.

For me, one VM (foreman) is affected; the second time in 14 days. I can
confirm this as I also lose any network connection to the VM and the
ability to connect a console.
Also, the only thing which 'fixes' the issue right now is
'kill -9 <pid of qemu-kvm process>'

As far as I can tell the VM became unresponsive at around Sep 15 12:30:01;
engine logged this at 12:34. Nothing obvious in VDSM logs (see attached).

Below the engine.log part.

Versions:
ovirt-engine-3.5.4.2-1.el7.centos.noarch

vdsm-4.16.26-0.el7.centos
libvirt-1.2.8-16.el7_1.3

engine.log (12:00 - 13:00):
2015-09-15 12:03:47,949 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-56) [264d502a] HA reservation status for cluster Default is OK
2015-09-15 12:08:02,708 INFO [org.ovirt.engine.core.bll.OvfDataUpdater] (DefaultQuartzScheduler_Worker-89) [2e7bf56e] Attempting to update VMs/Templates Ovf.
2015-09-15 12:08:02,709 INFO [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand] (DefaultQuartzScheduler_Worker-89) [5e9f4ba6] Running command: ProcessOvfUpdateForStoragePoolCommand internal: true. Entities affected : ID: 00000002-0002-0002-0002-000000000088 Type: l
2015-09-15 12:08:02,780 INFO [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand] (DefaultQuartzScheduler_Worker-89) [5e9f4ba6] Lock freed to object EngineLock [exclusiveLocks= key: 00000002-0002-0002-0002-000000000088 value: OVF_UPDATE
2015-09-15 12:08:47,997 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-21) [3fc854a2] HA reservation status for cluster Default is OK
2015-09-15 12:13:06,998 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand] (org.ovirt.thread.pool-8-thread-48) [50221cdc] START, GetFileStatsVDSCommand( storagePoolId = 00000002-0002-0002-0002-000000000088, ignoreFailoverLimit = false), log id: 1503968
2015-09-15 12:13:07,137 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand] (org.ovirt.thread.pool-8-thread-48) [50221cdc] FINISH, GetFileStatsVDSCommand, return: {pfSense-2.0-RELEASE-i386.iso={status=0, ctime=1432286887.0, size=115709952}, Fedora-15-i686-Live8
2015-09-15 12:13:07,178 INFO [org.ovirt.engine.core.bll.IsoDomainListSyncronizer] (org.ovirt.thread.pool-8-thread-48) [50221cdc] Finished automatic refresh process for ISO file type with success, for storage domain id 84dcb2fc-fb63-442f-aa77-3e84dc7d5a72.
2015-09-15 12:13:48,043 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-87) [4fa1bb16] HA reservation status for cluster Default is OK
2015-09-15 12:18:48,088 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-44) [6345e698] HA reservation status for cluster Default is OK
2015-09-15 12:23:48,137 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-13) HA reservation status for cluster Default is OK
2015-09-15 12:28:48,183 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-76) [154c91d5] HA reservation status for cluster Default is OK
2015-09-15 12:33:48,229 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-36) [27c73ac6] HA reservation status for cluster Default is OK
2015-09-15 12:34:49,432 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-41) [5f2a4b68] VM foreman 8b57ff1d-2800-48ad-b267-fd8e9e2f6fb2 moved from Up --> NotResponding
2015-09-15 12:34:49,578 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-41) [5f2a4b68] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM foreman is not responding.
2015-09-15 12:38:48,273 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-10) [7a800766] HA reservation status for cluster Default is OK
2015-09-15 12:43:48,320 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-42) [440f1c40] HA reservation status for cluster Default is OK
2015-09-15 12:48:48,366 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-70) HA reservation status for cluster Default is OK
2015-09-15 12:53:48,412 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-12) [50221cdc] HA reservation status for cluster Default is OK
2015-09-15 12:58:48,459 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-3) HA reservation status for cluster Default is OK

On 29.08.2015 22:48, Christian Hailer wrote:

Hello,

last Wednesday I wanted to update my oVirt 3.5 hypervisor. It is a single
CentOS 7 server, so I started by suspending the VMs in order to set the
oVirt engine host to maintenance mode. During the process of suspending the
VMs the server crashed, kernel panic…

After restarting the server I installed the updates via yum and restarted
the server again. Afterwards, all the VMs could be started again. Some hours
later my monitoring system registered some unresponsive hosts. I had a look
in the oVirt interface: 3 of the VMs were in the state “not responding”,
marked by a question mark.

I tried to shut down the VMs, but oVirt wasn’t able to do so. I tried to
reset the status in the database with the SQL statement

update vm_dynamic set status = 0 where vm_guid = (select vm_guid from vm_static
where vm_name = 'MYVMNAME');

but that didn’t help, either. Only rebooting the whole hypervisor helped…
afterwards everything worked again. But only for a few hours, then one of the
VMs entered the “not responding” state again… again only a reboot helped.
Yesterday it happened again:

2015-08-28 17:44:22,664 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-60) [4ef90b12] VM DC 0f3d1f06-e516-48ce-aa6f-7273c33d3491 moved from Up --> NotResponding

2015-08-28 17:44:22,692 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-60) [4ef90b12] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM DC is not responding.

Does anybody know what I can do? Where should I have a look? Hints are
greatly appreciated!

Thanks,

Christian

-- 
Daniel Helgenberger
m box bewegtbild GmbH

P: +49/30/2408781-22
F: +49/30/2408781-10

ACKERSTR. 19
D-10115 BERLIN

www.m-box.de
www.monkeymen.tv

Managing directors: Martin Retschitzegger / Michaela Göllner
Commercial register: Amtsgericht Charlottenburg / HRB 112767
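Both engine.log excerpts in this thread contain the same kind of line when a VM drops out: "VM <name> <guid> moved from Up --> NotResponding". To see how often and when this happens over a longer period, those lines can be pulled out of the log with a few lines of Python. The following is only a minimal sketch: the function name find_not_responding and the regular expression are made up for illustration and derived solely from the two excerpts quoted here, so other oVirt versions or VM names containing spaces may need a different pattern.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: the line layout matches the excerpts quoted in this thread, e.g.
# "2015-09-15 12:34:49,432 INFO [...] (...) [5f2a4b68] VM foreman <guid> moved from Up --> NotResponding"
# This is not an official oVirt log format definition.
TRANSITION = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+\w+\s+.*?"
    r"VM (?P<name>\S+) (?P<guid>[0-9a-f-]{36}) "
    r"moved from (?P<old>\w+) --> (?P<new>\w+)"
)


def find_not_responding(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, vm_name, vm_guid) for each transition to NotResponding."""
    hits = []
    for line in lines:
        match = TRANSITION.match(line)
        # Only keep state changes that end in NotResponding.
        if match and match.group("new") == "NotResponding":
            hits.append((match.group("ts"), match.group("name"), match.group("guid")))
    return hits
```

The function accepts any iterable of lines, so an open file object for engine.log (usually found under /var/log/ovirt-engine/ on the engine host, though that location is an assumption about your setup) can be passed in directly. The resulting timestamps can then be compared with the VDSM and qemu logs on the hypervisor.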