This is a multi-part message in MIME format.
------=_NextPartTM-000-4a9428dc-2b1c-4caf-8ee3-a447a009db9b
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
Hi Christian,

you are right, these are different errors. In the past we had VM hangs or crashes from time to time, so we implemented two mitigation measures. They helped us analyze a lot of problems.

1) Install debug packages to get backtraces from stalled qemu processes.
2) Set up the hosts to generate core dumps for qemu:

a) /etc/security/limits.conf
* soft core unlimited
b) /usr/lib/systemd/system/libvirtd.service, [Service] section
LimitCORE=infinity
c) Create a new before_vdsm_start hook to disable the full qemu memory dump
libvirtpid=`ps -ef | grep libvirt | grep listen | awk '{ print $2 }'`
echo 0 > /proc/$libvirtpid/coredump_filter
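
Step c) could be sketched as a small hook script along the following lines. This is only an illustration under some assumptions: `pgrep` is available, the daemon's process name is `libvirtd`, and the function names `libvirtd_pid` and `set_coredump_filter` (including its third parameter, an alternative proc root for dry runs) are made up here. The coredump_filter value is inherited by child processes, so setting it on libvirtd affects qemu processes that are started afterwards.

```shell
#!/bin/bash
# Sketch of a before_vdsm_start hook body (function names are hypothetical).

# Print the pid of the oldest process named exactly "libvirtd".
# Assumption: the daemon runs under that process name.
libvirtd_pid() {
    pgrep -o -x libvirtd
}

# Write a coredump filter mask for one process.
#   $1 = pid
#   $2 = mask (defaults to 0, as in the original hook)
#   $3 = proc root (defaults to /proc; only overridden for dry runs)
set_coredump_filter() {
    local pid="$1" mask="${2:-0}" procroot="${3:-/proc}"
    local target="$procroot/$pid/coredump_filter"
    # Refuse to continue if no pid was found or the file is not writable.
    if [ -z "$pid" ] || [ ! -w "$target" ]; then
        echo "cannot write $target" >&2
        return 1
    fi
    echo "$mask" > "$target"
}
```

In the hook itself the last line would then be something like `set_coredump_filter "$(libvirtd_pid)" 0`.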

Best regards.

Markus

________________________________________
From: Christian Hailer [christian(a)hailer.eu]
Sent: Thursday, 17 September 2015 07:39
To: 'Daniel Helgenberger'; Markus Stockhausen
Cc: ydary(a)redhat.com; users(a)ovirt.org
Subject: AW: AW: [ovirt-users] Some VMs in status "not responding" in oVirt interface

Hi,

just to get it straight: most of my VMs had one or more existing snapshots. Do you think this is currently a problem? If I understand it correctly, Markus's BZ only concerns a short period of time while a snapshot is being removed, but my VMs stopped responding in the middle of the night without any interaction...
I deleted all the snapshots, just in case :) My system has been running fine for nearly three days now. I'm not quite sure, but I think it helped that I changed the HDD and NIC of the Windows 2012 VMs to VirtIO devices...

Best regards, Christian

-----Original Message-----
From: Daniel Helgenberger [mailto:daniel.helgenberger@m-box.de]
Sent: Tuesday, 15 September 2015 22:24
To: Markus Stockhausen <stockhausen(a)collogia.de>; Christian Hailer <christian(a)hailer.eu>
Cc: ydary(a)redhat.com; users(a)ovirt.org
Subject: Re: AW: [ovirt-users] Some VMs in status "not responding" in oVirt interface

On 15.09.2015 21:31, Markus Stockhausen wrote:
Hi Christian,

I am thinking of a package similar to this one:

qemu-debuginfo.x86_64 2:2.1.3-10.fc21

It allows gdb to show symbol information in backtraces. See comment 12 of
https://bugzilla.redhat.com/show_bug.cgi?id=1262251
It makes the error search much simpler, especially if qemu hangs.

Markus, thanks for the BZ. I think I do see the same issue. Actually, my VM is currently the only one with a live snapshot, and it (puppetmaster) does a lot of I/O.

Christian, maybe BZ 1262251 is also applicable?

I'll go ahead and delete the live snapshot. If I see this issue again I will submit the trace to your BZ.

Markus

**********************************

From: Christian Hailer [christian(a)hailer.eu]
Sent: Tuesday, 15 September 2015 21:24
To: Markus Stockhausen; 'Daniel Helgenberger'
Cc: ydary(a)redhat.com; users(a)ovirt.org
Subject: AW: [ovirt-users] Some VMs in status "not responding" in oVirt interface

Hi Markus,

gdb is available on CentOS 7, but what do you mean by qemu-debug? I installed qemu-kvm-tools; maybe this is the CentOS counterpart?

qemu-kvm-tools.x86_64 : KVM debugging and diagnostics tools
qemu-kvm-tools-ev.x86_64 : KVM debugging and diagnostics tools
qemu-kvm-tools-rhev.x86_64 : KVM debugging and diagnostics tools

Regards, Christian

From: Markus Stockhausen [mailto:stockhausen@collogia.de]
Sent: Tuesday, 15 September 2015 20:40
To: Daniel Helgenberger <daniel.helgenberger(a)m-box.de>
Cc: Christian Hailer <christian(a)hailer.eu>; ydary(a)redhat.com; users(a)ovirt.org
Subject: Re: [ovirt-users] Some VMs in status "not responding" in oVirt interface

Do you have a chance to install qemu-debug? If so, I would try a backtrace:

gdb -p <qemu-pid>
(gdb) bt
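
The same backtrace can also be taken non-interactively, which makes it easier to attach the output to a bug report. A minimal sketch, assuming gdb and the matching debuginfo packages are installed; the function name `dump_backtrace` and the `GDB` override variable are illustrative only, and `thread apply all bt` is used instead of plain `bt` so that every thread of the process is captured:

```shell
#!/bin/bash
# Attach to a running process, print the backtraces of all its threads,
# then detach and exit without starting an interactive session.
#   $1 = pid of the (hung) qemu process
# GDB may be set to another binary or wrapper; it defaults to gdb from PATH.
dump_backtrace() {
    local pid="$1"
    "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt"
}
```

Typical use would be `dump_backtrace <qemu-pid> > qemu-backtrace.txt 2>&1`.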

Markus

On 15.09.2015 at 4:15 PM, Daniel Helgenberger <daniel.helgenberger@m-box.de> wrote:

Hello,

I do not want to hijack the thread, but maybe my issue is related?

It might have started with oVirt 3.5.3, but I cannot tell for sure.

For me, one VM (foreman) is affected, for the second time in 14 days. I can confirm this, as I also lose any network connection to the VM and the ability to connect a console.
Also, the only thing which 'fixes' the issue right now is 'kill -9 <pid of qemu-kvm process>'.

As far as I can tell, the VM became unresponsive at around Sep 15 12:30:01; the engine logged this at 12:34. Nothing obvious in the VDSM logs (see attached).

Below is the engine.log part.

Versions:
ovirt-engine-3.5.4.2-1.el7.centos.noarch
vdsm-4.16.26-0.el7.centos
libvirt-1.2.8-16.el7_1.3

engine.log (12:00 - 13:00):

2015-09-15 12:03:47,949 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-56) [264d502a] HA reservation status for cluster Default is OK
2015-09-15 12:08:02,708 INFO [org.ovirt.engine.core.bll.OvfDataUpdater] (DefaultQuartzScheduler_Worker-89) [2e7bf56e] Attempting to update VMs/Templates Ovf.
2015-09-15 12:08:02,709 INFO [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand] (DefaultQuartzScheduler_Worker-89) [5e9f4ba6] Running command: ProcessOvfUpdateForStoragePoolCommand internal: true. Entities affected : ID: 00000002-0002-0002-0002-000000000088 Type: l
2015-09-15 12:08:02,780 INFO [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand] (DefaultQuartzScheduler_Worker-89) [5e9f4ba6] Lock freed to object EngineLock [exclusiveLocks= key: 00000002-0002-0002-0002-000000000088 value: OVF_UPDATE
2015-09-15 12:08:47,997 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-21) [3fc854a2] HA reservation status for cluster Default is OK
2015-09-15 12:13:06,998 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand] (org.ovirt.thread.pool-8-thread-48) [50221cdc] START, GetFileStatsVDSCommand( storagePoolId = 00000002-0002-0002-0002-000000000088, ignoreFailoverLimit = false), log id: 1503968
2015-09-15 12:13:07,137 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand] (org.ovirt.thread.pool-8-thread-48) [50221cdc] FINISH, GetFileStatsVDSCommand, return: {pfSense-2.0-RELEASE-i386.iso={status=0, ctime=1432286887.0, size=115709952}, Fedora-15-i686-Live8
2015-09-15 12:13:07,178 INFO [org.ovirt.engine.core.bll.IsoDomainListSyncronizer] (org.ovirt.thread.pool-8-thread-48) [50221cdc] Finished automatic refresh process for ISO file type with success, for storage domain id 84dcb2fc-fb63-442f-aa77-3e84dc7d5a72.
2015-09-15 12:13:48,043 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-87) [4fa1bb16] HA reservation status for cluster Default is OK
2015-09-15 12:18:48,088 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-44) [6345e698] HA reservation status for cluster Default is OK
2015-09-15 12:23:48,137 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-13) HA reservation status for cluster Default is OK
2015-09-15 12:28:48,183 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-76) [154c91d5] HA reservation status for cluster Default is OK
2015-09-15 12:33:48,229 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-36) [27c73ac6] HA reservation status for cluster Default is OK
2015-09-15 12:34:49,432 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-41) [5f2a4b68] VM foreman 8b57ff1d-2800-48ad-b267-fd8e9e2f6fb2 moved from Up --> NotResponding
2015-09-15 12:34:49,578 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-41) [5f2a4b68] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM foreman is not responding.
2015-09-15 12:38:48,273 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-10) [7a800766] HA reservation status for cluster Default is OK
2015-09-15 12:43:48,320 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-42) [440f1c40] HA reservation status for cluster Default is OK
2015-09-15 12:48:48,366 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-70) HA reservation status for cluster Default is OK
2015-09-15 12:53:48,412 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-12) [50221cdc] HA reservation status for cluster Default is OK
2015-09-15 12:58:48,459 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-3) HA reservation status for cluster Default is OK
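
To pull only the state changes out of a log like this, something along the following lines might help. It is a sketch, assuming one log entry per line (as in the engine's own log file, before a mail client wraps it) and the message layout seen in the excerpt; the function name `not_responding_events` is made up for illustration:

```shell
#!/bin/bash
# Read engine.log lines on stdin and print "<date> <time> <vm-name>"
# for every "moved from Up --> NotResponding" transition.
# All other lines (HA reservation status, audit messages, ...) are skipped.
not_responding_events() {
    sed -n -E 's/^([0-9-]+ [0-9:,]+) .* VM ([^ ]+) [0-9a-f-]+ moved from Up --> NotResponding.*$/\1 \2/p'
}
```

On the engine host this would be fed from the log file, for example `not_responding_events < /var/log/ovirt-engine/engine.log` (the usual location, but check your installation).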

On 29.08.2015 22:48, Christian Hailer wrote:

> Hello,
>
> last Wednesday I wanted to update my oVirt 3.5 hypervisor. It is a single CentOS 7 server, so I started by suspending the VMs in order to set the oVirt engine host to maintenance mode. During the process of suspending the VMs the server crashed, kernel panic…
>
> After restarting the server I installed the updates via yum and restarted the server again. Afterwards, all the VMs could be started again. Some hours later my monitoring system registered some unresponsive hosts. I had a look in the oVirt interface; 3 of the VMs were in the state “not responding”, marked by a question mark.
>
> I tried to shut down the VMs, but oVirt wasn’t able to do so. I tried to reset the status in the database with the SQL statement
>
> update vm_dynamic set status = 0 where vm_guid = (select vm_guid from vm_static where vm_name = 'MYVMNAME');
>
> but that didn’t help, either. Only rebooting the whole hypervisor helped… afterwards everything worked again. But only for a few hours, then one of the VMs entered the “not responding” state again… again only a reboot helped. Yesterday it happened again:
>
> 2015-08-28 17:44:22,664 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-60) [4ef90b12] VM DC 0f3d1f06-e516-48ce-aa6f-7273c33d3491 moved from Up --> NotResponding
>
> 2015-08-28 17:44:22,692 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-60) [4ef90b12] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM DC is not responding.
>
> Does anybody know what I can do? Where should I have a look? Hints are greatly appreciated!
>
> Thanks,
>
> Christian
>

--
Daniel Helgenberger
m box bewegtbild GmbH

P: +49/30/2408781-22
F: +49/30/2408781-10

ACKERSTR. 19
D-10115 BERLIN

www.m-box.de www.monkeymen.tv

Managing Directors: Martin Retschitzegger / Michaela Göllner
Commercial Register: Amtsgericht Charlottenburg / HRB 112767
------=_NextPartTM-000-4a9428dc-2b1c-4caf-8ee3-a447a009db9b--