Hi Sandro,
Thanks for the response. I logged onto oVirt this morning, and I see the node is in an
“Unassigned” state. I can ping it but cannot SSH to it, so something is causing
the host to be unresponsive.
On Saturday after I sent the mail, I opened a console to the node, and I saw the below
entries before logging in:
audit:backlog limit exceeded
I then tried increasing the buffer size in the audit.rules file in
/etc/audit/rules.d/, as shown below, but it did not resolve the issue.
## First rule - delete all
-D
## Increase the buffers to survive stress events.
## Make this bigger for busy systems
-b 8192
## Set failure mode to syslog
-f 1
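One thing that may be worth ruling out is whether the new -b value was actually loaded into the kernel. As a rough sketch (not something taken from this thread), the Python below parses the output of `auditctl -s`, which reports the running backlog_limit and the lost counter. The helper names parse_auditctl_status and backlog_report are hypothetical, and the 8192 default simply mirrors the rule above. Files under /etc/audit/rules.d/ normally only take effect once they have been regenerated and loaded, for example with `augenrules --load`.

```python
def parse_auditctl_status(text):
    """Parse the 'key value' lines printed by `auditctl -s` into a dict.

    Lines that are not exactly a name followed by an integer
    (for example 'loginuid_immutable 0 unlocked') are skipped.
    """
    status = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            status[parts[0]] = int(parts[1])
    return status


def backlog_report(status, expected_limit=8192):
    """Return a list of findings about the audit backlog (empty if none)."""
    findings = []
    limit = status.get("backlog_limit")
    if limit is not None and limit < expected_limit:
        findings.append(
            "backlog_limit is %d, below the configured %d; "
            "the rules file may not have been loaded" % (limit, expected_limit)
        )
    lost = status.get("lost", 0)
    if lost > 0:
        findings.append("%d audit records were lost" % lost)
    return findings
```

On the node, the input could come from `subprocess.check_output(["auditctl", "-s"]).decode()` run as root. An empty list from backlog_report would suggest the buffer setting is in place and the unresponsiveness has another cause.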
Is it possible to upgrade the node to 4.4 while the engine is still on 4.3?
Thanks
Anton Louw
Cloud Engineer: Storage and Virtualization
______________________________________
D: 087 805 1572 | M: N/A
A: Rutherford Estate, 1 Scott Street, Waverley, Johannesburg
anton.louw(a)voxtelecom.co.za
www.vox.co.za
From: Sandro Bonazzola <sbonazzo(a)redhat.com>
Sent: 13 November 2020 18:39
To: Anton Louw <Anton.Louw(a)voxtelecom.co.za>; Arik Hadas <ahadas(a)redhat.com>;
Dominik Holler <dholler(a)redhat.com>
Cc: users(a)ovirt.org; Johan Koen <Johan.Koen(a)voxtelecom.co.za>
Subject: Re: [ovirt-users] oVirt Node Crash
On Fri, 13 Nov 2020 at 17:37, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
On Fri, 13 Nov 2020 at 13:38, Anton Louw via Users <users@ovirt.org> wrote:
Hi Everybody,
I have built a new host which has been running fine for the last couple of days. I noticed
today that the host crashed, but it is not giving me a reason as to why.
It happened at 13:45 today, but I have included log entries from before that time as well.
Is there something I am missing here?
Not related to the crash, but I see in the logs that the QEMU guest agent is not
responding on 5 out of 20 guests.
You also seem to have some issues with some firewalld rules (maybe +Dominik Holler
<dholler@redhat.com> would like to have a look).
I don't see anything explaining why the host got rebooted.
Still on the guest agent, I find the following lines a bit alarming:
Nov 13 13:29:34 jb2-node03 libvirtd: 2020-11-13 11:29:34.294+0000: 12603: error :
qemuDomainAgentAvailable:9144 : Guest agent is not responding: QEMU guest agent is not
connected
Nov 13 13:29:34 jb2-node03 vdsm[13843]: ERROR Shutdown by QEMU Guest Agent
failed#012Traceback (most recent call last):#012 File
"/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5304, in
qemuGuestAgentShutdown#012
self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)#012 File
"/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 100, in f#012
ret = attr(*args, **kwargs)#012 File
"/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131,
in wrapper#012 ret = f(*args, **kwargs)#012 File
"/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in
wrapper#012 return func(inst, *args, **kwargs)#012 File
"/usr/lib64/python2.7/site-packages/libvirt.py", line 2517, in shutdownFlags#012
if ret == -1: raise libvirtError ('virDomainShutdownFlags() failed',
dom=self)#012libvirtError: Guest agent is not responding: QEMU guest agent is not
connected
Nov 13 13:29:42 jb2-node03 kernel: vlan0077: port 11(vnet15) entered disabled state
Nov 13 13:29:42 jb2-node03 kernel: device vnet15 left promiscuous mode
Nov 13 13:29:42 jb2-node03 kernel: vlan0077: port 11(vnet15) entered disabled state
Nov 13 13:29:42 jb2-node03 NetworkManager[6027]: <info> [1605266982.6539] device
(vnet15): state change: disconnected -> unmanaged (reason 'unmanaged',
sys-iface-state: 'removed')
Nov 13 13:29:42 jb2-node03 NetworkManager[6027]: <info> [1605266982.6550] device
(vnet15): released from master device vlan0077
Nov 13 13:29:42 jb2-node03 libvirtd: 2020-11-13 11:29:42.669+0000: 12557: error :
qemuMonitorIO:718 : internal error: End of file from qemu monitor
+Arik Hadas <ahadas@redhat.com>, any clue?
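The vdsm traceback above is hard to read because the syslog daemon has replaced each newline with `#012` (the octal code for a line feed). A minimal sketch for turning such a line back into a multi-line traceback follows. The function name decode_syslog_escapes is hypothetical, and the sketch assumes the escapes follow the `#` plus three octal digits pattern seen in these lines.

```python
import re

# '#' followed by exactly three octal digits, e.g. '#012' for a newline.
_OCTAL_ESCAPE = re.compile(r"#([0-7]{3})")


def _replace(match):
    code = int(match.group(1), 8)
    # Only control characters are assumed to be escaped this way;
    # anything else (such as a literal '#101') is left untouched.
    if code < 0x20 or code == 0x7F:
        return chr(code)
    return match.group(0)


def decode_syslog_escapes(line):
    """Return the line with '#ooo' control-character escapes decoded."""
    return _OCTAL_ESCAPE.sub(_replace, line)
```

Calling `print(decode_syslog_escapes(line))` on the vdsm entry would print the Python traceback one frame per line, which makes the failing call (shutdownFlags in libvirt.py) easier to spot.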
About the crash, can you please provide a full sos report from the host? The log you
provided is not enough to understand what caused the reported crash.
Also, given that python2 is used here, I assume you're on 4.3 or older. I would recommend
upgrading to 4.4 as soon as practical.
Thanks
Anton Louw
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com
Red Hat respects your work life balance. Therefore there is no need to answer this email
out of your office hours.