On Thu, 19 Nov 2020 at 10:48, Anton Louw <
Anton.Louw(a)voxtelecom.co.za> wrote:
Hi Sandro,
Thanks for the response.
If I upgrade my datacenter to 4.4.3, will I first need to upgrade my
engine? The only options I now see in the datacenter are:
Also, if the data center is upgraded, will it still be compatible with my
other hosts, some running 4.3.3?
4.3.3 should be able to run cluster compatibility 4.3 :-)
In general, it would be better to align the datacenter to the latest
version as soon as practical.
Thanks
*Anton Louw*
*Cloud Engineer: Storage and Virtualization* at *Vox*
------------------------------
*T:* 087 805 0000 | *D:* 087 805 1572
*M:* N/A
*E:* anton.louw(a)voxtelecom.co.za
*A:* Rutherford Estate, 1 Scott Street, Waverley, Johannesburg
www.vox.co.za
*From:* Sandro Bonazzola <sbonazzo(a)redhat.com>
*Sent:* 19 November 2020 10:00
*To:* Anton Louw <Anton.Louw(a)voxtelecom.co.za>
*Cc:* Arik Hadas <ahadas(a)redhat.com>; Dominik Holler <dholler(a)redhat.com>;
users(a)ovirt.org; Johan Koen <Johan.Koen(a)voxtelecom.co.za>
*Subject:* Re: [ovirt-users] oVirt Node Crash
On Tue, 17 Nov 2020 at 16:01, Anton Louw <
Anton.Louw(a)voxtelecom.co.za> wrote:
Hi Sandro,
Have you perhaps seen anything in the SOS report that could shed some
light on the issues?
Sadly no. I see it's oVirt Node 4.3.8; I suggest upgrading to at least
4.3.10 and considering an upgrade of the whole datacenter to 4.4.3.
I had the feeling watchdog was the trigger of the reboot but couldn't find
any evidence.
I also don't see anything suspicious in the logs.
Thanks
*From:* Anton Louw
*Sent:* 16 November 2020 07:30
*To:* Sandro Bonazzola <sbonazzo(a)redhat.com>; Arik Hadas <
ahadas(a)redhat.com>; Dominik Holler <dholler(a)redhat.com>
*Cc:* users(a)ovirt.org; Johan Koen <Johan.Koen(a)voxtelecom.co.za>
*Subject:* RE: [ovirt-users] oVirt Node Crash
I have also attached the SOS report as requested
*From:* Anton Louw
*Sent:* 16 November 2020 06:54
*To:* Sandro Bonazzola <sbonazzo(a)redhat.com>; Arik Hadas <
ahadas(a)redhat.com>; Dominik Holler <dholler(a)redhat.com>
*Cc:* users(a)ovirt.org; Johan Koen <Johan.Koen(a)voxtelecom.co.za>
*Subject:* RE: [ovirt-users] oVirt Node Crash
Hi Sandro,
Thanks for the response. I logged onto oVirt this morning, and I see the
node is in an “Unassigned” state. I can ping it, but cannot SSH, so there is
something that is causing the host to be unresponsive.
On Saturday after I sent the mail, I opened a console to the node, and I
saw the below entries before logging in:
audit:backlog limit exceeded
I then tried the solution of increasing the buffer size in the audit.rules
file in /etc/audit/rules.d/, as shown below, but it did not resolve the
issue.
## First rule - delete all
-D
## Increase the buffers to survive stress events.
## Make this bigger for busy systems
-b 8192
## Set failure mode to syslog
-f 1
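To double-check which values a rules file like the one above actually sets, a minimal Python sketch could parse it and report the last -b and -f lines. The function name audit_settings and the RULES string are illustrative only, and this reads the text of the file, not the values the running kernel is using (files under /etc/audit/rules.d/ only take effect once the rules are reloaded).

```python
import shlex


def audit_settings(rules_text):
    """Return the last -b (backlog limit) and -f (failure mode) values found."""
    keys = {"-b": "backlog_limit", "-f": "failure_mode"}
    settings = {"backlog_limit": None, "failure_mode": None}
    for raw in rules_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)
        # Only control lines that start with -b or -f are of interest here.
        if tokens[0] in keys and len(tokens) >= 2:
            settings[keys[tokens[0]]] = int(tokens[1])
    return settings


# The rules quoted in this mail, reproduced as a string for the example.
RULES = """\
## First rule - delete all
-D
## Increase the buffers to survive stress events.
## Make this bigger for busy systems
-b 8192
## Set failure mode to syslog
-f 1
"""

print(audit_settings(RULES))
# → {'backlog_limit': 8192, 'failure_mode': 1}
```

If the parsed values look right but the "backlog limit exceeded" messages continue, the rules may not have been reloaded, or the backlog may simply still be too small for the load.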
Is it possible to upgrade the node to 4.4 while the engine is still on 4.3?
Thanks
*From:* Sandro Bonazzola <sbonazzo(a)redhat.com>
*Sent:* 13 November 2020 18:39
*To:* Anton Louw <Anton.Louw(a)voxtelecom.co.za>; Arik Hadas <
ahadas(a)redhat.com>; Dominik Holler <dholler(a)redhat.com>
*Cc:* users(a)ovirt.org; Johan Koen <Johan.Koen(a)voxtelecom.co.za>
*Subject:* Re: [ovirt-users] oVirt Node Crash
On Fri, 13 Nov 2020 at 17:37, Sandro Bonazzola <
sbonazzo(a)redhat.com> wrote:
On Fri, 13 Nov 2020 at 13:38, Anton Louw via Users <
users(a)ovirt.org> wrote:
Hi Everybody,
I have built a new host which has been running fine for the last couple of
days. I noticed today that the host crashed, but it is not giving me a
reason as to why.
It happened at 13:45 today, but I have included the logs from before that
time as well.
Is there something I am missing here?
Not related to the crash, but I see in the logs that the QEMU guest agent
is not responding on 5 out of 20 guests.
Also, you seem to have issues with some firewalld rules (maybe +Dominik
Holler <dholler(a)redhat.com> would like to have a look).
I don't see anything explaining why the host got rebooted.
Still on the guest agent, I find the following lines a bit alarming:
Nov 13 13:29:34 jb2-node03 libvirtd: 2020-11-13 11:29:34.294+0000: 12603:
error : qemuDomainAgentAvailable:9144 : Guest agent is not responding: QEMU
guest agent is not connected
Nov 13 13:29:34 jb2-node03 vdsm[13843]: ERROR Shutdown by QEMU Guest Agent
failed#012Traceback (most recent call last):#012 File
"/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5304, in
qemuGuestAgentShutdown#012
self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)#012 File
"/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 100, in
f#012 ret = attr(*args, **kwargs)#012 File
"/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line
131, in wrapper#012 ret = f(*args, **kwargs)#012 File
"/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in
wrapper#012 return func(inst, *args, **kwargs)#012 File
"/usr/lib64/python2.7/site-packages/libvirt.py", line 2517, in
shutdownFlags#012 if ret == -1: raise libvirtError
('virDomainShutdownFlags() failed', dom=self)#012libvirtError: Guest agent
is not responding: QEMU guest agent is not connected
Nov 13 13:29:42 jb2-node03 kernel: vlan0077: port 11(vnet15) entered
disabled state
Nov 13 13:29:42 jb2-node03 kernel: device vnet15 left promiscuous mode
Nov 13 13:29:42 jb2-node03 kernel: vlan0077: port 11(vnet15) entered
disabled state
Nov 13 13:29:42 jb2-node03 NetworkManager[6027]: <info> [1605266982.6539]
device (vnet15): state change: disconnected -> unmanaged (reason
'unmanaged', sys-iface-state: 'removed')
Nov 13 13:29:42 jb2-node03 NetworkManager[6027]: <info> [1605266982.6550]
device (vnet15): released from master device vlan0077
Nov 13 13:29:42 jb2-node03 libvirtd: 2020-11-13 11:29:42.669+0000: 12557:
error : qemuMonitorIO:718 : internal error: End of file from qemu monitor
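The vdsm entry above is hard to read because syslog has flattened the Python traceback onto one line, writing each newline as the octal escape #012. A minimal Python sketch for turning such an entry back into a multi-line traceback follows. The function name decode_syslog_escapes is illustrative, SAMPLE is a shortened copy of the vdsm entry (the File frames are left out), and the pattern would also rewrite any literal "#" followed by three octal digits in a message, so treat it as a reading aid only.

```python
import re

# Matches escapes such as #012 (octal 012 = newline).
_ESCAPE = re.compile(r"#([0-7]{3})")


def decode_syslog_escapes(message):
    """Replace #ooo octal escapes with the character they encode."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), message)


# Shortened from the vdsm log entry quoted above.
SAMPLE = (
    "ERROR Shutdown by QEMU Guest Agent failed#012"
    "Traceback (most recent call last):#012"
    "libvirtError: Guest agent is not responding: "
    "QEMU guest agent is not connected"
)

print(decode_syslog_escapes(SAMPLE))
```

Run over the full entry, this puts each "File ..." frame and the final libvirtError on its own line, which makes it easier to see that the shutdown request failed because the guest agent was not connected.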
+Arik Hadas <ahadas(a)redhat.com> any clue?
About the crash, can you please provide a full sos report from the host? The
log you provided is not enough to understand what caused the reported crash.
Also, given that python2 is used here, I assume you're on 4.3 or older. I
would recommend upgrading to 4.4 as soon as practical.
Thanks
--
*Sandro Bonazzola*
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <
https://www.redhat.com/>
sbonazzo(a)redhat.com
<
https://www.redhat.com/>
*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.
<
https://mojo.redhat.com/docs/DOC-1199578>*