
Hi all, I'm experiencing random reboots on several oVirt nodes (CentOS 7/8, on both oVirt 4.3 and 4.4). Sometimes it happens three times in a day, and the more hosts I add to my pool, the more often I notice it. The logs are not helpful: it looks like a hard power-off, because there are no entries at all in messages, vdsm, or secure (I looked through all the logs) between the last "normal" entry (user logged in/out, normal vdsm log, etc.) and the first entry of the next boot. kdump is enabled and /var/crash is empty. I used to run Xen on servers from the same provider and didn't see these frequent reboots, which is why I'm not sure it is a hardware-related issue. Any advice on what to enable to get more information about these crashes? Thank you for your time, Francesco

My first guess would be fencing. Fencing kicks in when there are network issues or when the hypervisor is stuck. Check the engine's logs to verify that guess. Best Regards, Strahil Nikolov On Fri, Feb 5, 2021 at 11:50, francesco--- via Users <users@ovirt.org> wrote: Hi all, I'm experiencing random reboot on several oVirt nodes [...] _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/G3DOP7A7SREBYQ...
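One way to act on this suggestion is to scan the engine log for fencing-related entries around the time of a reboot. The sketch below is only illustrative: the keyword pattern is an assumption, not an exhaustive list of the messages oVirt emits, and `/var/log/ovirt-engine/engine.log` is the usual engine log location but may differ on a given installation.

```python
import re
from typing import Iterable, List

# Assumed keywords; adjust them to the messages your engine actually logs.
FENCE_PATTERN = re.compile(
    r"fenc|non[ -]?responsive|power management", re.IGNORECASE
)


def find_fencing_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention one of the fencing-related keywords."""
    return [line.rstrip("\n") for line in lines if FENCE_PATTERN.search(line)]


def scan_engine_log(path: str = "/var/log/ovirt-engine/engine.log") -> List[str]:
    """Read the engine log at `path` and return its fencing-related lines."""
    with open(path, errors="replace") as handle:
        return find_fencing_lines(handle)
```

Comparing the timestamps of the returned lines with the time of the host's first boot entry should show whether the engine acted on the host before or after it went down.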

Hi Strahil, I have the Power Management setting disabled on all the hosts, so I doubt it's a fencing-related issue, but thank you for the suggestion. The only entries I see in the engine logs are the "set non responsive status" ones. Francesco On 06/02/2021 06:20, Strahil Nikolov via Users wrote:
My first guess would be fencing. Fencing kicks in when there are network issues or when the Hypervisor is stuck.
Check the engine's logs to verify that guess.
-- Shellrent - The first Italian hosting provider. Security First. Francesco Lorenzini, System Administrator & DevOps Engineer, Shellrent Srl, Via dell'Edilizia, 19 - 36100 Vicenza. Tel. 0444321155 | Fax 04441492177

I would then configure remote logging and wait for another crash. Also, patch to the latest OS/oVirt version (if possible). I recently tracked down a similar case: two PCI NICs had gone bad. Best Regards, Strahil Nikolov On Mon, Feb 8, 2021 at 11:42, Francesco via Users <users@ovirt.org> wrote:
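Remote logging on these hosts would normally be set up in rsyslog itself (a forwarding rule such as `*.* @@loghost:514`). As a complement, the sketch below shows the same idea in Python: a small heartbeat sent to a remote syslog server over UDP, so that the last message received before a silent reboot bounds the time of the crash. The logger name, the message text, and the host `loghost.example.com` in the usage note are hypothetical; only the standard-library `logging.handlers.SysLogHandler` is assumed.

```python
import logging
import logging.handlers
import socket


def make_remote_logger(host: str, port: int = 514) -> logging.Logger:
    """Build a logger that forwards records to a remote syslog server over UDP."""
    # The logger name is hypothetical; it includes host and port so that
    # loggers created for different destinations do not share handlers.
    logger = logging.getLogger(f"crash-heartbeat-{host}-{port}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        # SysLogHandler uses a UDP socket by default.
        handler = logging.handlers.SysLogHandler(address=(host, port))
        handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


def send_heartbeat(logger: logging.Logger) -> None:
    """Send one heartbeat record carrying the local host name."""
    logger.info("heartbeat from %s", socket.gethostname())
```

For example, `send_heartbeat(make_remote_logger("loghost.example.com"))` could be run every minute from cron or a systemd timer. Because the records leave the host immediately, they survive even when the local logs stop abruptly.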