Hi Artur,
Thanks for the reply. I have attached the system logs. There was a disconnect at 10:54,
but no error that differs from the rest. I do see a whole lot of QEMU Guest Agent and
block_io errors in the system logs, and I am not entirely sure what they mean.
Checking the vdsm logs at the time of the error, the only entry is the one below:
“2020-09-18 10:55:57,081+0000 WARN (qgapoller/2) [virt.periodic.VmDispatcher] could not
run <function <lambda> at 0x7f2170395578> on
['d3838612-70bb-4731-a0d4-8f65d31b40a6',
'59a2f394-48fe-4bd9-91d6-08115f2eec0a',
'f81e3ab8-c1a9-4674-b238-7e229fd43e7c',
'42189fa1-4381-02c7-d830-20eac408da2c',
'423f1c57-f98e-707f-c0f9-d4958d3f0fec',
'64d1eabc-20ff-4288-98ff-dcfd120fe7d2',
'4218baf0-e2a1-42c7-2efd-077407f47b4d',
'42184650-5a60-5403-d758-840bdbf92dd8',
'492ea3fe-0a27-4dde-abf9-7d146ee1b988',
'4218df00-15cd-bdf9-efd9-c5ead49fd89c',
'9c373379-718b-4906-abc1-960fb1820c2d',
'b9441c7a-0bfd-4d41-a8de-ee24e4259b36',
'd810325a-1a45-4054-a870-c8c052a22354',
'42189d3f-4570-45ea-6e5a-94c85a5885a1'] (periodic:289)”
I am stumped. Do you think it is worth a shot increasing the vdsConnectionTimeout and
vdsHeartbeatInSeconds to 40 for testing purposes?
Thanks
Anton Louw
Cloud Engineer: Storage and Virtualization
______________________________________
D: 087 805 1572 | M: N/A
A: Rutherford Estate, 1 Scott Street, Waverley, Johannesburg
anton.louw(a)voxtelecom.co.za
www.vox.co.za
From: Artur Socha <asocha(a)redhat.com>
Sent: 18 September 2020 13:27
To: Anton Louw <Anton.Louw(a)voxtelecom.co.za>
Cc: users(a)ovirt.org
Subject: Re: [ovirt-users] Re: Random hosts disconnects
Hi Anton,
I am not sure that changing these values would fix the issue. The defaults are already
pretty high: for example, vdsHeartbeatInSeconds=30 seconds, vdsTimeout=180 seconds,
vdsConnectionTimeout=20 seconds.
Do you still have the relevant logs from the affected hosts?
/var/log/vdsm/vdsm.log
/var/log/vdsm/supervdsm.log
Please look for any jsonrpc errors, i.e. write/read errors or (connection) timeouts.
Storage related warnings/errors might also be relevant.
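If it helps with that search, here is a minimal Python sketch that narrows a vdsm.log
down to the lines around a disconnect. It is only an illustration: the function name
suspicious_lines, the keyword list, the level names other than WARN and the 120-second
window are assumptions of this sketch, not anything vdsm provides. The timestamp layout
is taken from the vdsm.log line quoted at the top of this thread.

```python
import re
from datetime import datetime, timedelta

# Timestamp layout as seen in the quoted vdsm.log line, e.g.
# "2020-09-18 10:55:57,081+0000 WARN (qgapoller/2) [...] message".
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) (\w+) "
)

# Assumed search terms; adjust to whatever actually shows up in your logs.
KEYWORDS = ("timeout", "timed out", "connection reset", "broken pipe")

# WARN appears in the quoted line; ERROR and CRIT are assumed level names.
LEVELS = ("WARN", "ERROR", "CRIT")


def suspicious_lines(lines, around, window_seconds=120):
    """Yield log lines within +/- window_seconds of `around` that are either
    logged at WARN or above, or mention one of KEYWORDS at any level."""
    center = datetime.strptime(around, "%Y-%m-%d %H:%M:%S%z")
    delta = timedelta(seconds=window_seconds)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            # Continuation lines (tracebacks, wrapped lists) carry no
            # timestamp of their own and are skipped in this sketch.
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f%z")
        if abs(stamp - center) > delta:
            continue
        text = line.lower()
        if match.group(2) in LEVELS or any(word in text for word in KEYWORDS):
            yield line.rstrip("\n")
```

Passing it an open vdsm.log from an affected host together with a time such as
"2020-09-18 10:54:00+0000" yields the matching lines in file order, which can then be
compared against the moment the engine marked the host as not responding.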
Plus system logs if possible:
journalctl -f /usr/share/vdsm/vdsmd
journalctl -f /usr/sbin/libvirtd
To get system logs from a particular time period, combine it with the -S (since) and
-U (until) options, as in the following example:
journalctl -S "2020-01-12 07:00:00" -U "2020-01-12 07:15:00"
I don't know exactly what to look for there, besides any warnings/errors or anything
else that seems unusual.
Artur
On Thu, Sep 17, 2020 at 8:09 AM Anton Louw via Users <users@ovirt.org> wrote:
Hi Everybody,
I did some digging around, and saw a few things regarding “vdsHeartbeatInSeconds”.
I had a look at the properties file located at
/etc/ovirt-engine/engine-config/engine-config.properties, and do not see an entry for
“vdsHeartbeatInSeconds.type=Integer”.
Seeing as these data centers are geographically split, could the “vdsHeartbeatInSeconds”
value potentially be the issue? Is it safe to increase this value after I add
“vdsHeartbeatInSeconds.type=Integer” to my engine-config.properties file?
Thanks
Anton Louw
Cloud Engineer: Storage and Virtualization at Vox
________________________________
T: 087 805 0000 | D: 087 805 1572
M: N/A
E: anton.louw@voxtelecom.co.za
A: Rutherford Estate, 1 Scott Street, Waverley, Johannesburg
www.vox.co.za
From: Anton Louw via Users <users@ovirt.org>
Sent: 16 September 2020 09:01
To: users@ovirt.org
Subject: [ovirt-users] Random hosts disconnects
Hi All,
I have a strange issue in my oVirt environment. I currently have a standalone manager
which is running in VMware. In my oVirt environment, I have two Data Centers. The manager
is currently sitting on the same subnet as DC1. Randomly, hosts in DC2 will say “Not
Responding” and then 2 seconds later, the hosts will activate again.
The strange thing is, when the manager was sitting on the same subnet as DC2, hosts in DC1
would randomly say “Not Responding”.
I have tried going through the logs, but I cannot see anything out of the ordinary
regarding why the hosts would drop connection. I have attached the engine.log for anybody
that would like to do a spot check.
Thanks
Anton Louw
Cloud Engineer: Storage and Virtualization at Vox
--
Artur Socha
Senior Software Engineer, RHV
Red Hat