
On Thu, May 21, 2020 at 8:55 AM Joseph Goldman <joseph@goldman.id.au> wrote:
Hi List,
I'm running a 3-node setup for a client, and the HostedEngine constantly moves itself around: whatever node it's on ends up penalizing its score so low that it forces a migration to the other node.
Looking at /var/log/ovirt-hosted-engine-ha/agent.log shows a decent amount of:
MainThread::INFO::2020-05-21 15:47:54,742::states::135::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Penalizing score by 319 due to network status
What I want to know is how to get more debug output, so I can see which network status it's concerned about and go about stabilising it.
You can see some more info in broker.log in the same log dir. Search for "network". If that's not enough to understand why it penalizes, you might want to add some logging to the code, which is: https://github.com/oVirt/ovirt-hosted-engine-ha/blob/master/ovirt_hosted_eng... or, on your machine, in /usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors/network.py (or python2.7, for <= 4.3). See also the git log ("History" button on GitHub) for recent changes, including the addition of logging to the dns tester.
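To get a quick overview before touching the code, something like the following might help. It is only a rough sketch: it assumes the penalty messages in agent.log always look like the line quoted above, and the function names (summarize_penalties, network_lines) are made up for this example, not part of ovirt-hosted-engine-ha.

```python
import re
from collections import Counter

# Matches the agent.log message format quoted above, e.g.
# "... Penalizing score by 319 due to network status".
# Assumption: other penalty reasons use the same wording.
PENALTY_RE = re.compile(r"Penalizing score by (\d+) due to (.+)$")


def summarize_penalties(lines):
    """Return (occurrences per reason, summed penalty per reason)."""
    counts, totals = Counter(), Counter()
    for line in lines:
        match = PENALTY_RE.search(line.rstrip())
        if match:
            reason = match.group(2)
            counts[reason] += 1
            totals[reason] += int(match.group(1))
    return counts, totals


def network_lines(lines):
    """Return every line mentioning 'network' (case-insensitive)."""
    return [line.rstrip("\n") for line in lines if "network" in line.lower()]


def report(log_dir="/var/log/ovirt-hosted-engine-ha"):
    """Print a penalty summary from agent.log and network lines from broker.log."""
    with open(f"{log_dir}/agent.log", errors="replace") as agent_log:
        counts, totals = summarize_penalties(agent_log)
    for reason, count in counts.most_common():
        print(f"{reason}: {count} penalties, {totals[reason]} points in total")
    with open(f"{log_dir}/broker.log", errors="replace") as broker_log:
        for line in network_lines(broker_log):
            print(line)
```

Calling report() on the affected host prints how often each reason shows up, followed by the broker.log lines worth reading around the same timestamps.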
The system is heavily monitored with ping checks; it never drops link and never drops ICMP. None of its VMs falter accessing shared NFS space for disk storage, so I'm not sure what the concern is. Over time the node will literally penalise itself down to ~2000, and then the HA agent will want to swap nodes. It's not necessarily a bad thing, but it generates a heap of status emails multiple times a day, which is just garbage, and it sometimes makes the HE unavailable in the middle of an admin task.
Understood. Which network tester do you use? If it's dns (IIRC the default now), perhaps it's a problem with your dns server(s). Good luck and best regards, -- Didi
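If DNS is the suspect, a small probe run on the host for a while can show whether lookups fail intermittently. The sketch below is not what the HA broker's dns tester does; it simply resolves a name repeatedly through the system resolver (so /etc/hosts entries or local caching can mask problems) and records failures and latency. The names probe_dns and failure_rate, and the hostname you pass in, are placeholders for illustration.

```python
import socket
import time


def probe_dns(hostname, attempts=10, interval=1.0,
              resolve=socket.getaddrinfo, sleep=time.sleep):
    """Resolve hostname repeatedly; return a list of (ok, seconds) tuples.

    resolve and sleep are injectable so the loop can be exercised
    without touching the network.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            resolve(hostname, None)
            ok = True
        except OSError:  # socket.gaierror is a subclass of OSError
            ok = False
        results.append((ok, time.monotonic() - start))
        sleep(interval)
    return results


def failure_rate(results):
    """Fraction of failed lookups, or 0.0 when there are no results."""
    if not results:
        return 0.0
    failed = sum(1 for ok, _ in results if not ok)
    return failed / len(results)
```

For example, failure_rate(probe_dns("engine.example.com", attempts=600)) run from each node gives a rough failure fraction over about ten minutes; comparing that with the timestamps of the "Penalizing score" lines may show whether the two line up.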