On Thu, May 21, 2020 at 8:55 AM Joseph Goldman <joseph(a)goldman.id.au> wrote:
Hi List,
Running a 3 node setup for a client, I'm constantly having the
HostedEngine move itself around: whatever node it's on ends up penalizing
its score so low that it forces a migration to the other node.
Looking at /var/log/ovirt-hosted-engine-ha/agent.log shows a decent
amount of:
MainThread::INFO::2020-05-21
15:47:54,742::states::135::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
Penalizing score by 319 due to network status
What I want to know is how do I get more debug output from this to learn
which network status it's concerned about, so I can go about stabilising it.
You can see some more info in broker.log in the same log dir. Search for "network".
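If grepping by hand gets tedious, something like the following might help. It is only a rough sketch: the log directory and the "Penalizing score by N due to ..." wording are taken from the excerpt quoted above, while the helper names (grep, penalties, report) are made up for illustration and are not part of ovirt-hosted-engine-ha.

```python
import re
from pathlib import Path

# Log directory as mentioned in this thread; adjust if yours differs.
LOG_DIR = Path("/var/log/ovirt-hosted-engine-ha")

# Matches messages such as "Penalizing score by 319 due to network status".
PENALTY_RE = re.compile(r"Penalizing score by (\d+) due to (.+)")


def grep(lines, keyword):
    """Return the lines that contain keyword, ignoring case."""
    needle = keyword.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


def penalties(lines):
    """Return (amount, reason) pairs for every penalty message found."""
    found = []
    for line in lines:
        match = PENALTY_RE.search(line)
        if match:
            found.append((int(match.group(1)), match.group(2).strip()))
    return found


def read_lines(path):
    """Read a log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return handle.readlines()


def report(log_dir=LOG_DIR):
    """Print penalties from agent.log and "network" lines from broker.log."""
    agent = log_dir / "agent.log"
    broker = log_dir / "broker.log"
    if agent.exists():
        for amount, reason in penalties(read_lines(agent)):
            print(f"agent.log: -{amount} ({reason})")
    if broker.exists():
        for line in grep(read_lines(broker), "network"):
            print(f"broker.log: {line}")


if __name__ == "__main__":
    report()
```

Run it on the host that currently holds the engine VM. Comparing the timestamps of the matching broker.log lines with the penalties in agent.log should show which check is failing at the time.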
If it's not enough to understand why it penalizes, you might want to add some
logging to the code, which is:
https://github.com/oVirt/ovirt-hosted-engine-ha/blob/master/ovirt_hosted_...
or, on your machine, in
/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors/network.py
(or python2.7, for <= 4.3).
See also the git log ("History" button in github) for recent changes, including
adding logging to the dns tester.
The system is heavily monitored with ping checks; it never drops link and
never drops ICMP. None of its VMs falter accessing shared NFS space for
disk storage so I'm not sure what the concern is. The node will
literally over time penalise itself down to ~2000 and then HA agent will
want it to swap nodes. It's not necessarily a bad thing but generates a
heap of status emails multiple times a day which is just garbage - and
makes the HE unavailable sometimes when mid-admin task.
Understood.
Which network tester do you use? If it's dns (IIRC the default now), perhaps
it's a problem with your dns server(s).
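To check whether name resolution is flaky on the host, independently of the agent, a small probe like the one below could be left running for a while. This is only a sketch under assumptions: it uses the system resolver via socket.getaddrinfo, which is not necessarily how the broker's dns test works, and the function names and the command-line handling are made up for illustration.

```python
import socket
import sys
import time


def check_dns(name, resolve=socket.getaddrinfo):
    """Try to resolve name once; return (ok, seconds_taken, error_text)."""
    start = time.monotonic()
    try:
        resolve(name, None)
        return True, time.monotonic() - start, ""
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, time.monotonic() - start, str(exc)


def probe(name, attempts=10, delay=1.0, resolve=socket.getaddrinfo):
    """Resolve name repeatedly and collect one result tuple per attempt."""
    results = []
    for i in range(attempts):
        results.append(check_dns(name, resolve))
        if i + 1 < attempts:
            time.sleep(delay)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 dns_probe.py <hostname-to-resolve>
    for ok, elapsed, error in probe(sys.argv[1]):
        status = "ok" if ok else f"FAILED ({error})"
        print(f"{time.strftime('%H:%M:%S')} {elapsed:.3f}s {status}")
```

Occasional failures or slow answers here, at times that match the penalties in agent.log, would point at the DNS servers rather than at link or ICMP.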
Good luck and best regards,
--
Didi