
Hi List,

I'm running a 3-node setup for a client, and the HostedEngine keeps moving itself around: whichever node it is on ends up penalizing its own score so low that it forces a migration to another node.

/var/log/ovirt-hosted-engine-ha/agent.log shows a fair number of lines like:

MainThread::INFO::2020-05-21 15:47:54,742::states::135::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Penalizing score by 319 due to network status

How do I get more debug output to find out which network status it is concerned about, so I can stabilise it?

The system is heavily monitored with ping checks; it never drops link and never drops ICMP. None of its VMs falter when accessing the shared NFS space used for disk storage, so I'm not sure what the concern is. Over time the node penalises itself down to ~2000, and then the HA agent wants to swap nodes. That's not necessarily a bad thing, but it generates a heap of status emails several times a day, which is just noise, and it sometimes makes the HE unavailable in the middle of an admin task.

Any help is appreciated.

Thanks,
Joe
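As a rough way to quantify how often and how heavily the agent penalizes itself, lines like the one quoted above can be tallied per reason. This is only a minimal sketch: the regular expression is inferred from the single log line shown in this thread, and the names `PENALTY_RE`, `parse_penalties`, and `summarize` are illustrative, not part of ovirt-hosted-engine-ha.

```python
import re
from collections import defaultdict

# Pattern inferred from the one agent.log line quoted in this thread (an assumption):
# ...::2020-05-21 15:47:54,742::...::(score) Penalizing score by 319 due to network status
PENALTY_RE = re.compile(
    r"::(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+::"
    r".*Penalizing score by (?P<points>\d+) due to (?P<reason>.+)$"
)


def parse_penalties(lines):
    """Yield (timestamp, points, reason) for every penalty line in `lines`."""
    for line in lines:
        match = PENALTY_RE.search(line.rstrip("\n"))
        if match:
            yield (
                match.group("ts"),
                int(match.group("points")),
                match.group("reason").strip(),
            )


def summarize(lines):
    """Return {reason: (number_of_penalties, total_points)}."""
    totals = defaultdict(lambda: [0, 0])
    for _ts, points, reason in parse_penalties(lines):
        totals[reason][0] += 1
        totals[reason][1] += points
    return {reason: tuple(pair) for reason, pair in totals.items()}
```

On a host this could be run as `with open("/var/log/ovirt-hosted-engine-ha/agent.log") as log: print(summarize(log))`. Comparing the timestamps returned by `parse_penalties` with the monitoring data may show whether the penalties line up with anything the ping checks saw.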

On Thu, May 21, 2020 at 8:55 AM Joseph Goldman <joseph@goldman.id.au> wrote:
> Hi List,
> Running a 3 node setup for a client, i'm constantly having the HostedEngine move itself around, whatever node its on ends up penalizing its score so low that it forces a migrate to the other node.
> Looking at /var/log/ovirt-hosted-engine-ha/agent.log shows a decent amount of:
> MainThread::INFO::2020-05-21 15:47:54,742::states::135::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Penalizing score by 319 due to network status
> What I want to know is how do I get more debug out of this to know what network status its concerned about, so I can go about stablising it.

You can see some more info in broker.log in same log dir. Search for "network".

If it's not enough to understand why it penalizes, you might want to add some logging to the code, which is:

https://github.com/oVirt/ovirt-hosted-engine-ha/blob/master/ovirt_hosted_eng...

or, on your machine, in

/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors/network.py

(or python2.7, for <= 4.3).

See also the git log ("History" button in github) for recent changes, including adding logging to the dns tester.

> The system is heavily monitored with ping checks, never drops link and never drops ICMP. None of its VM's falter accessing shared NFS space for disk storage so I'm not sure what the concern is. The node will literally over time penalise itself down to ~2000 and then HA agent will want it to swap nodes. It's not necessarily a bad thing but generates a heap of status emails multiple times a day which is just garbage - and makes the HE unavailable sometimes when mid-admin task.

Understood.

Which network tester do you use? If it's dns (IIRC the default now), perhaps it's a problem with your dns server(s).

Good luck and best regards,
--
Didi
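The "search broker.log for network" suggestion above could be scripted roughly as follows. This is a sketch only: `network_lines` is a hypothetical helper, the match is a plain case-insensitive substring test, and nothing here assumes any particular broker.log line format.

```python
def network_lines(lines, keyword="network", limit=None):
    """Return lines containing `keyword` (case-insensitive), newest last.

    `limit`, if given, keeps only the last `limit` matching lines.
    """
    needle = keyword.lower()
    hits = [line.rstrip("\n") for line in lines if needle in line.lower()]
    if limit is None:
        return hits
    return hits[-limit:] if limit > 0 else []


# Possible use on a host (path taken from the log directory named in this thread):
# with open("/var/log/ovirt-hosted-engine-ha/broker.log") as log:
#     for hit in network_lines(log, limit=50):
#         print(hit)
```

Passing `keyword="dns"` instead may be useful if the dns network tester is the one in use.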

In addition to what Didi suggested, you can enable DEBUG level in order to get more details in broker.log:

1. Edit /etc/ovirt-hosted-engine-ha/broker-log.conf
2. In [logger_root] section change the level parameter to level=DEBUG
3. Restart the service: systemctl restart ovirt-ha-broker

Regards,
Asaf

On Thu, May 21, 2020 at 10:01 AM Yedidyah Bar David <didi@redhat.com> wrote:
> On Thu, May 21, 2020 at 8:55 AM Joseph Goldman <joseph@goldman.id.au> wrote:
> > Hi List,
> > Running a 3 node setup for a client, i'm constantly having the HostedEngine move itself around, whatever node its on ends up penalizing its score so low that it forces a migrate to the other node.
> > Looking at /var/log/ovirt-hosted-engine-ha/agent.log shows a decent amount of:
> > MainThread::INFO::2020-05-21 15:47:54,742::states::135::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Penalizing score by 319 due to network status
> > What I want to know is how do I get more debug out of this to know what network status its concerned about, so I can go about stablising it.
>
> You can see some more info in broker.log in same log dir. Search for "network".
> If it's not enough to understand why it penalizes, you might want to add some logging to the code, which is:
> https://github.com/oVirt/ovirt-hosted-engine-ha/blob/master/ovirt_hosted_eng...
> or, on your machine, in
> /usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors/network.py
> (or python2.7, for <= 4.3).
> See also the git log ("History" button in github) for recent changes, including adding logging to the dns tester.
>
> > The system is heavily monitored with ping checks, never drops link and never drops ICMP. None of its VM's falter accessing shared NFS space for disk storage so I'm not sure what the concern is. The node will literally over time penalise itself down to ~2000 and then HA agent will want it to swap nodes. It's not necessarily a bad thing but generates a heap of status emails multiple times a day which is just garbage - and makes the HE unavailable sometimes when mid-admin task.
>
> Understood.
> Which network tester do you use? If it's dns (IIRC the default now), perhaps it's a problem with your dns server(s).
>
> Good luck and best regards,
> --
> Didi
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-leave@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/JHZPC7IHU3LRPN...
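The edit Asaf describes for broker-log.conf (set level=DEBUG in the [logger_root] section) could be automated along these lines. This is a hedged sketch, not a supported tool: `set_root_level` is a hypothetical helper, it assumes the file is a plain INI-style logging config with a `level=` line inside `[logger_root]`, and it rewrites only that one line so comments and other sections stay untouched.

```python
def set_root_level(conf_text, level="DEBUG"):
    """Return `conf_text` with the `level` option of [logger_root] set to `level`.

    Lines outside [logger_root] (including `level=` lines in other
    sections) are passed through unchanged.
    """
    out = []
    in_root = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Entering a new section; remember whether it is the root logger.
            in_root = stripped == "[logger_root]"
        elif in_root and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key == "level":
                line = f"level={level}"
        out.append(line)
    return "\n".join(out) + "\n"


# Possible use on a host (path and service name are the ones given in this thread;
# keep a backup of the file first):
# path = "/etc/ovirt-hosted-engine-ha/broker-log.conf"
# with open(path) as fh:
#     updated = set_root_level(fh.read())
# with open(path, "w") as fh:
#     fh.write(updated)
# then: systemctl restart ovirt-ha-broker
```

Setting the level back to its previous value the same way, followed by another restart, should return broker.log to its normal verbosity once the cause is found.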
participants (3)
- Asaf Rachmani
- Joseph Goldman
- Yedidyah Bar David