I'd say you were close!
I tried fiddling with the penalties, but that didn't help.
But once I found that hosted-engine --vm-status displays the score for each host, I
saw that the scores were constantly very low; the 1600 gateway penalty seems a proper match.
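
As a sanity check on that match, here is the arithmetic as a minimal sketch. It assumes a healthy host reports 3400 and that a failed gateway check subtracts a flat 1600; the names are mine for illustration, not taken from the ovirt-ha-agent code.

```python
# Hypothetical names; the two values come from this thread,
# they are not read from the ovirt-ha-agent at runtime.
BASE_SCORE = 3400        # score reported by a healthy host
GATEWAY_PENALTY = 1600   # penalty applied when the gateway check fails


def expected_score(gateway_ok: bool) -> int:
    """Return the score a host would report under this simplified model."""
    penalty = 0 if gateway_ok else GATEWAY_PENALTY
    return BASE_SCORE - penalty


if __name__ == "__main__":
    print(expected_score(gateway_ok=True))   # 3400
    print(expected_score(gateway_ok=False))  # 1800
```

So a host that is otherwise fine but fails only the gateway check would sit at 1800 in this model; other penalties would push it lower still.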
I then reinstalled the cluster, bypassing any dependency on DNS, which may be a little
slow as it's not under my control. I have fully fleshed-out /etc/hosts files to
speed that up, but those seem to be ignored sometimes, or only come into play when a DNS
lookup has outright failed, not just taken too long.
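
To see how slow name resolution really is on a given host, something like the following rough sketch could help. It times a lookup through the system resolver, which normally consults /etc/hosts and DNS in the order set in nsswitch.conf. The function name and the host list are placeholders of mine. Note that a tool which queries the name servers directly would bypass /etc/hosts entirely, so this only shows what ordinary applications see.

```python
import socket
import time


def time_lookup(hostname: str):
    """Resolve hostname via the system resolver and time the call.

    Returns (seconds, sorted list of addresses), or (seconds, None)
    if the name could not be resolved.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, None)
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addresses = None
    return time.monotonic() - start, addresses


if __name__ == "__main__":
    # Placeholder list: replace with your host and engine FQDNs.
    for name in ["localhost"]:
        elapsed, addrs = time_lookup(name)
        print(f"{name}: {elapsed:.3f}s -> {addrs}")
```

Running it once with the /etc/hosts entries in place and once without should show whether the hosts file is actually short-circuiting the slow DNS path.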
In the cockpit setup screen you get to choose whether to use DNS, ping or TCP for a
liveness check, I guess for the ovirt-ha-agent or -broker. I chose
'ping' there, which also makes the cockpit screen happy immediately, while the
'dns' setting seems to take a long time.
With that, I see scores of 3400 all around, so I guess that nailed it. I've found the
Python code that implements the ovirt-ha monitors, but I can't find something like a broker.conf
file or any other entry where the mechanism is actually configured, so that I could change
and test different settings without a re-installation.
I quite like the liberty a proper DNS might give me, in case I need to move networks
again. Yet after this, I'm very motivated to go back to plain old hardwired IPv4.
Pretty confident it wasn't the missing package updates now (sorry guys!), but at least
it got me looking in the proper direction...