[ovirt-users] Re: Hosted Engine stuck in Firmware

30 Aug 2020

Thanks for diving into that mess first, because it allowed me to understand what I had done as well...

In my case the issue was that a VM moved from 4.3 to 4.4 seemed to be silently upgraded from "default" (whatever was default on 4.3) to "Q35", which seems to be the new default on 4.4.

But that made it lose its network, because udev was now renaming the NIC in yet another manner, even though few VMs ever need anything beyond eth0 anyway.
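
A quick way to see what name the NIC actually ended up with inside the guest is to list the interfaces before and after the move. This is only a minimal sketch in Python; the helper name nic_names is made up for illustration, and it assumes a guest with a Python 3 interpreter available.

```python
import socket


def nic_names():
    """Return the names of all network interfaces except loopback."""
    # socket.if_nameindex() yields (index, name) pairs for every interface.
    return [name for _, name in socket.if_nameindex() if name != "lo"]


if __name__ == "__main__":
    # Compare this output on the 4.3 and the 4.4 side of the move.
    print(nic_names())
```

If the list differs between the two sides, any network configuration bound to the old interface name will no longer match.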

So I went ahead and changed the cluster defaults to those of the 4.3 cluster (including Nehalem CPUs, because I also use J5005 Atom systems). BTW, that was initially impossible as the edit button for the cluster was always greyed out. But on a browser refresh, it suddenly was enabled...
What I don't remember is if the cluster had a BIOS default (it doesn't on 4.3), or if I changed that in the default template, which is mentioned somewhere here as being rather destructive.

I was about to re-import the machine from an export domain, when I did a scheduled reboot of the single node HCI cluster after OS updates.

Those HCI reboots always require a bit of twiddling on 4.3 and 4.4 for the hosted-engine to start, evidently because of some race conditions (requiring restarts of glusterd/ovirt-ha-broker/ovirt-ha-agent/vdsmd to fix), but this time the SHE simply didn't want to start at all; after some digging through log files, it turned out to be complaining about missing PCI devices at boot.
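
For reference, that restart round can be sketched roughly as follows. The unit names are the ones listed above; the order simply follows that list and is an assumption, not a verified sequence, and the helper names restart_commands and restart_all are made up for illustration. By default the sketch only prints the commands.

```python
import subprocess

# Units named above; the order is an assumption, not a verified sequence.
SERVICES = ["glusterd", "ovirt-ha-broker", "ovirt-ha-agent", "vdsmd"]


def restart_commands(services=SERVICES):
    """Build one 'systemctl restart <unit>' argument list per service."""
    return [["systemctl", "restart", name] for name in services]


def restart_all(services=SERVICES, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in restart_commands(services):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a restart fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    restart_all(dry_run=True)
```

Running it with dry_run=False on the host would actually restart the services, so the dry run is the safer default.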

With my 4.4 instance currently dead, I don't remember if the BIOS or PCI vs PCIe machine type is a cluster attribute or part of the template, but I do seem to remember that the hosted-engine is a bit special here, especially when it comes to picking up the base CPU type.

What is a bit astonishing is the fall-through processing that seems to go on here, even though an existing VM should have had its hardware nailed down by the time it was shut down.

I then realized that I might have killed the hosted-engine right there.

And no, /var/run/ovirt...vm.cfg is long gone and I guess it's time for a re-install.

For me one issue remains unclear: How identical do machines remain as they are moved from a 4.3 host to a 4.4 host?

In my view a hypervisor's most basic social contract is to turn a machine into a file and the file back into the very same machine, hopefully even for decades. Upgrades of the virtual hardware should be possible, but under control of the user/orchestrator.

I am afraid that oVirt's dynamic reconstruction of the machine from database data doesn't always respect that social contract and that needs at least documentation, if not fixing.

The 4.3 to 4.4 migration is far from seamless already; this does not help.

thomas＠hoberg.net