On Sat, Apr 10, 2021 at 1:14 PM David White via Users <users@ovirt.org> wrote:

This is resolved, and my environment is 100% stable now.

Glad to hear that, thanks for the report!

Or was, until I then used the engine to "upgrade" one of the hosts, at which point I started having problems again after the reboot, because the old vlan came back.
I'll finish getting things stabilized today, and hopefully won't run into this again.

I've been turning things on and off quite a bit, because they aren't in a proper data center (yet) and are just sitting here in my home office.
So I'm sure shutting them down and turning them back on fairly often hasn't helped the situation.

I initially had a few issues going on:
I of course first broke things when I tried to change the management vlan
Aside from my notes below and the troubleshooting steps I went through, yesterday, I had forgotten that connectivity to the DNS server hadn't been restored. Once I got DNS operational, the engine was able to see two of the hosts, and finally started showing some green.
I then went in and ran `hosted-engine --vm-stop` to shutdown the engine, and then I started it again... and viola. The last remaining problematic host came online, and a few minutes later, the disks, volumes, and datacenter came online.
I think part of my problem has been this switch. I purchased a Netgear GS324T for my frontend traffic. But I've also needed to put my backend traffic onto some temporary ports on that switch until I can get a VM controller setup that will run my other switch, a Ubiquiti US-XG-16 for my permanent backend traffic. The Netgear hasn't been nearly as simple to configure as I had hoped. The vlan behavior has also been inconsistent - sometimes I have vlan settings in place, and things work. Sometimes they don't work. It has also been re-assigning a of the vlans occasionally after reboots, which has been frustrating. I'm close to being completely done configuring the infrastructure, but I'm also getting increasingly tempted to go find a different switch.
Lessons learned:
Always make sure DNS is functional
I was really hoping that I could run DNS as a VM (or multiple VMs) inside the cluster.
That said, if the cluster and the engine won't even start correctly without, then I may need to run DNS externally. I'm open to feedback on this.
I have 1 extra U of space at the datacenter reserved, and I do have a 4th spare server that I haven't decided what to do with yet. It has way more CPU and RAM than would be necessary to run an internal DNS server... but perhaps I have no choice. Thoughts?

You can also have the IP addresses of the engine and hosts in /etc/hosts of all machines (engine and hosts) - then things should work fine. It does mean you'll have to manually maintain these hosts files somehow.

Make sure your vlan settings are correct before you start deploying the hosted engine and configure oVirt.

Definitely. As well as making sure that IP addresses (and netmasks, routes, etc.) are as intended and working, name resolution is correct (DNS or /etc/hosts), etc. .

If possible, don't turn off and turn on your servers constantly. :) I realize this is a given. I just don't have much choice in the matter right now, due to lack of datacenter in my home office.

While definitely not recommended, in principle this should be harmless. If you find concrete reproducible bugs around this, please report them (with clear accurate details - just "I turn off and on my hosts and things stop working" is not helpful, obviously...).

Thanks again and best regards,