On Fri, Jun 16, 2017 at 9:11 AM, Andrew Dent <adent@ctcroydon.com.au> wrote:
Hi

Well I've got myself into a fine mess. 

host01 was set up with hosted-engine v4.1. This was successful. 
Imported 3 VMs from a v3.6 oVirt AIO instance. (This oVirt 3.6 instance is still running with more VMs on it.)
Tried to add host02 to the new oVirt 4.1 setup. This partially succeeded, but I couldn't add any storage domains to it. I cannot remember why. 
In the oVirt engine UI I removed host02. 
I reinstalled host02 with CentOS 7 and tried to add it again, but the oVirt UI told me it was already there (although it wasn't listed in the UI). 
Renamed the reinstalled host02 to host03, changed its IP address, reconfigured the DNS server and added host03 in the oVirt engine UI. 
All good, and I was able to import more VMs to it. 
I was also able to shut down a VM on host01, assign it to host03 and start the VM. Cool, everything working. 
All of the above happened over the last couple of weeks. 

This week I performed some yum updates on the Engine VM. No reboot. 
Today I noticed that the oVirt services in the Engine VM were in an endless restart loop. They would be up for about 5 minutes and then die. 
Looking into /var/log/ovirt-engine/engine.log, I could only see errors relating to host02. oVirt was trying to find it and failing, then falling over. 
I ran "hosted-engine --clean-metadata" thinking it would clean up and remove bad references to hosts, but I now realise that was a really bad idea, as it didn't do what I'd hoped. 
At this point the sequence below worked and I could log in to the oVirt UI, but after 5 minutes the services would be down again:
service ovirt-engine restart
service ovirt-websocket-proxy restart
service httpd restart
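
On CentOS 7 the service wrapper only redirects to systemctl, so the same restart sequence could be sketched roughly as below. The function name restart_engine_stack and the RUN dry-run variable are made up for illustration and are not part of oVirt.

```shell
# Sketch: restart the three services in the same order as the sequence above.
# RUN is a hypothetical dry-run switch: with RUN=echo the commands are
# printed instead of executed.
restart_engine_stack() {
    for unit in ovirt-engine ovirt-websocket-proxy httpd; do
        ${RUN:-} systemctl restart "$unit" || return 1
    done
}
```

Calling it as "RUN=echo restart_engine_stack" only prints the commands. To see why the engine stops again after a few minutes, something like journalctl -u ovirt-engine --since "10 minutes ago" may show more than engine.log alone.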

I saw some references to having to remove hosts from the database by hand in situations where, under the hood of oVirt, a decommissioned host was still listed but wasn't showing in the GUI. 
So I removed the references to host02 (vds_id and host_id) from the following tables, in this order: 
vds_dynamic
vds_statistics
vds_static
host_device

Now when I try to start ovirt-websocket, it will not start:
service ovirt-websocket start
Redirecting to /bin/systemctl start  ovirt-websocket.service
Failed to start ovirt-websocket.service: Unit not found.
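
Note that the service restarted earlier was ovirt-websocket-proxy, while the command above asks for ovirt-websocket, so the "Unit not found" message may simply be a name mismatch rather than a missing package. A minimal sketch for listing the oVirt units systemd actually knows about; the helper name filter_ovirt_units is hypothetical.

```shell
# Keep only unit names that start with "ovirt" from the output of
# `systemctl list-unit-files` (first column is the unit name).
filter_ovirt_units() {
    awk '$1 ~ /^ovirt/ { print $1 }'
}

# Only query systemd where systemctl is actually available.
if command -v systemctl >/dev/null 2>&1; then
    systemctl list-unit-files --type=service --no-legend 2>/dev/null | filter_ovirt_units
fi
```

If ovirt-websocket-proxy.service appears in that list, starting it under that exact name is the first thing to try.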

I'm now thinking that I need to do the following in the engine VM
# engine-cleanup
# yum remove ovirt-engine
# yum install ovirt-engine
# engine-setup 
But to run engine-cleanup I need to put the engine VM into maintenance mode, and because of the --clean-metadata that I ran earlier on host01 I cannot do that. 
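
For reference, on a healthy hosted-engine deployment the maintenance step is normally run on one of the hosts, not inside the engine VM. A sketch of that step follows; the function name and the RUN dry-run variable are made up for illustration, and whether this still works after --clean-metadata is exactly what is in doubt here.

```shell
# Sketch: put the hosted-engine setup into global maintenance, then show
# the resulting status. Intended to run on a host such as host01.
# RUN is a hypothetical dry-run switch: with RUN=echo the commands are
# printed instead of executed.
enter_global_maintenance() {
    ${RUN:-} hosted-engine --set-maintenance --mode=global || return 1
    ${RUN:-} hosted-engine --vm-status
}
```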

What is the best course of action from here?

To be honest, with all the steps taken above, I'd install everything (including OS) from scratch...
There's a bit too much mess to try to clean up properly here.
Y.
 

Cheers


Andrew

