Just an update for documentation purposes.
I tried physically rebooting the faulty node after placing the cluster in global
maintenance mode, since I couldn't put the node into local maintenance. It booted up OK,
but after a few minutes the following messages started appearing on the console:
"blk_update_request: I/O error, dev dm-1, sector 0
blk_update_request: I/O error, dev dm-1, sector 2048
blk_update_request: I/O error, dev dm-1, sector 2099200
EXT4-fs error (device dm-7): ext4_find_entry:1318:inode #6294136: comm python: reading
directory lblock 0
EXT4-fs (dm-7): previous I/O error to superblock detected
Buffer I/O error on dev dm-7, logical block 0, lost sync page write
device-mapper: thin: process_cell: dm_thin_find_block() failed: error= -5
blk_update_request: I/O error, dev dm-1, sector 1051168
Aborting journal on device dm-2-0
blk_update_request: I/O error, dev dm-1, sector 1050624
JBD2: Error -5 detected when updating journal superblock for dm-2-0
"
From what I can tell the filesystem is corrupted, so I'm now in the process of either
repairing it with fsck or replacing the node with a new one. (FYI, the node never changed
status; it stayed NonResponsive.)
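If I go the fsck route, the plan is to boot the node into a rescue environment and run the
check against the unmounted filesystem, roughly like this (just a sketch; /dev/dm-7 is the
device from the logs above, and the real LV path should be confirmed first):

  # make sure the filesystem is not mounted before repairing it
  umount /dev/dm-7 2>/dev/null
  # force a full check and answer yes to the repair prompts
  fsck.ext4 -f -y /dev/dm-7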
For the VM that was stuck on the node, the solution I found is described here:
https://serverfault.com/questions/996649/how-to-confirm-reboot-unresponsi...
It was to put the cluster into global maintenance mode, shut down the engine VM, and then
start it again. That worked perfectly, and I was able to start the VM on another node. A
rough sketch of the commands is below.
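For reference, the sequence boils down to something like this, run from one of the hosts
with the hosted-engine tooling (just a sketch of what I did; check --vm-status between
steps rather than taking the timing for granted):

  # put the cluster into global maintenance so the HA agents don't interfere
  hosted-engine --set-maintenance --mode=global
  # shut down the engine VM, wait for --vm-status to report it down, then start it again
  hosted-engine --vm-shutdown
  hosted-engine --vm-status
  hosted-engine --vm-start
  # once everything looks healthy again, leave global maintenance
  hosted-engine --set-maintenance --mode=none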