On Tue, Oct 2, 2018 at 4:16 PM Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:
Hi,

I just ran into an issue during a cluster upgrade from 4.2.4 to 4.2.6.1. I'm running a small cluster with 2 hosts and Gluster storage. After I upgraded one of the hosts to 4.2.6.1, something went wrong (it looks like it tried to start the HE instance), and I can no longer connect to the hosted engine.

As far as I can see, the HostedEngine VM is still running on the second host (along with 7 other VMs), but I can't stop it.
ovirt-ha-agent and ovirt-ha-broker fail to start, and hosted-engine --vm-status returns nothing but this error message:
"The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable."

Is the storage available? Check gluster volume status <volumename> and gluster volume heal <volumename> info (replace <volumename> with the name of the Gluster volume hosting your HE disk).
You mention a cluster with 2 hosts; is the volume replica 2? If so, you're likely to run into split-brain scenarios.
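A minimal Python sketch of the heal check above. It assumes that `gluster volume heal <volumename> info` prints a "Brick <host>:<path>" line per brick, followed by a "Number of entries: N" line. The helper names, the sample hostnames and brick paths, and the treatment of a non-numeric count (such as "-" for an unreachable brick) as "unknown" are all illustrative assumptions, not part of any oVirt or Gluster tool.

```python
import re
import subprocess
from typing import Dict, Optional

# Assumption: heal info reports pending entries as "Number of entries: N".
ENTRIES_RE = re.compile(r"Number of entries:\s*(\S+)")


def run_heal_info(volume: str) -> str:
    """Return the raw stdout of `gluster volume heal <volume> info`."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def pending_heal_entries(heal_info: str) -> Dict[str, Optional[int]]:
    """Map each brick to its pending-heal count; None if the count is not numeric."""
    counts: Dict[str, Optional[int]] = {}
    brick: Optional[str] = None
    for raw in heal_info.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            continue
        match = ENTRIES_RE.match(line)
        if match and brick is not None:
            value = match.group(1)
            counts[brick] = int(value) if value.isdigit() else None
    return counts


def volume_looks_healthy(counts: Dict[str, Optional[int]]) -> bool:
    """True only if at least one brick reported and every brick shows zero pending entries."""
    return bool(counts) and all(c == 0 for c in counts.values())


# Hypothetical sample output; hostnames and brick paths are made up.
SAMPLE_HEAL = """\
Brick host1:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick host2:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 2
"""
```

On a host you would call something like pending_heal_entries(run_heal_info("engine")). Here "engine" is a hypothetical volume name. Any non-zero or unknown count suggests that healing is still pending, and that should be resolved before restarting the HA services.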

ps -ef shows plenty of vdsm processes in the defunct state; that's probably why the agent and broker can't start. Just wondering: what is the best way to start resolving this while minimizing downtime for the running VMs?
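Defunct (zombie) processes stay in the process table until their parent reaps them. Grouping them by parent PID therefore shows which process is failing to clean up. The minimal Python sketch below does this for `ps -ef` output. It assumes the standard column layout (UID PID PPID C STIME TTY TIME CMD) and that zombies are marked with "<defunct>" in the CMD column. The function names and the sample output are hypothetical and for illustration only.

```python
from collections import Counter
from typing import List, NamedTuple


class DefunctProc(NamedTuple):
    pid: int
    ppid: int
    cmd: str


def find_defunct(ps_ef_output: str, name_filter: str = "vdsm") -> List[DefunctProc]:
    """Return defunct processes whose UID or CMD mentions name_filter."""
    procs: List[DefunctProc] = []
    for line in ps_ef_output.splitlines():
        # Split into at most 8 fields so that CMD (which may contain spaces) stays whole.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skips the header line and malformed lines
        uid, pid, ppid, cmd = fields[0], fields[1], fields[2], fields[7]
        if "<defunct>" in cmd and (name_filter in uid or name_filter in cmd):
            procs.append(DefunctProc(int(pid), int(ppid), cmd))
    return procs


def parents_of(defunct: List[DefunctProc]) -> Counter:
    """Count zombies per parent PID; the parent is the process not reaping them."""
    return Counter(p.ppid for p in defunct)


# Hypothetical `ps -ef` excerpt, not taken from the reporter's host.
SAMPLE_PS = """\
UID        PID  PPID  C STIME TTY          TIME CMD
vdsm      2101     1  2 10:00 ?        00:05:00 /usr/bin/python2 /usr/share/vdsm/vdsmd
vdsm      3300  2101  0 10:05 ?        00:00:00 [vdsm-tool] <defunct>
vdsm      3301  2101  0 10:05 ?        00:00:00 [vdsm-tool] <defunct>
root      4000     1  0 10:06 ?        00:00:00 /usr/sbin/sshd -D
"""
```

If every zombie points at the same parent (here PID 2101), restarting that parent is what clears them. The zombies themselves cannot be killed directly.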

Should I restart vdsm and then try restarting the agent and broker again, or just reboot the whole host?

If the storage is available, try restarting the vdsm, agent, and broker services.
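A minimal Python sketch of that restart sequence. It assumes the systemd unit names vdsmd, ovirt-ha-broker and ovirt-ha-agent; verify them on your host with systemctl before relying on this. It restarts the broker before the agent because the agent talks to the broker. The function names and the dry-run default are illustrative choices, not an oVirt tool.

```python
import subprocess
from typing import List, Sequence

# Assumed unit names on an oVirt 4.2 host; check them with `systemctl list-units`.
SERVICES = ("vdsmd", "ovirt-ha-broker", "ovirt-ha-agent")


def restart_commands(services: Sequence[str] = SERVICES) -> List[List[str]]:
    """Build one `systemctl restart` command per service, in the given order."""
    return [["systemctl", "restart", svc] for svc in services]


def restart_ha_stack(dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    cmds = restart_commands()
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds
```

Running it with dry_run=True first shows exactly what would be executed. Afterwards, hosted-engine --vm-status should show whether the agent could retrieve the configuration from shared storage.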


Regards,
Artem
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/BKU2N2UOEHWJ3XKJ5DRTERKBTQZ4X7EB/