On Fri, Mar 15, 2019 at 8:12 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Ok,

I have managed to recover again and no issues are detected this time.
I guess this case is quite rare and nobody has experienced that.

>Hi,
>can you please explain how you fixed it?

I have set again to global maintenance, defined the HostedEngine from the old xml (taken from old vdsm log) , defined the network and powered it off.
Set the OVF update period to 5 min , but it took several hours until the OVF_STORE were updated. Once this happened I restarted the ovirt-ha-agent ovirt-ha-broker on both nodes.Then I powered off the HostedEngine and undefined it from ovirt1.

then I set the maintenance to 'none' and the VM powered on ovirt1.
In order to test a failure, I removed the global maintenance and powered off the HostedEngine from itself (via ssh). It was brought back to the other node.

In order to test failure of ovirt2, I set ovirt1 in local maintenance and removed it (mode 'none') and again shutdown the VM via ssh and it started again to ovirt1.

It seems to be working, as I have later shut down the Engine several times and it managed to start without issues.

I'm not sure this is related, but I had detected that ovirt2 was out-of-sync of the vdsm-ovirtmgmt network , but it got fixed easily via the UI.



Best Regards,
Strahil Nikolov