Hosted Engine is down

We lost the backend storage that hosts our self hosted engine tonight. We've recovered it and there was no data corruption on the volume containing the HE disk. However, when we try to start the HE it doesn't give an error, but it also doesn't start. The VM isn't pingable and the liveliness check always fails.

[root@ovirttest1 ~]# hosted-engine --vm-status | grep -A20 ovirttest1
Hostname                : ovirttest1.wolfram.com
Host ID                 : 1
Engine status           : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score                   : 3400
stopped                 : False
Local maintenance       : False
crc32                   : 2c2f3ec9
local_conf_timestamp    : 18980042
Host timestamp          : 18980039
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=18980039 (Fri Nov 10 01:17:59 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=18980042 (Fri Nov 10 01:18:03 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False

The environment is in Global Maintenance so that we can isolate it to starting on a specific host to eliminate as many variables as possible. I've attached the agent and broker logs.

Regards,
Logan Kuhn
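For anyone who wants to watch this state from a script while retrying the start, here is a minimal sketch that pulls individual fields out of the "Engine status" line. It assumes the text layout shown in the output above; `engine_status_field` is a hypothetical helper name, not part of hosted-engine. In the output above, "vm" is "up" while "health" is "bad", which suggests the VM process is running on the host but the engine inside it is not answering the liveliness check.

```shell
# Hypothetical helper (not part of hosted-engine): reads
# `hosted-engine --vm-status` text on stdin and prints the value of one
# field from the JSON object on the first "Engine status" line.
# Assumes the single-line JSON layout shown in the output above.
engine_status_field() {
    field="$1"
    # Step 1: keep only the {...} object from the "Engine status" line.
    # Step 2: pull the quoted value that follows "<field>": in that object.
    sed -n 's/^.*Engine status[[:space:]]*:[[:space:]]*\({.*}\).*$/\1/p' |
        head -n 1 |
        sed -n 's/.*"'"$field"'": *"\([^"]*\)".*/\1/p'
}
```

On the host it could be used as `hosted-engine --vm-status | grep -A20 ovirttest1 | engine_status_field health`. The sed-based parsing is only good enough for flat, single-line status objects like the one above; a real JSON parser would be the more robust choice.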

Hi Logan,

When I look at the hosted-engine --vm-status output, I see that the VM is up but its health is bad. Can you try connecting to the VM with remote-viewer, using the command below?

remote-viewer vnc://ovirttest1.wolfram.com:5900

Thanks,
kasturi
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

Hi,

Following my answer, this is the bug you opened to track this issue, right?

https://bugzilla.redhat.com/show_bug.cgi?id=1511788

You said in comment #2 of that bug that all is well now. Should we close the bug then?

Best regards,
Martin Sivak
Participants (3): Kasturi Narra, Logan Kuhn, Martin Sivak