hosted-engine stuck "failed liveliness check" "detail": "up"

Tonight my server with the NFS hosted-engine mount crashed. Everything is back online now, except the hosted engine. I can't ping or ssh the machine.

When I run hosted-engine --vm-status, I get:

..........
--== Host 2 status ==--
Status up-to-date : True
Hostname : geisha-3.pazion.nl
Host ID : 2
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : d71d7c6b
Host timestamp : 4404
............

I tried restarting all services and NFS mounts and starting the hosted engine on other hosts, but the result is always the same: it shows as up, but the liveliness check fails and I am unable to reach it over the network/IP.

I imagine it is stuck at the console, maybe requiring an fsck? Is there a way to access the boot display directly?

Any help is highly appreciated!
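For anyone who wants to script this check, here is a minimal sketch that classifies the "Engine status" line shown above. The function name and the labels it prints are made up for illustration, and it only looks at the first host in the output; it assumes a healthy engine reports "health": "good".

```shell
# Classify the first "Engine status" line read from stdin
# (as printed by `hosted-engine --vm-status`). Labels are illustrative only.
engine_state() {
    line=$(grep -m1 'Engine status' || true)
    case "$line" in
        *'"health": "good"'*)
            echo "engine-healthy" ;;
        *'"vm": "up"'*)
            # The VM process is running, but the engine inside does not answer.
            echo "vm-up-engine-unreachable" ;;
        *)
            echo "vm-down-or-unknown" ;;
    esac
}

# Usage on a host:
#   hosted-engine --vm-status | engine_state
```

A "vm up, engine unreachable" result is the situation described in this message: the guest is running but something inside it (boot, filesystem, network) keeps the engine from responding.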

Hi,

you can access the console using VNC, or use virsh to get access to the serial console.

Check the following commands on the host where the VM is currently running:

virsh -r list
virsh -r console HostedEngine
virsh -r vncdisplay HostedEngine

Those should give you enough pointers to connect to the VM.

Regards
Martin Sivak

On Fri, Apr 15, 2016 at 10:14 AM, Paul Groeneweg | Pazion <paul@pazion.nl> wrote:
I imagine it is stuck at the console requiring a fsck check maybe? Is there a way to access the boot display directly?
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

Thanks!

I managed to get the console through:

hosted-engine --add-console-password
/bin/remote-viewer vnc://localhost:5900

It turns out there seems to be some corruption on the partition: http://screencast.com/t/6iR0U3QuI

Is there a way to boot from CD, so I can start rescue mode?

On Fri, Apr 15, 2016 at 10:58, Martin Sivak <msivak@redhat.com> wrote:
Hi,
you can access the console using vnc or use virsh to get access to the serial console.
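The console steps from the two messages above can be sketched as a small helper. This is only a sketch under assumptions: the function names are hypothetical, it assumes the VM is named HostedEngine as in the virsh commands quoted above, and it relies on the usual VNC convention that display N listens on TCP port 5900 + N.

```shell
# Convert a display string such as ":0" or "127.0.0.1:0"
# (the kind of value `virsh vncdisplay` prints) into a TCP port.
vnc_port() {
    display=${1##*:}            # keep only the number after the last colon
    echo $((5900 + display))
}

# Run on the host where the engine VM is currently running.
open_engine_console() {
    hosted-engine --add-console-password || return 1   # sets a temporary console password
    display=$(virsh -r vncdisplay HostedEngine) || return 1
    remote-viewer "vnc://localhost:$(vnc_port "$display")"
}
```

With display ":0" this resolves to port 5900, which matches the vnc://localhost:5900 address used above.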

On Fri, Apr 15, 2016 at 11:02 AM, Paul Groeneweg | Pazion <paul@pazion.nl> wrote:
Is there a way to boot from CD, so I can start rescue mode?
Yes, you have to set global maintenance mode to avoid VM start attempts on other hosts.

You have to power off the engine VM with:
hosted-engine --vm-shutdown/--vm-poweroff
(probably the second one in your case)

Then you have to create a copy of /var/run/ovirt-hosted-engine-ha/vm.conf and manually edit it in order to attach the ISO image and change the boot order.

At that point you can start the engine VM with your custom configuration with:
hosted-engine --vm-start --vm-conf=my_custom_vm.conf

Good luck!
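The procedure Simone describes could be wrapped roughly as follows. This is a sketch, not a tested recipe: the function names are hypothetical, the edit of the copied file is left to a manual editor session because the exact vm.conf lines for the ISO and boot order are not given in this thread, and it has not been verified against any particular oVirt version.

```shell
VM_CONF=/var/run/ovirt-hosted-engine-ha/vm.conf

# Copy vm.conf so the original stays untouched; refuse to overwrite an existing copy.
copy_vm_conf() {
    src=$1
    dst=$2
    if [ -e "$dst" ]; then
        echo "refusing to overwrite $dst" >&2
        return 1
    fi
    cp -- "$src" "$dst"
}

# custom: path for the edited copy, e.g. my_custom_vm.conf
rescue_boot() {
    custom=$1
    hosted-engine --set-maintenance --mode=global || return 1  # stop HA start attempts
    hosted-engine --vm-poweroff || return 1
    copy_vm_conf "$VM_CONF" "$custom" || return 1
    "${EDITOR:-vi}" "$custom"        # attach the ISO and change the boot order by hand
    hosted-engine --vm-start --vm-conf="$custom"
}
```

Keeping the edited configuration in a separate file means the HA agent's own vm.conf is never modified, so a normal start afterwards uses the original settings.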

Thanks for the help!

I managed to fix it :-)

I made a device from the hosted engine file with losetup, and again for the LVM. From there I was able to fsck my partition, and I answered "y" to a long list of inconsistencies. I removed the loopback devices, turned off maintenance mode, and the hosted engine started :-)

A few hours later, after a restore of the pgsql DB and some head-scratching, I am very happy it is running again without any VMs having been offline in the meantime :)

I will definitely save the way to start hosted-engine with a CD ISO, as this seems less error-prone and easier to do.

Thanks!

On Fri, Apr 15, 2016 at 11:35, Simone Tiraboschi <stirabos@redhat.com> wrote:
Yes, you have to set global maintenance mode to avoid VM start attempts on other hosts.
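For reference, the losetup/LVM/fsck repair described above might look roughly like the sketch below. These are not Paul's exact commands, which the thread does not show. The image path, volume group, and logical volume names are placeholders, the image is assumed to be a raw disk file, and the engine VM is assumed to be powered off with global maintenance set. The DRY_RUN switch and function names are additions for illustration.

```shell
# With DRY_RUN=1 the commands are only printed, so they can be reviewed first.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# image: engine VM disk image file; vg/lv: volume group and logical volume
# inside it. All three are placeholders, none of them comes from the thread.
fsck_engine_image() {
    image=$1 vg=$2 lv=$3
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ losetup --find --show --partscan $image"
        loopdev=/dev/loopN
    else
        loopdev=$(losetup --find --show --partscan "$image") || return 1
    fi
    run vgscan                    # let LVM discover the VG on the loop device
    run vgchange -ay "$vg"        # activate its logical volumes
    run fsck -y "/dev/$vg/$lv"    # -y answers "yes" to every repair prompt
    run vgchange -an "$vg"        # deactivate before detaching
    run losetup -d "$loopdev"     # remove the loop device again
}
```

Running it once with DRY_RUN=1 prints the full command sequence without touching any device, which is worth doing before working on the only copy of the engine disk.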

On Fri, Apr 15, 2016 at 2:33 PM, Paul Groeneweg | Pazion <paul@pazion.nl> wrote:
I will definitely save the way to start hosted-engine with a CD ISO, as this seems less error-prone and easier to do.
It was recently mentioned here [1] that in 3.6 you can't start the VM when you are in maintenance. Was this changed?

Adding also Martin.

[1] http://lists.ovirt.org/pipermail/users/2016-January/036993.html

Best,
-- Didi
participants (4)
- Martin Sivak
- Paul Groeneweg | Pazion
- Simone Tiraboschi
- Yedidyah Bar David