oVirt self-hosted engine won't come up

The self-hosted engine had an issue whereby its /var partition was full. A few days later the engine was rebooted; however, it did not come back up gracefully. I presently have three CentOS 7 hosts which can spin up a new hosted engine VM; however, every attempt to do so results in:

Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}

The VM is not "up" as it does not even respond to ping, let alone having any ability to ssh or console into it to see what's wrong.

All of the VMs that have been built using this engine appear to be fully operational. Is there any guidance anyone can give me? At this point I'm wondering what the consequences are of deploying a new hosted engine within that same cluster.

If anyone could shed light on this matter it would be tremendously appreciated! Thanks.
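The status line reports several fields at once: "vm" appears to describe the VM as the host sees it, while "health" and "reason" appear to describe the liveliness check against the engine inside it, which would explain how "vm": "up" and "health": "bad" can show up together. A minimal shell sketch for pulling single fields out of such a line follows. `status_field` is a hypothetical helper, not part of hosted-engine, and the sed pattern assumes the flat one-line JSON shown in this message.

```shell
#!/bin/sh
# status_field: hypothetical helper (not part of hosted-engine).
# $1 = one-line JSON status string, $2 = key whose string value is wanted.
status_field() {
    printf '%s\n' "$1" | sed -n "s/.*\"$2\": *\"\([^\"]*\)\".*/\1/p"
}

# Status string copied from the report in this thread.
status='{"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}'

echo "vm:     $(status_field "$status" vm)"
echo "health: $(status_field "$status" health)"
echo "reason: $(status_field "$status" reason)"
```

A pattern match like this is only suitable for quick triage; a real JSON parser would be the safer choice for anything nested or multi-line.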

On Tue, Feb 12, 2019 at 4:47 PM <joshuaosko@gmail.com> wrote:
> The self-hosted engine had an issue whereby its /var partition was full. A few days later the engine was rebooted; however, it did not come back up gracefully. I presently have three CentOS 7 hosts which can spin up a new hosted engine VM; however, every attempt to do so results in:
> Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
> The VM is not "up" as it does not even respond to ping, let alone having any ability to ssh or console into it to see what's wrong.

Check whether that VM is really up with virsh -r list. If so, it could be fsck preventing it from booting due to disk errors. You can try to connect with VNC, since in that case neither ssh nor the serial console will work due to the lack of guest OS support. You can use hosted-engine --add-console-password to add a temporary VNC password; it will also print out the connection string.

> All of the VMs that have been built using this engine appear to be fully operational. Is there any guidance anyone can give me? At this point I'm wondering what the consequences are of deploying a new hosted engine within that same cluster.
> If anyone could shed light on this matter it would be tremendously appreciated! Thanks.
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-leave@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/6UENXHDTWSMYX5...

# virsh -r list
 Id    Name                           State
----------------------------------------------------
 292   HostedEngine                   running

# hosted-engine --console
The engine VM is running on this host
Connected to domain HostedEngine
Escape character is ^]
error: internal error: cannot find character device <null>

It appears to be listed, but the serial console appears to not be able to connect.
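With three hosts to check, reading the State column by eye can get tedious. A minimal sketch of extracting a domain's state from saved `virsh -r list` output follows. `domain_state` is a hypothetical helper, and the sample table is reconstructed from the output in this message, so the exact column spacing is an assumption.

```shell
#!/bin/sh
# domain_state: hypothetical helper.
# $1 = text output of `virsh -r list`, $2 = domain name.
# Prints everything after the Name column, so multi-word states survive.
domain_state() {
    printf '%s\n' "$1" | awk -v name="$2" '
        $2 == name {
            state = $3
            for (i = 4; i <= NF; i++) state = state " " $i
            print state
        }'
}

# Sample text reconstructed from this thread (spacing assumed).
sample=' Id    Name                           State
----------------------------------------------------
 292   HostedEngine                   running'

echo "HostedEngine is: $(domain_state "$sample" HostedEngine)"
```

On a real host the first argument could be captured with `"$(virsh -r list)"`. As the thread shows, a state of running only means the VM process exists; it says nothing about whether the guest OS has booted.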

On Tue, Feb 12, 2019 at 5:30 PM <joshuaosko@gmail.com> wrote:
> # virsh -r list
>  Id    Name                           State
> ----------------------------------------------------
>  292   HostedEngine                   running
>
> # hosted-engine --console
> The engine VM is running on this host
> Connected to domain HostedEngine
> Escape character is ^]
> error: internal error: cannot find character device <null>
>
> It appears to be listed, but the serial console appears to not be able to connect.

Try with VNC.

# virsh -r vncdisplay HostedEngine
error: Failed to get VNC port. Is this domain using VNC?

I'm seeing this.

On Tue, Feb 12, 2019 at 8:52 PM <joshuaosko@gmail.com> wrote:
> # virsh -r vncdisplay HostedEngine
> error: Failed to get VNC port. Is this domain using VNC?
> I'm seeing this.

Did you also try hosted-engine --add-console-password?

hosted-engine --add-console-password
Enter password:
[root@corp-ovirt01 ~]# virsh -r vncdisplay HostedEngine
error: Failed to get VNC port. Is this domain using VNC?

Yes. Sorry I forgot to copy that part. It didn't respond with a connection string. I was prompted for a password and then tried.

On Tue, Feb 12, 2019 at 9:45 PM <joshuaosko@gmail.com> wrote:
> hosted-engine --add-console-password
> Enter password:
> [root@corp-ovirt01 ~]# virsh -r vncdisplay HostedEngine
> error: Failed to get VNC port. Is this domain using VNC?
> Yes. Sorry I forgot to copy that part. It didn't respond with a connection string. I was prompted for a password and then tried.

OK, so maybe you configured your hosted-engine VM for SPICE only. In that case you can:

1. Check the SPICE port directly:
   virsh -r dumpxml HostedEngine | grep -i tlsPort
2. Copy the CA cert (/etc/pki/vdsm/libvirt-spice/ca-cert.pem) to your laptop.
3. Identify the subject of /etc/pki/vdsm/libvirt-spice/server-cert.pem with:
   openssl x509 -in /etc/pki/vdsm/libvirt-spice/server-cert.pem -noout -subject
4. Add a console password with:
   hosted-engine --add-console-password
5. Connect over SPICE with something like:
   remote-viewer --debug --spice-ca-file="/tmp/ca-cert.pem" --spice-host-subject="O=example.com, CN=host1.example.com" spice://host1.example.com?tls-port=5900
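The TLS port is not necessarily 5900, so it may help to read it from the dumpxml output and build the remote-viewer command from it. A minimal sketch follows. `tls_port` and `spice_command` are hypothetical helpers, the `<graphics>` line is an assumed example of what dumpxml might print, and the host name and subject are the placeholders from the message above. The script only prints the command; it does not run remote-viewer.

```shell
#!/bin/sh
# tls_port: hypothetical helper. Extracts the tlsPort value from a
# <graphics> line such as `virsh -r dumpxml HostedEngine` might print.
tls_port() {
    printf '%s\n' "$1" | sed -n "s/.*tlsPort='\([0-9][0-9]*\)'.*/\1/p"
}

# spice_command: hypothetical helper. Prints (does not execute) the
# remote-viewer invocation.
# $1 = host name, $2 = TLS port, $3 = certificate subject, $4 = CA file
spice_command() {
    printf 'remote-viewer --spice-ca-file="%s" --spice-host-subject="%s" spice://%s?tls-port=%s\n' \
        "$4" "$3" "$1" "$2"
}

# Assumed example input, not taken from a real host.
xml="<graphics type='spice' tlsPort='5901' autoport='yes' listen='0.0.0.0'>"

port=$(tls_port "$xml")
spice_command host1.example.com "$port" \
    "O=example.com, CN=host1.example.com" /tmp/ca-cert.pem
```

The subject string still has to be adapted by hand from the openssl output, since its exact formatting can differ between OpenSSL versions.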

I managed to connect briefly but the console said ok at the top and it didn't appear I had any control. I sent ctrl-alt-delete which apparently closed the session and this was the output I saw (with O and CN appropriately populated):

$ remote-viewer --debug --spice-ca-file="/tmp/ca-cert.pem" --spice-host-subject="O=<O>, CN=<CN>" spice://<CN>?tls-port=5901
(remote-viewer:22260): remote-viewer-DEBUG: Opening display to spice://<CN>?tls-port=5901
(remote-viewer:22260): remote-viewer-DEBUG: Guest (null) has a spice display
(remote-viewer:22260): remote-viewer-DEBUG: After open connection callback fd=-1
(remote-viewer:22260): remote-viewer-DEBUG: Opening connection to display at spice://<CN>?tls-port=5901
(remote-viewer:22260): remote-viewer-DEBUG: New spice channel 0x28b2bd0 SpiceMainChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: notebook show status 0x266a2a0
(remote-viewer:22260): remote-viewer-DEBUG: main channel: auth failure (wrong username/password?)
(remote-viewer:22260): remote-viewer-DEBUG: main channel: auth failure (wrong username/password?)
(remote-viewer:22260): remote-viewer-DEBUG: main channel: opened
(remote-viewer:22260): remote-viewer-DEBUG: notebook show status 0x266a2a0
(remote-viewer:22260): remote-viewer-DEBUG: virt_viewer_app_set_uuid_string: UUID changed to b048c0af-a6d6-42eb-9716-f1a00806e02d
(remote-viewer:22260): remote-viewer-DEBUG: app is not in full screen
(remote-viewer:22260): remote-viewer-DEBUG: app is not in full screen
(remote-viewer:22260): remote-viewer-DEBUG: New spice channel 0x27d9250 SpiceRecordChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: New spice channel 0x2a39390 SpicePlaybackChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: new audio channel
(remote-viewer:22260): remote-viewer-DEBUG: New spice channel 0x2684580 SpiceCursorChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: New spice channel 0x2874290 SpiceDisplayChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: New spice channel 0x285a240 SpiceInputsChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: new inputs channel
(remote-viewer:22260): remote-viewer-DEBUG: creating spice display (#:0)
(remote-viewer:22260): remote-viewer-DEBUG: Insert display 0 0x27ea700
(remote-viewer:22260): remote-viewer-DEBUG: Found a window without a display, reusing for this display...
(remote-viewer:22260): remote-viewer-DEBUG: Zoom level not changed, using: 100
(remote-viewer:22260): remote-viewer-DEBUG: notebook show display 0x266a2a0
(remote-viewer:22260): GSpice-WARNING **: Warning no automount-inhibiting implementation available
(remote-viewer:22260): remote-viewer-DEBUG: Allocated 1024x768
(remote-viewer:22260): remote-viewer-DEBUG: Child allocate 1024x768
(remote-viewer:22260): remote-viewer-DEBUG: main channel: closed
(remote-viewer:22260): remote-viewer-DEBUG: Not removing main window 0 0x26678f0
(remote-viewer:22260): remote-viewer-DEBUG: Destroy SPICE channel SpiceInputsChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: Destroy SPICE channel SpiceDisplayChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: zap display channel (#0)
(remote-viewer:22260): remote-viewer-DEBUG: Destroying spice display 0x27ea700
(remote-viewer:22260): remote-viewer-DEBUG: Destroy SPICE channel SpiceCursorChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: Destroy SPICE channel SpicePlaybackChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: zap audio channel
(remote-viewer:22260): remote-viewer-DEBUG: Destroy SPICE channel SpiceRecordChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: Destroy SPICE channel SpiceMainChannel 0
(remote-viewer:22260): remote-viewer-DEBUG: zap main channel
(remote-viewer:22260): remote-viewer-DEBUG: notebook show status 0x266a2a0
(remote-viewer:22260): remote-viewer-DEBUG: Guest HostedEngine display has disconnected, shutting down
(remote-viewer:22260): remote-viewer-DEBUG: Disposing window 0x26678f0
(remote-viewer:22260): remote-viewer-DEBUG: Set connect info: (null),(null),(null),-1,(null),(null),(null),0

Now when I attempt to reconnect I see:

$ remote-viewer --debug --spice-ca-file="/tmp/ca-cert.pem" --spice-host-subject="O=<O>, CN=<CN>" spice://<CN>?tls-port=5901
(remote-viewer:22497): remote-viewer-DEBUG: Opening display to spice://<CN>?tls-port=5901
(remote-viewer:22497): remote-viewer-DEBUG: Guest (null) has a spice display
(remote-viewer:22497): remote-viewer-DEBUG: After open connection callback fd=-1
(remote-viewer:22497): remote-viewer-DEBUG: Opening connection to display at spice://<CN>?tls-port=5901
(remote-viewer:22497): remote-viewer-DEBUG: New spice channel 0x9b1c40 SpiceMainChannel 0
(remote-viewer:22497): remote-viewer-DEBUG: notebook show status 0x7662a0
(remote-viewer:22497): remote-viewer-DEBUG: main channel: failed to connect Could not connect to <CN>: Connection refused
(remote-viewer:22497): remote-viewer-DEBUG: Destroy SPICE channel SpiceMainChannel 0
(remote-viewer:22497): remote-viewer-DEBUG: zap main channel
(remote-viewer:22497): remote-viewer-DEBUG: Disposing window 0x7648f0
(remote-viewer:22497): remote-viewer-DEBUG: Set connect info: (null),(null),(null),-1,(null),(null),(null),0

And attempting to provide the console password results in:

# hosted-engine --add-console-password
Enter password:
Command VM.setTicket with args {'params': {}, 'password': '<password>', 'vmID': 'b048c0af-a6d6-42eb-9716-f1a00806e02d', 'existingConnAction': 'keep', 'ttl': '120'} failed: (code=100, message=General Exception: ("VM 'b048c0af-a6d6-42eb-9716-f1a00806e02d' was not defined yet or was undefined",))

Thanks for all of the help Simone! Really appreciate it.

It appears the engine is down entirely now and hosted-engine --vm-start doesn't appear to change anything.

Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}

I have noticed that sometimes virsh shows the real error (like firewalld being stopped). Can you try to start the VM paused and then ask virsh to resume it:

hosted-engine --vm-start-paused
virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf list
virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf resume HostedEngine

Best Regards,
Strahil Nikolov

On Thursday, February 14, 2019, 19:39:35 GMT+2, joshuaosko@gmail.com <joshuaosko@gmail.com> wrote:

> It appears the engine is down entirely now and hosted-engine --vm-start doesn't appear to change anything.
> Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
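The start-paused sequence from this message can be wrapped in a small script so that each step stops on the first failure and the failing command's error stays visible. This is a minimal sketch, not a tested recovery procedure. `run` and `start_paused_then_resume` are hypothetical helpers, and the script defaults to a dry run that only prints the commands; setting DRY_RUN=0 on a real host is assumed to be a deliberate choice.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a real host to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Connection URI quoted from the message above. Quoting it keeps the
# shell from treating the '?' as a glob character.
VIRSH_URI='qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'

# run: hypothetical helper. Prints the command in dry-run mode,
# otherwise executes it and returns its exit status.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start the engine VM paused, list domains, then resume via virsh so
# that any libvirt error is printed directly.
start_paused_then_resume() {
    run hosted-engine --vm-start-paused || return 1
    run virsh -c "$VIRSH_URI" list || return 1
    run virsh -c "$VIRSH_URI" resume HostedEngine
}

start_paused_then_resume
```

In dry-run mode the script prints three lines, one per command, which makes it easy to review the exact invocations before running them for real.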
participants (3)
- joshuaosko@gmail.com
- Simone Tiraboschi
- Strahil Nikolov