Hosted Engine deploy failed on new HCI build

Latest version of oVirt Node 4.2 installed on three hosts. I successfully completed the cockpit gdeploy process to deploy HCI; all of that went well with no errors. I then proceeded to the hosted engine deployment step, which eventually failed (log attached).
This is the current status:
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : MASKED
Host ID : 1
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 3fa48e03
local_conf_timestamp : 8468
Host timestamp : 8468
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=8468 (Mon Jul 30 14:21:09 2018)
host-id=1
score=3400
vm_conf_refresh_time=8468 (Mon Jul 30 14:21:09 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineStarting
stopped=False
If I do hosted-engine --console I get:
The engine VM is running on this host
Connected to domain HostedEngine
Escape character is ^]
error: internal error: cannot find character device <null>
Does anyone know why it may have failed or what I could do to recover from this? I think it could have failed due to some problem with the network config. If I could get a console into the engine VM I might be able to fix it, but the serial error above is preventing me from reaching the VM console to diagnose further.
Log of deploy attached.
Thanks!

I haven't had much luck with this yet. I completely wiped the three hosts and did the entire install over again from the ground up, only this time I used DHCP instead of a static IP for the hosted engine deployment. It failed again at the exact same step as before: waiting for the VM to come back, which it never does.
I still feel like it could be network related in some way, just not sure how. Any ideas?

That happened to me twice… the second time I figured it out, and it was networking.
I am not familiar with hosted-engine --console…
The person that helped me said to do the following:
Run hosted-engine --add-console-password on your first host to set a temporary VNC password, and then connect to it over VNC with something like remote-viewer vnc://<host>:<port>
That got me in and allowed me to fix the networking once I saw what was wrong… can you get to the console like that?
Regards
Bill
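
For the <port> placeholder in the remote-viewer command, here is a minimal Python sketch of how the URI might be put together. It assumes the usual VNC convention that display N listens on TCP port 5900 + N. The helper name vnc_uri and the example hostname are hypothetical, and the actual port on your host may differ.

```python
def vnc_uri(host: str, display: int = 0) -> str:
    """Build a vnc:// URI suitable for passing to remote-viewer.

    Assumption: conventional VNC numbering, where display N listens on
    TCP port 5900 + N. The real port on a given host may differ.
    """
    if display < 0:
        raise ValueError("display number must be non-negative")
    return f"vnc://{host}:{5900 + display}"


if __name__ == "__main__":
    # Hypothetical hostname, for illustration only.
    print("remote-viewer", vnc_uri("host1.example.com"))
```

The password prompt that remote-viewer shows is answered with the temporary one set by hosted-engine --add-console-password.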

Bill, I think this is the same problem. I figured out the VNC stuff earlier and got a console to the engine and got its status up, but I don't know if that's enough. I feel like not all of the steps have been completed. Once you figured out what was going on, did you have to redeploy the engine again?

On Tue, Jul 31, 2018 at 2:51 AM William Dossett <william.dossett@gmail.com> wrote:
I am not familiar with the hosted-engine --console…
The issue with the console device will be fixed for new deployments, as per https://bugzilla.redhat.com/show_bug.cgi?id=1561964
If you deployed in the past, you have to edit the definition of the engine VM on the engine, enabling the serial console device, and reboot it. In the meantime you can also use VNC, as detailed below.
Run on your first host hosted-engine --add-console-password to set a temporary VNC password and then connect to it over VNC with something like remote-viewer vnc://<host>:<port>
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
This indicates that the engine VM is up in libvirt's eyes, but the engine could not be reached over the network. I'd suggest opening a console to the engine VM (the VNC one works too) and checking its network configuration. If you are using DHCP, do you have a working reservation for it?
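
The reading of the Engine status line given above can be sketched in Python. This is only an illustration: it parses the status text in the form pasted in this thread, and the function names and the wording of the hints are hypothetical. The "good" health value is an assumption about what a healthy engine reports, since it does not appear in the thread.

```python
import json


def parse_engine_status(status_text: str) -> dict:
    """Extract the JSON object from the 'Engine status' line of the status text."""
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Engine status":
            return json.loads(value.strip())
    raise ValueError("no 'Engine status' line found")


def diagnose(status: dict) -> str:
    """Turn the status fields into a rough hint, following the thread's reading."""
    if status.get("vm") != "up":
        return "engine VM is not running"
    if status.get("health") == "good":  # assumed value for a healthy engine
        return "engine is up and healthy"
    return "VM is up but the engine is not answering: check the VM's network configuration"


# Excerpt of the status output quoted in this thread.
SAMPLE = '''Host ID : 1
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score : 3400'''

if __name__ == "__main__":
    print(diagnose(parse_engine_status(SAMPLE)))
```

With the sample above, the VM is reported as up while health is bad, so the sketch points at the engine VM's network configuration, which is the same conclusion reached in the reply.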
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/WQDUJCZ46GMWDK...

I'm going to try a full rebuild again and see if I can get it working this time. I will pay more attention to the networking and see if I can figure out what is going on.
How much more work do the scripts do after the step I'm failing at? Is there any way to redeploy the engine without having to wipe everything out? The last time I tried, I got about 75% of the way through, but the gluster storage info didn't auto-populate on the second run, so I had to enter it manually, and it then failed on a later step.
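
One way to pay closer attention to the networking is a small check, run from a host, of whether the engine's name resolves and whether it accepts TCP connections. The Python sketch below is a generic illustration under assumptions, not part of the oVirt tooling: the function names are hypothetical, and which port to probe (for example 443 for the engine's web interface) is an assumption to adjust for your setup.

```python
import socket


def resolves(fqdn: str) -> list:
    """Return the sorted IP addresses the name resolves to (empty list if none)."""
    try:
        infos = socket.getaddrinfo(fqdn, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling resolves() with the engine's FQDN shows whether DNS (or the DHCP reservation behind it) hands back the address you expect, and port_open() with that FQDN and port 443 shows whether anything is listening there.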

Can you connect to it with VNC? That was a new one on me … since last working with this. Actually this is the first time I did a hosted engine deploy, as I used to use a VM on my VMware infra to do it, so I had no idea I could connect to the hosted engine console until Simone helped me.
Hopefully you can connect with VNC … I was able to find the networking issue very quickly that way, a LOT faster than starting over. I wiped and started over like 5 times, but learned something each time.
I'm around most of today, on and off, trying to fix some code for another project, but happy to have a look with you if you want to ping me.
Regards
Bill
participants (3)
- Jayme
- Simone Tiraboschi
- William Dossett