[ovirt-users] Failed HostedEngine Deployment

23 Jan 2022

Greetings, oVirt people,

I am having a problem with the hosted-engine deployment. After spending a weekend getting this far, I am finally stuck and cannot figure out how to fix it.

I am starting with 1 host and will have 4 when this is finished. Storage is hyperconverged GlusterFS, but I am managing it myself outside of oVirt. For now it is a single-node GlusterFS volume, which I will expand across all 4 nodes later. I get all the way through the initial hosted-engine deployment (via the Cockpit interface) up to the storage step, then get most of the way through the storage portion itself. It fails when starting the HostedEngine VM in its final state, after the VM disk has been copied to shared storage.

This is where it gets weird.

[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is while the engine's he_fqdn ovirt.deleted.domain resolves to 192.168.x.x. If you are using DHCP, check your DHCP reservation configuration"}

I've masked out the domain and IP for obvious reasons. However, I don't think this deployment error is the real cause of the failure; it is just the step the deployment has reached when it fails. The HostedEngine VM starts but never actually boots. I was able to set the VNC password with `hosted-engine --add-console-password` and open the local console display with it, but it only shows "The guest has not initialized the display (yet)".
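Note that the address before "while" in the error is empty, which fits the theory that the VM never got far enough to report one. The real check is part of the Ansible-driven setup and is not reproduced here; the following is only a hypothetical Python approximation, assuming the check simply compares the address the VM reports with what he_fqdn resolves to. The function names are my own and not part of any oVirt tool:

```python
from __future__ import annotations

import socket


def resolve_ipv4(fqdn: str) -> list[str]:
    """Return the IPv4 addresses the local resolver gives for fqdn."""
    # gethostbyname_ex returns (canonical_name, aliases, ip_addresses)
    _, _, addresses = socket.gethostbyname_ex(fqdn)
    return addresses


def check_engine_ip(vm_ip: str, fqdn: str, resolved: list[str]) -> str | None:
    """Compare the address the engine VM reported with what its FQDN resolves to.

    Returns an error message, or None when the addresses agree.
    """
    if not vm_ip:
        # A VM that never boots never reports an address, so this branch fires
        # no matter how DNS or the DHCP reservation is configured.
        return f"engine VM reported no IP; {fqdn} resolves to {', '.join(resolved)}"
    if vm_ip not in resolved:
        return f"engine VM IP {vm_ip} not in {fqdn} -> {', '.join(resolved)}"
    return None
```

On the host, `check_engine_ip("", "ovirt.example.com", resolve_ipv4("ovirt.example.com"))` (placeholder FQDN) would keep returning the "no IP" message for as long as the VM is stuck before booting, so fixing DNS or DHCP alone would not clear this error.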

I also tried:

# hosted-engine --console
The engine VM is running on this host
Escape character is ^]

However, the console never gets past that point and does not accept any input. The VM does not respond on the network either. I suspect it is not even reaching the BIOS screen, let alone booting. What would cause that?

Here is the GlusterFS volume configuration for reference:

# gluster volume info storage

Volume Name: storage
Type: Distribute
Volume ID: e9544310-8890-43e3-b49c-6e8c7472dbbb
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: node1:/var/glusterfs/storage/1
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
network.ping-timeout: 5
performance.client-io-threads: on
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 1024
cluster.locking-scheme: full
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
performance.strict-o-direct: on
network.remote-dio: disable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
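The two `storage.owner-*` options are the ones oVirt relies on: the vdsm user and the kvm group both use ID 36. For anyone wanting to check that quickly, here is a minimal, hypothetical Python sketch (function names are my own) that pulls the "Options Reconfigured" block out of saved `gluster volume info` output and verifies those two values:

```python
from __future__ import annotations


def parse_volume_options(info: str) -> dict[str, str]:
    """Collect the 'key: value' pairs listed under 'Options Reconfigured:'."""
    options: dict[str, str] = {}
    in_options = False
    for raw in info.splitlines():
        line = raw.strip()
        if line == "Options Reconfigured:":
            in_options = True
            continue
        if not in_options:
            continue
        if not line:
            # A blank line ends this volume's block in multi-volume output.
            break
        key, sep, value = line.partition(":")
        if sep:
            options[key.strip()] = value.strip()
    return options


def has_ovirt_ownership(options: dict[str, str]) -> bool:
    """oVirt hosts run as vdsm (uid 36) in group kvm (gid 36)."""
    return (options.get("storage.owner-uid") == "36"
            and options.get("storage.owner-gid") == "36")
```

Fed the output above, `has_ovirt_ownership(parse_volume_options(text))` returns True, so ownership does not look like the problem here.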

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz
stepping : 9
microcode : 0x21
cpu MHz : 4000.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds
bogomips : 7199.86
clflush size : 64
cache_alignment: 64
address sizes : 36 bits physical, 48 bits virtual
power management:

[ plus 7 more processor entries ]
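The flags line above includes `vmx` (Intel VT-x) and `ept` (Extended Page Tables), so the host CPU does advertise hardware virtualization. A small, hypothetical Python check for that, assuming those two flags are the ones worth confirming on an Intel KVM host (the list is not exhaustive):

```python
from __future__ import annotations

# Assumed, non-exhaustive list of Intel flags to confirm on a KVM host:
#   vmx = Intel VT-x hardware virtualization
#   ept = Extended Page Tables (hardware-assisted MMU virtualization)
REQUIRED_FLAGS = ("vmx", "ept")


def cpu_flags(cpuinfo: str) -> set[str]:
    """Return the flag set of the first processor in /proc/cpuinfo text."""
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        # Match "flags" exactly so lines like "vmx flags" are skipped.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo: str, required: tuple[str, ...] = REQUIRED_FLAGS) -> list[str]:
    """List the required flags that the first processor does not advertise."""
    flags = cpu_flags(cpuinfo)
    return [flag for flag in required if flag not in flags]
```

On this host, reading `/proc/cpuinfo` and calling `missing_flags` gives an empty list. This only inspects the host's flags; it says nothing about the CPU model the engine VM itself is configured to use.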

Thanks for any insight you can provide.

Robert Tongue