Barebone Hosted Engine Deployment fails

Hey people, I hope someone can help me figure out whether I'm doing something wrong or whether there's a bug.

Originally I wanted to redeploy my self-hosted engine, as it is still on version 4.5.5 with CentOS 8 and therefore can't be updated anymore. To get a fresh backup of the currently running configuration, I did the following:

1. On the physical host: `hosted-engine --set-maintenance --mode=global`
2. On the engine VM: `systemctl stop ovirt-engine`
3. On the engine VM: `engine-backup --scope=all --mode=backup --file=/mnt/ovirt-engine-backup/ovirt-engine-4.5.5-backup.bck --log=/mnt/ovirt-engine-backup/ovirt-engine-4.5.5-backup.log` (where `/mnt/ovirt-engine-backup/` is an NFS share)
4. On the physical host: `hosted-engine --vm-shutdown`

Then I set up a new physical host with CentOS Stream 9 (build 20250320), enabled the oVirt repository and installed the packages as described at https://www.ovirt.org/download/install_on_rhel.html, mounted the NFS share, and ran:

`hosted-engine --deploy --4 --restore-from-file=/mnt/ovirt-engine-backup/ovirt-engine-4.5.5-backup.bck`

After I provide all the information it asks for, it fails after 15-20 minutes with several errors in the log file, and I can't identify the root cause.
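For anyone repeating this, the four backup steps and the redeploy command can be collected into one shell sketch. It is not the poster's script: `ENGINE_HOST` (a hypothetical SSH name for the old engine VM), running the engine-VM steps over SSH, the `run` helper and the `DRY_RUN` switch are assumptions added for illustration, and it assumes the NFS share is mounted at `/mnt/ovirt-engine-backup` on both the engine VM and the new host.

```shell
#!/bin/sh
# Sketch of the backup/redeploy sequence described above.
# Assumptions (not part of the original post):
#   ENGINE_HOST - hypothetical SSH target for the old engine VM (root login)
#   BACKUP_DIR  - the NFS share, mounted at this path on the engine VM
#                 and on the new host
#   DRY_RUN=1   - print each command instead of executing it
ENGINE_HOST="${ENGINE_HOST:-engine.example.com}"
BACKUP_DIR="${BACKUP_DIR:-/mnt/ovirt-engine-backup}"
BACKUP_NAME="${BACKUP_NAME:-ovirt-engine-4.5.5-backup}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run on the current physical host; stops at the first failing step.
backup_hosted_engine() {
    # 1. Keep the HA agents from restarting or migrating the engine VM.
    run hosted-engine --set-maintenance --mode=global || return
    # 2. Stop the engine service inside the VM so the backup is consistent.
    run ssh "root@$ENGINE_HOST" systemctl stop ovirt-engine || return
    # 3. Full backup (--scope=all) onto the NFS share.
    run ssh "root@$ENGINE_HOST" engine-backup --scope=all --mode=backup \
        --file="$BACKUP_DIR/$BACKUP_NAME.bck" \
        --log="$BACKUP_DIR/$BACKUP_NAME.log" || return
    # 4. Shut down the engine VM.
    run hosted-engine --vm-shutdown
}

# Run on the new CentOS Stream 9 host, with the NFS share already mounted.
redeploy_from_backup() {
    if [ ! -s "$BACKUP_DIR/$BACKUP_NAME.bck" ]; then
        echo "backup file missing or empty: $BACKUP_DIR/$BACKUP_NAME.bck" >&2
        return 1
    fi
    run hosted-engine --deploy --4 --restore-from-file="$BACKUP_DIR/$BACKUP_NAME.bck"
}
```

With `DRY_RUN=1` both functions only print their commands, so the order can be reviewed before anything is stopped; `redeploy_from_backup` refuses to start if the backup file is missing or empty.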
The first error that appears is this one:

```
2025-03-25 11:19:35,427+0100 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 TASK [ovirt.ovirt.hosted_engine_setup : Include after engine-setup custom tasks files for the engine VM]
2025-03-25 11:19:36,730+0100 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 TASK [ovirt.ovirt.hosted_engine_setup : Wait for the engine to reach a stable condition]
2025-03-25 11:19:37,431+0100 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 skipping: [localhost]
2025-03-25 11:19:38,133+0100 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 TASK [ovirt.ovirt.hosted_engine_setup : Configure LibgfApi support]
2025-03-25 11:19:38,835+0100 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 skipping: [localhost]
2025-03-25 11:19:39,536+0100 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 TASK [ovirt.ovirt.hosted_engine_setup : Save original OvfUpdateIntervalInMinutes]
2025-03-25 11:19:42,041+0100 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 changed: [localhost -> 192.168.222.170]
2025-03-25 11:19:42,742+0100 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 TASK [ovirt.ovirt.hosted_engine_setup : Set OVF update interval to 1 minute]
2025-03-25 11:19:45,147+0100 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 {'changed': True, 'stdout': 'Index 1 out of bounds for length 1', 'stderr': 'Picked up JAVA_TOOL_OPTIONS: -Dcom.redhat.fips=false', 'rc': 1, 'cmd': ['engine-config', '-s', 'OvfUpdateIntervalInMinutes=1'], 'start': '2025-03-25 11:19:43.872322', 'end': '2025-03-25 11:19:44.941216', 'delta': '0:00:01.068894', 'msg': 'non-zero return code', 'invocation': {'module_args': {'_raw_params': 'engine-config -s OvfUpdateIntervalInMinutes=1', '_uses_shell': False, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable': None, 'creates': None, 'removes': None, 'stdin': None}}, 'stdout_lines': ['Index 1 out of bounds for length 1'], 'stderr_lines': ['Picked up JAVA_TOOL_OPTIONS: -Dcom.redhat.fips=false'], '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_host': '192.168.222.170', 'ansible_port': None, 'ansible_user': 'root', 'ansible_connection': 'smart'}}
2025-03-25 11:19:45,248+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:113 fatal: [localhost -> 192.168.222.170]: FAILED! => {"changed": true, "cmd": ["engine-config", "-s", "OvfUpdateIntervalInMinutes=1"], "delta": "0:00:01.068894", "end": "2025-03-25 11:19:44.941216", "msg": "non-zero return code", "rc": 1, "start": "2025-03-25 11:19:43.872322", "stderr": "Picked up JAVA_TOOL_OPTIONS: -Dcom.redhat.fips=false", "stderr_lines": ["Picked up JAVA_TOOL_OPTIONS: -Dcom.redhat.fips=false"], "stdout": "Index 1 out of bounds for length 1", "stdout_lines": ["Index 1 out of bounds for length 1"]}
```

The full log is available here: https://filebin.net/dimi5g2o6q20t4aj

I also tried a completely clean deployment without restoring from a backup (once without and once with running `ovirt-hosted-engine-cleanup` in between). The errors in the log are exactly the same. I also tried the oVirt Node master experimental ISO (`ovirt-node-ng-installer-4.5.6-2025031111.c9s.iso`), with the same result.

Could the problem be in my answers to the questions the tool asks? I can't see anything wrong in what I entered. Can someone reproduce the issue?
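Because the first real error is buried among many log lines, a small helper can pull out the failing task automatically. This is a sketch, not an oVirt tool: `first_failed_task` is a hypothetical name, and it assumes each log entry sits on its own line, as in the setup logs (usually found under `/var/log/ovirt-hosted-engine-setup/`). It prints the last `TASK [...]` line before the first `fatal:` line, followed by that `fatal:` line.

```shell
#!/bin/sh
# Print the last "TASK [...]" line that precedes the first "fatal: [" line
# in a hosted-engine setup log, followed by that fatal line.
# Exit status: 0 if a fatal line was found, 1 if none (awk reports errors
# such as a missing file with its own non-zero status).
first_failed_task() {
    awk '
        /TASK \[/   { task = $0 }
        /fatal: \[/ { print task; print; found = 1; exit }
        END         { if (!found) exit 1 }
    ' "$1"
}
```

For the log above this should point at "Set OVF update interval to 1 minute". Failures in tasks that ignore errors may also be logged as `fatal:`, so treat the result as a starting point rather than a definitive root cause.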

Sorry, filebin.net uploads expire after a week, and I didn't notice that before posting. Here is the full log on pastebin: https://pastebin.com/a7BZktB7

For some reason, the pastebin has also been deleted. I've created a new one with a registered account now: https://pastebin.com/S5HBqydf

I can confirm the same behaviour at the Ansible task "Set OVF update interval to 1 minute". From what I can tell, this step runs the following on the engine VM:

`engine-config -s OvfUpdateIntervalInMinutes=1`

I tried to run this command manually:

- SSH into the engine VM's IP address as root, with the root password entered during deployment
- execute the command at the prompt

I receive the same error, in this sequence:

`$ engine-config -s OvfUpdateIntervalInMinutes 1`
`Picked up JAVA_TOOL_OPTIONS: -Dcom.redhat.fips=false`
`Index 1 out of bounds for length 1`

If I run the following, I should get a list of all parameters:

`$ engine-config --list`
`Picked up JAVA_TOOL_OPTIONS: -Dcom.redhat.fips=false`
`Index 1 out of bounds for length 1`

This fails with the same error sequence.

I am running a CentOS Stream 9 minimal installation and following instructions known to work with the appliance 'ovirt-engine-appliance-4.5-20231201120201.1.el9.x86_64' (which used CS8-based repos). I am trying to get things working with the most recent 'ovirt-engine-appliance-4.5-20240817071039.1.el9.x86_64' image, downloaded manually as an RPM. I haven't retraced my steps in detail, but since I ran into this error and saw this thread, I wanted to mention that I am seeing the same. I will report back if I find anything more and will keep an eye on this thread as well.

Thank you!
Dan
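The manual check above can be scripted: the function below reports whether `engine-config --list` fails with exactly the message seen here, as opposed to succeeding or failing for another reason (such as the missing database mentioned later in the thread). `engine_config_is_broken` and the `ENGINE_CONFIG` override are hypothetical names, added so the check can be tested with a stub; on the engine VM it simply calls `engine-config`.

```shell
#!/bin/sh
# Detect the engine-config failure signature reported in this thread.
# ENGINE_CONFIG is a hypothetical override (for testing with a stub);
# by default the real engine-config command is used.
ENGINE_CONFIG="${ENGINE_CONFIG:-engine-config}"

# Returns 0 if `engine-config --list` fails with
# "Index 1 out of bounds for length 1", and 1 otherwise
# (it succeeds, or fails for another reason such as no database yet).
engine_config_is_broken() {
    if _ec_out=$($ENGINE_CONFIG --list 2>&1); then
        return 1
    fi
    printf '%s\n' "$_ec_out" | grep -q 'Index 1 out of bounds for length 1'
}
```

On the engine VM, `engine_config_is_broken && echo "engine-config.properties is affected"` gives a quick yes/no. Note that `engine-config -s` expects `KEY=VALUE`, which is the form the playbook uses (`engine-config -s OvfUpdateIntervalInMinutes=1`).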

Good evening - I wanted to let you know that I troubleshot this issue and have opened a bug on GitHub: https://github.com/oVirt/ovirt-engine/issues/1016

If you are familiar with Ansible, you can also implement a workaround. It involves a few changes that add a mechanism to pause after the "TASK [ovirt.ovirt.engine_setup : Update all packages]" step during the hosted-engine deployment. Here is what I did:

- Modified the Ansible task file "engine_setup.yml" to include a "pause_execution.yml" task immediately after the "Update all packages" step. On the host where you run "hosted-engine --deploy", this file is located at "/usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/engine_setup/tasks/engine_setup.yml".
- Placed a copy of "pause_execution.yml" in the engine_setup role folder ("/usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/engine_setup/tasks/pause_execution.yml"), borrowing it from the adjacent "hosted_engine_setup" role folder.

With these changes in place, the next time you run hosted-engine --deploy you can set the Ansible variable that triggers the pause, for example:

`sudo hosted-engine --deploy --4 --ansible-extra-vars=he_pause_before_engine_setup=true`

The deployment will pause twice. At the first pause, identify and delete the lock file. At the second pause, open another session on the host machine and SSH into the new ovirt-engine-appliance VM as root, using the password specified in the hosted-engine deploy questionnaire. The IP address is shown in the hosted-engine console output (for example 192.168.222.123, so "ssh root@192.168.222.123"). Open /etc/ovirt-engine/engine-config/engine-config.properties in an editor such as vi, find "ServerRebootSleepTime=Integer" (around line 120) and change it to "ServerRebootSleepTime.type=Integer". Save the file, then delete the lock file.
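The manual edit on the engine VM can also be done non-interactively, during the second pause and before the lock file is deleted. This is an assumption-laden helper, not part of the reported fix: the function names are hypothetical, it keeps a `.bak` copy of the file, and its check for other malformed entries assumes every non-comment line in `engine-config.properties` has the form `Key.attribute=value` (like `ServerRebootSleepTime.type=Integer`). A key without the dot is presumably what produces "Index 1 out of bounds for length 1".

```shell
#!/bin/sh
# Helpers for the engine-config.properties workaround (run as root on the
# engine VM). Assumption: every entry has the form Key.attribute=value.
PROPS_DEFAULT=/etc/ovirt-engine/engine-config/engine-config.properties

# Print "lineno:content" for non-blank, non-comment lines whose key part
# (before the first "=") contains no dot. Exits 0 if any were found.
find_undotted_keys() {
    grep -nvE '^[[:space:]]*(#|$)|^[^=]*\.[^=]*=' "${1:-$PROPS_DEFAULT}"
}

# Rewrite "ServerRebootSleepTime=Integer" as "ServerRebootSleepTime.type=Integer",
# keeping the original file as <file>.bak.
fix_server_reboot_sleep_time() {
    sed -i.bak 's/^ServerRebootSleepTime=Integer$/ServerRebootSleepTime.type=Integer/' \
        "${1:-$PROPS_DEFAULT}"
}
```

A possible order: run `find_undotted_keys` to see whether anything besides `ServerRebootSleepTime` is affected, then `fix_server_reboot_sleep_time`, then `engine-config -l` to confirm the error is gone before deleting the lock file.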
The hosted-engine deployment should then continue instead of failing. Once you are logged into the new engine VM, you can also verify that the problem exists and that the change fixes it:

- Before changing engine-config.properties, "engine-config -l" should throw the "Index 1 out of bounds for length 1" error.
- After changing engine-config.properties, "engine-config -l" should report that it cannot find the database (which is expected, because engine-setup hasn't run yet).

If you want to pursue this further (or if you run into trouble), please let me know and maybe we can coordinate the details. I hope the source of the issue can be fixed easily and the fix propagated into the official RPM, so the workaround isn't needed at all. In any case, I really hope this helps!

Best Regards,
Dan

Following up - I can confirm that the following combination worked properly as of 2025-May-5, with the changes committed against the issue linked in my comment above:

- oVirt Node: ovirt-node-ng-installer-4.5.6-2025031111.c9s.iso (latest available at the time of testing)
- oVirt Appliance: ovirt-engine-appliance-4.5-20240817071039.1.el9.x86_64.rpm (latest available at the time of testing)
participants (3)
- Kevin Köllmann
- kevin@kllmnn.de
- lmd@gto.net