Hosted engine deployment fails consistently when trying to download files.

Hello,

I'm trying to deploy a hosted engine on one of my test setups. No matter how I try to deploy the hosted engine, either via the command line or via "Hosted Engine" deployment from the cockpit web console, it always fails with the same error message. [1] Manually downloading RPMs via dnf from the host works just fine. Firewall log files are clean. Any idea what's going on?

[1]
2020-06-12 06:09:38,609-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {'msg': "Failed to download metadata for repo 'AppStream'", 'results': [], 'rc': 1, 'invocation': {'module_args': {'name': ['ovirt-engine'], 'state': 'present', 'allow_downgrade': False, 'autoremove': False, 'bugfix': False, 'disable_gpg_check': False, 'disable_plugin': [], 'disablerepo': [], 'download_only': False, 'enable_plugin': [], 'enablerepo': [], 'exclude': [], 'installroot': '/', 'install_repoquery': True, 'install_weak_deps': True, 'security': False, 'skip_broken': False, 'update_cache': False, 'update_only': False, 'validate_certs': True, 'lock_timeout': 30, 'conf_file': None, 'disable_excludes': None, 'download_dir': None, 'list': None, 'releasever': None}}, '_ansible_no_log': False, 'changed': False, '_ansible_delegated_vars': {'ansible_host': 'test-vmengine.localdomain'}}
2020-06-12 06:09:38,709-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost -> gilboa-wx-vmovirt.localdomain]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'AppStream'", "rc": 1, "results": []}
2020-06-12 06:09:39,711-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 PLAY RECAP [localhost] : ok: 183 changed: 57 unreachable: 0 skipped: 77 failed: 1
2020-06-12 06:09:39,812-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:215 ansible-playbook rc: 2
2020-06-12 06:09:39,812-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:222 ansible-playbook stdout:
2020-06-12 06:09:39,812-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:225 ansible-playbook stderr:
2020-06-12 06:09:39,812-0400 DEBUG otopi.context context._executeMethod:145 method exception
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-ansiblesetup/core/misc.py", line 403, in _closeup
    r = ah.run()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/ansible_utils.py", line 229, in run
    raise RuntimeError(_('Failed executing ansible-playbook'))

- Gilboa
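For reference, the manual check mentioned above amounts to something like the following on the host (a sketch; 'AppStream' is the repo named in the error):

# force a fresh metadata download for all repos
$ dnf clean metadata && dnf makecache
# or for just the failing repo
$ dnf makecache --disablerepo='*' --enablerepo=AppStream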

On Fri, Jun 12, 2020 at 1:49 PM Gilboa Davara <gilboad@gmail.com> wrote:
Hello,
I'm trying to deploy a hosted engine on one of my test setups. No matter how I try to deploy the hosted engine, either via the command line or via "Hosted Engine" deployment from the cockpit web console, it always fails with the same error message. [1] Manually downloading RPMs via dnf from the host works just fine. Firewall log files are clean.
Any idea what's going on?
[snip: quoted log trimmed]
This snippet does not reveal the cause of the failure, or the exact place where it happened. Can you please check/share the full file, as well as perhaps other files in /var/log/ovirt-hosted-engine-setup (and maybe others in /var/log)? Thanks! Best regards, -- Didi
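A minimal sketch of collecting the requested logs for sharing (the archive name is arbitrary):

$ tar cjf ovirt-hosted-engine-setup-logs.tar.bz2 /var/log/ovirt-hosted-engine-setup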

On Mon, Jun 15, 2020 at 9:13 AM Yedidyah Bar David <didi@redhat.com> wrote:
[snip: quoted text trimmed]
This snippet does not reveal the cause of the failure, or the exact place where it happened. Can you please check/share the full file, as well as perhaps other files in /var/log/ovirt-hosted-engine-setup (and maybe others in /var/log)? Thanks!
Best regards, -- Didi
Hi, Compressed tar.bz2 of ovirt-hosted-engine-setup attached. Please let me know if you need additional log files. (/var/log/messages seems rather empty.) - Gilboa

On Mon, Jun 15, 2020 at 11:21 AM Gilboa Davara <gilboad@gmail.com> wrote:
[snip: quoted text trimmed]
Hi,
Compressed tar.bz2 of ovirt-hosted-engine-setup attached. Please let me know if you need additional log files. (/var/log/messages seems rather empty)
OK, it's failing in the task "Install oVirt Engine package", which tries to install/upgrade the package 'ovirt-engine' on the engine VM. Can you try to do this manually and see if it works? At this stage, the engine VM is on libvirt's default network (private); you can find its temporary address by searching the log for local_vm_ip - in your log, it is 192.168.1.173. Good luck and best regards, -- Didi
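A minimal sketch of that lookup, run on the host (the address is the one from this log; yours will differ):

# find the temporary engine VM address recorded by the installer
$ grep -r local_vm_ip /var/log/ovirt-hosted-engine-setup/
# then connect with the root password supplied during deploy
$ ssh root@192.168.1.173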

On Mon, Jun 15, 2020 at 11:46 AM Yedidyah Bar David <didi@redhat.com> wrote:
[snip: quoted text trimmed]
Ok, it's failing in the task "Install oVirt Engine package", which tries to install/upgrade the package 'ovirt-engine' on the engine VM. Can you try to do this manually and see if it works?
At this stage, the engine VM is on libvirt's default network (private); you can find its temporary address by searching the log for local_vm_ip - in your log, it is 192.168.1.173.
Good luck and best regards, -- Didi
You are correct.
$ dnf install -y ovirt-engine
Problem: package ovirt-engine-4.4.0.3-1.el8.noarch requires apache-commons-jxpath, but none of the providers can be installed
- conflicting requests
- package apache-commons-jxpath-1.3-29.module_el8.0.0+30+832da3a1.noarch is excluded
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
- Gilboa

On Mon, Jun 15, 2020 at 4:54 PM Gilboa Davara <gilboad@gmail.com> wrote:
[snip: quoted text trimmed]
You are correct.
$ dnf install -y ovirt-engine
Problem: package ovirt-engine-4.4.0.3-1.el8.noarch requires apache-commons-jxpath, but none of the providers can be installed
- conflicting requests
- package apache-commons-jxpath-1.3-29.module_el8.0.0+30+832da3a1.noarch is excluded
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
Is this on the engine machine? Or elsewhere?
Generally speaking, you need, for this, to first enable the module 'javapackages-tools'. But assuming that you are deploying the engine VM from ovirt-engine-appliance, the image should already have this done, so the above error should not happen there. No idea why it does, if indeed it's there and not e.g. on the host (where you should not need to install the engine). Can you please clarify where you tried this?
Best regards, -- Didi
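A sketch of the module check/enable step mentioned above, run on whichever machine shows the dnf error (standard EL8 'dnf module' subcommands; the module name is from the message above):

# see whether the module is enabled
$ dnf module list javapackages-tools
# enable it and retry the install
$ dnf module enable -y javapackages-tools
$ dnf install ovirt-engine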

On Mon, Jun 15, 2020 at 5:18 PM Yedidyah Bar David <didi@redhat.com> wrote:
[snip: quoted text trimmed]
Is this on the engine machine? Or elsewhere?
Generally speaking, you need, for this, to first enable the module 'javapackages-tools'. But assuming that you are deploying the engine VM from ovirt-engine-appliance, the image should already have this done, so the above error should not happen there. No idea why it does, if indeed it's there and not e.g. on the host (where you should not need to install the engine). Can you please clarify where you tried this?
Best regards, -- Didi
Managed to find the reason for the failure: wrong documentation order in one document [1], missing documentation in another [2].
[1] https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine...
[2] https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine...
In the first document, the cockpit part (5.2) actually precedes the oVirt engine module list (5.3). The second document is missing the modules part.
Now I can manually install ovirt-engine on the host. That said, still no go. Both cockpit-based deployment and command-line-based deployment (on the first node) fail. Log attached.
- Gilboa

On Mon, Jun 15, 2020 at 8:20 PM Gilboa Davara <gilboad@gmail.com> wrote:
[snip: quoted text trimmed]
Managed to find the reason for the failure: wrong documentation order in one document [1], missing documentation in another [2]. [1] https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine... [2] https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine... In the first document, the cockpit part (5.2) actually precedes the oVirt engine module list (5.3).
Section 5.3 is not actually needed, the appliance already includes it.
The second document is missing the modules part.
(Seems to me to include it as well, but again, it's not needed)
Now I can manually install ovirt-engine on the host.
I think you got it wrong. Please see my previous posts in this thread. In particular:
1. You do not need to install ovirt-engine on the host.
2. You should try to see why it failed, on the engine VM.
3. The engine VM should be left running by the failed deploy process, and you can find its IP address by searching the setup logs for 'local_vm_ip'.
So please find it, ssh to it from the host (using the root password you supplied), then try 'dnf install ovirt-engine'. If it fails, try to fix the problem. My guess: some kind of networking/mirror/proxy/whatever issue.
For an overview of the current hosted-engine deploy process, you can have a look at "Hosted Engine 4.3 Deep Dive (Simone Tiraboschi)" in: https://www.ovirt.org/community/get-involved/resources/slide-decks.html
It was written for 4.3, but most of it applies also to 4.4.
Also: the engine package is already part of the appliance, so strictly speaking, we should not try to install it. This is done so that if a newer version is available (e.g. because you installed an older appliance image rpm), you upgrade to the newer version. We have a bug open for allowing this to be prevented - not sure about its current status; see also the discussion in the linked github issue: https://bugzilla.redhat.com/show_bug.cgi?id=1816619
That said, still no go. Both cockpit-based deployment and command-line-based deployment (on the first node) fail. Log attached.
Thanks. As you can see there, it's still failing for the same reason: Failed to download metadata for repo 'AppStream' Good luck and best regards, -- Didi

On Wed, Jun 17, 2020 at 10:12 AM Yedidyah Bar David <didi@redhat.com> wrote:
[snip: quoted text trimmed]
Hello,
My mistake. I think there was a misunderstanding on my end that side-stepped the main issue.
Before I begin, I'd like to point out that I have a number of oVirt 4.3 deployments in production, some installed via cockpit, others manually via hosted-engine --deploy. I'm attempting a clean test install of 4.4 on a CentOS 8.x machine before I begin (slowly) upgrading my existing 4.3 setups to 4.4.
However, installing 4.4 on the test CentOS 8.x machine (now 8.2, after yesterday's release), either manually (via hosted-engine --deploy) or via cockpit, fails when trying to download packages (see attached logs) during the hosted engine deployment phase.
The reason I mucked around with the java module (on the host) was in an attempt to see if the _host_ machine somehow had the wrong module configuration, tripping up the hosted engine deployment process. Just to be clear, it is the hosted engine VM (during the deployment process) that fails to automatically download packages, _not_ the host.
Sorry again for the misunderstanding.
- Gilboa

On Wed, Jun 17, 2020 at 11:39 AM Gilboa Davara <gilboad@gmail.com> wrote:
[snip: quoted text trimmed]
Hello,
My mistake. I think there was a misunderstanding on my end that side-stepped the main issue.
Before I begin, I'd like to point out that I have a number of oVirt 4.3 deployments in production.
OK.
Some installed via cockpit, others manually via hosted-engine --deploy.
I'm attempting to cleanly test-install 4.4 on a CentOS 8.x machine before I begin (slowly) upgrading my existing 4.3 setups to 4.4.
Makes sense.
However, installing 4.4 on the test CentOS 8.x machine (now 8.2, after yesterday's release), either manually (via hosted-engine --deploy) or via cockpit, fails when trying to download packages (see attached logs) during the hosted engine deployment phase.
Right. Didn't check them - I guess it's the same, no?
The reason I mucked around with the java module (on the host) was in an attempt to see if the _host_ machine somehow had the wrong module configuration, tripping up the hosted engine deployment process.
OK, and now I think we both agree this isn't the issue. Good.
Just to be clear, it is the hosted engine VM (during the deployment process) that fails to automatically download packages, _not_ the host.
Exactly. That's why I asked you (because the logs do not reveal that) to manually log in there and try to install (update) the package, and see what happens, why it fails, etc. Can you please try that? Thanks.
Sorry again for the misunderstanding.
NP at all! Best regards, -- Didi

On Wed, Jun 17, 2020 at 12:35 PM Yedidyah Bar David <didi@redhat.com> wrote:
However, installing 4.4 on the test CentOS 8.x machine (now 8.2, after yesterday's release), either manually (via hosted-engine --deploy) or via cockpit, fails when trying to download packages (see attached logs) during the hosted engine deployment phase.
Right. Didn't check them - I guess it's the same, no?
Most likely you are correct. That said, the console version is more verbose.
Just to be clear, it is the hosted engine VM (during the deployment process) that fails to automatically download packages, _not_ the host.
Exactly. That's why I asked you (because the logs do not reveal that) to manually log in there and try to install (update) the package, and see what happens, why it fails, etc. Can you please try that? Thanks.
Sadly enough, the failure comes early in the hosted engine deployment process, making the VM completely inaccessible. While I see qemu-kvm briefly start, it usually dies before I have any chance to access it. Can I somehow prevent hosted-engine --deploy from destroying the hosted engine VM, when the deployment fails, giving me access to it? - Gilboa

On Thu, Jun 18, 2020 at 2:37 PM Gilboa Davara <gilboad@gmail.com> wrote:
[snip: quoted text trimmed]
Sadly enough, the failure comes early in the hosted engine deployment process, making the VM completely inaccessible. While I see qemu-kvm briefly start, it usually dies before I have any chance to access it.
Can I somehow prevent hosted-engine --deploy from destroying the hosted engine VM, when the deployment fails, giving me access to it?
This is how it should behave normally - it does not kill the VM. Perhaps check the logs and try to find who/what killed it.
Anyway, earlier today I pushed this patch: https://gerrit.ovirt.org/109730
Didn't yet get to try verifying it. Would you like to try? You can get an RPM from the CI build linked there, or download the patch and apply it manually (via the "gitweb" link [1]). Then, you can do:
hosted-engine --deploy --ansible-extra-vars=he_offline_deployment=true
If you try this, please share the results. Thanks!
[1] https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=commitdiff...
Best regards, -- Didi
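For the "check logs" suggestion above, a minimal sketch of where to look on the host (the VM name is an assumption based on the local deployment VM; adjust to whatever 'virsh list --all' shows):

# libvirt's per-VM qemu log usually records why a local VM died
$ cat /var/log/libvirt/qemu/HostedEngineLocal.log
# the journal may show who/what stopped it
$ journalctl -u libvirtd --since "1 hour ago"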

On Thu, Jun 18, 2020 at 2:54 PM Yedidyah Bar David <didi@redhat.com> wrote:
[snip: quoted text trimmed]
Good news. I managed to connect to the VM and solve the problem.
For some odd reason our primary DNS server had upstream connection issues and all the requests were silently handled by our secondary DNS server. Not sure I understand why, but while the oVirt host did manage to silently fail over to the secondary DNS server, the hosted engine, at least during the initial deployment phase (when it still uses the host's dnsmasq), failed to fail over to the secondary DNS server, and the deployment failed.
Once I fixed our primary DNS upstream connection issues, the installer managed to download packages successfully (but failed once I provisioned the storage; more on that in a different mail).
Many thanks, again, for taking the time to assist me! (And I hope it helps anyone facing the same issue in the future.)
- Gilboa
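A sketch of the kind of per-nameserver check that would have exposed this (addresses are placeholders; take the real ones from /etc/resolv.conf on the host and on the engine VM):

# query each configured DNS server directly, instead of relying on resolver failover
$ dig +short mirrorlist.centos.org @192.0.2.1
$ dig +short mirrorlist.centos.org @192.0.2.2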

Hello,
Following the previous email, I think I'm hitting an odd problem; not sure if it's my mistake or an actual bug.
1. Newly deployed 4.4 self-hosted engine on localhost NFS storage on a single node.
2. Installation failed during the final phase with a non-descriptive error message [1].
3. Log attached.
4. Even though the installation seemed to have failed, I managed to connect to the ovirt console, and noticed it failed to connect to the host.
5. SSHed into the hosted engine, and noticed it cannot resolve the host hostname.
6. Added the missing /etc/hosts entry (see the sketch after this message) and restarted the ovirt-engine service, and all is green.
7. Looking at the deployment log, I'm seeing the following message: "[WARNING] Failed to resolve gilboa-wx-ovirt.localdomain using DNS, it can be resolved only locally", which means ansible was aware that my DNS server doesn't resolve the host hostname, but didn't add the missing /etc/hosts entry and/or error out.
A. Is it a bug, or is it PBKAC?
B. What are the chances that I have a working ovirt (test) setup?
- Gilboa
[1] [ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_vms": [{"affinity_labels": [], "applications": [], "bios": {"boot_menu": {"enabled": false}, "type": "cluster_default"}, "cdroms": [], "cluster": {"href": "/ovirt-engine/api/clusters/1ac7525a-b3d1-11ea-9c7a-00163e57d088", "id": "1ac7525a-b3d1-11ea-9c7a-00163e57d088"}, "comment": "", "cpu": {"architecture": "x86_64", "topology": {"cores": 1, "sockets": 4, "threads": 1}}, "cpu_profile": {"href": "/ovirt-engine/api/cpuprofiles/58ca604e-01a7-003f-01de-000000000250", "id": "58ca604e-01a7-003f-01de-000000000250"}, "cpu_shares": 0, "creation_time": "2020-06-21 11:15:08.207000-04:00", "delete_protected": false, "description": "", "disk_attachments": [], "display": {"address": "127.0.0.1", "allow_override": false, "certificate": {"content": "-----BEGIN CERTIFICATE-----\nMIID3jCCAsagAwIBAgICEAAwDQYJKoZIhvcNAQELBQAwUTELMAkGA1UEBhMCVVMxFDASBgNVBAoM\nC2xvY2FsZG9tYWluMSwwKgYDVQQDDCNnaWxib2Etd3gtdm1vdmlydC5sb2NhbGRvbWFpbi40MTE5\nMTAeFw0yMDA2MjAxNTA3MTFaFw0zMDA2MTkxNTA3MTFaMFExCzAJBgNVBAYTAlVTMRQwEgYDVQQK\nDAtsb2NhbGRvbWFpbjEsMCoGA1UEAwwjZ2lsYm9hLXd4LXZtb3ZpcnQubG9jYWxkb21haW4uNDEx\nOTEwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCUNgcCn28BMlMcadFZPR9JAWjOWyh0\nWMQffOSKUlr7H+6K02IdjCR5K9bR9moAlMA4dNzF/NJa12BlCmDkwOSsgZl+NK/Ut3kqfPp4CqMl\nU3jkJzqRnh0rqOFnQ4Q1tsejziH1MSiH5/eb4A3g2s0awXF6K+JRMp2MB9wYQx//tZrvhTLprK+Y\n9jXdQFZby8j+/9pqIdN7uoYbuqESRNcfIJ0WigJ10/IOAwloT0MASwyVtCRTCCXNE4PRN+Lexlcc\nxXq2QZ0zG8u3leLT6/J87PCP/OEj976fZ19q83stWjygu4+UiWS+QStlrzc1U+aGVxa+sO+9mv3f\n6CwT0clvAgMBAAGjgb8wgbwwHQYDVR0OBBYEFOiEmL8+rz3I4j5rmL+ws47Jv5KiMHoGA1UdIwRz\nMHGAFOiEmL8+rz3I4j5rmL+ws47Jv5KioVWkUzBRMQswCQYDVQQGEwJVUzEUMBIGA1UECgwLbG9j\nYWxkb21haW4xLDAqBgNVBAMMI2dpbGJvYS13eC12bW92aXJ0LmxvY2FsZG9tYWluLjQxMTkxggIQ\nADAPBgNVHRMBAf8EBTADAQH/MA4GA1UdDwEB/wQEAwIBBjANBgkqhkiG9w0BAQsFAAOCAQEAStVI\nhHRrw5aa3YUNcwYh+kQfS47Es12nNRFeVVzbXj9CLS/TloYjyXEyZvFmYyyjNvuj4/3WcQDfeaG6\nTUGoFJ1sleOMT04WYWNJGyvsOfokT+I7yrBsVMg/7vip8UQV0ttmVoY/kMhZufwAUNlsZyh6F2o2\nNpAAcdLoguHo3UCGyaL8pF4G0NOAR/eV1rpl4VikqehUsXZ1sYzYZfK98xXrmepI42Lt3B2L6f9t\ngzYJ99jsrOGFhgvgV0H+PclviIdz79Jj3ZpPhezHkNQyrp0GOM0rqW+9xy50tlCQJ4rjdrRxnr21\nGpD3ZaQ2KSwGU79pnnRT6m7MSQ8irci3/A==\n-----END CERTIFICATE-----\n", "organization": "localdomain", "subject": "O=localdomain,CN=gilboa-wx-ovirt.localdomain"}, "copy_paste_enabled": true, "disconnect_action": "LOCK_SCREEN", "file_transfer_enabled": true, "monitors": 1, "port": 5900, "single_qxl_pci": false, "smartcard_enabled":
false, "type": "vnc"}, "fqdn": "gilboa-wx-vmovirt.localdomain", "graphics_consoles": [], "guest_operating_system": {"architecture": "x86_64", "codename": "", "distribution": "CentOS Linux", "family": "Linux", "kernel": {"version": {"build": 0, "full_version": "4.18.0-147.8.1.el8_1.x86_64", "major": 4, "minor": 18, "revision": 147}}, "version": {"full_version": "8", "major": 8}}, "guest_time_zone": {"name": "EDT", "utc_offset": "-04:00"}, "high_availability": {"enabled": false, "priority": 0}, "host": {"href": "/ovirt-engine/api/hosts/5ca55132-6d20-4a7f-81a8-717095ba8f78", "id": "5ca55132-6d20-4a7f-81a8-717095ba8f78"}, "host_devices": [], "href": "/ovirt-engine/api/vms/60ba9f1a-cdb1-406e-810d-187dbdd7775c", "id": "60ba9f1a-cdb1-406e-810d-187dbdd7775c", "io": {"threads": 1}, "katello_errata": [], "large_icon": {"href": "/ovirt-engine/api/icons/a753f77a-89a4-4b57-9c23-d23bd61ebdaf", "id": "a753f77a-89a4-4b57-9c23-d23bd61ebdaf"}, "memory": 8589934592, "memory_policy": {"guaranteed": 8589934592, "max": 8589934592}, "migration": {"auto_converge": "inherit", "compressed": "inherit", "encrypted": "inherit"}, "migration_downtime": -1, "multi_queues_enabled": true, "name": "external-HostedEngineLocal", "next_run_configuration_exists": false, "nics": [], "numa_nodes": [], "numa_tune_mode": "interleave", "origin": "external", "original_template": {"href": "/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000", "id": "00000000-0000-0000-0000-000000000000"}, "os": {"boot": {"devices": ["hd"]}, "type": "other"}, "permissions": [], "placement_policy": {"affinity": "migratable"}, "quota": {"id": "27d40902-b3d1-11ea-80f7-00163e57d088"}, "reported_devices": [], "run_once": false, "sessions": [], "small_icon": {"href": "/ovirt-engine/api/icons/0676b521-5b2b-4474-9394-8e9e8e3b426f", "id": "0676b521-5b2b-4474-9394-8e9e8e3b426f"}, "snapshots": [], "sso": {"methods": [{"id": "guest_agent"}]}, "start_paused": false, "stateless": false, "statistics": [], "status": "unknown", "storage_error_resume_behaviour": "auto_resume", "tags": [], "template": {"href": "/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000", "id": "00000000-0000-0000-0000-000000000000"}, "time_zone": {"name": "Etc/GMT"}, "type": "server", "usb": {"enabled": false}, "watchdogs": []}]}, "attempts": 24, "changed": false, "deprecations": [{"msg": "The 'ovirt_vm_facts' module has been renamed to 'ovirt_vm_info', and the renamed one no longer returns ansible_facts", "version": "2.13"}]} [ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook

On Sun, Jun 21, 2020 at 8:04 PM Gilboa Davara <gilboad@gmail.com> wrote:
Hello,
Following the previous email, I think I'm hitting an odd problem, not sure if it's my mistake or an actual bug. 1. Newly deployed 4.4 self-hosted engine on localhost NFS storage on a single node. 2. Installation failed during the final phase with a non-descriptive error message [1].
I agree. Would you like to open a bug about this? It's not always easy to know the root cause for the failure, nor to pass it through the various components until it can reach the end-user.
3. Log attached. 4. Even though the installation seemed to have failed, I managed to connect to the ovirt console, and noticed it failed to connect to the host. 5. SSHed into the hosted engine, and noticed it cannot resolve the host hostname. 6. Added the missing /etc/hosts entry and restarted the ovirt-engine service, and all is green. 7. Looking at the deployment log, I'm seeing the following message: "[WARNING] Failed to resolve gilboa-wx-ovirt.localdomain using DNS, it can be resolved only locally", which means ansible was aware that my DNS server doesn't resolve the host hostname, but didn't add the missing /etc/hosts entry and/or error out.
Not sure it must abort. In principle, you could have supplied custom ansible code to be run inside the appliance, to add the items yourself to /etc/hosts; or, in theory, it can also happen that you configured stuff so that the host fails DNS resolution but the engine VM does not.
A. Is it a bug, or is it PBKAC?
It also asked you:
2020-06-21 10:49:18,562-0400 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?
2020-06-21 10:49:18,562-0400 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Note: ensuring that this host could resolve the engine VM hostname is still up to you
2020-06-21 10:49:18,563-0400 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND (Yes, No)[No]
And you accepted the default 'No'. Perhaps we should change the default to Yes. Of course - Yes is also a risk - a user not noticing it, then later on changing the DNS, and not understanding why it "does not work"...
B. What are the chances that I have a working ovirt (test) setup?
In theory, you can examine the ansible code, and see what (not very many) next steps it should have done if it didn't fail there, and do that yourself (or decide that they are not important). In practice, I'd personally deploy again cleanly, unless this is for a quick test or something. Best regards, -- Didi

On Mon, Jun 22, 2020 at 9:12 AM Yedidyah Bar David <didi@redhat.com> wrote:
I agree. Would you like to open a bug about this? It's not always easy to know the root cause for the failure, nor to pass it through the various components until it can reach the end-user.
Sure. Happy to. Against which bugzilla component?
Not sure it must abort. In principle, you could have supplied custom ansible code to be run inside the appliance, to add the items yourself to /etc/hosts; or, in theory, it can also happen that you configured stuff so that the host fails DNS resolution but the engine VM does not.
It also asked you:
2020-06-21 10:49:18,562-0400 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?
2020-06-21 10:49:18,562-0400 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Note: ensuring that this host could resolve the engine VM hostname is still up to you
2020-06-21 10:49:18,563-0400 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND (Yes, No)[No]
And you accepted the default 'No'.
Perhaps we should change the default to Yes.
I must have missed it. In this case:
A. It is essentially PBKAC.
B. I believe that, given that the problem was actually detected by the installer early on, the installer should enforce having either a hosts entry or a working DNS setup. (Or at least show a big red flashing message saying: "Look, are you sure you want to set up a broken hosted engine VM that cannot possibly resolve the host address and will certainly fail miserably once we try to deploy the hosted engine?")
Of course - Yes is also a risk - a user not noticing it, then later on changing the DNS, and not understanding why it "does not work"...
Indeed.
In theory, you can examine the ansible code, and see what (not very many) next steps it should have done if it didn't fail there, and do that yourself (or decide that they are not important). In practice, I'd personally deploy again cleanly, unless this is for a quick test or something.
Best regards, --
I'll simply clean up and redeploy. Hopefully after suffering a long string of PBKAC and DNS related failures, I'll finally have a working setup :) And again, many thanks for taking the time to assist me. I appreciate it! - Gilboa

On Mon, Jun 22, 2020 at 4:55 PM Gilboa Davara <gilboad@gmail.com> wrote:
On Mon, Jun 22, 2020 at 9:12 AM Yedidyah Bar David <didi@redhat.com> wrote:
I agree. Would you like to open a bug about this? It's not always easy to know the root cause for the failure, nor to pass it through the various components until it can reach the end-user.
Sure. Happy to. Against which bugzilla component?
Perhaps first have a look at: https://bugzilla.redhat.com/show_bug.cgi?id=1816002 and decide whether to add a comment there, or create a new bug (same product/component).
[snip: quoted text trimmed]
I must have missed it. In this case:
A. It is essentially PBKAC.
B. I believe that, given that the problem was actually detected by the installer early on, the installer should enforce having either a hosts entry or a working DNS setup. (Or at least show a big red flashing message saying: "Look, are you sure you want to set up a broken hosted engine VM that cannot possibly resolve the host address and will certainly fail miserably once we try to deploy the hosted engine?")
I agree this makes sense, although as I said, it's not fully certain to fail. The code emitting this warning is general - it's used both here and in engine-setup. I agree that here (in hosted-engine) it's more important.
Of course - Yes is also a risk - a user not noticing it, then later on changing the DNS, and not understanding why it "does not work"...
Indeed.
In theory, you can examine the ansible code, and see what (not very many) next steps it should have done if it didn't fail there, and do that yourself (or decide that they are not important). In practice, I'd personally deploy again cleanly, unless this is for a quick test or something.
Best regards, --
I'll simply clean up and redeploy. Hopefully after suffering a long string of PBKAC and DNS related failures, I'll finally have a working setup :)
Good luck!
And again, many thanks for taking the time to assist me. I appreciate it!
Best regards, -- Didi

On Mon, Jun 22, 2020 at 5:30 PM Yedidyah Bar David <didi@redhat.com> wrote:
Perhaps first have a look at:
https://bugzilla.redhat.com/show_bug.cgi?id=1816002
and decide whether to add a comment there, or create a new bug (same product/component).
After failing to create a new bug:
* I added a comment to the existing bug.
* I decided to skip the oVirt team field (because I didn't know which team I should choose); in return, BZ decided to retaliate by deleting everything I wrote.
Me not feeling pretty lucky right now.
... Oh, on the upside, once I finished making every possible mistake in the handbook (and a couple of others nobody bothered to think of), I finally have a working oVirt 4.4 setup... Yippie.
- Gilboa

On Tue, Jun 23, 2020 at 5:44 PM Gilboa Davara <gilboad@gmail.com> wrote:
On Mon, Jun 22, 2020 at 5:30 PM Yedidyah Bar David <didi@redhat.com> wrote:
Perhaps first have a look at:
https://bugzilla.redhat.com/show_bug.cgi?id=1816002
and decide whether to add a comment there, or create a new bug (same product/component).
After failing to create a new bug:
* I added a comment to the existing bug.
Thanks
* I decided to skip the oVirt team field (because I didn't know which team I should choose);
It's not that important - someone will fix it later. In this case it's "Integration".
in return, BZ decided to retaliate by deleting everything I wrote. Me not feeling pretty lucky right now.
Happened to me as well :-(
... Oh, on the upside, once I finished making every possible mistake in the handbook (and a couple of others nobody bothered to think of), I finally have a working oVirt 4.4 setup... Yippie.
Congratulations, and thanks for the report! -- Didi

On Tue, Jun 23, 2020 at 5:47 PM Yedidyah Bar David <didi@redhat.com> wrote:
[snip: quoted text trimmed]
... Oh, on the upside, once I finished making every possible mistake in the handbook (and a couple of others nobody bothered to think of), I finally have a working oVirt 4.4 setup... Yippie.
Congratulations, and thanks for the report! -- Didi
No problem. Thanks again for the help. - Gilboa

On Sun, Jun 21, 2020 at 7:36 PM Gilboa Davara <gilboad@gmail.com> wrote:
[snip: quoted text trimmed]
This is how it should behave normally, it does not kill the VM. Perhaps check logs, try to find who/what killed it.
Anyway: Earlier today I pushed this patch:
https://gerrit.ovirt.org/109730
Didn't yet get to try verifying it. Would you like to try? You can get an RPM from the CI build linked there, or download the patch and apply it manually (in the "gitweb" link [1]).
Then, you can do:
hosted-engine --deploy --ansible-extra-vars=he_offline_deployment=true
If you try this, please share the results. Thanks!
[1] https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=commitdiff...
Now filed https://bugzilla.redhat.com/1849517 for this.
Best regards, -- Didi
Good news. I managed to connect to the VM and solve the problem.
Glad to hear that, thanks for the report!
For some odd reason our primary DNS server had upstream connection issues and all the requests were silently handled by our secondary DNS server. Not sure I understand why, but while the oVirt host did manage to silently fail over to the secondary DNS server, the hosted engine, at least during the initial deployment phase (when it still uses the host's dnsmasq), failed to fail over to the secondary DNS server, and the deployment failed.
Sounds like a bug in dnsmasq, although I am not sure.
That said, DNS/DHCP are out of scope for oVirt. We simply assume they are robust.
In retrospect, what do you think we should have done differently to make it easier for you to find (and fix) the problem?
Best regards, -- Didi

On Mon, Jun 22, 2020 at 8:58 AM Yedidyah Bar David <didi@redhat.com> wrote:
[snip: quoted text trimmed]
Sounds like a bug in dnsmasq, although I am not sure.
That said, DNS/DHCP are out of scope for oVirt. We simply assume they are robust.
In retrospect, what do you think we should have done differently to make it easier for you to find (and fix) the problem?
Best regards, -- Didi
In retrospect, the main problem was the non-descriptive error message generated by DNF (which has nothing to do with the ovirt installer). That said, this could easily be circumvented by adding a simple network-test script to the installer playbook. Then again, the problem was clearly on my side... - Gilboa
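A sketch of the kind of network-test step suggested above, as a pre-flight check inside the engine VM (the URL is just an example repo endpoint; a real check would probe the actual configured mirrors):

# fail fast with a clear message if repo metadata is unreachable
$ curl -fsS -o /dev/null http://mirrorlist.centos.org || echo "ERROR: cannot reach package mirrors - check DNS/proxy/firewall"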