Unable to deploy Hyperconverged Engine Node - v4.3.3

Hi everyone,
I am trying a Gluster hyperconverged deployment where the Gluster part has completed successfully. All hosts run CentOS 7.6.1810 (fresh install): two HP DL20 G9 (for VMs) and one HP 120 G7 (which hosts the Gluster arbiter volumes). Unfortunately I am unable to deploy the Engine; both the CLI and the GUI approach fail with the error below. At first sight it looks similar to https://lists.ovirt.org/pipermail/users/2018-March/087802.html, but I've configured a static IP (same subnet as the host), no DHCP. I also tried to force IPv4 with "/usr/sbin/ovirt-hosted-engine-setup --4", but the very same error was thrown every time I tried to deploy the engine:
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "sub.sub.domain.tld", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "sub.domain.tld", "subject": "O=sub.domain.tld,CN=sub.sub.domain.tld"}, "cluster": {"href": "/ovirt-engine/api/clusters/f083f056-74fd-11e9-bba9-00163e522076", "id": "f083f056-74fd-11e9-bba9-00163e522076"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/dc4f5c15-4989-4454-ba46-3bd600796b69", "id": "dc4f5c15-4989-4454-ba46-3bd600796b69", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "sub.sub.domain.tld", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:L8YyAMcxLFJEng+CoDympwkpMwoagcBafI4fpLP4Kk0", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "vgpu_placement": "consolidated"}]}, "attempts": 120, "changed": false}
Unfortunately I don't really know where to look, given this error message. The engine VM being deployed is listed as a KVM VM, is accessible through the bridge and seems to start up completely; I can even access the Engine web interface (engine01.sub.domain.tld/ovirt-engine).
In /var/log/messages the following can be found:
May 13 12:40:55 host ansible-async_wrapper.py: 15505 still running (86015)
May 13 12:40:57 host python: ansible-ovirt_host_facts Invoked with all_content=False pattern=name=sub.sub.domain.tld fetch_nested=False nested_attributes=[] auth={'timeout': 0, 'url': 'https://engine01.sub.domain.tld/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': '8s-vELzQqNTR6l7-KRuqnYLE3sVwVWU5NxiNWzc-s2CllaQG_5YZ32fCFkVsAgwEyLWjPIOxvyS-_4js-VYFFQ', 'ca_file': None}
... and after 120 attempts Ansible stops and fails with a deployment error. When retrying after removing the VM and running ovirt-hosted-engine-cleanup, the very same error is thrown.
What is a bit weird are these entries in /var/log/ovirt-hosted-engine-setup/:
./engine-logs-2019-05-13T12:26:20Z/ovirt-engine/engine.log:2019-05-13 12:34:40,369Z ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [12746235] SSH error running command root@sub.sub.domain.tld:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': RuntimeException: Unexpected error during execution: bash: /tmp/ovirt-pTVEEzlb8b/ovirt-host-deploy: Permission denied
./engine-logs-2019-05-13T12:26:20Z/ovirt-engine/engine.log:2019-05-13 12:34:40,406Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [12746235] EVENT_ID: VDS_INSTALL_IN_PROGRESS_ERROR(511), An error has occurred during installation of Host sub.sub.domain.tld: Unexpected error during execution: bash: /tmp/ovirt-pTVEEzlb8b/ovirt-host-deploy: Permission denied
Could that be the cause, and how can I fix it? What else do you need?
Thanks in advance,
Martin

On Mon, May 13, 2019 at 5:34 PM <anonmix@gmail.com> wrote:
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "sub.sub.domain.tld", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "sub.domain.tld", "subject": "O=sub.domain.tld,CN=sub.sub.domain.tld"}, "cluster": {"href": "/ovirt-engine/api/clusters/f083f056-74fd-11e9-bba9-00163e522076", "id": "f083f056-74fd-11e9-bba9-00163e522076"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/dc4f5c15-4989-4454-ba46-3bd600796b69", "id": "dc4f5c15-4989-4454-ba46-3bd600796b69", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "sub.sub.domain.tld", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:L8YyAMcxLFJEng+CoDympwkpMwoagcBafI4fpLP4Kk0", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "vgpu_placement": "consolidated"}]}, "attempts": 120, "changed": false}
"status": "install_failed" means that the engine tried to deploy the host but for some reason it failed.
Could that be the cause and how can I fix it? What else do you guys need?
Can you please share the host-deploy logs? They are in the host-deploy subdirectory of the directory where you found engine.log.
Thanks in advance, Martin _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FVV73LOZQ4U3EE...
-- Simone Tiraboschi He / Him / His Principal Software Engineer Red Hat <https://www.redhat.com/> stirabos@redhat.com @redhatjobs <https://twitter.com/redhatjobs> redhatjobs <https://www.facebook.com/redhatjobs> @redhatjobs <https://instagram.com/redhatjobs> <https://red.ht/sig> <https://redhat.com/summit>

Hi Simone,
On 13 May 2019, at 18:28, Simone Tiraboschi <stirabos@redhat.com> wrote:
Can you please share the host-deploy logs? They are in the host-deploy subdirectory of the directory where you found engine.log.
Of course! Please find them attached.

On Mon, May 13, 2019 at 5:34 PM <anonmix(a)gmail.com> wrote:
Can you please share the host-deploy logs? They are in the host-deploy subdirectory of the directory where you found engine.log.
Although I don't know exactly what to look for, I can't spot anything in the host-deploy logs other than the host waiting for the Engine node to come up, so I've included the Engine node logs in the package as well. I'd appreciate any hint or pointer in the right direction! Thanks in advance, Martin

I can't tell what's going wrong here. It looks a bit like this: https://bugzilla.redhat.com/show_bug.cgi?id=1507438

Thanks for taking a look. I've tried again to deploy a hyperconverged engine node on freshly installed hosts. Unfortunately, it seems to be the very same issue again.
I can't tell what's going wrong here. It looks a bit like this: https://bugzilla.redhat.com/show_bug.cgi?id=1507438
That bug relates to /tmp, but as far as I can see the initial engine VM resides in /var/tmp before it gets deployed to the storage. I'm not sure what exactly Ansible checks while waiting for the engine to come up, but the temporary entry in /etc/hosts on the host is present, and I can ping and SSH into the engine VM through the bridge, as well as from the engine back to the host.
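The engine.log entry quoted earlier in this thread shows how host deployment works: over SSH, the engine creates a private directory with mktemp (in this case /tmp/ovirt-pTVEEzlb8b), unpacks a tarball into it and executes ovirt-host-deploy from there. If the filesystem holding that directory is mounted noexec, the kernel refuses to execute the file whatever its mode bits are, and bash reports "Permission denied". As a rough way to tell whether /tmp or /var/tmp allows execution, here is a minimal Python sketch that mimics the same create/unpack/execute pattern; the helper name can_exec_in is hypothetical and not part of oVirt.

```python
import os
import shutil
import stat
import subprocess
import tempfile


def can_exec_in(base_dir: str) -> bool:
    """Return True if a freshly created script under base_dir can be executed.

    Loosely mirrors the logged host-deploy command: create a private
    temporary directory, place an executable in it, then run it.
    """
    # mkdtemp creates the directory with mode 0700, similar in effect to
    # the "umask 0077" at the start of the logged command.
    work_dir = tempfile.mkdtemp(prefix="exec-probe-", dir=base_dir)
    try:
        script = os.path.join(work_dir, "probe.sh")
        with open(script, "w") as f:
            f.write("#!/bin/sh\nexit 0\n")
        # Owner rwx, so a failure cannot be blamed on missing mode bits.
        os.chmod(script, stat.S_IRWXU)
        try:
            return subprocess.run([script]).returncode == 0
        except PermissionError:
            # execve() fails with EACCES, e.g. when base_dir is on a noexec
            # mount; bash reports the same condition as "Permission denied".
            return False
    finally:
        # Clean up the probe directory, like the trap in the logged command.
        shutil.rmtree(work_dir, ignore_errors=True)


if __name__ == "__main__":
    for path in ("/tmp", "/var/tmp"):
        if os.path.isdir(path):
            ok = can_exec_in(path)
            print(path, "exec allowed" if ok else "exec denied (noexec?)")
```

Running it on the host before starting the deployment would show at a glance whether the directory the engine uses for ovirt-host-deploy can run executables at all.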

I can't tell what's going wrong here. It looks a bit like this: https://bugzilla.redhat.com/show_bug.cgi?id=1507438
You were correct. After running
    mount -o remount,exec /tmp
the wizard completed successfully. Thank you!
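The remount changes the running system only and does not survive a reboot. To check up front whether /tmp (or any other path) sits on a noexec mount, one option is to read the mount options from /proc/mounts. The Python sketch below does that; mount_options and is_noexec are hypothetical helper names, and escaped characters in mount point names (spaces appear as octal escapes) are not decoded.

```python
import os
from typing import List


def mount_options(path: str, mounts_text: str) -> List[str]:
    """Return the mount options of the filesystem containing `path`.

    `mounts_text` is expected in /proc/mounts format, one mount per line:
    device mountpoint fstype options dump pass
    """
    path = os.path.abspath(path)
    best_point, best_opts = "", []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        point, opts = fields[1], fields[3].split(",")
        # A mount point covers `path` if it is the path itself or a parent
        # directory of it; "/" covers every absolute path.
        covers = path == point or path.startswith(point.rstrip("/") + "/")
        # Prefer the longest (most specific) mount point; on a tie the later
        # line wins, since later entries are mounted on top of earlier ones.
        if covers and len(point) >= len(best_point):
            best_point, best_opts = point, opts
    return best_opts


def is_noexec(path: str, mounts_text: str) -> bool:
    """True if the filesystem containing `path` is mounted with noexec."""
    return "noexec" in mount_options(path, mounts_text)


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = f.read()
    for p in ("/tmp", "/var/tmp"):
        print(p, "noexec" if is_noexec(p, mounts) else "exec allowed")
```

Working on the text rather than opening /proc/mounts inside the helpers keeps them easy to test against a saved copy of the file from an affected host.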
participants (4)
- anon mix
- anonmix@gmail.com
- Jason Brooks
- Simone Tiraboschi