oVirt 4.4 Engine Deployment: Problems with Gluster Storage Domain

Hi everybody!

I am having a hard time getting oVirt 4.4 to work. We want to update our 4.3 cluster, and I am trying to set up a fresh 4.4 cluster (and restore the backup later on) in order to update to 4.5. It fails at the end of the engine deployment, when the Gluster storage domain should be added.

I installed oVirt Node 4.4.10 on an old PC and made the following modifications to the engine deployment process:

- Altered defaults in /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/defaults/main.yml:
  - "he_pause_before_engine_setup: true" (during this pause before engine setup, I ssh into the engine and exclude the package postgresql-jdbc from updates, which otherwise breaks the deployment [1])
  - "he_remove_appliance_rpm: false" (to avoid the large download on every single try, and I tried a lot)
  - "he_force_ip4: true" (to avoid problems with IPv6, see below)
- In /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/fetch_host_ip.yml, I added the following lines after "- name: Get host address resolution:" (to avoid a problem with an "invalid" IPv6 address, which otherwise breaks the deployment [2]):

    - name: Get host IP addresses
      ansible.builtin.command: hostname -I
      register: hostname_addresses_output
      changed_when: true

Most times I started the deployment via shell, but I tried via the web interface of the node as well. It fails at the task "Add glusterfs storage domain" with the following message (see also [3]):

    [ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Failed to fetch Gluster Volume List]". HTTP response code is 400.

When the setup asks for storage, I tried different answers (gluster.local:/volume, gluster.local:/path/to/brick/volume, 192.168.8.51:/volume, ...), with no mount options. I added firewall rules for glusterfs on the node and on the engine, and even tried disabling the firewall; no firewall is running on the gluster servers. On the node, I also tested setting SELinux to permissive.
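Regarding the hostname -I workaround: the address VDSM choked on is an IPv6 link-local one (see [2]), so the fix boils down to dropping fe80:: addresses from the candidate list. A minimal sketch of that filtering idea in plain shell (the helper name is my own):

```shell
# drop_link_local: filter a whitespace-separated address list on stdin,
# removing IPv6 link-local addresses (fe80::...%zone), which VDSM
# rejects as invalid, and any empty lines.
drop_link_local() {
    tr ' ' '\n' | grep -v '^fe80' | grep -v '^$'
}

# The addresses VDSM can actually use:
hostname -I | drop_link_local
```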
I recorded the traffic on different interfaces ("ovirtmgmt" and "virbr0" on the node, "eth0" on the engine), and I can see the node and the gluster server talking: the node gets the volume with its options (which are, by the way, compliant with the docs: "storage.owner-gid: 36", "storage.owner-uid: 36", etc.), but that's it; there are no further packets to mount the volume. I noticed some ARP packets as well: the node asks for the IP of the engine (the configured static IP, which is not yet active). And the engine sends a DNS request for the gluster server to the node (via interface virbr0), but does not connect to the gluster server. At least, that's what I can see; most of the traffic is TLS, which I could not decrypt yet. I would appreciate any hint on where to find the right keys.

Anyway, I can ssh from the engine to the gluster server, and I can mount the gluster volume manually on the node (mount -t glusterfs gluster.local:/volume /local/path), so there seem to be no connectivity issues.

Since the engine deployment log is around 30 MB, I attached a log summary with the findings I found relevant. I'll provide more logs if needed. I really want to put this huge time sink to an end. Can anyone help me or point me in the right direction?

Many thanks in advance :)

Regards,
Niko

[1] This was the error message I got:

[ ERROR ] fatal: [localhost -> 192.168.222.195]: FAILED! => {"attempts": 30, "changed": false, "connection": "close", "content": "Error500 - Internal Server Error", "content_encoding": "identity", "content_length": "86", "content_type": "text/html; charset=UTF-8", "date": "Wed, 17 May 2023 22:42:27 GMT", "elapsed": 0, "msg": "Status code was 500 and not [200]: HTTP Error 500: Internal Server Error", "redirected": false, "server": "Apache/2.4.37 (centos) OpenSSL/1.1.1k mod_auth_gssapi/1.6.1 mod_wsgi/4.6.4 Python/3.6", "status": 500, "url": "http://localhost/ovirt-engine/services/health"}

[2] This was the error message I got:

VDSM ovirt.martinwi.local command HostSetupNetworksVDS failed: Internal JSON-RPC error: {'reason': "Invalid IP address: 'fe80::ea3f:67ff:fe7f:a029%ovirtmgmt' does not appear to be an IPv4 or IPv6 address"}

[3] /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20230718140628-rryscj.log:

2023-07-18 16:32:35,877+0200 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:106 {'msg': 'Fault reason is "Operation Failed". Fault detail is "[Failed to fetch Gluster Volume List]". 
HTTP response code is 400.', 'exception': 'Traceback (most recent call last):\n File "/tmp/ansible_ovirt_storage_domain_payload_b4ofbzxa/ansible_ovirt_storage_domain_payload.zip/ansible_collections/ovirt/ovirt/plugins/modules/ovirt_storage_domain.py", line 804, in main\n File "/tmp/ansible_ovirt_storage_domain_payload_b4ofbzxa/ansible_ovirt_storage_domain_payload.zip/ansible_collections/ovirt/ovirt/plugins/module_utils/ovirt.py", line 674, in create\n **kwargs\n File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 26258, in add\n return self._internal_add(storage_domain, headers, query, wait)\n File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 232, in _internal_add\n return future.wait() if wait else future\n File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait\n return self._code(response)\n File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 229, in callback\n self._check_fault(response)\n File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault\n self._raise_error(response, body)\n File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error\n raise error\novirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Failed to fetch Gluster Volume List]". 
HTTP response code is 400.\n', 'invocation': {'module_args': {'state': 'unattached', 'name': 'hosted_storage', 'host': 'ovirt.martinwi.local', 'data_center': 'Default', 'wait': True, 'glusterfs': {'address': 'gluster1.martinwi.local', 'path': '/gv3', 'mount_options': ''}, 'timeout': 180, 'poll_interval': 3, 'fetch_nested': False, 'nested_attributes': [], 'domain_function': 'data', 'id': None, 'description': None, 'comment': None, 'localfs': None, 'nfs': None, 'iscsi': None, 'managed_block_storage': None, 'posixfs': None, 'fcp': None, 'wipe_after_delete': None, 'backup': None, 'critical_space_action_blocker': None, 'warning_low_space': None, 'destroy': None, 'format': None, 'discard_after_delete': None}}, '_ansible_no_log': False, 'changed': False}
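P.S.: For the "Failed to fetch Gluster Volume List" error itself, here is a sketch of the checks I run on the node to rule out the obvious causes (assuming the stock gluster CLI, nc, and firewalld; gluster.local is my server name, yours will differ):

```shell
# Can the node's gluster CLI enumerate volumes on the server? This is
# roughly the information the engine's volume-list step needs.
gluster --remote-host=gluster.local volume list

# Is the gluster management port (24007/tcp) reachable from the node?
nc -zv gluster.local 24007

# Is the glusterfs service actually allowed through the node's firewall?
firewall-cmd --list-services
```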
Thyen, Niko