Hi everybody!
I am having a hard time getting oVirt 4.4 to work. We want to upgrade our
4.3 cluster, so I am trying to set up a fresh 4.4 cluster (and restore
the backup later on) in order to then upgrade to 4.5. It fails at the end
of the engine deployment, when the Gluster storage domain should be added.
I installed oVirt Node 4.4.10 on an old PC and made the following
modifications to the engine deployment process:
- altered defaults in
/usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/defaults/main.yml:
  - "he_pause_before_engine_setup: true" (during this pause before engine
    setup, I ssh into the engine and exclude the package postgresql-jdbc
    from updates, which otherwise breaks the deployment [1]; see the
    sketch after this list)
  - "he_remove_appliance_rpm: false" (to avoid the large download on
    every single try, and I tried a lot)
  - "he_force_ip4: true" (to avoid problems with IPv6, see below)
- in
/usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/fetch_host_ip.yml
  I added the following lines after the task "- name: Get host address
  resolution" (to avoid a problem with an "invalid" IPv6 address, which
  otherwise breaks the deployment [2]; "hostname -I" omits the loopback
  and IPv6 link-local addresses, so the offending fe80:: address is
  never picked up):
  - name: Get host IP addresses
    ansible.builtin.command: hostname -I
    register: hostname_addresses_output
    changed_when: true
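For reference, this is roughly what I do inside the engine VM during
that pause (a minimal sketch; dnf's exclude mechanism is just one way to
do it, a versionlock would work as well):

  # inside the engine VM, while the deployment is paused:
  # keep dnf from updating postgresql-jdbc for the rest of the run
  echo "exclude=postgresql-jdbc" >> /etc/dnf/dnf.conf
  # sanity check: the package must no longer show up as an upgrade
  dnf list --upgrades | grep -i postgresql-jdbc || echo "excluded"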
Most of the time I started the deployment via shell, but I tried it via
the node's web interface as well. It fails at the task "Add glusterfs
storage domain" with the following message:
"[ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault
detail is "[Failed to fetch Gluster Volume List]". HTTP response code is
400." (See also [3])
When the setup asks for the storage, I tried different answers
(gluster.local:/volume, gluster.local:/path/to/brick/volume,
192.168.8.51:/volume, ...), always without mount options.
I added firewall rules for GlusterFS on the node and the engine, and
even tried disabling the firewall entirely. No firewall is running on
the gluster servers. On the node, I also tested setting SELinux to
permissive.
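The firewall rules were essentially these (firewalld ships a predefined
glusterfs service; 24007 is glusterd's default management port):

  # on node and engine
  firewall-cmd --permanent --add-service=glusterfs
  firewall-cmd --permanent --add-port=24007/tcp   # belt and suspenders
  firewall-cmd --reload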
I recorded the traffic on different interfaces ("ovirtmgmt" and "virbr0"
on the node, "eth0" on the engine), and I can see the node and the
gluster server talking: the node gets the volume with its options (which
are, btw, compliant with the docs: "storage.owner-gid: 36",
"storage.owner-uid: 36", etc.), but that's it, no further packets to
mount the volume.
I also noticed some ARP packets: the node asks for the engine's IP (the
configured static IP, which is not yet active). And the engine sends a
DNS request for the gluster server to the node (via interface virbr0),
but never connects to the gluster server. At least, that's what I can
see; most of the traffic is TLS, which I couldn't decrypt yet. I'd
appreciate any hint where to find the right keys.
Anyway, I can ssh from the engine to the gluster server, and I can mount
the gluster volume manually on the node (mount -t glusterfs
gluster.local:/volume /local/path), so there seem to be no connectivity
issues.
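In full, the mount test and the ownership check look like this (a
sketch; /mnt/test and the volume name are placeholders):

  # on the node
  mkdir -p /mnt/test
  mount -t glusterfs gluster.local:/volume /mnt/test
  ls -ldn /mnt/test    # should show 36:36 given the owner options above
  umount /mnt/test
  # on the gluster server, the oVirt-relevant options
  gluster volume get volume storage.owner-uid
  gluster volume get volume storage.owner-gid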
Since the engine deployment log is around 30 MB, I attached a log
summary with the findings I found relevant. I'll provide more logs if
needed.
I really want to put an end to this huge time sink. Can anyone help me
or point me in the right direction?
Many thanks in advance :)
Regards,
Niko
[1] This was the error message I got:
"[ ERROR ] fatal: [localhost -> 192.168.222.195]: FAILED! =>
{"attempts": 30, "changed": false, "connection":
"close", "content":
"Error500 - Internal Server Error", "content_encoding":
"identity",
"content_length": "86", "content_type": "text/html;
charset=UTF-8",
"date": "Wed, 17 May 2023 22:42:27 GMT", "elapsed": 0,
"msg": "Status
code was 500 and not [200]: HTTP Error 500: Internal Server Error",
"redirected": false, "server": "Apache/2.4.37 (centos)
OpenSSL/1.1.1k
mod_auth_gssapi/1.6.1 mod_wsgi/4.6.4 Python/3.6", "status": 500,
"url":
"http://localhost/ovirt-engine/services/health"}"
[2] This was the error message I got:
"VDSM ovirt.martinwi.local command HostSetupNetworksVDS failed: Internal
JSON-RPC error: {'reason': "Invalid IP address:
'fe80::ea3f:67ff:fe7f:a029%ovirtmgmt' does not appear to be an IPv4 or
IPv6 address"}"
[3]
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20230718140628-rryscj.log:
2023-07-18 16:32:35,877+0200 DEBUG
otopi.ovirt_hosted_engine_setup.ansible_utils
ansible_utils._process_output:106
{'msg': 'Fault reason is "Operation Failed". Fault detail is "[Failed to
fetch Gluster Volume List]". HTTP response code is 400.',
'exception': 'Traceback (most recent call last):
  File "/tmp/ansible_ovirt_storage_domain_payload_b4ofbzxa/ansible_ovirt_storage_domain_payload.zip/ansible_collections/ovirt/ovirt/plugins/modules/ovirt_storage_domain.py", line 804, in main
  File "/tmp/ansible_ovirt_storage_domain_payload_b4ofbzxa/ansible_ovirt_storage_domain_payload.zip/ansible_collections/ovirt/ovirt/plugins/module_utils/ovirt.py", line 674, in create
    **kwargs
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 26258, in add
    return self._internal_add(storage_domain, headers, query, wait)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 232, in _internal_add
    return future.wait() if wait else future
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait
    return self._code(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 229, in callback
    self._check_fault(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
    self._raise_error(response, body)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
    raise error
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
"[Failed to fetch Gluster Volume List]". HTTP response code is 400.',
'invocation': {'module_args': {'state': 'unattached',
'name': 'hosted_storage', 'host': 'ovirt.martinwi.local',
'data_center': 'Default', 'wait': True,
'glusterfs': {'address': 'gluster1.martinwi.local', 'path': '/gv3',
'mount_options': ''}, 'timeout': 180, 'poll_interval': 3,
'fetch_nested': False, 'nested_attributes': [],
'domain_function': 'data', 'id': None, 'description': None,
'comment': None, 'localfs': None, 'nfs': None, 'iscsi': None,
'managed_block_storage': None, 'posixfs': None, 'fcp': None,
'wipe_after_delete': None, 'backup': None,
'critical_space_action_blocker': None, 'warning_low_space': None,
'destroy': None, 'format': None, 'discard_after_delete': None}},
'_ansible_no_log': False, 'changed': False}