oVirt 4.5.1 Hyperconverged Gluster install fails

Hello all,

I am hoping someone can help me with an oVirt installation that has gotten the better of me after weeks of trying. After setting up ssh keys and making sure each host is known to the primary host (sr-svr04), I go through Cockpit and "Configure Gluster storage and oVirt hosted engine", and enter all of the details with <host>.san.lennoxconsulting.com.au for the storage network FQDNs and <host>.core.lennoxconsulting.com.au for the public interfaces. Connectivity on each of the VLANs tests out as basically working (everything is pingable and ssh connections work) and the hosts are generally usable on the network. But the install ultimately dies with the following ansible error:

----------- gluster-deployment.log ---------
:
:
TASK [gluster.infra/roles/backend_setup : Create volume groups] ****************
task path: /etc/ansible/roles/gluster.infra/roles/backend_setup/tasks/vg_create.yml:63
failed: [sr-svr04.san.lennoxconsulting.com.au] (item={'key': 'gluster_vg_sdb', 'value': [{'vgname': 'gluster_vg_sdb', 'pvname': '/dev/sdb'}]}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["vgcreate", "--dataalignment", "1536K", "-s", "1536K", "gluster_vg_sdb", "/dev/sdb"], "delta": "0:00:00.058528", "end": "2022-07-16 16:18:37.018563", "item": {"key": "gluster_vg_sdb", "value": [{"pvname": "/dev/sdb", "vgname": "gluster_vg_sdb"}]}, "msg": "non-zero return code", "rc": 5, "start": "2022-07-16 16:18:36.960035", "stderr": " A volume group called gluster_vg_sdb already exists.", "stderr_lines": [" A volume group called gluster_vg_sdb already exists."], "stdout": "", "stdout_lines": []}
failed: [sr-svr05.san.lennoxconsulting.com.au] (item={'key': 'gluster_vg_sdb', 'value': [{'vgname': 'gluster_vg_sdb', 'pvname': '/dev/sdb'}]}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["vgcreate", "--dataalignment", "1536K", "-s", "1536K", "gluster_vg_sdb", "/dev/sdb"], "delta": "0:00:00.057186", "end": "2022-07-16 16:18:37.784063", "item": {"key": "gluster_vg_sdb", "value": [{"pvname": "/dev/sdb", "vgname": "gluster_vg_sdb"}]}, "msg": "non-zero return code", "rc": 5, "start": "2022-07-16 16:18:37.726877", "stderr": " A volume group called gluster_vg_sdb already exists.", "stderr_lines": [" A volume group called gluster_vg_sdb already exists."], "stdout": "", "stdout_lines": []}
failed: [sr-svr06.san.lennoxconsulting.com.au] (item={'key': 'gluster_vg_sdb', 'value': [{'vgname': 'gluster_vg_sdb', 'pvname': '/dev/sdb'}]}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["vgcreate", "--dataalignment", "1536K", "-s", "1536K", "gluster_vg_sdb", "/dev/sdb"], "delta": "0:00:00.062212", "end": "2022-07-16 16:18:37.250371", "item": {"key": "gluster_vg_sdb", "value": [{"pvname": "/dev/sdb", "vgname": "gluster_vg_sdb"}]}, "msg": "non-zero return code", "rc": 5, "start": "2022-07-16 16:18:37.188159", "stderr": " A volume group called gluster_vg_sdb already exists.", "stderr_lines": [" A volume group called gluster_vg_sdb already exists."], "stdout": "", "stdout_lines": []}

NO MORE HOSTS LEFT *************************************************************

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
sr-svr04.san.lennoxconsulting.com.au : ok=32 changed=13 unreachable=0 failed=1 skipped=27 rescued=0 ignored=1
sr-svr05.san.lennoxconsulting.com.au : ok=31 changed=12 unreachable=0 failed=1 skipped=27 rescued=0 ignored=1
sr-svr06.san.lennoxconsulting.com.au : ok=31 changed=12 unreachable=0 failed=1 skipped=27 rescued=0 ignored=1
----------- gluster-deployment.log ---------

A "gluster v status" gives me no volumes present, and that is where I am stuck! Any ideas of what to try next? I have tried this with oVirt Node 4.5.1 el8 and el9 images as well as 4.5 el8 images, so it has got to be somewhere in my infrastructure configuration, but I am out of ideas.

My hardware configuration is 3 x HP DL360s, each with oVirt Node 4.5.1 el8 installed on a 2 x 146 GB RAID1 array, plus a 6 x 900 GB RAID5 array for Gluster. Network configuration is:

[root@sr-svr04 ~]# ip addr show up
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 1c:98:ec:29:41:68 brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 1c:98:ec:29:41:68 brd ff:ff:ff:ff:ff:ff permaddr 1c:98:ec:29:41:69
4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 1c:98:ec:29:41:6a brd ff:ff:ff:ff:ff:ff
5: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 1c:98:ec:29:41:6b brd ff:ff:ff:ff:ff:ff
6: eno3.3@eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 1c:98:ec:29:41:6a brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.11/24 brd 192.168.3.255 scope global noprefixroute eno3.3
       valid_lft forever preferred_lft forever
    inet6 fe80::ec:d7be:760e:8eda/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
7: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 1c:98:ec:29:41:68 brd ff:ff:ff:ff:ff:ff
8: bond0.4@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 1c:98:ec:29:41:68 brd ff:ff:ff:ff:ff:ff
    inet 192.168.4.11/24 brd 192.168.4.255 scope global noprefixroute bond0.4
       valid_lft forever preferred_lft forever
    inet6 fe80::f503:a54:8421:ea8b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
9: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:5b:fd:ac brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever

The public network is core.lennoxconsulting.com.au, which is 192.168.4.0/24, and the storage network is san.lennoxconsulting.com.au, which is 192.168.3.0/24.

Any help to move forward is appreciated.

- Dave.
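For reference, the failure is the plain LVM error from vgcreate, so the state left over from earlier attempts can be inspected directly on each node before re-running the wizard. A minimal diagnostic sketch, assuming the brick device is /dev/sdb on every node as in the log above:

    # run on each of sr-svr04, sr-svr05 and sr-svr06
    lsblk /dev/sdb          # does sdb already show LVM members?
    pvs | grep sdb          # physical volume left behind by a previous run
    vgs gluster_vg_sdb      # the volume group vgcreate refuses to recreate
    lvs gluster_vg_sdb      # any logical volumes / thin pools still inside it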

Can you cat this file: /etc/ansible/roles/gluster.infra/roles/backend_setup/tasks/vg_create.yml ?

It seems that the VG creation is not idempotent. As a workaround, delete the VG 'gluster_vg_sdb' on all Gluster nodes:

    vgremove gluster_vg_sdb

Best Regards,
Strahil Nikolov
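(Expanding on the workaround above with a minimal cleanup sketch; it assumes /dev/sdb is the only brick device and that the leftover volume group holds nothing worth keeping:

    # run on each Gluster node: sr-svr04, sr-svr05, sr-svr06
    vgs                          # confirm gluster_vg_sdb is the stale VG
    vgremove -y gluster_vg_sdb   # drop the stale volume group
    pvremove /dev/sdb            # optionally drop the PV label as well
    wipefs -a /dev/sdb           # optionally clear remaining LVM/filesystem signatures

After that the deployment wizard should be able to recreate the VG from scratch.)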
Strahil,

Thank you for your response. I have added the file for you below, but with my continued efforts over the past day I have finally managed to get oVirt to install, though not without issues.

Over a period of 2 weeks I have kicked off the install process over 20 times. Occasionally it would return the error posted here, but most of the time the install process would hang on this step. I posted the error because that was the only log I could get; when it hung, no log files or any other footprint were created, it just sat there until I either rebooted the host or cancelled the install.

The problem I have found is that nodectl doesn't return while the installer is in progress, so I assume that when the installer tries to ssh to the localhost it never gets to a shell because nodectl is waiting indefinitely for something. So I removed nodectl-motd.sh and nodectl-run-banner.sh from /etc/profile.d, and now the Gluster install wizard works perfectly.

Next, the Cockpit wizard refused to identify any of my network devices, but the command-line installer was fine, so I now have a self-hosted engine running on one node via:

    hosted-engine --deploy

However, my next issue is that when I try to log in to the Administration Portal with the admin user, it gives me "Invalid username or password". I can log into the Monitoring Portal just fine, but the Administration and VM Portals don't like the admin credentials. So now to work out why authentication isn't working.

- Dave.

----- /etc/ansible/roles/gluster.infra/roles/backend_setup/tasks/vg_create.yml -----
---
# We have to set the dataalignment for physical volumes, and physicalextentsize
# for volume groups. For JBODs we use a constant alignment value of 256K
# however, for RAID we calculate it by multiplying the RAID stripe unit size
# with the number of data disks. Hence in case of RAID stripe_unit_size and data
# disks are mandatory parameters.

- name: Check if valid disktype is provided
  fail:
    msg: "Unknown disktype. Allowed disktypes: JBOD, RAID6, RAID10, RAID5."
  when: gluster_infra_disktype not in [ 'JBOD', 'RAID6', 'RAID10', 'RAID5' ]

# Set data alignment for JBODs, by default it is 256K. This set_fact is not
# needed if we can always assume 256K for JBOD, however we provide this extra
# variable to override it.
- name: Set PV data alignment for JBOD
  set_fact:
    pv_dataalign: "{{ gluster_infra_dalign | default('256K') }}"
  when: gluster_infra_disktype == 'JBOD'

# Set data alignment for RAID
# We need KiB: ensure to keep the trailing `K' in the pv_dataalign calculation.
- name: Set PV data alignment for RAID
  set_fact:
    pv_dataalign: >
      {{ gluster_infra_diskcount|int * gluster_infra_stripe_unit_size|int }}K
  when: >
    gluster_infra_disktype == 'RAID6' or
    gluster_infra_disktype == 'RAID10' or
    gluster_infra_disktype == 'RAID5'

- name: Set VG physical extent size for RAID
  set_fact:
    vg_pesize: >
      {{ gluster_infra_diskcount|int * gluster_infra_stripe_unit_size|int }}K
  when: >
    gluster_infra_disktype == 'RAID6' or
    gluster_infra_disktype == 'RAID10' or
    gluster_infra_disktype == 'RAID5'

- include_tasks: get_vg_groupings.yml
  vars:
    volume_groups: "{{ gluster_infra_volume_groups }}"
  when: gluster_infra_volume_groups is defined and gluster_infra_volume_groups is not none and gluster_infra_volume_groups|length >0

- name: Record for missing devices for phase 2
  set_fact:
    gluster_phase2_has_missing_devices: true
  loop: "{{ vg_device_exists.results }}"
  when: item.stdout_lines is defined and "0" in item.stdout_lines

- name: Print the gateway for each host when defined
  ansible.builtin.debug:
    msg: vg names {{ gluster_volumes_by_groupname }}

# Tasks to create a volume group
# The devices in `pvs' can be a regular device or a VDO device
# Please take note; only the first item per volume group will define the actual configuraton!
#TODO: fix pesize // {{ ((item.value | first).vg_pesize || vg_pesize) | default(4) }}
- name: Create volume groups
  register: gluster_changed_vgs
  command: vgcreate --dataalignment {{ item.value.pv_dataalign | default(pv_dataalign) }} -s {{ vg_pesize | default(4) }} {{ (item.value | first).vgname }} {{ item.value | ovirt.ovirt.json_query('[].pvname') | unique | join(',') }}
  # lvg:
  #   state: present
  #   vg: "{{ (item.value | first).vgname }}"
  #   pvs: "{{ item.value | json_query('[].pvname') | unique | join(',') }}"
  #   pv_options: "--dataalignment {{ item.value.pv_dataalign | default(pv_dataalign) }}"
  #   # pesize is 4m by default for JBODs
  #   pesize: "{{ vg_pesize | default(4) }}"
  loop: "{{gluster_volumes_by_groupname | default({}) | dict2items}}"
  when: gluster_volumes_by_groupname is defined and item.value|length>0

- name: update LVM fact's
  setup:
    filter: 'ansible_lvm'
  when: gluster_changed_vgs.changed
----- /etc/ansible/roles/gluster.infra/roles/backend_setup/tasks/vg_create.yml -----
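A note on the non-idempotency Strahil pointed out: the active task above runs a raw `command: vgcreate ...`, while the commented-out lvg block underneath it is the module-based variant that would tolerate an existing VG. Roughly, the guard the raw command is missing looks like the following shell sketch (purely illustrative, reusing the gluster_vg_sdb / /dev/sdb names from this deployment):

    # create the VG only if it does not already exist, so a re-run is a no-op
    if ! vgs gluster_vg_sdb >/dev/null 2>&1; then
        vgcreate --dataalignment 1536K -s 1536K gluster_vg_sdb /dev/sdb
    fi

Without such a check, any retry after a partially completed run trips over the VG created the first time, which matches the error in the log.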
participants (2)
- david.lennox@frontlinedigital.com.au
- Strahil Nikolov