Hi Ritesh,
Yes, I have tried the Gluster deployment several times. I was able to resolve
the "kvdo not installed" issue, but no matter what I have tried recently, I
cannot get Gluster to deploy. I previously had a hyperconverged oVirt/Gluster
cluster with VDO running successfully on this same hardware and switches.
What has changed since then: the storage network is now direct-connect, and I
am installing with oVirt v4.4. I was last successful with oVirt v4.2.
I have retried the Gluster deployment from the Cockpit web console, both
after cleaning up with the suggested ansible-playbook and after re-imaging
with the oVirt Node v4.4 ISO. Ping from each host to the other two works on
both the management and storage networks. I am using DHCP for the management
network and /etc/hosts entries for the direct-connect storage network (a
sketch of that setup is below).
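For reference, a minimal sketch of the kind of /etc/hosts entries and
reachability check used for the direct-connect storage network (the host
names and 10.10.10.x addresses below are illustrative placeholders, not my
actual values):

# /etc/hosts on each node -- storage hostnames resolve over the direct links
10.10.10.1   host1-storage
10.10.10.2   host2-storage
10.10.10.3   host3-storage

# from each host, confirm the other two answer on the storage network
for h in host2-storage host3-storage; do ping -c 3 "$h"; done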
Thanks again for your help,
Charles
On Mon, Jan 11, 2021 at 10:03 PM Ritesh Chikatwar <rchikatw(a)redhat.com>
wrote:
On Tue, Jan 12, 2021, 2:04 AM Charles Lam <clam2718(a)gmail.com> wrote:
> Dear Strahil and Ritesh,
>
> Thank you both. I am back where I started with:
>
> "One or more bricks could be down. Please execute the command again after
> bringing all bricks online and finishing any pending heals\nVolume heal
> failed.", "stdout_lines": ["One or more bricks could be down.
Please
> execute the command again after bringing all bricks online and finishing
> any pending heals", "Volume heal failed."]
>
> Regarding my most recent issue:
>
> "vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe:
> FATAL: Module
> kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n"
>
> Per Strahil's note, I checked for kvdo:
>
> [root@host1.tld.com conf.d]# rpm -qa | grep vdo
> libblockdev-vdo-2.24-1.el8.x86_64
> vdo-6.2.3.114-14.el8.x86_64
> kmod-kvdo-6.2.2.117-65.el8.x86_64
> [root@host1.tld.com conf.d]#
>
> [root@host2.tld.com conf.d]# rpm -qa | grep vdo
> libblockdev-vdo-2.24-1.el8.x86_64
> vdo-6.2.3.114-14.el8.x86_64
> kmod-kvdo-6.2.2.117-65.el8.x86_64
> [root@host2.tld.com conf.d]#
>
> [root@host3.tld.com ~]# rpm -qa | grep vdo
> libblockdev-vdo-2.24-1.el8.x86_64
> vdo-6.2.3.114-14.el8.x86_64
> kmod-kvdo-6.2.2.117-65.el8.x86_64
> [root@host3.tld.com ~]#
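> As a quick sanity check that the kvdo module actually matches the running
> kernel, something like the following can be run on each host (illustrative
> commands only, nothing oVirt-specific):
>
> uname -r                              # running kernel
> rpm -q kernel-core kmod-kvdo          # installed kernel(s) vs. kvdo kmod
> modprobe kvdo && lsmod | grep kvdo    # does the module actually load?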
>
> I found
> https://unix.stackexchange.com/questions/624011/problem-on-centos-8-with-...
> which pointed to https://bugs.centos.org/view.php?id=17928. As suggested
> on the CentOS bug tracker, I attempted to manually install
>
> vdo-support-6.2.4.14-14.el8.x86_64
> vdo-6.2.4.14-14.el8.x86_64
> kmod-kvdo-6.2.3.91-73.el8.x86_64
>
> but kmod-kvdo required a kernel-core newer than the one I had installed,
> so I manually upgraded kernel-core to kernel-core-4.18.0-259.el8.x86_64.rpm
> and then upgraded vdo and kmod-kvdo to
>
> vdo-6.2.4.14-14.el8.x86_64.rpm
> kmod-kvdo-6.2.4.26-76.el8.x86_64.rpm
>
> and installed vdo-support-6.2.4.14-14.el8.x86_64.rpm.
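> Roughly, the manual upgrade looked like this (a sketch only; it assumes the
> RPMs listed above had already been downloaded into the current directory):
>
> dnf install ./kernel-core-4.18.0-259.el8.x86_64.rpm
> dnf upgrade ./vdo-6.2.4.14-14.el8.x86_64.rpm ./kmod-kvdo-6.2.4.26-76.el8.x86_64.rpm
> dnf install ./vdo-support-6.2.4.14-14.el8.x86_64.rpm
> reboot    # boot the new kernel so the matching kvdo module can load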
> Upon clean-up and redeploy, I am now back at the Gluster deployment failing at:
>
> TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] **********
> task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick':
> '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var":
> "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine",
> "granular-entry-heal", "enable"], "delta": "0:00:10.098573", "end":
> "2021-01-11 19:27:05.333720", "item": {"arbiter": 0, "brick":
> "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero
> return code", "rc": 107, "start": "2021-01-11 19:26:55.235147",
> "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be
> down. Please execute the command again after bringing all bricks online
> and finishing any pending heals\nVolume heal failed.", "stdout_lines":
> ["One or more bricks could be down. Please execute the command again
> after bringing all bricks online and finishing any pending heals",
> "Volume heal failed."]}
> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick':
> '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var":
> "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data",
> "granular-entry-heal", "enable"], "delta": "0:00:10.099670", "end":
> "2021-01-11 19:27:20.564554", "item": {"arbiter": 0, "brick":
> "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero
> return code", "rc": 107, "start": "2021-01-11 19:27:10.464884",
> "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be
> down. Please execute the command again after bringing all bricks online
> and finishing any pending heals\nVolume heal failed.", "stdout_lines":
> ["One or more bricks could be down. Please execute the command again
> after bringing all bricks online and finishing any pending heals",
> "Volume heal failed."]}
> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick':
> '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var":
> "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore",
> "granular-entry-heal", "enable"], "delta": "0:00:10.104624", "end":
> "2021-01-11 19:27:35.774230", "item": {"arbiter": 0, "brick":
> "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero
> return code", "rc": 107, "start": "2021-01-11 19:27:25.669606",
> "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be
> down. Please execute the command again after bringing all bricks online
> and finishing any pending heals\nVolume heal failed.", "stdout_lines":
> ["One or more bricks could be down. Please execute the command again
> after bringing all bricks online and finishing any pending heals",
> "Volume heal failed."]}
>
> NO MORE HOSTS LEFT
> *************************************************************
>
> PLAY RECAP *********************************************************************
> fmov1n1.sn.dtcorp.com : ok=70 changed=29 unreachable=0 failed=1 skipped=188 rescued=0 ignored=1
> fmov1n2.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163 rescued=0 ignored=1
> fmov1n3.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163 rescued=0 ignored=1
>
> Please check /var/log/cockpit/ovirt-dashboard/gluster-deployment.log for
> more information.
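> (The failing task can be pulled out of that log with something like the
> following; illustrative only:)
>
> grep -A 5 "granular-entry-heal" /var/log/cockpit/ovirt-dashboard/gluster-deployment.log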
>
> I doubled back to Strahil's recommendation to restart Gluster and enable
> granular-entry-heal. This still fails, for example:
>
> [root@host1 ~]# gluster volume heal data granular-entry-heal enable
> One or more bricks could be down. Please execute the command again after
> bringing all bricks online and finishing any pending heals
> Volume heal failed.
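> Before retrying, the checks to confirm the bricks and self-heal daemons are
> actually up look roughly like this (using the data volume as an example;
> illustrative commands only):
>
> systemctl status glusterd          # glusterd running on each node?
> gluster volume status data         # all bricks and self-heal daemons online?
> gluster volume heal data info      # any pending heal entries?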
>
> I have followed Ritesh's suggestion:
>
> [root@host1 ~]# ansible-playbook \
>     /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_cleanup.yml \
>     -i /etc/ansible/hc_wizard_inventory.yml
>
> which appeared to execute successfully:
>
> PLAY RECAP **********************************************************************************************************
> fmov1n1.sn.dtcorp.com : ok=11 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
> fmov1n2.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
> fmov1n3.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
>
So after this, have you tried the Gluster deployment?
>
> Here is the info Strahil requested when I first reported this issue on
> December 18th, re-run today, January 11:
>
> [root@host1 ~]# gluster pool list
> UUID                                  Hostname          State
> 4964020a-9632-43eb-9468-798920e98559  host2.domain.com  Connected
> f0718e4f-1ac6-4b82-a8d7-a4d31cd0f38b  host3.domain.com  Connected
> 6ba94e82-579c-4ae2-b3c5-bef339c6f795  localhost         Connected
> [root@host1 ~]# gluster volume list
> data
> engine
> vmstore
> [root@host1 ~]# for i in $(gluster volume list); do gluster volume status $i; \
>   gluster volume info $i; \
>   echo "###########################################################################################################"; \
> done
> Status of volume: data
> Gluster process                                   TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick host1.domain.com:/gluster_bricks/data/data  49153     0          Y       406272
> Brick host2.domain.com:/gluster_bricks/data/data  49153     0          Y       360300
> Brick host3.domain.com:/gluster_bricks/data/data  49153     0          Y       360082
> Self-heal Daemon on localhost                     N/A       N/A        Y       413227
> Self-heal Daemon on host2.domain.com              N/A       N/A        Y       360223
> Self-heal Daemon on host3.domain.com              N/A       N/A        Y       360003
>
> Task Status of Volume data
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> Volume Name: data
> Type: Replicate
> Volume ID: ed65a922-bd85-4574-ba21-25b3755acbce
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: host1.domain.com:/gluster_bricks/data/data
> Brick2: host2.domain.com:/gluster_bricks/data/data
> Brick3: host3.domain.com:/gluster_bricks/data/data
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> cluster.choose-local: off
> client.event-threads: 4
> server.event-threads: 4
> storage.owner-uid: 36
> storage.owner-gid: 36
> network.ping-timeout: 30
> performance.strict-o-direct: on
>
>
> ###########################################################################################################
> Status of volume: engine
> Gluster process                                       TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick host1.domain.com:/gluster_bricks/engine/engine  49152     0          Y       404563
> Brick host2.domain.com:/gluster_bricks/engine/engine  49152     0          Y       360202
> Brick host3.domain.com:/gluster_bricks/engine/engine  49152     0          Y       359982
> Self-heal Daemon on localhost                         N/A       N/A        Y       413227
> Self-heal Daemon on host3.domain.com                  N/A       N/A        Y       360003
> Self-heal Daemon on host2.domain.com                  N/A       N/A        Y       360223
>
> Task Status of Volume engine
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> Volume Name: engine
> Type: Replicate
> Volume ID: 45d4ec84-38a1-41ff-b8ec-8b00eb658908
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: host1.domain.com:/gluster_bricks/engine/engine
> Brick2: host2.domain.com:/gluster_bricks/engine/engine
> Brick3: host3.domain.com:/gluster_bricks/engine/engine
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> cluster.choose-local: off
> client.event-threads: 4
> server.event-threads: 4
> storage.owner-uid: 36
> storage.owner-gid: 36
> network.ping-timeout: 30
> performance.strict-o-direct: on
>
>
> ###########################################################################################################
> Status of volume: vmstore
> Gluster process                                          TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick host1.domain.com:/gluster_bricks/vmstore/vmstore   49154     0          Y       407952
> Brick host2.domain.com:/gluster_bricks/vmstore/vmstore   49154     0          Y       360389
> Brick host3.domain.com:/gluster_bricks/vmstore/vmstore   49154     0          Y       360176
> Self-heal Daemon on localhost                            N/A       N/A        Y       413227
> Self-heal Daemon on host2.domain.com                     N/A       N/A        Y       360223
> Self-heal Daemon on host3.domain.com                     N/A       N/A        Y       360003
>
> Task Status of Volume vmstore
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> Volume Name: vmstore
> Type: Replicate
> Volume ID: 27c8346c-0374-4108-a33a-0024007a9527
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: host1.domain.com:/gluster_bricks/vmstore/vmstore
> Brick2: host2.domain.com:/gluster_bricks/vmstore/vmstore
> Brick3: host3.domain.com:/gluster_bricks/vmstore/vmstore
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> cluster.choose-local: off
> client.event-threads: 4
> server.event-threads: 4
> storage.owner-uid: 36
> storage.owner-gid: 36
> network.ping-timeout: 30
> performance.strict-o-direct: on
>
>
> ###########################################################################################################
> [root@host1 ~]#
>
> Again, further suggestions for troubleshooting are VERY much appreciated!
>
> Respectfully,
> Charles
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/A2NR63KWDQS...
>