Dear Strahil and Ritesh,
Thank you both. I am back where I started with:
"One or more bricks could be down. Please execute the command again after bringing
all bricks online and finishing any pending heals\nVolume heal failed.",
"stdout_lines": ["One or more bricks could be down. Please execute the
command again after bringing all bricks online and finishing any pending heals",
"Volume heal failed."]
Regarding my most recent issue:
"vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: Module
kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n"
Per Strahil's note, I checked for kvdo:
[root@host1.tld.com conf.d]# rpm -qa | grep vdo
libblockdev-vdo-2.24-1.el8.x86_64
vdo-6.2.3.114-14.el8.x86_64
kmod-kvdo-6.2.2.117-65.el8.x86_64
[root@host1.tld.com conf.d]#
[root@host2.tld.com conf.d]# rpm -qa | grep vdo
libblockdev-vdo-2.24-1.el8.x86_64
vdo-6.2.3.114-14.el8.x86_64
kmod-kvdo-6.2.2.117-65.el8.x86_64
[root@host2.tld.com conf.d]#
[root@host3.tld.com ~]# rpm -qa | grep vdo
libblockdev-vdo-2.24-1.el8.x86_64
vdo-6.2.3.114-14.el8.x86_64
kmod-kvdo-6.2.2.117-65.el8.x86_64
[root@host3.tld.com ~]#
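Since kmod-kvdo is built per-kernel, an installed rpm is not by itself proof that a module exists for the kernel that is actually booted. A minimal check along these lines (a sketch, not authoritative) makes the mismatch visible:

```shell
# Hedged check: an installed kmod-kvdo rpm only helps if it shipped (or
# weak-linked) a module for the currently booted kernel.
running=$(uname -r)
echo "running kernel: $running"
# Look for a kvdo module under the running kernel's module tree; empty
# output means "modprobe kvdo" will fail exactly as in the error above,
# and the fix is to boot the matching kernel or install the matching build.
find "/lib/modules/$running" -name 'kvdo*.ko*' 2>/dev/null || true
```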
I found
https://unix.stackexchange.com/questions/624011/problem-on-centos-8-with-...
which pointed to
https://bugs.centos.org/view.php?id=17928. As suggested on the CentOS
bug tracker I attempted to manually install
vdo-support-6.2.4.14-14.el8.x86_64
vdo-6.2.4.14-14.el8.x86_64
kmod-kvdo-6.2.3.91-73.el8.x86_64
but there was a dependency requiring a newer kernel-core than the one I had installed, so I
manually upgraded kernel-core to kernel-core-4.18.0-259.el8.x86_64.rpm, then upgraded vdo
and kmod-kvdo to
vdo-6.2.4.14-14.el8.x86_64.rpm
kmod-kvdo-6.2.4.26-76.el8.x86_64.rpm
and installed vdo-support-6.2.4.14-14.el8.x86_64.rpm. After clean-up and redeploy, I am now
back where I started, with the Gluster deploy failing at
TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] **********
task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"engine", "granular-entry-heal", "enable"],
"delta": "0:00:10.098573", "end": "2021-01-11
19:27:05.333720", "item": {"arbiter": 0, "brick":
"/gluster_bricks/engine/engine", "volname": "engine"},
"msg": "non-zero return code", "rc": 107, "start":
"2021-01-11 19:26:55.235147", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"data", "granular-entry-heal", "enable"], "delta":
"0:00:10.099670", "end": "2021-01-11 19:27:20.564554",
"item": {"arbiter": 0, "brick":
"/gluster_bricks/data/data", "volname": "data"},
"msg": "non-zero return code", "rc": 107, "start":
"2021-01-11 19:27:10.464884", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"vmstore", "granular-entry-heal", "enable"],
"delta": "0:00:10.104624", "end": "2021-01-11
19:27:35.774230", "item": {"arbiter": 0, "brick":
"/gluster_bricks/vmstore/vmstore", "volname": "vmstore"},
"msg": "non-zero return code", "rc": 107, "start":
"2021-01-11 19:27:25.669606", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
NO MORE HOSTS LEFT *************************************************************
PLAY RECAP *********************************************************************
fmov1n1.sn.dtcorp.com : ok=70 changed=29 unreachable=0 failed=1 skipped=188
rescued=0 ignored=1
fmov1n2.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163
rescued=0 ignored=1
fmov1n3.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163
rescued=0 ignored=1
Please check /var/log/cockpit/ovirt-dashboard/gluster-deployment.log for more
informations.
I circled back to Strahil's recommendation to restart Gluster and enable
granular-entry-heal. This also fails; for example:
[root@host1 ~]# gluster volume heal data granular-entry-heal enable
One or more bricks could be down. Please execute the command again after bringing all
bricks online and finishing any pending heals
Volume heal failed.
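For completeness, these are the pre-checks I run (a sketch, using the "data" volume as an example and assuming a live cluster) before retrying the command, since rc=107 claims a brick is down or heals are pending:

```shell
# Hedged sketch of pre-checks before retrying granular-entry-heal.
# VOL is an example; substitute engine/data/vmstore as appropriate.
VOL=data
gluster volume status "$VOL"              # every brick should show Online = Y
gluster volume heal "$VOL" info summary   # pending/split-brain counts should be 0
gluster volume heal "$VOL" granular-entry-heal enable
```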
I have followed Ritesh's suggestion:
[root@host1 ~]# ansible-playbook
/etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_cleanup.yml
-i /etc/ansible/hc_wizard_inventory.yml
which appeared to execute successfully:
PLAY RECAP
**********************************************************************************************************
fmov1n1.sn.dtcorp.com : ok=11 changed=2 unreachable=0 failed=0 skipped=2
rescued=0 ignored=0
fmov1n2.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1
rescued=0 ignored=0
fmov1n3.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1
rescued=0 ignored=0
Here is the info Strahil requested when I first reported this issue on December 18th,
re-run today, January 11:
[root@host1 ~]# gluster pool list
UUID Hostname State
4964020a-9632-43eb-9468-798920e98559    host2.domain.com    Connected
f0718e4f-1ac6-4b82-a8d7-a4d31cd0f38b    host3.domain.com    Connected
6ba94e82-579c-4ae2-b3c5-bef339c6f795    localhost           Connected
[root@host1 ~]# gluster volume list
data
engine
vmstore
[root@host1 ~]# for i in $(gluster volume list); do gluster volume status $i; gluster
volume info $i; echo
"###########################################################################################################";done
Status of volume: data
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick host1.domain.com:/gluster_bricks/data/data      49153  0    Y  406272
Brick host2.domain.com:/gluster_bricks/data/data      49153  0    Y  360300
Brick host3.domain.com:/gluster_bricks/data/data      49153  0    Y  360082
Self-heal Daemon on localhost                         N/A    N/A  Y  413227
Self-heal Daemon on host2.domain.com                  N/A    N/A  Y  360223
Self-heal Daemon on host3.domain.com                  N/A    N/A  Y  360003
Task Status of Volume data
------------------------------------------------------------------------------
There are no active volume tasks
Volume Name: data
Type: Replicate
Volume ID: ed65a922-bd85-4574-ba21-25b3755acbce
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1.domain.com:/gluster_bricks/data/data
Brick2: host2.domain.com:/gluster_bricks/data/data
Brick3: host3.domain.com:/gluster_bricks/data/data
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
###########################################################################################################
Status of volume: engine
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick host1.domain.com:/gluster_bricks/engine/engine  49152  0    Y  404563
Brick host2.domain.com:/gluster_bricks/engine/engine  49152  0    Y  360202
Brick host3.domain.com:/gluster_bricks/engine/engine  49152  0    Y  359982
Self-heal Daemon on localhost                         N/A    N/A  Y  413227
Self-heal Daemon on host3.domain.com                  N/A    N/A  Y  360003
Self-heal Daemon on host2.domain.com                  N/A    N/A  Y  360223
Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks
Volume Name: engine
Type: Replicate
Volume ID: 45d4ec84-38a1-41ff-b8ec-8b00eb658908
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1.domain.com:/gluster_bricks/engine/engine
Brick2: host2.domain.com:/gluster_bricks/engine/engine
Brick3: host3.domain.com:/gluster_bricks/engine/engine
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
###########################################################################################################
Status of volume: vmstore
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick host1.domain.com:/gluster_bricks/vmstore/vmstore  49154  0    Y  407952
Brick host2.domain.com:/gluster_bricks/vmstore/vmstore  49154  0    Y  360389
Brick host3.domain.com:/gluster_bricks/vmstore/vmstore  49154  0    Y  360176
Self-heal Daemon on localhost                           N/A    N/A  Y  413227
Self-heal Daemon on host2.domain.com                    N/A    N/A  Y  360223
Self-heal Daemon on host3.domain.com                    N/A    N/A  Y  360003
Task Status of Volume vmstore
------------------------------------------------------------------------------
There are no active volume tasks
Volume Name: vmstore
Type: Replicate
Volume ID: 27c8346c-0374-4108-a33a-0024007a9527
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1.domain.com:/gluster_bricks/vmstore/vmstore
Brick2: host2.domain.com:/gluster_bricks/vmstore/vmstore
Brick3: host3.domain.com:/gluster_bricks/vmstore/vmstore
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
###########################################################################################################
[root@host1 ~]#
Again, further suggestions for troubleshooting are VERY much appreciated!
Respectfully,
Charles