Dear Friends:
I am still stuck at
task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
"One or more bricks could be down. Please execute the command again after bringing
all bricks online and finishing any pending heals", "Volume heal failed."
I refined /etc/lvm/lvm.conf to:
filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-F1kxJk-F1wV-QqOR-Tbb1-Pefh-4vod-IVYaz6$|", "a|^/dev/nvme.n1|", "a|^/dev/dm-1.|", "r|.*|"]
I have also rebuilt the servers again. The output of gluster volume status shows
the bricks up, but no ports listed for the self-heal daemons:
[root@fmov1n2 ~]# gluster volume status data
Status of volume: data
Gluster process                                     TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick host1.company.com:/gluster_bricks/data/data   49153     0          Y       244103
Brick host2.company.com:/gluster_bricks/data/data   49155     0          Y       226082
Brick host3.company.com:/gluster_bricks/data/data   49155     0          Y       225948
Self-heal Daemon on localhost                       N/A       N/A        Y       224255
Self-heal Daemon on host2.company.com               N/A       N/A        Y       233992
Self-heal Daemon on host3.company.com               N/A       N/A        Y       224245
Task Status of Volume data
------------------------------------------------------------------------------
There are no active volume tasks
The output of gluster volume heal <volname> info shows a connection to the local
self-heal daemon, but reports "Transport endpoint is not connected" for the two
remote daemons. This is the same on all three hosts.
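If it is useful, I can gather more detail with something like the following on each host (hostnames are the same placeholder names as in the output above):

# Per-brick connection state as reported by heal info
gluster volume heal data info

# Confirm the peers are still connected
gluster peer status

# Look for connection errors from the self-heal daemon
tail -n 50 /var/log/glusterfs/glustershd.log

# Check which addresses the gluster processes are listening on
ss -tlnp | grep gluster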
I have followed the solutions here:
https://access.redhat.com/solutions/5089741
and also here:
https://access.redhat.com/solutions/3237651
with no success.
I have changed to a different DNS/DHCP server and still have the same issues.
Could this somehow be related to the direct cabling for my storage/Gluster
network (no switch)? /etc/nsswitch.conf is set to "files dns", and pings all
work, but dig does not resolve the storage hostnames (I understand this is to
be expected).
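To be clear about what I mean by that: the storage hostnames resolve from /etc/hosts rather than DNS, roughly like this (the addresses and -storage names below are illustrative placeholders, not my real values):

# /etc/hosts entries for the back-to-back storage network (illustrative)
10.10.10.1   host1-storage.company.com   host1-storage
10.10.10.2   host2-storage.company.com   host2-storage
10.10.10.3   host3-storage.company.com   host3-storage

# getent follows nsswitch.conf (files first), so this resolves
# even though dig (DNS only) does not
getent ahosts host2-storage.company.com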
Again, as always, any pointers or wisdom are greatly appreciated. I am out of ideas.
Thank you!
Charles