Dear Friends:
I am still stuck at
task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
"One or more bricks could be down. Please execute the command again after bringing
all bricks online and finishing any pending heals", "Volume heal failed."
I refined /etc/lvm/lvm.conf to:
filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-F1kxJk-F1wV-QqOR-Tbb1-Pefh-4vod-IVYaz6$|", "a|^/dev/nvme.n1|", "a|^/dev/dm-1.|", "r|.*|"]
I have also rebuilt the servers again. The output of gluster volume status shows
the bricks up, but no ports listed for the self-heal daemons:
[root@fmov1n2 ~]# gluster volume status data
Status of volume: data
Gluster process                                     TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick host1.company.com:/gluster_bricks/data/data   49153     0          Y       244103
Brick host2.company.com:/gluster_bricks/data/data   49155     0          Y       226082
Brick host3.company.com:/gluster_bricks/data/data   49155     0          Y       225948
Self-heal Daemon on localhost                       N/A       N/A        Y       224255
Self-heal Daemon on host2.company.com               N/A       N/A        Y       233992
Self-heal Daemon on host3.company.com               N/A       N/A        Y       224245
Task Status of Volume data
------------------------------------------------------------------------------
There are no active volume tasks
The output of gluster volume heal <volname> info shows a connection to the local
self-heal daemon, but reports "Transport endpoint is not connected" for the two
remote daemons. This is the same on all three hosts.
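If it is useful, I can gather more detail with something like the following on each host (hostnames are the same placeholder names as in the output above):

# Per-brick connection state as reported by heal info
gluster volume heal data info

# Confirm the peers are still connected
gluster peer status

# Look for connection errors from the self-heal daemon
tail -n 50 /var/log/glusterfs/glustershd.log

# Check which addresses the gluster processes are listening on
ss -tlnp | grep gluster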
I have followed the solutions here:
https://access.redhat.com/solutions/5089741
and also here:
https://access.redhat.com/solutions/3237651
with no success.
I have changed to a different DNS/DHCP server and still have the same issues.
Could this somehow be related to the direct cabling for my storage/Gluster
network (no switch)? /etc/nsswitch.conf is set to "files dns", and pings all
work, but dig does not resolve the storage hostnames (I understand this is to
be expected).
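To be clear about what I mean by that: the storage hostnames resolve from /etc/hosts rather than DNS, roughly like this (the addresses and -storage names below are illustrative placeholders, not my real values):

# /etc/hosts entries for the back-to-back storage network (illustrative)
10.10.10.1   host1-storage.company.com   host1-storage
10.10.10.2   host2-storage.company.com   host2-storage
10.10.10.3   host3-storage.company.com   host3-storage

# getent follows nsswitch.conf (files first), so this resolves
# even though dig (DNS only) does not
getent ahosts host2-storage.company.com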
Again, as always, any pointers or wisdom are greatly appreciated. I am out of ideas.
Thank you!
Charles