New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

newer
New setup - Failing to Activate...

Charles Lam

18 Dec 2020 18 Dec '20

3:43 p.m.

Dear friends, Thanks to Donald and Strahil, my earlier Gluster deploy issue was resolved by disabling multipath on the nvme drives. The Gluster deployment is now failing on the three node hyperconverged oVirt v4.3.3 deployment at: TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] ********** task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 with: "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed." Specifically: TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] ********** task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine", "granular-entry-heal", "enable"], "delta": "0:00:10.112451", "end": "2020-12-18 19:50:22.818741", "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero return code", "rc": 107, "start": "2020-12-18 19:50:12.706290", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data", "granular-entry-heal", "enable"], "delta": "0:00:10.110165", "end": "2020-12-18 19:50:38.260277", "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return code", "rc": 107, "start": "2020-12-18 19:50:28.150112", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore", "granular-entry-heal", "enable"], "delta": "0:00:10.113203", "end": "2020-12-18 19:50:53.767864", "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero return code", "rc": 107, "start": "2020-12-18 19:50:43.654661", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} Any suggestions regarding troubleshooting, insight or recommendations for reading are greatly appreciated. I apologize for all the email and am only creating this as a separate thread as it is a new, presumably unrelated issue. I welcome any recommendations if I can improve my forum etiquette. Respectfully, Charles

Show replies by date

Strahil Nikolov

19 Dec 19 Dec

12:56 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

You will need to check the gluster volumes status. Can you provide output of the following (from 1 node ): gluster pool list gluster volume list for i in $(gluster volume list); do gluster volume status $i ; gluster volume info $i; echo "#######################################################################################" ; done Best Regards, Strahil Nikolov В петък, 18 декември 2020 г., 22:44:48 Гринуич+2, Charles Lam <clam2718@gmail.com> написа: Dear friends, Thanks to Donald and Strahil, my earlier Gluster deploy issue was resolved by disabling multipath on the nvme drives. The Gluster deployment is now failing on the three node hyperconverged oVirt v4.3.3 deployment at: TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] ********** task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 with: "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed." Specifically: TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] ********** task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine", "granular-entry-heal", "enable"], "delta": "0:00:10.112451", "end": "2020-12-18 19:50:22.818741", "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero return code", "rc": 107, "start": "2020-12-18 19:50:12.706290", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data", "granular-entry-heal", "enable"], "delta": "0:00:10.110165", "end": "2020-12-18 19:50:38.260277", "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return code", "rc": 107, "start": "2020-12-18 19:50:28.150112", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore", "granular-entry-heal", "enable"], "delta": "0:00:10.113203", "end": "2020-12-18 19:50:53.767864", "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero return code", "rc": 107, "start": "2020-12-18 19:50:43.654661", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} Any suggestions regarding troubleshooting, insight or recommendations for reading are greatly appreciated. I apologize for all the email and am only creating this as a separate thread as it is a new, presumably unrelated issue. I welcome any recommendations if I can improve my forum etiquette. Respectfully, Charles _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZRER63XZ3XF6HG...

Charles Lam

21 Dec 21 Dec

10:53 a.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

Thanks so very much Strahil for your continued assistance! [root@fmov1n1 conf.d]# gluster pool list UUID Hostname State 16e921fb-99d3-4a2e-81e6-ba095dbc14ca host2.fqdn.tld Connected d4488961-c854-449a-a211-1593810df52f host3.fqdn.tld Connected f9f9282c-0c1d-405a-a3d3-815e5c6b2606 localhost Connected [root@fmov1n1 conf.d]# gluster volume list data engine vmstore [root@fmov1n1 conf.d]# for i in $(gluster volume list); do gluster volume status $i; gluster volume info $i; echo "#######################################################################################"; done Status of volume: data Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.fqdn.tld:/gluster_bricks /data/data 49153 0 Y 899467 Brick host2.fqdn.tld:/gluster_bricks /data/data 49153 0 Y 820456 Brick host3.fqdn.tld:/gluster_bricks /data/data 49153 0 Y 820482 Self-heal Daemon on localhost N/A N/A Y 897788 Self-heal Daemon on host3.fqdn.tld N/A N/A Y 820406 Self-heal Daemon on host2.fqdn.tld N/A N/A Y 820367 Task Status of Volume data ------------------------------------------------------------------------------ There are no active volume tasks Volume Name: data Type: Replicate Volume ID: b4e984c8-7c43-4faa-92e1-84351a645408 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.fqdn.tld:/gluster_bricks/data/data Brick2: host2.fqdn.tld:/gluster_bricks/data/data Brick3: host3.fqdn.tld:/gluster_bricks/data/data Options Reconfigured: performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 server.event-threads: 4 client.event-threads: 4 cluster.choose-local: off user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: on ####################################################################################### Status of volume: engine Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.fqdn.tld:/gluster_bricks /engine/engine 49152 0 Y 897767 Brick host2.fqdn.tld:/gluster_bricks /engine/engine 49152 0 Y 820346 Brick host3.fqdn.tld:/gluster_bricks /engine/engine 49152 0 Y 820385 Self-heal Daemon on localhost N/A N/A Y 897788 Self-heal Daemon on host3.fqdn.tld N/A N/A Y 820406 Self-heal Daemon on host2.fqdn.tld N/A N/A Y 820367 Task Status of Volume engine ------------------------------------------------------------------------------ There are no active volume tasks Volume Name: engine Type: Replicate Volume ID: 75cc04e6-d1cb-4069-aa25-81550b7878db Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.fqdn.tld:/gluster_bricks/engine/engine Brick2: host2.fqdn.tld:/gluster_bricks/engine/engine Brick3: host3.fqdn.tld:/gluster_bricks/engine/engine Options Reconfigured: performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 server.event-threads: 4 client.event-threads: 4 cluster.choose-local: off user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: on ####################################################################################### Status of volume: vmstore Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.fqdn.tld:/gluster_bricks /vmstore/vmstore 49154 0 Y 901139 Brick host2.fqdn.tld:/gluster_bricks /vmstore/vmstore 49154 0 Y 820544 Brick host3.fqdn.tld:/gluster_bricks /vmstore/vmstore 49154 0 Y 820587 Self-heal Daemon on localhost N/A N/A Y 897788 Self-heal Daemon on host2.fqdn.tld N/A N/A Y 820367 Self-heal Daemon on host3.fqdn.tld N/A N/A Y 820406 Task Status of Volume vmstore ------------------------------------------------------------------------------ There are no active volume tasks Volume Name: vmstore Type: Replicate Volume ID: 9810637b-2bae-48ae-8e4e-786bb92d18d7 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.fqdn.tld:/gluster_bricks/vmstore/vmstore Brick2: host2.fqdn.tld:/gluster_bricks/vmstore/vmstore Brick3: host3.fqdn.tld:/gluster_bricks/vmstore/vmstore Options Reconfigured: performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 server.event-threads: 4 client.event-threads: 4 cluster.choose-local: off user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: on ####################################################################################### [root@fmov1n1 conf.d]# Respectfully, Charles

Strahil Nikolov

1:15 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

I see that the selfheal daemon is not running. Just try the following from host1: systemctl stop glusterd; sleep 5; systemctl start glusterd for i in $(gluster volume list); do gluster volume set $i cluster.granular-entry-heal enable ; done And then rerun the Ansible flow. Best Regards, Strahil Nikolov В понеделник, 21 декември 2020 г., 17:54:42 Гринуич+2, Charles Lam <clam2718@gmail.com> написа: Thanks so very much Strahil for your continued assistance! [root@fmov1n1 conf.d]# gluster pool list UUID Hostname State 16e921fb-99d3-4a2e-81e6-ba095dbc14ca host2.fqdn.tld Connected d4488961-c854-449a-a211-1593810df52f host3.fqdn.tld Connected f9f9282c-0c1d-405a-a3d3-815e5c6b2606 localhost Connected [root@fmov1n1 conf.d]# gluster volume list data engine vmstore [root@fmov1n1 conf.d]# for i in $(gluster volume list); do gluster volume status $i; gluster volume info $i; echo "#######################################################################################"; done Status of volume: data Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.fqdn.tld:/gluster_bricks /data/data 49153 0 Y 899467 Brick host2.fqdn.tld:/gluster_bricks /data/data 49153 0 Y 820456 Brick host3.fqdn.tld:/gluster_bricks /data/data 49153 0 Y 820482 Self-heal Daemon on localhost N/A N/A Y 897788 Self-heal Daemon on host3.fqdn.tld N/A N/A Y 820406 Self-heal Daemon on host2.fqdn.tld N/A N/A Y 820367 Task Status of Volume data ------------------------------------------------------------------------------ There are no active volume tasks Volume Name: data Type: Replicate Volume ID: b4e984c8-7c43-4faa-92e1-84351a645408 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.fqdn.tld:/gluster_bricks/data/data Brick2: host2.fqdn.tld:/gluster_bricks/data/data Brick3: host3.fqdn.tld:/gluster_bricks/data/data Options Reconfigured: performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 server.event-threads: 4 client.event-threads: 4 cluster.choose-local: off user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: on ####################################################################################### Status of volume: engine Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.fqdn.tld:/gluster_bricks /engine/engine 49152 0 Y 897767 Brick host2.fqdn.tld:/gluster_bricks /engine/engine 49152 0 Y 820346 Brick host3.fqdn.tld:/gluster_bricks /engine/engine 49152 0 Y 820385 Self-heal Daemon on localhost N/A N/A Y 897788 Self-heal Daemon on host3.fqdn.tld N/A N/A Y 820406 Self-heal Daemon on host2.fqdn.tld N/A N/A Y 820367 Task Status of Volume engine ------------------------------------------------------------------------------ There are no active volume tasks Volume Name: engine Type: Replicate Volume ID: 75cc04e6-d1cb-4069-aa25-81550b7878db Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.fqdn.tld:/gluster_bricks/engine/engine Brick2: host2.fqdn.tld:/gluster_bricks/engine/engine Brick3: host3.fqdn.tld:/gluster_bricks/engine/engine Options Reconfigured: performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 server.event-threads: 4 client.event-threads: 4 cluster.choose-local: off user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: on ####################################################################################### Status of volume: vmstore Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.fqdn.tld:/gluster_bricks /vmstore/vmstore 49154 0 Y 901139 Brick host2.fqdn.tld:/gluster_bricks /vmstore/vmstore 49154 0 Y 820544 Brick host3.fqdn.tld:/gluster_bricks /vmstore/vmstore 49154 0 Y 820587 Self-heal Daemon on localhost N/A N/A Y 897788 Self-heal Daemon on host2.fqdn.tld N/A N/A Y 820367 Self-heal Daemon on host3.fqdn.tld N/A N/A Y 820406 Task Status of Volume vmstore ------------------------------------------------------------------------------ There are no active volume tasks Volume Name: vmstore Type: Replicate Volume ID: 9810637b-2bae-48ae-8e4e-786bb92d18d7 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.fqdn.tld:/gluster_bricks/vmstore/vmstore Brick2: host2.fqdn.tld:/gluster_bricks/vmstore/vmstore Brick3: host3.fqdn.tld:/gluster_bricks/vmstore/vmstore Options Reconfigured: performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 server.event-threads: 4 client.event-threads: 4 cluster.choose-local: off user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: on ####################################################################################### [root@fmov1n1 conf.d]# Respectfully, Charles _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/IOBHS6OZAGOI7A...

Charles Lam

5:38 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

Still not able to deploy Gluster on oVirt Node Hyperconverged - same error; upgraded to v4.4.4 and "kvdo not installed" Tried suggestion and per https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/ht... I also tried "gluster volume heal VOLNAME granular-entry-heal enable" then "gluster volume heal VOLNAME" received "transport endpoint is not connected" double checked networking, restarted the volume per https://access.redhat.com/solutions/5089741 using "gluster volume start VOLNAME force" also checked Gluster server and client versions per https://access.redhat.com/solutions/5380531 --> self-heal daemon shows process ID and local status but not peer status updated to oVirt v4.4.4 and now am receiving "vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: Module kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n" This appears to be a recent issue. I am going to re-image nodes with oVirt Node v4.4.4 and rebuild networking and see if that helps, if not I will revert to v4.4.2 (most recent successful on this cluster) and see if I can build it there. I am using local direct cable between three host nodes for storage network, with statically assigned IPs on local network adapters along with Hosts file and 3x /30 subnets for routing. Management network is DHCP and set up as before when successful. I have confirmed "files" is listed first in nsswitch.conf and have not had any issues with ssh to storage network or management network --> could anything related to direct cabling be reason for peer connection issue with self-heal daemon even though "gluster peer status" and "gluster peer probe" are successful? Thanks again, I will update after rebuilding with oVirt Node v4.4.4 Respectfully, Charles

Strahil Nikolov

5:57 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

I'm also using direct cabling ,so I doubt that is the problem. Starting fresh is wise, but keep in mind: - wipe your bricks before installing gluster - check with 'lsblk -t' the PHY-SEC. If it's not 512, use vdo with " --emulate512" flag - Ensure that name resolution is working and each node can reach the other nodes in the pool Best Regards, Strahil Nikolov В 22:38 +0000 на 21.12.2020 (пн), Charles Lam написа:

...

Still not able to deploy Gluster on oVirt Node Hyperconverged - same error; upgraded to v4.4.4 and "kvdo not installed"

Tried suggestion and per https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/ht... I also tried "gluster volume heal VOLNAME granular-entry-heal enable" then "gluster volume heal VOLNAME" received "transport endpoint is not connected" double checked networking, restarted the volume per https://access.redhat.com/solutions/5089741 using "gluster volume start VOLNAME force" also checked Gluster server and client versions per https://access.redhat.com/solutions/5380531 --> self-heal daemon shows process ID and local status but not peer status

updated to oVirt v4.4.4 and now am receiving

"vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: Module kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n"

This appears to be a recent issue. I am going to re-image nodes with oVirt Node v4.4.4 and rebuild networking and see if that helps, if not I will revert to v4.4.2 (most recent successful on this cluster) and see if I can build it there.

I am using local direct cable between three host nodes for storage network, with statically assigned IPs on local network adapters along with Hosts file and 3x /30 subnets for routing. Management network is DHCP and set up as before when successful. I have confirmed "files" is listed first in nsswitch.conf and have not had any issues with ssh to storage network or management network --> could anything related to direct cabling be reason for peer connection issue with self-heal daemon even though "gluster peer status" and "gluster peer probe" are successful?

Thanks again, I will update after rebuilding with oVirt Node v4.4.4

Respectfully, Charles _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ICVEYLCC677BYG...

Charles Lam

8 Jan 8 Jan

6:23 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

Dear Strahil, I have rebuilt everything fresh, switches, hosts, cabling - PHY-SEC shows 512 for all nvme drives being used as bricks. Name resolution via /etc/hosts for direct connect storage network works for all hosts to all hosts. I am still blocked by the same "vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: Module kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n" Any further suggestions are MOST appreciated. Thank you and respectfully, Charles

Strahil Nikolov

9 Jan 9 Jan

1:28 a.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

What is the output of 'rpm -qa | grep vdo' ? Most probably the ansible flow is not deploying kvdo , but it's necessary at a later stage.Try to overcome via "yum search kvdo" and then 'yum install kmod-kvdo" (replace kmod-kvdo with the package for EL8). Also, I think that you can open a github issue at https://github.com/oVirt/ovirt-ansible-engine-setup/issues Best Regards, Strahil Nikolov В събота, 9 януари 2021 г., 01:24:56 Гринуич+2, Charles Lam <clam2718@gmail.com> написа: Dear Strahil, I have rebuilt everything fresh, switches, hosts, cabling - PHY-SEC shows 512 for all nvme drives being used as bricks. Name resolution via /etc/hosts for direct connect storage network works for all hosts to all hosts. I am still blocked by the same "vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: Module kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n" Any further suggestions are MOST appreciated. Thank you and respectfully, Charles _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/C4ZPHST3IYCIPY...

Ritesh Chikatwar

1:54 a.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

Hello, Can you try once cleaning the gluster deployment you can do this by running this command on one of the hosts. ansible-playbook /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_cleanup.yml -i /etc/ansible/hc_wizard_inventory.yml And then rerun the Ansible flow. On Sat, Jan 9, 2021 at 12:00 PM Strahil Nikolov via Users <users@ovirt.org> wrote:

...

What is the output of 'rpm -qa | grep vdo' ? Most probably the ansible flow is not deploying kvdo , but it's necessary at a later stage.Try to overcome via "yum search kvdo" and then 'yum install kmod-kvdo" (replace kmod-kvdo with the package for EL8).

Also, I think that you can open a github issue at https://github.com/oVirt/ovirt-ansible-engine-setup/issues

Best Regards, Strahil Nikolov

В събота, 9 януари 2021 г., 01:24:56 Гринуич+2, Charles Lam < clam2718@gmail.com> написа:

Dear Strahil,

I have rebuilt everything fresh, switches, hosts, cabling - PHY-SEC shows 512 for all nvme drives being used as bricks. Name resolution via /etc/hosts for direct connect storage network works for all hosts to all hosts. I am still blocked by the same

"vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: Module kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n"

Any further suggestions are MOST appreciated.

Thank you and respectfully,

Charles _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives:

https://lists.ovirt.org/archives/list/users@ovirt.org/message/C4ZPHST3IYCIPY... _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZCBLZGNYBDOB5B...

Charles Lam

11 Jan 11 Jan

3:32 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

Dear Strahil and Ritesh, Thank you both. I am back where I started with: "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."] Regarding my most recent issue: "vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: Module kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n" Per Strahil's note, I checked for kvdo: [root@host1.tld.com conf.d]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host1.tld.com conf.d]# [root@host2.tld.com conf.d]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host2.tld.com conf.d]# [root@host3.tld.com ~]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host3.tld.com ~]# I found https://unix.stackexchange.com/questions/624011/problem-on-centos-8-with-cre... which pointed to https://bugs.centos.org/view.php?id=17928. As suggested on the CentOS bug tracker I attempted to manually install vdo-support-6.2.4.14-14.el8.x86_64 vdo-6.2.4.14-14.el8.x86_64 kmod-kvdo-6.2.3.91-73.el8.x86_64 but there was a dependency that kernel-core be greater than what I was installed, so I manually upgraded kernel-core to kernel-core-4.18.0-259.el8.x86_64.rpm then upgraded vdo and kmod-kvdo to vdo-6.2.4.14-14.el8.x86_64.rpm kmod-kvdo-6.2.4.26-76.el8.x86_64.rpm and installed vdo-support-6.2.4.14-14.el8.x86_64.rpm. Upon clean-up and redeploy I am now back at Gluster deploy failing at TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] ********** task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine", "granular-entry-heal", "enable"], "delta": "0:00:10.098573", "end": "2021-01-11 19:27:05.333720", "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:26:55.235147", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data", "granular-entry-heal", "enable"], "delta": "0:00:10.099670", "end": "2021-01-11 19:27:20.564554", "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:10.464884", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore", "granular-entry-heal", "enable"], "delta": "0:00:10.104624", "end": "2021-01-11 19:27:35.774230", "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:25.669606", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} NO MORE HOSTS LEFT ************************************************************* NO MORE HOSTS LEFT ************************************************************* PLAY RECAP ********************************************************************* fmov1n1.sn.dtcorp.com : ok=70 changed=29 unreachable=0 failed=1 skipped=188 rescued=0 ignored=1 fmov1n2.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163 rescued=0 ignored=1 fmov1n3.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163 rescued=0 ignored=1 Please check /var/log/cockpit/ovirt-dashboard/gluster-deployment.log for more informations. I doubled-back to Strahil's recommendation to restart Gluster and enable granular-entry-heal. This fails, example: [root@host1 ~]# gluster volume heal data granular-entry-heal enable One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals Volume heal failed. I have followed Ritesh's suggestion: [root@host1~]# ansible-playbook /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_cleanup.yml -i /etc/ansible/hc_wizard_inventory.yml which appeared to execute successfully: PLAY RECAP ********************************************************************************************************** fmov1n1.sn.dtcorp.com : ok=11 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0 fmov1n2.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0 fmov1n3.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0 Here is the info Strahil requested when I first reported this issue on December 18th, re-run today, January 11: [root@host1 ~]# gluster pool list UUID Hostname State 4964020a-9632-43eb-9468-798920e98559 host2.domain.com Connected f0718e4f-1ac6-4b82-a8d7-a4d31cd0f38b host3.domain.com Connected 6ba94e82-579c-4ae2-b3c5-bef339c6f795 localhost Connected [root@host1 ~]# gluster volume list data engine vmstore [root@host1 ~]# for i in $(gluster volume list); do gluster volume status $i; gluster volume info $i; echo "###########################################################################################################";done Status of volume: data Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /data/data 49153 0 Y 406272 Brick host2.domain.com:/gluster_bricks /data/data 49153 0 Y 360300 Brick host3.domain.com:/gluster_bricks /data/data 49153 0 Y 360082 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host2.domain.com N/A N/A Y 360223 Self-heal Daemon on host3.domain.com N/A N/A Y 360003 Task Status of Volume data ------------------------------------------------------------------------------ There are no active volume tasks Volume Name: data Type: Replicate Volume ID: ed65a922-bd85-4574-ba21-25b3755acbce Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/data/data Brick2: host2.domain.com:/gluster_bricks/data/data Brick3: host3.domain.com:/gluster_bricks/data/data Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on ########################################################################################################### Status of volume: engine Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /engine/engine 49152 0 Y 404563 Brick host2.domain.com:/gluster_bricks /engine/engine 49152 0 Y 360202 Brick host3.domain.com:/gluster_bricks /engine/engine 49152 0 Y 359982 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host3.domain.com N/A N/A Y 360003 Self-heal Daemon on host2.domain.com N/A N/A Y 360223 Task Status of Volume engine ------------------------------------------------------------------------------ There are no active volume tasks Volume Name: engine Type: Replicate Volume ID: 45d4ec84-38a1-41ff-b8ec-8b00eb658908 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/engine/engine Brick2: host2.domain.com:/gluster_bricks/engine/engine Brick3: host3.domain.com:/gluster_bricks/engine/engine Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on ########################################################################################################### Status of volume: vmstore Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 407952 Brick host2.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 360389 Brick host3.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 360176 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host2.domain.com N/A N/A Y 360223 Self-heal Daemon on host3.domain.com N/A N/A Y 360003 Task Status of Volume vmstore ------------------------------------------------------------------------------ There are no active volume tasks Volume Name: vmstore Type: Replicate Volume ID: 27c8346c-0374-4108-a33a-0024007a9527 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/vmstore/vmstore Brick2: host2.domain.com:/gluster_bricks/vmstore/vmstore Brick3: host3.domain.com:/gluster_bricks/vmstore/vmstore Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on ########################################################################################################### [root@host1 ~]# Again, further suggestions for troubleshooting are VERY much appreciated! Respectfully, Charles

Ritesh Chikatwar

10:02 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

On Tue, Jan 12, 2021, 2:04 AM Charles Lam <clam2718@gmail.com> wrote:

...

Dear Strahil and Ritesh,

Thank you both. I am back where I started with:

"One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]

Regarding my most recent issue:

"vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: Module kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n"

Per Strahil's note, I checked for kvdo:

[root@host1.tld.com conf.d]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host1.tld.com conf.d]#

[root@host2.tld.com conf.d]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host2.tld.com conf.d]#

[root@host3.tld.com ~]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host3.tld.com ~]#

I found https://unix.stackexchange.com/questions/624011/problem-on-centos-8-with-cre... which pointed to https://bugs.centos.org/view.php?id=17928. As suggested on the CentOS bug tracker I attempted to manually install

vdo-support-6.2.4.14-14.el8.x86_64 vdo-6.2.4.14-14.el8.x86_64 kmod-kvdo-6.2.3.91-73.el8.x86_64

but there was a dependency that kernel-core be greater than what I was installed, so I manually upgraded kernel-core to kernel-core-4.18.0-259.el8.x86_64.rpm then upgraded vdo and kmod-kvdo to

vdo-6.2.4.14-14.el8.x86_64.rpm kmod-kvdo-6.2.4.26-76.el8.x86_64.rpm

and installed vdo-support-6.2.4.14-14.el8.x86_64.rpm. Upon clean-up and redeploy I am now back at Gluster deploy failing at

TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] ********** task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine", "granular-entry-heal", "enable"], "delta": "0:00:10.098573", "end": "2021-01-11 19:27:05.333720", "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:26:55.235147", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data", "granular-entry-heal", "enable"], "delta": "0:00:10.099670", "end": "2021-01-11 19:27:20.564554", "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:10.464884", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore", "granular-entry-heal", "enable"], "delta": "0:00:10.104624", "end": "2021-01-11 19:27:35.774230", "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:25.669606", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]}

NO MORE HOSTS LEFT *************************************************************

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP ********************************************************************* fmov1n1.sn.dtcorp.com : ok=70 changed=29 unreachable=0 failed=1 skipped=188 rescued=0 ignored=1 fmov1n2.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163 rescued=0 ignored=1 fmov1n3.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163 rescued=0 ignored=1

Please check /var/log/cockpit/ovirt-dashboard/gluster-deployment.log for more informations.

I doubled-back to Strahil's recommendation to restart Gluster and enable granular-entry-heal. This fails, example:

[root@host1 ~]# gluster volume heal data granular-entry-heal enable One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals Volume heal failed.

I have followed Ritesh's suggestion:

[root@host1~]# ansible-playbook /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_cleanup.yml -i /etc/ansible/hc_wizard_inventory.yml

which appeared to execute successfully:

PLAY RECAP ********************************************************************************************************** fmov1n1.sn.dtcorp.com : ok=11 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0 fmov1n2.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0 fmov1n3.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

So after this have you tried gluster deployment..?

...

Here is the info Strahil requested when I first reported this issue on December 18th, re-run today, January 11:

[root@host1 ~]# gluster pool list UUID Hostname State 4964020a-9632-43eb-9468-798920e98559 host2.domain.com Connected f0718e4f-1ac6-4b82-a8d7-a4d31cd0f38b host3.domain.com Connected 6ba94e82-579c-4ae2-b3c5-bef339c6f795 localhost Connected [root@host1 ~]# gluster volume list data engine vmstore [root@host1 ~]# for i in $(gluster volume list); do gluster volume status $i; gluster volume info $i; echo "###########################################################################################################";done Status of volume: data Gluster process TCP Port RDMA Port Online Pid

------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /data/data 49153 0 Y 406272 Brick host2.domain.com:/gluster_bricks /data/data 49153 0 Y 360300 Brick host3.domain.com:/gluster_bricks /data/data 49153 0 Y 360082 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host2.domain.com N/A N/A Y 360223 Self-heal Daemon on host3.domain.com N/A N/A Y 360003

Task Status of Volume data

------------------------------------------------------------------------------ There are no active volume tasks

Volume Name: data Type: Replicate Volume ID: ed65a922-bd85-4574-ba21-25b3755acbce Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/data/data Brick2: host2.domain.com:/gluster_bricks/data/data Brick3: host3.domain.com:/gluster_bricks/data/data Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on

########################################################################################################### Status of volume: engine Gluster process TCP Port RDMA Port Online Pid

------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /engine/engine 49152 0 Y 404563 Brick host2.domain.com:/gluster_bricks /engine/engine 49152 0 Y 360202 Brick host3.domain.com:/gluster_bricks /engine/engine 49152 0 Y 359982 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host3.domain.com N/A N/A Y 360003 Self-heal Daemon on host2.domain.com N/A N/A Y 360223

Task Status of Volume engine

------------------------------------------------------------------------------ There are no active volume tasks

Volume Name: engine Type: Replicate Volume ID: 45d4ec84-38a1-41ff-b8ec-8b00eb658908 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/engine/engine Brick2: host2.domain.com:/gluster_bricks/engine/engine Brick3: host3.domain.com:/gluster_bricks/engine/engine Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on

########################################################################################################### Status of volume: vmstore Gluster process TCP Port RDMA Port Online Pid

------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 407952 Brick host2.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 360389 Brick host3.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 360176 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host2.domain.com N/A N/A Y 360223 Self-heal Daemon on host3.domain.com N/A N/A Y 360003

Task Status of Volume vmstore

------------------------------------------------------------------------------ There are no active volume tasks

Volume Name: vmstore Type: Replicate Volume ID: 27c8346c-0374-4108-a33a-0024007a9527 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/vmstore/vmstore Brick2: host2.domain.com:/gluster_bricks/vmstore/vmstore Brick3: host3.domain.com:/gluster_bricks/vmstore/vmstore Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on

########################################################################################################### [root@host1 ~]#

Again, further suggestions for troubleshooting are VERY much appreciated!

Respectfully, Charles _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/A2NR63KWDQSXFS...

Charles Lam

11:45 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

Hi Ritesh, Yes, I have tried Gluster deployment several times. I was able to resolve the kvdo not installed issue, but no matter what I have tried to date recently I cannot get Gluster to deploy. I had a hyperconverged oVirt cluster/Gluster with VDO successfully running on this hardware and switches before. What I have changed since then was switching the storage to direct connect and now installing with oVirt v4.4. I was last successful with oVirt v4.2. I tried Gluster deployment after cleaning within the Cockpit web console, using the suggested ansible-playbook and fresh image with oVirt Node v4.4 ISO. Ping from each host to the other two works for both mgmt and storage networks. I am using DHCP for management network, hosts file for direct connect storage network. Thanks again for your help, Charles On Mon, Jan 11, 2021 at 10:03 PM Ritesh Chikatwar <rchikatw@redhat.com> wrote:

...

On Tue, Jan 12, 2021, 2:04 AM Charles Lam <clam2718@gmail.com> wrote:

...
Dear Strahil and Ritesh,

Thank you both. I am back where I started with:

"One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]

Regarding my most recent issue:

"vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: Module kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n"

Per Strahil's note, I checked for kvdo:

[root@host1.tld.com conf.d]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host1.tld.com conf.d]#

[root@host2.tld.com conf.d]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host2.tld.com conf.d]#

[root@host3.tld.com ~]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host3.tld.com ~]#

I found https://unix.stackexchange.com/questions/624011/problem-on-centos-8-with-cre... which pointed to https://bugs.centos.org/view.php?id=17928. As suggested on the CentOS bug tracker I attempted to manually install

vdo-support-6.2.4.14-14.el8.x86_64 vdo-6.2.4.14-14.el8.x86_64 kmod-kvdo-6.2.3.91-73.el8.x86_64

but there was a dependency that kernel-core be greater than what I was installed, so I manually upgraded kernel-core to kernel-core-4.18.0-259.el8.x86_64.rpm then upgraded vdo and kmod-kvdo to

vdo-6.2.4.14-14.el8.x86_64.rpm kmod-kvdo-6.2.4.26-76.el8.x86_64.rpm

and installed vdo-support-6.2.4.14-14.el8.x86_64.rpm. Upon clean-up and redeploy I am now back at Gluster deploy failing at

TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] ********** task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine", "granular-entry-heal", "enable"], "delta": "0:00:10.098573", "end": "2021-01-11 19:27:05.333720", "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:26:55.235147", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data", "granular-entry-heal", "enable"], "delta": "0:00:10.099670", "end": "2021-01-11 19:27:20.564554", "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:10.464884", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore", "granular-entry-heal", "enable"], "delta": "0:00:10.104624", "end": "2021-01-11 19:27:35.774230", "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:25.669606", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]}

NO MORE HOSTS LEFT *************************************************************

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP ********************************************************************* fmov1n1.sn.dtcorp.com : ok=70 changed=29 unreachable=0 failed=1 skipped=188 rescued=0 ignored=1 fmov1n2.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163 rescued=0 ignored=1 fmov1n3.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163 rescued=0 ignored=1

Please check /var/log/cockpit/ovirt-dashboard/gluster-deployment.log for more informations.

I doubled-back to Strahil's recommendation to restart Gluster and enable granular-entry-heal. This fails, example:

[root@host1 ~]# gluster volume heal data granular-entry-heal enable One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals Volume heal failed.

I have followed Ritesh's suggestion:

[root@host1~]# ansible-playbook /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_cleanup.yml -i /etc/ansible/hc_wizard_inventory.yml

which appeared to execute successfully:

PLAY RECAP ********************************************************************************************************** fmov1n1.sn.dtcorp.com : ok=11 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0 fmov1n2.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0 fmov1n3.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

So after this have you tried gluster deployment..?

...
Here is the info Strahil requested when I first reported this issue on December 18th, re-run today, January 11:

[root@host1 ~]# gluster pool list UUID Hostname State 4964020a-9632-43eb-9468-798920e98559 host2.domain.com Connected f0718e4f-1ac6-4b82-a8d7-a4d31cd0f38b host3.domain.com Connected 6ba94e82-579c-4ae2-b3c5-bef339c6f795 localhost Connected [root@host1 ~]# gluster volume list data engine vmstore [root@host1 ~]# for i in $(gluster volume list); do gluster volume status $i; gluster volume info $i; echo "###########################################################################################################";done Status of volume: data Gluster process TCP Port RDMA Port Online Pid

------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /data/data 49153 0 Y 406272 Brick host2.domain.com:/gluster_bricks /data/data 49153 0 Y 360300 Brick host3.domain.com:/gluster_bricks /data/data 49153 0 Y 360082 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host2.domain.com N/A N/A Y 360223 Self-heal Daemon on host3.domain.com N/A N/A Y 360003

Task Status of Volume data

------------------------------------------------------------------------------ There are no active volume tasks

Volume Name: data Type: Replicate Volume ID: ed65a922-bd85-4574-ba21-25b3755acbce Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/data/data Brick2: host2.domain.com:/gluster_bricks/data/data Brick3: host3.domain.com:/gluster_bricks/data/data Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on

########################################################################################################### Status of volume: engine Gluster process TCP Port RDMA Port Online Pid

------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /engine/engine 49152 0 Y 404563 Brick host2.domain.com:/gluster_bricks /engine/engine 49152 0 Y 360202 Brick host3.domain.com:/gluster_bricks /engine/engine 49152 0 Y 359982 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host3.domain.com N/A N/A Y 360003 Self-heal Daemon on host2.domain.com N/A N/A Y 360223

Task Status of Volume engine

------------------------------------------------------------------------------ There are no active volume tasks

Volume Name: engine Type: Replicate Volume ID: 45d4ec84-38a1-41ff-b8ec-8b00eb658908 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/engine/engine Brick2: host2.domain.com:/gluster_bricks/engine/engine Brick3: host3.domain.com:/gluster_bricks/engine/engine Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on

########################################################################################################### Status of volume: vmstore Gluster process TCP Port RDMA Port Online Pid

------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 407952 Brick host2.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 360389 Brick host3.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 360176 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host2.domain.com N/A N/A Y 360223 Self-heal Daemon on host3.domain.com N/A N/A Y 360003

Task Status of Volume vmstore

------------------------------------------------------------------------------ There are no active volume tasks

Volume Name: vmstore Type: Replicate Volume ID: 27c8346c-0374-4108-a33a-0024007a9527 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/vmstore/vmstore Brick2: host2.domain.com:/gluster_bricks/vmstore/vmstore Brick3: host3.domain.com:/gluster_bricks/vmstore/vmstore Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on

########################################################################################################### [root@host1 ~]#

Again, further suggestions for troubleshooting are VERY much appreciated!

Respectfully, Charles _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/A2NR63KWDQSXFS...

Strahil Nikolov

12 Jan 12 Jan

12:11 a.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

...

I tried Gluster deployment after cleaning within the Cockpit web console, using the suggested ansible-playbook and fresh image with oVirt Node v4.4 ISO. Ping from each host to the other two works for both mgmt and storage networks. I am using DHCP for management network, hosts file for direct connect storage network.

I've tested the command on a test Gluster 8.3 cluster and it passes.Have you checked the gluster logs in '/var/log/gluster' ? I know that there is LVM filtering on oVirt 4.4 enabled, so can you take a look in the lvm conf : grep -Ev "^$|#" /etc/lvm/lvm.conf | grep filter VDO is using lvm.conf too, so it could cause strange issues. What happens when the deployment fails and you rerun (ansible should be idempotent) ? Best Regards, Strahil Nikolov

Charles Lam

7:59 a.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

I will check ‘/var/log/gluster’. I had commented out the filter in ‘/etc/lvm/lvm.conf’ - if I don’t the creation of volume groups fails because lvm drives are excluded by filter. Should I not be commenting it out but modifying it in some way? Thanks! Charles On Tue, Jan 12, 2021 at 12:11 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

...

I tried Gluster deployment after cleaning within the Cockpit web console, using the suggested ansible-playbook and fresh image with oVirt Node v4.4 ISO. Ping from each host to the other two works for both mgmt and storage networks. I am using DHCP for management network, hosts file for direct connect storage network.

I've tested the command on a test Gluster 8.3 cluster and it passes.Have you checked the gluster logs in '/var/log/gluster' ? I know that there is LVM filtering on oVirt 4.4 enabled, so can you take a look in the lvm conf : grep -Ev "^$|#" /etc/lvm/lvm.conf | grep filter VDO is using lvm.conf too, so it could cause strange issues. What happens when the deployment fails and you rerun (ansible should be idempotent) ?

Best Regards, Strahil Nikolov

Charles Lam

13 Jan 13 Jan

6:37 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

Dear Friends: I am still stuck at task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed." I refined /etc/lvm/lvm.conf to: filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-F1kxJk-F1wV-QqOR-Tbb1-Pefh-4vod-IVYaz6$|", "a|^/dev/nvme.n1|", "a|^/dev/dm-1.|", "r|.*|"] and have also rebuilt the servers again. The output of gluster volume status shows bricks up but no ports for self-heal daemon: [root@fmov1n2 ~]# gluster volume status data Status of volume: data Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.company.com:/gluster_bricks /data/data 49153 0 Y 244103 Brick host2.company.com:/gluster_bricks /data/data 49155 0 Y 226082 Brick host3.company.com:/gluster_bricks /data/data 49155 0 Y 225948 Self-heal Daemon on localhost N/A N/A Y 224255 Self-heal Daemon on host2.company.com N/A N/A Y 233992 Self-heal Daemon on host3.company.com N/A N/A Y 224245 Task Status of Volume data ------------------------------------------------------------------------------ There are no active volume tasks The output of gluster volume heal <volname> info shows connected to the local self-heal daemon but transport endpoint is not connected to the two remote daemons. This is the same for all three hosts. I have followed the solutions here: https://access.redhat.com/solutions/5089741 and also here: https://access.redhat.com/solutions/3237651 with no success. I have changed to a different DNS/DHCP server and still have the same issues. Could this somehow be related to the direct cabling for my storage/Gluster network (no switch)? /etc/nsswitch.conf is set to file dns and pings all work, but dig and does not for storage (I understand this is to be expected). Again, as always, any pointers or wisdom is greatly appreciated. I am out of ideas. Thank you! Charles

Strahil Nikolov

14 Jan 14 Jan

12:30 a.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

As those are brand new, try to install the gluster v8 repo and update the nodes to 8.3 and then rerun the deployment: yum install centos-release-gluster8.noarch yum update Best Regards, Strahil Nikolov В 23:37 +0000 на 13.01.2021 (ср), Charles Lam написа:

...

Dear Friends:

I am still stuck at

task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volum es.yml:67 "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."

I refined /etc/lvm/lvm.conf to:

filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-F1kxJk-F1wV-QqOR-Tbb1-Pefh- 4vod-IVYaz6$|", "a|^/dev/nvme.n1|", "a|^/dev/dm-1.|", "r|.*|"]

and have also rebuilt the servers again. The output of gluster volume status shows bricks up but no ports for self-heal daemon:

[root@fmov1n2 ~]# gluster volume status data Status of volume: data Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------- ----------- Brick host1.company.com:/gluster_bricks /data/data 49153 0 Y 244103 Brick host2.company.com:/gluster_bricks /data/data 49155 0 Y 226082 Brick host3.company.com:/gluster_bricks /data/data 49155 0 Y 225948 Self-heal Daemon on localhost N/A N/A Y 224255 Self-heal Daemon on host2.company.com N/A N/A Y 233992 Self-heal Daemon on host3.company.com N/A N/A Y 224245

Task Status of Volume data ------------------------------------------------------------------- ----------- There are no active volume tasks

The output of gluster volume heal <volname> info shows connected to the local self-heal daemon but transport endpoint is not connected to the two remote daemons. This is the same for all three hosts.

I have followed the solutions here: https://access.redhat.com/solutions/5089741 and also here: https://access.redhat.com/solutions/3237651

with no success.

I have changed to a different DNS/DHCP server and still have the same issues. Could this somehow be related to the direct cabling for my storage/Gluster network (no switch)? /etc/nsswitch.conf is set to file dns and pings all work, but dig and does not for storage (I understand this is to be expected).

Again, as always, any pointers or wisdom is greatly appreciated. I am out of ideas.

Thank you! Charles _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OE7EUSWMBTRINH...

Charles Lam

1:17 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

Thank you Strahil. I have installed/updated: dnf install --enablerepo="baseos" --enablerepo="appstream" --enablerepo="extras" --enablerepo="ha" --enablerepo="plus" centos-release-gluster8.noarch centos-release-storage-common.noarch dnf upgrade --enablerepo="baseos" --enablerepo="appstream" --enablerepo="extras" --enablerepo="ha" --enablerepo="plus" Cleaned and re-ran Ansible. Still receiving the same (below). As always, if you or anyone else has any ideas for troubleshooting - Gratefully, Charles TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] ********** task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine", "granular-entry-heal", "enable"], "delta": "0:00:10.100254", "end": "2021-01-14 18:07:16.192067", "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-14 18:07:06.091813", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data", "granular-entry-heal", "enable"], "delta": "0:00:10.103147", "end": "2021-01-14 18:07:31.431419", "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-14 18:07:21.328272", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore", "granular-entry-heal", "enable"], "delta": "0:00:10.102582", "end": "2021-01-14 18:07:46.612788", "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-14 18:07:36.510206", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]}

Charles Lam

5:39 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

Dear Friends, Resolved! Gluster just deployed for me successfully. Turns out it was two typos in my /etc/hosts file. Why or how ping resolved properly and worked I am not sure. Special thanks to Ritesh and most especially Strahil Nikolov for their assistance in resolving other issues along the way. Gratefully, Charles

Strahil Nikolov

11:47 p.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

Can you share both ovirt and gluster logs ? Best Regards, Strahil Nikolov В четвъртък, 14 януари 2021 г., 20:18:03 Гринуич+2, Charles Lam <clam2718@gmail.com> написа: Thank you Strahil. I have installed/updated: dnf install --enablerepo="baseos" --enablerepo="appstream" --enablerepo="extras" --enablerepo="ha" --enablerepo="plus" centos-release-gluster8.noarch centos-release-storage-common.noarch dnf upgrade --enablerepo="baseos" --enablerepo="appstream" --enablerepo="extras" --enablerepo="ha" --enablerepo="plus" Cleaned and re-ran Ansible. Still receiving the same (below). As always, if you or anyone else has any ideas for troubleshooting - Gratefully, Charles TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] ********** task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine", "granular-entry-heal", "enable"], "delta": "0:00:10.100254", "end": "2021-01-14 18:07:16.192067", "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-14 18:07:06.091813", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data", "granular-entry-heal", "enable"], "delta": "0:00:10.103147", "end": "2021-01-14 18:07:31.431419", "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-14 18:07:21.328272", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore", "granular-entry-heal", "enable"], "delta": "0:00:10.102582", "end": "2021-01-14 18:07:46.612788", "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-14 18:07:36.510206", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/BDLFPRPYPAY3UH...

Sverker Abrahamsson

27 Jan 27 Jan

10:06 a.m.

New subject: New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

We ran in to this issue as well when trying to install Ovirt Hyperconverged. The root issue is that kmod-kvdo in Centos 8 (and probably upstream) is built for a specific kernel and if you don't run that kernel it is not found. This is a major issue even if you match the kernel version then if the kernel is updated then your volume will fail because a rpm package for kmod-kvdo for that specific kernel would have to be built. It doesn't even declare a rpm dependency on the kernel version it works with. I cloned the git repo from https://github.com/dm-vdo/kvdo and built a rpm from there, it uses dkms so it will build a module for the running kernel and if the kernel is updated then a new module for that version will be built. Works like a charm every time, but I haven't yet tried to run the hyperconverged wizard again. /Sverker Den 2021-01-11 kl. 21:32, skrev Charles Lam:

...

Dear Strahil and Ritesh,

Thank you both. I am back where I started with:

"One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]

Regarding my most recent issue:

"vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe: FATAL: Module kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n"

Per Strahil's note, I checked for kvdo:

[root@host1.tld.com conf.d]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host1.tld.com conf.d]#

[root@host2.tld.com conf.d]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host2.tld.com conf.d]#

[root@host3.tld.com ~]# rpm -qa | grep vdo libblockdev-vdo-2.24-1.el8.x86_64 vdo-6.2.3.114-14.el8.x86_64 kmod-kvdo-6.2.2.117-65.el8.x86_64 [root@host3.tld.com ~]#

I found https://unix.stackexchange.com/questions/624011/problem-on-centos-8-with-cre... which pointed to https://bugs.centos.org/view.php?id=17928. As suggested on the CentOS bug tracker I attempted to manually install

vdo-support-6.2.4.14-14.el8.x86_64 vdo-6.2.4.14-14.el8.x86_64 kmod-kvdo-6.2.3.91-73.el8.x86_64

but there was a dependency that kernel-core be greater than what I was installed, so I manually upgraded kernel-core to kernel-core-4.18.0-259.el8.x86_64.rpm then upgraded vdo and kmod-kvdo to

vdo-6.2.4.14-14.el8.x86_64.rpm kmod-kvdo-6.2.4.26-76.el8.x86_64.rpm

and installed vdo-support-6.2.4.14-14.el8.x86_64.rpm. Upon clean-up and redeploy I am now back at Gluster deploy failing at

TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] ********** task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67 failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine", "granular-entry-heal", "enable"], "delta": "0:00:10.098573", "end": "2021-01-11 19:27:05.333720", "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:26:55.235147", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "data", "granular-entry-heal", "enable"], "delta": "0:00:10.099670", "end": "2021-01-11 19:27:20.564554", "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:10.464884", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]} failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore", "granular-entry-heal", "enable"], "delta": "0:00:10.104624", "end": "2021-01-11 19:27:35.774230", "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero return code", "rc": 107, "start": "2021-01-11 19:27:25.669606", "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals\nVolume heal failed.", "stdout_lines": ["One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals", "Volume heal failed."]}

NO MORE HOSTS LEFT *************************************************************

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP ********************************************************************* fmov1n1.sn.dtcorp.com : ok=70 changed=29 unreachable=0 failed=1 skipped=188 rescued=0 ignored=1 fmov1n2.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163 rescued=0 ignored=1 fmov1n3.sn.dtcorp.com : ok=68 changed=27 unreachable=0 failed=0 skipped=163 rescued=0 ignored=1

Please check /var/log/cockpit/ovirt-dashboard/gluster-deployment.log for more informations.

I doubled-back to Strahil's recommendation to restart Gluster and enable granular-entry-heal. This fails, example:

[root@host1 ~]# gluster volume heal data granular-entry-heal enable One or more bricks could be down. Please execute the command again after bringing all bricks online and finishing any pending heals Volume heal failed.

I have followed Ritesh's suggestion:

[root@host1~]# ansible-playbook /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_cleanup.yml -i /etc/ansible/hc_wizard_inventory.yml

which appeared to execute successfully:

PLAY RECAP ********************************************************************************************************** fmov1n1.sn.dtcorp.com : ok=11 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0 fmov1n2.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0 fmov1n3.sn.dtcorp.com : ok=9 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

Here is the info Strahil requested when I first reported this issue on December 18th, re-run today, January 11:

[root@host1 ~]# gluster pool list UUID Hostname State 4964020a-9632-43eb-9468-798920e98559 host2.domain.com Connected f0718e4f-1ac6-4b82-a8d7-a4d31cd0f38b host3.domain.com Connected 6ba94e82-579c-4ae2-b3c5-bef339c6f795 localhost Connected [root@host1 ~]# gluster volume list data engine vmstore [root@host1 ~]# for i in $(gluster volume list); do gluster volume status $i; gluster volume info $i; echo "###########################################################################################################";done Status of volume: data Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /data/data 49153 0 Y 406272 Brick host2.domain.com:/gluster_bricks /data/data 49153 0 Y 360300 Brick host3.domain.com:/gluster_bricks /data/data 49153 0 Y 360082 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host2.domain.com N/A N/A Y 360223 Self-heal Daemon on host3.domain.com N/A N/A Y 360003

Task Status of Volume data ------------------------------------------------------------------------------ There are no active volume tasks

Volume Name: data Type: Replicate Volume ID: ed65a922-bd85-4574-ba21-25b3755acbce Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/data/data Brick2: host2.domain.com:/gluster_bricks/data/data Brick3: host3.domain.com:/gluster_bricks/data/data Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on ########################################################################################################### Status of volume: engine Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /engine/engine 49152 0 Y 404563 Brick host2.domain.com:/gluster_bricks /engine/engine 49152 0 Y 360202 Brick host3.domain.com:/gluster_bricks /engine/engine 49152 0 Y 359982 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host3.domain.com N/A N/A Y 360003 Self-heal Daemon on host2.domain.com N/A N/A Y 360223

Task Status of Volume engine ------------------------------------------------------------------------------ There are no active volume tasks

Volume Name: engine Type: Replicate Volume ID: 45d4ec84-38a1-41ff-b8ec-8b00eb658908 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/engine/engine Brick2: host2.domain.com:/gluster_bricks/engine/engine Brick3: host3.domain.com:/gluster_bricks/engine/engine Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on ########################################################################################################### Status of volume: vmstore Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick host1.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 407952 Brick host2.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 360389 Brick host3.domain.com:/gluster_bricks /vmstore/vmstore 49154 0 Y 360176 Self-heal Daemon on localhost N/A N/A Y 413227 Self-heal Daemon on host2.domain.com N/A N/A Y 360223 Self-heal Daemon on host3.domain.com N/A N/A Y 360003

Task Status of Volume vmstore ------------------------------------------------------------------------------ There are no active volume tasks

Volume Name: vmstore Type: Replicate Volume ID: 27c8346c-0374-4108-a33a-0024007a9527 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: host1.domain.com:/gluster_bricks/vmstore/vmstore Brick2: host2.domain.com:/gluster_bricks/vmstore/vmstore Brick3: host3.domain.com:/gluster_bricks/vmstore/vmstore Options Reconfigured: performance.client-io-threads: on nfs.disable: on storage.fips-mode-rchecksum: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on ########################################################################################################### [root@host1 ~]#

Again, further suggestions for troubleshooting are VERY much appreciated!

Respectfully, Charles _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/A2NR63KWDQSXFS...

1760

Age (days ago)

1800

Last active (days ago)

List overview

Download

19 comments

4 participants

participants (4)

Charles Lam
Ritesh Chikatwar
Strahil Nikolov
Sverker Abrahamsson

New failure Gluster deploy: Set granual-entry-heal on --> Bricks down

tags

participants (4)