Issues with Gluster Domain

Hello,

I recently added 6 hosts to an existing oVirt compute/gluster cluster. Before this, the cluster had 3 hypervisor hosts and 3 gluster bricks that made up a single replica 3 volume. I added the new hosts, created a brick on 3 of them, and tried to create a new replica 3 volume, but had difficulty creating it. So I decided to make a new compute/gluster cluster for each set of 3 new hosts instead.

I removed the 6 new hosts from the existing oVirt compute/gluster cluster, leaving the 3 original hosts in place with their bricks. At that point my original bricks went down and came back up, and the volume showed entries that needed healing. I ran gluster volume heal images3 full, etc. The volume now shows no unhealed entries. I also corrected some peer errors.

However, I am unable to copy disks, move disks to another domain, export disks, etc. It appears that the engine cannot locate the disks properly, and I get storage I/O errors.

I have detached and removed the oVirt storage domain. I then reimported the domain and imported 2 VMs, but the VM disks exhibit the same behaviour and won't run from the hard disk.

I get errors such as this:

VDSM ov05 command HSMGetAllTasksStatusesVDS failed: low level Image copy failed: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/glusterSD/192.168.24.18:_images3/5fe3ad3f-2d21-404c-832e-4dc7318ca10d/images/3ea5afbd-0fe0-4c09-8d39-e556c66a8b3d/fe6eab63-3b22-4815-bfe6-4a0ade292510', '-O', 'raw', u'/rhev/data-center/mnt/192.168.24.13:_stor_import1/1ab89386-a2ba-448b-90ab-bc816f55a328/images/f707a218-9db7-4e23-8bbd-9b12972012b6/d6591ec5-3ede-443d-bd40-93119ca7c7d5'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading sector 135168: Transport endpoint is not connected\\nqemu-img: error while reading sector 131072: Transport endpoint is not connected\\nqemu-img: error while reading sector 139264: Transport endpoint is not connected\\nqemu-img: error while reading sector 143360: Transport endpoint is not connected\\nqemu-img: error while reading sector 147456: Transport endpoint is not connected\\nqemu-img: error while reading sector 155648: Transport endpoint is not connected\\nqemu-img: error while reading sector 151552: Transport endpoint is not connected\\nqemu-img: error while reading sector 159744: Transport endpoint is not connected\\n')",)

oVirt version is 4.3.82-1.el7
OS: CentOS Linux release 7.7.1908 (Core)

The Gluster cluster had been working very well until this incident.

Please help.

Thank you,
Charles Williams
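
A long VDSM error like the one above is easier to read once the repeated qemu-img failures are grouped by cause. The following is a minimal sketch, not part of the original message: the function name summarise_read_errors is hypothetical, and it assumes the error text is supplied on standard input and that grep supports -o (GNU and BSD grep do). It only counts how many sector reads failed with each message. In this log every read fails with "Transport endpoint is not connected", which on a FUSE mount often means the client has lost its connection to the bricks, though that reading is an assumption to be confirmed from the gluster logs.

```shell
#!/bin/sh
# Hypothetical helper: group qemu-img read errors by their error message.
# Reads a VDSM error message (or a log file) on stdin.
summarise_read_errors() {
    # Each match stops at the first backslash, so the literal "\\n"
    # separators inside the bytearray do not leak into the message.
    grep -o 'error while reading sector [0-9]*: [^\\]*' |
        # Keep only the message text after the sector number.
        sed 's/^error while reading sector [0-9]*: //' |
        # Count identical messages.
        sort | uniq -c
}
```

Usage, assuming the error text was saved to a file named vdsm-error.txt (a made-up name): summarise_read_errors < vdsm-error.txt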

Log in to the oVirt cluster and provide the output of:

gluster pool list
gluster volume list
for i in $(gluster volume list); do echo $i;echo; gluster volume info $i; echo;echo;gluster volume status $i;echo;echo;echo;done
ls -l /rhev/data-center/mnt/glusterSD/

Best Regards,
Strahil Nikolov

On 18 June 2020 at 19:17:46 GMT+03:00, C Williams <cwilliams3320@gmail.com> wrote:
Hello,
I recently added 6 hosts to an existing oVirt compute/gluster cluster.
Prior to this attempted addition, my cluster had 3 hypervisor hosts and 3 gluster bricks which made up a single gluster volume (replica 3 volume). I added the additional hosts and made a brick on 3 of the new hosts and attempted to make a new replica 3 volume. I had difficulty creating the new volume. So, I decided that I would make a new compute/gluster cluster for each set of 3 new hosts.
I removed the 6 new hosts from the existing oVirt compute/gluster cluster, leaving the 3 original hosts in place with their bricks. At that point my original bricks went down and came back up. The volume showed entries that needed healing. At that point I ran gluster volume heal images3 full, etc. The volume shows no unhealed entries. I also corrected some peer errors.
However, I am unable to copy disks, move disks to another domain, export disks, etc. It appears that the engine cannot locate disks properly and I get storage I/O errors.
I have detached and removed the oVirt storage domain. I reimported the domain and imported 2 VMs, but the VM disks exhibit the same behaviour and won't run from the hard disk.
I get errors such as this
VDSM ov05 command HSMGetAllTasksStatusesVDS failed: low level Image copy failed: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/glusterSD/192.168.24.18:_images3/5fe3ad3f-2d21-404c-832e-4dc7318ca10d/images/3ea5afbd-0fe0-4c09-8d39-e556c66a8b3d/fe6eab63-3b22-4815-bfe6-4a0ade292510', '-O', 'raw', u'/rhev/data-center/mnt/192.168.24.13:_stor_import1/1ab89386-a2ba-448b-90ab-bc816f55a328/images/f707a218-9db7-4e23-8bbd-9b12972012b6/d6591ec5-3ede-443d-bd40-93119ca7c7d5'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading sector 135168: Transport endpoint is not connected\\nqemu-img: error while reading sector 131072: Transport endpoint is not connected\\nqemu-img: error while reading sector 139264: Transport endpoint is not connected\\nqemu-img: error while reading sector 143360: Transport endpoint is not connected\\nqemu-img: error while reading sector 147456: Transport endpoint is not connected\\nqemu-img: error while reading sector 155648: Transport endpoint is not connected\\nqemu-img: error while reading sector 151552: Transport endpoint is not connected\\nqemu-img: error while reading sector 159744: Transport endpoint is not connected\\n')",)
oVirt version is 4.3.82-1.el7 OS CentOS Linux release 7.7.1908 (Core)
The Gluster Cluster has been working very well until this incident.
Please help.
Thank You
Charles Williams
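
The diagnostics requested in the reply can be gathered in one pass and saved to a file for posting. This is a rough sketch under stated assumptions rather than a prescribed procedure: the function name collect_gluster_diagnostics and the GLUSTER and MOUNT_ROOT variables are hypothetical conveniences, and the "volume heal ... info" call is an addition that was not asked for in the reply, included only because the thread mentions healing.

```shell
#!/bin/sh
# Hypothetical wrapper around the commands requested in the reply.
# GLUSTER and MOUNT_ROOT are made-up override points, not oVirt settings.
GLUSTER="${GLUSTER:-gluster}"
MOUNT_ROOT="${MOUNT_ROOT:-/rhev/data-center/mnt/glusterSD}"

collect_gluster_diagnostics() {
    echo "== pool list =="
    "$GLUSTER" pool list

    echo "== volume list =="
    "$GLUSTER" volume list

    # Per-volume details, as in the for loop from the reply.
    for vol in $("$GLUSTER" volume list); do
        echo "== $vol: info =="
        "$GLUSTER" volume info "$vol"
        echo "== $vol: status =="
        "$GLUSTER" volume status "$vol"
        # Addition (not in the reply): list entries still pending heal.
        echo "== $vol: heal info =="
        "$GLUSTER" volume heal "$vol" info
    done

    echo "== gluster storage domain mounts =="
    ls -l "$MOUNT_ROOT"
}
```

Usage, run as root on one of the gluster hosts: collect_gluster_diagnostics > gluster-diag.txt 2>&1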