Issues with Gluster Domain

18 Jun 2020

Hello,

I recently added 6 hosts to an existing oVirt compute/gluster cluster.

Prior to this attempted addition, my cluster had 3 hypervisor hosts and 3
gluster bricks, which made up a single replica 3 gluster volume. I added
the additional hosts, made a brick on 3 of the new hosts, and attempted to
make a new replica 3 volume. I had difficulty creating the new volume, so
I decided that I would make a new compute/gluster cluster for each set of
3 new hosts.

I removed the 6 new hosts from the existing oVirt compute/gluster cluster,
leaving the 3 original hosts in place with their bricks. At that point my
original bricks went down and came back up. The volume showed entries that
needed healing, so I ran gluster volume heal images3 full, etc. The volume
now shows no unhealed entries. I also corrected some peer errors.

However, I am unable to copy disks, move disks to another domain, export
disks, etc. It appears that the engine cannot locate disks properly and I
get storage I/O errors.

I have detached and removed the oVirt Storage Domain. I reimported the
domain and imported 2 VMs, but the VM disks exhibit the same behaviour and
won't run from the hard disk.

I get errors such as this:

VDSM ov05 command HSMGetAllTasksStatusesVDS failed: low level Image copy
failed: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none',
'-T', 'none', '-f', 'raw',
u'/rhev/data-center/mnt/glusterSD/192.168.24.18:_images3/5fe3ad3f-2d21-404c-832e-4dc7318ca10d/images/3ea5afbd-0fe0-4c09-8d39-e556c66a8b3d/fe6eab63-3b22-4815-bfe6-4a0ade292510',
'-O', 'raw', u'/rhev/data-center/mnt/192.168.24.13:_stor_import1/1ab89386-a2ba-448b-90ab-bc816f55a328/images/f707a218-9db7-4e23-8bbd-9b12972012b6/d6591ec5-3ede-443d-bd40-93119ca7c7d5']
failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading
sector 135168: Transport endpoint is not connected\\nqemu-img: error while
reading sector 131072: Transport endpoint is not connected\\nqemu-img:
error while reading sector 139264: Transport endpoint is not
connected\\nqemu-img: error while reading sector 143360: Transport endpoint
is not connected\\nqemu-img: error while reading sector 147456: Transport
endpoint is not connected\\nqemu-img: error while reading sector 155648:
Transport endpoint is not connected\\nqemu-img: error while reading sector
151552: Transport endpoint is not connected\\nqemu-img: error while reading
sector 159744: Transport endpoint is not connected\\n')",)
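
The sector numbers in that error are easier to compare as byte offsets. The following is only a minimal sketch: it assumes qemu-img counts sectors in 512-byte units, and the helper name failed_offsets and the two-line sample string are made up for illustration (the sample is copied from the error above, not from a separate log).

```python
import re

SECTOR_SIZE = 512  # assumption: qemu-img reports 512-byte sectors


def failed_offsets(err_text):
    """Return the sorted byte offsets of every 'reading sector N' in err_text."""
    # \s+ tolerates the line wraps introduced by the mail client.
    sectors = {int(n) for n in re.findall(r"reading\s+sector\s+(\d+)", err_text)}
    return sorted(s * SECTOR_SIZE for s in sectors)


# Two lines taken from the error message above, used here as sample input.
sample = (
    "qemu-img: error while reading sector 135168: "
    "Transport endpoint is not connected\n"
    "qemu-img: error while reading sector 131072: "
    "Transport endpoint is not connected\n"
)

for offset in failed_offsets(sample):
    print(f"{offset} bytes ({offset // 2**20} MiB)")
```

Under that assumption, the eight sectors reported above correspond to offsets from 64 MiB to 78 MiB in 2 MiB steps.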

oVirt version is 4.3.82-1.el7
OS is CentOS Linux release 7.7.1908 (Core)

The Gluster cluster had been working very well until this incident.

Please help.

Thank you,

Charles Williams
