On Sat, Mar 13, 2021 at 4:56 PM Ben <gravyfish@gmail.com> wrote:

Hi, I could use some help with a problem on the Gluster storage servers I use in my oVirt data center. I first noticed it when files would heal constantly after I rebooted one of the Gluster nodes: in the replica 2 + arbiter volume, the node that remained online and the arbiter would begin healing files and never finish.

I raised the issue with the helpful folks over at Gluster: https://github.com/gluster/glusterfs/issues/2226

The short version is this: a tcpdump showed malformed RPC calls to Gluster from one of my oVirt nodes, so they're looking for a stack trace of whatever process is doing I/O on the Gluster cluster from oVirt. The goal is to figure out what that process is doing and whether the write problems could cause the indefinite healing I'm seeing. I checked the qemu PIDs, and they don't appear to be performing the writes themselves. Is there a particular part of the oVirt stack I can look at to find the write operations to Gluster? I don't see anything else reading or writing the VM image files on the Gluster mount, but I could be missing something.
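
For reference, here is a minimal sketch of one way to list the processes that hold files open under the mount, assuming a Linux host with /proc. The function name find_openers and the example mount path are hypothetical and not part of oVirt or Gluster. It only shows which processes have the files open, not which one is issuing the writes, so treat it as a starting point for picking PIDs to trace.

```python
import os


def find_openers(path_prefix, proc_root="/proc"):
    """Return sorted (pid, command, target) tuples for every open file
    descriptor whose target lies under path_prefix.

    path_prefix is the mount point to inspect (hypothetical example:
    "/path/to/gluster/mount"); proc_root is only a parameter so the
    function can be exercised against a fake tree.
    """
    # A trailing separator keeps "/mnt/a" from matching "/mnt/ab".
    prefix = os.path.join(os.path.abspath(path_prefix), "")
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            comm = "?"
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were scanning
            if target.startswith(prefix):
                hits.append((int(entry), comm, target))
    return sorted(hits)
```

Running find_openers() with the actual mount point as root should list every PID with an image file open. Other users' processes are silently skipped when run without root.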

NB: I'm using a traditional Gluster setup with the FUSE client, not hyperconverged.

Thanks in advance for any assistance.