Well, I'll be -- you're absolutely right, and I'm a bit embarrassed I
didn't consider that before. The node that's not healing shows connections
from 2 FUSE clients and 3 glustershd clients, both of which I expect.
[root@gluster1 ~]# gluster volume status ssd-san client-list
Client connections for volume ssd-san
Name count
----- ------
glustershd 3
fuse 2
total clients for volume ssd-san : 5
-----------------------------------------------------------------
But the secondary node, which is constantly healing, *shows that it's
missing a FUSE connection*:
[root@gluster2 ~]# gluster volume status ssd-san client-list
Client connections for volume ssd-san
Name count
----- ------
glustershd 3
fuse 1
total clients for volume ssd-san : 4
-----------------------------------------------------------------
I had to restart the glusterd service on node 2 twice before the FUSE
client reconnected and stayed connected.
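For the record, this is roughly what I did on node 2, twice (the second
restart is what finally made the FUSE client stick):
[root@gluster2 ~]# systemctl restart glusterd
[root@gluster2 ~]# gluster volume status ssd-san client-list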
Thanks a ton, I really appreciate your help!
On Mon, Mar 22, 2021 at 12:37 AM Strahil Nikolov <hunter86_bg@yahoo.com>
wrote:
Healing should only happen after maintenance (for example, patching and
rebooting) on one of the nodes.
Once the node is back up, the FUSE client (on any host) should reconnect to
all Gluster bricks and write to all of them simultaneously.
If you see constant healing, it indicates that a client is not writing
to all bricks.
Check whether there is such a client with the following command:
'gluster volume status VOLNAME clients'
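For example, with the volume name from your earlier output it would look
like this:
gluster volume status ssd-san clients
Each brick's section lists its connected clients (hostname:port, bytes
read/written), so a host missing from one brick's list is your suspect.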
Best Regards,
Strahil Nikolov
On Mon, Mar 22, 2021 at 3:24, Ben
<gravyfish@gmail.com> wrote:
Sorry, just saw this -- I'm not sure I understand what you mean, but in
any case, the healing process does complete when I stop all of my VMs,
which I believe indicates that something about how oVirt writes to Gluster
is causing the problem in the first place.
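In case it helps, I've been watching the heal queue with something like the
command below (volume name as elsewhere in this thread); the pending entry
counts only drop to zero once the VMs are stopped:
[root@gluster1 ~]# gluster volume heal ssd-san info summary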
On Sun, Mar 14, 2021 at 8:06 AM Strahil Nikolov <hunter86_bg@yahoo.com>
wrote:
Are you sure that the Gluster volume's client count is the same on all nodes?
Best Regards,
Strahil Nikolov
On Sat, Mar 13, 2021 at 23:58, Ben
<gravyfish@gmail.com> wrote:
Hi, I could use some help with a problem I'm having with the Gluster
storage servers I use in my oVirt data center. I first noticed the problem
when files would constantly heal after rebooting one of the Gluster nodes
-- in my replica 2 + arbiter volume, the node that remained online and the
arbiter would begin healing files and never finish.
I raised the issue with the helpful folks over at Gluster:
https://github.com/gluster/glusterfs/issues/2226
The short version is this: after running a tcpdump and noticing malformed
RPC calls to Gluster from one of my oVirt nodes, they're looking for a
stack trace of whatever process is running I/O on the Gluster cluster from
oVirt in order to figure out what it's doing and whether the write problems
could cause the indefinite healing I'm seeing. After checking the qemu
PIDs, it doesn't look like they are actually performing the writes -- is
there a particular part of the oVirt stack I can look at to find the write
operations to Gluster? I don't see anything else doing read/write on the VM
image files on the Gluster mount, but I could be missing something.
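For reference, here's roughly how I've been checking for other writers (the
mount path below is just my guess at the usual oVirt layout,
/rhev/data-center/mnt/glusterSD/<server>:_<volume>, and <qemu-pid> is a
placeholder):
[root@ovirt-node ~]# lsof +D /rhev/data-center/mnt/glusterSD/gluster1:_ssd-san   # mount path assumed
[root@ovirt-node ~]# ls -l /proc/<qemu-pid>/fd | grep ssd-san   # <qemu-pid> is a placeholder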
NB: I'm using a traditional Gluster setup with the FUSE client, not
hyperconverged.
Thanks in advance for any assistance.