While upgrading some nodes today, I noticed that vdsmd restarts glusterd on a node when it
activates that node. This causes a short break in healing while the shd (self-heal daemon) is
disconnected, and forces some extra healing while the heal process reports “Transport endpoint
is not connected” (shown as N/A in the oVirt GUI).
This is on a converged cluster (3 nodes, a gluster replica volume across all 3, ovirt-engine
running elsewhere). CentOS 7 install, just upgraded to oVirt 4.1.2, running Gluster 3.10
from the CentOS SIG.
The process I’m observing:

1. Place a node into maintenance via the GUI.
2. Update the node from the command line.
3. Reboot the node (kernel update).
4. Watch gluster heal itself after the reboot.
5. Activate the node in the GUI.
6. Gluster is completely stopped on this node.
7. Gluster is started on this node.
8. Healing begins again, but isn’t working: “gluster vol heal XXXX info” reports this
   node’s information as not available because “Transport endpoint is not connected”.
9. This clears up in 5 to 10 minutes, and then the volume heals normally.
Would someone with a similar setup be willing to check this and see whether it’s something
specific to my nodes, or just a general problem with the way vdsmd restarts gluster? I’m looking
for a little confirmation before I file a bug report on it.
Or would a developer care to comment on why it stops and then starts gluster instead of doing a
restart? A restart would presumably leave the brick processes and shd running and avoid this effect.
Thanks,
-Darrell