[ovirt-users] vdsm (4.1) restarts glusterd when activating a node, even if it's already running

2 Jul 2017

While upgrading some nodes today, I noticed that vdsmd restarts glusterd on a node when the node is activated. This briefly interrupts healing: the self-heal daemon (shd) gets disconnected, heal info reports “Transport endpoint not connected” (shown as N/A in the oVirt GUI), and some extra healing is needed afterwards.

This is on a converged cluster (3 nodes, gluster replica volume across all 3, ovirt-engine running elsewhere). CentOS 7 install, just upgraded to oVirt 4.1.2, running gluster 3.10 from the CentOS SIG.

The process I’m observing:

1. Place a node into maintenance via the GUI.
2. Update the node from the command line.
3. Reboot the node (kernel update).
4. Watch gluster heal itself after the reboot.
5. Activate the node in the GUI.
6. gluster is completely stopped on this node.
7. gluster is started on this node.
8. Healing begins again, but isn’t working.
9. “gluster vol heal XXXX info” reports this node’s information as not available because “Transport endpoint not connected”.
10. This clears up in 5-10 minutes, then the volume heals normally.
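
For anyone wanting to reproduce this, here is a minimal Python sketch that flags bricks whose heal-info status is anything other than "Connected". The function names, the brick paths and the SAMPLE text are made up for illustration; the sample follows the "Brick ... / Status: ..." layout I would expect from gluster 3.10, but check it against your own output before relying on the parser.

```python
import subprocess

# Hypothetical heal-info output, modelled on what gluster 3.x prints.
# Hostnames and brick paths are invented for this example.
SAMPLE = """\
Brick node1:/gluster/brick1
Status: Connected
Number of entries: 0

Brick node2:/gluster/brick1
Status: Connected
Number of entries: 2

Brick node3:/gluster/brick1
Status: Transport endpoint is not connected
Number of entries: -
"""


def parse_heal_info(text):
    """Map each brick name to the status string reported for it."""
    statuses = {}
    brick = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif line.startswith("Status:") and brick is not None:
            statuses[brick] = line[len("Status:"):].strip()
    return statuses


def disconnected_bricks(statuses):
    """Return the bricks whose status is anything other than 'Connected'."""
    return sorted(b for b, s in statuses.items() if s != "Connected")


def heal_info(volume):
    """Run the heal-info command from the steps above (needs a gluster node)."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=False,
    )
    return parse_heal_info(result.stdout)


if __name__ == "__main__":
    print(disconnected_bricks(parse_heal_info(SAMPLE)))
```

Calling heal_info() with the volume name every 30 seconds or so after activating the node should show how long the brick stays disconnected.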

Could someone with a similar setup check this and see whether it’s specific to my nodes or a general problem with the way vdsm restarts gluster? I’m looking for a little confirmation before I file a bug report.

Or could a dev comment on why it stops and then starts gluster, rather than doing a restart, which would presumably leave the brick processes and shd running and avoid this effect?
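
One way to tell a full stop/start from a glusterd-only restart is to compare process IDs before and after activating the node. The sketch below is only an illustration under assumptions: the helper names are hypothetical, and it assumes the usual process names (glusterd for the management daemon, glusterfsd for bricks, glusterfs for shd and any FUSE mounts) and that pgrep is available.

```python
import subprocess

# Usual gluster process names; glusterfs also matches FUSE client mounts.
GLUSTER_PROCS = ("glusterd", "glusterfsd", "glusterfs")


def snapshot():
    """Collect the current PIDs for each gluster process name via pgrep."""
    pids = {}
    for name in GLUSTER_PROCS:
        result = subprocess.run(
            ["pgrep", "-x", name], capture_output=True, text=True, check=False
        )
        pids[name] = {int(p) for p in result.stdout.split()}
    return pids


def replaced(before, after):
    """Return process names for which at least one earlier PID has gone."""
    return sorted(
        name for name in before if before[name] - after.get(name, set())
    )


if __name__ == "__main__":
    before = snapshot()
    input("Activate the node in the GUI, then press Enter... ")
    print("Processes that were replaced:", replaced(before, snapshot()))
```

If only glusterd shows up in the result, the bricks and shd survived; if all three names show up, everything was stopped and started again.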

Thanks,

  -Darrell

Darrell Budic