[ovirt-users] [Gluster-users] Gluster services won't start any more

Krishnan Parthasarathi kparthas at redhat.com
Fri Mar 13 02:49:27 EDT 2015


> > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected
> > new install. Setting op-version to maximum : 30600

The above message indicates that the /var/lib/glusterd/glusterd.info file, which carries the
identity (UUID) of the node and the operating version of the glusterd binary, was empty.
This _shouldn't_ happen. We need to check the glusterd log for messages from around the time
the /var filesystem was full to understand why it did.
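
To confirm that glusterd.info really is empty (or has lost its UUID), a small check
like the one below may help. It is only a sketch in Python, and it assumes the file
holds plain key=value lines such as UUID=... and operating-version=...; please compare
against a healthy node before relying on it. The function names are made up for
illustration and are not part of gluster.

```python
from pathlib import Path


def read_glusterd_info(path="/var/lib/glusterd/glusterd.info"):
    """Return the key=value pairs found in glusterd.info.

    Returns an empty dict if the file is missing or empty.
    Assumption: the file is a list of simple key=value lines.
    """
    p = Path(path)
    if not p.is_file():
        return {}
    info = {}
    for line in p.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            info[key.strip()] = value.strip()
    return info


def looks_like_new_install(info):
    """True when no UUID is recorded, i.e. the node has lost its identity."""
    return not info.get("UUID")
```

Calling looks_like_new_install(read_glusterd_info()) on the affected node should return
True if the file is in the state the log message suggests.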

> > [2015-03-12 09:08:15.166709] E [xlator.c:425:xlator_init] 0-management:
> > Initialization of volume 'management' failed, review your volfile again
> > [2015-03-12 09:08:15.166729] E [graph.c:322:glusterfs_graph_init]
> > 0-management: initializing translator failed
> > [2015-03-12 09:08:15.166737] E [graph.c:525:glusterfs_graph_activate]
> > 0-graph: init failed

As part of the 'init' process, glusterd resolves the identities of the daemons
that need to be spawned to host volumes. This resolution fails
if the identity of this node changes between a stop and a start of the glusterd service.
glusterd won't start until this inconsistency is resolved.

> > [2015-03-12 09:08:15.166987] W [glusterfsd.c:1194:cleanup_and_exit] (-->
> > 0-: received signum (0), shutting down
> >
> > Can you please help?

To get out of this situation, we need to reconstruct the configuration files
that are 'out of date' with respect to the cluster. This could be tedious, but it is
possible as long as the other nodes didn't have their /var filesystem fill up.
Each glusterd maintains its own copy of the volume and peer configuration under /var/lib/glusterd.

* /var/lib/glusterd/peers - Holds one file for every peer, excluding 'self'.
  This implies that the remaining nodes in the cluster each hold a peer file for this
  node, so we can determine this node's identity from them and use it to reconstruct
  /var/lib/glusterd/glusterd.info on this node.
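
The lookup described above could be sketched roughly as follows, in Python. It assumes
you have a copy of a healthy node's /var/lib/glusterd/peers directory, that each peer
file is named after the peer's UUID, and that it contains key=value lines including
uuid=... and hostname1=...; these are assumptions about the on-disk format, so check a
real peer file first. The function names and the hostname argument are hypothetical.
The operating version should be taken from a healthy node's own glusterd.info rather
than guessed.

```python
from pathlib import Path


def parse_kv(path):
    """Parse a file of key=value lines into a dict (assumed format)."""
    entry = {}
    for line in Path(path).read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


def find_peer_uuid(peers_dir, hostname):
    """Return the UUID a healthy node has recorded for `hostname`, or None."""
    for peer_file in sorted(Path(peers_dir).iterdir()):
        if not peer_file.is_file():
            continue
        entry = parse_kv(peer_file)
        # Collect hostname1, hostname2, ... values from the peer file.
        names = [v for k, v in entry.items() if k.startswith("hostname")]
        if hostname in names:
            # Fall back to the file name, assumed to be the peer's UUID.
            return entry.get("uuid", peer_file.name)
    return None


def render_glusterd_info(uuid, op_version):
    """Build the text for glusterd.info; this does not write anything."""
    return f"UUID={uuid}\noperating-version={op_version}\n"
```

The sketch deliberately only returns the text, so that it can be reviewed before
anything under /var/lib/glusterd on the affected node is touched.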

For other files under /var/lib/glusterd that are empty, we can use the fact that
each node keeps a copy of the configuration and reconstruct them from a healthy node's copy.
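
To find which files need attention, something along these lines could list the
zero-length files under the configuration directory. Again this is just a sketch in
Python with a made-up function name; some files may be legitimately empty, so the
output is a list of candidates to compare against a healthy node, not a list of
confirmed damage.

```python
import os


def find_empty_files(root="/var/lib/glusterd"):
    """Return the paths (relative to root) of all zero-length regular files."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and os.path.getsize(full) == 0:
                empty.append(os.path.relpath(full, root))
    return sorted(empty)
```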

Hope that helps,
kp

