[ovirt-users] [Gluster-users] Gluster services won't start any more

RASTELLI Alessandro alessandro.rastelli at skytv.it
Fri Mar 13 08:30:33 UTC 2015


Hi,
I've reconstructed the glusterd.info file and the peer files, and that solved the issue.
Next time it happens, I'll collect the glusterd log that filled up the filesystem for further analysis.
Thank you

A.


-----Original Message-----
From: Krishnan Parthasarathi [mailto:kparthas at redhat.com] 
Sent: Friday, 13 March 2015 07:49
To: RASTELLI Alessandro
Cc: users at ovirt.org; gluster-users; BREGA Martino; Sahina Bose
Subject: Re: [Gluster-users] [ovirt-users] Gluster services won't start any more

> > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: 
> > Detected new install. Setting op-version to maximum : 30600

The above message indicates that the /var/lib/glusterd/glusterd.info file, which carries the identity (UUID) of the node and the operating version of the glusterd binary, was empty.
This _shouldn't_ happen. We need to check the glusterd log for messages around the time the /var filesystem was full to understand why it happened.
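
A quick way to confirm this state on a node is to check whether glusterd.info is empty or lacks a UUID. The following is only a minimal sketch. It assumes the file holds plain key=value lines such as UUID=... and operating-version=..., and the helper names are hypothetical, not part of glusterd.

```python
from pathlib import Path


def read_glusterd_info(path="/var/lib/glusterd/glusterd.info"):
    """Return the key=value pairs found in glusterd.info.

    Returns an empty dict if the file is missing or empty.
    Assumption: the file consists of simple key=value lines.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return {}
    info = {}
    for line in p.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            info[key.strip()] = value.strip()
    return info


def looks_like_new_install(info):
    # Without a stored UUID, glusterd has no recorded identity for this node.
    return "UUID" not in info


if __name__ == "__main__":
    info = read_glusterd_info()
    if looks_like_new_install(info):
        print("glusterd.info is missing, empty, or has no UUID")
    else:
        print("UUID:", info["UUID"])
```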

> > [2015-03-12 09:08:15.166709] E [xlator.c:425:xlator_init] 0-management:
> > Initialization of volume 'management' failed, review your volfile 
> > again
> > [2015-03-12 09:08:15.166729] E [graph.c:322:glusterfs_graph_init]
> > 0-management: initializing translator failed
> > [2015-03-12 09:08:15.166737] E 
> > [graph.c:525:glusterfs_graph_activate]
> > 0-graph: init failed

As part of the 'init' process, glusterd resolves the identities of the daemons it needs to spawn to host volumes. This resolution fails if the identity of the node changes between a stop and a start of the glusterd service.
glusterd won't start until this inconsistency is resolved.

> > [2015-03-12 09:08:15.166987] W [glusterfsd.c:1194:cleanup_and_exit] 
> > (-->
> > 0-: received signum (0), shutting down
> >
> > Can you please help?

To get out of this situation, we need to reconstruct the configuration files that are 'out of date' with respect to the cluster. This is tedious but possible, provided the /var filesystem on the other nodes did not fill up as well.
Each glusterd maintains its own copy of the volume and peer configuration under /var/lib/glusterd.

* /var/lib/glusterd/peers - Holds one file for every peer, excluding 'self'.
  Every other node in the cluster therefore holds a file describing this node, so we
  can determine this node's identity from the remaining nodes and use it to
  reconstruct /var/lib/glusterd/glusterd.info on this node.
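
The lookup described above could be sketched roughly as follows. This is not an official recovery tool. It assumes you have copied the peers directory from a healthy node to a local path, that each peer file holds key=value lines including uuid= and one or more hostname entries (hostname1=...), and that glusterd.info uses UUID= and operating-version= lines. The function names and the op_version parameter are hypothetical; the operating version must match what the rest of the cluster uses.

```python
from pathlib import Path


def parse_kv(text):
    """Parse simple key=value lines into a dict."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            result[key.strip()] = value.strip()
    return result


def find_own_uuid(peers_dir, own_hostnames):
    """Search a copy of ANOTHER node's peers directory for this node's entry.

    Returns the uuid recorded for the first peer whose hostname matches
    one of own_hostnames, or None if no peer file matches.
    """
    wanted = set(own_hostnames)
    for peer_file in sorted(Path(peers_dir).iterdir()):
        if not peer_file.is_file():
            continue
        entry = parse_kv(peer_file.read_text())
        hostnames = {v for k, v in entry.items() if k.startswith("hostname")}
        if hostnames & wanted:
            return entry.get("uuid")
    return None


def render_glusterd_info(uuid, op_version):
    # Assumed file layout; compare with glusterd.info on a healthy node first.
    return f"UUID={uuid}\noperating-version={op_version}\n"
```

Review the rendered text against glusterd.info on a healthy node before writing it to /var/lib/glusterd/glusterd.info, and do so only while glusterd is stopped.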

For other files under /var/lib/glusterd that are empty, we can use the fact that each node keeps its own copy of the configuration and reconstruct them from a healthy node's copy.
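
To see which files need this treatment, one option is to list the zero-length files under /var/lib/glusterd. The sketch below is only a starting point: the function name is hypothetical, and a zero-length file is not necessarily damaged, so compare each hit with the same path on a healthy node.

```python
import os


def find_empty_files(root="/var/lib/glusterd"):
    """Return the sorted relative paths of zero-length regular files under root."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and os.path.getsize(full) == 0:
                empty.append(os.path.relpath(full, root))
    return sorted(empty)


if __name__ == "__main__":
    for rel_path in find_empty_files():
        print(rel_path)
```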

Hope that helps,
kp

