Hi,
I've reconstructed the glusterd.info file and the peer files, which solved the issue.
If it happens again, I'll collect the glusterd log that filled up the filesystem for
further analysis.
Thank you
A.
-----Original Message-----
From: Krishnan Parthasarathi [mailto:kparthas@redhat.com]
Sent: Friday, 13 March 2015 07:49
To: RASTELLI Alessandro
Cc: users(a)ovirt.org; gluster-users; BREGA Martino; Sahina Bose
Subject: Re: [Gluster-users] [ovirt-users] Gluster services won't start any more
> [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
> Detected new install. Setting op-version to maximum : 30600
The above message indicates that the /var/lib/glusterd/glusterd.info file, which carries the
identity (UUID) of the node and the operating version of the glusterd binary, was empty.
This _shouldn't_ happen. We need to check the glusterd log for messages around the time the
/var filesystem was full to understand why it happened.
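A rough sketch of how one could pull those messages out of the log, assuming every entry starts with the same "[timestamp] LEVEL" prefix as the lines quoted in this thread. The function name and the default 10-minute window are only illustrative choices:

```python
import re
from datetime import datetime, timedelta

# Matches the prefix of log lines such as:
# [2015-03-12 09:08:15.166709] E [xlator.c:425:xlator_init] 0-management: ...
LOG_LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\] ([A-Z]) ")


def entries_near(lines, when, minutes=10, levels=("E", "W")):
    """Return log lines with one of `levels` logged within +/- `minutes` of `when`.

    Hypothetical helper for illustration; lines that do not start with the
    expected prefix (for example wrapped continuation lines) are skipped.
    """
    window = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if abs(stamp - when) <= window and match.group(2) in levels:
            hits.append(line.rstrip("\n"))
    return hits
```

One would open the glusterd log on the affected node (its location depends on the installation) and pass the file object as `lines`, together with the approximate time at which /var filled up.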
> [2015-03-12 09:08:15.166709] E [xlator.c:425:xlator_init] 0-management:
> Initialization of volume 'management' failed, review your volfile again
> [2015-03-12 09:08:15.166729] E [graph.c:322:glusterfs_graph_init]
> 0-management: initializing translator failed
> [2015-03-12 09:08:15.166737] E [graph.c:525:glusterfs_graph_activate]
> 0-graph: init failed
As part of the 'init' process, glusterd resolves the identities of the daemons that need
to be spawned to host volumes. This resolution fails if the identity of
this node changes between a stop and a start of the glusterd service.
glusterd won't start until this inconsistency is resolved.
> [2015-03-12 09:08:15.166987] W [glusterfsd.c:1194:cleanup_and_exit] (-->
> 0-: received signum (0), shutting down
>
> Can you please help?
To get out of this situation, we need to reconstruct the configuration files that are
'out of date' with respect to the cluster. This can be tedious, but it is possible as long as
the /var filesystem on the other nodes didn't fill up as well.
Each glusterd maintains its copy of volume and peer configuration under
/var/lib/glusterd.
* /var/lib/glusterd/peers - Holds one file for every peer, excluding 'self'.
This implies that, with the help of the remaining nodes in the cluster, we can determine
this node's identity: each of them holds a peer file for this node. This means we can
reconstruct /var/lib/glusterd/glusterd.info on this node.
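As an illustration only, here is a sketch of that lookup. It assumes you have copied the peers directory from a healthy node to a scratch location, that peer files are plain key=value text with a `uuid` key and one or more `hostname...` keys, and that glusterd.info uses `UUID=` and `operating-version=` lines. Those key names are my assumption about the on-disk format, so compare against a healthy node's files before relying on them. The helper names are hypothetical:

```python
from pathlib import Path


def read_kv(path):
    """Parse a simple key=value text file into a dict."""
    pairs = {}
    for line in Path(path).read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            pairs[key.strip()] = value.strip()
    return pairs


def find_uuid_for_host(peers_dir, hostname):
    """Search a copy of another node's peers directory for `hostname`.

    Assumes each peer file has a 'uuid' key and keys starting with
    'hostname' (assumed names, not confirmed in this thread).
    """
    for peer_file in sorted(Path(peers_dir).iterdir()):
        if not peer_file.is_file():
            continue
        info = read_kv(peer_file)
        hostnames = [v for k, v in info.items() if k.startswith("hostname")]
        if hostname in hostnames:
            return info.get("uuid")
    return None


def render_glusterd_info(uuid, op_version):
    """Build glusterd.info contents (key names are an assumption)."""
    return f"UUID={uuid}\noperating-version={op_version}\n"
```

The sketch only returns the text and deliberately does not write anything under /var/lib/glusterd. The operating version should be the one the rest of the cluster is running, taken from a healthy node, rather than a guess.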
For other files under /var/lib/glusterd that are empty, we can rely on the fact that every
node keeps a copy of the configuration and use a healthy node's copy to reconstruct them.
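To find out which files need attention, something along these lines could list the zero-length files under the configuration directory. It is a minimal sketch with a hypothetical function name. A zero-length file is only a candidate for reconstruction, so check each one against the same path on a healthy node:

```python
import os


def empty_files(root):
    """Return the sorted paths of all zero-length regular files under `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                found.append(path)
    return sorted(found)
```

Running `empty_files("/var/lib/glusterd")` on the affected node gives the list of files to compare.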
Hope that helps,
kp