[ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain always complains about "unsynced" elements
Ravishankar N
ravishankar at redhat.com
Thu Jul 20 09:34:00 UTC 2017
On 07/20/2017 02:20 PM, yayo (j) wrote:
> Hi,
>
> Thank you for the answer, and sorry for the delay:
>
> 2017-07-19 16:55 GMT+02:00 Ravishankar N <ravishankar at redhat.com>:
>
> 1. What does the glustershd.log say on all 3 nodes when you run
> the command? Does it complain about these files?
>
>
> No, glustershd.log is clean: no extra log entries after running the
> command on any of the 3 nodes
Could you check if the self-heal daemon on all nodes is connected to the
3 bricks? You will need to check the glustershd.log for that.
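For example, something like this (assuming the default log location;
the exact message wording varies between gluster versions) should show
whether shd has a live connection to each brick:

    # run on each node; in glustershd.log each brick of the "engine"
    # volume appears as engine-client-0/1/2
    grep -iE 'connect' /var/log/glusterfs/glustershd.log | tail -n 20

A recent "disconnected" message for one of the clients with no later
"Connected" one would explain why the heals are not progressing.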
If it is not connected, try restarting the shd using `gluster volume
start engine force`, then launch the heal command like you did earlier
and see if heals happen.
If they don't, please provide the getfattr output of the 12 files from
all 3 nodes using `getfattr -d -m . -e hex
/gluster/engine/brick/<path-to-file>`.
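If it helps, here is a rough loop to collect all of them in one go
(just a sketch: `unsynced-paths.txt` is a hypothetical file holding the
12 paths reported by `gluster volume heal engine info`, relative to the
brick root):

    # run on each of the 3 nodes
    while read -r p; do
        echo "== $p =="
        getfattr -d -m . -e hex "/gluster/engine/brick/${p#/}"
    done < unsynced-paths.txt

That will let us compare the trusted.afr.* xattrs across the nodes.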
Thanks,
Ravi
> 2. Are these 12 files also present in the 3rd data brick?
>
>
> I've just checked: all the files exist on all 3 nodes
>
> 3. Can you provide the output of `gluster volume info` for
> this volume?
>
>
>
> Volume Name: engine
> Type: Replicate
> Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: node01:/gluster/engine/brick
> Brick2: node02:/gluster/engine/brick
> Brick3: node04:/gluster/engine/brick
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> storage.owner-uid: 36
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> storage.owner-gid: 36
> features.shard-block-size: 512MB
> network.ping-timeout: 30
> performance.strict-o-direct: on
> cluster.granular-entry-heal: on
> auth.allow: *
>
> server.allow-insecure: on
>
>
>
>
>> Some extra info:
>>
>> We have recently changed the gluster volume from replica 2 (fully
>> replicated) + 1 arbiter to a fully replicated replica 3 cluster
>>
>
> Just curious, how did you do this? A `remove-brick` of the arbiter
> brick followed by an `add-brick` to increase it to replica 3?
>
>
> Yes
>
>
> # gluster volume remove-brick engine replica 2 node03:/gluster/data/brick force *(OK!)*
>
> # gluster volume heal engine info *(no entries!)*
>
> # gluster volume add-brick engine replica 3 node04:/gluster/engine/brick *(OK!)*
>
> *After a few minutes*
>
> [root at node01 ~]# gluster volume heal engine info
> Brick node01:/gluster/engine/brick
> Status: Connected
> Number of entries: 0
>
> Brick node02:/gluster/engine/brick
> Status: Connected
> Number of entries: 0
>
> Brick node04:/gluster/engine/brick
> Status: Connected
> Number of entries: 0
>
> Thanks,
> Ravi
>
>
> Another bit of extra info (I don't know if this can be the problem):
> five days ago a blackout suddenly shut down the network switch (also
> carrying the gluster network) of node03 and node04... but I don't know
> whether this problem appeared after that blackout
>
> Thank you!
>
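Regarding the blackout: that could well be related. It is worth
confirming that all bricks and self-heal daemons came back up once the
switch was restored, for example with:

    gluster peer status
    gluster volume status engine

In the `volume status` output, the brick and "Self-heal Daemon" rows
should all show "Y" in the Online column.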