[ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain always complains about "unsynced" elements
Ravishankar N
ravishankar at redhat.com
Sat Jul 22 06:13:29 UTC 2017
On 07/21/2017 11:41 PM, yayo (j) wrote:
> Hi,
>
> Sorry to follow up again, but, checking the oVirt interface I've
> found that oVirt reports the "engine" volume as an "arbiter"
> configuration and the "data" volume as a fully replicated volume. Check
> these screenshots:
This is probably some refresh bug in the UI, Sahina might be able to
tell you.
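In the meantime you can cross-check what the volumes really are from the
CLI: the brick-count line of "gluster volume info" distinguishes the two
(IIRC, a plain replica 3 volume prints "1 x 3 = 3" while an arbiter
volume prints "1 x (2 + 1) = 3"):

    gluster volume info engine | grep 'Number of Bricks'

Your output below shows "1 x 3 = 3" for both volumes, so the CLI at
least agrees that they are full replica 3.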
>
> https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?usp=sharing
>
> But the "gluster volume info" command report that all 2 volume are
> full replicated:
>
>
> Volume Name: data
> Type: Replicate
> Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: gdnode01:/gluster/data/brick
> Brick2: gdnode02:/gluster/data/brick
> Brick3: gdnode04:/gluster/data/brick
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> storage.owner-uid: 36
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> storage.owner-gid: 36
> features.shard-block-size: 512MB
> network.ping-timeout: 30
> performance.strict-o-direct: on
> cluster.granular-entry-heal: on
> auth.allow: *
> server.allow-insecure: on
>
>
> Volume Name: engine
> Type: Replicate
> Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: gdnode01:/gluster/engine/brick
> Brick2: gdnode02:/gluster/engine/brick
> Brick3: gdnode04:/gluster/engine/brick
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> storage.owner-uid: 36
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> storage.owner-gid: 36
> features.shard-block-size: 512MB
> network.ping-timeout: 30
> performance.strict-o-direct: on
> cluster.granular-entry-heal: on
> auth.allow: *
> server.allow-insecure: on
>
>
> 2017-07-21 19:13 GMT+02:00 yayo (j) <jaganz at gmail.com>:
>
>     2017-07-20 14:48 GMT+02:00 Ravishankar N <ravishankar at redhat.com>:
>
>
>     But it does say something. All these gfids of completed heals
>     in the log below are for the ones that you have given the
>     getfattr output of. So what is likely happening is that there is an
>     intermittent connection problem between your mount and the
>     brick process, leading to pending heals again after the heal
>     gets completed, which is why the numbers vary each
>     time. You would need to check why that is the case.
> Hope this helps,
> Ravi
>
>
>>
>> [2017-07-20 09:58:46.573079] I [MSGID: 108026]
>> [afr-self-heal-common.c:1254:afr_log_selfheal]
>> 0-engine-replicate-0: Completed data selfheal on
>> e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2
>> [2017-07-20 09:59:22.995003] I [MSGID: 108026]
>> [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
>> 0-engine-replicate-0: performing metadata selfheal on
>> f05b9742-2771-484a-85fc-5b6974bcef81
>> [2017-07-20 09:59:22.999372] I [MSGID: 108026]
>> [afr-self-heal-common.c:1254:afr_log_selfheal]
>> 0-engine-replicate-0: Completed metadata selfheal on
>> f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1 sinks=2
>>
>
>
> Hi,
>
> following your suggestion, I've checked the "peer" status and I
> found that there are too many names for the hosts; I don't know if
> this can be the problem or part of it:
>
> gluster peer status on NODE01:
> Number of Peers: 2
>
> Hostname: dnode02.localdomain.local
> Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
> State: Peer in Cluster (Connected)
> Other names:
> 192.168.10.52
> dnode02.localdomain.local
> 10.10.20.90
> 10.10.10.20
>
> gluster peer status on NODE02:
> Number of Peers: 2
>
> Hostname: dnode01.localdomain.local
> Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
> State: Peer in Cluster (Connected)
> Other names:
> gdnode01
> 10.10.10.10
>
> Hostname: gdnode04
> Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828
> State: Peer in Cluster (Connected)
> Other names:
> 192.168.10.54
> 10.10.10.40
>
> gluster peer status on NODE04:
> Number of Peers: 2
>
> Hostname: dnode02.neridom.dom
> Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
> State: Peer in Cluster (Connected)
> Other names:
> 10.10.20.90
> gdnode02
> 192.168.10.52
> 10.10.10.20
>
> Hostname: dnode01.localdomain.local
> Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
> State: Peer in Cluster (Connected)
> Other names:
> gdnode01
> 10.10.10.10
>
> All these IPs are pingable and the hosts are resolvable across all 3
> nodes, but only the 10.10.10.0 network is the dedicated network for
> gluster (resolved using the gdnode* host names) ... Do you think
> that removing the other entries can fix the problem? If so, sorry,
> but how can I remove the other entries?
>
I don't think having extra entries could be a problem. Did you check the
fuse mount logs for disconnect messages that I referred to in the other
email?
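Something along these lines should surface them (a rough sketch; the
exact log file name depends on your mount point, so adjust the path for
your setup):

    # client-side disconnects show up in the fuse mount log as
    # "disconnected from <volume>-client-<N>" messages
    grep -i "disconnect" \
        /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gdnode01:_engine.log | tail -20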
>
>
> And what about SELinux?
>
Not sure about this. See if there are disconnect messages in the mount
logs first.
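To see whether the pending-heal count is actually oscillating rather
than stuck, you could also sample it periodically, e.g.:

    # print the per-brick "Number of entries" for the engine volume every minute
    watch -n 60 'gluster volume heal engine info | grep "Number of entries"'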
-Ravi
>
>
> Thank you
>
>
>
>
>
> --
> Linux User: 369739 http://counter.li.org