On 07/21/2017 11:41 PM, yayo (j) wrote:
Hi,
Sorry to follow up again, but, checking the oVirt interface, I've found that oVirt reports the "engine" volume as an "arbiter" configuration and the "data" volume as a fully replicated volume. Check these screenshots:
This is probably some refresh bug in the UI; Sahina might be able to tell you. I don't think having extra entries could be a problem. Did you check the fuse mount logs for the disconnect messages that I referred to in the other email?
But the "gluster volume info" command reports that both volumes are fully replicated:
Volume Name: data
Type: Replicate
Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gdnode01:/gluster/data/brick
Brick2: gdnode02:/gluster/data/brick
Brick3: gdnode04:/gluster/data/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.owner-uid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-gid: 36
features.shard-block-size: 512MB
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: on
auth.allow: *
server.allow-insecure: on
Volume Name: engine
Type: Replicate
Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gdnode01:/gluster/engine/brick
Brick2: gdnode02:/gluster/engine/brick
Brick3: gdnode04:/gluster/engine/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.owner-uid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-gid: 36
features.shard-block-size: 512MB
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: on
auth.allow: *
server.allow-insecure: on
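As a quick cross-check independent of the UI, the "Number of Bricks" line of `gluster volume info` already distinguishes the two layouts. The sample lines below are illustrative, not taken from a live cluster; this is just a minimal sketch of the difference:

```shell
# A plain replica 3 volume prints "1 x 3 = 3"; a replica 3 arbiter 1
# volume prints "1 x (2 + 1) = 3" and tags its third brick "(arbiter)".
check_layout() {
  case "$1" in
    *"(2 + 1)"*) echo "arbiter" ;;
    *)           echo "full replica" ;;
  esac
}

check_layout "Number of Bricks: 1 x 3 = 3"
check_layout "Number of Bricks: 1 x (2 + 1) = 3"
```

Both volumes above print "1 x 3 = 3" with no "(arbiter)" tag, which supports the CLI view over the UI.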
2017-07-21 19:13 GMT+02:00 yayo (j) <jaganz@gmail.com>:
2017-07-20 14:48 GMT+02:00 Ravishankar N <ravishankar@redhat.com>:
But it does say something. All these gfids of completed heals in the log below are the for the ones that you have given the getfattr output of. So what is likely happening is there is an intermittent connection problem between your mount and the brick process, leading to pending heals again after the heal gets completed, which is why the numbers are varying each time. You would need to check why that is the case.
Hope this helps,
Ravi
[2017-07-20 09:58:46.573079] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1  sinks=2
[2017-07-20 09:59:22.995003] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-20 09:59:22.999372] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1  sinks=2
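If you want to see which file a gfid from these log lines belongs to: on each brick, every file has a hardlink under `.glusterfs` at a path built from the first four hex characters of its gfid. A hedged sketch (the brick path is taken from the volume info in this thread, the gfid from the log above):

```shell
gfid="e6dfd556-340b-4b76-b47b-7b6f5bd74327"
brick="/gluster/engine/brick"

# Build the .glusterfs hardlink path: <brick>/.glusterfs/<xx>/<yy>/<gfid>,
# where xx and yy are the first two pairs of hex characters of the gfid.
d1=$(printf '%s' "$gfid" | cut -c1-2)
d2=$(printf '%s' "$gfid" | cut -c3-4)
gfid_path="$brick/.glusterfs/$d1/$d2/$gfid"
echo "$gfid_path"

# On a live brick you could then map it to the real file name, e.g.:
#   find "$brick" -samefile "$gfid_path" ! -path '*/.glusterfs/*'
```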
Hi,
following your suggestion, I've checked the "peer" status and found that there are too many names for the hosts. I don't know if this can be the problem, or part of it:
gluster peer status on NODE01:
Number of Peers: 2

Hostname: dnode02.localdomain.local
Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
State: Peer in Cluster (Connected)
Other names:
192.168.10.52
dnode02.localdomain.local
10.10.20.90
10.10.10.20
gluster peer status on NODE02:
Number of Peers: 2

Hostname: dnode01.localdomain.local
Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
State: Peer in Cluster (Connected)
Other names:
gdnode01
10.10.10.10

Hostname: gdnode04
Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828
State: Peer in Cluster (Connected)
Other names:
192.168.10.54
10.10.10.40
gluster peer status on NODE04:
Number of Peers: 2

Hostname: dnode02.neridom.dom
Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
State: Peer in Cluster (Connected)
Other names:
10.10.20.90
gdnode02
192.168.10.52
10.10.10.20

Hostname: dnode01.localdomain.local
Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
State: Peer in Cluster (Connected)
Other names:
gdnode01
10.10.10.10
All these IPs are pingable and the hostnames are resolvable across all 3 nodes, but only the 10.10.10.0 network is the dedicated network for gluster (resolved using the gdnode* host names)... Do you think that removing the other entries can fix the problem? If so, sorry, but how can I remove the other entries?
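One way to sanity-check the naming situation is to verify that each gluster peer name resolves into the dedicated 10.10.10.0/24 network. The helper and sample addresses below are illustrative, not a real diagnostic from this cluster; on a live node you would feed it actual lookups, e.g. `in_gluster_net "$(getent hosts gdnode01 | awk '{print $1}')"`:

```shell
# Classify an address as inside or outside the dedicated gluster network.
in_gluster_net() {
  case "$1" in
    10.10.10.*) echo "ok" ;;
    *)          echo "wrong network" ;;
  esac
}

in_gluster_net 10.10.10.20     # an address on the gluster network
in_gluster_net 192.168.10.52   # one of the extra "Other names" entries
```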
Not sure about this. See if there are disconnect messages in the mount logs first.
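To make the mount-log check concrete: counting disconnect messages over time is a simple way to spot an intermittent connection. The sample log lines below are illustrative, not from the real cluster; on a live node you would grep the actual fuse mount log under /var/log/glusterfs/ instead of this sample file:

```shell
# Write two hypothetical sample lines standing in for a real mount log.
cat > /tmp/mount-log.sample <<'EOF'
[2017-07-20 09:57:00.000000] I 0-engine-client-2: disconnected from engine-client-2
[2017-07-20 09:58:05.000000] I 0-engine-client-2: connected to engine-client-2
EOF

# A disconnect count that keeps growing would match the theory of an
# intermittent connection between the mount and the brick process.
grep -ci "disconnected" /tmp/mount-log.sample
```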
And, what about SELinux?
-Ravi
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users