Hi Jiffin,

I'm observing the same behaviour like last time .
This time after upgrading from gluster v 7.0 to 7.2 as follows:


[2020-01-30 23:42:26.039822] T [MSGID: 0] [posix.c:2994:pl_lookup] 0-stack-trace: stack-address: 0x7f7b50003958, winding from data_fast-locks to data_f
ast-access-control 
[2020-01-30 23:42:26.039837] I [MSGID: 139001] [posix-acl.c:262:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:868c1fba-1f38-4
e91-a664-6e49fddb3f27-GRAPH_ID:0-PID:9525-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-1-RECON_NO:-1, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, r
eq(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
[2020-01-30 23:42:26.039857] D [MSGID: 0] [posix-acl.c:1055:posix_acl_lookup] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-access-control re
turned -1 error: Permission denied [Permission denied]
[2020-01-30 23:42:26.039868] D [MSGID: 0] [posix.c:2985:pl_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-locks returned -1 error:
 Permission denied [Permission denied]
[2020-01-30 23:42:26.039879] D [MSGID: 0] [upcall.c:756:up_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-upcall returned -1 error
: Permission denied [Permission denied]
[2020-01-30 23:42:26.039889] D [MSGID: 0] [defaults.c:1349:default_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-io-threads retur
ned -1 error: Permission denied [Permission denied]
[2020-01-30 23:42:26.039901] T [marker.c:2907:marker_lookup_cbk] 0-data_fast-marker: lookup failed with Permission denied
[2020-01-30 23:42:26.039908] D [MSGID: 0] [marker.c:2944:marker_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-marker returned -1 
error: Permission denied [Permission denied]
[2020-01-30 23:42:26.039922] D [MSGID: 0] [index.c:2070:index_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-index returned -1 err
or: Permission denied [Permission denied]
[2020-01-30 23:42:26.039933] D [MSGID: 0] [io-stats.c:2228:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, /gluster_bricks/data_fast
/data_fast returned -1 error: Permission denied [Permission denied]
[2020-01-30 23:42:26.039949] E [MSGID: 115050] [server-rpc-fops_v2.c:157:server4_lookup_cbk] 0-data_fast-server: 16268: LOOKUP /.shard/f476698a-d8d2-4a
b2-b9c4-4c276c2eef43.79 (be318638-e8a0-4c6d-977d-7a937aa84806/f476698a-d8d2-4ab2-b9c4-4c276c2eef43.79), client: CTX_ID:868c1fba-1f38-4e91-a664-6e49fddb
3f27-GRAPH_ID:0-PID:9525-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-1-RECON_NO:-1, error-xlator: data_fast-access-control [Permission denied]
[2020-01-30 23:42:26.039964] T [rpcsvc.c:1529:rpcsvc_submit_generic] 0-rpc-service: Tx message: 344
[2020-01-30 23:42:26.039971] T [rpcsvc.c:1065:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 368, payload: 344, rpc hdr: 24
[2020-01-30 23:42:26.040007] T [rpcsvc.c:1581:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x24c1, Program: GlusterFS 4.
x v1, ProgVers: 400, Proc: 27) to rpc-transport (tcp.data_fast-server)


Are you sure it's fixed?
If yes, then this looks like a regresion.

By the way, I'm not using ACL - how can I permanently disable it ,so even if ACL gets corrupted -> my oVirt will survive ?

Thanks in advance for your answer.

Best Regards,
Strahil Nikolov



В сряда, 4 декември 2019 г., 13:28:39 ч. Гринуич+2, Jiffin Thottan <jthottan@redhat.com> написа:


Hi Krutika,

Apparently, in context acl info got corrupted see brick logs

[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
client:
CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
req(uid:107,gid:107,perm:1,ngrps:4),
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
[Permission denied]

which resulted in the situation. There was one bug similar was reported https://bugzilla.redhat.com/show_bug.cgi?id=1668286 and

it got fixed in 6.6 release IMO https://review.gluster.org/#/c/glusterfs/+/23233/. But here he mentioned he saw the issue when

he upgraded from 6.5 to 6.6

One way to workaround is to perform a dummy setfacl(preferably using root) on the corrupted files which will forcefully fetch the acl

info again from backend and update the context. Another approach to restart brick process(kill and vol start force)

Regards,
Jiffin

----- Original Message -----
From: "Krutika Dhananjay" <kdhananj@redhat.com>
To: "Strahil Nikolov" <hunter86_bg@yahoo.com>, "Jiffin Thottan" <jthottan@redhat.com>, "raghavendra talur" <rtalur@redhat.com>
Cc: "Nir Soffer" <nsoffer@redhat.com>, "Rafi Kavungal Chundattu Parambil" <rkavunga@redhat.com>, "users" <users@ovirt.org>, "gluster-user" <gluster-users@gluster.org>
Sent: Monday, December 2, 2019 11:48:22 AM
Subject: Re: [ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing

Sorry about the late response.

I looked at the logs. These errors are originating from posix-acl
translator -



*[2019-11-17 07:55:47.090065] E [MSGID: 115050]
[server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162496:
LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.6
(be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.6),
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
error-xlator: data_fast-access-control [Permission denied][2019-11-17
07:55:47.090174] I [MSGID: 139001]
[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
req(uid:36,gid:36,perm:1,ngrps:3),
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
[Permission denied][2019-11-17 07:55:47.090209] E [MSGID: 115050]
[server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162497:
LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.7
(be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.7),
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
error-xlator: data_fast-access-control [Permission denied][2019-11-17
07:55:47.090299] I [MSGID: 139001]
[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
req(uid:36,gid:36,perm:1,ngrps:3),
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
[Permission denied]*

Jiffin/Raghavendra Talur,
Can you help?

-Krutika

On Wed, Nov 27, 2019 at 2:11 PM Strahil Nikolov <hunter86_bg@yahoo.com>
wrote:

> Hi Nir,All,
>
> it seems that 4.3.7 RC3 (and even RC4) are not the problem here(attached
> screenshot of oVirt running on v7 gluster).
> It seems strange that both my serious issues with oVirt are related to
> gluster issue (1st gluster v3  to v5 migration and now this one).
>
> I have just updated to gluster v7.0 (Centos 7 repos), and rebooted all
> nodes.
> Now both Engine and all my VMs are back online - so if you hit issues with
> 6.6 , you should give a try to 7.0 (and even 7.1 is coming soon) before
> deciding to wipe everything.
>
> @Krutika,
>
> I guess you will ask for the logs, so let's switch to gluster-users about
> this one ?
>
> Best Regards,
> Strahil Nikolov
>
> В понеделник, 25 ноември 2019 г., 16:45:48 ч. Гринуич-5, Strahil Nikolov <
> hunter86_bg@yahoo.com> написа:
>
>
> Hi Krutika,
>
> I have enabled TRACE log level for the volume data_fast,
>
> but the issue is not much clear:
> FUSE reports:
>
> [2019-11-25 21:31:53.478130] I [MSGID: 133022]
> [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of
> gfid=6d9ed2e5-d4f2-4749-839b-2f1
> 3a68ed472 from backend
> [2019-11-25 21:32:43.564694] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-25 21:32:43.565653] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-25 21:32:43.565689] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-25 21:32:43.565770] E [MSGID: 133010]
> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
> shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae
> [Permission denied]
> [2019-11-25 21:32:43.565858] W [fuse-bridge.c:2830:fuse_readv_cbk]
> 0-glusterfs-fuse: 279: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae
> fd=0x7fbf40005ea8 (Permission denied)
>
>
> While the BRICK logs on ovirt1/gluster1 report:
> 2019-11-25 21:32:43.564177] D [MSGID: 0] [io-threads.c:376:iot_schedule]
> 0-data_fast-io-threads: LOOKUP scheduled as fast priority fop
> [2019-11-25 21:32:43.564194] T [MSGID: 0]
> [defaults.c:2008:default_lookup_resume] 0-stack-trace: stack-address:
> 0x7fc02c00bbf8, winding from data_fast-io-threads to data_fast-upcall
> [2019-11-25 21:32:43.564206] T [MSGID: 0] [upcall.c:790:up_lookup]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-upcall
> to data_fast-leases
> [2019-11-25 21:32:43.564215] T [MSGID: 0] [defaults.c:2766:default_lookup]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-leases
> to data_fast-read-only
> [2019-11-25 21:32:43.564222] T [MSGID: 0] [defaults.c:2766:default_lookup]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from
> data_fast-read-only to data_fast-worm
> [2019-11-25 21:32:43.564230] T [MSGID: 0] [defaults.c:2766:default_lookup]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-worm
> to data_fast-locks
> [2019-11-25 21:32:43.564241] T [MSGID: 0] [posix.c:2897:pl_lookup]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-locks
> to data_fast-access-control
> [2019-11-25 21:32:43.564254] I [MSGID: 139001]
> [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
> client:
> CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
> gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
> req(uid:107,gid:107,perm:1,ngrps:4),
> ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
> [Permission denied]
> [2019-11-25 21:32:43.564268] D [MSGID: 0]
> [posix-acl.c:1057:posix_acl_lookup] 0-stack-trace: stack-address:
> 0x7fc02c00bbf8, data_fast-access-control returned -1 error: Permission
> denied [Permission denied]
> [2019-11-25 21:32:43.564279] D [MSGID: 0] [posix.c:2888:pl_lookup_cbk]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-locks returned -1
> error: Permission denied [Permission denied]
> [2019-11-25 21:32:43.564289] D [MSGID: 0] [upcall.c:769:up_lookup_cbk]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-upcall returned -1
> error: Permission denied [Permission denied]
> [2019-11-25 21:32:43.564302] D [MSGID: 0]
> [defaults.c:1349:default_lookup_cbk] 0-stack-trace: stack-address:
> 0x7fc02c00bbf8, data_fast-io-threads returned -1 error: Permission denied
> [Permission denied]
> [2019-11-25 21:32:43.564313] T [marker.c:2918:marker_lookup_cbk]
> 0-data_fast-marker: lookup failed with Permission denied
> [2019-11-25 21:32:43.564320] D [MSGID: 0]
> [marker.c:2955:marker_lookup_cbk] 0-stack-trace: stack-address:
> 0x7fc02c00bbf8, data_fast-marker returned -1 error: Permission denied
> [Permission denied]
> [2019-11-25 21:32:43.564334] D [MSGID: 0] [index.c:2070:index_lookup_cbk]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-index returned -1
> error: Permission denied [Permission denied]
>
>
>
> В понеделник, 25 ноември 2019 г., 23:10:41 ч. Гринуич+2, Strahil Nikolov <
> hunter86_bg@yahoo.com> написа:
>
>
> Hi Krutika,
>
> thanks for your assistance.
> Let me summarize some info about the volume:
>
> Volume Name: data_fast
> Type: Replicate
> Volume ID: 378804bf-2975-44d8-84c2-b541aa87f9ef
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster1:/gluster_bricks/data_fast/data_fast
> Brick2: gluster2:/gluster_bricks/data_fast/data_fast
> Brick3: ovirt3:/gluster_bricks/data_fast/data_fast (arbiter)
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: on
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> cluster.choose-local: on
> client.event-threads: 4
> server.event-threads: 4
> storage.owner-uid: 36
> storage.owner-gid: 36
> performance.strict-o-direct: on
> network.ping-timeout: 30
> cluster.granular-entry-heal: enable
> cluster.enable-shared-storage: enable
>
>
> [root@ovirt1 ~]# gluster volume get engine all | grep shard
> features.shard                          on
> features.shard-block-size              64MB
> features.shard-lru-limit                16384
> features.shard-deletion-rate            100
>
>
> On Sat, Nov 23, 2019 at 3:14 AM Nir Soffer <nsoffer@redhat.com> wrote:
>
> On Fri, Nov 22, 2019 at 10:41 PM Strahil Nikolov <hunter86_bg@yahoo.com>
> wrote:
>
> On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose <sabose@redhat.com> wrote:
>
>
>
> On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg@yahoo.com>
> wrote:
>
> Hi All,
>
> another clue in the logs :
> [2019-11-21 00:29:50.536631] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-21 00:29:50.536798] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-21 00:29:50.536959] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-21 00:29:50.537007] E [MSGID: 133010]
> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
> shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae
> [Permission denied]
> [2019-11-21 00:29:50.537066] W [fuse-bridge.c:2830:fuse_readv_cbk]
> 0-glusterfs-fuse: 12458: READ => -1
> gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fc63c00fe18 (Permission
> denied)
> [2019-11-21 00:30:01.177665] I [MSGID: 133022]
> [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of
> gfid=eb103fbf-80dc-425d-882f-1e4efe510db5 from backend
> [2019-11-21 00:30:13.132756] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
> remote operation failed. Path:
> /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-21 00:30:13.132824] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
> remote operation failed. Path:
> /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-21 00:30:13.133217] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
> remote operation failed. Path:
> /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-21 00:30:13.133238] E [MSGID: 133010]
> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
> shard 79 failed. Base file gfid = 17c663c2-f582-455b-b806-3b9d01fb2c6c
> [Permission denied]
> [2019-11-21 00:30:13.133264] W [fuse-bridge.c:2830:fuse_readv_cbk]
> 0-glusterfs-fuse: 12660: READ => -1
> gfid=17c663c2-f582-455b-b806-3b9d01fb2c6c fd=0x7fc63c007038 (Permission
> denied)
> [2019-11-21 00:30:38.489449] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
> remote operation failed. Path:
> /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-21 00:30:38.489520] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
> remote operation failed. Path:
> /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-21 00:30:38.489669] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
> remote operation failed. Path:
> /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6
> (00000000-0000-0000-0000-000000000000) [Permission denied]
> [2019-11-21 00:30:38.489717] E [MSGID: 133010]
> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
> shard 6 failed. Base file gfid = a10a5ae8-108b-4d78-9e65-cca188c27fc4
> [Permission denied]
> [2019-11-21 00:30:38.489777] W [fuse-bridge.c:2830:fuse_readv_cbk]
> 0-glusterfs-fuse: 12928: READ => -1
> gfid=a10a5ae8-108b-4d78-9e65-cca188c27fc4 fd=0x7fc63c01a058 (Permission
> denied)
>
>
> Anyone got an idea why is it happening?
> I checked user/group and selinux permissions - all OK
>
>
> >Can you share the commands (and output) used to check this?
> I first thought that the file is cached in memory and that's why vdsm user
> can read the file , but the following shows opposite:
>
> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# ll
> total 562145
> -rw-rw----. 1 vdsm kvm 5368709120 Nov 12 23:29
> 5b1d3113-5cca-4582-9029-634b16338a2f
> -rw-rw----. 1 vdsm kvm    1048576 Nov 11 14:11
> 5b1d3113-5cca-4582-9029-634b16338a2f.lease
> -rw-r--r--. 1 vdsm kvm        313 Nov 11 14:11
> 5b1d3113-5cca-4582-9029-634b16338a2f.meta
> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# pwd
>
> /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b
> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 >
> /proc/sys/vm/drop_caches
>
>
> I would use iflag=direct instead, no need to mess with caches. Vdsm always
> use direct I/O.
>
>
> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd
> if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
> dd: error reading ‘5b1d3113-5cca-4582-9029-634b16338a2f’: Permission denied
>
>
> You got permissions denied...
>
>
> 16+0 records in
> 16+0 records out
> 67108864 bytes (67 MB) copied, 0.198372 s, 338 MB/s
>
>
> >Seems like it could read upto ~67MB successfully before it encountered
> 'Permission denied' errors. Assuming a shard-block-size >of 64MB, looks
> like all the shards under /.shard could not be accessed.
> >Could you share the following pieces of information:
> >1.  brick logs of data_fast
> Attached in data_fast-brick-logs.tgz
>
> >2. ls -la of .shard relative to the bricks (NOT the mount) on all the
> bricks of data_fast
> Not sure if I understood you correctly, so I ran "ls -lad
> /gluster_bricks/data_fast/data_fast/.shard". If it's not what you wanted to
> see - just correct me.
> I have run multiple "find" commands with "-exec chown vdsm:kvm {} \;" ,
> just to be sure that this is not happening.
> >3. and ls -la of all shards under .shard of data_fast (perhaps a handful
> of them have root permission assigned somehow which is causing access to be
> denied? Perhaps while resolving pending heals manually? )
> All shards seem to be owned by "vdsm:kvm" with 660.
>
>
>
> -Krutika
>
>
> And dd continue to read data?!
>
> I have never seen anything like this.
>
> It will be helpful to run this with strace:
>
>    strace -t -TT -o dd.strace dd if=vol-id of=/dev/null iflag=direct
> bs=8M status=progress
>
> And share dd.strace.
>
> Logs in /var/log/glusterfs/exportname.log will contain useful info for
> this test.
>
>
> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# dd
> if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
> 5356126208 bytes (5.4 GB) copied, 12.061393 s, 444 MB/s
> 1280+0 records in
> 1280+0 records out
> 5368709120 bytes (5.4 GB) copied, 12.0876 s, 444 MB/s
> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd
> if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
> 3598712832 bytes (3.6 GB) copied, 1.000540 s, 3.6 GB/s
> 1280+0 records in
> 1280+0 records out
> 5368709120 bytes (5.4 GB) copied, 1.47071 s, 3.7 GB/s
> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 >
> /proc/sys/vm/drop_caches
> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd
> if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
> 5171576832 bytes (5.2 GB) copied, 12.071837 s, 428 MB/s
> 1280+0 records in
> 1280+0 records out
> 5368709120 bytes (5.4 GB) copied, 12.4873 s, 430 MB/s
>
> As you can see , once root user reads the file -> vdsm user can also do
> that.
>
>
> Smells like issue on gluster side.
>
>
> >I would try this on the hypervisor to check what vdsm/qemu see:
>
> >$ ls -lahRZ /rhv/data-center/mnt/glusterSD/gluster-server:_path
> I'm attaching the output of the find I run, but this one should be enough:
> [root@ovirt1 ~]# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/
> -not -user vdsm -print
>
>
> A full output of ls -lahRZ, showing user, group, permissions bits, and
> selinux label
> of the entire tree will be more useful.
>
>
> >Also, to make sure we don't have selinux issue on the hypervisor, you can
> change
> >selinux to permissive mode:
>
>  >  setenforce 0
>
> This is the first thing I did and the systems were still in permissive
> when I tried again.I'm 99.99% sure it's not selinux.
>
>
> >And then try again. If this was selinux issue the permission denied issue
> will disappear.
> >If this is the case please provide the output of:
>
>  >  ausearh -m AVC -ts today
>
> >If the issue still exists, we eliminated selinux, and you can enable it
> again:
>
>  >  setenforce 1
>
> [root@ovirt3 ~]# ausearch -m AVC -ts today
> <no matches>
> [root@ovirt2 ~]# ausearch -m AVC -ts today
> <no matches>
> [root@ovirt1 ~]# ausearch -m AVC -ts today
> <no matches>
>
>
> So this is not selinux on the hypervisor. I wonder if it can be selinux on
> the gluster side?
>
>
> I have a vague feeling that the issue is related to gluster v6.5 to 6.6
> upgrade which I several days before... So if any logs are needed (or debug
> enabled), just mention.
>
>
> If this is the last change, and it worked before, most likely.
>
> Nir
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-leave@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/AKLLOJKG6NEJUB264RA5PLQMGWNG3GD3/
>
>
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/UJLBNDFKU27J24MPINAND5MJDDG6SG2A/