Hi Krutika,

I have enabled TRACE log level for the volume data_fast,

but the issue is not much clear:
FUSE reports:

[2019-11-25 21:31:53.478130] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=6d9ed2e5-d4f2-4749-839b-2f1
3a68ed472 from backend
[2019-11-25 21:32:43.564694] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565653] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565689] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565770] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied]
[2019-11-25 21:32:43.565858] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 279: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fbf40005ea8 (Permission denied)


While the BRICK logs on ovirt1/gluster1 report:
2019-11-25 21:32:43.564177] D [MSGID: 0] [io-threads.c:376:iot_schedule] 0-data_fast-io-threads: LOOKUP scheduled as fast priority fop
[2019-11-25 21:32:43.564194] T [MSGID: 0] [defaults.c:2008:default_lookup_resume] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-io-threads to data_fast-upcall
[2019-11-25 21:32:43.564206] T [MSGID: 0] [upcall.c:790:up_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-upcall to data_fast-leases
[2019-11-25 21:32:43.564215] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-leases to data_fast-read-only
[2019-11-25 21:32:43.564222] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-read-only to data_fast-worm
[2019-11-25 21:32:43.564230] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-worm to data_fast-locks
[2019-11-25 21:32:43.564241] T [MSGID: 0] [posix.c:2897:pl_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-locks to data_fast-access-control
[2019-11-25 21:32:43.564254] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
[2019-11-25 21:32:43.564268] D [MSGID: 0] [posix-acl.c:1057:posix_acl_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-access-control returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564279] D [MSGID: 0] [posix.c:2888:pl_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-locks returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564289] D [MSGID: 0] [upcall.c:769:up_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-upcall returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564302] D [MSGID: 0] [defaults.c:1349:default_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-io-threads returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564313] T [marker.c:2918:marker_lookup_cbk] 0-data_fast-marker: lookup failed with Permission denied
[2019-11-25 21:32:43.564320] D [MSGID: 0] [marker.c:2955:marker_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-marker returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564334] D [MSGID: 0] [index.c:2070:index_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-index returned -1 error: Permission denied [Permission denied]



В понеделник, 25 ноември 2019 г., 23:10:41 ч. Гринуич+2, Strahil Nikolov <hunter86_bg@yahoo.com> написа:


Hi Krutika,

thanks for your assistance.
Let me summarize some info about the volume:

Volume Name: data_fast
Type: Replicate
Volume ID: 378804bf-2975-44d8-84c2-b541aa87f9ef
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1:/gluster_bricks/data_fast/data_fast
Brick2: gluster2:/gluster_bricks/data_fast/data_fast
Brick3: ovirt3:/gluster_bricks/data_fast/data_fast (arbiter)
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: on
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: on
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
performance.strict-o-direct: on
network.ping-timeout: 30
cluster.granular-entry-heal: enable
cluster.enable-shared-storage: enable


[root@ovirt1 ~]# gluster volume get engine all | grep shard
features.shard                          on
features.shard-block-size               64MB
features.shard-lru-limit                16384
features.shard-deletion-rate            100


On Sat, Nov 23, 2019 at 3:14 AM Nir Soffer <nsoffer@redhat.com> wrote:
On Fri, Nov 22, 2019 at 10:41 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose <sabose@redhat.com> wrote:


On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi All,

another clue in the logs :
[2019-11-21 00:29:50.536631] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536798] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536959] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.537007] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied]
[2019-11-21 00:29:50.537066] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12458: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fc63c00fe18 (Permission denied)
[2019-11-21 00:30:01.177665] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=eb103fbf-80dc-425d-882f-1e4efe510db5 from backend
[2019-11-21 00:30:13.132756] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.132824] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133217] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133238] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = 17c663c2-f582-455b-b806-3b9d01fb2c6c [Permission denied]
[2019-11-21 00:30:13.133264] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12660: READ => -1 gfid=17c663c2-f582-455b-b806-3b9d01fb2c6c fd=0x7fc63c007038 (Permission denied)
[2019-11-21 00:30:38.489449] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489520] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489669] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489717] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 6 failed. Base file gfid = a10a5ae8-108b-4d78-9e65-cca188c27fc4 [Permission denied]
[2019-11-21 00:30:38.489777] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12928: READ => -1 gfid=a10a5ae8-108b-4d78-9e65-cca188c27fc4 fd=0x7fc63c01a058 (Permission denied)


Anyone got an idea why is it happening?
I checked user/group and selinux permissions - all OK

>Can you share the commands (and output) used to check this?
I first thought that the file is cached in memory and that's why vdsm user can read the file , but the following shows opposite:

[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# ll
total 562145
-rw-rw----. 1 vdsm kvm 5368709120 Nov 12 23:29 5b1d3113-5cca-4582-9029-634b16338a2f
-rw-rw----. 1 vdsm kvm    1048576 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.lease
-rw-r--r--. 1 vdsm kvm        313 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.meta
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# pwd
/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches 

I would use iflag=direct instead, no need to mess with caches. Vdsm always use direct I/O.
 
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
dd: error reading ‘5b1d3113-5cca-4582-9029-634b16338a2f’: Permission denied

You got permissions denied...
 
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 0.198372 s, 338 MB/s
 
>Seems like it could read upto ~67MB successfully before it encountered 'Permission denied' errors. Assuming a shard-block-size >of 64MB, looks like all the shards under /.shard could not be accessed.
>Could you share the following pieces of information:
>1.  brick logs of data_fast
Attached in data_fast-brick-logs.tgz

>2. ls -la of .shard relative to the bricks (NOT the mount) on all the bricks of data_fast
Not sure if I understood you correctly, so I ran "ls -lad /gluster_bricks/data_fast/data_fast/.shard". If it's not what you wanted to see - just correct me.
I have run multiple "find" commands with "-exec chown vdsm:kvm {} \;" , just to be sure that this is not happening.
>3. and ls -la of all shards under .shard of data_fast (perhaps a handful of them have root permission assigned somehow which is causing access to be denied? Perhaps while resolving pending heals manually? )
All shards seem to be owned by "vdsm:kvm" with 660.



-Krutika


And dd continue to read data?!

I have never seen anything like this.

It will be helpful to run this with strace:

    strace -t -TT -o dd.strace dd if=vol-id of=/dev/null iflag=direct bs=8M status=progress

And share dd.strace.

Logs in /var/log/glusterfs/exportname.log will contain useful info for this test.
 
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5356126208 bytes (5.4 GB) copied, 12.061393 s, 444 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.0876 s, 444 MB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
3598712832 bytes (3.6 GB) copied, 1.000540 s, 3.6 GB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 1.47071 s, 3.7 GB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches 
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5171576832 bytes (5.2 GB) copied, 12.071837 s, 428 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.4873 s, 430 MB/s

As you can see , once root user reads the file -> vdsm user can also do that.

Smells like issue on gluster side.
 
>I would try this on the hypervisor to check what vdsm/qemu see:

>$ ls -lahRZ /rhv/data-center/mnt/glusterSD/gluster-server:_path
I'm attaching the output of the find I run, but this one should be enough:
[root@ovirt1 ~]# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/ -not -user vdsm -print

A full output of ls -lahRZ, showing user, group, permissions bits, and selinux label
of the entire tree will be more useful.
 
>Also, to make sure we don't have selinux issue on the hypervisor, you can change
>selinux to permissive mode:

  >  setenforce 0

This is the first thing I did and the systems were still in permissive when I tried again.I'm 99.99% sure it's not selinux.


>And then try again. If this was selinux issue the permission denied issue will disappear.
>If this is the case please provide the output of:

  >  ausearh -m AVC -ts today

>If the issue still exists, we eliminated selinux, and you can enable it again:

  >  setenforce 1

[root@ovirt3 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt2 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt1 ~]# ausearch -m AVC -ts today
<no matches>

So this is not selinux on the hypervisor. I wonder if it can be selinux on the gluster side?
 
I have a vague feeling that the issue is related to gluster v6.5 to 6.6 upgrade which I several days before... So if any logs are needed (or debug enabled), just mention.

If this is the last change, and it worked before, most likely.

Nir
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/AKLLOJKG6NEJUB264RA5PLQMGWNG3GD3/