
On Fri, Nov 22, 2019 at 10:41 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose <sabose@redhat.com> wrote:
On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi All,
another clue in the logs:
[2019-11-21 00:29:50.536631] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536798] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536959] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.537007] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied]
[2019-11-21 00:29:50.537066] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12458: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fc63c00fe18 (Permission denied)
[2019-11-21 00:30:01.177665] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=eb103fbf-80dc-425d-882f-1e4efe510db5 from backend
[2019-11-21 00:30:13.132756] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.132824] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133217] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133238] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = 17c663c2-f582-455b-b806-3b9d01fb2c6c [Permission denied]
[2019-11-21 00:30:13.133264] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12660: READ => -1 gfid=17c663c2-f582-455b-b806-3b9d01fb2c6c fd=0x7fc63c007038 (Permission denied)
[2019-11-21 00:30:38.489449] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489520] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489669] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489717] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 6 failed. Base file gfid = a10a5ae8-108b-4d78-9e65-cca188c27fc4 [Permission denied]
[2019-11-21 00:30:38.489777] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12928: READ => -1 gfid=a10a5ae8-108b-4d78-9e65-cca188c27fc4 fd=0x7fc63c01a058 (Permission denied)
Anyone got an idea why this is happening? I checked user/group and selinux permissions - all OK.
Can you share the commands (and output) used to check this?
I first thought that the file is cached in memory and that's why the vdsm user can read the file, but the following shows the opposite:
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# ll
total 562145
-rw-rw----. 1 vdsm kvm 5368709120 Nov 12 23:29 5b1d3113-5cca-4582-9029-634b16338a2f
-rw-rw----. 1 vdsm kvm 1048576 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.lease
-rw-r--r--. 1 vdsm kvm 313 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.meta
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# pwd
/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches
I would use iflag=direct instead, no need to mess with caches. Vdsm always uses direct I/O.
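For example, something like this (just a sketch, re-using the image file from your listing) reads with direct I/O and bypasses the page cache entirely:

sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M iflag=direct status=progress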
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
dd: error reading ‘5b1d3113-5cca-4582-9029-634b16338a2f’: Permission denied
You got permission denied...
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 0.198372 s, 338 MB/s
And dd continued to read data?! I have never seen anything like this. It would be helpful to run this with strace:

strace -t -TT -o dd.strace dd if=vol-id of=/dev/null iflag=direct bs=8M status=progress

And share dd.strace. Logs in /var/log/glusterfs/exportname.log will contain useful info for this test.
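Something like this should do it (paths taken from your earlier output, trace written to /tmp so it can be attached; adjust as needed):

cd /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b
sudo -u vdsm strace -t -TT -o /tmp/dd.strace dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null iflag=direct bs=8M status=progress

The client log for this mount is probably /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster1:_data__fast.log (the name follows the mount point, so this is a guess).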
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5356126208 bytes (5.4 GB) copied, 12.061393 s, 444 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.0876 s, 444 MB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
3598712832 bytes (3.6 GB) copied, 1.000540 s, 3.6 GB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 1.47071 s, 3.7 GB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5171576832 bytes (5.2 GB) copied, 12.071837 s, 428 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.4873 s, 430 MB/s
As you can see, once the root user reads the file, the vdsm user can also read it.
Smells like an issue on the gluster side.
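It may also be worth looking at one of the shards from the log directly on a brick, to see whether its ownership differs from the base file. Something like this on one of the gluster nodes (the brick path below is a placeholder, use your actual brick directory):

ls -lZ <brick-path>/.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79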
I would try this on the hypervisor to check what vdsm/qemu see:
$ ls -lahRZ /rhev/data-center/mnt/glusterSD/gluster-server:_path
I'm attaching the output of the find I ran, but this one should be enough:
[root@ovirt1 ~]# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/ -not -user vdsm -print
A full output of ls -lahRZ, showing user, group, permission bits, and selinux label of the entire tree, will be more useful.
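For example (using the mount path from your earlier output, redirected to a file you can attach):

ls -lahRZ /rhev/data-center/mnt/glusterSD/gluster1:_data__fast > /tmp/ls-lahRZ.out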
Also, to make sure we don't have selinux issue on the hypervisor, you can change selinux to permissive mode:
setenforce 0
This is the first thing I did, and the systems were still in permissive mode when I tried again. I'm 99.99% sure it's not selinux.
And then try again. If this was selinux issue the permission denied issue will disappear. If this is the case please provide the output of:
ausearch -m AVC -ts today
If the issue still exists, we eliminated selinux, and you can enable it again:
setenforce 1
[root@ovirt3 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt2 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt1 ~]# ausearch -m AVC -ts today
<no matches>
So this is not selinux on the hypervisor. I wonder if it can be selinux on the gluster side?
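If the bricks are served from other hosts, the same checks there would tell us (these are the same commands as above, just run on the gluster servers):

getenforce
ausearch -m AVC -ts today

and, as a test, setenforce 0 on the gluster servers before retrying the read.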
I have a vague feeling that the issue is related to the gluster v6.5 to 6.6 upgrade, which I did several days before... So if any logs are needed (or debug enabled), just mention it.
If this was the last change, and it worked before, most likely.

Nir