Hi Nir, All,

it seems that 4.3.7 RC3 (and even RC4) are not the problem here (attached screenshot of oVirt running on gluster v7). It seems strange that both of my serious issues with oVirt are related to gluster (first the gluster v3 to v5 migration, and now this one).
I have just updated to gluster v7.0 (CentOS 7 repos) and rebooted all nodes. Now both the Engine and all my VMs are back online - so if you hit issues with 6.6, you should give 7.0 a try (and 7.1 is coming soon) before deciding to wipe everything.

@Krutika,
I guess you will ask for the logs, so let's switch to gluster-users for this one?

Best Regards,
Strahil Nikolov
On Monday, November 25, 2019, 16:45:48 GMT-5, Strahil Nikolov
<hunter86_bg(a)yahoo.com> wrote:
Hi Krutika,
I have enabled TRACE log level for the volume data_fast, but the issue is still not much clearer. FUSE reports:
[2019-11-25 21:31:53.478130] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=6d9ed2e5-d4f2-4749-839b-2f13a68ed472 from backend
[2019-11-25 21:32:43.564694] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565653] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565689] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565770] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied]
[2019-11-25 21:32:43.565858] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 279: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fbf40005ea8 (Permission denied)
While the BRICK logs on ovirt1/gluster1 report:

[2019-11-25 21:32:43.564177] D [MSGID: 0] [io-threads.c:376:iot_schedule] 0-data_fast-io-threads: LOOKUP scheduled as fast priority fop
[2019-11-25 21:32:43.564194] T [MSGID: 0] [defaults.c:2008:default_lookup_resume] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-io-threads to data_fast-upcall
[2019-11-25 21:32:43.564206] T [MSGID: 0] [upcall.c:790:up_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-upcall to data_fast-leases
[2019-11-25 21:32:43.564215] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-leases to data_fast-read-only
[2019-11-25 21:32:43.564222] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-read-only to data_fast-worm
[2019-11-25 21:32:43.564230] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-worm to data_fast-locks
[2019-11-25 21:32:43.564241] T [MSGID: 0] [posix.c:2897:pl_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-locks to data_fast-access-control
[2019-11-25 21:32:43.564254] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
[2019-11-25 21:32:43.564268] D [MSGID: 0] [posix-acl.c:1057:posix_acl_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-access-control returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564279] D [MSGID: 0] [posix.c:2888:pl_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-locks returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564289] D [MSGID: 0] [upcall.c:769:up_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-upcall returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564302] D [MSGID: 0] [defaults.c:1349:default_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-io-threads returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564313] T [marker.c:2918:marker_lookup_cbk] 0-data_fast-marker: lookup failed with Permission denied
[2019-11-25 21:32:43.564320] D [MSGID: 0] [marker.c:2955:marker_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-marker returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564334] D [MSGID: 0] [index.c:2070:index_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-index returned -1 error: Permission denied [Permission denied]
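As a side note, when cross-checking a gfid from logs like these against the actual file on a brick, the usual trick is the .glusterfs hard-link layout (first two hex pairs of the gfid become subdirectories). A minimal sketch, assuming the standard .glusterfs layout and using one brick path and gfid from this thread:

```shell
# Map a gfid from the logs to its backend path on a brick (assumption:
# standard .glusterfs layout; brick path and gfid are examples from this thread).
gfid_to_path() {
  local brick=$1 gfid=$2
  # First two hex pairs of the gfid are directory levels under .glusterfs
  echo "$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
}

gfid_to_path /gluster_bricks/data_fast/data_fast b0af2b81-22cf-482e-9b2f-c431b6449dae
# On a brick host you could then inspect ownership/mode with e.g.:
#   stat -c '%U:%G %a' "$(gfid_to_path <brick> <gfid>)"
```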
On Monday, November 25, 2019, 23:10:41 GMT+2, Strahil Nikolov
<hunter86_bg(a)yahoo.com> wrote:
Hi Krutika,
thanks for your assistance. Let me summarize some info about the volume:
Volume Name: data_fast
Type: Replicate
Volume ID: 378804bf-2975-44d8-84c2-b541aa87f9ef
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1:/gluster_bricks/data_fast/data_fast
Brick2: gluster2:/gluster_bricks/data_fast/data_fast
Brick3: ovirt3:/gluster_bricks/data_fast/data_fast (arbiter)
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: on
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: on
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
performance.strict-o-direct: on
network.ping-timeout: 30
cluster.granular-entry-heal: enable
cluster.enable-shared-storage: enable

[root@ovirt1 ~]# gluster volume get engine all | grep shard
features.shard                           on
features.shard-block-size                64MB
features.shard-lru-limit                 16384
features.shard-deletion-rate             100
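As a quick sanity check of the shard numbers seen in the logs (assumption: the 5368709120-byte image shown later in this thread is sharded at the 64MB shard-block-size above):

```shell
# How many 64MB pieces does a 5368709120-byte (5 GiB) image break into?
file_size=5368709120
shard_size=$((64 * 1024 * 1024))   # features.shard-block-size: 64MB
pieces=$((file_size / shard_size))
# pieces = base file + trailing shards .1 .. .(pieces-1) under /.shard
echo "$pieces"
```

This lines up with the failing lookups on shard index 79, i.e. the last shard of such an image.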
On Sat, Nov 23, 2019 at 3:14 AM Nir Soffer <nsoffer(a)redhat.com> wrote:
On Fri, Nov 22, 2019 at 10:41 PM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose <sabose(a)redhat.com> wrote:
On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
Hi All,
another clue in the logs:

[2019-11-21 00:29:50.536631] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536798] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536959] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.537007] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied]
[2019-11-21 00:29:50.537066] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12458: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fc63c00fe18 (Permission denied)
[2019-11-21 00:30:01.177665] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=eb103fbf-80dc-425d-882f-1e4efe510db5 from backend
[2019-11-21 00:30:13.132756] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.132824] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133217] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133238] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = 17c663c2-f582-455b-b806-3b9d01fb2c6c [Permission denied]
[2019-11-21 00:30:13.133264] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12660: READ => -1 gfid=17c663c2-f582-455b-b806-3b9d01fb2c6c fd=0x7fc63c007038 (Permission denied)
[2019-11-21 00:30:38.489449] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489520] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489669] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489717] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 6 failed. Base file gfid = a10a5ae8-108b-4d78-9e65-cca188c27fc4 [Permission denied]
[2019-11-21 00:30:38.489777] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12928: READ => -1 gfid=a10a5ae8-108b-4d78-9e65-cca188c27fc4 fd=0x7fc63c01a058 (Permission denied)
Does anyone have an idea why this is happening? I checked user/group and selinux permissions - all OK.

Can you share the commands (and output) used to check this?

I first thought that the file is cached in memory and that's why the vdsm user can read it, but the following shows the opposite:
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# ll
total 562145
-rw-rw----. 1 vdsm kvm 5368709120 Nov 12 23:29 5b1d3113-5cca-4582-9029-634b16338a2f
-rw-rw----. 1 vdsm kvm    1048576 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.lease
-rw-r--r--. 1 vdsm kvm        313 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.meta
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# pwd
/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches
I would use iflag=direct instead - no need to mess with caches. Vdsm always uses direct I/O.
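For reference, a minimal sketch of the direct-I/O read test Nir suggests (here a throwaway file stands in for the image; on the real system you would run this as the vdsm user against the file on the gluster mount):

```shell
# Read a file with O_DIRECT so the page cache is bypassed entirely -
# no need to echo 3 > /proc/sys/vm/drop_caches between runs.
f=/var/tmp/direct-io-test.img
dd if=/dev/zero of="$f" bs=1M count=8 status=none

# iflag=direct opens the input with O_DIRECT; bs must respect the
# filesystem's alignment requirements (1M is safely aligned).
dd if="$f" of=/dev/null bs=1M iflag=direct status=none && echo "direct read ok"

rm -f "$f"
```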
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
dd: error reading ‘5b1d3113-5cca-4582-9029-634b16338a2f’: Permission denied

You got permission denied...

16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 0.198372 s, 338 MB/s
> Seems like it could read up to ~67MB successfully before it encountered 'Permission denied' errors. Assuming a shard-block-size of 64MB, looks like all the shards under /.shard could not be accessed.
> Could you share the following pieces of information:
> 1. brick logs of data_fast

Attached in data_fast-brick-logs.tgz

> 2. ls -la of .shard relative to the bricks (NOT the mount) on all the bricks of data_fast

Not sure if I understood you correctly, so I ran "ls -lad /gluster_bricks/data_fast/data_fast/.shard". If it's not what you wanted to see - just correct me. I have also run multiple "find" commands with "-exec chown vdsm:kvm {} \;", just to be sure that ownership is not the problem.

> 3. and ls -la of all shards under .shard of data_fast (perhaps a handful of them have root permission assigned somehow which is causing access to be denied? Perhaps while resolving pending heals manually?)

All shards seem to be owned by "vdsm:kvm" with mode 660.
-Krutika
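A sketch of the ownership audit described above, demonstrated on a throwaway tree (on the real bricks you would point shard_dir at /gluster_bricks/data_fast/data_fast/.shard and look for anything not owned by vdsm:kvm, i.e. uid/gid 36 per the volume's storage.owner-* options):

```shell
# Build a tiny stand-in for a .shard directory, owned by the current user.
shard_dir=$(mktemp -d)
touch "$shard_dir"/fake-gfid.1 "$shard_dir"/fake-gfid.2
chmod 660 "$shard_dir"/fake-gfid.1 "$shard_dir"/fake-gfid.2

# Files NOT owned by the expected user would be listed here; an empty
# result means ownership is uniform (as Strahil reports for vdsm:kvm).
find "$shard_dir" -type f -not -user "$(id -un)" -print

# Count of shards with the expected 660 mode (here: both demo files).
find "$shard_dir" -type f -perm 660 -print | wc -l

rm -rf "$shard_dir"
```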
And dd continues to read data?!
I have never seen anything like this.
It would be helpful to run this with strace:

strace -t -TT -o dd.strace dd if=vol-id of=/dev/null iflag=direct bs=8M status=progress

And share dd.strace.
Logs in /var/log/glusterfs/exportname.log will contain useful info for this test.
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5356126208 bytes (5.4 GB) copied, 12.061393 s, 444 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.0876 s, 444 MB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
3598712832 bytes (3.6 GB) copied, 1.000540 s, 3.6 GB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 1.47071 s, 3.7 GB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5171576832 bytes (5.2 GB) copied, 12.071837 s, 428 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.4873 s, 430 MB/s
As you can see, once the root user reads the file, the vdsm user can also do so.
Smells like an issue on the gluster side.
I would try this on the hypervisor to check what vdsm/qemu see:

$ ls -lahRZ /rhev/data-center/mnt/glusterSD/gluster-server:_path

I'm attaching the output of the find I ran, but this one should be enough:

[root@ovirt1 ~]# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/ -not -user vdsm -print

A full output of ls -lahRZ, showing user, group, permission bits, and selinux label of the entire tree will be more useful.
Also, to make sure we don't have a selinux issue on the hypervisor, you can change selinux to permissive mode:

> setenforce 0

This is the first thing I did, and the systems were still in permissive mode when I tried again. I'm 99.99% sure it's not selinux.

And then try again. If this was a selinux issue, the permission denied errors will disappear. If this is the case, please provide the output of:

> ausearch -m AVC -ts today

If the issue still exists, we have eliminated selinux, and you can enable it again:

> setenforce 1
[root@ovirt3 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt2 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt1 ~]# ausearch -m AVC -ts today
<no matches>
So this is not selinux on the hypervisor. I wonder if it can be selinux on the gluster
side?
I have a vague feeling that the issue is related to the gluster v6.5 to 6.6 upgrade, which I did several days before... So if any logs are needed (or debug enabled), just mention it.
If this is the last change, and it worked before, most likely.
Nir
_______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement:
https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AKLLOJKG6NE...