Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing

Hi Jiffin,

Thanks for the info. As I'm now on Gluster v7, I hope it won't happen again. It's nice to know it got fixed.

Best Regards,
Strahil Nikolov

On Dec 2, 2019 11:32, Jiffin Thottan <jthottan@redhat.com> wrote:
Hi Krutika,
Apparently, the ACL info stored in the context got corrupted; see the brick logs:
[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
which resulted in this situation. A similar bug was reported (https://bugzilla.redhat.com/show_bug.cgi?id=1668286) and, IMO, it got fixed in the 6.6 release via https://review.gluster.org/#/c/glusterfs/+/23233/. But in that report he mentioned he saw the issue when he upgraded from 6.5 to 6.6.
One way to work around it is to perform a dummy setfacl (preferably as root) on the corrupted files, which will forcefully fetch the ACL info again from the backend and update the context. Another approach is to restart the brick process (kill it and do a volume start force).
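A rough sketch of both workarounds (the volume name data_fast and the file paths below are only placeholders; adjust to your environment):

# Workaround 1: dummy setfacl as root on an affected file (here via the FUSE mount).
# Re-applying the file's current ACL is enough to force a refresh of the cached context.
getfacl /path/to/mount/affected-file | setfacl --set-file=- /path/to/mount/affected-file

# Workaround 2: restart the brick process(es).
gluster volume status data_fast          # note the PID of the affected brick
kill <brick-pid>                         # stop that brick process
gluster volume start data_fast force     # force-start the volume to bring the brick back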
Regards, Jiffin
----- Original Message -----
From: "Krutika Dhananjay" <kdhananj@redhat.com>
To: "Strahil Nikolov" <hunter86_bg@yahoo.com>, "Jiffin Thottan" <jthottan@redhat.com>, "raghavendra talur" <rtalur@redhat.com>
Cc: "Nir Soffer" <nsoffer@redhat.com>, "Rafi Kavungal Chundattu Parambil" <rkavunga@redhat.com>, "users" <users@ovirt.org>, "gluster-user" <gluster-users@gluster.org>
Sent: Monday, December 2, 2019 11:48:22 AM
Subject: Re: [ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing
Sorry about the late response.
I looked at the logs. These errors are originating from posix-acl translator -
[2019-11-17 07:55:47.090065] E [MSGID: 115050] [server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162496: LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.6 (be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.6), client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, error-xlator: data_fast-access-control [Permission denied]
[2019-11-17 07:55:47.090174] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
[2019-11-17 07:55:47.090209] E [MSGID: 115050] [server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162497: LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.7 (be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.7), client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, error-xlator: data_fast-access-control [Permission denied]
[2019-11-17 07:55:47.090299] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
Jiffin / Raghavendra Talur, can you help?
-Krutika
On Wed, Nov 27, 2019 at 2:11 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi Nir, All,
It seems that 4.3.7 RC3 (and even RC4) is not the problem here (attached is a screenshot of oVirt running on Gluster v7). It does seem strange that both of my serious issues with oVirt have been related to Gluster problems (first the Gluster v3 to v5 migration, and now this one).
I have just updated to Gluster v7.0 (CentOS 7 repos) and rebooted all nodes. Now both the Engine and all my VMs are back online - so if you hit issues with 6.6, give 7.0 a try (7.1 is coming soon, too) before deciding to wipe everything.
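For reference, the upgrade on CentOS 7 goes roughly along these lines (a sketch assuming the Storage SIG release package centos-release-gluster7; upgrade and reboot one node at a time, following the official upgrade guide):

yum install centos-release-gluster7      # enable the CentOS Storage SIG repo for Gluster 7
yum update 'glusterfs*'                  # update the Gluster packages to 7.0
reboot                                   # or restart glusterd and the brick processes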
@Krutika,
I guess you will ask for the logs, so let's switch to gluster-users for this one?
Best Regards, Strahil Nikolov
On Monday, November 25, 2019, 16:45:48 GMT-5, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi Krutika,
I have enabled the TRACE log level for the volume data_fast, but the issue is still not clear to me. FUSE reports:
[2019-11-25 21:31:53.478130] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=6d9ed2e5-d4f2-4749-839b-2f13a68ed472 from backend
[2019-11-25 21:32:43.564694] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path:
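(For reference, the TRACE level above would typically be set with the standard diagnostics options, roughly like this; TRACE is very verbose, so revert to INFO once the logs are captured:)

gluster volume set data_fast diagnostics.client-log-level TRACE   # FUSE/client logs
gluster volume set data_fast diagnostics.brick-log-level TRACE    # brick logs
# ...reproduce the issue, then revert:
gluster volume set data_fast diagnostics.client-log-level INFO
gluster volume set data_fast diagnostics.brick-log-level INFO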