Hi,
This one really looks like the ACL bug that hit me when I upgraded from Gluster v6.5 to
v6.6, and later from v7.0 to v7.2.
Did you update your setup recently? Did you also upgrade Gluster?
You have to check the Gluster logs to verify that, so you can try the following:
1. Set the Gluster logs to trace level (for details check: )
2. Power up a VM that was already off, or retry the procedure from the logs you sent.
3. Set the log level back from trace.
4. Check the libvirt logs on the host that was supposed to power up the VM (in case a VM
was powered on).
5. Check the gluster brick logs on all nodes for ACL errors.
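As a rough sketch, steps 1 and 5 can look like this (the volume name `images3` is taken from later in this thread — adjust names and paths to your setup):

```shell
# Step 1 (run on the cluster): raise brick log verbosity, then revert after reproducing:
#   gluster volume set images3 diagnostics.brick-log-level TRACE
#   ...power up the VM / reproduce the failure...
#   gluster volume set images3 diagnostics.brick-log-level INFO

# Step 5: grep every node's brick logs for ACL denials (MSGID 139001).
# Demonstrated here against a sample line; on a real node the target is
# /var/log/glusterfs/bricks/*.log
sample='[2020-03-18 13:19:41.489047] I [MSGID: 139001] [posix-acl.c:262:posix_acl_log_permit_denied] 0-data_fast4-access-control: req(uid:36,gid:36,perm:1,ngrps:3) [Permission denied]'
printf '%s\n' "$sample" | grep -c 'MSGID: 139001'   # prints 1
```

If that grep returns hits on any brick after the reproduce step, you are most likely hitting the same ACL regression.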
Here is a sample from my old logs:
gluster_bricks-data_fast4-data_fast4.log:[2020-03-18 13:19:41.489047] I [MSGID: 139001]
[posix-acl.c:262:posix_acl_log_permit_denied] 0-data_fast4-access-control: client:
CTX_ID:4a654305-d2e4-
4a10-bad9-58d670d99a97-GRAPH_ID:0-PID:32412-HOST:ovirt1.localdomain-PC_NAME:data_fast4-client-0-RECON_NO:-19,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx
(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
gluster_bricks-data_fast4-data_fast4.log:[2020-03-18 13:22:51.818796] I [MSGID: 139001]
[posix-acl.c:262:posix_acl_log_permit_denied] 0-data_fast4-access-control: client:
CTX_ID:4a654305-d2e4-
4a10-bad9-58d670d99a97-GRAPH_ID:0-PID:32412-HOST:ovirt1.localdomain-PC_NAME:data_fast4-client-0-RECON_NO:-19,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx
(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
gluster_bricks-data_fast4-data_fast4.log:[2020-03-18 13:24:43.732856] I [MSGID: 139001]
[posix-acl.c:262:posix_acl_log_permit_denied] 0-data_fast4-access-control: client:
CTX_ID:4a654305-d2e4-
4a10-bad9-58d670d99a97-GRAPH_ID:0-PID:32412-HOST:ovirt1.localdomain-PC_NAME:data_fast4-client-0-RECON_NO:-19,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx
(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
gluster_bricks-data_fast4-data_fast4.log:[2020-03-18 13:26:50.758178] I [MSGID: 139001]
[posix-acl.c:262:posix_acl_log_permit_denied] 0-data_fast4-access-control: client:
CTX_ID:4a654305-d2e4-
4a10-bad9-58d670d99a97-GRAPH_ID:0-PID:32412-HOST:ovirt1.localdomain-PC_NAME:data_fast4-client-0-RECON_NO:-19,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx
(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
In my case, the workaround was to downgrade the Gluster packages on all nodes (rebooting
each node one by one) if the major version is the same; but if you upgraded to v7.X, then
you can try going back to v7.0.
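A sketch of that downgrade on a CentOS 7 node — the package glob and the target version are assumptions, so check what your enabled repos actually offer first, and put the host into maintenance before touching it:

```shell
# On each node, one at a time (host in maintenance mode first):
# list the package versions still available in the enabled repos
yum list --showduplicates glusterfs-server

# downgrade the whole gluster stack to the chosen version, e.g. 7.0
yum downgrade 'glusterfs*-7.0*'

# reboot the node, wait for heals to finish, then move on to the next one
reboot
```
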
Best Regards,
Strahil Nikolov
On Saturday, June 20, 2020 at 18:48:42 GMT+3, C Williams
<cwilliams3320(a)gmail.com> wrote:
Hello,
Here are additional log files as well as a tree of the problematic Gluster storage domain.
During this time I attempted to copy a virtual disk to another domain, move a virtual disk
to another domain, and run a VM that would use the virtual hard disk.
The copies/moves failed, and the VM went into pause mode whenever the virtual HDD was
involved.
Please check these out.
Thank You For Your Help!
On Sat, Jun 20, 2020 at 9:54 AM C Williams <cwilliams3320(a)gmail.com> wrote:
Strahil,
I understand. Please keep me posted.
Thanks For The Help!
On Sat, Jun 20, 2020 at 4:36 AM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
> Hey C Williams,
>
> sorry for the delay, but I couldn't get some time to check your logs. Will try
a little bit later.
>
> Best Regards,
> Strahil Nikolov
>
> On June 20, 2020 at 2:37:22 GMT+03:00, C Williams <cwilliams3320(a)gmail.com>
wrote:
>>Hello,
>>
>>Was wanting to follow up on this issue. Users are impacted.
>>
>>Thank You
>>
>>On Fri, Jun 19, 2020 at 9:20 AM C Williams <cwilliams3320(a)gmail.com>
>>wrote:
>>
>>> Hello,
>>>
>>> Here are the logs (some IPs are changed )
>>>
>>> ov05 is the SPM
>>>
>>> Thank You For Your Help !
>>>
>>> On Thu, Jun 18, 2020 at 11:31 PM Strahil Nikolov
>><hunter86_bg(a)yahoo.com>
>>> wrote:
>>>
>>>> Check on the hosts tab , which is your current SPM (last column in
>>Admin
>>>> UI).
>>>> Then open the /var/log/vdsm/vdsm.log and repeat the operation.
>>>> Then provide the log from that host and the engine's log (on the
>>>> HostedEngine VM or on your standalone engine).
>>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>>>>
>>>> On June 18, 2020 at 23:59:36 GMT+03:00, C Williams
>><cwilliams3320(a)gmail.com>
>>>> wrote:
>>>> >Resending to eliminate email issues
>>>> >
>>>> >---------- Forwarded message ---------
>>>> >From: C Williams <cwilliams3320(a)gmail.com>
>>>> >Date: Thu, Jun 18, 2020 at 4:01 PM
>>>> >Subject: Re: [ovirt-users] Fwd: Issues with Gluster Domain
>>>> >To: Strahil Nikolov <hunter86_bg(a)yahoo.com>
>>>> >
>>>> >
>>>> >Here is output from mount
>>>> >
>>>> >192.168.24.12:/stor/import0 on
>>>> >/rhev/data-center/mnt/192.168.24.12:_stor_import0
>>>> >type nfs4
>>>>
>>>>
>>>(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=192.168.24.18,local_lock=none,addr=192.168.24.12)
>>>> >192.168.24.13:/stor/import1 on
>>>> >/rhev/data-center/mnt/192.168.24.13:_stor_import1
>>>> >type nfs4
>>>>
>>>>
>>>(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=192.168.24.18,local_lock=none,addr=192.168.24.13)
>>>> >192.168.24.13:/stor/iso1 on
>>>> >/rhev/data-center/mnt/192.168.24.13:_stor_iso1
>>>> >type nfs4
>>>>
>>>>
>>>(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=192.168.24.18,local_lock=none,addr=192.168.24.13)
>>>> >192.168.24.13:/stor/export0 on
>>>> >/rhev/data-center/mnt/192.168.24.13:_stor_export0
>>>> >type nfs4
>>>>
>>>>
>>>(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=192.168.24.18,local_lock=none,addr=192.168.24.13)
>>>> >192.168.24.15:/images on
>>>> >/rhev/data-center/mnt/glusterSD/192.168.24.15:_images
>>>> >type fuse.glusterfs
>>>>
>>>>
>>>(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>>>> >192.168.24.18:/images3 on
>>>> >/rhev/data-center/mnt/glusterSD/192.168.24.18:_images3
>>>> >type fuse.glusterfs
>>>>
>>>>
>>>(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>>>> >tmpfs on /run/user/0 type tmpfs
>>>> >(rw,nosuid,nodev,relatime,seclabel,size=13198392k,mode=700)
>>>> >[root@ov06 glusterfs]#
>>>> >
>>>> >Also here is a screenshot of the console
>>>> >
>>>> >[image: image.png]
>>>> >The other domains are up
>>>> >
>>>> >Import0 and Import1 are NFS . GLCL0 is gluster. They all are
>>running
>>>> >VMs
>>>> >
>>>> >Thank You For Your Help !
>>>> >
>>>> >On Thu, Jun 18, 2020 at 3:51 PM Strahil Nikolov
>><hunter86_bg(a)yahoo.com>
>>>> >wrote:
>>>> >
>>>> >> I don't see
'/rhev/data-center/mnt/192.168.24.13:_stor_import1'
>>>> >mounted
>>>> >> at all .
>>>> >> What is the status of all storage domains ?
>>>> >>
>>>> >> Best Regards,
>>>> >> Strahil Nikolov
>>>> >>
>>>> >> On June 18, 2020 at 21:43:44 GMT+03:00, C Williams
>>>> ><cwilliams3320(a)gmail.com>
>>>> >> wrote:
>>>> >> > Resending to deal with possible email issues
>>>> >> >
>>>> >> >---------- Forwarded message ---------
>>>> >> >From: C Williams <cwilliams3320(a)gmail.com>
>>>> >> >Date: Thu, Jun 18, 2020 at 2:07 PM
>>>> >> >Subject: Re: [ovirt-users] Issues with Gluster Domain
>>>> >> >To: Strahil Nikolov <hunter86_bg(a)yahoo.com>
>>>> >> >
>>>> >> >
>>>> >> >More
>>>> >> >
>>>> >> >[root@ov06 ~]# for i in $(gluster volume list); do echo
>>$i;echo;
>>>> >> >gluster
>>>> >> >volume info $i; echo;echo;gluster volume status
>>>> >$i;echo;echo;echo;done
>>>> >> >images3
>>>> >> >
>>>> >> >
>>>> >> >Volume Name: images3
>>>> >> >Type: Replicate
>>>> >> >Volume ID: 0243d439-1b29-47d0-ab39-d61c2f15ae8b
>>>> >> >Status: Started
>>>> >> >Snapshot Count: 0
>>>> >> >Number of Bricks: 1 x 3 = 3
>>>> >> >Transport-type: tcp
>>>> >> >Bricks:
>>>> >> >Brick1: 192.168.24.18:/bricks/brick04/images3
>>>> >> >Brick2: 192.168.24.19:/bricks/brick05/images3
>>>> >> >Brick3: 192.168.24.20:/bricks/brick06/images3
>>>> >> >Options Reconfigured:
>>>> >> >performance.client-io-threads: on
>>>> >> >nfs.disable: on
>>>> >> >transport.address-family: inet
>>>> >> >user.cifs: off
>>>> >> >auth.allow: *
>>>> >> >performance.quick-read: off
>>>> >> >performance.read-ahead: off
>>>> >> >performance.io-cache: off
>>>> >> >performance.low-prio-threads: 32
>>>> >> >network.remote-dio: off
>>>> >> >cluster.eager-lock: enable
>>>> >> >cluster.quorum-type: auto
>>>> >> >cluster.server-quorum-type: server
>>>> >> >cluster.data-self-heal-algorithm: full
>>>> >> >cluster.locking-scheme: granular
>>>> >> >cluster.shd-max-threads: 8
>>>> >> >cluster.shd-wait-qlength: 10000
>>>> >> >features.shard: on
>>>> >> >cluster.choose-local: off
>>>> >> >client.event-threads: 4
>>>> >> >server.event-threads: 4
>>>> >> >storage.owner-uid: 36
>>>> >> >storage.owner-gid: 36
>>>> >> >performance.strict-o-direct: on
>>>> >> >network.ping-timeout: 30
>>>> >> >cluster.granular-entry-heal: enable
>>>> >> >
>>>> >> >
>>>> >> >Status of volume: images3
>>>> >> >Gluster process TCP Port RDMA
Port
>>>> >Online
>>>> >> > Pid
>>>> >>
>>>> >>
>>>>
>>>>
>>>>------------------------------------------------------------------------------
>>>> >> >Brick 192.168.24.18:/bricks/brick04/images3 49152 0
>>Y
>>>> >> >6666
>>>> >> >Brick 192.168.24.19:/bricks/brick05/images3 49152 0
>>Y
>>>> >> >6779
>>>> >> >Brick 192.168.24.20:/bricks/brick06/images3 49152 0
>>Y
>>>> >> >7227
>>>> >> >Self-heal Daemon on localhost N/A N/A
>>Y
>>>> >> >6689
>>>> >> >Self-heal Daemon on ov07.ntc.srcle.com N/A N/A
>>Y
>>>> >> >6802
>>>> >> >Self-heal Daemon on ov08.ntc.srcle.com N/A N/A
>>Y
>>>> >> >7250
>>>> >> >
>>>> >> >Task Status of Volume images3
>>>> >>
>>>> >>
>>>>
>>>>
>>>>------------------------------------------------------------------------------
>>>> >> >There are no active volume tasks
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >[root@ov06 ~]# ls -l /rhev/data-center/mnt/glusterSD/
>>>> >> >total 16
>>>> >> >drwxr-xr-x. 5 vdsm kvm 8192 Jun 18 14:04
192.168.24.15:_images
>>>> >> >drwxr-xr-x. 5 vdsm kvm 8192 Jun 18 14:05
192.168.24.18:_images3
>>>> >> >[root@ov06 ~]#
>>>> >> >
>>>> >> >On Thu, Jun 18, 2020 at 2:03 PM C Williams
>><cwilliams3320(a)gmail.com>
>>>> >> >wrote:
>>>> >> >
>>>> >> >> Strahil,
>>>> >> >>
>>>> >> >> Here you go -- Thank You For Your Help !
>>>> >> >>
>>>> >> >> BTW -- I can write a test file to gluster and it
replicates
>>>> >properly.
>>>> >> >> Thinking something about the oVirt Storage Domain ?
>>>> >> >>
>>>> >> >> [root@ov08 ~]# gluster pool list
>>>> >> >> UUID Hostname
>>>> >State
>>>> >> >> 5b40c659-d9ab-43c3-9af8-18b074ea0b83 ov06
>>>> >> >Connected
>>>> >> >> 36ce5a00-6f65-4926-8438-696944ebadb5
ov07.ntc.srcle.com
>>>> >> >Connected
>>>> >> >> c7e7abdb-a8f4-4842-924c-e227f0db1b29 localhost
>>>> >> >Connected
>>>> >> >> [root@ov08 ~]# gluster volume list
>>>> >> >> images3
>>>> >> >>
>>>> >> >> On Thu, Jun 18, 2020 at 1:13 PM Strahil Nikolov
>>>> >> ><hunter86_bg(a)yahoo.com>
>>>> >> >> wrote:
>>>> >> >>
>>>> >> >>> Log to the oVirt cluster and provide the output
of:
>>>> >> >>> gluster pool list
>>>> >> >>> gluster volume list
>>>> >> >>> for i in $(gluster volume list); do echo $i;echo;
gluster
>>>> >volume
>>>> >> >info
>>>> >> >>> $i; echo;echo;gluster volume status
$i;echo;echo;echo;done
>>>> >> >>>
>>>> >> >>> ls -l /rhev/data-center/mnt/glusterSD/
>>>> >> >>>
>>>> >> >>> Best Regards,
>>>> >> >>> Strahil Nikolov
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> On June 18, 2020 at 19:17:46 GMT+03:00, C Williams
>>>> >> ><cwilliams3320(a)gmail.com>
>>>> >> >>> wrote:
>>>> >> >>> >Hello,
>>>> >> >>> >
>>>> >> >>> >I recently added 6 hosts to an existing oVirt
>>compute/gluster
>>>> >> >cluster.
>>>> >> >>> >
>>>> >> >>> >Prior to this attempted addition, my cluster
had 3
>>Hypervisor
>>>> >hosts
>>>> >> >and
>>>> >> >>> >3
>>>> >> >>> >gluster bricks which made up a single gluster
volume
>>(replica 3
>>>> >> >volume)
>>>> >> >>> >. I
>>>> >> >>> >added the additional hosts and made a brick on
3 of the new
>>>> >hosts
>>>> >> >and
>>>> >> >>> >attempted to make a new replica 3 volume. I
had difficulty
>>>> >> >creating
>>>> >> >>> >the
>>>> >> >>> >new volume. So, I decided that I would make a
new
>>>> >compute/gluster
>>>> >> >>> >cluster
>>>> >> >>> >for each set of 3 new hosts.
>>>> >> >>> >
>>>> >> >>> >I removed the 6 new hosts from the existing
oVirt
>>>> >Compute/Gluster
>>>> >> >>> >Cluster
>>>> >> >>> >leaving the 3 original hosts in place with
their bricks. At
>>that
>>>> >> >point
>>>> >> >>> >my
>>>> >> >>> >original bricks went down and came back up .
The volume
>>showed
>>>> >> >entries
>>>> >> >>> >that
>>>> >> >>> >needed healing. At that point I ran gluster
volume heal
>>images3
>>>> >> >full,
>>>> >> >>> >etc.
>>>> >> >>> >The volume shows no unhealed entries. I also
corrected some
>>peer
>>>> >> >>> >errors.
>>>> >> >>> >
>>>> >> >>> >However, I am unable to copy disks, move disks
to another
>>>> >domain,
>>>> >> >>> >export
>>>> >> >>> >disks, etc. It appears that the engine cannot
locate disks
>>>> >properly
>>>> >> >and
>>>> >> >>> >I
>>>> >> >>> >get storage I/O errors.
>>>> >> >>> >
>>>> >> >>> >I have detached and removed the oVirt Storage
Domain. I
>>>> >reimported
>>>> >> >the
>>>> >> >>> >domain and imported 2 VMs, But the VM disks
exhibit the same
>>>> >> >behaviour
>>>> >> >>> >and
>>>> >> >>> >won't run from the hard disk.
>>>> >> >>> >
>>>> >> >>> >
>>>> >> >>> >I get errors such as this
>>>> >> >>> >
>>>> >> >>> >VDSM ov05 command HSMGetAllTasksStatusesVDS
failed: low
>>level
>>>> >Image
>>>> >> >>> >copy
>>>> >> >>> >failed: ("Command
['/usr/bin/qemu-img', 'convert', '-p',
>>'-t',
>>>> >> >'none',
>>>> >> >>> >'-T', 'none', '-f',
'raw',
>>>> >> >>>
>u'/rhev/data-center/mnt/glusterSD/192.168.24.18:
>>>> >> >>>
>>>> >>
>>>> >>
>>>>
>>>>
>>>>_images3/5fe3ad3f-2d21-404c-832e-4dc7318ca10d/images/3ea5afbd-0fe0-4c09-8d39-e556c66a8b3d/fe6eab63-3b22-4815-bfe6-4a0ade292510',
>>>> >> >>> >'-O', 'raw',
>>>> >> >>> >u'/rhev/data-center/mnt/192.168.24.13:
>>>> >> >>>
>>>> >>
>>>> >>
>>>>
>>>>
>>>>_stor_import1/1ab89386-a2ba-448b-90ab-bc816f55a328/images/f707a218-9db7-4e23-8bbd-9b12972012b6/d6591ec5-3ede-443d-bd40-93119ca7c7d5']
>>>> >> >>> >failed with rc=1 out=''
err=bytearray(b'qemu-img: error
>>while
>>>> >> >reading
>>>> >> >>> >sector 135168: Transport endpoint is not
>>connected\\nqemu-img:
>>>> >> >error
>>>> >> >>> >while
>>>> >> >>> >reading sector 131072: Transport endpoint is
not
>>>> >> >connected\\nqemu-img:
>>>> >> >>> >error while reading sector 139264: Transport
endpoint is not
>>>> >> >>> >connected\\nqemu-img: error while reading
sector 143360:
>>>> >Transport
>>>> >> >>> >endpoint
>>>> >> >>> >is not connected\\nqemu-img: error while
reading sector
>>147456:
>>>> >> >>> >Transport
>>>> >> >>> >endpoint is not connected\\nqemu-img: error
while reading
>>sector
>>>> >> >>> >155648:
>>>> >> >>> >Transport endpoint is not connected\\nqemu-img:
error while
>>>> >reading
>>>> >> >>> >sector
>>>> >> >>> >151552: Transport endpoint is not
connected\\nqemu-img:
>>error
>>>> >while
>>>> >> >>> >reading
>>>> >> >>> >sector 159744: Transport endpoint is not
connected\\n')",)
>>>> >> >>> >
>>>> >> >>> >oVirt version is 4.3.82-1.el7
>>>> >> >>> >OS CentOS Linux release 7.7.1908 (Core)
>>>> >> >>> >
>>>> >> >>> >The Gluster Cluster has been working very well
until this
>>>> >incident.
>>>> >> >>> >
>>>> >> >>> >Please help.
>>>> >> >>> >
>>>> >> >>> >Thank You
>>>> >> >>> >
>>>> >> >>> >Charles Williams
>>>> >> >>>
>>>> >> >>
>>>> >>
>>>>
>>>
>
_______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement: