On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose <sabose@redhat.com> wrote:


On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi All,

another clue in the logs :
[2019-11-21 00:29:50.536631] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536798] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536959] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.537007] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied]
[2019-11-21 00:29:50.537066] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12458: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fc63c00fe18 (Permission denied)
[2019-11-21 00:30:01.177665] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=eb103fbf-80dc-425d-882f-1e4efe510db5 from backend
[2019-11-21 00:30:13.132756] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.132824] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133217] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133238] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = 17c663c2-f582-455b-b806-3b9d01fb2c6c [Permission denied]
[2019-11-21 00:30:13.133264] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12660: READ => -1 gfid=17c663c2-f582-455b-b806-3b9d01fb2c6c fd=0x7fc63c007038 (Permission denied)
[2019-11-21 00:30:38.489449] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489520] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489669] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489717] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 6 failed. Base file gfid = a10a5ae8-108b-4d78-9e65-cca188c27fc4 [Permission denied]
[2019-11-21 00:30:38.489777] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12928: READ => -1 gfid=a10a5ae8-108b-4d78-9e65-cca188c27fc4 fd=0x7fc63c01a058 (Permission denied)


Anyone got an idea why this is happening?
I checked user/group ownership and selinux permissions - all OK

>Can you share the commands (and output) used to check this?
I first thought that the file was cached in memory and that's why the vdsm user could read it, but the following shows the opposite:

[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# ll
total 562145
-rw-rw----. 1 vdsm kvm 5368709120 Nov 12 23:29 5b1d3113-5cca-4582-9029-634b16338a2f
-rw-rw----. 1 vdsm kvm    1048576 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.lease
-rw-r--r--. 1 vdsm kvm        313 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.meta
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# pwd
/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches 
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
dd: error reading ‘5b1d3113-5cca-4582-9029-634b16338a2f’: Permission denied
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 0.198372 s, 338 MB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5356126208 bytes (5.4 GB) copied, 12.061393 s, 444 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.0876 s, 444 MB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
3598712832 bytes (3.6 GB) copied, 1.000540 s, 3.6 GB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 1.47071 s, 3.7 GB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches 
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5171576832 bytes (5.2 GB) copied, 12.071837 s, 428 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.4873 s, 430 MB/s

As you can see, once the root user reads the file, the vdsm user can read it as well.


>I would try this on the hypervisor to check what vdsm/qemu see:

>$ ls -lahRZ /rhev/data-center/mnt/glusterSD/gluster-server:_path
I'm attaching the output of the find I ran, but this one should be enough:
[root@ovirt1 ~]# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/ -not -user vdsm -print
[root@ovirt1 ~]# 


>Also, to make sure we don't have a selinux issue on the hypervisor, you can change
>selinux to permissive mode:

  >  setenforce 0

This is the first thing I did, and the systems were still in permissive mode when I tried again. I'm 99.99% sure it's not selinux.


>And then try again. If this was a selinux issue, the permission denied errors will disappear.
>If this is the case please provide the output of:

  >  ausearch -m AVC -ts today

>If the issue still exists, we eliminated selinux, and you can enable it again:

  >  setenforce 1

[root@ovirt3 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt2 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt1 ~]# ausearch -m AVC -ts today
<no matches>

I have a vague feeling that the issue is related to the gluster v6.5 to v6.6 upgrade, which I did several days before... So if any logs are needed (or debug enabled), just mention it.
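For reference, enabling client-side debug for the volume would be something like this (volume name taken from the logs above; DEBUG is verbose, so it should be reverted afterwards):

    gluster volume set data_fast diagnostics.client-log-level DEBUG
    # reproduce the failing read, inspect the mount log, then revert:
    gluster volume set data_fast diagnostics.client-log-level INFO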



Nir
 
I even replaced one of the disks and healed it, but the result is the same for all my VMs.

Have you checked that the user/group permissions are set correctly across all the bricks in the cluster?
What does ls -la on the images directory from a mount of the volume show you?
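For example (just a sketch - the brick path below is a placeholder, take the real ones from 'gluster volume info'):

    gluster volume info data_fast | grep -i brick
    # then on each host, check ownership of the shards directly on the brick,
    # e.g. (hypothetical brick path - substitute your own):
    ls -lan /gluster_bricks/data_fast/data_fast/.shard | head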

Adding Krutika and Rafi as they ran into a similar issue in the past.


Best Regards,
Strahil Nikolov


On Wednesday, November 20, 2019, 18:17:18 GMT+2, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:


Hello All,

my engine is back online, but I'm still having difficulties making vdsm power up the systems.
I think the events generated today can point me in the right direction (just one example; there are many more):

VDSM ovirt3.localdomain command SpmStatusVDS failed: Cannot inquire Lease(name='SDM', path=u'/rhev/data-center/mnt/glusterSD/gluster1:_data__fast3/ecc3bf0e-8214-45c1-98a6-0afa642e591f/dom_md/leases', offset=1048576): (2, 'Sanlock get hosts failure', 'No such file or directory')
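If it helps, the lease area from that error can also be inspected directly with sanlock (path and offset taken from the message above; syntax from memory, so double-check it before running):

    sanlock direct dump /rhev/data-center/mnt/glusterSD/gluster1:_data__fast3/ecc3bf0e-8214-45c1-98a6-0afa642e591f/dom_md/leases:1048576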

I will try to collect a fresh log and see what it is complaining about this time.

Best Regards,
Strahil Nikolov

>Hi Sahina,

>I have a strange situation:
>1. When I try to access the file via 'sudo -u vdsm dd if=disk of=test bs=4M', the command fails at approx. 60MB.
>2. If I run the same command as root, remove the file, and then run it again as the vdsm user -> this time no i/o error is reported.

>My guess is that I need to check what's going on on the bricks themselves ...

>Best Regards,
>Strahil Nikolov



On Tuesday, November 19, 2019, 0:02:16 GMT-5, Sahina Bose <sabose@redhat.com> wrote:




On Tue, Nov 19, 2019 at 10:10 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi Sahina,

Sadly engine logs have no errors.
I've got only an I/O error, but in the vdsm debug log I can clearly see that "qemu-img" is returning an "OK".
During the upgrade I got some metadata files pending heal, but I have resolved the conflict manually and it should be OK.
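(To double-check that nothing is still pending, something like this should show any remaining entries - volume name taken from the client logs:

    gluster volume heal data_fast info
    gluster volume heal data_fast info split-brain
)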
Today I defined one of the VMs manually (virsh define) and then started it, but the issue is the same.
It seems to be a storage-related issue, as VMs on one specific domain can be started, but most of my VMs are on the fast storage domains and none of those can be started.

After the gluster snapshot restore, the engine is having issues and I have to investigate that separately (as I powered off my HostedEngine before creating the snapshot).



Any ideas where to look, as I can definitely read the disks of the rhel7 VM (using "dd if=disk" or "qemu-img info")?
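(e.g., the kind of checks I mean, with the path taken from the vdsm log below:

    qemu-img info /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f
    dd if=/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M count=16
)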

The vdsm logs have this:
2019-11-17 10:21:23,892+0200 INFO  (libvirt/events) [virt.vm] (vmId='b3c4d84a-9784-470c-b70e-7ad7cc45e913') abnormal vm stop device ua-94f763e9-fd96-4bee-a6b2-31af841a918b error eother (vm:5075)
2019-11-17 10:21:23,892+0200 INFO  (libvirt/events) [virt.vm] (vmId='b3c4d84a-9784-470c-b70e-7ad7cc45e913') CPU stopped: onIOError (vm:6062)
2019-11-17 10:21:23,893+0200 DEBUG (libvirt/events) [jsonrpc.Notification] Sending event {"params": {"notify_time": 4356025830, "b3c4d84a-9784-470c-b70e-7ad7cc45e913": {"status": "WaitForLaunch", "ioerror": {"alias": "ua-94f763e9-fd96-4bee-a6b2-31af841a918b", "name": "sda", "path": "/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f"}, "pauseCode": "EOTHER"}}, "jsonrpc": "2.0", "method": "|virt|VM_status|b3c4d84a-9784-470c-b70e-7ad7cc45e913"} (__init__:181)

Can you check the permissions of the file /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f? Were they reset after the upgrade?
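(e.g. stat shows owner, group, mode and timestamps in one shot:

    stat /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f
)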

Are you able to copy this file to a different location and try running a VM with this image?

Any errors in the mount log of gluster1:_data__fast volume?
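(The FUSE mount log is typically named after the mount point with '/' replaced by '-', so on the hypervisor it should be something like:

    tail -f /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster1:_data__fast.log
)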


Best Regards,
Strahil Nikolov 




On Monday, November 18, 2019, 11:38:13 GMT+2, Sahina Bose <sabose@redhat.com> wrote:




On Mon, Nov 18, 2019 at 2:58 PM Sandro Bonazzola <sbonazzo@redhat.com> wrote:
+Sahina Bose +Gobinda Das +Nir Soffer +Tal Nisan can you please help here?


On Sun, Nov 17, 2019 at 4:00 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
So far,

I have rolled back the engine and the 3 hosts - I still cannot manipulate the storage.
It seems that gluster itself is working, but vdsm and the oVirt stack cannot access the storage - I cannot create new VM disks or start a VM - and I'm on the verge of redeploying.


Any errors in vdsm logs? engine logs?



Best Regards,
Strahil Nikolov

On Saturday, November 16, 2019, 15:40:25 GMT+2, Strahil <hunter86_bg@yahoo.com> wrote:


I upgraded to RC3 and now cannot power on any VM.
I'm constantly getting I/O errors, but checking at the gluster level, I can dd from each disk or even create a new one.

Removing the HighAvailability doesn't help.

I guess I should restore the engine from the gluster snapshot and roll back via 'yum history undo last'.
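(Roughly:

    yum history list
    # assumes the upgrade really was the most recent transaction:
    yum history undo last
)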

Does anyone else have these issues?

Best Regards,
Strahil Nikolov

On Nov 13, 2019 15:31, Sandro Bonazzola <sbonazzo@redhat.com> wrote:


On Wed, Nov 13, 2019 at 2:25 PM Sandro Bonazzola <sbonazzo@redhat.com> wrote:


On Wed, Nov 13, 2019 at 1:56 PM Florian Schmid <fschmid@ubimet.com> wrote:
Hello,

I have a question about bugs which are flagged as [downstream clone - 4.3.7] but are not yet released.

I'm talking about this bug:

I can't see it in the 4.3.7 release notes. Will it be included in a further release candidate? I think this fix is very important, and I can't upgrade yet because of this bug.


Looking at the bug, the fix was done with commit 12bd5cb1fe7c95e29b4065fca968913722fe9eaa, which is contained in the following tags:

$ git tag --contains 12bd5cb1fe7c95e29b4065fca968913722fe9eaa
ovirt-engine-4.3.6.6
ovirt-engine-4.3.6.7
ovirt-engine-4.3.7.0
ovirt-engine-4.3.7.1

So the fix is already included in the oVirt 4.3.6 release.

Sent a fix to 4.3.6 release notes: https://github.com/oVirt/ovirt-site/pull/2143 @Ryan Barry can you please review?

BR Florian Schmid


From: "Sandro Bonazzola" <sbonazzo@redhat.com>
To: "users" <users@ovirt.org>
Sent: Wednesday, November 13, 2019 13:34:59
Subject: [ovirt-users] [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing

The oVirt Project is pleased to announce the availability of the oVirt 4.3.7 Third Release Candidate for testing, as of November 13th, 2019.

This update is a release candidate of the seventh in a series of stabilization updates to the 4.3 series.
This is pre-release software. This pre-release should not be used in production.

This release is available now on x86_64 architecture for:
* Red Hat Enterprise Linux 7.7 or later (but <8)
* CentOS Linux (or similar) 7.7 or later (but <8)

This release supports Hypervisor Hosts on x86_64 and ppc64le architectures for:
* Red Hat Enterprise Linux 7.7 or later (but <8)
* CentOS Linux (or similar) 7.7 or later (but <8)
* oVirt Node 4.3 (available for x86_64 only) has been built consuming CentOS 7.7 Release

See the release notes [1] for known issues, new features and bugs fixed.

While testing this release candidate please note that oVirt node now includes:
- ansible 2.9.0
- GlusterFS 6.6

Notes:
- oVirt Appliance is already available
- oVirt Node is already available

Additional Resources:
* Read more about the oVirt 4.3.7 release highlights: http://www.ovirt.org/release/4.3.7/
* Get more oVirt Project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog: http://www.ovirt.org/blog/

[1] http://www.ovirt.org/release/4.3.7/
[2] http://resources.ovirt.org/pub/ovirt-4.3-pre/iso/

--

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA

sbonazzo@redhat.com   

Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/24QUREJPZHTSMHLDYBUDVZML2DEF7PKQ/
