Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing

I upgraded to RC3 and now cannot power on any VM. I constantly get an I/O error, but checking at the gluster level I can dd from each disk or even create a new one. Removing the HighAvailability option doesn't help. I guess I should restore the engine from the gluster snapshot and roll back via 'yum history undo last'. Does anyone else have these issues?

Best Regards,
Strahil Nikolov
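(For reference, the rollback described above boils down to something like the following - a sketch only, assuming the upgrade was a single yum transaction on each host and that a gluster snapshot of the engine's storage volume exists; names are placeholders:)

    # on each upgraded host: review and undo the last yum transaction
    yum history list
    yum history info last
    yum history undo last

    # engine storage rollback via gluster snapshot (the volume must be stopped before a restore)
    gluster snapshot list
    gluster snapshot restore <snapshot-name>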
On Nov 13, 2019 15:31, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
On Wed, Nov 13, 2019 at 14:25, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
On Wed, Nov 13, 2019 at 13:56, Florian Schmid <fschmid@ubimet.com> wrote:
Hello,
I have a question about bugs that are flagged as [downstream clone - 4.3.7] but are not yet released.
I'm talking about this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1749202
I can't see it in the 4.3.7 release notes. Will it be included in a further release candidate? I think this fix is very important, and I can't upgrade yet because of this bug.
Looking at the bug, the fix was done with commit 12bd5cb1fe7c95e29b4065fca968913722fe9eaa, which is contained in the following tags:

    $ git tag --contains 12bd5cb1fe7c95e29b4065fca968913722fe9eaa
    ovirt-engine-4.3.6.6
    ovirt-engine-4.3.6.7
    ovirt-engine-4.3.7.0
    ovirt-engine-4.3.7.1
So the fix is already included in release oVirt 4.3.6.
Sent a fix to 4.3.6 release notes: https://github.com/oVirt/ovirt-site/pull/2143. @Ryan Barry can you please review?
BR Florian Schmid
________________________________
From: "Sandro Bonazzola" <sbonazzo@redhat.com>
To: "users" <users@ovirt.org>
Sent: Wednesday, November 13, 2019 13:34:59
Subject: [ovirt-users] [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing
The oVirt Project is pleased to announce the availability of the oVirt 4.3.7 Third Release Candidate for testing, as of November 13th, 2019.
This update is a release candidate of the seventh in a series of stabilization updates to the 4.3 series. This is pre-release software. This pre-release should not be used in production.
This release is available now on x86_64 architecture for:
* Red Hat Enterprise Linux 7.7 or later (but <8)
* CentOS Linux (or similar) 7.7 or later (but <8)
This release supports Hypervisor Hosts on x86_64 and ppc64le architectures for:
* Red Hat Enterprise Linux 7.7 or later (but <8)
* CentOS Linux (or similar) 7.7 or later (but <8)
* oVirt Node 4.3 (available for x86_64 only) has been built consuming the CentOS 7.7 release
See the release notes [1] for known issues, new features and bugs fixed.
While testing this release candidate please note that oVirt Node now includes:
- ansible 2.9.0
- GlusterFS 6.6
Notes:
- oVirt Appliance is already available
- oVirt Node is already available
Additional Resources:
* Read more about the oVirt 4.3.7 release highlights: http://www.ovirt.org/release/4.3.7/
* Get more oVirt Project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog: http://www.ovirt.org/blog/
[1] http://www.ovirt.org/release/4.3.7/
[2] http://resources.ovirt.org/pub/ovirt-4.3-pre/iso/
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA
sbonazzo@redhat.com
Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/24QUREJPZHTSMH...

So far, I have rolled back the engine and the 3 hosts - still cannot manipulate the storage. It seems that gluster itself is working, but vdsm and the oVirt stack cannot access the storage - cannot create new VM disks, cannot start a VM - and I'm on the verge of a redeploy.

Best Regards,
Strahil Nikolov
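(If it helps anyone following along, a few gluster-side health checks to confirm the volumes came back cleanly after such a rollback - a minimal sketch; "data_fast" is the volume name that appears later in this thread:)

    # all bricks online, no pending heals
    gluster volume status data_fast
    gluster volume heal data_fast info

    # the oVirt-side mount is present and readable
    ls -l /rhev/data-center/mnt/glusterSD/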


On Mon, Nov 18, 2019 at 2:58 PM Sandro Bonazzola <sbonazzo@redhat.com> wrote:
+Sahina Bose <sabose@redhat.com> +Gobinda Das <godas@redhat.com> +Nir Soffer <nsoffer@redhat.com> +Tal Nisan <tnisan@redhat.com> can you please help here?
On Sun, Nov 17, 2019 at 16:00, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
So far,
I have rolled back the engine and the 3 hosts - still cannot manipulate the storage. It seems that gluster itself is working, but vdsm and the oVirt stack cannot access the storage - cannot create new VM disks, cannot start a VM and I'm on the verge of redeploy.
Any errors in vdsm logs? engine logs?
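(For reference, the default log locations involved, assuming a standard oVirt install:)

    # on the host that tried to start the VM
    grep -iE "error|warn" /var/log/vdsm/vdsm.log | tail -n 50

    # on the engine machine
    grep -i error /var/log/ovirt-engine/engine.log | tail -n 50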


On Tue, Nov 19, 2019 at 10:10 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi Sahina,
Sadly the engine logs have no errors. I've got only an I/O error, but in the vdsm debug output I can clearly see that "qemu-img" is giving an "OK". During the upgrade I got some metadata files pending heal, but I have recovered the conflict manually and it should be OK. Today I defined one of the VMs manually (virsh define) and then started it, but the issue is the same. It seems to be a storage-related issue, as VMs that are on a specific domain can be started, but most of my VMs are on the fast storage domains and none of them can be started.
After the gluster snapshot restore, the engine is having issues and I have to investigate that separately (as I powered off my HostedEngine before creating the snapshot).
The logs can be found at: https://drive.google.com/open?id=1VAZFZWWrpimDeVuZT0sWFVXy76scr4NM
Any ideas where to look, as I can definitely read (using "dd if=disk" or "qemu-img info") the disks of the rhel7 VM?
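(A minimal sketch of the read checks being described, run as the vdsm user against one image under the gluster mount; the UUID components of the path are placeholders:)

    IMG=/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/<sd-uuid>/images/<img-uuid>/<vol-uuid>
    sudo -u vdsm qemu-img info "$IMG"
    sudo -u vdsm dd if="$IMG" of=/dev/null bs=4M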
The vdsm logs have this:

2019-11-17 10:21:23,892+0200 INFO (libvirt/events) [virt.vm] (vmId='b3c4d84a-9784-470c-b70e-7ad7cc45e913') abnormal vm stop device ua-94f763e9-fd96-4bee-a6b2-31af841a918b error eother (vm:5075)
2019-11-17 10:21:23,892+0200 INFO (libvirt/events) [virt.vm] (vmId='b3c4d84a-9784-470c-b70e-7ad7cc45e913') CPU stopped: onIOError (vm:6062)
2019-11-17 10:21:23,893+0200 DEBUG (libvirt/events) [jsonrpc.Notification] Sending event {"params": {"notify_time": 4356025830, "b3c4d84a-9784-470c-b70e-7ad7cc45e913": {"status": "WaitForLaunch", "ioerror": {"alias": "ua-94f763e9-fd96-4bee-a6b2-31af841a918b", "name": "sda", "path": "/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f"}, "pauseCode": "EOTHER"}}, "jsonrpc": "2.0", "method": "|virt|VM_status|b3c4d84a-9784-470c-b70e-7ad7cc45e913"} (__init__:181)

Can you check the permissions of the file /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f. Was it reset after the upgrade? Are you able to copy this file to a different location and try running a VM with this image? Any errors in the mount log of the gluster1:_data__fast volume?
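(A hedged sketch of those checks: oVirt normally expects images to be owned by vdsm:kvm (36:36), and the gluster FUSE mount log is typically named after the mount point under /var/log/glusterfs/ - both of these are assumptions about the setup:)

    # ownership/permissions of the image the VM failed on
    stat /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f

    # recent warnings/errors in the FUSE mount log for this volume (log file name is a guess based on the mount point)
    grep -E '\] (E|W) \[' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster1:_data__fast.log | tail -n 50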

Hello All,

my engine is back online, but I'm still having difficulties making vdsm power up the systems. I think that the events generated today can lead me in the right direction (just an example, many more are there):

VDSM ovirt3.localdomain command SpmStatusVDS failed: Cannot inquire Lease(name='SDM', path=u'/rhev/data-center/mnt/glusterSD/gluster1:_data__fast3/ecc3bf0e-8214-45c1-98a6-0afa642e591f/dom_md/leases', offset=1048576): (2, 'Sanlock get hosts failure', 'No such file or directory')

I will try to collect a fresh log and see what it is complaining about this time.

Best Regards,
Strahil Nikolov
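(Two quick checks that may help with that sanlock error - whether the leases file is actually reachable through the mount, and what sanlock itself reports; the path is the one from the event above:)

    ls -l /rhev/data-center/mnt/glusterSD/gluster1:_data__fast3/ecc3bf0e-8214-45c1-98a6-0afa642e591f/dom_md/
    sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/gluster1:_data__fast3/ecc3bf0e-8214-45c1-98a6-0afa642e591f/dom_md/leases of=/dev/null bs=1M

    sanlock client status
    grep -i error /var/log/sanlock.log | tail -n 20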
Hi Sahina,

I have a strange situation:
1. When I try to access the file via 'sudo -u vdsm dd if=disk of=test bs=4M', the command fails at approximately 60 MB.
2. If I run the same command as root, remove the file and then run it again as the vdsm user -> this time no I/O error is reported.

My guess is that I need to check what's going on on the bricks themselves...
Best Regards, Strahil Nikolov
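(One observation, offered as an assumption rather than a diagnosis: oVirt's sharded gluster volumes typically use a 64 MB shard size, so a read that dies at roughly 60 MB for the vdsm user but succeeds for root points at the shards beyond the base file rather than at the base file itself. A minimal way to compare the two reads and confirm the shard size, with a placeholder image path:)

    IMG=/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/<sd-uuid>/images/<img-uuid>/<vol-uuid>
    sudo -u vdsm dd if="$IMG" of=/dev/null bs=4M    # fails once it crosses into the first shard?
    dd if="$IMG" of=/dev/null bs=4M                 # same read as root
    gluster volume get data_fast features.shard-block-size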


On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi All,
another clue in the logs:
[2019-11-21 00:29:50.536631] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536798] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536959] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.537007] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied]
[2019-11-21 00:29:50.537066] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12458: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fc63c00fe18 (Permission denied)
[2019-11-21 00:30:01.177665] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=eb103fbf-80dc-425d-882f-1e4efe510db5 from backend
[2019-11-21 00:30:13.132756] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.132824] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133217] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133238] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = 17c663c2-f582-455b-b806-3b9d01fb2c6c [Permission denied]
[2019-11-21 00:30:13.133264] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12660: READ => -1 gfid=17c663c2-f582-455b-b806-3b9d01fb2c6c fd=0x7fc63c007038 (Permission denied)
[2019-11-21 00:30:38.489449] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489520] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489669] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489717] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 6 failed. Base file gfid = a10a5ae8-108b-4d78-9e65-cca188c27fc4 [Permission denied]
[2019-11-21 00:30:38.489777] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12928: READ => -1 gfid=a10a5ae8-108b-4d78-9e65-cca188c27fc4 fd=0x7fc63c01a058 (Permission denied)
Anyone got an idea why it is happening? I checked user/group and SELinux permissions - all OK. I even replaced one of the disks and healed, but the result is the same for all my VMs.
Have you checked that the user/group permissions are set correctly across all the bricks in the cluster? What does ls -la on the images directory (from a mount of the volume) show you? Adding Krutika and Rafi as they ran into a similar issue in the past.
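For example, something like this on every storage node (the brick path below is only an illustration, yours depends on how the bricks were created) should make a mismatch obvious:
ls -lan /gluster_bricks/data_fast/data_fast/<storage-domain-uuid>/images/<image-uuid>/
On a stock oVirt setup everything under images/ is expected to be owned by 36:36 (vdsm:kvm) on every brick.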
Best Regards, Strahil Nikolov
On Wednesday, November 20, 2019 at 18:17:18 GMT+2, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hello All,
my engine is back online, but I'm still having difficulties making vdsm power up the VMs. I think that the events generated today can lead me in the right direction (just one example, many more are there):
VDSM ovirt3.localdomain command SpmStatusVDS failed: Cannot inquire Lease(name='SDM', path=u'/rhev/data-center/mnt/glusterSD/gluster1:_data__fast3/ecc3bf0e-8214-45c1-98a6-0afa642e591f/dom_md/leases', offset=1048576): (2, 'Sanlock get hosts failure', 'No such file or directory')
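If it helps, I guess the quickest sanity checks are whether the leases file is actually visible through that mount and what sanlock itself reports, e.g.:
ls -l /rhev/data-center/mnt/glusterSD/gluster1:_data__fast3/ecc3bf0e-8214-45c1-98a6-0afa642e591f/dom_md/
sanlock client status
(paths taken from the error above).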
I will try to collect a fresh log and see what it is complaining about this time.
Best Regards, Strahil Nikolov
Hi Sahina,
I have a strange situation: 1. When I try to access the file via 'sudo -u vdsm dd if=disk of=test bs=4M', the command fails at approx. 60MB. 2. If I run the same command as root, remove the file, and then run it again as the vdsm user -> this time no I/O error is reported.
My guess is that I need to check what's going on on the bricks themselves ...
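For example (the brick path is just how my bricks are laid out, adjust as needed), repeating the read directly against the same file on each brick should show whether one particular brick is the one denying access:
sudo -u vdsm dd if=/gluster_bricks/data_fast/data_fast/<domain-uuid>/images/<image-uuid>/<volume-uuid> of=/dev/null bs=4M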
Best Regards, Strahil Nikolov
On Tuesday, November 19, 2019 at 0:02:16 GMT-5, Sahina Bose <sabose@redhat.com> wrote:
On Tue, Nov 19, 2019 at 10:10 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi Sahina,
Sadly the engine logs have no errors. I've got only an I/O error, but in the vdsm debug log I can clearly see that "qemu-img" is giving an "OK". During the upgrade I got some metadata files pending heal, but I have recovered the conflict manually and it should be OK. Today I defined one of the VMs manually (virsh define) and then started it, but the issue is the same. It seems to be a storage-related issue, as VMs that are on one specific domain can be started, but most of my VMs are on the fast storage domains and none of them can be started.
After the gluster snapshot restore, the engine is having issues and I have to investigate that separately (as I powered off my HostedEngine before creating the snapshot).
The logs can be found at: https://drive.google.com/open?id=1VAZFZWWrpimDeVuZT0sWFVXy76scr4NM
Any ideas where to look, as I can definitely read (using "dd if=disk" or "qemu-img info") the disks of the rhel7 VM?
The vdsm logs have this:
2019-11-17 10:21:23,892+0200 INFO (libvirt/events) [virt.vm] (vmId='b3c4d84a-9784-470c-b70e-7ad7cc45e913') abnormal vm stop device ua-94f763e9-fd96-4bee-a6b2-31af841a918b error eother (vm:5075)
2019-11-17 10:21:23,892+0200 INFO (libvirt/events) [virt.vm] (vmId='b3c4d84a-9784-470c-b70e-7ad7cc45e913') CPU stopped: onIOError (vm:6062)
2019-11-17 10:21:23,893+0200 DEBUG (libvirt/events) [jsonrpc.Notification] Sending event {"params": {"notify_time": 4356025830, "b3c4d84a-9784-470c-b70e-7ad7cc45e913": {"status": "WaitForLaunch", "ioerror": {"alias": "ua-94f763e9-fd96-4bee-a6b2-31af841a918b", "name": "sda", "path": "/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f"}, "pauseCode": "EOTHER"}}, "jsonrpc": "2.0", "method": "|virt|VM_status|b3c4d84a-9784-470c-b70e-7ad7cc45e913"} (__init__:181)
Can you check the permissions of the file /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f? Were they reset after the upgrade?
Are you able to copy this file to a different location and try running a VM with this image?
Any errors in the mount log of gluster1:_data__fast volume?
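(The client mount log for that volume should be on the hypervisor under /var/log/glusterfs/, named after the mount point with '/' replaced by '-', so something like rhev-data-center-mnt-glusterSD-gluster1:_data__fast.log.)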
Best Regards, Strahil Nikolov
On Monday, November 18, 2019 at 11:38:13 GMT+2, Sahina Bose <sabose@redhat.com> wrote:
On Mon, Nov 18, 2019 at 2:58 PM Sandro Bonazzola <sbonazzo@redhat.com> wrote:
+Sahina Bose <sabose@redhat.com> +Gobinda Das <godas@redhat.com> +Nir Soffer <nsoffer@redhat.com> +Tal Nisan <tnisan@redhat.com> can you please help here?
On Sun, Nov 17, 2019 at 16:00 Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
So far,
I have rolled back the engine and the 3 hosts - still cannot manipulate the storage. It seems that gluster itself is working, but vdsm and the oVirt stack cannot access the storage - cannot create new VM disks, cannot start a VM - and I'm on the verge of redeploying.
Any errors in vdsm logs? engine logs?
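(They are in /var/log/vdsm/vdsm.log on each host and /var/log/ovirt-engine/engine.log on the engine VM, in case that saves a lookup.)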
Best Regards, Strahil Nikolov

On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose <sabose@redhat.com> wrote:
On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Anyone got an idea why it is happening? I checked user/group and SELinux permissions - all OK
Can you share the commands (and output) used to check this?
I would try this on the hypervisor to check what vdsm/qemu see:
$ ls -lahRZ /rhev/data-center/mnt/glusterSD/gluster-server:_path
Also, to make sure we don't have an SELinux issue on the hypervisor, you can change SELinux to permissive mode:
setenforce 0
And then try again. If this was an SELinux issue, the permission denied errors will disappear. If that is the case please provide the output of:
ausearch -m AVC -ts today
If the issue still exists, we have eliminated SELinux, and you can enable it again:
setenforce 1
Nir


On Fri, Nov 22, 2019 at 10:41 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose <sabose@redhat.com> wrote:
On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Anyone got an idea why it is happening? I checked user/group and SELinux permissions - all OK
Can you share the commands (and output) used to check this?
I first thought that the file is cached in memory and that's why the vdsm user can read the file, but the following shows the opposite:
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# ll
total 562145
-rw-rw----. 1 vdsm kvm 5368709120 Nov 12 23:29 5b1d3113-5cca-4582-9029-634b16338a2f
-rw-rw----. 1 vdsm kvm 1048576 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.lease
-rw-r--r--. 1 vdsm kvm 313 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.meta
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# pwd
/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches
I would use iflag=direct instead, no need to mess with caches. Vdsm always uses direct I/O.
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
dd: error reading ‘5b1d3113-5cca-4582-9029-634b16338a2f’: Permission denied
You got permission denied...
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 0.198372 s, 338 MB/s
And dd continues to read data?! I have never seen anything like this.
It would be helpful to run this with strace:
strace -t -TT -o dd.strace dd if=vol-id of=/dev/null iflag=direct bs=8M status=progress
And share dd.strace.
Logs in /var/log/glusterfs/exportname.log will contain useful info for this test.
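If it is easier, grepping the trace for the failing syscall should be enough, something like:
grep -E 'EACCES|EPERM' dd.strace
assuming the read() on the fuse mount is what returns the permission error.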
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5356126208 bytes (5.4 GB) copied, 12.061393 s, 444 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.0876 s, 444 MB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
3598712832 bytes (3.6 GB) copied, 1.000540 s, 3.6 GB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 1.47071 s, 3.7 GB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5171576832 bytes (5.2 GB) copied, 12.071837 s, 428 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.4873 s, 430 MB/s
As you can see, once the root user reads the file -> the vdsm user can also do that.
Smells like an issue on the gluster side.
I would try this on the hypervisor to check what vdsm/qemu see:
$ ls -lahRZ /rhev/data-center/mnt/glusterSD/gluster-server:_path
I'm attaching the output of the find I ran, but this one should be enough:
[root@ovirt1 ~]# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/ -not -user vdsm -print
[root@ovirt1 ~]#
A full output of ls -lahRZ, showing user, group, permission bits, and SELinux label for the entire tree, would be more useful.
Also, to make sure we don't have an SELinux issue on the hypervisor, you can change SELinux to permissive mode:
setenforce 0
This is the first thing I did, and the systems were still in permissive mode when I tried again. I'm 99.99% sure it's not SELinux.
And then try again. If this was an SELinux issue, the permission denied errors will disappear. If that is the case please provide the output of:
ausearch -m AVC -ts today
If the issue still exists, we have eliminated SELinux, and you can enable it again:
setenforce 1
[root@ovirt3 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt2 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt1 ~]# ausearch -m AVC -ts today
<no matches>
So this is not selinux on the hypervisor. I wonder if it can be selinux on the gluster side?
I have a vague feeling that the issue is related to the gluster v6.5 to 6.6 upgrade which I did several days before... So if any logs are needed (or debug enabled), just mention it.
If this is the last change, and it worked before, most likely. Nir
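P.S. It may also be worth confirming that every client and brick really ended up on the same gluster version, e.g. rpm -q glusterfs glusterfs-fuse glusterfs-server on each node, and gluster volume get all cluster.op-version for the cluster op-version.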

Hi Nir, thanks for your feedback.
My guess is that as gluster is set up to use sharding, the first file (the base volume file) is OK, but then the shard is giving an error. As gluster1=ovirt1 and gluster2=ovirt2 & ovirt3 is the arbiter, I think it's not an SELinux issue on the gluster level (all nodes are still in permissive - just to be sure that nothing comes up).
As I can't power up any VM (in the data_fastX domains), I took a more radical step, enabled Trace logging on the FUSE mount and started a VM. As per the log, I got:
[2019-11-23 17:15:16.961636] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-23 17:15:16.961669] D [MSGID: 0] [client-rpc-fops_v2.c:2642:client4_0_lookup_cbk] 0-stack-trace: stack-address: 0x7fbeb4005218, data_fast-client-2 returned -1 error: Permission denied [Permission denied]
[2019-11-23 17:15:16.961705] D [MSGID: 0] [afr-common.c:2464:afr_lookup_done] 0-stack-trace: stack-address: 0x7fbeb4005218, data_fast-replicate-0 returned -1 error: Permission denied [Permission denied]
[2019-11-23 17:15:16.961735] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied]
[2019-11-23 17:15:16.961764] D [MSGID: 0] [shard.c:842:shard_common_failure_unwind] 0-stack-trace: stack-address: 0x7fbeb4005218, data_fast-shard returned -1 error: Permission denied [Permission denied]
[2019-11-23 17:15:16.961793] D [MSGID: 0] [utime-autogen-fops.c:100:gf_utime_readv_cbk] 0-stack-trace: stack-address: 0x7fbeb4005218, data_fast-utime returned -1 error: Permission denied [Permission denied]
[2019-11-23 17:15:16.961821] D [MSGID: 0] [defaults.c:1189:default_readv_cbk] 0-stack-trace: stack-address: 0x7fbeb4005218, data_fast-write-behind returned -1 error: Permission denied [Permission denied]
[2019-11-23 17:15:16.961848] D [MSGID: 0] [defaults.c:1189:default_readv_cbk] 0-stack-trace: stack-address: 0x7fbeb4005218, data_fast-open-behind returned -1 error: Permission denied [Permission denied]
[2019-11-23 17:15:16.961876] D [MSGID: 0] [md-cache.c:2010:mdc_readv_cbk] 0-stack-trace: stack-address: 0x7fbeb4005218, data_fast-md-cache returned -1 error: Permission denied [Permission denied]
[2019-11-23 17:15:16.961902] D [MSGID: 0] [defaults.c:1189:default_readv_cbk] 0-stack-trace: stack-address: 0x7fbeb4005218, data_fast-io-threads returned -1 error: Permission denied [Permission denied]
[2019-11-23 17:15:16.961938] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 703: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fbeb4017018 (Permission denied)
This indicates that shard 79 for gfid b0af2b81-22cf-482e-9b2f-c431b6449dae has to be checked.
So I first checked the brick logs on ovirt1 from the moment the VM was powered up:
[2019-11-23 17:15:16.962013] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:43c4e254-d99e-45b3-a6b2-dba28b9c2269-GRAPH_ID:0-PID:18424-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
[2019-11-23 17:15:16.962113] E [MSGID: 115050] [server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 541: LOOKUP /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (be318638-e8a0-4c6d-977d-7a937aa84806/b0af2b81-22cf-482e-9b2f-c431b6449dae.79), client: CTX_ID:43c4e254-d99e-45b3-a6b2-dba28b9c2269-GRAPH_ID:0-PID:18424-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, error-xlator: data_fast-access-control [Permission denied]
[2019-11-23 17:15:16.962096] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:43c4e254-d99e-45b3-a6b2-dba28b9c2269-GRAPH_ID:0-PID:18424-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
All of the above doesn't give me any clue what it is going on about. And the actual shard:
[root@ovirt1 .shard]# ls -lahRZ b0af2b81-22cf-482e-9b2f-c431b6449dae.79
-rw-rw----. vdsm kvm system_u:object_r:glusterd_brick_t:s0 b0af2b81-22cf-482e-9b2f-c431b6449dae.79
[root@ovirt1 .shard]# getfattr -d -e hex -m. b0af2b81-22cf-482e-9b2f-c431b6449dae.79
# file: b0af2b81-22cf-482e-9b2f-c431b6449dae.79
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xac073db9c9934d0bb70bb50dd11bee31
trusted.gfid2path.bedc7b97ff07378d=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f62306166326238312d323263662d343832652d396232662d6334333162363434396461652e3739
trusted.glusterfs.mdata=0x010000000000000000000000005dc94ff0000000001eb592b1000000005dc94ff0000000001eb592b1000000005dc94ff0000000001eb592b1
[root@ovirt2 .shard]# ls -lahRZ b0af2b81-22cf-482e-9b2f-c431b6449dae.79
-rw-rw----. vdsm kvm system_u:object_r:glusterd_brick_t:s0 b0af2b81-22cf-482e-9b2f-c431b6449dae.79
[root@ovirt2 .shard]# getfattr -d -e hex -m. b0af2b81-22cf-482e-9b2f-c431b6449dae.79
# file: b0af2b81-22cf-482e-9b2f-c431b6449dae.79
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xac073db9c9934d0bb70bb50dd11bee31
trusted.gfid2path.bedc7b97ff07378d=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f62306166326238312d323263662d343832652d396232662d6334333162363434396461652e3739
trusted.glusterfs.mdata=0x010000000000000000000000005dc94ff0000000001eb592b1000000005dc94ff0000000001eb592b1000000005dc94ff0000000001eb592b1
[root@ovirt3 .shard]# ls -lahRZ b0af2b81-22cf-482e-9b2f-c431b6449dae.79
-rw-rw----. vdsm kvm system_u:object_r:glusterd_brick_t:s0 b0af2b81-22cf-482e-9b2f-c431b6449dae.79
[root@ovirt3 .shard]# getfattr -d -e hex -m. b0af2b81-22cf-482e-9b2f-c431b6449dae.79
# file: b0af2b81-22cf-482e-9b2f-c431b6449dae.79
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xac073db9c9934d0bb70bb50dd11bee31
trusted.gfid2path.bedc7b97ff07378d=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f62306166326238312d323263662d343832652d396232662d6334333162363434396461652e3739
trusted.glusterfs.mdata=0x010000000000000000000000005dc94ff0000000001eb592b1000000005dc94ff0000000001eb592b1000000005dc94ff0000000001eb592b1
Access for user qemu (uid 107/gid 107 -> see the gluster brick log on ovirt1) is OK:
[root@ovirt1 .shard]# sudo -u qemu dd if=b0af2b81-22cf-482e-9b2f-c431b6449dae.79 of=/dev/null bs=8M iflag=direct status=progress
8+0 прочетени блока
8+0 записани блока
изкопирани са 67108864 байта (67 MB), 0,0130595 s, 5,1 GB/s
[root@ovirt2 .shard]# sudo -u qemu dd if=b0af2b81-22cf-482e-9b2f-c431b6449dae.79 of=/dev/null bs=8M iflag=direct status=progress
8+0 прочетени блока
8+0 записани блока
изкопирани са 67108864 байта (67 MB), 0,0224745 s, 3,0 GB/s
[root@ovirt3 .shard]# sudo -u qemu dd if=b0af2b81-22cf-482e-9b2f-c431b6449dae.79 of=/dev/null bs=8M iflag=direct status=progress
0+0 прочетени блока
0+0 записани блока
изкопирани са 0 байта (0 B), 0,00103977 s, 0,0 kB/s
[root@ovirt1 .shard]# grep 107 /etc/passwd /etc/group
/etc/passwd:qemu:x:107:107:qemu user:/:/sbin/nologin
/etc/group:qemu:x:107:vdsm,sanlock
[root@ovirt2 ~]# grep 107 /etc/passwd /etc/group
/etc/passwd:qemu:x:107:107:qemu user:/:/sbin/nologin
/etc/group:qemu:x:107:vdsm,sanlock
[root@ovirt3 .shard]# grep 107 /etc/passwd /etc/group
/etc/passwd:qemu:x:107:107:qemu user:/:/sbin/nologin
/etc/group:qemu:x:107:vdsm,sanlock
I've created 3 strace files (1 as root and 2 as vdsm) and attached them to this e-mail:
[root@ovirt2 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm strace -fvttTyyx -s 4096 -o /tmp/dd-strace-vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null iflag=direct bs=8M status=progress
dd: грешка при четене на „5b1d3113-5cca-4582-9029-634b16338a2f“: Отказан достъп
8+0 прочетени блока
8+0 записани блока
изкопирани са 67108864 байта (67 MB), 0,118223 s, 568 MB/s
[root@ovirt2 94f763e9-fd96-4bee-a6b2-31af841a918b]# strace -fvttTyyx -s 4096 -o /tmp/dd-strace-root dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null iflag=direct bs=8M status=progress
изкопирани са 5083496448 байта (5,1 GB), 10,144199 s, 501 MB/s
640+0 прочетени блока
640+0 записани блока
изкопирани са 5368709120 байта (5,4 GB), 10,7444 s, 500 MB/s
[root@ovirt2 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm strace -fvttTyyx -s 4096 -o /tmp/dd-strace-vdsm-2nd-try dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null iflag=direct bs=8M status=progress
изкопирани са 4815060992 байта (4,8 GB), 8,056124 s, 598 MB/s
640+0 прочетени блока
640+0 записани блока
изкопирани са 5368709120 байта (5,4 GB), 8,98862 s, 597 MB/s
P.S.: Sorry for the cyrillic, it's my wife's laptop :). The first case gives an error and reads only 67MB, while the second one works as expected. Then vdsm can access the volume without issue (quite strange)!
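P.P.S.: If it helps, I can also dump the POSIX ACLs that the access-control translator should be evaluating, directly on each brick, with something like:
getfacl /gluster_bricks/data_fast/data_fast/.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
(the brick path is an example from my layout).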
Smells like issue on gluster side. I would try this on the hypervisor to check what vdsm/qemu see:
$ ls -lahRZ /rhv/data-center/mnt/glusterSD/gluster-server:_path

I'm attaching the output of the find I ran, but this one should be enough:

[root@ovirt1 ~]# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/ -not -user vdsm -print
A full output of ls -lahRZ, showing user, group, permissions bits, and selinux label of the entire tree will be more useful.

[root@ovirt2 gluster1:_data__fast]# tree .
.
├── 396604d9-2a9e-49cd-9563-fdc79981f67b
│   ├── dom_md
│   │   ├── ids
│   │   ├── inbox
│   │   ├── leases
│   │   ├── metadata
│   │   ├── outbox
│   │   └── xleases
│   ├── images
│   │   ├── 225e4c45-0f70-40aa-8d51-7b705f259cd7
│   │   │   ├── 7592ce4c-ecb9-4dbe-ba67-a343ec4cdbc5
│   │   │   ├── 7592ce4c-ecb9-4dbe-ba67-a343ec4cdbc5.lease
│   │   │   ├── 7592ce4c-ecb9-4dbe-ba67-a343ec4cdbc5.meta
│   │   │   ├── b965dd4e-c814-4bb6-8bc5-d1bbb8dc211d
│   │   │   ├── b965dd4e-c814-4bb6-8bc5-d1bbb8dc211d.lease
│   │   │   └── b965dd4e-c814-4bb6-8bc5-d1bbb8dc211d.meta
│   │   ├── 25107759-f56b-4d94-872e-0f9b145886d1
│   │   │   ├── 571a5eb4-eb92-479c-8305-79eaa0673bb2
│   │   │   ├── 571a5eb4-eb92-479c-8305-79eaa0673bb2.lease
│   │   │   ├── 571a5eb4-eb92-479c-8305-79eaa0673bb2.meta
│   │   │   └── 571a5eb4-eb92-479c-8305-79eaa0673bb2.meta_old
│   │   ├── 2c79e7ab-47cc-4f41-a190-40cd94053d83
│   │   │   ├── 6a6d6ff8-92b2-4a14-b54c-77239e28254d
│   │   │   ├── 6a6d6ff8-92b2-4a14-b54c-77239e28254d.lease
│   │   │   ├── 6a6d6ff8-92b2-4a14-b54c-77239e28254d.meta
│   │   │   └── 6a6d6ff8-92b2-4a14-b54c-77239e28254d.meta_old
│   │   ├── 36fed07b-a08f-4fb9-a273-9dbf0ad1f27b
│   │   │   ├── 44ac76fe-dd0d-471d-8ffe-cb9ad28b07a9
│   │   │   ├── 44ac76fe-dd0d-471d-8ffe-cb9ad28b07a9.lease
│   │   │   └── 44ac76fe-dd0d-471d-8ffe-cb9ad28b07a9.meta
│   │   ├── 7d11479e-1a02-4a74-a9be-14b4e56faaa1
│   │   │   ├── 3e9a41f6-652e-4439-bd24-3e7621c27f4a
│   │   │   ├── 3e9a41f6-652e-4439-bd24-3e7621c27f4a.lease
│   │   │   ├── 3e9a41f6-652e-4439-bd24-3e7621c27f4a.meta
│   │   │   ├── a2b94064-9e34-47c9-8eab-867673c5e48f
│   │   │   ├── a2b94064-9e34-47c9-8eab-867673c5e48f.lease
│   │   │   └── a2b94064-9e34-47c9-8eab-867673c5e48f.meta
│   │   ├── 85f18244-6d5b-4d02-ab66-140bfd2d734b
│   │   │   ├── 0cc4fb07-8a08-4c16-bf3f-93d4462c7108
│   │   │   ├── 0cc4fb07-8a08-4c16-bf3f-93d4462c7108.lease
│   │   │   ├── 0cc4fb07-8a08-4c16-bf3f-93d4462c7108.meta
│   │   │   ├── 116191b2-99a1-437b-8b7d-ab1927b5f4aa
│   │   │   ├── 116191b2-99a1-437b-8b7d-ab1927b5f4aa.lease
│   │   │   └── 116191b2-99a1-437b-8b7d-ab1927b5f4aa.meta
│   │   ├── 94f763e9-fd96-4bee-a6b2-31af841a918b
│   │   │   ├── 5b1d3113-5cca-4582-9029-634b16338a2f
│   │   │   ├── 5b1d3113-5cca-4582-9029-634b16338a2f.lease
│   │   │   └── 5b1d3113-5cca-4582-9029-634b16338a2f.meta
│   │   ├── 99c3c20c-a7bb-4851-8ae9-0b39c0c86b66
│   │   │   ├── 6ad3bd5e-6b21-4f10-83b4-07126ca14661
│   │   │   ├── 6ad3bd5e-6b21-4f10-83b4-07126ca14661.lease
│   │   │   ├── 6ad3bd5e-6b21-4f10-83b4-07126ca14661.meta
│   │   │   ├── 95c260da-8dfb-46be-9c4a-6a55eb3967d5
│   │   │   ├── 95c260da-8dfb-46be-9c4a-6a55eb3967d5.lease
│   │   │   └── 95c260da-8dfb-46be-9c4a-6a55eb3967d5.meta
│   │   ├── e56eb458-e459-4828-a268-82a9b91c60f8
│   │   │   ├── 5a7ca769-4657-4f65-92bd-72fa13cc2871
│   │   │   ├── 5a7ca769-4657-4f65-92bd-72fa13cc2871.lease
│   │   │   └── 5a7ca769-4657-4f65-92bd-72fa13cc2871.meta
│   │   └── f5336a0f-de34-482e-b437-47f9d276ed85
│   │       ├── 4f218ba9-a5a2-4f2f-ac6d-e5c9217d4ec4
│   │       ├── 4f218ba9-a5a2-4f2f-ac6d-e5c9217d4ec4.lease
│   │       └── 4f218ba9-a5a2-4f2f-ac6d-e5c9217d4ec4.meta
│   └── master
└── __DIRECT_IO_TEST__
14 directories, 51 files

A full find of the "data_fastX" domains is attached.
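As a rough sketch of the same check, but also catching a wrong group or missing rw mode bits (not only a wrong owner) on the mounted domains - the paths are the ones already used in this thread, adjust if your layout differs:

# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/ \( ! -user vdsm -o ! -group kvm -o ! -perm -660 \) -ls
# ls -lahRZ /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b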
Also, to make sure we don't have a selinux issue on the hypervisor, you can change selinux to permissive mode:

setenforce 0

This is the first thing I did, and the systems were still in permissive mode when I tried again. I'm 99.99% sure it's not selinux.
And then try again. If this was a selinux issue the permission denied errors will disappear. If this is the case please provide the output of:

ausearch -m AVC -ts today

If the issue still exists, we eliminated selinux, and you can enable it again:

setenforce 1

[root@ovirt3 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt2 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt1 ~]# ausearch -m AVC -ts today
<no matches>
So this is not selinux on the hypervisor. I wonder if it can be selinux on the gluster side?

I have a vague feeling that the issue is related to the gluster v6.5 to v6.6 upgrade, which I did several days before... So if any logs are needed (or debug enabled), just mention.

If this is the last change, and it worked before, most likely.

Gluster was patched several days before RC3, but all the storage-related issues lead me to that. The strange thing is that the HostedEngine is not affected, nor the "data" storage domain. Any ideas appreciated. I would like to find out what has caused this instead of just taking the shortcut to redeploy the whole setup.

Best Regards,
Strahil Nikolov
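To pin the suspicion on the upgrade, a quick sketch (rpm and yum are the stock tools on these CentOS 7 hosts) is to check on each node exactly when the gluster packages changed, and compare that date with when the VMs last powered up cleanly:

# rpm -qa --last | grep -i gluster | head
# yum history list "glusterfs*"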

On Sat, Nov 23, 2019 at 3:14 AM Nir Soffer <nsoffer@redhat.com> wrote:
On Fri, Nov 22, 2019 at 10:41 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose <sabose@redhat.com> wrote:
On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi All,
another clue in the logs : [2019-11-21 00:29:50.536631] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:29:50.536798] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:29:50.536959] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:29:50.537007] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied] [2019-11-21 00:29:50.537066] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12458: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fc63c00fe18 (Permission denied) [2019-11-21 00:30:01.177665] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=eb103fbf-80dc-425d-882f-1e4efe510db5 from backend [2019-11-21 00:30:13.132756] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:13.132824] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:13.133217] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:13.133238] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = 17c663c2-f582-455b-b806-3b9d01fb2c6c [Permission denied] [2019-11-21 00:30:13.133264] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12660: READ => -1 gfid=17c663c2-f582-455b-b806-3b9d01fb2c6c fd=0x7fc63c007038 (Permission denied) [2019-11-21 00:30:38.489449] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:38.489520] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:38.489669] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:38.489717] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 6 failed. 
Base file gfid = a10a5ae8-108b-4d78-9e65-cca188c27fc4 [Permission denied] [2019-11-21 00:30:38.489777] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12928: READ => -1 gfid=a10a5ae8-108b-4d78-9e65-cca188c27fc4 fd=0x7fc63c01a058 (Permission denied)
Anyone got an idea why is it happening? I checked user/group and selinux permissions - all OK
Can you share the commands (and output) used to check this? I first thought that the file is cached in memory and that's why vdsm user can read the file , but the following shows opposite:
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# ll total 562145 -rw-rw----. 1 vdsm kvm 5368709120 Nov 12 23:29 5b1d3113-5cca-4582-9029-634b16338a2f -rw-rw----. 1 vdsm kvm 1048576 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.lease -rw-r--r--. 1 vdsm kvm 313 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.meta [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# pwd
/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches
I would use iflag=direct instead, no need to mess with caches. Vdsm always use direct I/O.
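A minimal sketch of what that direct-I/O variant would look like (same image file and users as in the tests below; iflag=direct and status=progress are standard GNU dd flags):

# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M iflag=direct status=progress
# sudo -u qemu dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M iflag=direct status=progress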
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress dd: error reading ‘5b1d3113-5cca-4582-9029-634b16338a2f’: Permission denied
You got permissions denied...
16+0 records in 16+0 records out 67108864 bytes (67 MB) copied, 0.198372 s, 338 MB/s
Seems like it could read up to ~67 MB successfully before it encountered 'Permission denied' errors. Assuming a shard-block-size of 64MB, it looks like all the shards under /.shard could not be accessed.

Could you share the following pieces of information:
1. brick logs of data_fast
2. ls -la of .shard relative to the bricks (NOT the mount) on all the bricks of data_fast
3. ls -la of all shards under .shard of data_fast (perhaps a handful of them have root permission assigned somehow, which is causing access to be denied? Perhaps while resolving pending heals manually?)

-Krutika
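For context, 67108864 bytes is exactly 64 MiB (dd prints it as 67 MB); with gluster sharding the first shard-block-size bytes stay in the base file and everything past that lives as <gfid>.1, <gfid>.2, ... under /.shard, so a read failing right at the 64 MiB boundary is consistent with the shards themselves being unreadable. A rough sketch of the check asked for in items 2 and 3, assuming the brick path and host names shown elsewhere in this thread:

for h in ovirt1 ovirt2 ovirt3; do
    ssh "$h" 'find /gluster_bricks/data_fast/data_fast/.shard -maxdepth 1 \( ! -user vdsm -o ! -group kvm -o ! -perm 660 \) -ls'
done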
And dd continue to read data?!
I have never seen anything like this.
It will be helpful to run this with strace:
strace -t -TT -o dd.strace dd if=vol-id of=/dev/null iflag=direct bs=8M status=progress
And share dd.strace.
Logs in /var/log/glusterfs/exportname.log will contain useful info for this test.
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress 5356126208 bytes (5.4 GB) copied, 12.061393 s, 444 MB/s 1280+0 records in 1280+0 records out 5368709120 bytes (5.4 GB) copied, 12.0876 s, 444 MB/s [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress 3598712832 bytes (3.6 GB) copied, 1.000540 s, 3.6 GB/s 1280+0 records in 1280+0 records out 5368709120 bytes (5.4 GB) copied, 1.47071 s, 3.7 GB/s [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress 5171576832 bytes (5.2 GB) copied, 12.071837 s, 428 MB/s 1280+0 records in 1280+0 records out 5368709120 bytes (5.4 GB) copied, 12.4873 s, 430 MB/s
As you can see , once root user reads the file -> vdsm user can also do that.
Smells like issue on gluster side.
I would try this on the hypervisor to check what vdsm/qemu see:
$ ls -lahRZ /rhv/data-center/mnt/glusterSD/gluster-server:_path

I'm attaching the output of the find I ran, but this one should be enough:

[root@ovirt1 ~]# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/ -not -user vdsm -print
A full output of ls -lahRZ, showing user, group, permissions bits, and selinux label of the entire tree will be more useful.
Also, to make sure we don't have selinux issue on the hypervisor, you can change selinux to permissive mode:
setenforce 0
This is the first thing I did, and the systems were still in permissive mode when I tried again. I'm 99.99% sure it's not selinux.
And then try again. If this was selinux issue the permission denied issue will disappear. If this is the case please provide the output of:
ausearh -m AVC -ts today
If the issue still exists, we eliminated selinux, and you can enable it again:
setenforce 1
[root@ovirt3 ~]# ausearch -m AVC -ts today <no matches> [root@ovirt2 ~]# ausearch -m AVC -ts today <no matches> [root@ovirt1 ~]# ausearch -m AVC -ts today <no matches>
So this is not selinux on the hypervisor. I wonder if it can be selinux on the gluster side?
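A quick sketch for ruling that out on the gluster nodes themselves, with the same stock tools used above on the hypervisors (brick path assumed from the volume info later in this thread):

# getenforce
# ausearch -m AVC -ts today
# ls -ldZ /gluster_bricks/data_fast/data_fast/.shard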
I have a vague feeling that the issue is related to the gluster v6.5 to v6.6 upgrade, which I did several days before... So if any logs are needed (or debug enabled), just mention.
If this is the last change, and it worked before, most likely.
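For completeness, a sketch of confirming what is actually running on each node after that upgrade (all standard gluster/rpm commands):

# rpm -q glusterfs-server
# gluster --version | head -1
# gluster volume get all cluster.op-version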
Nir

Hi Krutika,

thanks for your assistance. Let me summarize some info about the volume:

Volume Name: data_fast
Type: Replicate
Volume ID: 378804bf-2975-44d8-84c2-b541aa87f9ef
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1:/gluster_bricks/data_fast/data_fast
Brick2: gluster2:/gluster_bricks/data_fast/data_fast
Brick3: ovirt3:/gluster_bricks/data_fast/data_fast (arbiter)
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: on
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: on
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
performance.strict-o-direct: on
network.ping-timeout: 30
cluster.granular-entry-heal: enable
cluster.enable-shared-storage: enable

[root@ovirt1 ~]# gluster volume get engine all | grep shard
features.shard                  on
features.shard-block-size       64MB
features.shard-lru-limit        16384
features.shard-deletion-rate    100
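Since the "data" domain reportedly does not show the problem, one rough way to narrow things down is to diff the configuration of the affected and unaffected volumes (volume names as used in this thread; plain gluster/diff commands):

# gluster volume info data_fast > /tmp/data_fast.info
# gluster volume info data > /tmp/data.info
# diff /tmp/data_fast.info /tmp/data.info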
On Sat, Nov 23, 2019 at 3:14 AM Nir Soffer <nsoffer@redhat.com> wrote:

I would use iflag=direct instead, no need to mess with caches. Vdsm always uses direct I/O.

[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
dd: error reading ‘5b1d3113-5cca-4582-9029-634b16338a2f’: Permission denied

You got permissions denied...

16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 0.198372 s, 338 MB/s

> Seems like it could read up to ~67 MB successfully before it encountered 'Permission denied' errors. Assuming a shard-block-size of 64MB, looks like all the shards under /.shard could not be accessed.
> Could you share the following pieces of information:
> 1. brick logs of data_fast

Attached in data_fast-brick-logs.tgz
> 2. ls -la of .shard relative to the bricks (NOT the mount) on all the bricks of data_fast

Not sure if I understood you correctly, so I ran "ls -lad /gluster_bricks/data_fast/data_fast/.shard". If that's not what you wanted to see - just correct me. I have run multiple "find" commands with "-exec chown vdsm:kvm {} \;", just to be sure that this is not happening.

> 3. and ls -la of all shards under .shard of data_fast (perhaps a handful of them have root permission assigned somehow which is causing access to be denied? Perhaps while resolving pending heals manually?)

All shards seem to be owned by "vdsm:kvm" with 660.
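For reference, a minimal sketch of the recursive variant ("ls -lad" only shows the .shard directory itself, not the shard files inside it); brick path as in the volume info above:

# ls -lahZ /gluster_bricks/data_fast/data_fast/.shard | head -20
# find /gluster_bricks/data_fast/data_fast/.shard \( ! -user vdsm -o ! -group kvm \) -ls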

Hi Krutika,

I have enabled TRACE log level for the volume data_fast, but the issue is not much clearer.

FUSE reports:

[2019-11-25 21:31:53.478130] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=6d9ed2e5-d4f2-4749-839b-2f13a68ed472 from backend
[2019-11-25 21:32:43.564694] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565653] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565689] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565770] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied]
[2019-11-25 21:32:43.565858] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 279: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fbf40005ea8 (Permission denied)

While the BRICK logs on ovirt1/gluster1 report:

[2019-11-25 21:32:43.564177] D [MSGID: 0] [io-threads.c:376:iot_schedule] 0-data_fast-io-threads: LOOKUP scheduled as fast priority fop
[2019-11-25 21:32:43.564194] T [MSGID: 0] [defaults.c:2008:default_lookup_resume] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-io-threads to data_fast-upcall
[2019-11-25 21:32:43.564206] T [MSGID: 0] [upcall.c:790:up_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-upcall to data_fast-leases
[2019-11-25 21:32:43.564215] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-leases to data_fast-read-only
[2019-11-25 21:32:43.564222] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-read-only to data_fast-worm
[2019-11-25 21:32:43.564230] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-worm to data_fast-locks
[2019-11-25 21:32:43.564241] T [MSGID: 0] [posix.c:2897:pl_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-locks to data_fast-access-control
[2019-11-25 21:32:43.564254] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
[2019-11-25 21:32:43.564268] D [MSGID: 0] [posix-acl.c:1057:posix_acl_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-access-control returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564279] D [MSGID: 0] [posix.c:2888:pl_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-locks returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564289] D [MSGID: 0] [upcall.c:769:up_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-upcall returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564302] D [MSGID: 0] [defaults.c:1349:default_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-io-threads returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564313] T [marker.c:2918:marker_lookup_cbk] 0-data_fast-marker: lookup failed with Permission denied
[2019-11-25 21:32:43.564320] D [MSGID: 0] [marker.c:2955:marker_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-marker returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564334] D [MSGID: 0] [index.c:2070:index_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-index returned -1 error: Permission denied [Permission denied]
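The interesting part of that brick log is the posix-acl line: the request comes in as uid:107/gid:107 asking for read (perm:1), while the context the access-control translator holds for that gfid is uid:0, gid:0, perm:000 - which would fit the observation that a lookup done as root appears to "unlock" the file for vdsm/qemu afterwards. A quick sketch for comparing what the acl layer sees with what is actually on disk (getfacl/getfattr are already in use in this thread; the brick path and shard name are the ones from the earlier listings):

# getfacl /gluster_bricks/data_fast/data_fast/.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
# getfattr -d -e hex -m . /gluster_bricks/data_fast/data_fast/.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79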

Hi Nir, All,

it seems that 4.3.7 RC3 (and even RC4) are not the problem here (attached is a screenshot of oVirt running on gluster v7). It seems strange that both of my serious issues with oVirt were related to a gluster issue (first the gluster v3 to v5 migration, and now this one).

I have just updated to gluster v7.0 (CentOS 7 repos) and rebooted all nodes. Now both the Engine and all my VMs are back online - so if you hit issues with 6.6, you should give 7.0 a try (and even 7.1 is coming soon) before deciding to wipe everything.

@Krutika, I guess you will ask for the logs, so let's switch to gluster-users about this one?

Best Regards,
Strahil Nikolov
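A short sketch of the sanity checks that make sense right after such an upgrade, before powering the VMs back on (standard gluster commands; volume name as above):

# rpm -q glusterfs-server
# gluster volume status data_fast
# gluster volume heal data_fast info summary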
So this is not selinux on the hypervisor. I wonder if it can be selinux on the gluster side? I have a vague feeling that the issue is related to gluster v6.5 to 6.6 upgrade which I several days before... So if any logs are needed (or debug enabled), just mention. If this is the last change, and it worked before, most likely. Nir_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/AKLLOJKG6NEJUB...

Sorry about the late response.

I looked at the logs. These errors are originating from the posix-acl translator:

[2019-11-17 07:55:47.090065] E [MSGID: 115050] [server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162496: LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.6 (be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.6), client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, error-xlator: data_fast-access-control [Permission denied]
[2019-11-17 07:55:47.090174] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
[2019-11-17 07:55:47.090209] E [MSGID: 115050] [server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162497: LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.7 (be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.7), client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, error-xlator: data_fast-access-control [Permission denied]
[2019-11-17 07:55:47.090299] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]

Jiffin/Raghavendra Talur,
Can you help?

-Krutika

On Wed, Nov 27, 2019 at 2:11 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi Nir, All,
it seems that 4.3.7 RC3 (and even RC4) are not the problem here (attached is a screenshot of oVirt running on gluster v7). It seems strange that both of my serious issues with oVirt are related to gluster issues (first the gluster v3 to v5 migration, and now this one).
I have just updated to gluster v7.0 (CentOS 7 repos) and rebooted all nodes. Now both the Engine and all my VMs are back online - so if you hit issues with 6.6, give 7.0 a try (7.1 is coming soon, too) before deciding to wipe everything.
@Krutika,
I guess you will ask for the logs, so let's switch to gluster-users about this one?
Best Regards, Strahil Nikolov
On Monday, November 25, 2019, 16:45:48 GMT-5, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi Krutika,
I have enabled TRACE log level for the volume data_fast, but the issue is still not very clear. FUSE reports:
[2019-11-25 21:31:53.478130] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=6d9ed2e5-d4f2-4749-839b-2f13a68ed472 from backend
[2019-11-25 21:32:43.564694] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565653] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565689] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-25 21:32:43.565770] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied]
[2019-11-25 21:32:43.565858] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 279: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fbf40005ea8 (Permission denied)
While the BRICK logs on ovirt1/gluster1 report:
[2019-11-25 21:32:43.564177] D [MSGID: 0] [io-threads.c:376:iot_schedule] 0-data_fast-io-threads: LOOKUP scheduled as fast priority fop
[2019-11-25 21:32:43.564194] T [MSGID: 0] [defaults.c:2008:default_lookup_resume] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-io-threads to data_fast-upcall
[2019-11-25 21:32:43.564206] T [MSGID: 0] [upcall.c:790:up_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-upcall to data_fast-leases
[2019-11-25 21:32:43.564215] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-leases to data_fast-read-only
[2019-11-25 21:32:43.564222] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-read-only to data_fast-worm
[2019-11-25 21:32:43.564230] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-worm to data_fast-locks
[2019-11-25 21:32:43.564241] T [MSGID: 0] [posix.c:2897:pl_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-locks to data_fast-access-control
[2019-11-25 21:32:43.564254] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
[2019-11-25 21:32:43.564268] D [MSGID: 0] [posix-acl.c:1057:posix_acl_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-access-control returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564279] D [MSGID: 0] [posix.c:2888:pl_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-locks returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564289] D [MSGID: 0] [upcall.c:769:up_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-upcall returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564302] D [MSGID: 0] [defaults.c:1349:default_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-io-threads returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564313] T [marker.c:2918:marker_lookup_cbk] 0-data_fast-marker: lookup failed with Permission denied
[2019-11-25 21:32:43.564320] D [MSGID: 0] [marker.c:2955:marker_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-marker returned -1 error: Permission denied [Permission denied]
[2019-11-25 21:32:43.564334] D [MSGID: 0] [index.c:2070:index_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-index returned -1 error: Permission denied [Permission denied]
On Monday, November 25, 2019, 23:10:41 GMT+2, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi Krutika,
thanks for your assistance. Let me summarize some info about the volume:
Volume Name: data_fast
Type: Replicate
Volume ID: 378804bf-2975-44d8-84c2-b541aa87f9ef
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1:/gluster_bricks/data_fast/data_fast
Brick2: gluster2:/gluster_bricks/data_fast/data_fast
Brick3: ovirt3:/gluster_bricks/data_fast/data_fast (arbiter)
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: on
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: on
client.event-threads: 4
server.event-threads: 4
storage.owner-uid: 36
storage.owner-gid: 36
performance.strict-o-direct: on
network.ping-timeout: 30
cluster.granular-entry-heal: enable
cluster.enable-shared-storage: enable
[root@ovirt1 ~]# gluster volume get engine all | grep shard
features.shard                on
features.shard-block-size     64MB
features.shard-lru-limit      16384
features.shard-deletion-rate  100
On Sat, Nov 23, 2019 at 3:14 AM Nir Soffer <nsoffer@redhat.com> wrote:
On Fri, Nov 22, 2019 at 10:41 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose <sabose@redhat.com> wrote:
On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi All,
another clue in the logs : [2019-11-21 00:29:50.536631] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:29:50.536798] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:29:50.536959] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:29:50.537007] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied] [2019-11-21 00:29:50.537066] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12458: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fc63c00fe18 (Permission denied) [2019-11-21 00:30:01.177665] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=eb103fbf-80dc-425d-882f-1e4efe510db5 from backend [2019-11-21 00:30:13.132756] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:13.132824] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:13.133217] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:13.133238] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = 17c663c2-f582-455b-b806-3b9d01fb2c6c [Permission denied] [2019-11-21 00:30:13.133264] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12660: READ => -1 gfid=17c663c2-f582-455b-b806-3b9d01fb2c6c fd=0x7fc63c007038 (Permission denied) [2019-11-21 00:30:38.489449] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:38.489520] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:38.489669] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-21 00:30:38.489717] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 6 failed. 
Base file gfid = a10a5ae8-108b-4d78-9e65-cca188c27fc4 [Permission denied] [2019-11-21 00:30:38.489777] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12928: READ => -1 gfid=a10a5ae8-108b-4d78-9e65-cca188c27fc4 fd=0x7fc63c01a058 (Permission denied)
Anyone got an idea why it is happening? I checked user/group and selinux permissions - all OK.
Can you share the commands (and output) used to check this?
I first thought that the file was cached in memory and that's why the vdsm user could read it, but the following shows the opposite:
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# ll
total 562145
-rw-rw----. 1 vdsm kvm 5368709120 Nov 12 23:29 5b1d3113-5cca-4582-9029-634b16338a2f
-rw-rw----. 1 vdsm kvm    1048576 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.lease
-rw-r--r--. 1 vdsm kvm        313 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.meta
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# pwd
/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches
I would use iflag=direct instead, no need to mess with caches. Vdsm always uses direct I/O.
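For reference, a direct-I/O variant of the read below could look like this (a sketch, reusing the same image file as in the test that follows; adjust the path as needed):

  sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M iflag=direct status=progress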
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress dd: error reading ‘5b1d3113-5cca-4582-9029-634b16338a2f’: Permission denied
You got permissions denied...
16+0 records in 16+0 records out 67108864 bytes (67 MB) copied, 0.198372 s, 338 MB/s
Seems like it could read up to ~67MB successfully before it encountered 'Permission denied' errors. Assuming a shard-block-size of 64MB, it looks like all the shards under /.shard could not be accessed. Could you share the following pieces of information:
1. brick logs of data_fast
Attached in data_fast-brick-logs.tgz
2. ls -la of .shard relative to the bricks (NOT the mount) on all the bricks of data_fast
Not sure if I understood you correctly, so I ran "ls -lad /gluster_bricks/data_fast/data_fast/.shard". If it's not what you wanted to see - just correct me. I have also run multiple "find" commands with "-exec chown vdsm:kvm {} \;", just to be sure that ownership is not the problem.
3. and ls -la of all shards under .shard of data_fast (perhaps a handful of them have root permission assigned somehow, which is causing access to be denied? Perhaps while resolving pending heals manually?)
All shards seem to be owned by "vdsm:kvm" with mode 660 (see the check sketched below).
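One way to double-check this directly on each brick (a sketch, using the brick path from the volume info above; it lists any shard that is not owned by vdsm:kvm) would be:

  find /gluster_bricks/data_fast/data_fast/.shard \( ! -user vdsm -o ! -group kvm \) -exec ls -la {} \;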
-Krutika
And dd continue to read data?!
I have never seen anything like this.
It will be helpful to run this with strace:
strace -t -TT -o dd.strace dd if=vol-id of=/dev/null iflag=direct bs=8M status=progress
And share dd.strace.
Logs in /var/log/glusterfs/exportname.log will contain useful info for this test.
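For example, a full reproduction could look roughly like this (a sketch; the exact log file name is an assumption - FUSE mount logs are normally named after the mount point):

  sudo -u vdsm strace -t -TT -o dd.strace dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null iflag=direct bs=8M status=progress
  tail -n 50 /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster1:_data__fast.log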
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5356126208 bytes (5.4 GB) copied, 12.061393 s, 444 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.0876 s, 444 MB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
3598712832 bytes (3.6 GB) copied, 1.000540 s, 3.6 GB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 1.47071 s, 3.7 GB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5171576832 bytes (5.2 GB) copied, 12.071837 s, 428 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.4873 s, 430 MB/s
As you can see, once the root user reads the file, the vdsm user can also read it.
Smells like issue on gluster side.
I would try this on the hypervisor to check what vdsm/qemu see:
$ ls -lahRZ /rhv/data-center/mnt/glusterSD/gluster-server:_path
I'm attaching the output of the find I ran, but this one should be enough:
[root@ovirt1 ~]# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/ -not -user vdsm -print
A full output of ls -lahRZ, showing user, group, permissions bits, and selinux label of the entire tree will be more useful.
Also, to make sure we don't have selinux issue on the hypervisor, you can change selinux to permissive mode:
setenforce 0
This is the first thing I did, and the systems were still in permissive mode when I tried again. I'm 99.99% sure it's not selinux.
And then try again. If this was selinux issue the permission denied issue will disappear. If this is the case please provide the output of:
ausearh -m AVC -ts today
If the issue still exists, we eliminated selinux, and you can enable it again:
setenforce 1
[root@ovirt3 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt2 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt1 ~]# ausearch -m AVC -ts today
<no matches>
So this is not selinux on the hypervisor. I wonder if it can be selinux on the gluster side?
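If it helps, the same check can be repeated on the gluster nodes against the brick itself (a sketch, brick path taken from the volume info above):

  getenforce
  ausearch -m AVC -ts today
  ls -laZ /gluster_bricks/data_fast/data_fast/.shard | head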
I have a vague feeling that the issue is related to the gluster v6.5 to 6.6 upgrade which I did several days before... So if any logs are needed (or debug enabled), just mention it.
If this is the last change, and it worked before, most likely.
Nir _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/AKLLOJKG6NEJUB...

Hi Krutika,

Apparently the acl info in the context got corrupted - see the brick logs:

[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]

which resulted in this situation. A similar bug was reported in https://bugzilla.redhat.com/show_bug.cgi?id=1668286 and IMO it got fixed in the 6.6 release via https://review.gluster.org/#/c/glusterfs/+/23233/. But here he mentioned he saw the issue when he upgraded from 6.5 to 6.6.

One way to work around it is to perform a dummy setfacl (preferably as root) on the corrupted files, which will forcefully fetch the acl info again from the backend and update the context. Another approach is to restart the brick process (kill and vol start force).

Regards,
Jiffin
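To make the two workarounds concrete, a rough sketch (assuming data_fast is the affected volume; the file path and brick PID below are placeholders, not from the original mail):

  # Option 1: dummy setfacl as root on a file the brick log complains about,
  # forcing the acl info to be fetched again from the backend:
  setfacl -m u:vdsm:rw /path/to/affected/file

  # Option 2: restart the brick processes of the volume:
  gluster volume status data_fast        # note the brick PIDs
  kill <brick-pid>                       # on each node with an affected brick
  gluster volume start data_fast force   # respawn the killed brick processes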

Hi Jiffin,

I'm observing the same behaviour as last time, this time after upgrading from gluster v7.0 to 7.2:

[2020-01-30 23:42:26.039822] T [MSGID: 0] [posix.c:2994:pl_lookup] 0-stack-trace: stack-address: 0x7f7b50003958, winding from data_fast-locks to data_fast-access-control
[2020-01-30 23:42:26.039837] I [MSGID: 139001] [posix-acl.c:262:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:868c1fba-1f38-4e91-a664-6e49fddb3f27-GRAPH_ID:0-PID:9525-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-1-RECON_NO:-1, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
[2020-01-30 23:42:26.039857] D [MSGID: 0] [posix-acl.c:1055:posix_acl_lookup] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-access-control returned -1 error: Permission denied [Permission denied]
[2020-01-30 23:42:26.039868] D [MSGID: 0] [posix.c:2985:pl_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-locks returned -1 error: Permission denied [Permission denied]
[2020-01-30 23:42:26.039879] D [MSGID: 0] [upcall.c:756:up_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-upcall returned -1 error: Permission denied [Permission denied]
[2020-01-30 23:42:26.039889] D [MSGID: 0] [defaults.c:1349:default_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-io-threads returned -1 error: Permission denied [Permission denied]
[2020-01-30 23:42:26.039901] T [marker.c:2907:marker_lookup_cbk] 0-data_fast-marker: lookup failed with Permission denied
[2020-01-30 23:42:26.039908] D [MSGID: 0] [marker.c:2944:marker_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-marker returned -1 error: Permission denied [Permission denied]
[2020-01-30 23:42:26.039922] D [MSGID: 0] [index.c:2070:index_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, data_fast-index returned -1 error: Permission denied [Permission denied]
[2020-01-30 23:42:26.039933] D [MSGID: 0] [io-stats.c:2228:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7f7b50003958, /gluster_bricks/data_fast/data_fast returned -1 error: Permission denied [Permission denied]
[2020-01-30 23:42:26.039949] E [MSGID: 115050] [server-rpc-fops_v2.c:157:server4_lookup_cbk] 0-data_fast-server: 16268: LOOKUP /.shard/f476698a-d8d2-4ab2-b9c4-4c276c2eef43.79 (be318638-e8a0-4c6d-977d-7a937aa84806/f476698a-d8d2-4ab2-b9c4-4c276c2eef43.79), client: CTX_ID:868c1fba-1f38-4e91-a664-6e49fddb3f27-GRAPH_ID:0-PID:9525-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-1-RECON_NO:-1, error-xlator: data_fast-access-control [Permission denied]
[2020-01-30 23:42:26.039964] T [rpcsvc.c:1529:rpcsvc_submit_generic] 0-rpc-service: Tx message: 344
[2020-01-30 23:42:26.039971] T [rpcsvc.c:1065:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 368, payload: 344, rpc hdr: 24
[2020-01-30 23:42:26.040007] T [rpcsvc.c:1581:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x24c1, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 27) to rpc-transport (tcp.data_fast-server)

Are you sure it's fixed? If yes, then this looks like a regression.

By the way, I'm not using ACL - how can I permanently disable it, so that even if the ACL info gets corrupted my oVirt will survive?

Thanks in advance for your answer.

Best Regards,
Strahil Nikolov
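For what it's worth, a quick way to confirm whether any extended ACLs are actually set on the brick side (a sketch; path taken from the volume layout above) would be:

  getfacl -p /gluster_bricks/data_fast/data_fast/.shard
  # entries beyond the base user/group/other lines would indicate an extended ACL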
Гринуич+2, Jiffin Thottan <jthottan@redhat.com> написа: Hi Krutika, Apparently, in context acl info got corrupted see brick logs [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied] which resulted in the situation. There was one bug similar was reported https://bugzilla.redhat.com/show_bug.cgi?id=1668286 and it got fixed in 6.6 release IMO https://review.gluster.org/#/c/glusterfs/+/23233/. But here he mentioned he saw the issue when he upgraded from 6.5 to 6.6 One way to workaround is to perform a dummy setfacl(preferably using root) on the corrupted files which will forcefully fetch the acl info again from backend and update the context. Another approach to restart brick process(kill and vol start force) Regards, Jiffin ----- Original Message ----- From: "Krutika Dhananjay" <kdhananj@redhat.com> To: "Strahil Nikolov" <hunter86_bg@yahoo.com>, "Jiffin Thottan" <jthottan@redhat.com>, "raghavendra talur" <rtalur@redhat.com> Cc: "Nir Soffer" <nsoffer@redhat.com>, "Rafi Kavungal Chundattu Parambil" <rkavunga@redhat.com>, "users" <users@ovirt.org>, "gluster-user" <gluster-users@gluster.org> Sent: Monday, December 2, 2019 11:48:22 AM Subject: Re: [ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing Sorry about the late response. I looked at the logs. These errors are originating from posix-acl translator - *[2019-11-17 07:55:47.090065] E [MSGID: 115050] [server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162496: LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.6 (be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.6), client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, error-xlator: data_fast-access-control [Permission denied][2019-11-17 07:55:47.090174] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied][2019-11-17 07:55:47.090209] E [MSGID: 115050] [server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162497: LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.7 (be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.7), client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, error-xlator: data_fast-access-control [Permission denied][2019-11-17 07:55:47.090299] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]* Jiffin/Raghavendra Talur, Can you help? -Krutika On Wed, Nov 27, 2019 at 2:11 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi Nir,All,
it seems that 4.3.7 RC3 (and even RC4) are not the problem here(attached screenshot of oVirt running on v7 gluster). It seems strange that both my serious issues with oVirt are related to gluster issue (1st gluster v3 to v5 migration and now this one).
I have just updated to gluster v7.0 (Centos 7 repos), and rebooted all nodes. Now both Engine and all my VMs are back online - so if you hit issues with 6.6 , you should give a try to 7.0 (and even 7.1 is coming soon) before deciding to wipe everything.
@Krutika,
I guess you will ask for the logs, so let's switch to gluster-users about this one ?
Best Regards, Strahil Nikolov
В понеделник, 25 ноември 2019 г., 16:45:48 ч. Гринуич-5, Strahil Nikolov < hunter86_bg@yahoo.com> написа:
Hi Krutika,
I have enabled TRACE log level for the volume data_fast,
but the issue is not much clear: FUSE reports:
[2019-11-25 21:31:53.478130] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=6d9ed2e5-d4f2-4749-839b-2f1 3a68ed472 from backend [2019-11-25 21:32:43.564694] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-25 21:32:43.565653] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-25 21:32:43.565689] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied] [2019-11-25 21:32:43.565770] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied] [2019-11-25 21:32:43.565858] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 279: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fbf40005ea8 (Permission denied)
While the BRICK logs on ovirt1/gluster1 report: 2019-11-25 21:32:43.564177] D [MSGID: 0] [io-threads.c:376:iot_schedule] 0-data_fast-io-threads: LOOKUP scheduled as fast priority fop [2019-11-25 21:32:43.564194] T [MSGID: 0] [defaults.c:2008:default_lookup_resume] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-io-threads to data_fast-upcall [2019-11-25 21:32:43.564206] T [MSGID: 0] [upcall.c:790:up_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-upcall to data_fast-leases [2019-11-25 21:32:43.564215] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-leases to data_fast-read-only [2019-11-25 21:32:43.564222] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-read-only to data_fast-worm [2019-11-25 21:32:43.564230] T [MSGID: 0] [defaults.c:2766:default_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-worm to data_fast-locks [2019-11-25 21:32:43.564241] T [MSGID: 0] [posix.c:2897:pl_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-locks to data_fast-access-control [2019-11-25 21:32:43.564254] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied] [2019-11-25 21:32:43.564268] D [MSGID: 0] [posix-acl.c:1057:posix_acl_lookup] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-access-control returned -1 error: Permission denied [Permission denied] [2019-11-25 21:32:43.564279] D [MSGID: 0] [posix.c:2888:pl_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-locks returned -1 error: Permission denied [Permission denied] [2019-11-25 21:32:43.564289] D [MSGID: 0] [upcall.c:769:up_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-upcall returned -1 error: Permission denied [Permission denied] [2019-11-25 21:32:43.564302] D [MSGID: 0] [defaults.c:1349:default_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-io-threads returned -1 error: Permission denied [Permission denied] [2019-11-25 21:32:43.564313] T [marker.c:2918:marker_lookup_cbk] 0-data_fast-marker: lookup failed with Permission denied [2019-11-25 21:32:43.564320] D [MSGID: 0] [marker.c:2955:marker_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-marker returned -1 error: Permission denied [Permission denied] [2019-11-25 21:32:43.564334] D [MSGID: 0] [index.c:2070:index_lookup_cbk] 0-stack-trace: stack-address: 0x7fc02c00bbf8, data_fast-index returned -1 error: Permission denied [Permission denied]
В понеделник, 25 ноември 2019 г., 23:10:41 ч. Гринуич+2, Strahil Nikolov < hunter86_bg@yahoo.com> написа:
Hi Krutika,
thanks for your assistance. Let me summarize some info about the volume:
Volume Name: data_fast Type: Replicate Volume ID: 378804bf-2975-44d8-84c2-b541aa87f9ef Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: gluster1:/gluster_bricks/data_fast/data_fast Brick2: gluster2:/gluster_bricks/data_fast/data_fast Brick3: ovirt3:/gluster_bricks/data_fast/data_fast (arbiter) Options Reconfigured: performance.client-io-threads: on nfs.disable: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: on cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: on client.event-threads: 4 server.event-threads: 4 storage.owner-uid: 36 storage.owner-gid: 36 performance.strict-o-direct: on network.ping-timeout: 30 cluster.granular-entry-heal: enable cluster.enable-shared-storage: enable
[root@ovirt1 ~]# gluster volume get engine all | grep shard features.shard on features.shard-block-size 64MB features.shard-lru-limit 16384 features.shard-deletion-rate 100
On Sat, Nov 23, 2019 at 3:14 AM Nir Soffer <nsoffer@redhat.com> wrote:
On Fri, Nov 22, 2019 at 10:41 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose <sabose@redhat.com> wrote:
On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi All,
another clue in the logs:
[2019-11-21 00:29:50.536631] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536798] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.536959] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:29:50.537007] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae [Permission denied]
[2019-11-21 00:29:50.537066] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12458: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fc63c00fe18 (Permission denied)
[2019-11-21 00:30:01.177665] I [MSGID: 133022] [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of gfid=eb103fbf-80dc-425d-882f-1e4efe510db5 from backend
[2019-11-21 00:30:13.132756] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.132824] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133217] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:13.133238] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 79 failed. Base file gfid = 17c663c2-f582-455b-b806-3b9d01fb2c6c [Permission denied]
[2019-11-21 00:30:13.133264] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12660: READ => -1 gfid=17c663c2-f582-455b-b806-3b9d01fb2c6c fd=0x7fc63c007038 (Permission denied)
[2019-11-21 00:30:38.489449] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489520] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489669] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2: remote operation failed. Path: /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2019-11-21 00:30:38.489717] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on shard 6 failed. Base file gfid = a10a5ae8-108b-4d78-9e65-cca188c27fc4 [Permission denied]
[2019-11-21 00:30:38.489777] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 12928: READ => -1 gfid=a10a5ae8-108b-4d78-9e65-cca188c27fc4 fd=0x7fc63c01a058 (Permission denied)
Anyone got an idea why it is happening? I checked user/group and selinux permissions - all OK.
Can you share the commands (and output) used to check this?
I first thought that the file was cached in memory and that's why the vdsm user can read it, but the following shows the opposite:
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# ll
total 562145
-rw-rw----. 1 vdsm kvm 5368709120 Nov 12 23:29 5b1d3113-5cca-4582-9029-634b16338a2f
-rw-rw----. 1 vdsm kvm 1048576 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.lease
-rw-r--r--. 1 vdsm kvm 313 Nov 11 14:11 5b1d3113-5cca-4582-9029-634b16338a2f.meta
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# pwd
/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches
I would use iflag=direct instead, no need to mess with caches. Vdsm always uses direct I/O.
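For example (a sketch; the file name is the one from the listing above, run from the same image directory):

sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M iflag=direct status=progress   # bypasses the page cache, same I/O mode vdsm uses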
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
dd: error reading ‘5b1d3113-5cca-4582-9029-634b16338a2f’: Permission denied
You got permission denied...
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 0.198372 s, 338 MB/s
Seems like it could read up to ~67MB successfully before it encountered the 'Permission denied' errors. Assuming a shard-block-size of 64MB, it looks like all the shards under /.shard could not be accessed. Could you share the following pieces of information:
1. brick logs of data_fast
Attached in data_fast-brick-logs.tgz
2. ls -la of .shard relative to the bricks (NOT the mount) on all the bricks of data_fast
Not sure if I understood you correctly, so I ran "ls -lad /gluster_bricks/data_fast/data_fast/.shard". If that's not what you wanted to see, just correct me. I have also run multiple "find" commands with "-exec chown vdsm:kvm {} \;", just to be sure that ownership is not the problem.
3. ls -la of all shards under .shard of data_fast (perhaps a handful of them have root permission assigned somehow, which is causing access to be denied? Perhaps while resolving pending heals manually?)
All shards seem to be owned by "vdsm:kvm" with mode 660.
-Krutika
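If it helps, one way to spot brick-side shards that are not owned by vdsm could be something like this (a sketch; it assumes the brick path from the reply above and passwordless ssh between the nodes, as used elsewhere in this thread):

for h in ovirt1 ovirt2 ovirt3; do
  echo "== $h =="
  # print any shard whose owner is not vdsm, with full metadata
  ssh "$h" "find /gluster_bricks/data_fast/data_fast/.shard ! -user vdsm -ls | head -n 20"
done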
And dd continued to read data?!
I have never seen anything like this.
It will be helpful to run this with strace:
strace -t -TT -o dd.strace dd if=vol-id of=/dev/null iflag=direct bs=8M status=progress
And share dd.strace.
Logs in /var/log/glusterfs/exportname.log will contain useful info for this test.
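On this setup the client log name is derived from the mount path, so it is probably something like the file below (a guess; check /var/log/glusterfs/ for the exact name):

grep -i 'permission denied' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster1:_data__fast.log | tail -n 50   # errors around the time of the failing dd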
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5356126208 bytes (5.4 GB) copied, 12.061393 s, 444 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.0876 s, 444 MB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
3598712832 bytes (3.6 GB) copied, 1.000540 s, 3.6 GB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 1.47071 s, 3.7 GB/s
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 > /proc/sys/vm/drop_caches
[root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
5171576832 bytes (5.2 GB) copied, 12.071837 s, 428 MB/s
1280+0 records in
1280+0 records out
5368709120 bytes (5.4 GB) copied, 12.4873 s, 430 MB/s
As you can see , once root user reads the file -> vdsm user can also do that.
Smells like issue on gluster side.
I would try this on the hypervisor to check what vdsm/qemu see:
$ ls -lahRZ /rhev/data-center/mnt/glusterSD/gluster-server:_path
I'm attaching the output of the find I ran, but this one should be enough:
[root@ovirt1 ~]# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/ -not -user vdsm -print
A full output of ls -lahRZ, showing the user, group, permission bits, and SELinux label of the entire tree, will be more useful.
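For example, something like this should capture the whole tree in one go (the mount path is taken from earlier in the thread; the output file name is just an example):

ls -lahRZ /rhev/data-center/mnt/glusterSD/gluster1:_data__fast > /tmp/data_fast-ls-lahRZ.txt 2>&1   # user/group/mode/SELinux label for every file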
Also, to make sure we don't have a selinux issue on the hypervisor, you can change selinux to permissive mode:
> setenforce 0
This is the first thing I did, and the systems were still in permissive mode when I tried again. I'm 99.99% sure it's not selinux.
And then try again. If this was a selinux issue, the permission denied errors will disappear. If this is the case, please provide the output of:
> ausearch -m AVC -ts today
If the issue still exists, we eliminated selinux, and you can enable it again:
> setenforce 1
[root@ovirt3 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt2 ~]# ausearch -m AVC -ts today
<no matches>
[root@ovirt1 ~]# ausearch -m AVC -ts today
<no matches>
So this is not selinux on the hypervisor. I wonder if it can be selinux on the gluster side?
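One quick way to check the brick side would be something like this (a sketch; it assumes the brick path used earlier and passwordless ssh between the nodes):

for h in ovirt1 ovirt2 ovirt3; do
  echo "== $h =="
  # SELinux mode plus the labels actually stored on the brick-side shards
  ssh "$h" "getenforce; ls -laZ /gluster_bricks/data_fast/data_fast/.shard | head -n 10"
done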
I have a vague feeling that the issue is related to the gluster v6.5 to 6.6 upgrade, which I did several days before... So if any logs are needed (or debug enabled), just let me know.
If this is the last change, and it worked before, most likely.
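To rule out a half-finished upgrade, it may be worth confirming that every node actually runs 6.6 and that sharding is still enabled on the volume (a sketch; run on any gluster node, volume name taken from the thread):

gluster --version | head -n 1
rpm -q glusterfs-server glusterfs-fuse
gluster volume get all cluster.op-version    # cluster-wide operating version
gluster volume get data_fast features.shard  # sharding should still be 'on'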
Nir _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/AKLLOJKG6NEJUB...
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/UJLBNDFKU27J24...

On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi All,
another clue in the logs: the same shard lookup "Permission denied" errors quoted above.
Anyone got an idea why it is happening? I checked user/group and selinux permissions - all OK. I even replaced one of the disks and healed, but the result is the same for all my VMs.
Have you checked the permissions for user/group are set correctly across all the bricks in the cluster?
Seems OK:
[root@ovirt1 ~]# for i in ovirt{1..3}; do echo $i; ssh $i "find /gluster_bricks/*/*/[0-9]* -not -user vdsm -not -type l -print" ; echo;echo; done
ovirt1

ovirt2

ovirt3
What does ls -la on the images directory from the mount of the volume show you?
Attached the output.
Adding Krutika and Rafi as they ran into a similar issue in the past.
Most probably the root's dd is putting the image into RAM, and then the second dd (via sudo -u vdsm) gets it from the Linux cache...
Best Regards,
Strahil Nikolov

On Wednesday, 20 November 2019, 18:17:18 GMT+2, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hello All,
my engine is back online, but I'm still having difficulties making vdsm power up the systems. I think that the events generated today can lead me in the right direction (just an example, many more are there):
VDSM ovirt3.localdomain command SpmStatusVDS failed: Cannot inquire Lease(name='SDM', path=u'/rhev/data-center/mnt/glusterSD/gluster1:_data__fast3/ecc3bf0e-8214-45c1-98a6-0afa642e591f/dom_md/leases', offset=1048576): (2, 'Sanlock get hosts failure', 'No such file or directory')
I will try to collect a fresh log and see what it is complaining about this time.
Best Regards,
Strahil Nikolov
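Given that the error complains about a missing file, it might be worth checking whether that lease actually exists on the mount and is readable with direct I/O (the path is taken from the error message above):

ls -l /rhev/data-center/mnt/glusterSD/gluster1:_data__fast3/ecc3bf0e-8214-45c1-98a6-0afa642e591f/dom_md/
# read the first MiB of the leases file the same way sanlock/vdsm would (direct I/O)
sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/gluster1:_data__fast3/ecc3bf0e-8214-45c1-98a6-0afa642e591f/dom_md/leases of=/dev/null bs=1M count=1 iflag=direct
sanlock client status   # current sanlock lockspaces/resources on this host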
Hi Sahina,
I have a strange situation:
1. When I try to access the file via 'sudo -u vdsm dd if=disk of=test bs=4M', the command fails at approx. 60MB.
2. If I run the same command as root, remove the file and then run it again as the vdsm user -> this time no I/O error is reported.
My guess is that I need to check what's going on on the bricks themselves...
Best Regards, Strahil Nikolov
On Tuesday, 19 November 2019, 0:02:16 GMT-5, Sahina Bose <sabose@redhat.com> wrote:
On Tue, Nov 19, 2019 at 10:10 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi Sahina,
Sadly the engine logs have no errors. I've got only an I/O error, but in the vdsm debug output I can clearly see that "qemu-img" is giving an "OK". During the upgrade I got some metadata files pending heal, but I have recovered the conflict manually and it should be OK. Today I have defined one of the VMs manually (virsh define) and then started it, but the issue is the same. It seems to be a storage-related issue, as VMs that are on a specific domain can be started, but most of my VMs are on the fast storage domains and none of them can be started.
After the gluster snapshot restore, the engine is having issues and I have to investigate that separately (as I powered off my HostedEngine before creating the snapshot).
The logs can be found at: https://drive.google.com/open?id=1VAZFZWWrpimDeVuZT0sWFVXy76scr4NM
Any ideas where to look, as I can definitely read (using "dd if=disk" or qemu-img info) the disks of the rhel7 VM?
The vdsm logs have this:
2019-11-17 10:21:23,892+0200 INFO (libvirt/events) [virt.vm] (vmId='b3c4d84a-9784-470c-b70e-7ad7cc45e913') abnormal vm stop device ua-94f763e9-fd96-4bee-a6b2-31af841a918b error eother (vm:5075)
2019-11-17 10:21:23,892+0200 INFO (libvirt/events) [virt.vm] (vmId='b3c4d84a-9784-470c-b70e-7ad7cc45e913') CPU stopped: onIOError (vm:6062)
2019-11-17 10:21:23,893+0200 DEBUG (libvirt/events) [jsonrpc.Notification] Sending event {"params": {"notify_time": 4356025830, "b3c4d84a-9784-470c-b70e-7ad7cc45e913": {"status": "WaitForLaunch", "ioerror": {"alias": "ua-94f763e9-fd96-4bee-a6b2-31af841a918b", "name": "sda", "path": "/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f"}, "pauseCode": "EOTHER"}}, "jsonrpc": "2.0", "method": "|virt|VM_status|b3c4d84a-9784-470c-b70e-7ad7cc45e913"} (__init__:181)
Can you check the permissions of the file /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f. Was it reset after the upgrade?
Are you able to copy this file to a different location and try running a VM with this image?
Any errors in the mount log of the gluster1:_data__fast volume?
Best Regards,
Strahil Nikolov
On Monday, 18 November 2019, 11:38:13 GMT+2, Sahina Bose <sabose@redhat.com> wrote:
On Mon, Nov 18, 2019 at 2:58 PM Sandro Bonazzola <sbonazzo@redhat.com> wrote:
+Sahina Bose +Gobinda Das +Nir Soffer +Tal Nisan can you please help here?
On Sun, Nov 17, 2019 at 4:00 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
So far, I have rolled back the engine and the 3 hosts - still cannot manipulate the storage. It seems that gluster itself is working, but vdsm and the oVirt stack cannot access the storage - cannot create new VM disks, cannot start a VM, and I'm on the verge of a redeploy.
Any errors in vdsm logs? engine logs?
Best Regards,
Strahil Nikolov
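As a sketch of the suggestion above (checking the image as the vdsm user and copying it aside; the source path is the one from the vdsm log, the destination is only an example and needs enough free space):

IMG=/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f
sudo -u vdsm qemu-img info "$IMG"                    # should report format/size without errors
sudo -u vdsm cp "$IMG" /var/tmp/5b1d3113-copy.img    # then point a test VM at the copy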