Re: Fwd: [Gluster-users] Announcing Gluster release 5.5

Hi Darrell,
Will it fix the cluster brick sudden death issue?
Best Regards, Strahil Nikolov
On Mar 21, 2019 21:56, Darrell Budic <budic@onholyground.com> wrote:
This release of Gluster 5.5 appears to fix the gluster 3.12->5.3 migration problems many ovirt users have encountered.
I’ll try and test it out this weekend and report back. If anyone else gets a chance to check it out, let us know how it goes!
-Darrell
Begin forwarded message:
From: Shyam Ranganathan <srangana@redhat.com> Subject: [Gluster-users] Announcing Gluster release 5.5 Date: March 21, 2019 at 6:06:33 AM CDT To: announce@gluster.org, gluster-users Discussion List <gluster-users@gluster.org> Cc: GlusterFS Maintainers <maintainers@gluster.org>
The Gluster community is pleased to announce the release of Gluster 5.5 (packages available at [1]).
Release notes for the release can be found at [2].
Major changes, features and limitations addressed in this release:
- Release 5.4 introduced an incompatible change that prevented rolling upgrades, and hence was never announced to the lists. As a result we are jumping a release version and going to 5.5 from 5.3, which does not have the problem.
Thanks, Gluster community
[1] Packages for 5.5: https://download.gluster.org/pub/gluster/glusterfs/5/5.5/
[2] Release notes for 5.5: https://docs.gluster.org/en/latest/release-notes/5.5/
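For anyone doing the move from 5.3 (or 3.12) to 5.5, a rolling upgrade is done one server at a time while the remaining replicas keep serving I/O. A minimal per-server sketch, assuming replicated volumes and RPM-based installs; the volume name is an illustrative placeholder:

systemctl stop glusterd
pkill glusterfsd; pkill glusterfs          # stop brick and any remaining auxiliary processes
yum update 'glusterfs*'                    # pull the 5.5 packages from the repository in [1]
systemctl start glusterd
gluster volume heal <volname> info         # wait for pending heals to drain before the next server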

I’m not quite done with my test upgrade to oVirt 4.3.x with gluster 5.5, but so far it’s looking good. I have NOT encountered the upgrade bugs listed as resolved in the 5.5 release notes. Strahil, I didn’t encounter the brick death issue and don’t have a bug ID handy for it, but so far I haven’t had any bricks die. I’m moving the last node of my hyperconverged test environment over today, and will follow up again tomorrow on it.

Separately, I upgraded my production nodes from oVirt 4.3.1 to 4.3.2 (they have a separate gluster server cluster which is still on 3.12.15), which seems to have moved to the gluster 5.3.2 release. While 5.3.0 clients were not having any trouble talking to my 3.12.15 servers, 5.3.2 hit https://bugzilla.redhat.com/show_bug.cgi?id=1651246, causing disconnects to one of my servers (but only one, oddly enough), raising the load on my other two servers and causing a lot of continuous healing. This led to some stability issues with my hosted engine and general sluggishness of the oVirt UI. I also experienced problems migrating from 4.3.1 nodes, but that seems to have been related to the underlying gluster issues, as it appears to have cleared up once I resolved the gluster problems. Since I was testing gluster 5.5 already, I moved my nodes to gluster 5.5 (instead of rolling them back), as the bug above was resolved in that version. That did the trick, and my cluster is back to normal and behaving properly again.

So my gluster 5.5 experience has been positive so far, and it looks like 5.3 is a version for laying down and avoiding. I’ll update again tomorrow, and then flag the CentOS maintainers about 5.5 stability so it gets out of the -testing repo if all continues to go well.

-Darrell
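A quick post-upgrade sanity check along these lines can confirm that clients reconnected and healing has drained; <volname> stands in for each affected volume:

gluster peer status                                   # every peer should be "Peer in Cluster (Connected)"
gluster volume status <volname> clients               # clients connected to each brick
gluster volume heal <volname> info                    # entries still pending heal, per brick
gluster volume heal <volname> statistics heal-count   # quick counts instead of the full list
gluster volume get all cluster.op-version             # current cluster op-version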
On Mar 21, 2019, at 3:39 PM, Strahil <hunter86_bg@yahoo.com> wrote:
Hi Darrel,
Will it fix the cluster brick sudden death issue ?
Best Regards, Strahil Nikolov

Following up on this, my test/dev cluster is now completely upgraded to oVirt 4.3.2-1 and gluster 5.5, and I’ve bumped the op-version on the gluster volumes. It’s behaving normally and gluster is happy: no excessive healing or crashing bricks. I did encounter https://bugzilla.redhat.com/show_bug.cgi?id=1677160 on my production cluster (with gluster 5.5 clients and 3.12.15 servers) and am proceeding to upgrade my gluster servers to 5.5 now that I’m happy with it on my dev cluster. A little quicker than I’d like, but it seems to be behaving, and I was also in the middle of adding disk to my servers and have to restart them (or at least gluster), so I’m going for it. After I finish this, I’ll test gluster 6 out.

-Darrell
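For reference, the op-version bump is a two-step affair once every server and client is on the new release; the value to set is whatever max-op-version the cluster reports (50400 on the 5.x line):

gluster volume get all cluster.max-op-version   # highest op-version all connected servers/clients support
gluster volume set all cluster.op-version 50400 # raise the cluster to the reported value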

Dear All,

I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While previous upgrades from 4.1 to 4.2 etc. went rather smoothly, this one was a different experience. After first trying a test upgrade on a 3-node setup, which went fine, I headed to upgrade the 9-node production platform, unaware of the backward compatibility issues between gluster 3.12.15 -> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata was missing or couldn't be accessed. I restored this file by getting a good copy from the underlying bricks, removing the file (and the corresponding gfid hardlinks) from the underlying bricks where it was 0 bytes and marked with the sticky bit, removing the file from the mount point, and copying the good copy back onto the mount point. After manually mounting the engine domain, manually creating the corresponding symbolic links in /rhev/data-center and /var/run/vdsm/storage, and fixing the ownership back to vdsm.kvm (which was root.root), I was able to start the HA engine again. Since the engine was up again and things seemed rather unstable, I decided to continue the upgrade on the other nodes; suspecting an incompatibility in gluster versions, I thought it would be best to have them all on the same version rather soonish. However, things went from bad to worse: the engine stopped again, and all VMs stopped working as well. So on a machine outside the setup I restored a backup of the engine taken from version 4.2.8 just before the upgrade. With this engine I was at least able to start some VMs again and finalize the upgrade. Once upgraded, things didn’t stabilize, and I also lost 2 VMs during the process due to image corruption. After figuring out that gluster 5.3 had quite some issues, I was lucky to see gluster 5.5 was about to be released; the moment the RPMs were available I installed them. This helped a lot in terms of stability, for which I’m very grateful! However, the performance is unfortunately terrible: it’s about 15% of what it was running gluster 3.12.15. It’s strange, since a simple dd shows OK performance, but our actual workload doesn’t, while I would expect the performance to be better due to all the improvements made since gluster version 3.12. Does anybody share the same experience? I really hope gluster 6 will soon be tested with oVirt and released, and things start to perform and stabilize again... like the good old days. Of course, if I can do anything, I’m happy to help.

I think the following is a short list of issues we have after the migration:

Gluster 5.5:
- Poor performance for our workload (mostly write dependent)
- VMs randomly pause on unknown storage errors, which are “stale files”. Corresponding log: Lookup on shard 797 failed. Base file gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
- Some files are listed twice in a directory (probably related to the stale file issue?). Example:
ls -la /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
total 3081
drwxr-x---.  2 vdsm kvm    4096 Mar 18 11:34 .
drwxr-xr-x. 13 vdsm kvm    4096 Mar 19 09:42 ..
-rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
-rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
-rw-rw----.  1 vdsm kvm 1048576 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
-rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
-rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
- Brick processes sometimes start multiple times; sometimes I have 5 brick processes for a single volume. Killing all glusterfsd's for the volume on the machine and running gluster v start <vol> force usually just starts one after that; from then on things look all right.

oVirt 4.3.2.1-1.el7:
- All VM images' ownership is changed to root.root after the VM is shut down, probably related to https://bugzilla.redhat.com/show_bug.cgi?id=1666795, but not only scoped to the HA engine. I’m still in compatibility mode 4.2 for the cluster and for the VMs, but upgraded to oVirt 4.3.2.
- The network provider is set to OVN, which is fine... actually cool, only “ovs-vswitchd” is a CPU hog and utilizes 100%.
- It seems on all nodes vdsm tries to get the stats for the HA engine, which is filling the logs with (not sure if this is new): [api.virt] FINISH getStats return={'status': {'message': "Virtual machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 (api:54)
- It seems the package os-brick is missing: [root] managedvolume not supported: Managed Volume Not Supported. Missing package os-brick.: ('Cannot import os_brick',) (caps:149), which fills the vdsm.log; but for this I also saw another message, so I suspect this will already be resolved shortly.
- The machine I used to run the backup HA engine doesn’t want to get removed from hosted-engine --vm-status, not even after running hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine --clean-metadata --force-clean from the machine itself.

Think that's about it.

Don’t get me wrong, I don’t want to rant, I just wanted to share my experience and see where things can be made better.

Best Olaf
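The /dom_md/metadata recovery described above boils down to saving a good copy from a healthy brick, removing the bad replicas (the 0-byte, sticky-bit copies) together with their .glusterfs gfid hardlinks, and copying the good file back in through the mount. A rough sketch of the idea, assuming a replicated engine volume; the brick paths, domain UUID and gfid are illustrative placeholders:

# 0. save a good copy from a brick that still has the full file
cp /data/gfs/bricks/brick1/engine/<domain-uuid>/dom_md/metadata /root/metadata.good

# 1. on each brick, check the copy and note its gfid
ls -l /data/gfs/bricks/brick1/engine/<domain-uuid>/dom_md/metadata
getfattr -d -m . -e hex /data/gfs/bricks/brick1/engine/<domain-uuid>/dom_md/metadata   # note trusted.gfid=0x...

# 2. on bricks where the copy is 0 bytes with the sticky bit set, remove it and its gfid hardlink
rm /data/gfs/bricks/brick1/engine/<domain-uuid>/dom_md/metadata
rm /data/gfs/bricks/brick1/engine/.glusterfs/<first-2-hex>/<next-2-hex>/<full-gfid>

# 3. remove the stale entry through the FUSE mount, copy the good file back and fix ownership
mount -t glusterfs <server>:/engine /mnt/engine
rm -f /mnt/engine/<domain-uuid>/dom_md/metadata
cp /root/metadata.good /mnt/engine/<domain-uuid>/dom_md/metadata
chown vdsm:kvm /mnt/engine/<domain-uuid>/dom_md/metadata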

Forgot one more issue with oVirt: on some hypervisor nodes we also run docker, and it appears vdsm tries to get a hold of the interfaces docker creates/removes. This is spamming the vdsm and engine logs with: Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] veth7611c53 is not present in the system'}. I couldn’t really find a way to make vdsm ignore those interfaces.

Olaf, thank you very much for this feedback. I was just about to upgrade my 12-node 4.2.8 production cluster, and it seems you have spared me a lot of trouble. Though I thought that 4.3.1 comes with gluster 5.5, which solves these issues, and that the upgrade procedure works seamlessly. Not sure now how long, or for what oVirt version, to wait before upgrading my cluster...

Questions/comments inline ...
On Thu, Mar 28, 2019 at 10:18 PM <olaf.buitelaar@gmail.com> wrote:
I think the following short list of issues we have after the migration; Gluster 5.5; - Poor performance for our workload (mostly write dependent)
For this, could you share the volume-profile output specifically for the affected volume(s)? Here's what you need to do:
1. # gluster volume profile $VOLNAME stop
2. # gluster volume profile $VOLNAME start
3. Run the test inside the vm wherein you see bad performance
4. # gluster volume profile $VOLNAME info    # save the output of this command into a file
5. # gluster volume profile $VOLNAME stop
6. Attach the output file gotten in step 4

- VM’s randomly pause on unknown storage errors, which are “stale file’s”. corresponding log; Lookup on shard 797 failed. Base file gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
Could you share the complete gluster client log file (it would be a filename matching the pattern rhev-data-center-mnt-glusterSD-*) Also the output of `gluster volume info $VOLNAME`
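Putting these requests together, the collection could look roughly like the following on one of the hosts, with <volname> standing in for the affected volume (run the slow workload while the profile is active):

gluster volume profile <volname> stop      # clear any previous profiling session (ignore the error if none was running)
gluster volume profile <volname> start
# ... run the slow workload inside a VM for ~10 minutes ...
gluster volume profile <volname> info > /tmp/profile-<volname>.txt
gluster volume profile <volname> stop
gluster volume info <volname> > /tmp/volinfo-<volname>.txt
cp /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log /tmp/   # the FUSE client log referred to above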
- Some files are listed twice in a directory (probably related the stale file issue?) Example;
ls -la /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
total 3081
drwxr-x---.  2 vdsm kvm    4096 Mar 18 11:34 .
drwxr-xr-x. 13 vdsm kvm    4096 Mar 19 09:42 ..
-rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
-rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
-rw-rw----.  1 vdsm kvm 1048576 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
-rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
-rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
Adding DHT and readdir-ahead maintainers regarding entries getting listed twice. @Nithya Balachandran <nbalacha@redhat.com> ^^ @Gowdappa, Raghavendra <rgowdapp@redhat.com> ^^ @Poornima Gurusiddaiah <pgurusid@redhat.com> ^^
- brick processes sometimes starts multiple times. Sometimes I’ve 5 brick processes for a single volume. Killing all glusterfsd’s for the volume on the machine and running gluster v start <vol> force usually just starts one after the event, from then on things look all right.
Did you mean 5 brick processes for a single brick directory? +Mohit Agrawal <moagrawa@redhat.com> ^^ -Krutika
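To tell whether these are really duplicate processes for one brick directory (rather than one process per brick of the volume), counting glusterfsd instances per --brick-name on the affected server should answer it; a minimal sketch, with <volname> a placeholder:

# count glusterfsd processes per brick path; any count above 1 is a duplicate
ps -C glusterfsd -o args= | awk '{for (i = 1; i <= NF; i++) if ($i == "--brick-name") print $(i + 1)}' | sort | uniq -c

# the workaround described above: kill the volume's brick processes and force-start so only one comes back
pkill -f "glusterfsd.*<volname>"
gluster volume start <volname> force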

On Fri, Mar 29, 2019 at 12:47 PM Krutika Dhananjay <kdhananj@redhat.com> wrote:
- brick processes sometimes starts multiple times. Sometimes I’ve 5 brick processes for a single volume. Killing all glusterfsd’s for the volume on the machine and running gluster v start <vol> force usually just starts one after the event, from then on things look all right.
Did you mean 5 brick processes for a single brick directory? +Mohit Agrawal <moagrawa@redhat.com> ^^
Mohit - Could this be because of missing the following commit in the release-5 branch? It might be worth backporting this fix.

commit 66986594a9023c49e61b32769b7e6b260b600626
Author: Mohit Agrawal <moagrawal@redhat.com>
Date:   Fri Mar 1 13:41:24 2019 +0530

    glusterfsd: Multiple shd processes are spawned on brick_mux environment

    Problem: Multiple shd processes are spawned while starting volumes in
    the loop on brick_mux environment. glusterd spawns a process based on a
    pidfile, and the shd daemon is taking some time to update the pid in the
    pidfile, due to which glusterd is not able to get the shd pid.

    Solution: Commit cd249f4cb783f8d79e79468c455732669e835a4f changed the
    code to update the pidfile in the parent for any gluster daemon after
    getting the status of the forked child. To resolve this, correct the
    condition: update the pidfile in the parent only for glusterd, and for
    the rest of the daemons the pidfile is updated in the child.

    Change-Id: Ifd14797fa949562594a285ec82d58384ad717e81
    fixes: bz#1684404
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
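If that commit really is absent from the release-5 branch, the backport itself is a plain cherry-pick of the master fix (the actual submission would still go through the project's usual review process); a sketch:

git clone https://github.com/gluster/glusterfs.git
cd glusterfs
git checkout release-5
git log --oneline | grep "Multiple shd processes"   # check whether the fix was already backported
git cherry-pick -x 66986594a9023c49e61b32769b7e6b260b600626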

On Thu, Mar 28, 2019 at 5:48 PM <olaf.buitelaar@gmail.com> wrote:
Dear All,
I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While previous upgrades from 4.1 to 4.2 etc. went rather smoothly, this one was a different experience. After first trying a test upgrade on a 3-node setup, which went fine, I headed to upgrade the 9-node production platform, unaware of the backward compatibility issues between gluster 3.12.15 -> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata was missing or couldn't be accessed. I restored this file by getting a good copy from the underlying bricks, removing the file (and the corresponding gfid hardlinks) from the underlying bricks where it was 0 bytes and marked with the sticky bit, removing the file from the mount point, and copying the good copy back onto the mount point. After manually mounting the engine domain, manually creating the corresponding symbolic links in /rhev/data-center and /var/run/vdsm/storage, and fixing the ownership back to vdsm.kvm (which was root.root), I was able to start the HA engine again. Since the engine was up again and things seemed rather unstable, I decided to continue the upgrade on the other nodes; suspecting an incompatibility in gluster versions, I thought it would be best to have them all on the same version rather soonish. However, things went from bad to worse: the engine stopped again, and all VMs stopped working as well. So on a machine outside the setup I restored a backup of the engine taken from version 4.2.8 just before the upgrade. With this engine I was at least able to start some VMs again and finalize the upgrade. Once upgraded, things didn’t stabilize, and I also lost 2 VMs during the process due to image corruption. After figuring out that gluster 5.3 had quite some issues, I was lucky to see gluster 5.5 was about to be released; the moment the RPMs were available I installed them. This helped a lot in terms of stability, for which I’m very grateful! However, the performance is unfortunately terrible: it’s about 15% of what it was running gluster 3.12.15. It’s strange, since a simple dd shows OK performance, but our actual workload doesn’t, while I would expect the performance to be better due to all the improvements made since gluster version 3.12. Does anybody share the same experience? I really hope gluster 6 will soon be tested with oVirt and released, and things start to perform and stabilize again... like the good old days. Of course, if I can do anything, I’m happy to help.
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track the rebase on Gluster 6.
I think the following is a short list of issues we have after the migration:

Gluster 5.5:
- Poor performance for our workload (mostly write dependent)
- VMs randomly pause on unknown storage errors, which are “stale files”. Corresponding log: Lookup on shard 797 failed. Base file gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
- Some files are listed twice in a directory (probably related to the stale file issue?). Example:
ls -la /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
total 3081
drwxr-x---.  2 vdsm kvm    4096 Mar 18 11:34 .
drwxr-xr-x. 13 vdsm kvm    4096 Mar 19 09:42 ..
-rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
-rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
-rw-rw----.  1 vdsm kvm 1048576 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
-rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
-rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
- brick processes sometimes starts multiple times. Sometimes I’ve 5 brick processes for a single volume. Killing all glusterfsd’s for the volume on the machine and running gluster v start <vol> force usually just starts one after the event, from then on things look all right.
May I kindly ask to open bugs on Gluster for above issues at https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ? Sahina?
Ovirt 4.3.2.1-1.el7 - All vms images ownership are changed to root.root after the vm is shutdown, probably related to; https://bugzilla.redhat.com/show_bug.cgi?id=1666795 but not only scoped to the HA engine. I’m still in compatibility mode 4.2 for the cluster and for the vm’s, but upgraded to version ovirt 4.3.2
Ryan?
- The network provider is set to ovn, which is fine..actually cool, only the “ovs-vswitchd” is a CPU hog, and utilizes 100%
Miguel? Dominik?
- It seems on all nodes vdsm tries to get the the stats for the HA engine, which is filling the logs with (not sure if this is new); [api.virt] FINISH getStats return={'status': {'message': "Virtual machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 (api:54)
Simone?
- It seems the package os_brick [root] managedvolume not supported: Managed Volume Not Supported. Missing package os-brick.: ('Cannot import os_brick',) (caps:149) which fills the vdsm.log, but for this I also saw another message, so I suspect this will already be resolved shortly - The machine I used to run the backup HA engine, doesn’t want to get removed from the hosted-engine –vm-status, not even after running; hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine --clean-metadata --force-clean from the machine itself.
Simone?
Think that's about it.
Don’t get me wrong, I don’t want to rant, I just wanted to share my experience and see where things can made better.
If not already done, can you please open bugs for above issues at https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ?
Best Olaf
--
SANDRO BONAZZOLA
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com

Dear Krutika,

1. I’ve made 2 profile runs of around 10 minutes (see files profile_data.txt and profile_data2.txt). Looking at them, most time seems to be spent on the fsync and readdirp fops. Unfortunately I don’t have the profile info for the 3.12.15 version, so it’s a bit hard to compare. One additional thing I do notice: on 1 machine (10.32.9.5) the iowait time increased a lot, from an average below 1% to around 12% after the upgrade. So the first suspicion would be lightning striking twice and me also having a bad disk just now, but that doesn’t appear to be the case, since all SMART statuses report OK. Also, dd shows performance I would more or less expect:
dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync
1+0 records in
1+0 records out
104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s
dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s
dd if=/dev/urandom of=/data/test_file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s
dd if=/dev/zero of=/data/test_file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s
When I disable this brick (service glusterd stop; pkill glusterfsd), performance in gluster is better, but not on par with what it was. Also the CPU usage on the “neighbor” nodes which host the other bricks in the same subvolume increases quite a lot in this case, which I wouldn’t expect actually, since they shouldn’t handle much more work, except flagging shards to heal. Iowait also goes to idle once gluster is stopped, so it’s for sure gluster which waits for IO.

2. I’ve attached the mnt log and volume info, but I couldn’t find anything relevant in those logs. I think this is because we run the VMs with libgfapi:
[root@ovirt-host-01 ~]# engine-config -g LibgfApiSupported
LibgfApiSupported: true version: 4.2
LibgfApiSupported: true version: 4.1
LibgfApiSupported: true version: 4.3
And I can confirm the qemu process is invoked with the gluster:// address for the images. The message is logged in the /var/lib/libvirt/qemu/<machine> file, which I’ve also included. For a sample case see around 2019-03-28 20:20:07, which has the error:
E [MSGID: 133010] [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c [Stale file handle]

3. Yes, I see multiple instances for the same brick directory, like:
/usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid -S /var/run/gluster/452591c9165945d9.socket --brick-name /data/gfs/bricks/brick1/ovirt-core -l /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 --process-name brick --brick-port 49154 --xlator-option ovirt-core-server.listen-port=49154
I’ve made an export of the output of ps from the time I observed these multiple processes. In addition to the brick_mux bug as noted by Atin, I might also have another possible cause: as oVirt moves nodes from non-operational or maintenance state to active/activating, it also seems to restart gluster; however, I don’t have direct proof for this theory.

Thanks Olaf
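Since the profiles point at fsync, a large sequential dd against the brick filesystem is a poor stand-in for the VM workload; small synchronous writes through the gluster mount (or inside a test VM) are closer to what the guests do. A minimal sketch, with the mount path and volume name illustrative:

# 4k synchronous writes through the FUSE mount of the data domain
dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/<server>:_<volname>/test_file bs=4k count=25600 oflag=dsync

# or with fio: random 4k writes with an fsync after every write
fio --name=synctest --directory=/rhev/data-center/mnt/glusterSD/<server>:_<volname> \
    --rw=randwrite --bs=4k --size=1G --ioengine=libaio --direct=1 --fsync=1 --numjobs=1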

I’ve also encountered multiple brick processes (glusterfsd) being spawned per brick directory on gluster 5.5 while upgrading from 3.12.15. In my case it’s on a standalone server cluster that doesn’t have oVirt installed, so it seems to be gluster itself. Haven’t had the chance to follow up on some bug reports yet, but hopefully in the next day or so...
On Mar 29, 2019, at 9:39 AM, Olaf Buitelaar <olaf.buitelaar@gmail.com> wrote:
Dear Krutika,
1. I’ve made 2 profile runs of around 10 minutes (see files profile_data.txt and profile_data2.txt). Looking at it, most time seems be spent at the fop’s fsync and readdirp. Unfortunate I don’t have the profile info for the 3.12.15 version so it’s a bit hard to compare. One additional thing I do notice on 1 machine (10.32.9.5) the iowait time increased a lot, from an average below the 1% it’s now around the 12% after the upgrade. So first suspicion with be lighting strikes twice, and I’ve also just now a bad disk, but that doesn’t appear to be the case, since all smart status report ok. Also dd shows performance I would more or less expect; dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync 1+0 records in 1+0 records out 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync 1+0 records in 1+0 records out 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s if=/dev/urandom of=/data/test_file bs=1024 count=1000000 1000000+0 records in 1000000+0 records out 1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s dd if=/dev/zero of=/data/test_file bs=1024 count=1000000 1000000+0 records in 1000000+0 records out 1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s When I disable this brick (service glusterd stop; pkill glusterfsd) performance in gluster is better, but not on par with what it was. Also the cpu usages on the “neighbor” nodes which hosts the other bricks in the same subvolume increases quite a lot in this case, which I wouldn’t expect actually since they shouldn't handle much more work, except flagging shards to heal. Iowait also goes to idle once gluster is stopped, so it’s for sure gluster which waits for io.
2. I’ve attached the mnt log and volume info, but I couldn’t find anything relevant in in those logs. I think this is because we run the VM’s with libgfapi; [root@ovirt-host-01 ~]# engine-config -g LibgfApiSupported LibgfApiSupported: true version: 4.2 LibgfApiSupported: true version: 4.1 LibgfApiSupported: true version: 4.3 And I can confirm the qemu process is invoked with the gluster:// address for the images. The message is logged in the /var/lib/libvert/qemu/<machine> file, which I’ve also included. For a sample case see around; 2019-03-28 20:20:07 Which has the error; E [MSGID: 133010] [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c [Stale file handle]
3. yes I see multiple instances for the same brick directory, like; /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid -S /var/run/gluster/452591c9165945d9.socket --brick-name /data/gfs/bricks/brick1/ovirt-core -l /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 --process-name brick --brick-port 49154 --xlator-option ovirt-core-server.listen-port=49154
I’ve made an export of the output of ps from the time I observed these multiple processes (a sketch of how to count them is below). In addition to the brick_mux bug as noted by Atin, I might also have another possible cause: as ovirt moves nodes from non-operational state or maintenance state to active/activating, it also seems to restart gluster. However, I don’t have direct proof for this theory.
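A quick way to produce that kind of ps export and count glusterfsd processes per brick directory (just a sketch):
# one line per glusterfsd, then count how many serve the same --brick-name
ps ax -o pid=,args= | grep '[g]lusterfsd' | grep -o -- '--brick-name [^ ]*' | sort | uniq -c | sort -rn
# any count above 1 means more than one brick process was spawned for the same brick path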
Thanks Olaf
On Fri, 29 Mar 2019 at 10:03, Sandro Bonazzola <sbonazzo@redhat.com <mailto:sbonazzo@redhat.com>> wrote:
On Thu, 28 Mar 2019 at 17:48, <olaf.buitelaar@gmail.com <mailto:olaf.buitelaar@gmail.com>> wrote:
Dear All,
I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While previous upgrades from 4.1 to 4.2 etc. went rather smoothly, this one was a different experience. After first trying a test upgrade on a 3 node setup, which went fine, I headed to upgrade the 9 node production platform, unaware of the backward compatibility issues between gluster 3.12.15 -> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata was missing or couldn't be accessed. I restored this file by getting a good copy from the underlying bricks, removing the file (and the corresponding gfid's) from the underlying bricks where it was 0 bytes and marked with the sticky bit, removing the file from the mount point, and copying the good file back onto the mount point (roughly sketched below). After manually mounting the engine domain, manually creating the corresponding symbolic links in /rhev/data-center and /var/run/vdsm/storage, and fixing the ownership back to vdsm.kvm (which was root.root), I was able to start the HA engine again. Since the engine was up again, and things seemed rather unstable, I decided to continue the upgrade on the other nodes; suspecting an incompatibility in gluster versions, I thought it would be best to have them all on the same version rather soon. However, things went from bad to worse: the engine stopped again, and all VMs stopped working as well. So on a machine outside the setup I restored a backup of the engine taken from version 4.2.8 just before the upgrade. With this engine I was at least able to start some VMs again and finalize the upgrade. Once upgraded, things didn't stabilize, and I also lost 2 VMs during the process due to image corruption. After figuring out gluster 5.3 had quite some issues, I was lucky to see gluster 5.5 was about to be released; the moment the RPMs were available I installed them. This helped a lot in terms of stability, for which I'm very grateful! However, the performance is unfortunately terrible: it's about 15% of what it was running gluster 3.12.15. It's strange, since a simple dd shows OK performance but our actual workload doesn't, while I would expect the performance to be better due to all the improvements made since gluster version 3.12. Does anybody share the same experience? I really hope gluster 6 will soon be tested with ovirt and released, and things start to perform and stabilize again... like the good old days. Of course, if I can do anything, I'm happy to help.
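The manual repair of /dom_md/metadata described above roughly translates to the following steps on the affected bricks (the paths, volume name and <dom_uuid> placeholder are assumptions for illustration; the actual commands used are not in the original message, so treat this only as a sketch):
# find the 0-byte, sticky-bit copies of the metadata file on a brick
find /data/gfs/bricks/brick1/ovirt-engine -name metadata -size 0 -perm -1000
# read the gfid of such a copy, so the matching hard link under .glusterfs/<aa>/<bb>/<gfid> can be removed too
getfattr -n trusted.gfid -e hex /data/gfs/bricks/brick1/ovirt-engine/<dom_uuid>/dom_md/metadata
# after removing the bad copies and their gfid links, copy a known-good metadata back in via the mount point
# and restore the ownership to vdsm.kvm before starting the HA engine again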
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 <https://bugzilla.redhat.com/show_bug.cgi?id=1693998> to track the rebase on Gluster 6.
I think the following is the short list of issues we have after the migration:
Gluster 5.5:
- Poor performance for our workload (mostly write dependent)
- VMs randomly pause on unknown storage errors, which are “stale files”. Corresponding log: Lookup on shard 797 failed. Base file gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
- Some files are listed twice in a directory (probably related to the stale file issue?) Example:
ls -la /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
total 3081
drwxr-x---. 2 vdsm kvm 4096 Mar 18 11:34 .
drwxr-xr-x. 13 vdsm kvm 4096 Mar 19 09:42 ..
-rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
-rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
-rw-rw----. 1 vdsm kvm 1048576 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
-rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
-rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
- Brick processes sometimes start multiple times. Sometimes I have 5 brick processes for a single volume. Killing all glusterfsd’s for the volume on the machine and running gluster v start <vol> force usually just starts one after that, and from then on things look all right (spelled out as commands below).
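The workaround in the last bullet, spelled out as commands (the volume name is again just an example, not necessarily what was typed):
# stop every brick process of this volume on the node
pkill -f 'glusterfsd.*ovirt-core'
# let glusterd respawn exactly one glusterfsd per brick
gluster volume start ovirt-core force
# verify a single PID per brick and Online = Y
gluster volume status ovirt-core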
May I kindly ask you to open bugs on Gluster for the above issues at https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS <https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS>? Sahina?
Ovirt 4.3.2.1-1.el7
- All VM images’ ownership is changed to root.root after the VM is shut down, probably related to https://bugzilla.redhat.com/show_bug.cgi?id=1666795 <https://bugzilla.redhat.com/show_bug.cgi?id=1666795>, but not only scoped to the HA engine. I’m still in compatibility mode 4.2 for the cluster and for the VMs, but upgraded to ovirt 4.3.2.
Ryan?
- The network provider is set to ovn, which is fine... actually cool, only “ovs-vswitchd” is a CPU hog and utilizes 100% CPU.
Miguel? Dominik?
- It seems on all nodes vdsm tries to get the stats for the HA engine, which is filling the logs with (not sure if this is new);
[api.virt] FINISH getStats return={'status': {'message': "Virtual machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 (api:54)
Simone?
- It seems the os_brick package is missing; the message "[root] managedvolume not supported: Managed Volume Not Supported. Missing package os-brick.: ('Cannot import os_brick',) (caps:149)" fills the vdsm.log. But for this I also saw another message, so I suspect it will already be resolved shortly.
- The machine I used to run the backup HA engine doesn’t want to get removed from the hosted-engine --vm-status output, not even after running hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine --clean-metadata --force-clean from the machine itself.
Simone?
Think that's about it.
Don’t get me wrong, I don’t want to rant, I just wanted to share my experience and see where things can be made better.
If not already done, can you please open bugs for the above issues at https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt <https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt>?
Best Olaf _______________________________________________ Users mailing list -- users@ovirt.org <mailto:users@ovirt.org> To unsubscribe send an email to users-leave@ovirt.org <mailto:users-leave@ovirt.org> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ <https://www.ovirt.org/site/privacy-policy/> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ <https://www.ovirt.org/community/about/community-guidelines/> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/3CO35Q7VZMWNHS... <https://lists.ovirt.org/archives/list/users@ovirt.org/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/>
-- SANDRO BONAZZOLA MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV Red Hat EMEA <https://www.redhat.com/> sbonazzo@redhat.com <mailto:sbonazzo@redhat.com>
<https://red.ht/sig> <multi-glusterfsd-vol1.txt><profile_data.txt><multi-glusterfsd-vol4.txt><profile_data2.txt><multi-glusterfsd-vol2.txt><multi-glusterfsd-vol3.txt><ovirt-kube-volume-info.txt><rhev-data-center-mnt-glusterSD-10.32.9.20__ovirt-kube.zip>

Adding back gluster-users Comments inline ... On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar <olaf.buitelaar@gmail.com> wrote:
Dear Krutika,
1. I’ve made 2 profile runs of around 10 minutes (see files profile_data.txt and profile_data2.txt). Looking at it, most time seems be spent at the fop’s fsync and readdirp.
Unfortunate I don’t have the profile info for the 3.12.15 version so it’s a bit hard to compare.
One additional thing I do notice on 1 machine (10.32.9.5) the iowait time increased a lot, from an average below the 1% it’s now around the 12% after the upgrade.
So first suspicion with be lighting strikes twice, and I’ve also just now a bad disk, but that doesn’t appear to be the case, since all smart status report ok.
Also dd shows performance I would more or less expect;
dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync
1+0 records in
1+0 records out
104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s
dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s
if=/dev/urandom of=/data/test_file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s
dd if=/dev/zero of=/data/test_file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s
When I disable this brick (service glusterd stop; pkill glusterfsd) performance in gluster is better, but not on par with what it was. Also the cpu usages on the “neighbor” nodes which hosts the other bricks in the same subvolume increases quite a lot in this case, which I wouldn’t expect actually since they shouldn't handle much more work, except flagging shards to heal. Iowait also goes to idle once gluster is stopped, so it’s for sure gluster which waits for io.
So I see that FSYNC %-latency is on the higher side. And I also noticed you don't have direct-io options enabled on the volume. Could you set the following options on the volume -
# gluster volume set <VOLNAME> network.remote-dio off
# gluster volume set <VOLNAME> performance.strict-o-direct on
and also disable choose-local
# gluster volume set <VOLNAME> cluster.choose-local off
let me know if this helps.
2. I’ve attached the mnt log and volume info, but I couldn’t find anything
relevant in in those logs. I think this is because we run the VM’s with libgfapi;
[root@ovirt-host-01 ~]# engine-config -g LibgfApiSupported
LibgfApiSupported: true version: 4.2
LibgfApiSupported: true version: 4.1
LibgfApiSupported: true version: 4.3
And I can confirm the qemu process is invoked with the gluster:// address for the images.
The message is logged in the /var/lib/libvert/qemu/<machine> file, which I’ve also included. For a sample case see around; 2019-03-28 20:20:07
Which has the error; E [MSGID: 133010] [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c [Stale file handle]
Could you also attach the brick logs for this volume?
3. yes I see multiple instances for the same brick directory, like;
/usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid -S /var/run/gluster/452591c9165945d9.socket --brick-name /data/gfs/bricks/brick1/ovirt-core -l /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 --process-name brick --brick-port 49154 --xlator-option ovirt-core-server.listen-port=49154
I’ve made an export of the output of ps from the time I observed these multiple processes.
In addition the brick_mux bug as noted by Atin. I might also have another possible cause, as ovirt moves nodes from none-operational state or maintenance state to active/activating, it also seems to restart gluster, however I don’t have direct proof for this theory.
+Atin Mukherjee <amukherj@redhat.com> ^^ +Mohit Agrawal <moagrawa@redhat.com> ^^
-Krutika

Dear Krutika,
1. I've changed the volume settings. Write performance seems to have increased somewhat, although the profile doesn't really support that since latencies increased, while read performance has diminished, which does seem to be supported by the profile runs (attached). The IO also seems to behave more consistently than before. I don't really understand the idea behind these settings, maybe you can explain why these suggestions are good? They seem to avoid as much local caching and access as possible and push everything to the gluster processes, while I would expect local access and local caches to be a good thing, since they would lead to less network and disk access. I tried to investigate the settings a bit more, and this is what I understood of them;
- network.remote-dio: when on, it seems to ignore the O_DIRECT flag in the client, thus causing the files to be cached and buffered in the page cache on the client. I would expect this to be a good thing, especially if the server process would access the same page cache? At least that is what I grasp from this commit: https://review.gluster.org/#/c/glusterfs/+/4206/2/xlators/protocol/client/sr... line 867. I also found this commit: https://github.com/gluster/glusterfs/commit/06c4ba589102bf92c58cd9fba5c60064... suggesting remote-dio actually improves performance (not sure whether it's a write or read benchmark). When a file is opened with O_DIRECT it will also disable the write-behind functionality.
- performance.strict-o-direct: when on, the AFR will not ignore the O_DIRECT flag and will invoke fop_writev_stub with the wb_writev_helper, which seems to stack the operation, no idea why that is. But generally I suppose not ignoring the O_DIRECT flag in the AFR is a good thing when a process requests O_DIRECT. So this makes sense to me.
- cluster.choose-local: when off, it doesn't prefer the local node, but would always choose a brick. Since it's a 9 node cluster with 3 subvolumes, only 1/3 could end up local, and the other 2/3 should be pushed to external nodes anyway. Or am I making a totally wrong assumption here? It seems this config is moving to the gluster-block config side of things, which does make sense.
Since we're running quite some mysql instances, which open the files with O_DIRECT I believe, it would mean the only layer of cache is within mysql itself. Which you could argue is a good thing, but I would expect a little write-behind buffer, and maybe some of the data cached within gluster, would alleviate things a bit on gluster's side. But I wouldn't know if that's the correct mindset, so I might be totally off here.
Also I would expect these gluster v set <VOL> commands to be online operations, but somehow the bricks went down after applying these changes. What appears to have happened is that the brick process was restarted after the update, but due to the multiple-brick-process-start issue, multiple processes were started and the brick didn't come online again. I'll try to reproduce this, since I would like to test with cluster.choose-local: on and see how performance compares, and hopefully collect some useful info when it occurs (see the sketch below).
Question: are network.remote-dio and performance.strict-o-direct mutually exclusive settings, or can they both be on?
2. I've attached all brick logs; the only thing relevant I found was;
[2019-03-28 20:20:07.170452] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
[2019-03-28 20:20:07.170491] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
[2019-03-28 20:20:07.248480] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
[2019-03-28 20:20:07.248491] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
Thanks Olaf
ps. sorry, I needed to resend since the previous mail exceeded the file size limit
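Given that the bricks went down right after the option changes, a cautious way to apply and verify each setting one at a time (using ovirt-kube purely as an example; this is a sketch, not what was actually run) could be:
gluster volume set ovirt-kube cluster.choose-local off
# confirm the option really took effect
gluster volume get ovirt-kube cluster.choose-local
# confirm every brick is still online with a single PID before changing the next option
gluster volume status ovirt-kube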

Hi Olaf,
As per the currently attached "multi-glusterfsd-vol3.txt | multi-glusterfsd-vol4.txt", multiple processes are running for the "ovirt-core" and "ovirt-engine" bricks, but there are no logs available in bricklogs.zip specific to these bricks; bricklogs.zip has a dump of the ovirt-kube logs only.
Kindly share the brick logs specific to the "ovirt-core" and "ovirt-engine" bricks, and share the glusterd logs also.
Regards
Mohit Agrawal
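One way to gather exactly the logs requested here (the log paths follow the pattern visible in the glusterfsd command line earlier in this thread; the ovirt-engine path is an assumption, so adjust to the real brick names):
# brick logs for the ovirt-core and ovirt-engine bricks plus the glusterd log
tar czf gluster-logs.tar.gz \
    /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log \
    /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-engine.log \
    /var/log/glusterfs/glusterd.log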

Dear Mohit,
Sorry, I thought Krutika was referring to the ovirt-kube brick logs. Due to the large size (18MB compressed), I've placed the files here; https://edgecastcdn.net/0004FA/files/bricklogs.tar.bz2
Also I see I've attached the wrong files; I intended to attach profile_data4.txt and profile_data3.txt. Sorry for the confusion.
Thanks Olaf
Hi Olaf,
As per current attached "multi-glusterfsd-vol3.txt | multi-glusterfsd-vol4.txt" it is showing multiple processes are running for "ovirt-core ovirt-engine" brick names but there are no logs available in bricklogs.zip specific to this bricks, bricklogs.zip has a dump of ovirt-kube logs only
Kindly share brick logs specific to the bricks "ovirt-core ovirt-engine" and share glusterd logs also.
Regards Mohit Agrawal
On Tue, Apr 2, 2019 at 9:18 PM Olaf Buitelaar <olaf.buitelaar@gmail.com> wrote:
Dear Krutika,
1. I've changed the volume settings, write performance seems to increased somewhat, however the profile doesn't really support that since latencies increased. However read performance has diminished, which does seem to be supported by the profile runs (attached). Also the IO does seem to behave more consistent than before. I don't really understand the idea behind them, maybe you can explain why these suggestions are good? These settings seems to avoid as much local caching and access as possible and push everything to the gluster processes. While i would expect local access and local caches are a good thing, since it would lead to having less network access or disk access. I tried to investigate these settings a bit more, and this is what i understood of them; - network.remote-dio; when on it seems to ignore the O_DIRECT flag in the client, thus causing the files to be cached and buffered in the page cache on the client, i would expect this to be a good thing especially if the server process would access the same page cache? At least that is what grasp from this commit; https://review.gluster.org/#/c/glusterfs/+/4206/2/xlators/protocol/client/sr... line 867 Also found this commit; https://github.com/gluster/glusterfs/commit/06c4ba589102bf92c58cd9fba5c60064... suggesting remote-dio actually improves performance, not sure it's a write or read benchmark When a file is opened with O_DIRECT it will also disable the write-behind functionality
- performance.strict-o-direct: when on, the AFR, will not ignore the O_DIRECT flag. and will invoke: fop_writev_stub with the wb_writev_helper, which seems to stack the operation, no idea why that is. But generally i suppose not ignoring the O_DIRECT flag in the AFR is a good thing, when a processes requests to have O_DIRECT. So this makes sense to me.
- cluster.choose-local: when off, it doesn't prefer the local node, but would always choose a brick. Since it's a 9 node cluster, with 3 subvolumes, only a 1/3 could end-up local, and the other 2/3 should be pushed to external nodes anyway. Or am I making the total wrong assumption here?
It seems to this config is moving to the gluster-block config side of things, which does make sense. Since we're running quite some mysql instances, which opens the files with O_DIRECt i believe, it would mean the only layer of cache is within mysql it self. Which you could argue is a good thing. But i would expect a little of write-behind buffer, and maybe some of the data cached within gluster would alleviate things a bit on gluster's side. But i wouldn't know if that's the correct mind set, and so might be totally off here. Also i would expect these gluster v set <VOL> command to be online operations, but somehow the bricks went down, after applying these changes. What appears to have happened is that after the update the brick process was restarted, but due to multiple brick process start issue, multiple processes were started, and the brick didn't came online again. However i'll try to reproduce this, since i would like to test with cluster.choose-local: on, and see how performance compares. And hopefully when it occurs collect some useful info. Question; are network.remote-dio and performance.strict-o-direct mutually exclusive settings, or can they both be on?
2. I've attached all brick logs, the only thing relevant i found was; [2019-03-28 20:20:07.170452] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 [2019-03-28 20:20:07.170491] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 [2019-03-28 20:20:07.248480] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 [2019-03-28 20:20:07.248491] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
Thanks Olaf
ps. sorry needed to resend since it exceed the file limit
Op ma 1 apr. 2019 om 07:56 schreef Krutika Dhananjay <kdhananj@redhat.com
:
Adding back gluster-users Comments inline ...
On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar <olaf.buitelaar@gmail.com> wrote:
Dear Krutika,
1. I’ve made 2 profile runs of around 10 minutes (see files profile_data.txt and profile_data2.txt). Looking at it, most time seems be spent at the fop’s fsync and readdirp.
Unfortunate I don’t have the profile info for the 3.12.15 version so it’s a bit hard to compare.
One additional thing I do notice on 1 machine (10.32.9.5) the iowait time increased a lot, from an average below the 1% it’s now around the 12% after the upgrade.
So first suspicion with be lighting strikes twice, and I’ve also just now a bad disk, but that doesn’t appear to be the case, since all smart status report ok.
Also dd shows performance I would more or less expect;
dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync
1+0 records in
1+0 records out
104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s
dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s
dd if=/dev/urandom of=/data/test_file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s
dd if=/dev/zero of=/data/test_file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s
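(As a comparison point, and purely as a sketch: the same dsync test can be repeated through the gluster fuse mount so the gluster stack is included in the measurement; the /rhev/data-center/mnt/glusterSD/<server>:_<volume> path is an assumption based on how oVirt mounts its gluster storage domains, so adjust it to your environment:)
dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/<server>:_<volume>/dd_test_file bs=100M count=1 oflag=dsync
rm -f /rhev/data-center/mnt/glusterSD/<server>:_<volume>/dd_test_file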
When I disable this brick (service glusterd stop; pkill glusterfsd) performance in gluster is better, but not on par with what it was. Also the CPU usage on the "neighbor" nodes which host the other bricks in the same subvolume increases quite a lot in this case, which I wouldn't expect, since they shouldn't have to handle much more work, except flagging shards to heal. Iowait also goes to idle once gluster is stopped, so it's definitely gluster which is waiting for io.
So I see that FSYNC %-latency is on the higher side. And I also noticed you don't have direct-io options enabled on the volume. Could you set the following options on the volume -
# gluster volume set <VOLNAME> network.remote-dio off
# gluster volume set <VOLNAME> performance.strict-o-direct on
and also disable choose-local
# gluster volume set <VOLNAME> cluster.choose-local off
let me know if this helps.
2. I've attached the mnt log and volume info, but I couldn't find anything relevant in those logs. I think this is because we run the VMs with libgfapi:
[root@ovirt-host-01 ~]# engine-config -g LibgfApiSupported
LibgfApiSupported: true version: 4.2
LibgfApiSupported: true version: 4.1
LibgfApiSupported: true version: 4.3
And I can confirm the qemu process is invoked with the gluster:// address for the images.
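(For anyone wanting to double-check the same thing, a minimal sketch using plain ps; qemu-kvm is assumed to be the process name on these CentOS/oVirt hosts:)
ps -eo args | grep '[q]emu-kvm' | tr ' ' '\n' | grep 'gluster://'
# each printed token is a disk argument using libgfapi rather than a fuse path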
The message is logged in the /var/lib/libvirt/qemu/<machine> file, which I've also included. For a sample case see around 2019-03-28 20:20:07, which has the error:
E [MSGID: 133010] [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c [Stale file handle]
Could you also attach the brick logs for this volume?
3. Yes, I see multiple instances for the same brick directory, like:
/usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid -S /var/run/gluster/452591c9165945d9.socket --brick-name /data/gfs/bricks/brick1/ovirt-core -l /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 --process-name brick --brick-port 49154 --xlator-option ovirt-core-server.listen-port=49154
I’ve made an export of the output of ps from the time I observed these multiple processes.
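(A minimal sketch of how to spot that situation from ps alone; nothing here is gluster-specific, --brick-name is simply the option visible in the command line above:)
ps -eo args | grep '[g]lusterfsd' | \
  awk '{for (i = 1; i < NF; i++) if ($i == "--brick-name") print $(i+1)}' | sort | uniq -c
# any brick path with a count above 1 has duplicate brick processes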
In addition to the brick_mux bug as noted by Atin, I might also have another possible cause: as ovirt moves nodes from non-operational or maintenance state to active/activating, it also seems to restart gluster. However, I don't have direct proof for this theory.
+Atin Mukherjee <amukherj@redhat.com> ^^ +Mohit Agrawal <moagrawa@redhat.com> ^^
-Krutika
Thanks Olaf
On Fri, Mar 29, 2019 at 10:03 AM Sandro Bonazzola <sbonazzo@redhat.com> wrote:
On Thu, Mar 28, 2019 at 5:48 PM <olaf.buitelaar@gmail.com> wrote:
Dear All,
I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While previous upgrades from 4.1 to 4.2 etc. went rather smoothly, this one was a different experience. After first trying a test upgrade on a 3-node setup, which went fine, I headed into upgrading the 9-node production platform, unaware of the backward compatibility issues between gluster 3.12.15 -> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata was missing or couldn't be accessed. I restored this file by getting a good copy from the underlying bricks, removing the file (and the corresponding gfids) from the underlying bricks where it was 0 bytes and marked with the sticky bit, removing the file from the mount point, and copying the good file back onto the mount point. After manually mounting the engine domain, manually creating the corresponding symbolic links in /rhev/data-center and /var/run/vdsm/storage, and fixing the ownership back to vdsm.kvm (it had become root.root), I was able to start the HA engine again.
Since the engine was up again but things seemed rather unstable, I decided to continue the upgrade on the other nodes; suspecting an incompatibility in gluster versions, I thought it would be best to have them all on the same version rather soon. However, things went from bad to worse: the engine stopped again, and all VMs stopped working as well. So, on a machine outside the setup, I restored a backup of the engine taken from version 4.2.8 just before the upgrade. With this engine I was at least able to start some VMs again and finalize the upgrade. Once upgraded, things didn't stabilize, and I also lost 2 VMs during the process due to image corruption. After figuring out that gluster 5.3 had quite some issues, I was lucky to see gluster 5.5 was about to be released, and the moment the RPMs were available I installed those. This helped a lot in terms of stability, for which I'm very grateful! However the performance is unfortunately terrible: it's about 15% of what it was running gluster 3.12.15. It's strange, since a simple dd shows OK performance, but our actual workload doesn't, while I would expect the performance to be better due to all the improvements made since gluster 3.12. Does anybody share the same experience?
I really hope gluster 6 will soon be tested with ovirt and released, and things start to perform and stabilize again... like the good old days. Of course, if I can do anything, I'm happy to help.
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track the rebase on Gluster 6.
I think the following is a short list of the issues we have after the migration:
Gluster 5.5;
- Poor performance for our workload (mostly write dependent)
- VMs randomly pause on unknown storage errors, which are "stale files". Corresponding log: Lookup on shard 797 failed. Base file gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
- Some files are listed twice in a directory (probably related to the stale file issue?) Example:
ls -la /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
total 3081
drwxr-x---. 2 vdsm kvm 4096 Mar 18 11:34 .
drwxr-xr-x. 13 vdsm kvm 4096 Mar 19 09:42 ..
-rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
-rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
-rw-rw----. 1 vdsm kvm 1048576 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
-rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
-rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
- Brick processes sometimes start multiple times. Sometimes I have 5 brick processes for a single volume. Killing all glusterfsd's for the volume on the machine and running gluster v start <vol> force usually just starts one after the event, and from then on things look all right.
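(Spelled out as commands, the recovery procedure described above looks roughly like this; it is only a sketch, <vol> is a placeholder, and on a setup with brick multiplexing enabled killing a glusterfsd affects every brick served by that process:)
ps -eo pid,args | grep '[g]lusterfsd' | grep '<vol>'   # identify the duplicate brick processes
pkill -f 'glusterfsd.*<vol>'                           # stop them all
gluster volume start <vol> force                       # let glusterd spawn a single fresh brick process
gluster volume status <vol>                            # confirm one PID per brick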
May I kindly ask you to open bugs on Gluster for the above issues at https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ? Sahina?
Ovirt 4.3.2.1-1.el7
- All VM images' ownership is changed to root.root after the VM is shut down, probably related to https://bugzilla.redhat.com/show_bug.cgi?id=1666795, but not only scoped to the HA engine. I'm still in compatibility mode 4.2 for the cluster and for the VMs, but upgraded to ovirt 4.3.2.
Ryan?
- The network provider is set to ovn, which is fine... actually cool, only the "ovs-vswitchd" is a CPU hog and utilizes 100% CPU.
Miguel? Dominik?
- It seems that on all nodes vdsm tries to get the stats for the HA engine, which is filling the logs with (not sure if this is new):
[api.virt] FINISH getStats return={'status': {'message': "Virtual machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 (api:54)
Simone?
- It seems the package os_brick is missing: "[root] managedvolume not supported: Managed Volume Not Supported. Missing package os-brick.: ('Cannot import os_brick',) (caps:149)" fills the vdsm.log, but for this I also saw another message, so I suspect this will already be resolved shortly.
- The machine I used to run the backup HA engine doesn't want to get removed from the hosted-engine --vm-status, not even after running hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine --clean-metadata --force-clean from the machine itself.
Simone?
Think that's about it.
Don't get me wrong, I don't want to rant, I just wanted to share my experience and see where things can be made better.
If not already done, can you please open bugs for the above issues at https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ?
Best Olaf
--
SANDRO BONAZZOLA
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com <https://red.ht/sig>

Hi,
Thanks Olaf for sharing the relevant logs.
@Atin, you are right, patch https://review.gluster.org/#/c/glusterfs/+/22344/ will resolve the issue of multiple brick instances running for the same brick. As we can see in the logs below, glusterd is trying to start the same brick instance twice at the same time:
[2019-04-01 10:23:21.752401] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
[2019-04-01 10:23:30.348091] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
[2019-04-01 10:24:13.353396] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
[2019-04-01 10:24:24.253764] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
We are seeing the message below between the starting of the two instances:
The message "E [MSGID: 101012] [common-utils.c:4075:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/ovirt-engine/10.32.9.5-data-gfs-bricks-brick1-ovirt-engine.pid" repeated 2 times between [2019-04-01 10:23:21.748492] and [2019-04-01 10:23:21.752432]
I will backport the same.
Thanks,
Mohit Agrawal
On Wed, Apr 3, 2019 at 3:58 PM Olaf Buitelaar <olaf.buitelaar@gmail.com> wrote:
Dear Mohit,
Sorry, I thought Krutika was referring to the ovirt-kube brick logs. Due to the large size (18MB compressed), I've placed the files here: https://edgecastcdn.net/0004FA/files/bricklogs.tar.bz2 Also I see I've attached the wrong files; I intended to attach profile_data4.txt | profile_data3.txt. Sorry for the confusion.
Thanks Olaf
On Wed, Apr 3, 2019 at 4:56 AM Mohit Agrawal <moagrawa@redhat.com> wrote:
Hi Olaf,
As per the currently attached "multi-glusterfsd-vol3.txt | multi-glusterfsd-vol4.txt", multiple processes are running for the "ovirt-core ovirt-engine" brick names, but there are no logs available in bricklogs.zip specific to these bricks; bricklogs.zip has a dump of the ovirt-kube logs only.
Kindly share brick logs specific to the bricks "ovirt-core ovirt-engine" and share glusterd logs also.
Regards Mohit Agrawal
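(As an aside on the "Unable to read pidfile" message mentioned above: a minimal sketch of how to check whether such a pidfile actually points at a live glusterfsd; the path is taken from the log line and is purely illustrative:)
PIDFILE=/var/run/gluster/vols/ovirt-engine/10.32.9.5-data-gfs-bricks-brick1-ovirt-engine.pid
cat "$PIDFILE"                            # the pid glusterd believes owns the brick
ps -p "$(cat "$PIDFILE")" -o pid,args     # check whether that pid is really a glusterfsd for this brick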

Dear Mohit,
Thanks for backporting this issue. Hopefully we can address the others as well; if I can do anything, let me know.
On my side I've tested with gluster volume reset <VOL> cluster.choose-local, but haven't really noticed a change in performance. On the good side, the brick processes didn't crash when updating this config. I'll experiment with the other changes as well, and see how the combinations affect performance.
I also saw this commit, https://review.gluster.org/#/c/glusterfs/+/21333/, which looks very useful. Will this be a recommended option for VM/block workloads?
Thanks Olaf
On Wed, Apr 3, 2019 at 5:56 PM Mohit Agrawal <moagrawa@redhat.com> wrote:
Hi,
Thanks Olaf for sharing the relevant logs.
@Atin, You are right patch https://review.gluster.org/#/c/glusterfs/+/22344/ will resolve the issue running multiple brick instance for same brick.
As we can see in below logs glusterd is trying to start the same brick instance twice at the same time
[2019-04-01 10:23:21.752401] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
[2019-04-01 10:23:30.348091] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
[2019-04-01 10:24:13.353396] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
[2019-04-01 10:24:24.253764] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
We are seeing below message between starting of two instances The message "E [MSGID: 101012] [common-utils.c:4075:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/ovirt-engine/10.32.9.5-data-gfs-bricks-brick1-ovirt-engine.pid" repeated 2 times between [2019-04-01 10:23:21.748492] and [2019-04-01 10:23:21.752432]
I will backport the same. Thanks, Mohit Agrawal

Dear Mohit,
I've upgraded to gluster 5.6, however the starting of multiple glusterfsd processes per brick doesn't seem to be fully resolved yet, although it does seem to happen less than before. Also, in some cases glusterd did seem to detect that a glusterfsd was running, but decided it was not valid. It was reproducible on all my machines after a reboot, but only a few bricks seemed to be affected. I'm running about 14 bricks per machine, and only 1 - 3 were affected; the ones with 3 full bricks seemed to suffer most. Also, in some cases a restart of the glusterd service did spawn multiple glusterfsd processes for the same bricks configured on the node. See for example these logs:
[2019-04-19 17:49:50.853099] I [glusterd-utils.c:6214:glusterd_brick_start] 0-management: discovered already-running brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-19 17:50:33.302239] I [glusterd-utils.c:6214:glusterd_brick_start] 0-management: discovered already-running brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-19 17:56:11.287692] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-19 17:57:12.699967] I [glusterd-utils.c:6184:glusterd_brick_start] 0-management: Either pid 14884 is not running or brick path /data/gfs/bricks/brick1/ovirt-core is not consumed so cleanup pidfile
[2019-04-19 17:57:12.700150] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-19 18:02:58.420870] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-19 18:03:29.420891] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-19 18:48:14.046029] I [glusterd-utils.c:6214:glusterd_brick_start] 0-management: discovered already-running brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-19 18:55:04.508606] I [glusterd-utils.c:6214:glusterd_brick_start] 0-management: discovered already-running brick /data/gfs/bricks/brick1/ovirt-core
or
[2019-04-18 17:00:00.665476] I [glusterd-utils.c:6214:glusterd_brick_start] 0-management: discovered already-running brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-18 17:00:32.799529] I [glusterd-utils.c:6214:glusterd_brick_start] 0-management: discovered already-running brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-18 17:02:38.271880] I [glusterd-utils.c:6214:glusterd_brick_start] 0-management: discovered already-running brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-18 17:08:32.867046] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-18 17:09:00.440336] I [glusterd-utils.c:6184:glusterd_brick_start] 0-management: Either pid 9278 is not running or brick path /data/gfs/bricks/brick1/ovirt-core is not consumed so cleanup pidfile
[2019-04-18 17:09:00.440476] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-18 17:09:07.644070] I [glusterd-utils.c:6184:glusterd_brick_start] 0-management: Either pid 24126 is not running or brick path /data/gfs/bricks/brick1/ovirt-core is not consumed so cleanup pidfile
[2019-04-18 17:09:07.644184] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-18 17:09:13.785798] I [glusterd-utils.c:6184:glusterd_brick_start] 0-management: Either pid 27197 is not running or brick path /data/gfs/bricks/brick1/ovirt-core is not consumed so cleanup pidfile
[2019-04-18 17:09:13.785918] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-18 17:09:24.344561] I [glusterd-utils.c:6184:glusterd_brick_start] 0-management: Either pid 28468 is not running or brick path /data/gfs/bricks/brick1/ovirt-core is not consumed so cleanup pidfile
[2019-04-18 17:09:24.344675] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-18 17:37:07.150799] I [glusterd-utils.c:6214:glusterd_brick_start] 0-management: discovered already-running brick /data/gfs/bricks/brick1/ovirt-core
[2019-04-18 18:17:23.203719] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-core
Again the procedure to resolve this was to kill all the glusterfsd processes for the brick and do a gluster v start <VOL> force, which resulted in only 1 process being started.
After the upgrade to 5.6 I do notice a small performance improvement of around 15%, but it's still far from 3.12.15. I don't experience a drop in network utilisation, but I doubt I ever suffered from that issue: for as long as I've run gluster (since 3.7), the usage has always been between 15 - 180 Mbps on average, and depending on the machine and the hosted bricks/brick types it gravitates around 30Mbps/80Mbps/160Mbps.
I also found the reason ovs-vswitchd starts using 100% cpu: it appears one of the machines tries to add an interface twice on all the other machines. I don't really understand where this is configured;
801cc877-dd59-4b73-9cd4-6e89b7dd4245
    Bridge br-int
        fail_mode: secure
        Port "ovn-ab29e1-0"
            Interface "ovn-ab29e1-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.32.9.5"}
        Port "ovn-e1f5eb-0"
            Interface "ovn-e1f5eb-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.32.9.7"}
        Port "ovn-17c441-0"
            Interface "ovn-17c441-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.32.9.21"}
        Port "ovn-6a362b-0"
            Interface "ovn-6a362b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.32.9.5"}
                error: "could not add network device ovn-6a362b-0 to ofproto (File exists)"
        Port "ovn-99caac-0"
            Interface "ovn-99caac-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.32.9.20"}
        Port "ovn-1c9643-0"
            Interface "ovn-1c9643-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.32.9.6"}
        Port "ovn-2e5821-0"
            Interface "ovn-2e5821-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.32.9.8"}
        Port "ovn-484b7e-0"
            Interface "ovn-484b7e-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.32.9.9"}
        Port br-int
            Interface br-int
                type: internal
        Port "ovn-0522c9-0"
            Interface "ovn-0522c9-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.32.9.4"}
        Port "ovn-437985-0"
            Interface "ovn-437985-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.0.6.1"}
    ovs_version: "2.10.1"
It seems the interface for 10.32.9.5 is added twice: ovn-6a362b-0 and ovn-ab29e1-0. Manually removing the interface with ovs-vsctl doesn't help. The only thing which seems to resolve it is restarting the openvswitch service on 10.32.9.5; however, when I reboot the machine the issue resurfaces.
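(A small sketch of how the duplicate tunnel can be spotted from the command line; this only uses ovs-vsctl and does not answer where the extra port gets configured:)
ovs-vsctl --columns=options list Interface | grep -o 'remote_ip="[0-9.]*"' | sort | uniq -c
# any remote_ip with a count above 1 (10.32.9.5 here) has a duplicated geneve tunnel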
Any pointers on where this might be configured are welcome.
Also I found that glusterd is always restarted when a node is transitioning from maintenance/non-operational to active. Especially in the case where the node is non-operational and other nodes are also non-operational, this introduces extra instability, since the gluster service is constantly restarting, causing quorum loss and making things worse. Maybe it's an idea to have some logic in place for when gluster should be restarted by ovirt and when it's better to leave it running?
I was also thinking it might be a good idea to have an option for what should happen when a disk image becomes unavailable; currently you have the option to either pause the VM or kill it. Maybe a third option could be added to treat this event as a removed/faulty disk. In that scenario you could, for example, set up a mirrored volume within the VM on 2 different gluster volumes and let your VM continue running.
I've also upgraded to ovirt 4.3.3, and the messages about "Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] veth7611c53 is not present in the system'}" seem to be gone, but I cannot find a specific release note about it.
Hope we can also resolve the other issues.
Best Olaf

Sorry, it appears the messages about "Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] veth18ae509 is not present in the system'}" aren't gone; they're just happening much less frequently.
Best Olaf
participants (9)
- Atin Mukherjee
- Darrell Budic
- Krutika Dhananjay
- Leo David
- Mohit Agrawal
- Olaf Buitelaar
- olaf.buitelaar@gmail.com
- Sandro Bonazzola
- Strahil