Are people still experiencing issues with GlusterFS on 4.3.x?

I, along with others, had GlusterFS issues after 4.3 upgrades: the "failed to dispatch handler" issue with bricks going down intermittently. After some time it seemed to have corrected itself (at least in my environment) and I hadn't had any brick problems in a while. I upgraded my three-node HCI cluster to 4.3.1 yesterday and again I'm running into brick issues. They will all be up and running fine, then all of a sudden a brick will randomly drop and I have to force-start the volume to get it back up. Have any of these Gluster issues been addressed in 4.3.2, or are there any other releases/patches that may be available to help with the problem at this time? Thanks!

Yep, sometimes a brick dies (usually my ISO domain) and then I have to "gluster volume start isos force". Sadly I had several issues with 4.3.X - problematic OVF_STORE (0 bytes), issues with gluster, out-of-sync network - so for me 4.3.0 & 4.3.1 are quite unstable.
Is there a convention indicating stability? Does 4.3.xxx mean unstable, while 4.2.yyy means stable?
Best Regards,
Strahil Nikolov
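For reference, a minimal sketch of the check-and-recover steps described above; the volume name "isos" is just the example from this message, so substitute your own:

  # See which brick processes are online for the volume (Online column shows Y/N)
  gluster volume status isos
  # If a brick is offline, force-start the volume to respawn the missing brick process
  gluster volume start isos force
  # Then watch the self-heal backlog drain before doing anything else
  gluster volume heal isos info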

On Fri, 15 Mar 2019 at 13:46, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
No, there's no such convention. 4.3 is supposed to be stable and production-ready. The fact that it isn't stable enough for all cases means it has not been tested for those cases. In the oVirt 4.3.1 RC cycle testing ( https://trello.com/b/5ZNJgPC3/ovirt-431-test-day-1 ) we got participation from only 6 people, and not even all the tests were completed. Helping to test during the release candidate phase leads to more stable final releases. oVirt 4.3.2 is at its second release candidate; if you have time and resources, it would be helpful to test it on an environment similar to your production environment and give feedback / report bugs. Thanks
-- SANDRO BONAZZOLA MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV Red Hat EMEA <https://www.redhat.com/> sbonazzo@redhat.com <https://red.ht/sig>

Hi,
something that I'm seeing in the vdsm.log that I think is gluster related is the following message:
2019-03-15 05:58:28,980-0700 INFO (jsonrpc/6) [root] managedvolume not supported: Managed Volume Not Supported. Missing package os-brick.: ('Cannot import os_brick',) (caps:148)
os_brick seems to be something available via the OpenStack channels, but I didn't verify.
Simon

On Fri, 15 Mar 2019 at 14:00, Simon Coter <simon.coter@oracle.com> wrote:
Fred, I see you introduced the above error in vdsm commit 9646c6dc1b875338b170df2cfa4f41c0db8a6525 back in November 2018. I guess you are referring to python-os-brick. Looks like it's related to the cinderlib integration. I would suggest to:
- fix the error message so it points to python-os-brick
- add a python-os-brick dependency in the spec file if the dependency is not optional
- if the dependency is optional, as it seems to be, adjust the error message to say so.
I feel nervous seeing errors about missing packages :-)
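A quick way to check on a host whether the package is actually there (assuming the package name really is python-os-brick, as guessed above):

  # Is the RPM installed?
  rpm -q python-os-brick
  # Can the host's Python import it (the import the log line above complains about)?
  python -c 'import os_brick' && echo 'os_brick importable' || echo 'os_brick missing'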

On Fri, Mar 15, 2019, 15:16 Sandro Bonazzola <sbonazzo@redhat.com> wrote:
There is no error message here. This is an INFO level message, not an ERROR or WARN, and it just explains why managed volumes will not be available on this host. Having this information in the log is extremely important for developers and support. I think we can improve the message to mention the actual package name, but otherwise there is no issue in this info message.
Nir
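For anyone who wants to confirm this on their own hosts, the severity is visible right in the log line (standard vdsm log location assumed):

  # Show the last few occurrences; the level field should read INFO, not ERROR/WARN
  grep 'managedvolume not supported' /var/log/vdsm/vdsm.log | tail -n 5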

Dear Sandro, Nir,
usually I avoid the test repos for 2 reasons:
1. I had a bad experience getting away from the RHEL 7.5 Beta to the 7.5 standard repos, so now I prefer to update only when patches are in the standard repo.
2. My lab is a kind of a test environment, but I prefer to be able to spin up a VM or two when needed. The last issues rendered the lab unmanageable for several days due to the 0-byte OVF_STORE and other issues.
About the gluster update issue - I think this is a serious one. If we had that in mind, the wisest approach would have been to stay on gluster v3 until the issue is resolved. I have a brick down almost every day, and despite not being a "killer", the experience is nowhere near 4.2.7 - 4.2.8.
Can someone point me to a doc with minimum requirements for a nested oVirt lab? I'm planning to create a nested test environment in order to both provide feedback on new releases and be prepared before deploying on my lab.
Best Regards,
Strahil Nikolov

On Fri, 15 Mar 2019 at 13:38, Jayme <jaymef@gmail.com> wrote:
Just to clarify, you already were on oVirt 4.3.0 + GlusterFS 5.3-1 and upgraded to oVirt 4.3.1 + GlusterFS 5.3-2, right?

Yes, that is correct. I don't know if the upgrade to 4.3.1 itself caused issues, or if simply rebooting all hosts again to apply node updates started causing brick issues for me again. I started having similar brick issues after upgrading to 4.3 originally, which seemed to have stabilized; prior to 4.3 I never had a single GlusterFS issue or brick offline on 4.2.
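For comparing notes, a quick way to confirm exactly which versions a node is running (standard package names on oVirt Node / CentOS hosts assumed):

  rpm -q vdsm glusterfs-server
  gluster --version | head -n 1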

Just FYI, I have observed similar issues where a volume becomes unstable for a period of time after the upgrade, but then seems to settle down after a while. I've only witnessed this in the 4.3.x versions. I suspect it's more of a Gluster issue than oVirt, but troubling nonetheless.

That is essentially the behaviour that I've seen. I wonder if perhaps it could be related to the increased heal activity that occurs on the volumes during reboots of nodes after updating.
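If it helps to correlate, a simple way to watch the heal backlog across all volumes while nodes come back up, using only the plain gluster CLI on any storage node:

  for v in $(gluster volume list); do
      echo "== $v =="
      # Each brick reports a "Number of entries:" line; non-zero means pending heals
      gluster volume heal "$v" info | grep 'Number of entries'
  done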

Upgrading gluster from version 3.12 or 4.1 (included in ovirt 3.x) to 5.3 (in oVirt 4.3) seems to cause this due to a bug in the gluster upgrade process. It's an unfortunate side effect for us upgrading oVirt hyper-converged systems. Installing new should be fine, but I'd wait for gluster to get https://bugzilla.redhat.com/show_bug.cgi?id=1684385 included in the version oVirt installs before installing a hyper-converged cluster.
I just upgraded my 4.2.8 cluster to 4.3.1, leaving my separate gluster 3.12.15 servers alone, and it worked fine. Except for a different bug screwing up HA engine permissions on launch, but it looks like that's getting fixed under a different bug.
Sandro, it's unfortunate I can't take more part in testing days, but they haven't been happening at times when I can participate, and a single test day isn't really something I can participate in often. I sometimes try to keep up with the RCs on my test cluster, but major version changes wait until I get time to consider them, unfortunately. I'm also a little surprised that a major upstream issue like that bug hasn't caused you to issue more warnings; it's something that is going to affect everyone who's upgrading a converged system. Any discussion on why more news wasn't released about it?
-Darrell
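Not a fix for the bug referenced above, just a general post-upgrade sanity check: once every node is on the new gluster version, compare the cluster op-version with the maximum the installed binaries support:

  gluster volume get all cluster.op-version
  gluster volume get all cluster.max-op-version
  # If all nodes are upgraded and op-version is still lower than max-op-version,
  # it can be raised with:
  #   gluster volume set all cluster.op-version <value of max-op-version>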

Interesting, this is the first time I've seen this bug posted. I'm still having problems with bricks going down in my HCI setup. It had been 1-2 bricks dropping 2-4 times per day on different volumes. However, when all bricks are up everything is working OK, and all bricks are healed and seemingly in sync. Could this still be related to the mentioned bug in such a case?
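One thing that might help narrow it down: when a brick drops, the reason is usually in that brick's own log under /var/log/glusterfs/bricks/ (the standard brick log location), e.g.:

  grep -i 'failed to dispatch handler' /var/log/glusterfs/bricks/*.log | tail -n 5
  grep -iE 'crash|signal received' /var/log/glusterfs/bricks/*.log | tail -n 5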
participants (7)
- Darrell Budic
- Jayme
- Nir Soffer
- Ron Jerome
- Sandro Bonazzola
- Simon Coter
- Strahil Nikolov