Does cluster upgrade wait for heal before proceeding to next host?

Jayme

6 Aug 2019

4:22 p.m.

I've yet to have cluster upgrade finish updating my three-host HCI cluster. The most recent try was today, moving from oVirt 4.3.3 to 4.3.5.5. The first host updates normally, but when it moves on to the second host it fails to put it in maintenance and the cluster upgrade stops.

I suspect this is due to the fact that after my hosts are updated it takes 10 minutes or more for all volumes to sync/heal. I have 2 TB SSDs.

Does the cluster upgrade process take heal time into account before attempting to place the next host in maintenance to upgrade it? Or is there something else that may be at fault here, or perhaps a reason why the heal process takes 10 minutes after reboot to complete?

Shani Leviim

6 Aug

4:38 p.m.

Hi Jayme,

I can't recall seeing such a long healing time. Can you please retry and attach the engine and vdsm logs so we can learn more?

Regards,
Shani Leviim

Robert O'Kane

10:54 p.m.

Hello,

Often(?), an update to a hypervisor that also provides a Gluster brick takes the hypervisor offline (updates often require a reboot).

This reboot then makes the brick "out of sync", and it has to be resynced.

I consider it a "feature" that another host that is also part of the Gluster domain cannot be updated (rebooted) before all the bricks are up to date, in order to guarantee that there is no data loss. It is called quorum, isn't it?

Always let the heal process finish. Then the next update can start. For me there is ALWAYS a healing time before Gluster is happy again.

Cheers,

Robert O'Kane

--
Systems Administrator
Kunsthochschule für Medien Köln
Peter-Welter-Platz 2
50676 Köln
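[Editor's note] One way to see whether the heal Robert describes has finished is to look at the per-brick "Number of entries" lines printed by `gluster volume heal <volume> info`. The following is a minimal sketch, not part of the original thread: the function names, the sample output, and the volume and brick names are hypothetical, and the output layout is assumed to match what the Gluster CLI commonly prints.

```python
import subprocess


def run_heal_info(volume):
    """Return the text printed by `gluster volume heal <volume> info`.

    Assumption: the gluster CLI is installed and the caller may run it.
    """
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def pending_heal_entries(heal_info_output):
    """Sum the per-brick 'Number of entries: N' counts.

    Raises ValueError when a count is not numeric (for example when a
    brick is unreachable), so an unknown state is never read as 'healed'.
    """
    total = 0
    for line in heal_info_output.splitlines():
        line = line.strip()
        if not line.startswith("Number of entries:"):
            continue
        value = line.split(":", 1)[1].strip()
        if not value.isdigit():
            raise ValueError(f"heal count unavailable: {line!r}")
        total += int(value)
    return total


# Hypothetical sample output for a three-brick replica volume.
SAMPLE = """\
Brick host1:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

Brick host2:/gluster_bricks/data/data
/images/example-disk
Status: Connected
Number of entries: 3

Brick host3:/gluster_bricks/data/data
Status: Connected
Number of entries: 0
"""

if __name__ == "__main__":
    print(pending_heal_entries(SAMPLE))
```

On a real host, `pending_heal_entries(run_heal_info("data"))` would give the current total for a volume named `data` (the volume name is an assumption); a result of 0 suggests it is safe to start on the next host.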

Jayme

11:14 p.m.

I’m aware of the heal process, but it’s unclear to me whether the update continues to run while the volumes are healing and resumes when they are done. There doesn’t seem to be any indication in the UI (unless I’m mistaken).

Sandro Bonazzola

8 Aug

10:24 a.m.

Adding @Martin Perina <mperina@redhat.com>, @Sahina Bose <sabose@redhat.com> and @Laura Wright <lwright@redhat.com> on this; hyperconverged deployments using the cluster upgrade command would probably need some improvement.

--
Sandro Bonazzola
Manager, Software Engineering, EMEA R&D RHV
Red Hat EMEA
sbonazzo@redhat.com

Martin Perina

9 Aug

12:10 p.m.

The cluster upgrade process continues to the 2nd host after the 1st host becomes Up. If the 2nd host then fails to switch to maintenance, we stop the upgrade process to prevent breakage.

Sahina, is the gluster healing process status exposed in the REST API? If so, does it make sense to wait for healing to finish before trying to move the next host to maintenance? Or are there any other ideas on how to improve this?

--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.
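[Editor's note] The "wait for healing to finish before moving the next host to maintenance" idea raised above can be sketched as a simple polling loop with a timeout. This is only an illustration under stated assumptions, not how the oVirt cluster upgrade is implemented: `wait_for_heal`, its parameters, and the `get_pending` callback (which would return the number of pending heal entries, from whatever source exposes it) are all hypothetical names.

```python
import time
from typing import Callable


def wait_for_heal(
    get_pending: Callable[[], int],
    timeout_s: float = 1800.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until no heal entries are pending or the timeout expires.

    Returns True when get_pending() reports 0, False on timeout.
    The heal count is always checked at least once, even with timeout_s=0.
    """
    deadline = clock() + timeout_s
    while True:
        if get_pending() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Simulated heal counts: the heal completes on the third poll.
    counts = iter([5, 2, 0])
    healed = wait_for_heal(lambda: next(counts), sleep=lambda s: None)
    print("healed" if healed else "timed out")
```

An upgrade driver built this way would call `wait_for_heal` after each host comes back Up and only then request maintenance for the next host, stopping the run if it returns False.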

Laura Wright

1:09 p.m.

I'd be happy to take a look at the process from a UX perspective. Would anyone be able to document the end-to-end experience in a series of screenshots or a video?

--
Laura Wright
She/Her/Hers
UXD Team
Red Hat Massachusetts
314 Littleton Rd
lwright@redhat.com

Sahina Bose

11 Sep

8:53 a.m.

I need to cross-check whether we expose the heal count for the gluster bricks. Moving a host to maintenance does check whether there are pending heal entries or a possibility of quorum loss, and this would prevent the additional hosts from being upgraded.

+Gobinda Das <godas@redhat.com> +Sachidananda URS <surs@redhat.com>
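[Editor's note] The two conditions mentioned above (pending heal entries, possible quorum loss) can be pictured as a pre-check along these lines. This is a hypothetical sketch and not the engine's actual logic: the function name and parameters are invented for illustration, and "quorum" is assumed here to mean that a strict majority of the replica bricks stays online once this host's brick goes down.

```python
def can_enter_maintenance(pending_heal_entries: int,
                          bricks_online: int,
                          replica_count: int) -> bool:
    """Illustrative pre-check before taking one replica host offline.

    Assumptions: the host carries exactly one brick of the replica set,
    and quorum is a strict majority of replica_count.
    """
    if pending_heal_entries > 0:
        # Another brick still lacks data; taking this one down is unsafe.
        return False
    remaining = bricks_online - 1  # bricks left once this host goes down
    return remaining * 2 > replica_count


if __name__ == "__main__":
    # Replica 3, all bricks online, nothing to heal: maintenance is allowed.
    print(can_enter_maintenance(0, bricks_online=3, replica_count=3))
    # Replica 3 while the rebooted host is still healing: refused.
    print(can_enter_maintenance(12, bricks_online=3, replica_count=3))
```

Under these assumptions, the second case matches the failure described in this thread: the next host is refused maintenance while heal entries from the previous reboot are still pending.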

Jayme

1:31 p.m.

New subject: Does cluster upgrade wait for heal before proceeding to next host?

This sounds similar to the issue I hit with the cluster upgrade process in my environment. I have large 2tb ssds and most of my vms are several hundred Gbs in size. The heal process after host reboot can take 5-10 minutes to complete. I may be able to address this with better gluster tuning. Either way the upgrade process should be aware of the heal status and wait for it to complete before attempting to move on to the next host. On Wed, Sep 11, 2019 at 3:53 AM Sahina Bose <sabose@redhat.com> wrote:

...

On Fri, Aug 9, 2019 at 3:41 PM Martin Perina <mperina@redhat.com> wrote:

...
On Thu, Aug 8, 2019 at 10:25 AM Sandro Bonazzola <sbonazzo@redhat.com> wrote:

...
Il giorno mar 6 ago 2019 alle ore 23:17 Jayme <jaymef@gmail.com> ha scritto:

...
I’m aware of the heal process but it’s unclear to me if the update continues to run while the volumes are healing and resumes when they are done. There doesn’t seem to be any indication in the ui (unless I’m mistaken)

Adding @Martin Perina <mperina@redhat.com> , @Sahina Bose <sabose@redhat.com> and @Laura Wright <lwright@redhat.com> on this, hyperconverged deployments using cluster upgrade command would probably need some improvement.

The cluster upgrade process continues to the 2nd host after the 1st host becomes Up. If 2nd host then fails to switch to maintenance, we stop the upgrade process to prevent breakage. Sahina, is gluster healing process status exposed in RESTAPI? If so, does it makes sense to wait for healing to be finished before trying to move next host to maintenance? Or any other ideas how to improve?

I need to cross-check this, if we expose the heal count in the gluster bricks. Moving a host to maintenance does check if there are pending heal entries or possibility of quorum loss. And this would prevent the additional hosts to upgrade. +Gobinda Das <godas@redhat.com> +Sachidananda URS <surs@redhat.com>

...
...
...
On Tue, Aug 6, 2019 at 6:06 PM Robert O'Kane <okane@khm.de> wrote:

...
Hello,

Often(?), updates to a hypervisor that also has (provides) a Gluster brick takes the hypervisor offline (updates often require a reboot).

This reboot then makes the brick "out of sync" and it has to be resync'd.

I find it a "feature" than another host that is also part of a gluster domain can not be updated (rebooted) before all the bricks are updated in order to guarantee there is not data loss. It is called Quorum, or?

Always let the heal process end. Then the next update can start. For me there is ALWAYS a healing time before Gluster is happy again.

Cheers,

Robert O'Kane

Am 06.08.2019 um 16:38 schrieb Shani Leviim:

...
Hi Jayme, I can't recall such a healing time. Can you please retry and attach the engine & vdsm logs so we'll be smarter?

*Regards, * *Shani Leviim *

On Tue, Aug 6, 2019 at 5:24 PM Jayme <jaymef@gmail.com <mailto:jaymef@gmail.com>> wrote:

I've yet to have cluster upgrade finish updating my three host HCI cluster. The most recent try was today moving from oVirt 4.3.3 to 4.3.5.5. The first host updates normally, but when it moves on to the second host it fails to put it in maintenance and the cluster upgrade stops.

I suspect this is due to that fact that after my hosts are updated it takes 10 minutes or more for all volumes to sync/heal. I have 2Tb SSDs.

Does the cluster upgrade process take heal time in to account before attempting to place the next host in maintenance to upgrade it? Or is there something else that may be at fault here, or perhaps a reason why the heal process takes 10 minutes after reboot to complete? _______________________________________________ Users mailing list -- users@ovirt.org <mailto:users@ovirt.org> To unsubscribe send an email to users-leave@ovirt.org <mailto:users-leave@ovirt.org> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives:


-- Systems Administrator Kunsthochschule für Medien Köln Peter-Welter-Platz 2 50676 Köln


--

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA <https://www.redhat.com/>

sbonazzo@redhat.com

*Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.*

-- Martin Perina Manager, Software Engineering Red Hat Czech s.r.o.

Kaustav Majumder

16 Sep

10:02 a.m.

New subject: Does cluster upgrade wait for heal before proceeding to next host?

Hi Jayme,

It would be great if you could raise a bug regarding the same.

On Wed, Sep 11, 2019 at 5:05 PM Jayme <jaymef@gmail.com> wrote:

...

This sounds similar to the issue I hit with the cluster upgrade process in my environment. I have large 2 TB SSDs and most of my VMs are several hundred GB in size. The heal process after a host reboot can take 5-10 minutes to complete. I may be able to address this with better Gluster tuning.

Either way the upgrade process should be aware of the heal status and wait for it to complete before attempting to move on to the next host.
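
A minimal sketch of the behaviour requested here, not how oVirt's cluster upgrade is actually implemented: poll a pending-heal counter and only move on to the next host once it reaches zero, giving up after a timeout. The name `wait_for_heal`, its parameters, and the default timeout and interval are assumptions made purely for illustration.

```python
import time
from typing import Callable


def wait_for_heal(
    pending_entries: Callable[[], int],
    timeout_s: float = 1800.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until pending_entries() returns 0.

    Returns True once no heal entries are pending, or False if the
    timeout expires first. `sleep` and `clock` are injectable so the
    loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if pending_entries() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass in whatever function reports the number of pending heal entries for the volumes in question, and skip the next host's maintenance step if the result is False.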

On Wed, Sep 11, 2019 at 3:53 AM Sahina Bose <sabose@redhat.com> wrote:

...
On Fri, Aug 9, 2019 at 3:41 PM Martin Perina <mperina@redhat.com> wrote:

...
On Thu, Aug 8, 2019 at 10:25 AM Sandro Bonazzola <sbonazzo@redhat.com> wrote:

...
On Tue, Aug 6, 2019 at 23:17, Jayme <jaymef@gmail.com> wrote:

...
I’m aware of the heal process, but it’s unclear to me whether the update continues to run while the volumes are healing and resumes when they are done. There doesn’t seem to be any indication in the UI (unless I’m mistaken).

Adding @Martin Perina <mperina@redhat.com>, @Sahina Bose <sabose@redhat.com> and @Laura Wright <lwright@redhat.com> on this. Hyperconverged deployments using the cluster upgrade command would probably need some improvement.

The cluster upgrade process continues to the 2nd host after the 1st host becomes Up. If the 2nd host then fails to switch to maintenance, we stop the upgrade process to prevent breakage. Sahina, is the Gluster healing process status exposed in the REST API? If so, does it make sense to wait for healing to finish before trying to move the next host to maintenance? Or are there any other ideas on how to improve this?

I need to cross-check whether we expose the heal count in the Gluster bricks. Moving a host to maintenance does check whether there are pending heal entries or a possibility of quorum loss, and this would prevent the additional hosts from being upgraded. +Gobinda Das <godas@redhat.com> +Sachidananda URS <surs@redhat.com>
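
As a rough illustration of what "pending heal entries" means on the Gluster side, the sketch below sums the per-brick counts printed by `gluster volume heal <volume> info`. It is not the engine's maintenance check. The function names are hypothetical, and the parser assumes the usual `Number of entries: N` summary line per brick, treating any non-numeric value (for example, a brick that cannot be reached) as an error rather than as zero.

```python
import re
import subprocess

# Matches the per-brick summary line, e.g. "Number of entries: 3".
_ENTRIES_LINE = re.compile(r"^Number of entries:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def count_pending_heals(heal_info_output: str) -> int:
    """Sum the pending heal entries across all bricks in the output."""
    values = _ENTRIES_LINE.findall(heal_info_output)
    if not values:
        raise ValueError("no 'Number of entries' lines found")
    total = 0
    for value in values:
        # A non-numeric count means the brick's state is unknown,
        # so refuse to report the volume as healed.
        if not value.isdigit():
            raise ValueError(f"heal count unavailable for a brick: {value!r}")
        total += int(value)
    return total


def read_heal_info(volume: str) -> str:
    """Run the gluster CLI on a storage host and return its output."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For a given volume, `count_pending_heals(read_heal_info("data"))` returning 0 would indicate that no brick reports outstanding entries; the volume name `data` is only a placeholder.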

...
...
...
On Tue, Aug 6, 2019 at 6:06 PM Robert O'Kane <okane@khm.de> wrote:

...
Hello,

Often(?), updates to a hypervisor that also has (provides) a Gluster brick take the hypervisor offline (updates often require a reboot).

This reboot then makes the brick "out of sync" and it has to be resync'd.

I find it a "feature" that another host that is also part of a Gluster domain cannot be updated (rebooted) before all the bricks are up to date, in order to guarantee there is no data loss. This is called quorum, isn't it?

Always let the heal process end. Then the next update can start. For me there is ALWAYS a healing time before Gluster is happy again.

Cheers,

Robert O'Kane

On 06.08.2019 at 16:38, Shani Leviim wrote:
> Hi Jayme,
> I can't recall such a healing time.
> Can you please retry and attach the engine & vdsm logs so we'll be smarter?
>
> *Regards,*
> *Shani Leviim*
>
> On Tue, Aug 6, 2019 at 5:24 PM Jayme <jaymef@gmail.com> wrote:
>
> I've yet to have cluster upgrade finish updating my three host HCI
> cluster. The most recent try was today moving from oVirt 4.3.3 to
> 4.3.5.5. The first host updates normally, but when it moves on to
> the second host it fails to put it in maintenance and the cluster
> upgrade stops.
>
> I suspect this is due to the fact that after my hosts are updated
> it takes 10 minutes or more for all volumes to sync/heal. I have
> 2 TB SSDs.
>
> Does the cluster upgrade process take heal time into account before
> attempting to place the next host in maintenance to upgrade it? Or
> is there something else that may be at fault here, or perhaps a
> reason why the heal process takes 10 minutes after reboot to complete?

-- Systems Administrator Kunsthochschule für Medien Köln Peter-Welter-Platz 2 50676 Köln


--

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA <https://www.redhat.com/>

sbonazzo@redhat.com

*Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.*

-- Martin Perina Manager, Software Engineering Red Hat Czech s.r.o.


-- Thanks, Kaustav Majumder


participants (8)

Jayme
Kaustav Majumder
Laura Wright
Martin Perina
Robert O'Kane
Sahina Bose
Sandro Bonazzola
Shani Leviim
