[ovirt-users] Re: Does cluster upgrade wait for heal before proceeding to next host?

11 Sep 2019

This sounds similar to the issue I hit with the cluster upgrade process in
my environment. I have large 2 TB SSDs and most of my VMs are several
hundred GB in size. The heal process after a host reboot can take 5-10
minutes to complete. I may be able to address this with better Gluster
tuning.

Either way the upgrade process should be aware of the heal status and wait
for it to complete before attempting to move on to the next host.
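
A minimal sketch of what "wait for the heal to complete" could look like from
a script, assuming the per-brick "Number of entries:" lines printed by
`gluster volume heal <volume> info`. The function names, the timeout and the
polling interval are illustrative assumptions; this is not how the oVirt
cluster upgrade is implemented.

```python
import re
import subprocess
import time

# Matches the per-brick "Number of entries: N" lines of the heal info output.
ENTRY_RE = re.compile(r"^Number of entries:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def pending_heal_entries(heal_info_output):
    """Return the total pending heal entries across bricks.

    Returns None when no count is found or when any brick reports a
    non-numeric count (for example "-" for a brick that is not connected),
    so the caller can treat "unknown" as "not safe to continue".
    """
    counts = ENTRY_RE.findall(heal_info_output)
    if not counts:
        return None
    total = 0
    for count in counts:
        if not count.isdigit():
            return None
        total += int(count)
    return total


def wait_for_heal(volume, timeout=1800, interval=30, run=None, sleep=time.sleep):
    """Poll heal info until no entries are pending; False on timeout.

    `run` is a hypothetical hook that returns the heal info text for a
    volume; by default it shells out to the gluster CLI.
    """
    if run is None:
        def run(vol):
            return subprocess.run(
                ["gluster", "volume", "heal", vol, "info"],
                capture_output=True, text=True, check=True,
            ).stdout
    deadline = time.monotonic() + timeout
    while True:
        if pending_heal_entries(run(volume)) == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Run on one of the Gluster hosts, `wait_for_heal("data")` would block until the
volume (here a hypothetical volume named "data") reports zero pending entries
on every brick, or give up after the timeout.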

On Wed, Sep 11, 2019 at 3:53 AM Sahina Bose <sabose@redhat.com> wrote:
...
On Fri, Aug 9, 2019 at 3:41 PM Martin Perina <mperina@redhat.com> wrote:
...
On Thu, Aug 8, 2019 at 10:25 AM Sandro Bonazzola <sbonazzo@redhat.com>
wrote:
...
On Tue, Aug 6, 2019 at 11:17 PM Jayme <jaymef@gmail.com> wrote:
...
I’m aware of the heal process, but it’s unclear to me whether the update
keeps running while the volumes are healing and resumes when they are
done. There doesn’t seem to be any indication in the UI (unless I’m
mistaken).
Adding @Martin Perina <mperina@redhat.com>, @Sahina Bose
<sabose@redhat.com> and @Laura Wright <lwright@redhat.com> on this.
Hyperconverged deployments using the cluster upgrade command would probably
need some improvement.
The cluster upgrade process continues to the 2nd host after the 1st host
becomes Up. If 2nd host then fails to switch to maintenance, we stop the
upgrade process to prevent breakage.
Sahina, is the Gluster healing process status exposed in the REST API? If so,
does it make sense to wait for healing to finish before trying to move the
next host to maintenance? Or are there any other ideas on how to improve this?
I need to cross-check whether we expose the heal count in the Gluster
bricks. Moving a host to maintenance does check whether there are pending heal
entries or a possibility of quorum loss, and this would prevent
additional hosts from being upgraded.
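
The behaviour discussed above (move to the next host only once healing has
finished, and stop rather than risk breakage) can be sketched as a small loop.
Everything here is hypothetical: `upgrade_host` and `heal_done` are
placeholder callables supplied by the caller, not oVirt engine or REST API
functions, and `max_checks` is an arbitrary limit.

```python
def rolling_upgrade(hosts, upgrade_host, heal_done, max_checks=60, wait=lambda: None):
    """Upgrade hosts one at a time, gating each host on a finished heal.

    hosts        -- ordered list of host names
    upgrade_host -- callable(host): maintenance, upgrade, reboot, activate
    heal_done    -- callable() -> bool: True when no heal entries are pending
    max_checks   -- how many times to poll heal_done before giving up
    wait         -- callable() invoked between polls (e.g. a sleep)

    Returns (upgraded_hosts, blocked_host). blocked_host is None when every
    host was upgraded, otherwise the first host that was never touched
    because the heal did not finish in time.
    """
    upgraded = []
    for host in hosts:
        for _ in range(max_checks):
            if heal_done():
                break
            wait()
        else:
            # Heal never finished: stop here instead of taking a second
            # brick offline while the first is still resyncing.
            return upgraded, host
        upgrade_host(host)
        upgraded.append(host)
    return upgraded, None
```

Stopping and reporting the blocked host mirrors the current behaviour of
halting the upgrade when a host cannot be moved to maintenance; the only
change in this sketch is that the process waits first.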
+Gobinda Das <godas@redhat.com> +Sachidananda URS <surs@redhat.com>
...
...
...
On Tue, Aug 6, 2019 at 6:06 PM Robert O'Kane <okane@khm.de> wrote:
...
Hello,
Often(?), updating a hypervisor that also provides a Gluster brick takes
the hypervisor offline (updates often require a reboot).
This reboot then makes the brick "out of sync" and it has to be
resynced.
I consider it a "feature" that another host that is also part of a Gluster
domain cannot be updated (rebooted) before all the bricks are up to date,
in order to guarantee there is no data loss. It is called quorum, isn't it?
Always let the heal process end. Then the next update can start.
For me there is ALWAYS a healing time before Gluster is happy again.
Cheers,
Robert O'Kane
On 06.08.2019 at 16:38, Shani Leviim wrote:
...
Hi Jayme,
I can't recall such a healing time.
Can you please retry and attach the engine & vdsm logs so we'll be
smarter?
Regards,
Shani Leviim
On Tue, Aug 6, 2019 at 5:24 PM Jayme <jaymef@gmail.com> wrote:
I've yet to have cluster upgrade finish updating my three-host HCI
cluster. The most recent try was today, moving from oVirt 4.3.3 to
4.3.5.5. The first host updates normally, but when it moves on to
the second host it fails to put it in maintenance and the cluster
upgrade stops.

I suspect this is due to the fact that after my hosts are updated
it takes 10 minutes or more for all volumes to sync/heal. I have
2 TB SSDs.

Does the cluster upgrade process take heal time into account before
attempting to place the next host in maintenance to upgrade it? Or
is there something else that may be at fault here, or perhaps a
reason why the heal process takes 10 minutes after reboot to
complete?
    _______________________________________________
    Users mailing list -- users@ovirt.org <mailto:users@ovirt.org>
    To unsubscribe send an email to users-leave@ovirt.org
    <mailto:users-leave@ovirt.org>
    Privacy Statement: https://www.ovirt.org/site/privacy-policy/
    oVirt Code of Conduct:
    https://www.ovirt.org/community/about/community-guidelines/
    List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5XM3QB3364ZYIP...
...
--
Systems Administrator
Kunsthochschule für Medien Köln
Peter-Welter-Platz 2
50676 Köln
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com
--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.