I keep reading this chain and I still don't get what/who should wait for the cluster
to heal...
Is there some kind of built-in autopatching feature?
Here is my approach:
1. Set global maintenance
2. Power off the engine
3. Create a gluster snapshot of the engine's volume
4. Power on engine manually
5. Check engine status
6. Upgrade engine
7. Upgrade engine's OS
8. Reboot engine and check health
9. Remove global maintenance
10. Set a host into local maintenance (evacuate all VMs)
11. Use UI to patch the host (enable autoreboot)
12. When the host is up, log in and check the gluster volumes' heal status
13. Remove maintenance for the host and repeat for the rest of the cluster.
I realize that for large clusters this approach is tedious, but the host part
of it (steps 10-13) can be scripted.
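
A rough sketch of such a script is below, using the ovirtsdk4 Python SDK and
the gluster CLI. Treat it as an illustration only: ENGINE_URL, PASSWORD,
CLUSTER and VOLUMES are placeholders, and whether HostService.upgrade()
accepts reboot=True should be verified against your SDK version.

#!/usr/bin/env python3
# Rolling-upgrade sketch: one host at a time, waiting for the gluster
# heal to finish before leaving maintenance and moving on. Steps 1-9
# (the engine itself) stay manual. Placeholder values must be adapted.
import subprocess
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

ENGINE_URL = 'https://engine.example.com/ovirt-engine/api'
PASSWORD = '...'
CLUSTER = 'Default'
VOLUMES = ['engine', 'data', 'vmstore']   # your gluster volumes

def pending_heal_entries(volume):
    # Sum the "Number of entries: N" lines printed by the gluster CLI.
    # (Run the script on a gluster peer, or wrap this call in ssh.)
    out = subprocess.run(['gluster', 'volume', 'heal', volume, 'info'],
                         check=True, capture_output=True, text=True).stdout
    return sum(int(line.split(':')[1])
               for line in out.splitlines()
               if line.strip().startswith('Number of entries:'))

def wait_for_heal():
    # Step 12: block until every volume reports zero pending entries.
    while any(pending_heal_entries(v) > 0 for v in VOLUMES):
        time.sleep(30)

def wait_for_status(svc, status):
    while svc.get().status != status:
        time.sleep(30)

connection = sdk.Connection(url=ENGINE_URL, username='admin@internal',
                            password=PASSWORD, insecure=True)
hosts_service = connection.system_service().hosts_service()
for host in hosts_service.list(search='cluster=%s' % CLUSTER):
    svc = hosts_service.host_service(host.id)
    svc.deactivate()                      # step 10: local maintenance
    wait_for_status(svc, types.HostStatus.MAINTENANCE)
    svc.upgrade(reboot=True)              # step 11: patch with autoreboot
    wait_for_status(svc, types.HostStatus.MAINTENANCE)
    wait_for_heal()                       # step 12: heal must finish
    svc.activate()                        # step 13: back into the cluster
    wait_for_status(svc, types.HostStatus.UP)
connection.close()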
Best Regards,
Strahil Nikolov

On Sep 16, 2019 11:02, Kaustav Majumder <kmajumde(a)redhat.com> wrote:
Hi Jayme,
It would be great if you could raise a bug regarding the same.
On Wed, Sep 11, 2019 at 5:05 PM Jayme <jaymef(a)gmail.com> wrote:
>
> This sounds similar to the issue I hit with the cluster upgrade process in my
> environment. I have large 2TB SSDs and most of my VMs are several hundred GB
> in size. The heal process after a host reboot can take 5-10 minutes to
> complete. I may be able to address this with better gluster tuning.
>
> Either way, the upgrade process should be aware of the heal status and wait
> for it to complete before attempting to move on to the next host.
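
(On the gluster tuning point: heal throughput can often be raised with volume
options such as cluster.shd-max-threads, which controls how many entries each
self-heal daemon heals in parallel, and cluster.shd-wait-qlength; suitable
values depend on the hardware, so this is a hint rather than a recommendation.)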
>
>
> On Wed, Sep 11, 2019 at 3:53 AM Sahina Bose <sabose(a)redhat.com> wrote:
>>
>>
>>
>> On Fri, Aug 9, 2019 at 3:41 PM Martin Perina <mperina(a)redhat.com> wrote:
>>>
>>>
>>>
>>> On Thu, Aug 8, 2019 at 10:25 AM Sandro Bonazzola <sbonazzo(a)redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Aug 6, 2019 at 11:17 PM Jayme <jaymef(a)gmail.com> wrote:
>>>>>
>>>>> I’m aware of the heal process, but it’s unclear to me whether the update
>>>>> continues to run while the volumes are healing and resumes when they are
>>>>> done. There doesn’t seem to be any indication in the UI (unless I’m mistaken).
>>>>
>>>>
>>>> Adding @Martin Perina, @Sahina Bose and @Laura Wright on this;
>>>> hyperconverged deployments using the cluster upgrade command would
>>>> probably need some improvement.
>>>
>>>
>>> The cluster upgrade process continues to the 2nd host after the 1st host
>>> becomes Up. If the 2nd host then fails to switch to maintenance, we stop
>>> the upgrade process to prevent breakage.
>>> Sahina, is the gluster healing process status exposed in the REST API? If
>>> so, does it make sense to wait for healing to be finished before trying to
>>> move the next host to maintenance? Or any other ideas on how to improve this?
>>
>>
>> I need to cross-check whether we expose the heal count on the gluster bricks.
>> Moving a host to maintenance does check whether there are pending heal entries
>> or a possibility of quorum loss, and this would prevent the additional hosts
>> from upgrading.
>> +Gobinda Das +Sachidananda URS
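
For reference, a minimal ovirtsdk4 sketch of what an engine-side check could
look like today is below. Listing gluster volumes and brick status through the
API should work; a per-brick pending-heal count is deliberately not assumed,
since whether it is exposed is exactly the open question here. Connection
details are placeholders.

import ovirtsdk4 as sdk

connection = sdk.Connection(url='https://engine.example.com/ovirt-engine/api',
                            username='admin@internal', password='...',
                            insecure=True)
clusters_service = connection.system_service().clusters_service()
cluster = clusters_service.list(search='name=Default')[0]
volumes_service = clusters_service.cluster_service(cluster.id) \
                                  .gluster_volumes_service()
for volume in volumes_service.list():
    bricks_service = volumes_service.volume_service(volume.id) \
                                    .gluster_bricks_service()
    for brick in bricks_service.list():
        # Brick status is UP/DOWN; pending heal entries would still have
        # to come from "gluster volume heal <vol> info" on a host.
        print(volume.name, brick.brick_dir, brick.status)
connection.close()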
>>
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 6, 2019 at 6:06 PM Robert O'Kane <okane(a)khm.de> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Often(?), updates to a hypervisor that also provides a Gluster brick
>>>>>> take the hypervisor offline (updates often require a reboot).
>>>>>>
>>>>>> This reboot then makes the brick "out of sync", and it has to be resynced.
>>>>>>
>>>>>> I find it a "feature" that another host that is also part of a gluster
>>>>>> domain cannot be updated (rebooted) before all the bricks are back in
>>>>>> sync, in order to guarantee there is no data loss. It is called quorum,
>>>>>> isn't it?
>>>>>>
>>>>>> Always let the heal process end. Then the next update can start.
>>>>>> For me there is ALWAYS a healing time before Gluster is happy again.
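
(Concretely: on a replica-3 volume, quorum needs 2 of the 3 bricks consistent
and online. If a second host were rebooted while the first brick is still
healing, only one good copy would remain, which is exactly the data-loss
scenario this blocking behaviour prevents.)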
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Robert O'Kane
>>>>>>
>>>>>>
>>>>>> On Aug 6, 2019 at 16:38, Shani Leviim wrote:
>>>>>> > Hi Jayme,
>>>>>> > I can't recall such a healing time.
>>>>>> > Can you please retry and attach the engine & vdsm logs so we'll be smarter?
>>>>>> >
>>>>>> > Regards,
>>>>>> > Shani Leviim
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Aug 6, 2019 at 5:24 PM Jayme <jaymef(a)gmail.com> wrote:
>>>>>> >
>>>>>> > I've yet to have cluster upgrade finish updating my three host HCI
>>>>>> > cluster. The most recent try was today, moving from oVirt 4.3.3 to
>>>>>> > 4.3.5.5. The first host updates normally, but when it moves on to
>>>>>> > the second host it fails to put it in maintenance and the cluster
>>>>>> > upgrade stops.
>>>>>> >
>>>>>> > I suspect this is due to the fact that after my hosts are updated
>>>>>> > it takes 10 minutes or more for all volumes to sync/heal. I have
>>>>>> > 2TB SSDs.
>>>>>> >
>>>>>> > Does the cluster upgrade process take heal time into account before
>>>>>> > attempting to place the next host in maintenance to upgrade it? Or
>>>>>> > is there something else that may be at fault here, or perhaps a
>>>>>> > reason why the heal process takes 10 minutes after reboot to complete?
>>>>>>
>>>>>> --
>>>>>> Systems Administrator
>>>>>> Kunsthochschule für Medien Köln
>>>>>> Peter-Welter-Platz 2
>>>>>> 50676 Köln
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Sandro Bonazzola
>>>>
>>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>>>
>>>> Red Hat EMEA
>>>>
>>>> sbonazzo(a)redhat.com
>>>>
>>>> Red Hat respects your work-life balance. Therefore there is no need to
>>>> answer this email outside of your office hours.
>>>
>>>
>>>
>>> --
>>> Martin Perina
>>> Manager, Software Engineering
>>> Red Hat Czech s.r.o.
>
--
Thanks,
Kaustav Majumder