On Tue, Jul 6, 2021 at 7:02 PM Sandro Bonazzola <sbonazzo(a)redhat.com> wrote:
On Tue, Jul 6, 2021 at 5:33 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
> On Tue, Jul 6, 2021 at 5:58 PM Scott Worthington
> <scott.c.worthington(a)gmail.com> wrote:
> >
> >
> >
> > On Tue, Jul 6, 2021 at 8:13 AM Nir Soffer <nsoffer(a)redhat.com> wrote:
> >>
> >> On Tue, Jul 6, 2021 at 2:29 PM Sandro Bonazzola <sbonazzo(a)redhat.com> wrote:
> >>>
> >>>
> >>>
> >>> On Tue, Jul 6, 2021 at 1:03 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
> >>>>
> >>>> On Tue, Jul 6, 2021 at 1:11 PM Nathanaël Blanchet <blanchet(a)abes.fr> wrote:
> >>>> > We are installing the UPS PowerChute client on our hypervisors.
> >>>> >
> >>>> > What is the default behaviour of running VMs when a hypervisor is
> >>>> > ordered to shut down: do the VMs live migrate, or are they shut
> >>>> > down properly (and then restarted on another host because of HA)?
> >>>>
> >>>> In general, VMs are not restarted after an unexpected shutdown, but
> >>>> HA VMs are restarted after failures.
> >>>>
> >>>> If the HA VM has a lease, it can restart safely on another host
> >>>> regardless of the original host's status. If the HA VM does not have
> >>>> a lease, the system must wait until the original host is up again to
> >>>> check whether the VM is still running on that host.
> >>>>
> >>>> Arik can add more details on this.
> >>>
> >>>
> >>> I think the question is not about what happens after the host is
> >>> back. I think the question is what happens when the host goes down.
> >>> To me, the right way to shut down a host is to put it into
> >>> maintenance first (VMs are evacuated to other hosts) and then shut
> >>> it down.
> >>
> >>
> >> Right, but we don't have integration with the UPS, so the engine
> >> cannot put the host into maintenance when the host loses power and
> >> the UPS shuts it down after a few minutes.
> >
> >
> > This is outside the scope of the oVirt team:
> >
> > Perhaps one could combine multiple applications (NUT + Ansible +
> > Nagios/Zabbix) to notify the oVirt engine to switch a host to
> > maintenance?
> >
> > NUT[0] could be configured to alert a monitoring system (like Nagios
> > or Zabbix) to trigger an Ansible playbook [1][2] that puts the host in
> > maintenance mode. The trigger should fire before the UPS battery is
> > depleted (you'll have to account for the time it takes to live migrate
> > the VMs).
>
> I would trigger this as soon as power is lost. You never know how much
> time migration will take, so it is best to migrate all VMs immediately.
>
> It would be nice to integrate this with the engine, but we can start
> with something like what you describe, which would use the engine
> API/SDK to prepare the hosts for graceful shutdown.
>
There are pros and cons to this approach.
If the workloads manage to get evacuated quickly, before libvirt-guests
starts shutting them down, that's great.
But what happens if the VMs are still migrating when libvirt-guests
initiates the shutdowns?
Think about the following case:
1. A highly available VM starts migrating
2. libvirt-guests tries to shut down the guest
3. The migration completes
4. The guest shuts down while it runs on the destination host
I'm not sure we would treat that case as an unintentional shutdown, since
we may lose the context of the shutdown while the VM moves to the
destination host; in that case we would not try to restart the VM
automatically.
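
For what it's worth, the "put the hosts into maintenance as soon as power
is lost" idea quoted above could start as something like the sketch below.
It is only a sketch under assumptions: the engine URL, host IDs and
credentials are placeholders, on_battery() is a hypothetical hook name for
whatever the UPS tooling runs when it switches to battery, and it assumes
the engine REST API accepts HTTP basic authentication and a POST to
hosts/<id>/deactivate. It only asks the engine to start the maintenance
flow; it does not wait for the migrations to finish, and it does nothing
about the libvirt-guests race described above.

```python
import base64
import urllib.request


def build_deactivate_request(engine_url, host_id, user, password):
    """Build (but do not send) the REST call asking the engine to move
    one host to maintenance. All arguments are placeholders."""
    url = f"{engine_url.rstrip('/')}/ovirt-engine/api/hosts/{host_id}/deactivate"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"<action/>",  # empty action body
        method="POST",
        headers={
            "Content-Type": "application/xml",
            "Accept": "application/xml",
            "Authorization": f"Basic {token}",
        },
    )


def on_battery(engine_url, host_ids, user, password,
               send=urllib.request.urlopen):
    """Hypothetical hook for the UPS "on battery" event: request
    maintenance for every affected host right away, without waiting
    for the battery to run low."""
    for host_id in host_ids:
        request = build_deactivate_request(engine_url, host_id, user, password)
        with send(request, timeout=30) as response:
            response.read()  # the reply is ignored in this sketch
```

A real script would also need to trust the engine CA certificate, handle
errors, and probably poll the host status until the VMs are gone before
letting the UPS client power the host off.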