[ovirt-users] oVirt 4.0.3 (Hosted Engine) - High Availability VM not restart after auto-fencing of host.

Simone Tiraboschi stirabos at redhat.com
Fri Sep 16 14:07:27 UTC 2016


On Fri, Sep 16, 2016 at 4:02 PM, <aleksey.maksimov at it-kb.ru> wrote:

> So, colleagues.
> I tested fencing again, and I now think that the power button on my host
> server (pressed physically or through iLO) sends a kill command to the host
> OS (and, as a result, to the VM).
> This is the journald log from my guest OS when I press the power button on
> the host:
>
> ...
> Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping ACPI event daemon...
> Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping User Manager for UID
> 1000...
> Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Starting Unattended Upgrades
> Shutdown...
> Sep 16 16:19:27 KOM-AD01-PBX02 snapd[2583]: 2016/09/16 16:19:27.289063
> main.go:67: Exiting on terminated signal.
> Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2940]: pam_unix(sshd:session): session
> closed for user user
> Sep 16 16:19:27 KOM-AD01-PBX02 su[3015]: pam_unix(su:session): session
> closed for user root
> Sep 16 16:19:27 KOM-AD01-PBX02 spice-vdagentd[2638]: vdagentd quiting,
> returning status 0
> Sep 16 16:19:27 KOM-AD01-PBX02 sudo[3014]: pam_unix(sudo:session): session
> closed for user root
> Sep 16 16:19:27 KOM-AD01-PBX02 /usr/lib/snapd/snapd[2583]: main.go:67:
> Exiting on terminated signal.
> Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2812]: Received signal 15; terminating.
> ...
> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Unmount All
> Filesystems.
> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped target Local File
> Systems (Pre).
> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopping Monitoring of LVM2
> mirrors, snapshots etc. using dmeventd or progress polling...
> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Remount Root and Kernel
> File Systems.
> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Create Static Device
> Nodes in /dev.
> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Shutdown.
> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Final Step.
> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Starting Reboot...
> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Monitoring of LVM2
> mirrors, snapshots etc. using dmeventd or progress polling.
> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Shutting down.
> Sep 16 16:19:28 KOM-AD01-PBX02 kernel: [drm:qxl_enc_commit [qxl]] *ERROR*
> head number too large or missing monitors config: ffffc9000084a000,
> 0systemd-shutdown[1]: Sending SIGTERM to remaining processes...
> Sep 16 16:19:28 KOM-AD01-PBX02 systemd-journald[3342]: Journal stopped
> -- Reboot --
>
> Perhaps this is a feature of the HP ProLiant DL 360 G5. I don't know.
>
> If I test host unavailability in other ways, everything works as expected.
>
> I have described my fencing tests, with practical examples, on my blog
> (in Russian):
> https://blog.it-kb.ru/2016/09/16/install-ovirt-4-0-part-4-about-ssh-soft-fencing-and-hard-fencing-over-hp-proliant-ilo2-power-managment-agent-and-test-of-high-availability/
>
>
> Thank you all very much for your participation and support.
>
> Michal, what kind of scenario are you talking about?
>

Basically what you just did.
The question is what happens when you run 'shutdown -h now' on the host (or
press the physical power button, if it is configured to trigger a soft
shutdown): does the host propagate the shutdown action to the VMs, or does it
kill them abruptly?

In the first case the VMs will not be restarted, regardless of their HA flags,
because from the engine's point of view it looks like a user-initiated
shutdown.
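
To make the two cases concrete, here is a minimal Python sketch of the
decision as it has been described in this thread. It is only an illustration,
not the engine's actual code: ExitReason, looks_like_orderly_shutdown and
should_restart are hypothetical names, and the markers are simply strings
taken from the guest journal quoted above.

```python
from enum import Enum


class ExitReason(Enum):
    """Hypothetical classification of why a VM stopped running."""
    GUEST_SHUTDOWN = "guest_shutdown"            # guest OS shut itself down
    QEMU_KILLED = "qemu_killed"                  # QEMU process crashed or was killed
    HOST_NON_RESPONSIVE = "host_non_responsive"  # host was lost and fenced


# Strings seen in the guest journal during an orderly systemd shutdown.
# Assumption: their presence means the guest shut down by itself.
ORDERLY_MARKERS = ("Reached target Shutdown", "systemd[1]: Shutting down")


def looks_like_orderly_shutdown(journal_lines):
    """Return True if any journal line carries an orderly-shutdown marker."""
    return any(
        marker in line for line in journal_lines for marker in ORDERLY_MARKERS
    )


def should_restart(highly_available, reason):
    """HA VMs are restarted in every case except a guest-side shutdown."""
    if not highly_available:
        return False
    return reason is not ExitReason.GUEST_SHUTDOWN


if __name__ == "__main__":
    journal = [
        "Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Shutdown.",
        "Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Shutting down.",
    ]
    reason = (
        ExitReason.GUEST_SHUTDOWN
        if looks_like_orderly_shutdown(journal)
        else ExitReason.QEMU_KILLED
    )
    print(reason, should_restart(True, reason))
```

With the journal lines from the power-button test, the guest is classified as
having shut down by itself, so the sketch does not restart the VM even though
it is flagged as highly available.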


>
>
> PS: Excuse me for my bad English :)
>
>
> 16.09.2016, 16:37, "Simone Tiraboschi" <stirabos at redhat.com>:
> > On Fri, Sep 16, 2016 at 3:34 PM, Michal Skrivanek <
> michal.skrivanek at redhat.com> wrote:
> >>> On 16 Sep 2016, at 15:31, aleksey.maksimov at it-kb.ru wrote:
> >>>
> >>> Hi Simone.
> >>> Exactly.
> >>> Now I'll set up journald on the guest and try to understand how the
> guest gets powered off.
> >>
> >> great. thanks
> >>
> >>> 16.09.2016, 16:25, "Simone Tiraboschi" <stirabos at redhat.com>:
> >>>> On Fri, Sep 16, 2016 at 3:13 PM, Michal Skrivanek <
> michal.skrivanek at redhat.com> wrote:
> >>>>>> On 16 Sep 2016, at 15:05, Gianluca Cecchi <
> gianluca.cecchi at gmail.com> wrote:
> >>>>>>
> >>>>>> On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek <
> michal.skrivanek at redhat.com> wrote:
> >>>>>>> no, that’s not how HA works today. When you log into a guest and
> issue “shutdown” we do not restart the VM under your hands. We can argue
> how it should or may work, but this is the defined behavior since the dawn
> of oVirt.
> >>>>>>>
> >>>>>>>> AFAIK that's correct: we need to be able to shut down an HA VM
> without it being immediately restarted on a different host. We want to
> restart an HA VM only if the host where it is running is non-responsive.
> >>>>>>>
> >>>>>>> we try to restart it in all other cases other than user initiated
> shutdown, e.g. a QEMU process crash on an otherwise-healthy host
> >>>>>> Hi, just another question in case HA is not configured at all.
> >>>>>
> >>>>> by “HA configured” I expect you’re referring to the “Highly
> Available” checkbox in Edit VM dialog.
> >>>>>
> >>>>>> If I run the "shutdown -h now" command on an host where some VMs
> are running, what is the expected behavior?
> >>>>>> Clean VM shutdown (with or without timeout in case it doesn't
> complete?) or crash of their related QEMU processes?
> >>>>>
> >>>>> expectation is that you won’t do that. That’s why there is the
> Maintenance host state.
> >>>>> But if you do that regardless, with VMs running, all the processes
> will be terminated in a regular system way, i.e. all QEMU processes get
> SIGTERM. From the perspective of each guest this is not a clean shutdown
> and it would just get killed
> >>>>
> >>>> Aleksey reports that he started a shutdown of his host through power
> management and that the VM processes were not abruptly killed but shut down
> cleanly, so they were not restarted regardless of their HA flag. Hence this
> thread.
> >>
> >> Gianluca talks about “shutdown -h now”, you talk about power management
> action, those are two different things. The current idea is that systemd or
> some other component just propagates the action to the guest and if that
> guest is configured to handle it as a shutdown it starts it itself as well
> so it looks like a user-initiated one. Even though this mostly makes sense
> it is not ok for current HA logic
> >
> > Aleksey, can you please also test this scenario?
> >>>>> Thanks,
> >>>>> michal
> >>>>>> Thanks,
> >>>>>> Gianluca
> >>>>>> _______________________________________________
> >>>>>> Users mailing list
> >>>>>> Users at ovirt.org
> >>>>>> http://lists.ovirt.org/mailman/listinfo/users
> >>>>>
>