[ovirt-users] oVirt 4.0.3 (Hosted Engine) - High Availability VM not restart after auto-fencing of host.

Simone Tiraboschi stirabos at redhat.com
Fri Sep 16 11:54:14 UTC 2016


On Fri, Sep 16, 2016 at 12:50 PM, Martin Perina <mperina at redhat.com> wrote:

>
>
> On Fri, Sep 16, 2016 at 9:26 AM, Michal Skrivanek <
> michal.skrivanek at redhat.com> wrote:
>
>>
>> > On 16 Sep 2016, at 08:29, aleksey.maksimov at it-kb.ru wrote:
>> >
>> > Are there any more ideas?
>> >
>> > 15.09.2016, 14:40, "aleksey.maksimov at it-kb.ru" <aleksey.maksimov at it-kb.ru>:
>> >> Martin, I physically turned off the server through iLO2. See the screenshots.
>> >> I did not touch the virtual machine (KOM-AD01-PBX02) at that time.
>> >> The virtual machine was running at the moment the host was shut down.
>> >>
>> >> 15.09.2016, 14:27, "Martin Perina" <mperina at redhat.com>:
>> >>>  Hi,
>> >>>
>> >>>  I found out this in the log:
>> >>>
>> >>>  2016-09-15 12:02:04,661 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-6) [] VM '660bafca-e9c3-4191-99b4-295ff8553488'(KOM-AD01-PBX02) moved from 'Up' --> 'Down'
>> >>>  2016-09-15 12:02:04,788 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-6) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest
>>
>> Since it shut down cleanly, can you please check the guest's logs to see
>> what triggered the shutdown? In such cases it is considered a
>> user-requested shutdown, and such VMs are not restarted automatically.
>>
>
> That's exactly what I meant by my response. From the log it's obvious
> that the VM was shut down properly, so the engine will not restart it on a
> different host. Also, on most modern hosts, if you execute the power
> management off action, a signal is sent to the OS to perform a regular
> shutdown, so the VMs are also shut down properly.
>

I understand the reason, but is it really what the user expects?

I mean, if I set HA mode on a VM, I'd expect the engine to keep it up or
restart it if needed, regardless of the shutdown reason.
For instance, on hosted-engine the HA agent, if not in global maintenance
mode, will restart the engine VM regardless of who shut it down or why.
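
To make the difference concrete, here is a minimal Python sketch of the two
policies discussed in this thread. All names in it (ExitReason, Vm,
engine_restarts, ha_agent_restarts) are hypothetical and are not taken from
the oVirt engine or the hosted-engine HA agent code; it only encodes the
behaviour as described above, so treat it as an illustration rather than the
actual implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ExitReason(Enum):
    """Hypothetical exit reasons; not oVirt's real enumeration."""
    USER_SHUTDOWN = auto()  # clean shutdown from within the guest
    HOST_FAILURE = auto()   # VM lost because its host died or was fenced


@dataclass
class Vm:
    name: str
    highly_available: bool


def engine_restarts(vm: Vm, reason: ExitReason) -> bool:
    """Engine policy as described in this thread: an HA VM is restarted
    unless it went down through a clean, user-requested shutdown."""
    return vm.highly_available and reason is not ExitReason.USER_SHUTDOWN


def ha_agent_restarts(global_maintenance: bool) -> bool:
    """Hosted-engine HA agent policy as described in this thread: the engine
    VM is restarted whatever the reason, unless global maintenance is on."""
    return not global_maintenance


if __name__ == "__main__":
    pbx = Vm(name="KOM-AD01-PBX02", highly_available=True)
    # The case from the log: the guest shut down cleanly, so no restart.
    print(engine_restarts(pbx, ExitReason.USER_SHUTDOWN))
    # The case the test was meant to exercise: the host fails abruptly.
    print(engine_restarts(pbx, ExitReason.HOST_FAILURE))
```

With the exit message from the log ("User shut down from within the guest"),
the first policy declines to restart the VM, while a policy like the second
one would bring it back regardless of the reason.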



>>
>> We are aware of a similar issue on specific hw -
>> https://bugzilla.redhat.com/show_bug.cgi?id=1341106
>>
>> >>>
>> >>>  If I'm not mistaken, this means that the VM was properly shut down from within itself, and in that case it's not restarted automatically. So I'm curious: what actions did you take to make host KOM-AD01-VM31 non-responsive?
>> >>>
>> >>>  If you want to test fencing properly, then I suggest you either block the connection between host and engine on the host side or forcibly stop the ovirtmgmt network interface on the host, and watch fencing being applied.
>>
>
> Try the above if you want to test fencing. Of course, you can always
> configure a firewall rule to drop all packets between engine and host, or
> unplug the host's network cable.
>
> >>>
>> >>>  Martin
>> >>>
>> >>>  On Thu, Sep 15, 2016 at 1:16 PM, <aleksey.maksimov at it-kb.ru> wrote:
>> >>>>  engine.log for this period.
>> >>>>
>> >>>>  15.09.2016, 14:01, "Martin Perina" <mperina at redhat.com>:
>> >>>>>  On Thu, Sep 15, 2016 at 12:47 PM, <aleksey.maksimov at it-kb.ru> wrote:
>> >>>>>>  Hi Martin.
>> >>>>>>  I have a stupid question. Is a Watchdog device mandatory for a virtual machine to be restarted automatically during host fencing?
>> >>>>>
>> >>>>>  AFAIK it's not, but I'm not an expert, adding Arik.
>> >>>>>
>> >>>>>  You need a correct power management setup for the hosts, and the VM has to be marked as highly available, for sure.
>> >>>>>
>> >>>>>>  15.09.2016, 13:43, "Martin Perina" <mperina at redhat.com>:
>> >>>>>>>  Hi,
>> >>>>>>>
>> >>>>>>>  could you please share the whole engine.log?
>> >>>>>>>
>> >>>>>>>  Thanks
>> >>>>>>>
>> >>>>>>>  Martin Perina
>> >>>>>>>
>> >>>>>>>  On Thu, Sep 15, 2016 at 12:01 PM, <aleksey.maksimov at it-kb.ru> wrote:
>> >>>>>>>>  Hello oVirt gurus!
>> >>>>>>>>
>> >>>>>>>>  I have oVirt Hosted Engine 4.0.3-1.el7.centos on two CentOS 7.2 hosts (HP ProLiant DL 360 G5) connected to shared FC SAN storage.
>> >>>>>>>>
>> >>>>>>>>  1. I configured Power Management for the hosts (successfully added a Fencing Agent for iLO2 for my hosts).
>> >>>>>>>>
>> >>>>>>>>  2. I created a new VM (KOM-AD01-PBX02) and installed the guest OS (Ubuntu Server 16.04 LTS) and the oVirt Guest Agent
>> >>>>>>>>  (as described at https://blog.it-kb.ru/2016/09/14/install-ovirt-4-0-part-2-about-data-center-iso-domain-logical-network-vlan-vm-settings-console-guest-agent-live-migration/).
>> >>>>>>>>     In the VM settings, under "High Availability", I turned on the "Highly Available" option and changed "Priority" to "High".
>> >>>>>>>>
>> >>>>>>>>  3. Now I'm trying to test hard fencing, so I powered off my first host (KOM-AD01-VM31) from its iLO (KOM-AD01-ILO31).
>> >>>>>>>>
>> >>>>>>>>  Fencing works and the server is automatically turned back on, but my HA VM was not started on the second host (KOM-AD01-VM32).
>> >>>>>>>>
>> >>>>>>>>  These events I see in the oVirt web console:
>> >>>>>>>>
>> >>>>>>>>  Sep 15, 2016 12:08:13 PM        Host KOM-AD01-VM31 power management was verified successfully.
>> >>>>>>>>  Sep 15, 2016 12:08:13 PM        Status of host KOM-AD01-VM31 was set to Up.
>> >>>>>>>>  Sep 15, 2016 12:08:05 PM        Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
>> >>>>>>>>  Sep 15, 2016 12:05:48 PM        Host KOM-AD01-VM31 is rebooting.
>> >>>>>>>>  Sep 15, 2016 12:05:48 PM        Host KOM-AD01-VM31 was started by SYSTEM.
>> >>>>>>>>  Sep 15, 2016 12:05:48 PM        Power management start of Host KOM-AD01-VM31 succeeded.
>> >>>>>>>>  Sep 15, 2016 12:05:41 PM        Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
>> >>>>>>>>  Sep 15, 2016 12:05:19 PM        Executing power management start on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
>> >>>>>>>>  Sep 15, 2016 12:05:19 PM        Power management start of Host KOM-AD01-VM31 initiated.
>> >>>>>>>>  Sep 15, 2016 12:05:19 PM        Auto fence for host KOM-AD01-VM31 was started.
>> >>>>>>>>  Sep 15, 2016 12:05:11 PM        Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
>> >>>>>>>>  Sep 15, 2016 12:05:04 PM        Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
>> >>>>>>>>  Sep 15, 2016 12:05:04 PM        Host KOM-AD01-VM31 is non responsive.
>> >>>>>>>>  Sep 15, 2016 12:02:32 PM        Host KOM-AD01-VM31 is not responding. It will stay in Connecting state for a grace period of 60 seconds and after that an attempt to fence the host will be issued.
>> >>>>>>>>  Sep 15, 2016 12:02:32 PM        VDSM KOM-AD01-VM31 command failed: Heartbeat exeeded
>> >>>>>>>>  Sep 15, 2016 12:02:04 PM        VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest
>> >>>>>>>>
>> >>>>>>>>  What am I doing wrong? Why does the HA VM not start on the second host?
>> >>>>>>>>  _______________________________________________
>> >>>>>>>>  Users mailing list
>> >>>>>>>>  Users at ovirt.org
>> >>>>>>>>  http://lists.ovirt.org/mailman/listinfo/users