[ovirt-users] oVirt 4.0.3 (Hosted Engine) - High Availability VM not restart after auto-fencing of host.

Martin Perina mperina at redhat.com
Fri Sep 16 08:23:09 EDT 2016


On Fri, Sep 16, 2016 at 1:54 PM, Simone Tiraboschi <stirabos at redhat.com>
wrote:

>
>
> On Fri, Sep 16, 2016 at 12:50 PM, Martin Perina <mperina at redhat.com>
> wrote:
>
>>
>>
>> On Fri, Sep 16, 2016 at 9:26 AM, Michal Skrivanek <
>> michal.skrivanek at redhat.com> wrote:
>>
>>>
>>> > On 16 Sep 2016, at 08:29, aleksey.maksimov at it-kb.ru wrote:
>>> >
>>> > Are there any more ideas?
>>> >
>>> > 15.09.2016, 14:40, "aleksey.maksimov at it-kb.ru" <
>>> aleksey.maksimov at it-kb.ru>:
>>> >> Martin, I physically turned off the server through the iLO2. See
>>> screenshots.
>>> >> I did not touch Virtual Machine (KOM-AD01-PBX02) at the same time.
>>> >> The virtual machine was running at the time the host shut
>>> down.
>>> >>
>>> >> 15.09.2016, 14:27, "Martin Perina" <mperina at redhat.com>:
>>> >>>  Hi,
>>> >>>
>>> >>>  I found this in the log:
>>> >>>
>>> >>>  2016-09-15 12:02:04,661 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
>>> (ForkJoinPool-1-worker-6) [] VM '660bafca-e9c3-4191-99b4-295ff8553488'(KOM-AD01-PBX02)
>>> moved from 'Up' --> 'Down'
>>> >>>  2016-09-15 12:02:04,788 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (ForkJoinPool-1-worker-6) []
>>> Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM
>>> KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest
>>>
>>> Since it shut down cleanly, can you please check the guest's logs to see
>>> what triggered the shutdown? In such cases it is considered a
>>> user-requested shutdown, and such VMs are not restarted automatically.
>>>
>>
>> That's exactly what I meant by my response. From the log it's obvious
>> that the VM was shut down properly, so the engine will not restart it on a
>> different host. Also, on most modern hosts, if you execute a power management
>> off action, a signal is sent to the OS to execute a regular shutdown, so the
>> VMs are also shut down properly.
>>
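
The ordering is also visible in the event list quoted further down: the VM
is reported down before the engine reports the host as not responding. A
minimal Python sketch of that check, using the two timestamps from those
events. The helper name seconds_between is made up for illustration, and the
format string is an assumption based on how the web console printed the
events in this thread.

```python
from datetime import datetime

# Assumed format of the web console event timestamps quoted in this thread,
# e.g. "Sep 15, 2016 12:02:04 PM".
EVENT_TIME_FORMAT = "%b %d, %Y %I:%M:%S %p"


def seconds_between(earlier: str, later: str) -> int:
    """Return the number of whole seconds from `earlier` to `later`."""
    start = datetime.strptime(earlier, EVENT_TIME_FORMAT)
    end = datetime.strptime(later, EVENT_TIME_FORMAT)
    return int((end - start).total_seconds())


# "VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest"
vm_down = "Sep 15, 2016 12:02:04 PM"
# "VDSM KOM-AD01-VM31 command failed: Heartbeat exeeded"
host_not_responding = "Sep 15, 2016 12:02:32 PM"

gap = seconds_between(vm_down, host_not_responding)
print(f"VM reported down {gap} seconds before the host stopped responding")
```

A positive gap means the engine already knew the VM was down, with a clean
exit message, before it lost contact with the host.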
>
> I understand the reason, but is it really what the user expects?
>
> I mean, if I set HA mode on a VM, I'd expect that the engine takes care to
> keep it up or restart it if needed, regardless of shutdown reasons.
>

AFAIK that's correct: we need to be able to shut down an HA VM without it
being immediately restarted on a different host. We want to restart an HA VM
only if the host where the HA VM is running is non-responsive.
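
To make the rule concrete, here is a minimal sketch in Python of the decision
as described in this thread. It is not the engine's actual code: VmState,
should_auto_restart and the field names are hypothetical and exist only for
illustration. The only string taken from the real log is the exit message.

```python
from dataclasses import dataclass

# Exit message the engine logged for KOM-AD01-PBX02 in this thread.
CLEAN_SHUTDOWN_MESSAGE = "User shut down from within the guest"


@dataclass
class VmState:
    """Hypothetical snapshot of what the engine knows about a VM."""
    highly_available: bool
    exit_message: str          # empty if the VM never reported going Down
    host_non_responsive: bool  # host went non-responsive while the VM was Up


def should_auto_restart(vm: VmState) -> bool:
    """Return True if, per the rule described above, the VM gets restarted."""
    if not vm.highly_available:
        return False
    # A clean shutdown is treated as user-requested, so HA does not apply.
    if CLEAN_SHUTDOWN_MESSAGE in vm.exit_message:
        return False
    # Otherwise restart only when the host itself became non-responsive.
    return vm.host_non_responsive


# The case from this thread: the guest shut down cleanly before the host
# was declared non-responsive.
pbx02 = VmState(highly_available=True,
                exit_message=CLEAN_SHUTDOWN_MESSAGE,
                host_non_responsive=False)
print(should_auto_restart(pbx02))

# A host that simply disappears (for example, network cut) with the VM Up.
lost_host_vm = VmState(highly_available=True,
                       exit_message="",
                       host_non_responsive=True)
print(should_auto_restart(lost_host_vm))
```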

> For instance, on hosted-engine the HA agent, if not in global maintenance
> mode, will restart the engine VM regardless of who or why it went off.
>

Well, HE VM is definitely not a standard HA VM :-)


>
>
>
>>>>
>>> We are aware of a similar issue on specific hw -
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1341106
>>>
>>> >>>
>>> >>>  If I'm not mistaken, this means that the VM was properly shut down
>>> from within itself and in that case it's not restarted automatically. So
>>> I'm curious what actions you took to make host KOM-AD01-VM31
>>> non-responsive?
>>> >>>
>>> >>>  If you want to test fencing properly, then I suggest you either
>>> block the connection between host and engine on the host side or forcibly
>>> stop the ovirtmgmt network interface on the host, and watch that fencing
>>> is applied.
>>>
>>
>> Try the above if you want to test fencing. Of course, you can always
>> configure a firewall rule to drop all packets between engine and host, or
>> unplug the host's network cable.
>>
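
As a rough illustration of the firewall option, the sketch below only builds
the iptables command lines that would drop traffic to and from the engine on
the host; it does not run them. The engine address 192.0.2.10 is a
placeholder from the documentation range, and build_isolation_rules is a
hypothetical helper, not part of oVirt. It also assumes plain iptables is in
use on the host; adapt it if the host manages its rules another way.

```python
import ipaddress
from typing import List


def build_isolation_rules(engine_ip: str) -> List[List[str]]:
    """Return iptables argv lists that drop all packets from/to the engine.

    Nothing is executed here; run the commands as root on the host under
    test, and delete the rules again once the fencing test is done.
    """
    # Validate the address so a typo does not end up in a firewall rule.
    ip = str(ipaddress.ip_address(engine_ip))
    return [
        ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"],
        ["iptables", "-I", "OUTPUT", "-d", ip, "-j", "DROP"],
    ]


# Placeholder engine address; replace with the real engine IP.
for rule in build_isolation_rules("192.0.2.10"):
    print(" ".join(rule))
```

With the rules in place the guest keeps running on the isolated host, so the
engine sees a non-responsive host rather than a cleanly shut down VM.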
>> >>>
>>> >>>  Martin
>>> >>>
>>> >>>  On Thu, Sep 15, 2016 at 1:16 PM, <aleksey.maksimov at it-kb.ru> wrote:
>>> >>>>  engine.log for this period.
>>> >>>>
>>> >>>>  15.09.2016, 14:01, "Martin Perina" <mperina at redhat.com>:
>>> >>>>>  On Thu, Sep 15, 2016 at 12:47 PM, <aleksey.maksimov at it-kb.ru>
>>> wrote:
>>> >>>>>>  Hi Martin.
>>> >>>>>>  I have a stupid question. Is a Watchdog device mandatory for
>>> automatically starting a virtual machine during the host fencing process?
>>> >>>>>
>>> >>>>>  AFAIK it's not, but I'm not an expert, adding Arik.
>>> >>>>>
>>> >>>>>  You need a correct power management setup for the hosts, and the VM
>>> has to be marked as highly available for sure.
>>> >>>>>
>>> >>>>>>  15.09.2016, 13:43, "Martin Perina" <mperina at redhat.com>:
>>> >>>>>>>  Hi,
>>> >>>>>>>
>>> >>>>>>>  could you please share whole engine.log?
>>> >>>>>>>
>>> >>>>>>>  Thanks
>>> >>>>>>>
>>> >>>>>>>  Martin Perina
>>> >>>>>>>
>>> >>>>>>>  On Thu, Sep 15, 2016 at 12:01 PM, <aleksey.maksimov at it-kb.ru>
>>> wrote:
>>> >>>>>>>>  Hello oVirt gurus!
>>> >>>>>>>>
>>> >>>>>>>>  I have oVirt Hosted Engine 4.0.3-1.el7.centos on two CentOS
>>> 7.2 hosts (HP ProLiant DL 360 G5) connected to shared FC SAN Storage.
>>> >>>>>>>>
>>> >>>>>>>>  1. I configured Power Management for the Hosts (successfully
>>> added Fencing Agent for iLO2 from my hosts)
>>> >>>>>>>>
>>> >>>>>>>>  2. I created new VM (KOM-AD01-PBX02) and installed Guest OS
>>> (Ubuntu Server 16.04 LTS) and oVirt Guest Agent
>>> >>>>>>>>  (As described here: https://blog.it-kb.ru/2016/09/14/install-ovirt-4-0-part-2-about-data-center-iso-domain-logical-network-vlan-vm-settings-console-guest-agent-live-migration/)
>>> >>>>>>>>     In the VM settings under "High Availability" I turned on the
>>> option "Highly Available" and changed "Priority" to "High"
>>> >>>>>>>>
>>> >>>>>>>>  3. Now I'm trying to check hard fencing, so I powered off my first
>>> host (KOM-AD01-VM31) from its iLO (KOM-AD01-ILO31).
>>> >>>>>>>>
>>> >>>>>>>>  Fencing works successfully and the server is automatically turned
>>> on, but my HA VM is not started on the second host (KOM-AD01-VM32).
>>> >>>>>>>>
>>> >>>>>>>>  These events I see in the oVirt web console:
>>> >>>>>>>>
>>> >>>>>>>>  Sep 15, 2016 12:08:13 PM        Host KOM-AD01-VM31 power
>>> management was verified successfully.
>>> >>>>>>>>  Sep 15, 2016 12:08:13 PM        Status of host KOM-AD01-VM31
>>> was set to Up.
>>> >>>>>>>>  Sep 15, 2016 12:08:05 PM        Executing power management
>>> status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent
>>> ilo:KOM-AD01-ILO31.holding.com.
>>> >>>>>>>>  Sep 15, 2016 12:05:48 PM        Host KOM-AD01-VM31 is
>>> rebooting.
>>> >>>>>>>>  Sep 15, 2016 12:05:48 PM        Host KOM-AD01-VM31 was started
>>> by SYSTEM.
>>> >>>>>>>>  Sep 15, 2016 12:05:48 PM        Power management start of Host
>>> KOM-AD01-VM31 succeeded.
>>> >>>>>>>>  Sep 15, 2016 12:05:41 PM        Executing power management
>>> status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent
>>> ilo:KOM-AD01-ILO31.holding.com.
>>> >>>>>>>>  Sep 15, 2016 12:05:19 PM        Executing power management
>>> start on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent
>>> ilo:KOM-AD01-ILO31.holding.com.
>>> >>>>>>>>  Sep 15, 2016 12:05:19 PM        Power management start of Host
>>> KOM-AD01-VM31 initiated.
>>> >>>>>>>>  Sep 15, 2016 12:05:19 PM        Auto fence for host
>>> KOM-AD01-VM31 was started.
>>> >>>>>>>>  Sep 15, 2016 12:05:11 PM        Executing power management
>>> status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent
>>> ilo:KOM-AD01-ILO31.holding.com.
>>> >>>>>>>>  Sep 15, 2016 12:05:04 PM        Executing power management
>>> status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent
>>> ilo:KOM-AD01-ILO31.holding.com.
>>> >>>>>>>>  Sep 15, 2016 12:05:04 PM        Host KOM-AD01-VM31 is non
>>> responsive.
>>> >>>>>>>>  Sep 15, 2016 12:02:32 PM        Host KOM-AD01-VM31 is not
>>> responding. It will stay in Connecting state for a grace period of 60
>>> seconds and after that an attempt to fence the host will be issued.
>>> >>>>>>>>  Sep 15, 2016 12:02:32 PM        VDSM KOM-AD01-VM31 command
>>> failed: Heartbeat exeeded
>>> >>>>>>>>  Sep 15, 2016 12:02:04 PM        VM KOM-AD01-PBX02 is down.
>>> Exit message: User shut down from within the guest
>>> >>>>>>>>
>>> >>>>>>>>  What am I doing wrong? Why does the HA VM not start on the second host?
>>> >>>>>>>>  _______________________________________________
>>> >>>>>>>>  Users mailing list
>>> >>>>>>>>  Users at ovirt.org
>>> >>>>>>>>  http://lists.ovirt.org/mailman/listinfo/users