[ovirt-users] HA - Fencing not working when host with engine gets shutdown

Michael Hölzl mh at ins.jku.at
Mon Sep 21 14:47:06 UTC 2015


Hi,

Here is the whole engine.log, including the time of the shutdown (performed around 9:19):
http://pastebin.com/cdY9uTkJ

vdsm.log of host01 (the host which kept running and took over the
engine), split into 3 uploads (pastebin has a 512 kB limit):
1 : http://pastebin.com/dr9jNTek
2 : http://pastebin.com/cuyHL6ne
3 : http://pastebin.com/7x2ZQy1y

Michael

On 09/21/2015 03:00 PM, Martin Perina wrote:
> Hi,
>
> could you please post the whole engine.log (from the time at which you turned
> off the host with the engine VM) and also vdsm.log from both hosts?
>
> Thanks
>
> Martin Perina
>
> ----- Original Message -----
>> From: "Michael Hölzl" <mh at ins.jku.at>
>> To: users at ovirt.org
>> Sent: Monday, September 21, 2015 10:27:08 AM
>> Subject: [ovirt-users] HA - Fencing not working when host with engine gets shutdown
>>
>> Hi all,
>>
>> we are trying to set up an oVirt environment with two hosts, both
>> connected to an iSCSI storage device, a hosted engine, and power
>> management configured over iLO. So far it seems to work fine in our
>> testing setup, and starting/stopping VMs works smoothly with proper
>> scheduling between those hosts. So we wanted to test HA for the VMs now
>> and started to manually shut down a host while there are still VMs
>> running on that machine (to simulate a power failure or a kernel panic).
>> The expected outcome was that all VMs where HA is enabled are
>> booted again. This works if the machine with the failure does not have
>> the engine running. If the machine with the hosted engine VM gets
>> shut down, the host goes into the "Not Responsive" state and all VMs end
>> up in an unknown state. However, the engine itself starts correctly on the
>> second host and it seems like it tries to fence the other host (as
>> expected). Events which we get in the Open Virtualization Manager:
>> 1. Host hosted_engine_2 is non responsive
>> 2. Host hosted_engine_1 from cluster Default was chosen as a proxy to
>> execute Status command on Host hosted_engine_2.
>> 3. Host hosted_engine_2 became non responsive. It has no power
>> management configured. Please check the host status, manually reboot it,
>> and click "Confirm Host Has Been Rebooted"
>> 4. Host hosted_engine_2 is not responding. It will stay in Connecting
>> state for a grace period of 124 seconds and after that an attempt to
>> fence the host will be issued.
>>
>> Event 4 keeps recurring every 3 minutes. Complete engine.log file
>> during engine boot up: http://pastebin.com/D6xS3Wfy
>> So the host detects that the machine is not responding and wants to
>> fence it. But although the host has power management configured over
>> iLO, the engine thinks that it does not. As a result the second host does
>> not get fenced and VMs are not migrated to the running machine.
>> In the log files there are also a lot of timeout exceptions. But I guess
>> that this is because the host cannot connect to the other machine.
>>
>> Has anybody faced similar problems with HA, or does anyone have a clue
>> what the problem might be?
>>
>> Thanks,
>> Michael
>>
>>
>> ----
>> oVirt version: 3.5.4
>> Hosted engine VM OS: CentOS 6.5
>> Host Machines OS: CentOS 7
>>
>> P.S. We should also note that we had problems with the command
>> fence_ipmilan at the beginning. We were receiving the message "Unable to
>> obtain correct plug status or plug is not available" whenever the
>> command fence_ipmilan was called. However, the command fence_ilo4
>> worked. So we now use a simple script in place of fence_ipmilan that
>> calls fence_ilo4 and passes the arguments through.
>>
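
The wrapper script mentioned in the P.S. was not posted to the list. For
readers trying to picture it, a minimal sketch of that kind of pass-through
might look like the following. It assumes the file is installed under the name
fence_ipmilan and that fence_ilo4 is on the PATH; the names TARGET_AGENT,
WRAPPED_NAME and build_command are made up for illustration and are not taken
from the original setup.

```python
#!/usr/bin/env python
# Hypothetical pass-through wrapper: installed as "fence_ipmilan", it hands
# every call over to fence_ilo4 with the same command-line arguments.
import os
import sys

TARGET_AGENT = "fence_ilo4"     # assumption: the agent that works, found via PATH
WRAPPED_NAME = "fence_ipmilan"  # assumption: the name this file is installed under


def build_command(argv, target=TARGET_AGENT):
    """Return the command line to run: the target agent plus the original arguments."""
    return [target] + list(argv[1:])


def main(argv):
    cmd = build_command(argv)
    # execvp replaces the current process, so stdin (where the caller may
    # supply key=value options) and the exit status belong to the real agent.
    os.execvp(cmd[0], cmd)


# Only forward when actually invoked under the wrapped name.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == WRAPPED_NAME:
    main(sys.argv)
```

Replacing the process with exec rather than spawning a child keeps the wrapper
transparent: whatever the caller writes to stdin and whatever exit code
fence_ilo4 returns pass straight through without extra handling.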


