Hi,
The whole engine.log, including the shutdown time (the shutdown was performed around 9:19)
vdsm.log of host01 (the host that kept running and took over the
engine), split into 3 uploads (pastebin has a 512 kB limit):
1 :
Hi,
could you please post the whole engine.log (from the time at which you turned off
the host with the engine VM) and also vdsm.log from both hosts?
Thanks
Martin Perina
----- Original Message -----
> From: "Michael Hölzl" <mh(a)ins.jku.at>
> To: users(a)ovirt.org
> Sent: Monday, September 21, 2015 10:27:08 AM
> Subject: [ovirt-users] HA - Fencing not working when host with engine gets shutdown
>
> Hi all,
>
> we are trying to set up an oVirt environment with two hosts, both
> connected to an iSCSI storage device, a hosted engine and power
> management configured over iLO. So far it seems to work fine in our
> testing setup, and starting/stopping VMs works smoothly with proper
> scheduling between those hosts. So we wanted to test HA for the VMs now
> and started to manually shut down a host while there are still VMs
> running on that machine (to simulate a power failure or a kernel panic).
> The expected outcome was that all machines where HA is enabled are
> booted again. This works if the machine with the failure does not have
> the engine running. If the machine with the hosted engine VM gets
> shut down, the host goes into the "Not Responsive" state and all VMs end up
> in an unknown state. However, the engine itself starts correctly on the
> second host and it seems like it tries to fence the other host (as
> expected). Events which we get in the Open Virtualization Manager:
> 1. Host hosted_engine_2 is non responsive
> 2. Host hosted_engine_1 from cluster Default was chosen as a proxy to
> execute Status command on Host hosted_engine_2.
> 3. Host hosted_engine_2 became non responsive. It has no power
> management configured. Please check the host status, manually reboot it,
> and click "Confirm Host Has Been Rebooted"
> 4. Host hosted_engine_2 is not responding. It will stay in Connecting
> state for a grace period of 124 seconds and after that an attempt to
> fence the host will be issued.
>
> Event 4 keeps recurring every 3 minutes. Complete engine.log file
> during engine boot up:
> http://pastebin.com/D6xS3Wfy
> So the host detects that the machine is not responding and wants to fence it.
> But although the host has power management configured over iLO, the
> engine thinks that it does not. As a result, the second host does not get
> fenced and the VMs are not migrated to the running machine.
> In the log files there are also a lot of timeout exceptions. But I guess
> that this is because the host cannot connect to the other machine.
>
> Did anybody face similar problems with HA? Or any clue what the problem
> might be?
>
> Thanks,
> Michael
>
>
> ----
> oVirt version: 3.5.4
> Hosted engine VM OS: CentOS 6.5
> Host Machines OS: CentOS 7
>
> P.S. We should also note that we had problems with the command
> fence_ipmilan at the beginning. We were receiving the message "Unable to
> obtain correct plug status or plug is not available" whenever the
> command fence_ipmilan was called. However, the command fence_ilo4
> worked. So we now use a simple script in place of fence_ipmilan that calls
> fence_ilo4 and passes the arguments through.
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
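
For readers following the P.S. in the quoted message: the actual wrapper script was not posted, so the following is only a minimal sketch of what such a stand-in for fence_ipmilan might look like. The install path in FENCE_ILO4 and the helper name build_command are assumptions made for illustration and do not come from the original setup.

```python
#!/usr/bin/env python
"""Hypothetical stand-in for fence_ipmilan that delegates to fence_ilo4."""
import os
import sys

# Assumed location of the real agent; adjust to wherever it lives on the host.
FENCE_ILO4 = "/usr/sbin/fence_ilo4"


def build_command(argv):
    """Return the argument list for fence_ilo4, keeping every argument unchanged."""
    return [FENCE_ILO4] + list(argv[1:])


def main():
    command = build_command(sys.argv)
    # execv replaces this process with fence_ilo4, so stdin (where a caller
    # may write its options) and the exit status are handled by fence_ilo4.
    os.execv(command[0], command)


if __name__ == "__main__":
    main()
```

Because the process is replaced rather than spawned, whatever the caller sends on standard input and whatever exit code fence_ilo4 returns pass through untouched. Whether a wrapper of this kind affects how the engine evaluates the host's power management configuration is not something this sketch can show.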