
Hi,

The whole engine.log including the shutdown time (was performed around 9:19):
http://pastebin.com/cdY9uTkJ

vdsm.log of host01 (the host which kept on running and took over the engine), split into 3 uploads (limit of 512 kB of pastebin):
1: http://pastebin.com/dr9jNTek
2: http://pastebin.com/cuyHL6ne
3: http://pastebin.com/7x2ZQy1y

Michael

On 09/21/2015 03:00 PM, Martin Perina wrote:
Hi,
could you please post the whole engine.log (from the time at which you turned off the host with the engine VM) and also vdsm.log from both hosts?
Thanks
Martin Perina
----- Original Message -----
From: "Michael Hölzl" <mh@ins.jku.at>
To: users@ovirt.org
Sent: Monday, September 21, 2015 10:27:08 AM
Subject: [ovirt-users] HA - Fencing not working when host with engine gets shutdown
Hi all,
we are trying to set up an oVirt environment with two hosts, both connected to an iSCSI storage device, a hosted engine and power management configured over iLO. So far it seems to work fine in our testing setup, and starting/stopping VMs works smoothly with proper scheduling between those hosts. So we wanted to test HA for the VMs now and started to manually shut down a host while there are still VMs running on that machine (to simulate a power failure or a kernel panic). The expected outcome was that all machines where HA is enabled are booted again. This works if the machine with the failure does not have the engine running. If the machine with the hosted engine VM gets shut down, the host gets into the "Not Responsive" state and all VMs end up in an unknown state. However, the engine itself starts correctly on the second host and it seems like it tries to fence the other host (as expected).

Events which we get in the open virtualization manager:
1. Host hosted_engine_2 is non responsive
2. Host hosted_engine_1 from cluster Default was chosen as a proxy to execute Status command on Host hosted_engine_2.
3. Host hosted_engine_2 became non responsive. It has no power management configured. Please check the host status, manually reboot it, and click "Confirm Host Has Been Rebooted"
4. Host hosted_engine_2 is not responding. It will stay in Connecting state for a grace period of 124 seconds and after that an attempt to fence the host will be issued.
Event 4 keeps recurring every 3 minutes. Complete engine.log file during engine boot up: http://pastebin.com/D6xS3Wfy

So the engine detects that the machine is not responding and wants to fence it. But although the host has power management configured over iLO, the engine thinks that it does not. As a result, the second host does not get fenced and the VMs are not migrated to the running machine. In the log files there are also a lot of timeout exceptions, but I guess that this is because the host cannot connect to the other machine.
Did anybody face similar problems with HA? Or any clue what the problem might be?
Thanks, Michael
----
oVirt version: 3.5.4
Hosted engine VM OS: CentOS 6.5
Host machines OS: CentOS 7
P.S. We should also note that we had problems with the command fence_ipmilan at the beginning. We were receiving the message "Unable to obtain correct plug status or plug is not available" whenever fence_ipmilan was called. However, the command fence_ilo4 worked. So we now use a simple script in place of fence_ipmilan that calls fence_ilo4 and passes the arguments on.

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
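
For readers wondering what the wrapper mentioned in the P.S. might look like, here is a minimal sketch in Python. It is not the script actually used in this setup (that script was not posted). The names TARGET_AGENT, build_command and main are hypothetical, and it assumes that fence_ilo4 can be found on PATH and accepts the same options that the caller would have given to fence_ipmilan.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for fence_ipmilan that hands every call to fence_ilo4."""
import os
import sys

# Assumption: fence_ilo4 is on PATH; otherwise put an absolute path here.
TARGET_AGENT = "fence_ilo4"


def build_command(argv, target=TARGET_AGENT):
    """Return the command to run: the target agent plus all original arguments."""
    # argv[0] is the wrapper's own name (e.g. fence_ipmilan), so it is dropped.
    return [target] + list(argv[1:])


def main():
    command = build_command(sys.argv)
    # execvp replaces this process with the target agent, so stdin, stdout
    # and the exit status are those of fence_ilo4 from here on.
    os.execvp(command[0], command)
```

To use it as the wrapper, a final `main()` call would be added and the file installed under the name the caller expects. Because the process is replaced rather than spawned, any options supplied on standard input instead of the command line should reach fence_ilo4 unchanged as well.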