On Tue, Sep 19, 2017 at 2:26 PM, Arik Hadas <ahadas@redhat.com> wrote:

> On Tue, Sep 19, 2017 at 12:44 PM, Alex K <rightkicktech@gmail.com> wrote:
>
>> Seems that there is a racing issue somewhere. A second test did not
>> yield the same result: this time the VMs were restarted on another host,
>> and when the lost host recovered, no VMs were running on it.
>
> Did you test with the same VM?

Yes.

> Were the disks + lease located on the same storage domains in both tests?

Yes. In all cases the leases are on the same storage domain, the same one
where the VM disks reside.

> Did the VM run on the same host (and if not, are the libvirt + qemu
> versions different between the two)?

Yes.

> It may be a racing issue, but not necessarily. There is an observation in
> the bug I mentioned before that it happens only (/more) with certain
> storage types...

The storage is based on a gluster volume, replica 3 with 1 arbiter.
The gluster version is 3.8.12.
A third test yielded the same issue: the VMs on the recovered host remained
in paused status.

Thanx,
Alex
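For anyone who wants to repeat the disk + lease placement check that Arik
asked about, it can be read back through the engine API. A minimal sketch
using the Python SDK (ovirtsdk4); the engine URL, credentials, and the VM
name 'myvm' below are placeholder assumptions, not values from this thread:

    # Sketch: print where a VM's lease and disks live (ovirtsdk4).
    import ovirtsdk4 as sdk

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='...',
        ca_file='ca.pem',
    )
    try:
        vms_service = connection.system_service().vms_service()
        vm = vms_service.list(search='name=myvm')[0]

        # The lease element (if a lease is set) carries the id of the
        # storage domain that holds it.
        if vm.lease is not None:
            print('lease storage domain:', vm.lease.storage_domain.id)

        # Each disk reports the storage domain(s) it resides on.
        vm_service = vms_service.vm_service(vm.id)
        for attachment in vm_service.disk_attachments_service().list():
            disk = connection.follow_link(attachment.disk)
            print(disk.name, [sd.id for sd in disk.storage_domains])
    finally:
        connection.close()

If the lease's storage domain id matches the domains reported for the
disks, the lease and disks are co-located as described above.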
On Tue, Sep 19, 2017 at 11:52 AM, Arik Hadas <ahadas@redhat.com> wrote:

> On Tue, Sep 19, 2017 at 11:41 AM, Alex K <rightkicktech@gmail.com> wrote:
>
>> Hi again,
>>
>> I performed a different test by isolating one host (say host A) by
>> removing all its network interfaces (thus power management through IPMI
>> was also not available).
>>
>> The VMs (with VM lease enabled) were successfully restarted on another
>> host. When host A was connected back, the cluster performed power
>> management and the host became a member of the cluster again. The VMs
>> that had been running on host A were found "paused", which is normal.
>>
>> After 15 minutes I see that the VMs on host A are still in "paused"
>> state, and I would expect the cluster to decide at some point to shut
>> down the paused VMs and continue with the VMs that are already running
>> on other hosts. Is this behavior normal?
>
> I believe it is not the expected behavior - the VM should not stay in
> paused state when its lease expires. But we know about this, see comment
> 9 in [1].
>
>> Thanx,
>> Alex
>>
>> On Tue, Sep 19, 2017 at 10:18 AM, Alex K <rightkicktech@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Just completed the tests and it works great. VM leases is just what I
>>> needed.
>>>
>>> Thanx,
>>> Alex
>>>
>>> On Tue, Sep 19, 2017 at 10:16 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
>>>
>>>> On Tue, Sep 19, 2017 at 1:00 AM, Alex K <rightkicktech@gmail.com> wrote:
>>>>
>>>>> Enabling VM leases could be an answer to this. Will test tomorrow.
>>>>
>>>> Indeed. Let us know how it worked for you.
>>>>
>>>>> Thanx,
>>>>> Alex
>>>>>
>>>>> On Sep 18, 2017 7:50 PM, "Alex K" <rightkicktech@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have the following issue with the HA behavior of oVirt 4.1 and
>>>>>> need to check with you whether there is any workaround from your
>>>>>> experience.
>>>>>>
>>>>>> I have 3 servers (A, B, C) with hosted engine in a self-hosted setup
>>>>>> on top of gluster, with replica 3 + 1 arbiter. All good except one
>>>>>> point:
>>>>>>
>>>>>> The hosts have been configured with power management using IPMI
>>>>>> (server iLO). If I disconnect power from one host (say C), or
>>>>>> disconnect all network cables of the host, the two other hosts go
>>>>>> into a loop where they try to verify the status of host C by issuing
>>>>>> power management commands to it. Since the power of the host is off,
>>>>>> the server iLO does not respond on the network, so the power
>>>>>> management of host C fails, leaving the VMs that were running on
>>>>>> host C in an unknown state, and they are never restarted on the
>>>>>> other hosts.
>>>>>>
>>>>>> Is there any fencing option to change this behavior, so that if both
>>>>>> available hosts fail to do power management of the unresponsive
>>>>>> host, the cluster decides that the host is down and restarts the VMs
>>>>>> of that host on the other available hosts?
>>>>
>>>> No, this is a bad assumption. Perhaps they are the ones isolated from
>>>> it?
>>>> Y.
>>>>
>>>>>> I could also add additional power management through UPS to avoid
>>>>>> this issue, but this is not currently an option, and I am interested
>>>>>> to see whether this behavior can be tweaked.
>>>>>>
>>>>>> Thanx,
>>>>>> Alex
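The thread converges on two settings: a VM lease (so another host can take
the VM over) and IPMI power management. Both can also be applied through
the Python SDK. A minimal sketch with ovirtsdk4; the engine URL,
credentials, and the names 'myvm', 'vm-storage', 'hostC' and the iLO
address are placeholder assumptions, and the VM may need to be down for the
lease change to take effect:

    # Sketch: add a VM lease and register an IPMI (iLO) fence agent.
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='...',
        ca_file='ca.pem',
    )
    try:
        system = connection.system_service()

        # 1) Place a lease for VM 'myvm' on storage domain 'vm-storage',
        #    so another host can safely restart it once the lease expires.
        sd = system.storage_domains_service().list(
            search='name=vm-storage')[0]
        vm = system.vms_service().list(search='name=myvm')[0]
        system.vms_service().vm_service(vm.id).update(
            types.Vm(
                lease=types.StorageDomainLease(
                    storage_domain=types.StorageDomain(id=sd.id),
                ),
            ),
        )

        # 2) Register an 'ipmilan' fence agent for host 'hostC' and enable
        #    power management on that host.
        host = system.hosts_service().list(search='name=hostC')[0]
        host_service = system.hosts_service().host_service(host.id)
        host_service.fence_agents_service().add(
            types.Agent(
                address='ilo-hostc.example.com',
                type='ipmilan',
                username='admin',
                password='...',
                order=1,
            ),
        )
        host_service.update(
            types.Host(
                power_management=types.PowerManagement(enabled=True),
            ),
        )
    finally:
        connection.close()

As Yaniv notes above, failed fencing alone does not prove the host is down,
so the lease (not the fence agent) is what lets the cluster restart the VMs
safely elsewhere.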
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users