On Tue, Sep 19, 2017 at 2:26 PM, Arik Hadas <ahadas@redhat.com> wrote:

> On Tue, Sep 19, 2017 at 12:44 PM, Alex K <rightkicktech@gmail.com> wrote:
>
>> Seems that there is a racing issue somewhere. A second test did not
>> yield the same result: this time the VMs were restarted on another host,
>> and when the lost host recovered, no VMs were running on it.
>
> Did you test with the same VM?

Yes.

> Were the disks + lease located on the same storage domains in both tests?

Yes. In all cases the leases are on the same storage domain, the same one
where the VM disks reside.

> Did the VM run on the same host (and if not, are the libvirt + qemu
> versions different between the two)?

Yes.

> It may be a racing issue, but not necessarily. There is an observation in
> the bug I mentioned before that it happens only (/more) with certain
> storage types...

The storage is based on a gluster volume, replica 3 with 1 arbiter.
The gluster version is 3.8.12.
A third test yielded the same issue: the VMs on the recovered host remained
in paused status.

Thanx,
Alex
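For anyone who wants to repeat the disk + lease placement check that Arik
asked about, it can be read back through the engine API. A minimal sketch
using the Python SDK (ovirtsdk4); the engine URL, credentials, and the VM
name 'myvm' below are placeholder assumptions, not values from this thread:

    # Sketch: print where a VM's lease and disks live (ovirtsdk4).
    import ovirtsdk4 as sdk

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='...',
        ca_file='ca.pem',
    )
    try:
        vms_service = connection.system_service().vms_service()
        vm = vms_service.list(search='name=myvm')[0]

        # The lease element (if a lease is set) carries the id of the
        # storage domain that holds it.
        if vm.lease is not None:
            print('lease storage domain:', vm.lease.storage_domain.id)

        # Each disk reports the storage domain(s) it resides on.
        vm_service = vms_service.vm_service(vm.id)
        for attachment in vm_service.disk_attachments_service().list():
            disk = connection.follow_link(attachment.disk)
            print(disk.name, [sd.id for sd in disk.storage_domains])
    finally:
        connection.close()

If the lease's storage domain id matches the domains reported for the
disks, the lease and disks are co-located as described above.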
On Tue, Sep 19, 2017 at 11:52 AM, Arik Hadas <ahadas@redhat.com> wrote:

> On Tue, Sep 19, 2017 at 11:41 AM, Alex K <rightkicktech@gmail.com> wrote:
>
>> Hi again,
>>
>> I performed a different test by isolating one host (say host A) by
>> removing all its network interfaces (thus power management through IPMI
>> was also not available).
>>
>> The VMs (with VM lease enabled) were successfully restarted on another
>> host. When host A was connected back, the cluster performed power
>> management and the host became a member of the cluster again. The VMs
>> that had been running on host A were found "paused", which is normal.
>>
>> After 15 minutes I see that the VMs on host A are still in "paused"
>> state, and I would expect the cluster to decide at some point to shut
>> down the paused VMs and continue with the VMs that are already running
>> on other hosts. Is this behavior normal?
>
> I believe it is not the expected behavior - the VM should not stay in
> paused state when its lease expires. But we know about this, see comment
> 9 in [1].
>
>> Thanx,
>> Alex
>>
>> On Tue, Sep 19, 2017 at 10:18 AM, Alex K <rightkicktech@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Just completed the tests and it works great. VM leases is just what I
>>> needed.
>>>
>>> Thanx,
>>> Alex
>>>
>>> On Tue, Sep 19, 2017 at 10:16 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
>>>
>>>> On Tue, Sep 19, 2017 at 1:00 AM, Alex K <rightkicktech@gmail.com> wrote:
>>>>
>>>>> Enabling VM leases could be an answer to this. Will test tomorrow.
>>>>
>>>> Indeed. Let us know how it worked for you.
>>>>
>>>>> Thanx,
>>>>> Alex
>>>>>
>>>>> On Sep 18, 2017 7:50 PM, "Alex K" <rightkicktech@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have the following issue with the HA behavior of oVirt 4.1 and
>>>>>> need to check with you whether there is any workaround from your
>>>>>> experience.
>>>>>>
>>>>>> I have 3 servers (A, B, C) with hosted engine in a self-hosted setup
>>>>>> on top of gluster, with replica 3 + 1 arbiter. All good except one
>>>>>> point:
>>>>>>
>>>>>> The hosts have been configured with power management using IPMI
>>>>>> (server iLO). If I disconnect power from one host (say C), or
>>>>>> disconnect all network cables of the host, the two other hosts go
>>>>>> into a loop where they try to verify the status of host C by issuing
>>>>>> power management commands to it. Since the power of the host is off,
>>>>>> the server iLO does not respond on the network, so the power
>>>>>> management of host C fails, leaving the VMs that were running on
>>>>>> host C in an unknown state, and they are never restarted on the
>>>>>> other hosts.
>>>>>>
>>>>>> Is there any fencing option to change this behavior, so that if both
>>>>>> available hosts fail to do power management of the unresponsive
>>>>>> host, the cluster decides that the host is down and restarts the VMs
>>>>>> of that host on the other available hosts?
>>>>
>>>> No, this is a bad assumption. Perhaps they are the ones isolated from
>>>> it?
>>>> Y.
>>>>
>>>>>> I could also add additional power management through UPS to avoid
>>>>>> this issue, but this is not currently an option, and I am interested
>>>>>> to see whether this behavior can be tweaked.
>>>>>>
>>>>>> Thanx,
>>>>>> Alex
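The thread converges on two settings: a VM lease (so another host can take
the VM over) and IPMI power management. Both can also be applied through
the Python SDK. A minimal sketch with ovirtsdk4; the engine URL,
credentials, and the names 'myvm', 'vm-storage', 'hostC' and the iLO
address are placeholder assumptions, and the VM may need to be down for the
lease change to take effect:

    # Sketch: add a VM lease and register an IPMI (iLO) fence agent.
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='...',
        ca_file='ca.pem',
    )
    try:
        system = connection.system_service()

        # 1) Place a lease for VM 'myvm' on storage domain 'vm-storage',
        #    so another host can safely restart it once the lease expires.
        sd = system.storage_domains_service().list(
            search='name=vm-storage')[0]
        vm = system.vms_service().list(search='name=myvm')[0]
        system.vms_service().vm_service(vm.id).update(
            types.Vm(
                lease=types.StorageDomainLease(
                    storage_domain=types.StorageDomain(id=sd.id),
                ),
            ),
        )

        # 2) Register an 'ipmilan' fence agent for host 'hostC' and enable
        #    power management on that host.
        host = system.hosts_service().list(search='name=hostC')[0]
        host_service = system.hosts_service().host_service(host.id)
        host_service.fence_agents_service().add(
            types.Agent(
                address='ilo-hostc.example.com',
                type='ipmilan',
                username='admin',
                password='...',
                order=1,
            ),
        )
        host_service.update(
            types.Host(
                power_management=types.PowerManagement(enabled=True),
            ),
        )
    finally:
        connection.close()

As Yaniv notes above, failed fencing alone does not prove the host is down,
so the lease (not the fence agent) is what lets the cluster restart the VMs
safely elsewhere.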
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users