Re: [ovirt-users] oVirt HA behavior

19 Sep 2017


On Tue, Sep 19, 2017 at 3:27 PM, Alex K <rightkicktech@gmail.com> wrote:
...
On Tue, Sep 19, 2017 at 2:26 PM, Arik Hadas <ahadas@redhat.com> wrote:
...
On Tue, Sep 19, 2017 at 12:44 PM, Alex K <rightkicktech@gmail.com> wrote:
...
A second test did not yield the same result.
This time the VMs were restarted on another host, and when the lost host
recovered no VMs were running on it.
It seems that there is a race condition somewhere.
Did you test with the same VM?
Yes
...
were the disks + lease located on the same storage domains in both tests?
Yes. In all cases the leases are on the same storage domain, the same one
where the VM disks reside.
...
did the VM run on the same host (and if not, are the libvirt + qemu
versions different between the two?).
Yes
...
It may be a race condition but not necessarily. There is an observation in
the bug I mentioned before that it happens only (/more) with certain
storage types...
The storage is based on gluster volume, replica 3 with 1 arbiter.
The gluster version is 3.8.12.
A third test reproduced the issue: the VMs on the recovered host remained
in paused status.
Ack, thanks.
So I suggest you add yourself (as CC) to [1] so you will be informed
about the resolution for this. In light of your answers it does look like a
race condition.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1459865
...
...
...
Thanx,
Alex
On Tue, Sep 19, 2017 at 11:52 AM, Arik Hadas <ahadas@redhat.com> wrote:
...
On Tue, Sep 19, 2017 at 11:41 AM, Alex K <rightkicktech@gmail.com>
wrote:
...
Hi again,
I performed a different test by isolating one host (say host A)
by removing all its network interfaces (thus power management through
IPMI was also not available).
The VMs (with VM lease enabled) were successfully restarted on another
host.
After reconnecting host A, the cluster performed a power management
operation on it and the host rejoined the cluster.
The VMs that were running on host A were found "paused", which is
normal.
After 15 minutes I see that the VMs on host A are still in "paused"
state, and I would expect the cluster to decide at some point to
shut down the paused VMs and continue with the VMs that are already running
on other hosts.
Is this behavior normal?
I believe it is not the expected behavior - the VM should not stay in
paused state when its lease expires. But we know about this, see comment 9
in [1].
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1459865
...
Thanx,
Alex
On Tue, Sep 19, 2017 at 10:18 AM, Alex K <rightkicktech@gmail.com>
wrote:
...
Hi All,
Just completed the tests and it works great.
VM leases are just what I needed.
Thanx,
Alex
On Tue, Sep 19, 2017 at 10:16 AM, Yaniv Kaul <ykaul@redhat.com>
wrote:
>
>
> On Tue, Sep 19, 2017 at 1:00 AM, Alex K <rightkicktech@gmail.com>
> wrote:
>
>> Enabling VM leases could be an answer to this. Will test tomorrow.
>>
>>
> Indeed. Let us know how it worked for you.
>
>
>> Thanx,
>> Alex
>>
>> On Sep 18, 2017 7:50 PM, "Alex K" <rightkicktech@gmail.com> wrote:
>>
>> Hi All,
>>
>> I have the following issue with the HA behavior of oVirt 4.1 and
>> need to check with you if there is any workaround from your experience.
>>
>> I have 3 servers (A, B, C) with hosted engine in a self-hosted setup
>> on top of gluster with replica 3 + 1 arbiter. All good except one point:
>>
>> The hosts have been configured with power management using IPMI
>> (server iLO).
>> If I disconnect power from one host (say C) (or disconnect all
>> network cables of the host), the two other hosts go into a loop where they try
>> to verify the status of host C by issuing power management commands to
>> it. Since the host's power is off, the server iLO does not respond on
>> the network and the power management of host C fails, leaving the VMs that
>> were running on host C in an unknown state, and they are never restarted
>> on the other hosts.
>>
>> Is there any fencing option to change this behavior, so that if both
>> available hosts fail to perform power management on the unresponsive host,
>> they decide that the host is down and restart the VMs of that host on the
>> other available hosts?
>>
>>
> No, this is a bad assumption. Perhaps they are the ones isolated
> from it?
> Y.
>
>
>>
>> I could also add additional power management through UPS to avoid
>> this issue but this is not currently an option and I am interested to see
>> if this behavior can be tweaked.
>>
>> Thanx,
>> Alex
>>
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>
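The reasoning in this thread (a failed fence attempt is not proof that a host is down, while an expired VM lease on shared storage is a safe signal) can be sketched roughly as follows. This is only an illustrative model, not oVirt engine code: the names HostView, VmView and may_restart_elsewhere are hypothetical, and the real engine, fencing and sanlock logic is considerably more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostView:
    """What the remaining hosts can observe about the silent host (hypothetical model)."""
    reachable: bool               # does the host answer on the management network?
    fence_result: Optional[str]   # "off", "on", or None when the fence device did not answer


@dataclass
class VmView:
    """Lease state of an HA VM as seen on shared storage (hypothetical model)."""
    has_lease: bool       # was a storage lease configured for the VM?
    lease_expired: bool   # has the lease stopped being renewed?


def may_restart_elsewhere(host: HostView, vm: VmView) -> bool:
    """Return True only when restarting the VM cannot create a second running copy."""
    if host.reachable:
        # The host is alive and visible, so the VM is still its responsibility.
        return False
    if host.fence_result == "off":
        # Power management confirmed the host is off.
        return True
    # Fencing gave no answer: the host may be down, or the observers may be the
    # isolated ones. Only an expired lease makes a restart safe in this model.
    return vm.has_lease and vm.lease_expired


# Power cable pulled, iLO silent, no VM lease: the VM is never restarted.
print(may_restart_elsewhere(HostView(False, None), VmView(False, False)))
# Same failure with a VM lease that has expired: restart is allowed.
print(may_restart_elsewhere(HostView(False, None), VmView(True, True)))
```

The first call corresponds to the original report (power management fails and the VMs stay in an unknown state); the second corresponds to the later tests with VM leases enabled.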