[ovirt-users] oVirt HA behavior

Alex K rightkicktech at gmail.com
Tue Sep 19 12:27:07 UTC 2017


On Tue, Sep 19, 2017 at 2:26 PM, Arik Hadas <ahadas at redhat.com> wrote:

>
> On Tue, Sep 19, 2017 at 12:44 PM, Alex K <rightkicktech at gmail.com> wrote:
>
>> A second test did not yield the same result.
>> This time the VMs were restarted on another host, and when the lost host
>> recovered, no VMs were running on it.
>> It seems there is a race condition somewhere.
>>
>
> Did you test with the same VM?
>
Yes

> Were the disks and lease located on the same storage domains in both tests?
>
Yes. In all cases the leases are on the same storage domain where the
VM disks reside.

> Did the VM run on the same host (and if not, are the libvirt and qemu
> versions different between the two)?
>
Yes

> It may be a race condition, but not necessarily. The bug I mentioned
> before notes that it happens only (or more often) with certain
> storage types...
>
The storage is based on a gluster volume, replica 3 with 1 arbiter.
The gluster version is 3.8.12.
A third test reproduced the issue: the VMs on the recovered host remained
paused.


>
>
>>
>> Thanx,
>> Alex
>>
>>
>> On Tue, Sep 19, 2017 at 11:52 AM, Arik Hadas <ahadas at redhat.com> wrote:
>>
>>>
>>>
>>> On Tue, Sep 19, 2017 at 11:41 AM, Alex K <rightkicktech at gmail.com>
>>> wrote:
>>>
>>>> Hi again,
>>>>
>>>> I performed a different test by isolating one host (say host A) by
>>>> removing all its network interfaces (so power management through IPMI was
>>>> also not available).
>>>> The VMs (with VM lease enabled) were successfully restarted on another
>>>> host.
>>>> When I reconnected host A, the cluster performed a power management
>>>> action on it and the host rejoined the cluster.
>>>> The VMs that had been running on host A were found "paused", which is
>>>> normal.
>>>> After 15 minutes the VMs on host A are still in the "paused" state. I
>>>> would expect the cluster to decide at some point to shut down the paused
>>>> VMs and keep the instances that are already running on other hosts.
>>>>
>>>> Is this behavior normal?
>>>>
>>>
>>> I believe this is not the expected behavior: the VM should not stay
>>> paused once its lease expires. But we are aware of this; see comment 9
>>> in [1].
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1459865
>>>
>>>
>>>>
>>>> Thanx,
>>>> Alex
>>>>
>>>> On Tue, Sep 19, 2017 at 10:18 AM, Alex K <rightkicktech at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I just completed the tests and it works great.
>>>>> VM leases are just what I needed.
>>>>>
>>>>> Thanx,
>>>>> Alex
>>>>>
>>>>> On Tue, Sep 19, 2017 at 10:16 AM, Yaniv Kaul <ykaul at redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 19, 2017 at 1:00 AM, Alex K <rightkicktech at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Enabling VM leases could be an answer to this. I will test tomorrow.
>>>>>>>
>>>>>>>
>>>>>> Indeed. Let us know how it worked for you.
>>>>>>
>>>>>>
>>>>>>> Thanx,
>>>>>>> Alex
>>>>>>>
>>>>>>> On Sep 18, 2017 7:50 PM, "Alex K" <rightkicktech at gmail.com> wrote:
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have the following issue with the HA behavior of oVirt 4.1 and
>>>>>>> would like to know whether, from your experience, there is any workaround.
>>>>>>>
>>>>>>> I have 3 servers (A, B, C) running a self-hosted engine setup
>>>>>>> on top of gluster with replica 3 + 1 arbiter. Everything works well except one point:
>>>>>>>
>>>>>>> The hosts have been configured with power management using IPMI
>>>>>>> (server iLO).
>>>>>>> If I disconnect power from one host (say C), or disconnect all of its
>>>>>>> network cables, the other two hosts enter a loop in which they try to
>>>>>>> verify the status of host C by issuing power management commands to it.
>>>>>>> Since the host has no power, its iLO does not respond on the network and
>>>>>>> power management of host C fails. The VMs that were running on host C are
>>>>>>> left in an unknown state and are never restarted on the other hosts.
>>>>>>>
>>>>>>> Is there any fencing option to change this behavior so that, if both
>>>>>>> available hosts fail to perform power management on the unresponsive
>>>>>>> host, the cluster decides that the host is down and restarts its VMs on
>>>>>>> the other available hosts?
>>>>>>>
>>>>>>>
>>>>>> No, this is a bad assumption. Perhaps they are the ones isolated from
>>>>>> it?
>>>>>> Y.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I could also add power management through a UPS to avoid this issue,
>>>>>>> but that is not currently an option, and I would like to know whether
>>>>>>> this behavior can be tweaked.
>>>>>>>
>>>>>>> Thanx,
>>>>>>> Alex
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users at ovirt.org
>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>