[ovirt-users] oVirt HA behavior
Arik Hadas
ahadas at redhat.com
Tue Sep 19 19:25:14 UTC 2017
On Tue, Sep 19, 2017 at 3:27 PM, Alex K <rightkicktech at gmail.com> wrote:
> On Tue, Sep 19, 2017 at 2:26 PM, Arik Hadas <ahadas at redhat.com> wrote:
>
>>
>> On Tue, Sep 19, 2017 at 12:44 PM, Alex K <rightkicktech at gmail.com> wrote:
>>
>>> A second test did not yield the same result.
>>> This time the VMs were restarted on another host, and when the lost host
>>> recovered no VMs were running on it.
>>> It seems there is a race condition somewhere.
>>>
>>
>> Did you test with the same VM?
>>
> Yes
>
>> Were the disks + lease located on the same storage domains in both tests?
>>
> Yes. In all cases the leases are on the same storage domain, the one where
> the VM disks reside.
>
>> Did the VM run on the same host (and if not, do the libvirt + qemu
>> versions differ between the two)?
>>
> Yes
>
>> It may be a race condition but not necessarily. There is an observation in
>> the bug I mentioned before that it happens only (/more) with certain
>> storage types...
>>
> The storage is based on a gluster volume, replica 3 with 1 arbiter.
> The gluster version is 3.8.12.
> A third test yielded the same issue: the VMs on the recovered host remained
> in paused status.
>
>
Ack, thanks.
So I suggest you add yourself (as CC) to [1] so you will be informed
about its resolution. In light of your answers it does look like a
race condition.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1459865
>
>>
>>>
>>> Thanx,
>>> Alex
>>>
>>>
>>> On Tue, Sep 19, 2017 at 11:52 AM, Arik Hadas <ahadas at redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Sep 19, 2017 at 11:41 AM, Alex K <rightkicktech at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi again,
>>>>>
>>>>> I performed a different test by isolating one host (say host A)
>>>>> by disconnecting all its network interfaces (so power management through
>>>>> IPMI was also not available).
>>>>> The VMs (with VM lease enabled) were successfully restarted on another
>>>>> host.
>>>>> When I reconnected host A, the cluster performed a power management
>>>>> action and the host became a member of the cluster again.
>>>>> The VMs that had been running on host A were found "paused", which is
>>>>> normal.
>>>>> After 15 minutes the VMs on host A are still in the "paused"
>>>>> state. I would expect the cluster to decide at some point to
>>>>> shut down the paused VMs and continue with the VMs that are already running
>>>>> on other hosts.
>>>>>
>>>>> Is this behavior normal?
>>>>>
>>>>
>>>> I believe it is not the expected behavior - the VM should not stay in
>>>> paused state when its lease expires. But we know about this, see comment 9
>>>> in [1].
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1459865
>>>>
>>>>
>>>>>
>>>>> Thanx,
>>>>> Alex
>>>>>
>>>>> On Tue, Sep 19, 2017 at 10:18 AM, Alex K <rightkicktech at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Just completed the tests and it works great.
>>>>>> VM leases are just what I needed.
>>>>>>
>>>>>> Thanx,
>>>>>> Alex
>>>>>>
>>>>>> On Tue, Sep 19, 2017 at 10:16 AM, Yaniv Kaul <ykaul at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 19, 2017 at 1:00 AM, Alex K <rightkicktech at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Enabling VM leases could be an answer to this. Will test tomorrow.
>>>>>>>>
>>>>>>>>
>>>>>>> Indeed. Let us know how it worked for you.
>>>>>>>
>>>>>>>
>>>>>>>> Thanx,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> On Sep 18, 2017 7:50 PM, "Alex K" <rightkicktech at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I have the following issue with the HA behavior of oVirt 4.1 and
>>>>>>>> need to check with you if there is any workaround from your experience.
>>>>>>>>
>>>>>>>> I have 3 servers (A, B, C) with hosted engine in a self-hosted setup
>>>>>>>> on top of gluster with replica 3 + 1 arbiter. All good except one point:
>>>>>>>>
>>>>>>>> The hosts have been configured with power management using IPMI
>>>>>>>> (server iLO).
>>>>>>>> If I disconnect power from one host (say C), or disconnect all
>>>>>>>> network cables of the host, the two other hosts go into a loop where they try
>>>>>>>> to verify the status of host C by issuing power management commands to
>>>>>>>> it. Since the host's power is off, the server iLO does not respond on
>>>>>>>> the network and the power management of host C fails, leaving the VMs that
>>>>>>>> were running on host C in an unknown state, and they are never restarted
>>>>>>>> on the other hosts.
>>>>>>>>
>>>>>>>> Is there any fencing option to change this behavior, so that if both
>>>>>>>> available hosts fail to do power management of the unresponsive host, they
>>>>>>>> decide that the host is down and restart the VMs of that host on the
>>>>>>>> other available hosts?
>>>>>>>>
>>>>>>>>
>>>>>>> No, this is a bad assumption. Perhaps they are the ones isolated
>>>>>>> from it?
>>>>>>> Y.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> I could also add additional power management through UPS to avoid
>>>>>>>> this issue but this is not currently an option and I am interested to see
>>>>>>>> if this behavior can be tweaked.
>>>>>>>>
>>>>>>>> Thanx,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> Users at ovirt.org
>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>