On Tue, Sep 19, 2017 at 3:27 PM, Alex K <rightkicktech(a)gmail.com> wrote:
On Tue, Sep 19, 2017 at 2:26 PM, Arik Hadas <ahadas(a)redhat.com> wrote:
>
> On Tue, Sep 19, 2017 at 12:44 PM, Alex K <rightkicktech(a)gmail.com> wrote:
>
>> A second test did not yield the same result.
>> This time the VMs were restarted to another host and when the lost host
>> recovered no VMs were running on it.
>> Seems that there is a racing issue somewhere.
>>
>
> Did you test with the same VM?
>
Yes
> were the disks + lease located on the same storage domains in both tests?
>
Yes. In all cases the leases are on the same storage domain, the one where
the VM disks reside.
> did the VM run on the same host (and if not, are the libvirt + qemu
> versions different between the two?).
>
Yes
> It may be a racing issue but not necessarily. There is an observation in
> the bug I mentioned before that it happens only (/more) with certain
> storage types...
>
The storage is based on a gluster volume, replica 3 with 1 arbiter.
The gluster version is 3.8.12.
A third test yielded the same issue: the VMs on the recovered host remained
in paused status.
Ack, thanks.
So I suggest you add yourself (as CC) to [1] so you will be informed
about its resolution. In light of your answers it does look like a
racing issue.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1459865
>
>>
>> Thanx,
>> Alex
>>
>>
>> On Tue, Sep 19, 2017 at 11:52 AM, Arik Hadas <ahadas(a)redhat.com> wrote:
>>
>>>
>>>
>>> On Tue, Sep 19, 2017 at 11:41 AM, Alex K <rightkicktech(a)gmail.com>
>>> wrote:
>>>
>>>> Hi again,
>>>>
>>>> I performed a different test by isolating one host (say host A)
>>>> through removing all its network interfaces (thus power management
>>>> through IPMI was also not available).
>>>> The VMs (with VM lease enabled) were successfully restarted to another
>>>> host.
>>>> When connecting host A back, the cluster performed power
>>>> management and the host became a member of the cluster.
>>>> The VMs that were running on host A were found "paused", which is
>>>> normal.
>>>> After 15 minutes I see that the VMs on host A are still in "paused"
>>>> state, and I would expect the cluster to decide at some point to
>>>> shut down the paused VMs and continue with the VMs that are already
>>>> running on other hosts.
>>>>
>>>> Is this behavior normal?
>>>>
>>>
>>> I believe it is not the expected behavior - the VM should not stay in
>>> paused state when its lease expires. But we know about this, see comment 9
>>> in [1].
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1459865
>>>
>>>
>>>>
>>>> Thanx,
>>>> Alex
>>>>
>>>> On Tue, Sep 19, 2017 at 10:18 AM, Alex K <rightkicktech(a)gmail.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Just completed the tests and it works great.
>>>>> VM leases are just what I needed.
>>>>>
>>>>> Thanx,
>>>>> Alex
>>>>>
>>>>> On Tue, Sep 19, 2017 at 10:16 AM, Yaniv Kaul <ykaul(a)redhat.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 19, 2017 at 1:00 AM, Alex K <rightkicktech(a)gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Enabling VM leases could be an answer to this. Will test tomorrow.
>>>>>>>
>>>>>>>
>>>>>> Indeed. Let us know how it worked for you.
>>>>>>
>>>>>>
>>>>>>> Thanx,
>>>>>>> Alex
>>>>>>>
>>>>>>> On Sep 18, 2017 7:50 PM, "Alex K" <rightkicktech(a)gmail.com> wrote:
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have the following issue with the HA behavior of oVirt 4.1 and
>>>>>>> need to check with you if there is any workaround from your
>>>>>>> experience.
>>>>>>>
>>>>>>> I have 3 servers (A, B, C) with hosted engine in a self-hosted setup
>>>>>>> on top of gluster with replica 3 + 1 arbiter. All good except one
>>>>>>> point:
>>>>>>>
>>>>>>> The hosts have been configured with power management using IPMI
>>>>>>> (server iLO).
>>>>>>> If I disconnect power from one host (say C) (or disconnect all
>>>>>>> network cables of the host), the two other hosts go into a loop where
>>>>>>> they try to verify the status of host C by issuing power management
>>>>>>> commands to it. Since the host's power is off, the server iLO does not
>>>>>>> respond on the network and the power management of host C fails,
>>>>>>> leaving the VMs that were running on host C in an unknown state, and
>>>>>>> they are never restarted on the other hosts.
>>>>>>>
>>>>>>> Is there any fencing option to change this behavior so that, if both
>>>>>>> available hosts fail to do power management of the unresponsive host,
>>>>>>> they decide that the host is down and restart the VMs of that host on
>>>>>>> the other available hosts?
>>>>>>>
>>>>>>>
>>>>>> No, this is a bad assumption. Perhaps they are the ones isolated
>>>>>> from it?
>>>>>> Y.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I could also add additional power management through a UPS to avoid
>>>>>>> this issue, but this is not currently an option and I am interested
>>>>>>> to see if this behavior can be tweaked.
>>>>>>>
>>>>>>> Thanx,
>>>>>>> Alex
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users(a)ovirt.org
>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>