[ovirt-users] sanlock + gluster recovery -- RFE

Fri May 23 10:55:06 UTC 2014

On 05/21/2014 10:22 PM, Federico Simoncelli wrote:
> ----- Original Message -----
>> From: "Giuseppe Ragusa" <giuseppe.ragusa at hotmail.com>
>> To: fsimonce at redhat.com
>> Cc: users at ovirt.org
>> Sent: Wednesday, May 21, 2014 5:15:30 PM
>> Subject: sanlock + gluster recovery -- RFE
>>
>> Hi,
>>
>>> ----- Original Message -----
>>>> From: "Ted Miller" <tmiller at hcjb.org>
>>>> To: "users" <users at ovirt.org>
>>>> Sent: Tuesday, May 20, 2014 11:31:42 PM
>>>> Subject: [ovirt-users] sanlock + gluster recovery -- RFE
>>>>
>>>> As you are aware, there is an ongoing split-brain problem with running
>>>> sanlock on replicated gluster storage. Personally, I believe that this is
>>>> the 5th time that I have been bitten by this sanlock+gluster problem.
>>>>
>>>> I believe that the following are true (if not, my entire request is
>>>> probably off base).
>>>>
>>>>
>>>>      * ovirt uses sanlock in such a way that when the sanlock storage is
>>>>      on a replicated gluster file system, very small storage disruptions
>>>>      can result in a gluster split-brain on the sanlock space
>>>
>>> Although this is possible (at the moment) we are working hard to avoid it.
>>> The hardest part here is to ensure that the gluster volume is properly
>>> configured.
>>>
>>> The suggested configuration for a volume to be used with ovirt is:
>>>
>>> Volume Name: (...)
>>> Type: Replicate
>>> Volume ID: (...)
>>> Status: Started
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> (...three bricks...)
>>> Options Reconfigured:
>>> network.ping-timeout: 10
>>> cluster.quorum-type: auto
>>>
>>> The two options ping-timeout and quorum-type are really important.
>>>
>>> You would also need a build where this bug is fixed in order to avoid any
>>> chance of a split-brain:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1066996
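
For anyone applying the two options quoted above, here is a minimal sketch
that only prints the corresponding "gluster volume set" commands (a dry run;
it does not run them). VOLNAME is a placeholder and the helper name
volume_set_commands is hypothetical, so substitute your own volume name and
check the commands before executing them on a real cluster.

```python
import shlex

# Placeholder volume name (an assumption; replace with your own volume).
VOLUME = "VOLNAME"

# The two options called out as important in the quoted configuration.
OPTIONS = {
    "network.ping-timeout": "10",
    "cluster.quorum-type": "auto",
}


def volume_set_commands(volume, options):
    """Build one 'gluster volume set' argv list per option (hypothetical helper)."""
    return [
        ["gluster", "volume", "set", volume, key, value]
        for key, value in options.items()
    ]


# Dry run: print the commands instead of executing them.
for argv in volume_set_commands(VOLUME, OPTIONS):
    print(shlex.join(argv))
```
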
>>
>> It seems that the aforementioned bug is peculiar to 3-brick setups.
>>
>> I understand that a 3-brick setup can allow proper quorum formation without
>> resorting to the "first-configured-brick-has-more-weight" convention used
>> with only 2 bricks and quorum "auto" (which makes one node "special", so
>> not properly tolerant of any single fault).
>
> Correct.
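
To illustrate the point about 2 versus 3 bricks, here is a small sketch of
the quorum "auto" rule as described above: quorum holds when more than half
of the bricks are up, and a tie at exactly half is broken in favour of the
first configured brick. The function name quorum_met is made up for this
example; this is a model of the rule, not GlusterFS code.

```python
def quorum_met(bricks_up):
    """Model of the quorum 'auto' rule (illustrative only).

    bricks_up is a list of booleans in brick-configuration order,
    True meaning that brick is reachable.
    """
    total = len(bricks_up)
    alive = sum(bricks_up)
    if 2 * alive > total:
        # Strict majority of bricks is up.
        return True
    # Exactly half up: the first configured brick breaks the tie.
    return 2 * alive == total and bricks_up[0]


# Replica 2: losing the second brick keeps quorum, losing the first does not.
print(quorum_met([True, False]))
print(quorum_met([False, True]))

# Replica 3: any single brick may fail without losing quorum.
print(all(quorum_met([i != down for i in range(3)]) for down in range(3)))
```

With 2 bricks the outcome depends on which brick fails, which is the
"special node" problem; with 3 bricks every single failure is treated alike.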
>
>> But, since we are on ovirt-users, is there a similar suggested configuration
>> for a 2-host oVirt+GlusterFS setup with oVirt-side power management
>> properly configured and tested-working?
>> I mean a configuration where "any" host can go south and oVirt (through the
>> other one) fences it (forcibly powering it off with confirmation from IPMI
>> or similar) then restarts HA-marked vms that were running there, all the
>> while keeping the underlying GlusterFS-based storage domains responsive and
>> readable/writeable (maybe apart from a lapse between detected other-node
>> unresponsiveness and confirmed fencing)?
>
> We already had a discussion with gluster asking if it was possible to
> add fencing to the replica 2 quorum/consistency mechanism.
>
> The idea is that as soon as you can't replicate a write you have to
> freeze all IO until either the connection is re-established or you
> know that the other host has been killed.
>
> Adding Vijay.
>
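
The freeze-until-fenced idea quoted above can be sketched as a simple
decision function. This is only a model of the proposal as worded in this
thread, under the assumption that the storage layer can learn whether the
peer is reachable and whether fencing has been confirmed; the names
io_decision, peer_reachable and peer_confirmed_fenced are hypothetical and
do not correspond to any GlusterFS or oVirt API.

```python
def io_decision(peer_reachable, peer_confirmed_fenced):
    """Decide what a replica-2 node does with a pending write (illustrative).

    peer_reachable: the write can be replicated to the other host.
    peer_confirmed_fenced: the other host is known to have been killed.
    """
    if peer_reachable:
        # Normal case: the write goes to both replicas.
        return "replicate"
    if peer_confirmed_fenced:
        # The peer is known dead, so it cannot accept conflicting writes;
        # continuing alone cannot produce a split-brain.
        return "write-local"
    # Peer state unknown: freeze all IO until it reconnects or is fenced.
    return "freeze"


for reachable, fenced in [(True, False), (False, False), (False, True)]:
    print(reachable, fenced, io_decision(reachable, fenced))
```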

There is a related thread on gluster-devel [1] about improving GlusterFS
behavior to prevent split-brain when sanlock is used on 2-way replicated
gluster volumes.

Please feel free to comment on the proposal there.

Thanks,
Vijay

[1] http://supercolony.gluster.org/pipermail/gluster-devel/2014-May/040751.html