----- Original Message -----
> From: "Giuseppe Ragusa" <giuseppe.ragusa(a)hotmail.com>
> To: fsimonce(a)redhat.com
> Cc: users(a)ovirt.org
> Sent: Wednesday, May 21, 2014 5:15:30 PM
> Subject: sanlock + gluster recovery -- RFE
>
> Hi,
>
>> ----- Original Message -----
>>> From: "Ted Miller" <tmiller at hcjb.org>
>>> To: "users" <users at ovirt.org>
>>> Sent: Tuesday, May 20, 2014 11:31:42 PM
>>> Subject: [ovirt-users] sanlock + gluster recovery -- RFE
>>>
>>> As you are aware, there is an ongoing split-brain problem with running
>>> sanlock on replicated gluster storage. Personally, I believe that this is
>>> the 5th time that I have been bitten by this sanlock+gluster problem.
>>>
>>> I believe that the following are true (if not, my entire request is
>>> probably off base).
>>>
>>>
>>> * ovirt uses sanlock in such a way that when the sanlock storage is on a
>>>   replicated gluster file system, very small storage disruptions can
>>>   result in a gluster split-brain on the sanlock space
>>
>> Although this is possible (at the moment), we are working hard to avoid it.
>> The hardest part here is to ensure that the gluster volume is properly
>> configured.
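>>
>> (As a side note, when a split-brain does occur the affected files can be
>> listed from any of the servers with the standard heal command, where
>> VOLNAME is a placeholder for the actual volume name:
>>
>>   gluster volume heal VOLNAME info split-brain
>>
>> which at least makes it easy to confirm whether the sanlock lease files
>> are among the affected entries.)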
>>
>> The suggested configuration for a volume to be used with ovirt is:
>>
>> Volume Name: (...)
>> Type: Replicate
>> Volume ID: (...)
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> (...three bricks...)
>> Options Reconfigured:
>> network.ping-timeout: 10
>> cluster.quorum-type: auto
>>
>> The two options ping-timeout and quorum-type are really important.
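>>
>> For example, on an existing volume these can be set (and then verified
>> under "Options Reconfigured" in "gluster volume info") with the following
>> commands, VOLNAME again being a placeholder for the actual volume name:
>>
>>   gluster volume set VOLNAME network.ping-timeout 10
>>   gluster volume set VOLNAME cluster.quorum-type auto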
>>
>> You would also need a build where this bug is fixed in order to avoid any
>> chance of a split-brain:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1066996
>
> It seems that the aforementioned bug is peculiar to 3-brick setups.
>
> I understand that a 3-brick setup can allow proper quorum formation without
> resorting to the "first-configured-brick-has-more-weight" convention used with
> only 2 bricks and quorum "auto" (which makes one node "special", so not
> properly any-single-fault tolerant).
Correct.
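For reference, a replica 3 volume of the kind suggested above could be created
along these lines (host names and brick paths are placeholders), with the two
options quoted earlier then set on it in the same way:

  gluster volume create VOLNAME replica 3 \
      node1:/export/brick1 node2:/export/brick1 node3:/export/brick1
  gluster volume start VOLNAME

With three bricks, quorum "auto" just requires a majority (2 of 3) of the
bricks to be reachable before writes are allowed, so no single brick is
special.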
> But, since we are on ovirt-users, is there a similar suggested configuration
> for a 2-host oVirt+GlusterFS setup with oVirt-side power management
> properly configured and tested as working?
> I mean a configuration where "any" host can go south and oVirt (through the
> other one) fences it (forcibly powering it off with confirmation from IPMI
> or similar), then restarts HA-marked VMs that were running there, all the
> while keeping the underlying GlusterFS-based storage domains responsive and
> readable/writeable (maybe apart from a lapse between detected other-node
> unresponsiveness and confirmed fencing)?
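As an aside, the IPMI leg of power management can be sanity-checked from the
other host with the same fence agents oVirt uses under the hood, e.g.
fence_ipmilan (address and credentials below are placeholders):

  fence_ipmilan -a 192.0.2.10 -l admin -p secret -o status

(oVirt/VDSM drives the same fence agents when it performs the actual fencing.)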
We have already had a discussion with the Gluster team about whether it is
possible to add fencing to the replica 2 quorum/consistency mechanism.
The idea is that as soon as you can't replicate a write, you have to
freeze all I/O until either the connection is re-established or you
know that the other host has been killed.
Adding Vijay.
There is a related thread on gluster-devel [1] about improving GlusterFS
behavior to prevent split-brains with sanlock and 2-way replicated
gluster volumes.
Please feel free to comment on the proposal there.
Thanks,
Vijay
[1]