----- Original Message -----
> From: "Giuseppe Ragusa" <giuseppe.ragusa(a)hotmail.com>
> To: fsimonce(a)redhat.com
> Cc: users(a)ovirt.org
> Sent: Wednesday, May 21, 2014 5:15:30 PM
> Subject: sanlock + gluster recovery -- RFE
>
> Hi,
>
>> ----- Original Message -----
>>> From: "Ted Miller" <tmiller at hcjb.org>
>>> To: "users" <users at ovirt.org>
>>> Sent: Tuesday, May 20, 2014 11:31:42 PM
>>> Subject: [ovirt-users] sanlock + gluster recovery -- RFE
>>>
>>> As you are aware, there is an ongoing split-brain problem with running
>>> sanlock on replicated gluster storage. Personally, I believe that this is
>>> the 5th time that I have been bitten by this sanlock+gluster problem.
>>>
>>> I believe that the following are true (if not, my entire request is
>>> probably off base).
>>>
>>>
>>> * ovirt uses sanlock in such a way that when the sanlock storage is on a
>>>   replicated gluster file system, very small storage disruptions can
>>>   result in a gluster split-brain on the sanlock space
>>
>> Although this is possible (at the moment), we are working hard to avoid it.
>> The hardest part here is to ensure that the gluster volume is properly
>> configured.
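>>
>> (As a side note, when a split-brain does occur the affected files can be
>> listed from any of the servers with the standard heal command, where
>> VOLNAME is a placeholder for the actual volume name:
>>
>>   gluster volume heal VOLNAME info split-brain
>>
>> which at least makes it easy to confirm whether the sanlock lease files
>> are among the affected entries.)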
>>
>> The suggested configuration for a volume to be used with ovirt is:
>>
>> Volume Name: (...)
>> Type: Replicate
>> Volume ID: (...)
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> (...three bricks...)
>> Options Reconfigured:
>> network.ping-timeout: 10
>> cluster.quorum-type: auto
>>
>> The two options ping-timeout and quorum-type are really important.
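>>
>> For example, on an existing volume these can be set (and then verified
>> under "Options Reconfigured" in "gluster volume info") with the following
>> commands, VOLNAME again being a placeholder for the actual volume name:
>>
>>   gluster volume set VOLNAME network.ping-timeout 10
>>   gluster volume set VOLNAME cluster.quorum-type auto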
>>
>> You would also need a build where this bug is fixed in order to avoid any
>> chance of a split-brain:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1066996
>
> It seems that the aforementioned bug is peculiar to 3-brick setups.
>
> I understand that a 3-brick setup can allow proper quorum formation without
> resorting to the "first-configured-brick-has-more-weight" convention used with
> only 2 bricks and quorum "auto" (which makes one node "special", so not
> properly any-single-fault tolerant).
Correct.
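For reference, a replica 3 volume of the kind suggested above could be created
along these lines (host names and brick paths are placeholders), with the two
options quoted earlier then set on it in the same way:

  gluster volume create VOLNAME replica 3 \
      node1:/export/brick1 node2:/export/brick1 node3:/export/brick1
  gluster volume start VOLNAME

With three bricks, quorum "auto" just requires a majority (2 of 3) of the
bricks to be reachable before writes are allowed, so no single brick is
special.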
> But, since we are on ovirt-users, is there a similar suggested configuration
> for a 2-host oVirt+GlusterFS setup with oVirt-side power management
> properly configured and tested as working?
> I mean a configuration where "any" host can go south and oVirt (through the
> other one) fences it (forcibly powering it off with confirmation from IPMI
> or similar), then restarts HA-marked VMs that were running there, all the
> while keeping the underlying GlusterFS-based storage domains responsive and
> readable/writeable (maybe apart from a lapse between detected other-node
> unresponsiveness and confirmed fencing)?
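As an aside, the IPMI leg of power management can be sanity-checked from the
other host with the same fence agents oVirt uses under the hood, e.g.
fence_ipmilan (address and credentials below are placeholders):

  fence_ipmilan -a 192.0.2.10 -l admin -p secret -o status

(oVirt/VDSM drives the same fence agents when it performs the actual fencing.)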
We have already had a discussion with the Gluster team about whether it is
possible to add fencing to the replica 2 quorum/consistency mechanism.
The idea is that as soon as you can't replicate a write, you have to
freeze all I/O until either the connection is re-established or you
know that the other host has been killed.
Adding Vijay.
There is a related thread on gluster-devel [1] about improving GlusterFS
behavior to prevent split-brains with sanlock and 2-way replicated
gluster volumes.
Please feel free to comment on the proposal there.
Thanks,
Vijay
[1]