
On 5/21/2014 11:15 AM, Giuseppe Ragusa wrote:
> Hi,
>
> > ----- Original Message -----
> > > From: "Ted Miller" <tmiller at hcjb.org>
> > > To: "users" <users at ovirt.org>
> > > Sent: Tuesday, May 20, 2014 11:31:42 PM
> > > Subject: [ovirt-users] sanlock + gluster recovery -- RFE
> > >
> > > As you are aware, there is an ongoing split-brain problem with
> > > running sanlock on replicated gluster storage. Personally, I believe
> > > that this is the 5th time that I have been bitten by this
> > > sanlock+gluster problem.
> > >
> > > I believe that the following are true (if not, my entire request is
> > > probably off base).
> > >
> > > * ovirt uses sanlock in such a way that when the sanlock storage is
> > >   on a replicated gluster file system, very small storage disruptions
> > >   can result in a gluster split-brain on the sanlock space
> >
> > Although this is possible (at the moment) we are working hard to avoid
> > it. The hardest part here is to ensure that the gluster volume is
> > properly configured.
> >
> > The suggested configuration for a volume to be used with ovirt is:
> >
> >   Volume Name: (...)
> >   Type: Replicate
> >   Volume ID: (...)
> >   Status: Started
> >   Number of Bricks: 1 x 3 = 3
> >   Transport-type: tcp
> >   Bricks:
> >   (...three bricks...)
> >   Options Reconfigured:
> >   network.ping-timeout: 10
> >   cluster.quorum-type: auto
> >
> > The two options ping-timeout and quorum-type are really important.
> >
> > You would also need a build where this bug is fixed in order to avoid
> > any chance of a split-brain:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1066996
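> >
> > For reference, on an already-created volume these two options can be
> > applied with the gluster CLI, along these lines (VOLNAME is a
> > placeholder for your volume name):
> >
> >   gluster volume set VOLNAME network.ping-timeout 10
> >   gluster volume set VOLNAME cluster.quorum-type auto
> >
> > and checked afterwards with "gluster volume info VOLNAME".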
>
> It seems that the aforementioned bug is peculiar to 3-brick setups.
>
> I understand that a 3-brick setup can allow proper quorum formation
> without resorting to the "first-configured-brick-has-more-weight"
> convention used with only 2 bricks and quorum "auto" (which makes one
> node "special", so not properly any-single-fault tolerant).
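>
> (As a worked example of my understanding of quorum "auto": in a 3-brick
> replica any 2 live bricks keep the volume writeable, no matter which 2;
> in a 2-brick replica, "exactly half up" only counts as quorum when the
> surviving brick is the first-configured one, so losing brick 1 blocks
> writes even though brick 2 is still alive.)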
>
> But, since we are on ovirt-users: is there a similar suggested
> configuration for a two-host oVirt+GlusterFS setup with oVirt-side
> power management properly configured and tested working? I mean a
> configuration where "any" host can go south and oVirt (through the
> other one) fences it (forcibly powering it off, with confirmation from
> IPMI or similar), then restarts the HA-marked VMs that were running
> there, all the while keeping the underlying GlusterFS-based storage
> domains responsive and readable/writeable (except perhaps for a lapse
> between detecting the other node's unresponsiveness and confirming the
> fencing)?
>
> Furthermore: is such a suggested configuration possible in a
> self-hosted-engine scenario?
>
> Regards,
> Giuseppe
>
> > > How did I get into this mess?
> > >
> > > ...
> > >
> > > What I would like to see in ovirt to help me (and others like me).
> > > Alternates listed in order from most desirable (automatic) to least
> > > desirable (set of commands to type, with lots of variables to figure
> > > out).
> >
> > The real solution is to avoid the split-brain altogether. At the
> > moment it seems that using the suggested configurations and the bug
> > fix we shouldn't hit a split-brain.
> >
> > > 1. automagic recovery
> > >
> > > 2. recovery subcommand
> > >
> > > 3. script
> > >
> > > 4. commands
> >
> > I think that the commands to resolve a split-brain should be
> > documented. I just started a page here:
> >
> > http://www.ovirt.org/Gluster_Storage_Domain_Reference
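> >
> > For instance, a rough sketch of the manual procedure as I understand
> > it (VOLNAME, FILE and the gfid path are placeholders; please check it
> > against the gluster documentation before trying it on real data):
> >
> >   # list the files currently in split-brain
> >   gluster volume heal VOLNAME info split-brain
> >
> >   # on the brick whose copy you decided to discard, find the file's
> >   # gfid (the trusted.gfid extended attribute)...
> >   getfattr -d -m . -e hex /path/to/brick/FILE
> >   # ...then remove both the file and its .glusterfs gfid hard link
> >   rm /path/to/brick/FILE
> >   rm /path/to/brick/.glusterfs/XX/YY/XXYY...gfid
> >
> >   # finally, trigger a heal so the surviving copy is replicated back
> >   gluster volume heal VOLNAME full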

I suggest you add these lines to the Gluster configuration, as I have
seen this come up multiple times on the User list:

storage.owner-uid: 36
storage.owner-gid: 36
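For reference, these can be applied to an existing volume with something
like (VOLNAME being a placeholder):

  gluster volume set VOLNAME storage.owner-uid 36
  gluster volume set VOLNAME storage.owner-gid 36

(36 is the uid of the vdsm user and the gid of the kvm group on oVirt
hosts, which is why the bricks need to be owned this way.)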

Ted Miller
Elkhart, IN, USA