
Hi,
> ----- Original Message -----
From: "Ted Miller" <tmiller at hcjb.org> To: "users" <users at ovirt.org> Sent: Tuesday=2C May 20=2C 2014 11:31:42 PM Subject: [ovirt-users] sanlock + gluster recovery -- RFE =20 As you are aware=2C there is an ongoing split-brain problem with runnin= g sanlock on replicated gluster storage. Personally=2C I believe that thi= s is the 5th time that I have been bitten by this sanlock+gluster problem. =20 I believe that the following are true (if not=2C my entire request is p= robably off base). =20 =20 * ovirt uses sanlock in such a way that when the sanlock storage is= on a replicated gluster file system=2C very small storage disruptions ca= n result in a gluster split-brain on the sanlock space =20 Although this is possible (at the moment) we are working hard to avoid it= . The hardest part here is to ensure that the gluster volume is properly configured. =20 The suggested configuration for a volume to be used with ovirt is: =20 Volume Name: (...) Type: Replicate Volume ID: (...) Status: Started Number of Bricks: 1 x 3 =3D 3 Transport-type: tcp Bricks: (...three bricks...) Options Reconfigured: network.ping-timeout: 10 cluster.quorum-type: auto =20 The two options ping-timeout and quorum-type are really important. =20 You would also need a build where this bug is fixed in order to avoid any chance of a split-brain: =20 https://bugzilla.redhat.com/show_bug.cgi?id=3D1066996
It seems that the aforementioned bug is peculiar to 3-brick setups.

I understand that a 3-brick setup can allow proper quorum formation without resorting to the "first-configured-brick-has-more-weight" convention used with only 2 bricks and quorum "auto" (which makes one node "special", so not properly any-single-fault tolerant).

But, since we are on ovirt-users: is there a similar suggested configuration for a 2-host oVirt+GlusterFS setup with oVirt-side power management properly configured and tested to work?

I mean a configuration where either host can go south and oVirt (through the other one) fences it (forcibly powering it off, with confirmation from IPMI or similar) and then restarts the HA-marked VMs that were running there, all the while keeping the underlying GlusterFS-based storage domains responsive and readable/writable (apart, perhaps, from a lapse between the detected unresponsiveness of the other node and the confirmed fencing)?

Furthermore: is such a suggested configuration possible in a self-hosted-engine scenario?

Regards,
Giuseppe
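P.S. For concreteness, the kind of two-node volume I have in mind would be created roughly like this (host names and brick paths are just placeholders), plus the same two "volume set" options from your suggested configuration:

  gluster volume create VOLNAME replica 2 host1:/bricks/vol host2:/bricks/vol
  gluster volume start VOLNAME

If I read the quorum behaviour correctly, with only two bricks and cluster.quorum-type "auto" the volume stays writable only while the first-listed brick (host1 here) is up, which is the asymmetry I was referring to above.
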
> > How did I get into this mess?
> >
> > ...
> >
> > What I would like to see in ovirt to help me (and others like me). Alternates
> > listed in order from most desirable (automatic) to least desirable (set of
> > commands to type, with lots of variables to figure out).
>
> The real solution is to avoid the split-brain altogether. At the moment it
> seems that using the suggested configurations and the bug fix we shouldn't
> hit a split-brain.
>
> > 1. automagic recovery
> >
> > 2. recovery subcommand
> >
> > 3. script
> >
> > 4. commands
>
> I think that the commands to resolve a split-brain should be documented.
> I just started a page here:
>
> http://www.ovirt.org/Gluster_Storage_Domain_Reference
>
> Could you add your documentation there? Thanks!
>
> --
> Federico
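In case it helps with that page: as far as I know, the inspection side is just the standard heal commands, while the actual resolution (picking a "good" copy and removing the stale one from the other brick, together with its .glusterfs hard link) is still a manual procedure. Roughly:

  # files currently flagged as split-brain by the self-heal daemon
  gluster volume heal VOLNAME info split-brain
  # per-brick view of all entries pending heal
  gluster volume heal VOLNAME info
  # trigger a full self-heal once the conflicting copy has been removed
  gluster volume heal VOLNAME full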