Hi,

I have set up a lab with the configuration listed below and got an
unexpected result. Can someone please tell me where I went wrong?
I am testing oVirt. The Data Center has two clusters: the first is a
compute cluster with three nodes (node1, node2, node3); the second is a
storage cluster (node5, node6) based on GlusterFS (replica 2).

I want the storage to be HA. I have read the following here
<https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Admi...>:

For a replicated volume with two nodes and one brick on each machine, if
the server-side quorum is enabled and one of the nodes goes offline, the
other node will also be taken offline because of the quorum
configuration. As a result, the high availability provided by the
replication is ineffective. To prevent this situation, a dummy node can
be added to the trusted storage pool which does not contain any bricks.
This ensures that even if one of the nodes which contains data goes
offline, the other node will remain online. Note that if the dummy node
and one of the data nodes goes offline, the brick on the other node will
also be taken offline, resulting in data unavailability.

So, I have added my "Engine" (not self-hosted) as a dummy node without a
brick and have configured quorum as listed below:
cluster.quorum-type: fixed
cluster.quorum-count: 1
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51%
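
Roughly, this setup can be reproduced with the gluster CLI as follows (a
sketch; "engine.virt.local" is just a placeholder for the Engine host's
name, and cluster.server-quorum-ratio is a cluster-wide option set on
volume "all"):

# add the dummy node (no bricks) to the trusted storage pool
gluster peer probe engine.virt.local

# client-side quorum: allow writes while at least one brick is reachable
gluster volume set vol3 cluster.quorum-type fixed
gluster volume set vol3 cluster.quorum-count 1

# server-side quorum: glusterd stops the local bricks when fewer than
# 51% of the peers in the pool are reachable
gluster volume set vol3 cluster.server-quorum-type server
gluster volume set all cluster.server-quorum-ratio 51%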

Then I started a VM and dropped the network link on node6; after about an
hour I brought the link back up, and after a while I got a split-brain.
But why? Nobody could have been writing to the brick on node6: the VM was
running on node3 and node1 was the SPM.

Gluster's log from node6:
Jun 07 15:35:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]:
[2015-06-07 12:35:06.106270] C [MSGID: 106002]
[glusterd-server-quorum.c:356:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume vol3. Stopping local bricks.
Jun 07 16:30:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]:
[2015-06-07 13:30:06.261505] C [MSGID: 106003]
[glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume vol3. Starting local bricks.
gluster> volume heal vol3 info
Brick node5.virt.local:/storage/brick12/
/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain
Number of entries: 1
Brick node6.virt.local:/storage/brick13/
/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain
Number of entries: 1
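
To see which replica blames which, the AFR changelog xattrs of the file
reported above can be dumped directly on each brick (a sketch; the paths
follow the brick layout shown above, and the trusted.afr.* entries hold
the pending changelog counters):

# on node5
getfattr -d -m . -e hex \
  /storage/brick12/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids

# on node6
getfattr -d -m . -e hex \
  /storage/brick13/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids
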
gluster> volume info vol3
Volume Name: vol3
Type: Replicate
Volume ID: 69ba8c68-6593-41ca-b1d9-40b3be50ac80
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: node5.virt.local:/storage/brick12
Brick2: node6.virt.local:/storage/brick13
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: fixed
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: disable
nfs.disable: on
performance.readdir-ahead: on
cluster.quorum-count: 1
cluster.server-quorum-ratio: 51%

On 06.06.2015 12:09, Юрий Полторацкий wrote:
Hi,

I want to build HA storage based on two servers. I want the storage to
remain available in RW mode if one of them goes down.

If I use replica 2, split-brain can occur. To avoid this I would use
quorum. As I understand it, I can use quorum on the client side, on the
server side, or on both. I want to add a dummy node without a brick and
use the following config:
cluster.quorum-type: fixed
cluster.quorum-count: 1
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51%

I expect that the client will keep RW access as long as at least one
brick is alive. On the other hand, if server-side quorum is not met, the
bricks will be RO.

Say HOST1 has a brick BRICK1, HOST2 has a brick BRICK2, and HOST3 has no
brick.

Once HOST1 loses its network connection, server quorum is no longer met
on that node and the brick BRICK1 will no longer be available for
writing. But on HOST2 there is no problem with server quorum
(HOST2 + HOST3 > 51%), so BRICK2 is still writable. There is no problem
with client quorum either: one brick is alive, so the client can write
to it.
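
In numbers, with all three hosts in one trusted storage pool and the 51%
ratio:

- HOST2's view after HOST1 drops out: 2 of 3 peers reachable = 66% >= 51%,
  so server quorum is met and BRICK2 stays online.
- HOST1's view while it is isolated: 1 of 3 peers reachable = 33% < 51%,
  so server quorum is lost and glusterd stops BRICK1.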

I have built a lab using KVM on my desktop and it seems to work as
expected.

The main question is:
Can I use such a storage setup in production?
Thanks.