<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi,<br>

    <br>

    I have set up a lab with the configuration listed below and got an
    unexpected result. Could someone please tell me where I went wrong?<br>

    <br>

    I am testing oVirt. The Data Center has two clusters: the first is a
    compute cluster with three nodes (node1, node2, node3); the second is a
    storage cluster (node5, node6) based on GlusterFS (replica 2).<br>

    <br>

    I want the storage to be HA. I have read the following <a
href="https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html">here</a>:<br>

    <tt>For a replicated volume with two nodes and one brick on each

      machine, if the server-side quorum is enabled and one of the nodes

      goes offline, the other node will also be taken offline because of

      the quorum configuration. As a result, the high availability

      provided by the replication is ineffective. To prevent this

      situation, a dummy node can be added to the trusted storage pool

      which does not contain any bricks. This ensures that even if one

      of the nodes which contains data goes offline, the other node will

      remain online. Note that if the dummy node and one of the data

      nodes goes offline, the brick on other node will be also be taken

      offline, and will result in data unavailability. </tt><br>
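    <br>
    To check my reading of this passage, here is a rough sketch of the
    server-side quorum arithmetic. It is my own simplified model, not
    glusterd's actual code: the function name <tt>server_quorum_met</tt> is
    hypothetical, and I am assuming quorum holds when the share of reachable
    pool members (counting the local node) is at least the configured ratio;
    the exact rounding in glusterd may differ.<br>
<pre>
```python
def server_quorum_met(active_nodes: int, total_nodes: int,
                      ratio_percent: float = 51.0) -> bool:
    """Simplified model (assumption): quorum holds when the share of
    reachable pool members, counting the local node, is at least
    ratio_percent."""
    return active_nodes * 100 >= ratio_percent * total_nodes


# Two-node pool, one node offline: the survivor sees 1 of 2 nodes.
print(server_quorum_met(1, 2))  # False

# Three-node pool (two data nodes plus a brickless dummy),
# one data node offline: the survivor sees 2 of 3 nodes.
print(server_quorum_met(2, 3))  # True

# Three-node pool, the dummy and one data node offline.
print(server_quorum_met(1, 3))  # False
```
</pre>
    Under this model the three cases match the three situations the guide
    describes.<br>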

    <br>

    So I added my "Engine" (not self-hosted) as a dummy node
    without a brick and configured the quorum as follows:<br>

    <tt>cluster.quorum-type: fixed</tt><tt><br>

    </tt><tt>cluster.quorum-count: 1</tt><tt><br>

    </tt><tt>cluster.server-quorum-type: server</tt><tt><br>

    </tt><tt>cluster.server-quorum-ratio: 51%</tt><br>
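    <br>
    For the client side, this is how I understand <tt>cluster.quorum-type:
    fixed</tt> together with <tt>cluster.quorum-count: 1</tt>. Again this is
    only a sketch under my own assumptions (the function name
    <tt>client_quorum_met_fixed</tt> is hypothetical and is not a Gluster
    API): a client may write as long as it can reach at least
    <tt>quorum-count</tt> bricks of the replica set.<br>
<pre>
```python
def client_quorum_met_fixed(bricks_up: int, quorum_count: int) -> bool:
    """Model of cluster.quorum-type 'fixed' (assumption): writes are allowed
    while at least quorum_count bricks are reachable from the client."""
    return bricks_up >= quorum_count


# Replica 2 with cluster.quorum-count set to 1.
for bricks_up in (2, 1, 0):
    print(bricks_up, client_quorum_met_fixed(bricks_up, quorum_count=1))
```
</pre>
    With a count of 1 on a replica 2 volume, a client that can reach only
    one brick is still allowed to write, so in this setup I am relying on
    the server-side quorum to keep the isolated brick from accepting
    writes.<br>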

    <br>

    <br>

    Then I started a VM and dropped the network link on node6. After an
    hour I restored the link, and a while later I got a split-brain. But
    why? Nothing could have written to the brick on node6: the VM was
    running on node3 and node1 was the SPM.<br>

    <br>

    Gluster's log from node6:<br>

    <tt>Jun 07 15:35:06 node6.virt.local

      etc-glusterfs-glusterd.vol[28491]: [2015-06-07 12:35:06.106270] C

      [MSGID: 106002]

      [glusterd-server-quorum.c:356:glusterd_do_volume_quorum_action]

      0-management: Server quorum lost for volume vol3. Stopping local

      bricks.</tt><tt><br>

    </tt><tt>Jun 07 16:30:06 node6.virt.local

      etc-glusterfs-glusterd.vol[28491]: [2015-06-07 13:30:06.261505] C

      [MSGID: 106003]

      [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action]

      0-management: Server quorum regained for volume vol3. Starting

      local bricks.</tt><br>

    <tt><br>

      <br>

    </tt><tt>gluster&gt; volume heal vol3 info </tt><tt><br>

    </tt><tt>Brick node5.virt.local:/storage/brick12/</tt><tt><br>

    </tt><tt>/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in

      split-brain</tt><tt><br>

    </tt><tt><br>

    </tt><tt>Number of entries: 1</tt><tt><br>

    </tt><tt><br>

    </tt><tt>Brick node6.virt.local:/storage/brick13/</tt><tt><br>

    </tt><tt>/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in

      split-brain</tt><tt><br>

    </tt><tt><br>

    </tt><tt>Number of entries: 1</tt><br>

    <br>

    <br>

    <tt>gluster&gt; volume info vol3</tt><tt><br>

    </tt><tt> </tt><tt><br>

    </tt><tt>Volume Name: vol3</tt><tt><br>

    </tt><tt>Type: Replicate</tt><tt><br>

    </tt><tt>Volume ID: 69ba8c68-6593-41ca-b1d9-40b3be50ac80</tt><tt><br>

    </tt><tt>Status: Started</tt><tt><br>

    </tt><tt>Number of Bricks: 1 x 2 = 2</tt><tt><br>

    </tt><tt>Transport-type: tcp</tt><tt><br>

    </tt><tt>Bricks:</tt><tt><br>

    </tt><tt>Brick1: node5.virt.local:/storage/brick12</tt><tt><br>

    </tt><tt>Brick2: node6.virt.local:/storage/brick13</tt><tt><br>

    </tt><tt>Options Reconfigured:</tt><tt><br>

    </tt><tt>storage.owner-gid: 36</tt><tt><br>

    </tt><tt>storage.owner-uid: 36</tt><tt><br>

    </tt><tt>cluster.server-quorum-type: server</tt><tt><br>

    </tt><tt>cluster.quorum-type: fixed</tt><tt><br>

    </tt><tt>network.remote-dio: enable</tt><tt><br>

    </tt><tt>cluster.eager-lock: enable</tt><tt><br>

    </tt><tt>performance.stat-prefetch: off</tt><tt><br>

    </tt><tt>performance.io-cache: off</tt><tt><br>

    </tt><tt>performance.read-ahead: off</tt><tt><br>

    </tt><tt>performance.quick-read: off</tt><tt><br>

    </tt><tt>auth.allow: *</tt><tt><br>

    </tt><tt>user.cifs: disable</tt><tt><br>

    </tt><tt>nfs.disable: on</tt><tt><br>

    </tt><tt>performance.readdir-ahead: on</tt><tt><br>

    </tt><tt>cluster.quorum-count: 1</tt><tt><br>

    </tt><tt>cluster.server-quorum-ratio: 51%</tt><br>

    <br>

    <br>

    <br>

    <div class="moz-cite-prefix">06.06.2015 12:09, Юрий Полторацкий

      wrote:<br>

    </div>

    <blockquote

cite="mid:CANgBB_sWBZwDh3yaAJV=ETLqrJyYxT9_EfRjnjMHF-1NVbevqw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div>

                <div>

                  <div>

                    <div>

                      <div>Hi,<br>

                        <br>

                      </div>

                      I want to build HA storage based on two servers.
                      If one of them goes down, the storage should remain
                      available in RW mode.<br>

                      <br>

                    </div>

                    If I use replica 2, split-brain can occur.
                    To avoid this I would use a quorum. As I understand
                    it, I can use quorum on the client side, on the
                    server side, or on both. I want to add a dummy node
                    without a brick and use the following config:<br>

                  </div>

                  <br>

                  cluster.quorum-type: fixed<br>

                  cluster.quorum-count: 1<br>

                  cluster.server-quorum-type: server<br>

                  cluster.server-quorum-ratio: 51%<br>

                  <br>

                </div>

                I expect that the client will have RW access as long as
                at least one brick is alive. On the other hand, if the
                server quorum is not met, the bricks will be RO. <br>

                <br>

                Say, HOST1 with a brick BRICK1, HOST2 with a brick

                BRICK2, and HOST3 without a brick.<br>

                <br>

                Once HOST1 loses its network connection, server quorum
                will not be met on that node and BRICK1 will not be
                writable. On HOST2 there is no problem with server
                quorum (HOST2 + HOST3 &gt; 51%), which is why BRICK2
                stays writable. There is no problem with client quorum
                either: one brick is alive, so the client can write
                to it.<br>

                <br>

              </div>

              I have set up a lab using KVM on my desktop and it seems to
              work well and as expected.<br>

              <br>

            </div>

            The main question is:<br>

          </div>

          Can I use such storage in production?<br>

          <br>

        </div>

        Thanks. <br>


      </div>

    </blockquote>

    <br>

  </body>

</html>