<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>Hi Fernando,<br>

    </p>

    <p>Indeed, having an arbiter node is always a good idea, and it
      saves a lot of cost.</p>

    <p>Good luck with your setup.</p>

    <p>Cheers</p>

    <p>Erekle<br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 07.08.2017 23:03, FERNANDO FREDIANI

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:47b0f3b5-a836-d5c2-7cf4-c0147aa3948f@upx.com">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <p>Thanks for the detailed answer, Erekle.</p>

      <p>I conclude that in any scenario it is worth having an arbiter
        node, to avoid wasting extra disk space on RAID X plus Gluster
        replication on top of it. The cost of building the arbiter node
        seems much lower when you compare it with the running costs of
        the whole storage. Even a fully redundant arbiter service with
        2 nodes would be worth it in a larger deployment.</p>

      <p>Regards<br>

        Fernando</p>

      <div class="moz-cite-prefix">On 07/08/2017 17:07, Erekle Magradze

        wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:b61c0164-c933-8204-a949-0aa303983548@recogizer.de">

        <meta http-equiv="Content-Type" content="text/html;

          charset=utf-8">

        <p>Hi Fernando (sorry for misspelling your name, I used a

          different keyboard),</p>

        <p>So let's go with the following scenarios:</p>

        <p>1. Let's say you have two servers (replication factor 2),
          i.e. two bricks per volume. In this case it is strongly
          recommended to add an arbiter node: metadata storage that
          prevents a split-brain situation. The arbiter doesn't even
          need a disk with lots of space; a tiny SSD is enough, as long
          as it is hosted on a separate server. The advantage of this
          setup is that you don't need RAID 1 for each brick: the
          metadata is stored on the arbiter node, and brick replacement
          is easy.</p>

        <p>2. If you have an odd number of bricks in your volume (let's
          say 3, i.e. replication factor 3) and you created neither an
          arbiter node nor a quorum configuration, then the entire load
          of keeping the volume consistent rests on all 3 servers. Each
          of them is important, each brick contains key information, and
          they need to cross-check each other (that's what people
          usually do on their first try with Gluster :) ). In this case
          replacing a brick is a big pain, so RAID 1 is a good option to
          have. The disadvantage is losing the space and not having the
          JBOD option; the advantage is that you don't need an
          additional arbiter node.</p>

        <p>3. You have an odd number of bricks and have configured an
          arbiter node. In this case you can easily go with JBOD;
          however, good practice would be to put the arbiter disks on
          RAID 1 (tiny 128 GB SSDs are perfectly sufficient for volumes
          tens of TB in size).</p>

        <p>That's basically it.</p>

        <p>You can find the rest about reliability and setup scenarios
          in the Gluster documentation; look especially for the quorum
          and arbiter node configurations and options.</p>
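        <p>Scenario 1 relies on majority voting between the two data
          bricks and the arbiter. A simplified sketch of that idea,
          assuming a plain majority rule; Gluster's real client-quorum
          logic (e.g. <code>cluster.quorum-type auto</code>) has
          additional rules, for example for even replica counts, that
          this ignores. The brick names are hypothetical:</p>

```python
from __future__ import annotations


def writes_allowed(reachable: set[str], replica_set: list[str]) -> bool:
    """Simplified majority quorum: permit writes only while strictly more
    than half of the bricks in the replica set are reachable."""
    up = sum(1 for brick in replica_set if brick in reachable)
    return 2 * up > len(replica_set)


# Hypothetical replica set: two data bricks plus an arbiter brick.
REPLICA_3_ARBITER = ["data1", "data2", "arbiter"]

# Network partition: one side sees data1 + arbiter, the other only data2.
print(writes_allowed({"data1", "arbiter"}, REPLICA_3_ARBITER))  # True
print(writes_allowed({"data2"}, REPLICA_3_ARBITER))             # False

# Plain replica 2 without an arbiter: neither side of a split has a majority.
print(writes_allowed({"data1"}, ["data1", "data2"]))            # False
```

        <p>Only one side of the partition keeps accepting writes, so the
          two data copies cannot diverge. Without the arbiter, a strict
          majority rule would block writes on both sides instead, which
          is why replica 2 alone forces a choice between availability
          and split-brain risk.</p>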

        <p>Cheers</p>

        <p>Erekle</p>

        P.S. What I mentioned regarding good practice is mostly related
        to operating Gluster, not to installing or deploying it, i.e.
        not to the conceptual understanding of Gluster (conceptually
        it's a JBOD system).<br>

        <br>

        <div class="moz-cite-prefix">On 08/07/2017 05:41 PM, FERNANDO

          FREDIANI wrote:<br>

        </div>

        <blockquote type="cite"

          cite="mid:c7a1c2e1-57c3-9fa5-0710-ebee3f3fa069@upx.com">

          <meta http-equiv="Content-Type" content="text/html;

            charset=utf-8">

          <p>Thanks for the clarification, Erekle.</p>

          <p>However, I am surprised by this way of operating GlusterFS,
            as it adds another layer of complexity to the system (either
            hardware or software RAID) beneath the Gluster configuration
            and increases the system's overall cost.<br>

          </p>

          <p>An important point to consider: in a RAID configuration you
            already have space 'wasted' to build redundancy (whether
            RAID 1, 5, or 6). When you then run GlusterFS on top of
            several RAIDs, the data is replicated again, so the same
            data consumes extra space once within each group of disks
            and again across the RAIDs, depending on your Gluster
            configuration (with RAID 1 bricks and 2-way Gluster
            replication, the same data is stored 4 times).</p>
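          <p>The "4 times" figure is simply the product of the two
            redundancy layers. A minimal sketch, assuming mirroring only
            (parity RAID levels are not modelled) and 2-way Gluster
            replication; the function names are hypothetical:</p>

```python
def physical_copies(raid_mirror_copies: int, gluster_replica: int) -> int:
    """Copies of each block on disk: copies kept by the RAID mirror
    times the number of Gluster replicas."""
    if raid_mirror_copies < 1 or gluster_replica < 1:
        raise ValueError("both factors must be at least 1")
    return raid_mirror_copies * gluster_replica


def usable_fraction(raid_mirror_copies: int, gluster_replica: int) -> float:
    """Share of raw disk capacity left for user data (mirroring only, no parity)."""
    return 1 / physical_copies(raid_mirror_copies, gluster_replica)


# RAID 1 bricks (2 copies each) under a Gluster replica-2 volume.
print(physical_copies(2, 2))   # 4
print(usable_fraction(2, 2))   # 0.25

# JBOD bricks (1 copy each) under replica 2; an arbiter, if added,
# holds no file data, so it adds no copy.
print(physical_copies(1, 2))   # 2
```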

          <p>Yet another downside of RAID (especially RAID 5 or 6) is
            that it considerably reduces write speed: each group of
            disks ends up with roughly the write speed of a single disk,
            since all the disks in the group have to wait for each other
            to complete their writes.<br>
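          <p>This write-speed concern can be made concrete with the
            commonly used write-penalty model for small random writes:
            RAID 5 turns each write into about 4 disk I/Os (read data,
            read parity, write data, write parity) and RAID 6 into about
            6. A rough sketch, assuming about 140 random IOPS per 10K HDD
            as an illustrative figure; controller write-back caches and
            full-stripe sequential writes can change the picture
            considerably:</p>

```python
# Back-end disk I/Os needed per small random front-end write (classic model).
WRITE_PENALTY = {0: 1, 1: 2, 10: 2, 5: 4, 6: 6}


def effective_write_iops(disks: int, iops_per_disk: float, raid_level: int) -> float:
    """Approximate random-write IOPS of a RAID group under the write-penalty model."""
    if raid_level not in WRITE_PENALTY:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    return disks * iops_per_disk / WRITE_PENALTY[raid_level]


# 7 x 10K HDDs, assuming about 140 random IOPS per disk (illustrative, not measured).
for level in (0, 5, 6):
    print(f"RAID {level}: {effective_write_iops(7, 140, level):.0f} write IOPS")
# RAID 0: 980 write IOPS
# RAID 5: 245 write IOPS
# RAID 6: 163 write IOPS
```

          <p>Under this model a 7-disk RAID 6 group lands close to the
            random-write IOPS of a single disk, which matches the concern
            above for small random writes.</p>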

          </p>

          <p>Therefore, if Gluster already replicates the data, why does
            it cause the big pain you mentioned? The data is replicated
            somewhere else and can still be retrieved, both to serve
            clients and to reconstruct the equivalent disk when it is
            replaced.</p>

          <p>Fernando<br>

          </p>

          <br>

          <div class="moz-cite-prefix">On 07/08/2017 10:26, Erekle

            Magradze wrote:<br>

          </div>

          <blockquote type="cite"

            cite="mid:aa829d07-fa77-3ed9-2500-e33cc01414b6@recogizer.de">

            <meta http-equiv="Content-Type" content="text/html;

              charset=utf-8">

            <p>Hi Frenando,</p>

            <p>Here is my experience: if you use a single hard drive as
              a brick of a Gluster volume and it dies, i.e. becomes
              inaccessible, it's a huge hassle to discard that brick and
              exchange it for another one, since Gluster sometimes keeps
              trying to access the broken brick, which causes (at least
              it caused for me) a big pain. Therefore it's better to use
              a RAID as the brick, i.e. RAID 1 (mirroring) for each
              brick. Then, if a disk goes down, you can easily exchange
              it and rebuild the RAID without going offline, i.e.
              without switching off the volume, doing brick
              manipulations and switching it back on.<br>

            </p>

            <p>Cheers</p>

            <p>Erekle<br>

            </p>

            <br>

            <div class="moz-cite-prefix">On 08/07/2017 03:04 PM,

              FERNANDO FREDIANI wrote:<br>

            </div>

            <blockquote type="cite"

              cite="mid:63bac47b-afe6-0258-d3d7-e545a5004c30@upx.com">

              <meta http-equiv="Content-Type" content="text/html;

                charset=utf-8">

              <p>For any RAID 5 or 6 configuration I normally follow a
                simple golden rule, which has given good results so far:<br>
                - up to 4 disks: RAID 5<br>
                - 5 or more disks: RAID 6</p>
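              <p>The rule of thumb above, together with the capacity each
                level leaves for data (RAID 5 gives up one disk to parity,
                RAID 6 two), can be written as a small sketch; the
                function names are hypothetical:</p>

```python
def pick_raid_level(disks: int) -> int:
    """Rule of thumb: up to 4 disks -> RAID 5, 5 or more -> RAID 6."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return 5 if disks <= 4 else 6


def usable_disks(disks: int) -> int:
    """Disks' worth of capacity left for data: RAID 5 spends one disk on
    parity, RAID 6 spends two."""
    parity_disks = 1 if pick_raid_level(disks) == 5 else 2
    return disks - parity_disks


for n in (4, 7, 12):
    print(f"{n} disks -> RAID {pick_raid_level(n)}, {usable_disks(n)} disks of usable capacity")
# 4 disks -> RAID 5, 3 disks of usable capacity
# 7 disks -> RAID 6, 5 disks of usable capacity
# 12 disks -> RAID 6, 10 disks of usable capacity
```

              <p>The 7-disk case corresponds to the per-server drive count
                in the original question; Gluster replication then divides
                the usable capacity again by the replica count.</p>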

              <p>However, I didn't really understand the recommendation to
                use any RAID with GlusterFS. I always thought that
                GlusterFS prefers to work in JBOD mode and control the
                disks (bricks) directly, so you can create whatever
                distribution rule you wish, and if a single disk fails you
                just replace it, and it obviously gets the data replicated
                from another one. The only downside of using it this way
                is that the replication data will flow across all servers,
                but that is not a big issue.</p>

              <p>Can anyone elaborate on using RAID + GlusterFS versus
                JBOD + GlusterFS?</p>

              <p>Thanks<br>

                Regards<br>

                Fernando<br>

              </p>

              <br>

              <div class="moz-cite-prefix">On 07/08/2017 03:46, Devin

                Acosta wrote:<br>

              </div>

              <blockquote type="cite"

cite="mid:CANCGKEp4XGs0U+Qs78eEmqCNtvpLY-Azjb5DcGhZ9yiKTBEEfw@mail.gmail.com">

                <style>body{font-family:Helvetica,Arial;font-size:13px}</style>

                <div id="bloop_customfont"

                  style="color:rgb(0,0,0);margin:0px"><font face="Input

                    Mono"><br>

                  </font></div>

                <div id="bloop_customfont"

                  style="color:rgb(0,0,0);margin:0px"><font face="Input

                    Mono">Moacir,</font></div>

                <div id="bloop_customfont"

                  style="color:rgb(0,0,0);margin:0px"><font face="Input

                    Mono"><br>

                  </font></div>

                <div id="bloop_customfont"

                  style="color:rgb(0,0,0);margin:0px"><font face="Input

                    Mono">I have recently installed multiple Red Hat
                    Virtualization hosts for several different
                    companies, and have worked in depth with the Red Hat
                    Support Team on the optimal way to set up GlusterFS,
                    so I wanted to share with you what I learned.</font></div>

                <div id="bloop_customfont"

                  style="color:rgb(0,0,0);margin:0px"><font face="Input

                    Mono"><br>

                  </font></div>

                <div id="bloop_customfont"

                  style="color:rgb(0,0,0);margin:0px"><font face="Input

                    Mono">In general, the Red Hat Virtualization team
                    frowns upon using each disk of the system as just a
                    JBOD. Sure, there is some protection from having the
                    data replicated; however, the recommendation is to
                    use RAID 6 (preferred) or RAID 5, or RAID 1 at the
                    very least.</font></div>

                <div id="bloop_customfont"

                  style="color:rgb(0,0,0);margin:0px"><font face="Input

                    Mono"><br>

                  </font></div>

                <div id="bloop_customfont" style="margin:0px"><font

                    face="Input Mono">Here is the direct quote from Red

                    Hat when I asked about RAID and Bricks:</font></div>

                <div id="bloop_customfont" style="margin:0px"><font

                    face="Input Mono"><i><br>

                    </i></font></div>

                <div id="bloop_customfont" style="margin:0px"><font

                    face="Input Mono"><i>"A typical Gluster

                      configuration would use RAID underneath the

                      bricks. RAID 6 is most typical as it gives you 2

                      disk failure protection, but RAID 5 could be used

                      too. Once you have the RAIDed bricks, you'd then

                      apply the desired replication on top of that. The

                      most popular way of doing this would be

                      distributed replicated with 2x replication. In

                      general you'll get better performance with larger

                      bricks. 12 drives is often a sweet spot. Another

                      option would be to create a separate tier using

                      all SSD’s.” </i></font></div>

                <div id="bloop_customfont" style="margin:0px"><br>

                </div>

                <div id="bloop_customfont" style="margin:0px"><font

                    face="Input Mono"><i>In order to do SSD tiering, from
                      my understanding you would need 1 x NVMe drive in
                      each server, or a 4 x SSD hot tier (the hot tier
                      needs to be distributed-replicated if not using
                      NVMe). So since you only have 1 SSD drive in each
                      server, I’d suggest looking into the NVMe option.
                    </i></font></div>

                <div id="bloop_customfont" style="margin:0px"><font

                    face="Input Mono"><i><br>

                    </i></font></div>

                <div id="bloop_customfont" style="margin:0px"><font

                    face="Input Mono"><i>Since you're using only 3
                      servers, what I’d probably suggest is 2 replicas +
                      an arbiter node. This setup doesn’t require the 3rd
                      server to have big drives at all, as the arbiter
                      only stores metadata about the files, not a full
                      copy. </i></font></div>
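                <p>Why the arbiter can live on a small disk can be seen
                  from a quick sizing estimate. A sketch, assuming the
                  commonly quoted rule of thumb of about 4 KiB of arbiter
                  space per file (check the Gluster arbiter documentation
                  for your version); the file counts are illustrative, not
                  from this thread:</p>

```python
KIB_PER_FILE = 4  # assumed rule of thumb: arbiter metadata cost per file, in KiB


def arbiter_brick_size_gib(file_count: int, kib_per_file: float = KIB_PER_FILE) -> float:
    """Rough arbiter brick size: the arbiter stores metadata, not file contents,
    so its size scales with the number of files rather than their size."""
    return file_count * kib_per_file / (1024 * 1024)


# A VM store holds relatively few, large files (e.g. .qcow2 images) ...
print(f"{arbiter_brick_size_gib(1_000):.3f} GiB")       # 0.004 GiB
# ... while a file server with millions of small files needs more.
print(f"{arbiter_brick_size_gib(10_000_000):.3f} GiB")  # 38.147 GiB
```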

                <div id="bloop_customfont" style="margin:0px"><font

                    face="Input Mono"><i><br>

                    </i></font></div>

                <div id="bloop_customfont" style="margin:0px"><font

                    face="Input Mono"><i>Please see the attached

                      document that was given to me by Red Hat to get

                      more information on this. Hope this information

                      helps you.</i></font></div>

                <div id="bloop_customfont" style="margin:0px"><font

                    face="Input Mono"><i><br>

                    </i></font></div>

                <br>

                <div id="bloop_sign_1502087376725469184"

                  class="bloop_sign"><span style="font-family:'helvetica

                    Neue',helvetica;font-size:14px">--</span><br

                    style="font-family:'helvetica

                    Neue',helvetica;font-size:14px">

                  <div class="gmail_signature"

                    style="font-family:'helvetica

                    Neue',helvetica;font-size:14px">

                    <div dir="ltr">

                      <div><br>

                      </div>

                      <div>Devin Acosta, RHCA, RHVCA</div>

                      <div>Red Hat Certified Architect</div>

                    </div>

                  </div>

                </div>

                <br>

                <p class="airmail_on">On August 6, 2017 at 7:29:29 PM,

                  Moacir Ferreira (<a

                    href="mailto:moacirferreira@hotmail.com"

                    moz-do-not-send="true">moacirferreira@hotmail.com</a>)

                  wrote:</p>

                <blockquote type="cite" class="clean_bq"><span>

                    <div dir="ltr">

                      <div>

                        <title></title>

                        <div id="divtagdefaultwrapper"

style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif"

                          dir="ltr">

                          <p><span>I am planning to assemble an oVirt
                              "pod" made of 3 servers, each with 2 CPU
                              sockets of 12 cores, 256 GB RAM, seven 10K
                              HDDs and 1 SSD. The idea is to use
                              GlusterFS to provide HA for the VMs. Each
                              of the 3 servers has a dual 40 Gb NIC and a
                              dual 10 Gb NIC. My intention is to create a
                              loop, like a server triangle, using the
                              40 Gb NICs for access to the virtualization
                              files (VM .qcow2 images) and for moving VMs
                              around the pod (east/west traffic), while
                              using the 10 Gb interfaces to provide
                              services to the outside world (north/south
                              traffic).</span></p>

                          <p><br>

                          </p>

                          <p>That said, my main question is: how should
                            I deploy GlusterFS in such an oVirt scenario?
                            More specifically:</p>

                          <p><br>

                          </p>

                          <p>1 - Should I create 3 RAID arrays (e.g.
                            RAID 5), one on each oVirt node, and then
                            create a GlusterFS volume using them?</p>

                          <p>2 - Or, instead, should I create a JBOD
                            array made of all the servers' disks?</p>

                          <p>3 - What is the best Gluster configuration
                            to provide HA while not consuming too much
                            disk space?<br>

                          </p>

                          <p>4 - Does an oVirt hypervisor pod like the
                            one I am planning to build, and its
                            virtualization environment, benefit from
                            tiering when using an SSD? And if so, will
                            Gluster do it by default, or do I have to
                            configure it to do so?</p>

                          <p><br>

                          </p>

                          <p>Bottom line: what is good practice for
                            using GlusterFS in small enterprise pods?<br>

                          </p>

                          <p><br>

                          </p>

                          <p>Your opinion/feedback will be really
                            appreciated!</p>

                          <p>Moacir<br>

                          </p>

                        </div>

                        _______________________________________________

                        <br>

                        Users mailing list <br>

                        <a href="mailto:Users@ovirt.org"

                          moz-do-not-send="true">Users@ovirt.org</a> <br>

                        <a

                          href="http://lists.ovirt.org/mailman/listinfo/users"

                          moz-do-not-send="true">http://lists.ovirt.org/mailman/listinfo/users</a>

                        <br>

                      </div>

                    </div>

                  </span></blockquote>

                <br>


              </blockquote>

              <br>


            </blockquote>

            <br>

          </blockquote>

          <br>

        </blockquote>

        <br>

        <pre class="moz-signature" cols="72">-- 

Recogizer Group GmbH

Dr.rer.nat. Erekle Magradze

Lead Big Data Engineering &amp; DevOps

Rheinwerkallee 2, 53227 Bonn

Tel: +49 228 29974555

E-Mail <a class="moz-txt-link-abbreviated" href="mailto:erekle.magradze@recogizer.de" moz-do-not-send="true">erekle.magradze@recogizer.de</a>

Web: <a class="moz-txt-link-abbreviated" href="http://www.recogizer.com" moz-do-not-send="true">www.recogizer.com</a>

Recogizer on LinkedIn <a class="moz-txt-link-freetext" href="https://www.linkedin.com/company-beta/10039182/" moz-do-not-send="true">https://www.linkedin.com/company-beta/10039182/</a>

Follow us on Twitter <a class="moz-txt-link-freetext" href="https://twitter.com/recogizer" moz-do-not-send="true">https://twitter.com/recogizer</a>

-----------------------------------------------------------------

Recogizer Group GmbH

Managing Directors: Oliver Habisch, Carsten Kreutze

Commercial register: Local Court (Amtsgericht) Bonn, HRB 20724

Registered office: Bonn; VAT ID no.: DE294195993

This e-mail contains confidential and/or legally privileged information.

If you are not the intended recipient or have received this e-mail in error,

please notify the sender immediately and delete this e-mail.

Unauthorized copying or forwarding of this e-mail and the information it contains is not permitted.</pre>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>