<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 8, 2017 at 12:03 AM, FERNANDO FREDIANI <span dir="ltr">&lt;<a href="mailto:fernando.frediani@upx.com" target="_blank">fernando.frediani@upx.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    <p>Thanks for the detailed answer Erekle.</p>

    <p>I conclude that in any scenario it is worth having an arbiter
      node, in order to avoid wasting more disk space on RAID X with
      Gluster replication on top of it. The cost seems much lower if you
      consider the running costs of the whole storage and compare them
      with the cost of building the arbiter node. Even a fully redundant
      arbiter service with 2 nodes would be worth it on a larger

      deployment.</p></div></blockquote><div><br></div><div><br></div><div>Note that although you get the same consistency as a replica 3 setup, a 2+arbiter setup only gives you the data availability of a replica 2 setup. That may or may not be OK given your high availability requirements.</div><div>Y.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <p>Regards<span class="HOEnZb"><font color="#888888"><br>

      Fernando</font></span></p><div><div class="h5">

    <div class="m_523492013473682272moz-cite-prefix">On 07/08/2017 17:07, Erekle Magradze

      wrote:<br>

    </div>

    <blockquote type="cite">

      <p>Hi Fernando (sorry for misspelling your name, I used a

        different keyboard),</p>

      <p>So let&#39;s go with the following scenarios:</p>

      <p>1. Let&#39;s say you have two servers (replication factor 2),
        i.e. two bricks per volume. In this case it is strongly
        recommended to have an arbiter node, the metadata store that
        lets you avoid split-brain situations. The arbiter doesn&#39;t
        even need a disk with lots of space; a tiny SSD is enough, as
        long as it is hosted on a separate server. The advantage of
        such a setup is that you don&#39;t need RAID 1 for each brick:
        the metadata is stored on the arbiter node and brick
        replacement is easy.</p>

      <p>2. If you have an odd number of bricks (let&#39;s say 3, i.e.
        replication factor 3) in your volume and you neither created
        an arbiter node nor configured the quorum, then the entire
        load of keeping the volume consistent rests on all 3 servers.
        Each of them is important, each brick contains key information,
        and they need to cross-check each other (that&#39;s what people
        usually do on their first try of gluster :) ). In this case
        replacing a brick is a big pain, and RAID 1 is a good option to
        have. The disadvantage is losing the space and not having the
        JBOD option; the advantage is that you don&#39;t need an
        additional arbiter node.</p>

      <p>3. You have an odd number of bricks and a configured arbiter
        node. In this case you can easily go with JBOD; however, a good
        practice would be to have RAID 1 for the arbiter disks (tiny
        128GB SSDs are perfectly sufficient for volumes of tens of TB
        in size).</p>
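      <p>A minimal sketch of how the "2 data bricks + arbiter" layout from scenarios 1 and 3 might be requested from the gluster CLI. It only assembles the argument list for <code>gluster volume create ... replica 3 arbiter 1</code>, where the last brick listed in the set is the arbiter; nothing is executed. The function name, volume name, host names and brick paths are hypothetical placeholders, not values from this thread.</p>
      <pre>
```python
def arbiter_volume_create_cmd(volname, data_bricks, arbiter_brick):
    """Build (but do not run) the CLI call for a 2 data + 1 arbiter volume."""
    if len(data_bricks) != 2:
        raise ValueError("this sketch covers exactly two data bricks")
    return [
        "gluster", "volume", "create", volname,
        "replica", "3", "arbiter", "1",
        *data_bricks,
        arbiter_brick,  # listed last: the third brick of the set is the arbiter
    ]


# Hypothetical hosts and paths, for illustration only.
cmd = arbiter_volume_create_cmd(
    "vmstore",
    ["node1:/gluster/data/brick", "node2:/gluster/data/brick"],
    "node3:/gluster/arbiter/brick",
)
print(" ".join(cmd))
```
      </pre>
      <p>Keeping the arbiter brick as a separate argument makes it harder to put a small SSD in a data position by accident.</p>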

      <p>That&#39;s basically it</p>

      <p>The rest, about reliability and setup scenarios, you can find
        in the Gluster documentation; look especially for the quorum and
        arbiter node configs and options.</p>

      <p>Cheers</p>

      <p>Erekle</p>

      P.S. What I said about good practice relates mostly to the
      operation of gluster, not to its installation or deployment,
      i.e. not to the conceptual understanding of gluster
      (conceptually it&#39;s a JBOD system).<br>

      <br>

      <div class="m_523492013473682272moz-cite-prefix">On 08/07/2017 05:41 PM, FERNANDO

        FREDIANI wrote:<br>

      </div>

      <blockquote type="cite">

        <p>Thanks for the clarification Erekle.</p>

        <p>However, I am surprised by this way of operating GlusterFS,
          as it adds another layer of complexity to the system (either
          hardware or software RAID) underneath the gluster config and
          increases the system&#39;s overall cost.<br>

        </p>

        <p>An important point to consider: in a RAID configuration you
          already have space &#39;wasted&#39; in order to build redundancy
          (RAID 1, 5, or 6). When you then put GlusterFS on top of
          several RAIDs, the data is replicated again, so the same data
          consumes extra space once within each group of disks and once
          more across the RAIDs, depending on the Gluster configuration
          you have (with RAID 1 bricks and 2x replication the same data
          is stored 4 times).</p>
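        <p>The arithmetic behind this can be sketched as follows. This is only an illustration under simplifying assumptions: all disks are the same size, <code>replica</code> counts full data copies only (an arbiter brick, which holds metadata but no file data, is ignored), and the function and layout names are made up for the example.</p>
        <pre>
```python
from fractions import Fraction


def raid_usable_fraction(layout: str, disks: int) -> Fraction:
    """Share of raw capacity left after the RAID layer's own redundancy."""
    if layout == "jbod":
        return Fraction(1)                 # no redundancy below Gluster
    if layout == "raid1":
        return Fraction(1, disks)          # every disk holds a full copy
    if layout == "raid5":
        return Fraction(disks - 1, disks)  # one disk's worth of parity
    if layout == "raid6":
        return Fraction(disks - 2, disks)  # two disks' worth of parity
    raise ValueError(f"unknown layout: {layout}")


def usable_fraction(layout: str, disks: int, replica: int) -> Fraction:
    """Share of raw capacity left once Gluster replication sits on top."""
    return raid_usable_fraction(layout, disks) / replica


print(usable_fraction("raid1", 2, 2))  # 1/4: the same data is stored 4 times
print(usable_fraction("raid6", 7, 2))  # 5/14
print(usable_fraction("jbod", 1, 2))   # 1/2
```
        </pre>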

        <p>Yet another downside of RAID (especially RAID 5 or 6) is
          that it considerably reduces write speed: each group of disks
          ends up with the write speed of a single disk, because all
          the disks in the group have to wait for each other to finish
          writing.<br>

        </p>

        <p>Therefore, if Gluster already replicates data, why does it
          create the big pain you mentioned? The data is replicated
          somewhere else and can still be retrieved, both to serve
          clients and to reconstruct the equivalent disk when it is
          replaced.</p>

        <p>Fernando<br>

        </p>

        <br>

        <div class="m_523492013473682272moz-cite-prefix">On 07/08/2017 10:26, Erekle

          Magradze wrote:<br>

        </div>

        <blockquote type="cite">

          <p>Hi Frenando,</p>

          <p>Here is my experience: if you use a particular hard drive
            as a brick for a gluster volume and it dies, i.e. it becomes
            inaccessible, it&#39;s a huge hassle to discard that brick and
            exchange it for another one, since gluster sometimes still
            tries to access the broken brick, and that causes (at least
            it did for me) a big pain. Therefore it&#39;s better to have a
            RAID as the brick, i.e. RAID 1 (mirroring) for each brick.
            In this case, if a disk is down you can easily exchange it
            and rebuild the RAID without going offline, i.e. without
            switching off the volume, doing brick manipulations and
            switching it back on.<br>

          </p>

          <p>Cheers</p>

          <p>Erekle<br>

          </p>

          <br>

          <div class="m_523492013473682272moz-cite-prefix">On 08/07/2017 03:04 PM, FERNANDO

            FREDIANI wrote:<br>

          </div>

          <blockquote type="cite">

            <p>For any RAID 5 or 6 configuration I normally follow a
              simple golden rule which has given good results so far:<br>

              - up to 4 disks RAID 5<br>

              - 5 or more disks RAID 6</p>
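            <p>Written out as a small helper, the rule of thumb above looks roughly like this. The function name is hypothetical, and the lower bound of 3 disks is added here only because RAID 5 cannot be built from fewer.</p>
            <pre>
```python
def suggested_raid_level(disks: int) -> str:
    """Apply the rule of thumb: up to 4 disks RAID 5, 5 or more RAID 6."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return "RAID 5" if disks <= 4 else "RAID 6"


for n in (3, 4, 5, 7, 12):
    print(n, suggested_raid_level(n))
```
            </pre>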

            <p>However, I didn&#39;t really understand the recommendation
              to use any RAID with GlusterFS. I always thought that
              GlusterFS likes to work in JBOD mode and control the disks
              (bricks) directly, so you can create whatever
              distribution rule you wish, and if a single disk fails
              you just replace it and its data is obviously replicated
              back from another brick. The only downside of using it
              this way is that the replication traffic flows across all
              servers, but that is not much of an issue.</p>

            <p>Can anyone elaborate on using RAID + GlusterFS versus
              JBOD + GlusterFS?</p>

            <p>Thanks<br>

              Regards<br>

              Fernando<br>

            </p>

            <br>

            <div class="m_523492013473682272moz-cite-prefix">On 07/08/2017 03:46, Devin

              Acosta wrote:<br>

            </div>

            <blockquote type="cite">

              <div id="m_523492013473682272bloop_customfont" style="color:rgb(0,0,0);margin:0px"><font face="Input

                  Mono"><br>

                </font></div>

              <div id="m_523492013473682272bloop_customfont" style="color:rgb(0,0,0);margin:0px"><font face="Input

                  Mono">Moacir,</font></div>

              <div id="m_523492013473682272bloop_customfont" style="color:rgb(0,0,0);margin:0px"><font face="Input

                  Mono"><br>

                </font></div>

              <div id="m_523492013473682272bloop_customfont" style="color:rgb(0,0,0);margin:0px"><font face="Input

                  Mono">I have recently installed multiple Red Hat

                  Virtualization hosts for several different companies,

                  and have dealt with the Red Hat Support Team in depth

                  about optimal configuration in regards to setting up

                  GlusterFS most efficiently and I wanted to share with

                  you what I learned.</font></div>

              <div id="m_523492013473682272bloop_customfont" style="color:rgb(0,0,0);margin:0px"><font face="Input

                  Mono"><br>

                </font></div>

              <div id="m_523492013473682272bloop_customfont" style="color:rgb(0,0,0);margin:0px"><font face="Input

                  Mono">In general, the Red Hat Virtualization team
                  frowns upon using each disk of the system as just a
                  JBOD. Sure, there is some protection from having the
                  data replicated; however, the recommendation is to use
                  RAID 6 (preferred) or RAID 5, or RAID 1 at the very
                  least.</font></div>

              <div id="m_523492013473682272bloop_customfont" style="color:rgb(0,0,0);margin:0px"><font face="Input

                  Mono"><br>

                </font></div>

              <div id="m_523492013473682272bloop_customfont" style="margin:0px"><font face="Input Mono">Here is the direct quote from Red

                  Hat when I asked about RAID and Bricks:</font></div>

              <div id="m_523492013473682272bloop_customfont" style="margin:0px"><font face="Input Mono"><i><br>

                  </i></font></div>

              <div id="m_523492013473682272bloop_customfont" style="margin:0px"><font face="Input Mono"><i>&quot;A typical Gluster configuration

                    would use RAID underneath the bricks. RAID 6 is most

                    typical as it gives you 2 disk failure protection,

                    but RAID 5 could be used too. Once you have the

                    RAIDed bricks, you&#39;d then apply the desired

                    replication on top of that. The most popular way of

                    doing this would be distributed replicated with 2x

                    replication. In general you&#39;ll get better

                    performance with larger bricks. 12 drives is often a

                    sweet spot. Another option would be to create a

                    separate tier using all SSD’s.” </i></font></div>

              <div id="m_523492013473682272bloop_customfont" style="margin:0px"><br>

              </div>

              <div id="m_523492013473682272bloop_customfont" style="margin:0px"><font face="Input Mono"><i>For SSD tiering, from my
                    understanding you would need 1 x NVMe drive in each
                    server, or 4 x SSD for the hot tier (the hot tier
                    needs to be distributed-replicated if not using
                    NVMe). Since you have only 1 SSD drive in each
                    server, I’d suggest maybe looking into the NVMe
                    option. </i></font></div>

              <div id="m_523492013473682272bloop_customfont" style="margin:0px"><font face="Input Mono"><i><br>

                  </i></font></div>

              <div id="m_523492013473682272bloop_customfont" style="margin:0px"><font face="Input Mono"><i>Since you’re using only 3 servers,
                    what I’d probably suggest is (2 Replicas + Arbiter
                    Node). This setup doesn’t actually require the 3rd
                    server to have big drives at all, as it only stores
                    metadata about the files and not a full copy. </i></font></div>

              <div id="m_523492013473682272bloop_customfont" style="margin:0px"><font face="Input Mono"><i><br>

                  </i></font></div>

              <div id="m_523492013473682272bloop_customfont" style="margin:0px"><font face="Input Mono"><i>Please see the attached document

                    that was given to me by Red Hat to get more

                    information on this. Hope this information helps

                    you.</i></font></div>

              <div id="m_523492013473682272bloop_customfont" style="margin:0px"><font face="Input Mono"><i><br>

                  </i></font></div>

              <br>

              <div id="m_523492013473682272bloop_sign_1502087376725469184" class="m_523492013473682272bloop_sign"><span>--</span><br>

                <div class="m_523492013473682272gmail_signature">

                  <div dir="ltr">

                    <div><br>

                    </div>

                    <div>Devin Acosta, RHCA, RHVCA</div>

                    <div>Red Hat Certified Architect</div>

                  </div>

                </div>

              </div>

              <br>

              <p class="m_523492013473682272airmail_on">On August 6, 2017 at 7:29:29 PM,

                Moacir Ferreira (<a href="mailto:moacirferreira@hotmail.com" target="_blank">moacirferreira@hotmail.com</a>)

                wrote:</p>

              <blockquote type="cite" class="m_523492013473682272clean_bq"><span>

                  <div dir="ltr">

                    <div>

                      <div id="m_523492013473682272divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif" dir="ltr">

                        <p><span>I would like to assemble an oVirt &quot;pod&quot;
                            made of 3 servers, each with 2 CPU sockets
                            of 12 cores, 256GB RAM, 7 x 10K HDDs and 1 SSD.
                            The idea is to use GlusterFS to provide HA
                            for the VMs. Each of the 3 servers has a dual
                            40Gb NIC and a dual 10Gb NIC. My intention is
                            to create a loop, like a server triangle,
                            using the 40Gb NICs for access to the
                            virtualization files (VM .qcow2 images) and
                            for moving VMs around the pod (east/west
                            traffic), while using the 10Gb interfaces for
                            providing services to the outside world
                            (north/south traffic).</span></p>

                        <p><br>

                        </p>

                        <p>That said, my first question is: how should I
                          deploy GlusterFS in such an oVirt scenario?
                          Specifically:</p>

                        <p><br>

                        </p>

                        <p>1 - Should I create 3 RAID arrays (e.g. RAID 5),
                          one on each oVirt node, and then create a
                          GlusterFS volume using them?</p>

                        <p>2 - Instead, should I create a JBOD array
                          made of all the servers&#39; disks?</p>

                        <p>3 - What is the best Gluster configuration to

                          provide for HA while not consuming too much

                          disk space?<br>

                        </p>

                        <p>4 - Does an oVirt hypervisor pod like the one I
                          am planning to build, and the virtualization
                          environment, benefit from tiering when using
                          an SSD disk? And if yes, will Gluster do it by
                          default or do I have to configure it to do so?</p>

                        <p><br>

                        </p>

                        <p>Bottom line: what is good practice for using
                          GlusterFS in small pods for enterprises?<br>

                        </p>

                        <p><br>

                        </p>

                        <p>Your opinion/feedback will be really
                          appreciated!</p>

                        <p>Moacir<br>

                        </p>

                      </div>

                      ______________________________<wbr>_________________ <br>

                      Users mailing list <br>

                      <a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a> <br>

                      <a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/users</a>

                      <br>

                    </div>

                  </div>

                </span></blockquote>

              <br>


            </blockquote>

            <br>

            <br>


          </blockquote>

          <br>

        </blockquote>

        <br>

      </blockquote>

      <br>

      <pre class="m_523492013473682272moz-signature" cols="72">-- 

Recogizer Group GmbH

Dr.rer.nat. Erekle Magradze

Lead Big Data Engineering &amp; DevOps

Rheinwerkallee 2, 53227 Bonn

Tel: <a href="tel:+49%20228%2029974555" value="+4922829974555" target="_blank">+49 228 29974555</a>

E-Mail <a class="m_523492013473682272moz-txt-link-abbreviated" href="mailto:erekle.magradze@recogizer.de" target="_blank">erekle.magradze@recogizer.de</a>

Web: <a class="m_523492013473682272moz-txt-link-abbreviated" href="http://www.recogizer.com" target="_blank">www.recogizer.com</a>

Recogizer on LinkedIn <a class="m_523492013473682272moz-txt-link-freetext" href="https://www.linkedin.com/company-beta/10039182/" target="_blank">https://www.linkedin.com/<wbr>company-beta/10039182/</a>

Follow us on Twitter <a class="m_523492013473682272moz-txt-link-freetext" href="https://twitter.com/recogizer" target="_blank">https://twitter.com/recogizer</a>

------------------------------<wbr>------------------------------<wbr>-----

Recogizer Group GmbH

Managing Directors: Oliver Habisch, Carsten Kreutze

Commercial register: Amtsgericht Bonn HRB 20724

Registered office: Bonn; VAT ID No.: DE294195993

This e-mail contains confidential and/or legally protected information.

If you are not the intended recipient or have received this e-mail in error,

please notify the sender immediately and delete this e-mail.

Unauthorized copying and unauthorized forwarding of this e-mail and the information it contains is not permitted.</pre>

    </blockquote>

    <br>

  </div></div></div>


<br></blockquote></div><br></div></div>