<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi Fernando,<br>
</p>
<p>Indeed, having an arbiter node is always a good idea, and it
saves a lot of cost.</p>
<p>Good luck with your setup.</p>
<p>Cheers</p>
<p>Erekle<br>
</p>
<br>
<div class="moz-cite-prefix">On 07.08.2017 23:03, FERNANDO FREDIANI
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:47b0f3b5-a836-d5c2-7cf4-c0147aa3948f@upx.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<p>Thanks for the detailed answer, Erekle.</p>
<p>I conclude that in any scenario it is worth having an arbiter
node, to avoid wasting even more disk space on RAID X plus
Gluster replication on top of it. The cost seems much lower
if you compare the running costs of the whole storage with
the cost of building the arbiter node. Even a fully
redundant arbiter service with 2 nodes would be worth it in
a larger deployment.</p>
<p>Regards<br>
Fernando</p>
<div class="moz-cite-prefix">On 07/08/2017 17:07, Erekle Magradze
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:b61c0164-c933-8204-a949-0aa303983548@recogizer.de">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<p>Hi Fernando (sorry for misspelling your name, I used a
different keyboard),</p>
<p>So let's go with the following scenarios:</p>
<p>1. Let's say you have two servers (replication factor 2),
i.e. two bricks per volume. In this case it is strongly
recommended to have an arbiter node, a metadata store that
protects you from split-brain situations. The arbiter
doesn't even need a disk with lots of space: a tiny SSD is
enough, but it must be hosted on a separate server. The
advantage of this setup is that you don't need RAID 1 for
each brick, because the metadata is stored on the arbiter
node and replacing a brick is easy.</p>
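<p>For illustration, the "two data bricks plus an arbiter" layout from scenario 1 can be expressed as a gluster volume create command with "replica 3 arbiter 1". In that form, the third brick of each replica set is the arbiter and stores only file names and metadata. The Python sketch below only assembles the command string. The volume name vmstore, the host names and the brick paths are hypothetical placeholders. Check the exact syntax against the documentation for the GlusterFS version in use.</p>

```python
# Hypothetical sketch: assemble the CLI command for a "2 replicas + arbiter"
# volume. Volume name, host names and brick paths are placeholders only.
import shlex  # shlex.join requires Python 3.8+


def arbiter_volume_command(volume, data_bricks, arbiter_brick):
    """Return a `gluster volume create ... replica 3 arbiter 1` command.

    GlusterFS uses the third brick of each replica set as the arbiter,
    which keeps metadata only, so it must be listed last.
    """
    if len(data_bricks) != 2:
        raise ValueError("this sketch expects exactly two full data bricks")
    bricks = [*data_bricks, arbiter_brick]  # arbiter goes last in the set
    args = ["gluster", "volume", "create", volume,
            "replica", "3", "arbiter", "1", *bricks]
    return shlex.join(args)


cmd = arbiter_volume_command(
    "vmstore",
    ["host1:/gluster/brick1/vmstore", "host2:/gluster/brick1/vmstore"],
    "host3:/gluster/arbiter/vmstore",
)
print(cmd)
```

<p>Keeping the arbiter on a third, separate host is what lets it break ties between the two data bricks. Its brick can sit on a small SSD because it never receives file contents.</p>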
<p>2. If you have an odd number of bricks (let's say 3, i.e.
replication factor 3) in your volume, and you have neither
created an arbiter node nor configured quorum, then the
entire load of keeping the volume consistent rests on all 3
servers. Each of them is important, each brick holds key
information, and they need to cross-check each other
(that's what people usually do on their first try with
Gluster :) ). In this case replacing a brick is a big pain,
and RAID 1 is a good option to have. The disadvantage is
losing space and not having the JBOD option; the advantage
is that you don't need an additional arbiter node.</p>
<p>3. You have an odd number of bricks and a configured arbiter
node. In this case you can easily go with JBOD; however, a
good practice would be to use RAID 1 for the arbiter disks
(tiny 128 GB SSDs are perfectly sufficient for volumes tens
of TB in size).</p>
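<p>The reason an arbiter (or configured quorum) helps in the scenarios above is a simple counting argument. The Python sketch below models one replica set under a strict-majority rule. It is a simplified illustration and not Gluster's actual quorum code: Gluster's real behaviour is governed by volume options such as cluster.quorum-type. The Member class, the helper functions and the host names are hypothetical.</p>

```python
# Simplified majority-quorum model of one replica set. It illustrates the
# counting argument behind the arbiter; it is NOT Gluster's quorum code.
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    holds_data: bool  # an arbiter brick stores metadata only, no file data


def can_serve_writes(alive, total):
    """Writable only with a strict majority alive and at least one data copy.

    Two disjoint halves can never both hold a strict majority, so at most
    one side of a network partition accepts writes (no split brain).
    """
    return len(alive) * 2 > total and any(m.holds_data for m in alive)


def tolerates_single_failure(members):
    """True if losing any one member still leaves a writable majority."""
    total = len(members)
    return all(
        can_serve_writes([m for j, m in enumerate(members) if j != i], total)
        for i in range(total)
    )


replica2 = [Member("host1", True), Member("host2", True)]
replica2_arbiter = replica2 + [Member("host3-arbiter", False)]
replica3 = replica2 + [Member("host3", True)]

print(tolerates_single_failure(replica2))          # False
print(tolerates_single_failure(replica2_arbiter))  # True
print(tolerates_single_failure(replica3))          # True
```

<p>With two plain replicas, a single failure leaves 1 of 2 members, which is no majority. The volume must either stop accepting writes or risk a split brain. Adding a metadata-only arbiter restores a majority after any single failure, at the cost of a small disk rather than a third full copy.</p>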
<p>That's basically it.</p>
<p>For the rest, regarding reliability and setup scenarios, see
the Gluster documentation; in particular, look for the
quorum and arbiter node configurations and options.</p>
<p>Cheers</p>
<p>Erekle</p>
P.S. What I mentioned regarding good practice is mostly
related to operating Gluster, not to its installation or
deployment, i.e. not to the conceptual understanding of
Gluster (conceptually it's a JBOD system).<br>
<br>
<div class="moz-cite-prefix">On 08/07/2017 05:41 PM, FERNANDO
FREDIANI wrote:<br>
</div>
<blockquote type="cite"
cite="mid:c7a1c2e1-57c3-9fa5-0710-ebee3f3fa069@upx.com">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<p>Thanks for the clarification, Erekle.</p>
<p>However, I am surprised by this way of operating
GlusterFS, as it adds another layer of complexity to the
system (either hardware or software RAID) underneath the
Gluster config and increases the system's overall cost.<br>
</p>
<p>An important point to consider: in a RAID configuration
you already have space 'wasted' to build redundancy
(whether RAID 1, 5 or 6). Then, when you run GlusterFS on
top of several RAIDs, the data is replicated again, so the
same data consumes extra space once within each group of
disks and again across the RAIDs, depending on your
Gluster configuration (with RAID 1 bricks and Gluster
replica 2, the same data is stored 4 times).</p>
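<p>The "space wasted twice" argument can be put in numbers. The Python sketch below computes how many raw bytes each layout consumes per usable byte. It is a simplified model that ignores filesystem overhead, hot spares and the small metadata-only arbiter bricks. The 7-disk RAID group comes from the 3-server, 7-HDD pod described at the start of this thread. The function name and layout labels are illustrative only.</p>

```python
# Raw disk space consumed per byte of usable space when Gluster replication
# sits on top of a local RAID (or JBOD) brick. Simplified model: ignores
# filesystem overhead, hot spares and the small metadata-only arbiter bricks.

def raw_per_usable(level, n_disks, replica):
    """Raw bytes written to disk for every usable byte stored in the volume."""
    if level == "jbod":
        return replica * 1.0
    if level == "raid1":
        return replica * 2.0  # each block is mirrored once inside the brick
    if level == "raid5":
        if n_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return replica * n_disks / (n_disks - 1)  # one disk's worth of parity
    if level == "raid6":
        if n_disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return replica * n_disks / (n_disks - 2)  # two disks' worth of parity
    raise ValueError(f"unknown level: {level}")


# RAID 1 bricks under Gluster replica 2: every byte is stored 4 times.
print(raw_per_usable("raid1", 2, 2))  # 4.0

layouts = [
    ("RAID 5 (7 disks) bricks, replica 3", "raid5", 3),
    ("JBOD bricks, replica 3", "jbod", 3),
    ("JBOD bricks, replica 2 + arbiter", "jbod", 2),
]
for label, level, replica in layouts:
    factor = raw_per_usable(level, 7, replica)
    print(f"{label}: {factor:.2f} raw bytes per usable byte "
          f"({100 / factor:.0f}% of raw capacity usable)")
```

<p>Under these assumptions, RAID 5 bricks with three-way replication consume 3.5 raw bytes per usable byte. Two data copies plus an arbiter consume about 2, which is the saving discussed elsewhere in this thread.</p>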
<p>Yet another downside of RAID (especially RAID 5 or 6) is
that it considerably reduces write speed: each group of
disks ends up with roughly the write speed of a single
disk, because all the disks in the group have to wait for
each other to finish writing.<br>
</p>
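<p>The write-speed point can be estimated with the commonly used "write penalty" rule of thumb for small random writes. Under this rule, each logical write costs about 2 disk I/Os on RAID 1, 4 on RAID 5 (read data, read parity, write data, write parity) and 6 on RAID 6. The Python sketch below applies that rule. The figure of 150 IOPS per 10K RPM HDD is a hypothetical placeholder. The model ignores controller write-back caches and full-stripe sequential writes, which can change the picture considerably.</p>

```python
# Back-of-the-envelope random-write IOPS for a RAID group, using the common
# "write penalty" rule of thumb (disk I/Os needed per small logical write).
# Ignores controller write-back caches and full-stripe sequential writes.

WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid5": 4, "raid6": 6}


def random_write_iops(level, n_disks, iops_per_disk):
    """Approximate small random-write IOPS the whole group can sustain."""
    return n_disks * iops_per_disk / WRITE_PENALTY[level]


DISK_IOPS = 150  # hypothetical figure for one 10K RPM HDD
for level in ("raid0", "raid5", "raid6"):
    iops = random_write_iops(level, 7, DISK_IOPS)
    print(f"{level}: ~{iops:.0f} IOPS (~{iops / DISK_IOPS:.2f} disks' worth)")
```

<p>For a 7-disk group, this estimate puts RAID 6 random writes at roughly one disk's worth and RAID 5 at under two. That is in line with the concern above, though real controllers with battery-backed caches often do better.</p>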
<p>Therefore, if Gluster already replicates the data, why does
a failed disk cause the big pain you mentioned? The data is
replicated elsewhere, so it can still be retrieved both to
serve clients and to reconstruct the equivalent disk when
it is replaced.</p>
<p>Fernando<br>
</p>
<br>
<div class="moz-cite-prefix">On 07/08/2017 10:26, Erekle
Magradze wrote:<br>
</div>
<blockquote type="cite"
cite="mid:aa829d07-fa77-3ed9-2500-e33cc01414b6@recogizer.de">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<p>Hi Frenando,</p>
<p>Here is my experience: if you use a single hard drive as
a brick for a Gluster volume and it dies, i.e. becomes
inaccessible, it's a huge hassle to discard that brick and
exchange it for another one, since Gluster keeps trying to
access the broken brick, and that causes (at least it
caused for me) a big pain. Therefore it's better to use a
RAID as the brick, i.e. RAID 1 (mirroring) for each brick.
Then, if a disk goes down, you can easily exchange it and
rebuild the RAID without going offline, i.e. without
switching off the volume, manipulating bricks and
switching it back on.<br>
</p>
<p>Cheers</p>
<p>Erekle<br>
</p>
<br>
<div class="moz-cite-prefix">On 08/07/2017 03:04 PM,
FERNANDO FREDIANI wrote:<br>
</div>
<blockquote type="cite"
cite="mid:63bac47b-afe6-0258-d3d7-e545a5004c30@upx.com">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<p>For any RAID 5 or 6 configuration I normally follow a
simple golden rule, which has given good results so far:<br>
- up to 4 disks: RAID 5<br>
- 5 or more disks: RAID 6</p>
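<p>The rule of thumb above is small enough to state as code. The Python sketch below encodes it and reports how many disks' worth of capacity remain after parity. The lower bound of 3 disks is an added assumption based on the minimum disk count for RAID 5, and the function names are hypothetical.</p>

```python
# The RAID 5 / RAID 6 rule of thumb from this message, plus the capacity left
# after parity. The 3-disk minimum is an added assumption (RAID 5's minimum).

def pick_raid_level(n_disks):
    """RAID 5 for up to 4 disks, RAID 6 for 5 or more disks."""
    if n_disks < 3:
        raise ValueError("parity RAID needs at least 3 disks")
    return "RAID 5" if n_disks <= 4 else "RAID 6"


def usable_disks(n_disks):
    """Disks' worth of capacity left after one (RAID 5) or two (RAID 6) parity disks."""
    parity = 1 if pick_raid_level(n_disks) == "RAID 5" else 2
    return n_disks - parity


for n in (3, 4, 5, 7, 12):
    print(f"{n} disks -> {pick_raid_level(n)}, {usable_disks(n)} disks' worth usable")
```

<p>The switch to double parity at 5 disks trades one more disk of capacity for surviving a second failure during a rebuild, which matters more as the group grows.</p>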
<p>However, I didn't really understand the recommendation to
use RAID with GlusterFS. I always thought that GlusterFS
likes to work in JBOD mode and control the disks (bricks)
directly, so you can create whatever distribution rule you
wish, and if a single disk fails you just replace it and it
gets the data replicated back from another one. The only
downside of using it this way is that the replication
traffic will flow across all servers, but that is not a
big issue.</p>
<p>Can anyone elaborate on RAID + GlusterFS versus
JBOD + GlusterFS?</p>
<p>Thanks<br>
Regards<br>
Fernando<br>
</p>
<br>
<div class="moz-cite-prefix">On 07/08/2017 03:46, Devin
Acosta wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CANCGKEp4XGs0U+Qs78eEmqCNtvpLY-Azjb5DcGhZ9yiKTBEEfw@mail.gmail.com">
<style>body{font-family:Helvetica,Arial;font-size:13px}</style>
<div id="bloop_customfont"
style="color:rgb(0,0,0);margin:0px"><font face="Input
Mono"><br>
</font></div>
<div id="bloop_customfont"
style="color:rgb(0,0,0);margin:0px"><font face="Input
Mono">Moacir,</font></div>
<div id="bloop_customfont"
style="color:rgb(0,0,0);margin:0px"><font face="Input
Mono"><br>
</font></div>
<div id="bloop_customfont"
style="color:rgb(0,0,0);margin:0px"><font face="Input
Mono">I have recently installed multiple Red Hat
Virtualization hosts for several different
companies, and have worked in depth with the Red Hat
Support Team on the optimal way to set up
GlusterFS, and I wanted to share what I learned
with you.</font></div>
<div id="bloop_customfont"
style="color:rgb(0,0,0);margin:0px"><font face="Input
Mono"><br>
</font></div>
<div id="bloop_customfont"
style="color:rgb(0,0,0);margin:0px"><font face="Input
Mono">In general, the Red Hat Virtualization team
frowns upon using each DISK of the system as a plain
JBOD. Sure, there is some protection from having the
data replicated; however, the recommendation is to
use RAID 6 (preferred) or RAID 5, or RAID 1 at the
very least.</font></div>
<div id="bloop_customfont"
style="color:rgb(0,0,0);margin:0px"><font face="Input
Mono"><br>
</font></div>
<div id="bloop_customfont" style="margin:0px"><font
face="Input Mono">Here is the direct quote from Red
Hat when I asked about RAID and Bricks:</font></div>
<div id="bloop_customfont" style="margin:0px"><font
face="Input Mono"><i><br>
</i></font></div>
<div id="bloop_customfont" style="margin:0px"><font
face="Input Mono"><i>"A typical Gluster
configuration would use RAID underneath the
bricks. RAID 6 is most typical as it gives you 2
disk failure protection, but RAID 5 could be used
too. Once you have the RAIDed bricks, you'd then
apply the desired replication on top of that. The
most popular way of doing this would be
distributed replicated with 2x replication. In
general you'll get better performance with larger
bricks. 12 drives is often a sweet spot. Another
option would be to create a separate tier using
all SSD’s.” </i></font></div>
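<p>The quote above recommends distributed-replicated volumes with 2x replication on top of RAID bricks. The Python sketch below shows how such a volume groups its bricks. It assumes GlusterFS's documented behaviour that every replica-count consecutive bricks on the command line form one replica set. Files are then spread across the sets by Gluster's hash-based distribution, which is not modelled here. The four servers, the brick paths and the helper names are hypothetical.</p>

```python
# How a distributed-replicated volume groups bricks: every `replica`
# consecutive bricks (in command-line order) form one replica set.
# Servers, paths and helper names are hypothetical.

def replica_sets(bricks, replica):
    """Split the ordered brick list into replica sets."""
    if len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def sets_span_hosts(sets):
    """True if no replica set keeps two copies on the same host."""
    return all(len({b.split(":", 1)[0] for b in s}) == len(s) for s in sets)


# 2x replication over four RAID 6 bricks on four servers.
good = ["srv1:/bricks/r6", "srv2:/bricks/r6", "srv3:/bricks/r6", "srv4:/bricks/r6"]
sets = replica_sets(good, 2)
print(sets)  # [['srv1:/bricks/r6', 'srv2:/bricks/r6'], ['srv3:/bricks/r6', 'srv4:/bricks/r6']]
print(sets_span_hosts(sets))  # True

# Same bricks listed in a careless order: both copies of a set share a host.
bad = ["srv1:/bricks/a", "srv1:/bricks/b", "srv2:/bricks/a", "srv2:/bricks/b"]
print(sets_span_hosts(replica_sets(bad, 2)))  # False
```

<p>The brick order on the command line therefore decides whether losing one server costs redundancy or data. This is worth checking whichever RAID level sits underneath.</p>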
<div id="bloop_customfont" style="margin:0px"><br>
</div>
<div id="bloop_customfont" style="margin:0px"><font
face="Input Mono"><i>From my understanding, to do SSD
tiering you would need 1 x NVMe drive in each
server, or a 4 x SSD hot tier (if not using NVMe,
the hot tier needs to be distributed-replicated).
So, since you only have 1 SSD in each server, I’d
suggest looking into the NVMe option. </i></font></div>
<div id="bloop_customfont" style="margin:0px"><font
face="Input Mono"><i><br>
</i></font></div>
<div id="bloop_customfont" style="margin:0px"><font
face="Input Mono"><i>Since you’re using only 3
servers, what I’d probably suggest is 2 replicas +
an arbiter node. This setup doesn’t require the 3rd
server to have big drives at all, as it only stores
metadata about the files and not a full copy. </i></font></div>
<div id="bloop_customfont" style="margin:0px"><font
face="Input Mono"><i><br>
</i></font></div>
<div id="bloop_customfont" style="margin:0px"><font
face="Input Mono"><i>Please see the attached
document Red Hat gave me for more information on
this. I hope this information helps you.</i></font></div>
<div id="bloop_customfont" style="margin:0px"><font
face="Input Mono"><i><br>
</i></font></div>
<br>
<div id="bloop_sign_1502087376725469184"
class="bloop_sign"><span style="font-family:'helvetica
Neue',helvetica;font-size:14px">--</span><br
style="font-family:'helvetica
Neue',helvetica;font-size:14px">
<div class="gmail_signature"
style="font-family:'helvetica
Neue',helvetica;font-size:14px">
<div dir="ltr">
<div><br>
</div>
<div>Devin Acosta, RHCA, RHVCA</div>
<div>Red Hat Certified Architect</div>
</div>
</div>
</div>
<br>
<p class="airmail_on">On August 6, 2017 at 7:29:29 PM,
Moacir Ferreira (<a
href="mailto:moacirferreira@hotmail.com"
moz-do-not-send="true">moacirferreira@hotmail.com</a>)
wrote:</p>
<blockquote type="cite" class="clean_bq"><span>
<div dir="ltr">
<div>
<title></title>
<div id="divtagdefaultwrapper"
style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif"
dir="ltr">
<p><span>I am planning to assemble an oVirt "pod"
made of 3 servers, each with 2 CPU sockets of 12
cores, 256 GB RAM, 7 x 10K HDDs and 1 SSD. The idea
is to use GlusterFS to provide HA for the VMs. Each
of the 3 servers has a dual 40Gb NIC and a dual
10Gb NIC. My intention is to create a loop, like a
server triangle, using the 40Gb NICs for access to
the virtualization files (VM .qcow2 images) and for
moving VMs around the pod (east/west traffic),
while using the 10Gb interfaces to provide
services to the outside world (north/south
traffic).</span></p>
<p><br>
</p>
<p>That said, my main question is: how should I
deploy GlusterFS in such an oVirt scenario?
Specifically:</p>
<p><br>
</p>
<p>1 - Should I create 3 RAIDs (e.g. RAID 5), one
on each oVirt node, and then create a GlusterFS
volume using them?</p>
<p>2 - Or should I instead create a JBOD array made
of all of the servers' disks?</p>
<p>3 - What is the best Gluster configuration to
provide HA while not consuming too much disk
space?<br>
</p>
<p>4 - Does an oVirt hypervisor pod like the one I
am planning to build, with its virtualization
environment, benefit from tiering when using an
SSD? And if so, will Gluster do it by default, or
do I have to configure it to do so?</p>
<p><br>
</p>
<p>Bottom line: what is good practice for using
GlusterFS in small pods for enterprises?<br>
</p>
<p><br>
</p>
<p>Your opinion/feedback will be really
appreciated!</p>
<p>Moacir<br>
</p>
</div>
_______________________________________________
<br>
Users mailing list <br>
<a href="mailto:Users@ovirt.org"
moz-do-not-send="true">Users@ovirt.org</a> <br>
<a
href="http://lists.ovirt.org/mailman/listinfo/users"
moz-do-not-send="true">http://lists.ovirt.org/mailman/listinfo/users</a>
<br>
</div>
</div>
</span></blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Recogizer Group GmbH
Dr.rer.nat. Erekle Magradze
Lead Big Data Engineering & DevOps
Rheinwerkallee 2, 53227 Bonn
Tel: +49 228 29974555
E-Mail <a class="moz-txt-link-abbreviated" href="mailto:erekle.magradze@recogizer.de" moz-do-not-send="true">erekle.magradze@recogizer.de</a>
Web: <a class="moz-txt-link-abbreviated" href="http://www.recogizer.com" moz-do-not-send="true">www.recogizer.com</a>
Recogizer on LinkedIn <a class="moz-txt-link-freetext" href="https://www.linkedin.com/company-beta/10039182/" moz-do-not-send="true">https://www.linkedin.com/company-beta/10039182/</a>
Follow us on Twitter <a class="moz-txt-link-freetext" href="https://twitter.com/recogizer" moz-do-not-send="true">https://twitter.com/recogizer</a>
-----------------------------------------------------------------
Recogizer Group GmbH
Managing Directors: Oliver Habisch, Carsten Kreutze
Commercial register: Amtsgericht Bonn (Bonn Local Court), HRB 20724
Registered office: Bonn; VAT ID no.: DE294195993
This e-mail contains confidential and/or legally protected information.
If you are not the intended recipient or have received this e-mail in error,
please notify the sender immediately and delete this e-mail.
Unauthorized copying or disclosure of this e-mail and the information it contains is not permitted.</pre>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>