[ovirt-users] Good practices

Yaniv Kaul ykaul at redhat.com
Tue Aug 8 06:29:47 UTC 2017


On Tue, Aug 8, 2017 at 12:03 AM, FERNANDO FREDIANI <
fernando.frediani at upx.com> wrote:

> Thanks for the detailed answer Erekle.
>
> I conclude that it is worth it in any scenario to have an arbiter node, in
> order to avoid wasting more disk space on RAID X plus Gluster replication on
> top of it. The cost seems much lower if you compare the running costs of the
> whole storage with the cost of building the arbiter node. Even a fully
> redundant arbiter service with 2 nodes would be worth it on a larger
> deployment.
>


Note that although you get the same consistency as a replica 3 setup, a
2+arbiter setup only gives you the data availability of a replica 2 setup.
That may or may not be OK with your high-availability requirements.
Y.


> Regards
> Fernando
> On 07/08/2017 17:07, Erekle Magradze wrote:
>
> Hi Fernando (sorry for misspelling your name, I used a different keyboard),
>
> So let's go with the following scenarios:
>
> 1. Let's say you have two servers (replication factor 2), i.e. two bricks
> per volume. In this case it is strongly recommended to have an arbiter
> node: a metadata-only brick that guards against split-brain situations. For
> the arbiter you don't even need a disk with lots of space; a tiny SSD is
> enough, but it must be hosted on a separate server. The advantage of such a
> setup is that you don't need RAID 1 for each brick: the metadata is stored
> on the arbiter node and brick replacement is easy.
>
> 2. If you have an odd number of bricks in your volume (let's say 3, i.e.
> replication factor 3) and you created neither an arbiter node nor a quorum
> configuration, the entire load of keeping the volume consistent resides on
> all 3 servers: each of them is important, each brick contains key
> information, and they need to cross-check each other (that's what people
> usually do on their first try of Gluster :) ). In this case replacing a
> brick is a big pain, so RAID 1 is a good option to have. The disadvantage
> is losing the space and not having the JBOD option; the advantage is that
> you don't have to have an additional arbiter node.
>
> 3. You have an odd number of bricks and a configured arbiter node; in this
> case you can easily go with JBOD. However, a good practice would be to have
> RAID 1 for the arbiter disks (tiny 128GB SSDs are perfectly sufficient for
> volumes tens of TBs in size). A sketch of such a setup follows below.
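>
> A minimal sketch of such a replica 2 + arbiter volume, with client- and
> server-side quorum enabled (hostnames, volume name, and brick paths are
> hypothetical; adapt them to your setup):
>
> # 3 bricks total; the 3rd is the metadata-only arbiter (a small SSD is fine)
> gluster volume create vmstore replica 3 arbiter 1 \
>     server1:/gluster/vmstore/brick \
>     server2:/gluster/vmstore/brick \
>     arbiter1:/gluster/vmstore/arbiter
>
> # quorum options that guard against split brain
> gluster volume set vmstore cluster.quorum-type auto
> gluster volume set vmstore cluster.server-quorum-type server
> gluster volume start vmstore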
>
> That's basically it.
>
> The rest about reliability and setup scenarios you can find in the Gluster
> documentation; in particular, look at the quorum and arbiter configuration
> options.
>
> Cheers
>
> Erekle
> P.S. The good practice I was mentioning relates mostly to the operation of
> Gluster, not to installation or deployment, i.e. not to the conceptual
> understanding of Gluster (conceptually it's a JBOD system).
>
> On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:
>
> Thanks for the clarification Erekle.
>
> However, I am surprised by this way of operating GlusterFS, as it adds
> another layer of complexity to the system (either hardware or software
> RAID) underneath the Gluster configuration and increases the system's
> overall costs.
>
> An important point to consider: in a RAID configuration you already have
> space 'wasted' in order to build redundancy (RAID 1, 5, or 6). When you
> then run GlusterFS on top of several RAIDs, the data is replicated again,
> so the same data consumes space twice: once within each disk group and once
> across the RAIDs, depending on the Gluster configuration you have (with
> RAID 1 underneath, the same data is replicated 4 times).
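>
> To put rough numbers on it (assuming Gluster replica 2 on top): with RAID 1
> bricks every block is stored 2 x 2 = 4 times, so only ~25% of raw capacity
> is usable; with RAID 6 bricks of 10 data + 2 parity disks it is
> (10/12) x (1/2) = ~42%; with plain JBOD bricks it is ~50%.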
>
> Yet another downside of having a RAID (especially RAID 5 or 6) is that it
> considerably reduces write speeds: each group of disks ends up with roughly
> the write speed of a single disk, as all the other disks in that group have
> to wait for each other before a write completes.
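>
> As a back-of-the-envelope figure: a small random write costs 4 disk I/Os on
> RAID 5 (read old data, read old parity, write data, write parity) and 6 on
> RAID 6, so a group of N disks delivers only about N/4 or N/6 of its raw
> random-write IOPS; sequential full-stripe writes fare much better since
> they avoid the read-modify-write cycle.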
>
> Therefore, if Gluster already replicates data, why does a dead disk create
> the big pain you mentioned? The data is replicated somewhere else and can
> still be retrieved, both to serve clients and to reconstruct the equivalent
> disk when it is replaced.
>
> Fernando
>
> On 07/08/2017 10:26, Erekle Magradze wrote:
>
> Hi Frenando,
>
> Here is my experience: if you use a single hard drive as a brick for a
> Gluster volume and it dies, i.e. becomes inaccessible, it's a huge hassle
> to discard that brick and exchange it for another one, since Gluster keeps
> trying to access the broken brick, and that caused (at least for me) a big
> pain. It's therefore better to have a RAID as the brick, i.e. RAID 1
> (mirroring) for each brick. In that case, if a disk is down you can easily
> exchange it and rebuild the RAID without going offline, i.e. without
> switching off the volume, doing brick manipulations, and switching it back
> on.
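>
> For reference, replacing a dead JBOD brick goes roughly like this (volume
> name and paths are hypothetical), and the full self-heal of the new brick's
> contents over the network is where the pain comes from:
>
> # point the volume at a fresh, empty brick and trigger healing
> gluster volume replace-brick vmstore \
>     server1:/gluster/vmstore/dead-brick \
>     server1:/gluster/vmstore/new-brick commit force
> gluster volume heal vmstore full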
>
> Cheers
>
> Erekle
>
> On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:
>
> For any RAID 5 or 6 configuration I normally follow a simple golden rule
> which has given good results so far:
> - up to 4 disks: RAID 5
> - 5 or more disks: RAID 6
>
> However, I didn't really understand the recommendation to use any RAID with
> GlusterFS. I always thought that GlusterFS likes to work in JBOD mode and
> control the disks (bricks) directly, so you can create whatever
> distribution rule you wish, and if a single disk fails you just replace it
> and the data is obviously replicated back from another. The only downside
> of using it this way is that the replication traffic will flow across all
> servers, but that is not that big an issue.
>
> Can anyone elaborate on using RAID + GlusterFS versus JBOD + GlusterFS?
>
> Thanks
> Regards
> Fernando
>
> On 07/08/2017 03:46, Devin Acosta wrote:
>
>
> Moacir,
>
> I have recently installed multiple Red Hat Virtualization hosts for several
> different companies, and have dealt with the Red Hat Support Team in depth
> about the optimal configuration for setting up GlusterFS most efficiently,
> and I wanted to share with you what I learned.
>
> In general the Red Hat Virtualization team frowns upon using each disk of
> the system as just a JBOD. Sure, there is some protection from having the
> data replicated; however, the recommendation is to use RAID 6 (preferred),
> RAID 5, or at the very least RAID 1.
>
> Here is the direct quote from Red Hat when I asked about RAID and Bricks:
>
> "A typical Gluster configuration would use RAID underneath the bricks.
> RAID 6 is most typical as it gives you 2 disk failure protection, but RAID
> 5 could be used too. Once you have the RAIDed bricks, you'd then apply the
> desired replication on top of that. The most popular way of doing this
> would be distributed replicated with 2x replication. In general you'll get
> better performance with larger bricks. 12 drives is often a sweet spot.
> Another option would be to create a separate tier using all SSDs."
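>
> As a sketch, that "distributed replicated with 2x replication" layout over
> RAID 6 bricks would be created along these lines (hostnames and brick paths
> are hypothetical):
>
> # 4 RAID 6-backed bricks -> a 2x2 distributed-replicated volume
> gluster volume create vmstore replica 2 \
>     host1:/bricks/raid6/brick host2:/bricks/raid6/brick \
>     host3:/bricks/raid6/brick host4:/bricks/raid6/brick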
>
> In order to do SSD tiering, from my understanding, you would need 1 x NVMe
> drive in each server, or 4 x SSDs for the hot tier (the hot tier needs to
> be distributed-replicated if not using NVMe). So with you only having 1 SSD
> drive in each server, I'd suggest looking into the NVMe option.
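>
> Note that Gluster does not tier by default; attaching a hot tier is an
> explicit step on an existing volume, roughly as follows (Gluster 3.7+ era
> syntax; SSD brick paths are hypothetical):
>
> # attach a replicated SSD hot tier to an existing volume
> gluster volume tier vmstore attach replica 2 \
>     host1:/bricks/ssd/hot host2:/bricks/ssd/hot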
>
> Since you're using only 3 servers, what I'd probably suggest is to do 2
> replicas + an arbiter node. This setup doesn't actually require the 3rd
> server to have big drives at all, as it only stores metadata about the
> files and not a full copy.
>
> Please see the attached document that was given to me by Red Hat for more
> information on this. I hope this information helps you.
>
>
> --
>
> Devin Acosta, RHCA, RHVCA
> Red Hat Certified Architect
>
> On August 6, 2017 at 7:29:29 PM, Moacir Ferreira (
> moacirferreira at hotmail.com) wrote:
>
> I am willing to assemble an oVirt "pod" made of 3 servers, each with 2 CPU
> sockets of 12 cores, 256GB RAM, 7 x 10K HDDs, and 1 SSD. The idea is to use
> GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and
> a dual 10Gb NIC, so my intention is to create a loop, like a server
> triangle, using the 40Gb NICs for virtualization file (VM .qcow2) access
> and for moving VMs around the pod (east/west traffic), while using the 10Gb
> interfaces for providing services to the outside world (north/south
> traffic).
>
>
> This said, my first question is: how should I deploy GlusterFS in such an
> oVirt scenario? Specifically:
>
>
> 1 - Should I create 3 RAID arrays (e.g. RAID 5), one on each oVirt node,
> and then create a GlusterFS volume using them?
>
> 2 - Or should I instead create a JBOD array made of all the servers' disks?
>
> 3 - What is the best Gluster configuration to provide HA while not
> consuming too much disk space?
>
> 4 - Does an oVirt hypervisor pod like the one I am planning to build, and
> the virtualization environment, benefit from tiering when using an SSD
> disk? And if yes, will Gluster do it by default or do I have to configure
> it to do so?
>
>
> The bottom line is: what is the good practice for using GlusterFS in small
> pods for enterprises?
>
>
> Your opinion/feedback will be really appreciated!
>
> Moacir
>
> --
> Recogizer Group GmbH
>
> Dr.rer.nat. Erekle Magradze
> Lead Big Data Engineering & DevOps
> Rheinwerkallee 2, 53227 Bonn
> Tel: +49 228 29974555
>
> E-Mail erekle.magradze at recogizer.de
> Web: www.recogizer.com
>
>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>