[ovirt-users] Good practices
Erekle Magradze
erekle.magradze at recogizer.de
Mon Aug 7 21:11:00 UTC 2017
Hi Fernando,
Indeed, having an arbiter node is always a good idea, and it saves
a lot of costs.
Good luck with your setup.
Cheers
Erekle
On 07.08.2017 23:03, FERNANDO FREDIANI wrote:
>
> Thanks for the detailed answer Erekle.
>
> I conclude that it is worth it in any scenario to have an arbiter node in
> order to avoid wasting more disk space on RAID X plus Gluster replication
> on top of it. The cost seems much lower if you consider the running
> costs of the whole storage and compare them with the cost to build the
> arbiter node. Even having a fully redundant arbiter service with 2
> nodes would make it worthwhile on a larger deployment.
>
> Regards
> Fernando
>
> On 07/08/2017 17:07, Erekle Magradze wrote:
>>
>> Hi Fernando (sorry for misspelling your name, I used a different
>> keyboard),
>>
>> So let's go with the following scenarios:
>>
>> 1. Let's say you have two servers (replication factor 2), i.e. two
>> bricks per volume. In this case it is strongly recommended to have
>> an arbiter node, the metadata store that guarantees avoiding the
>> split-brain situation. For the arbiter you don't even need a disk
>> with lots of space; a tiny SSD is enough, but it should be hosted on
>> a separate server. The advantage of such a setup is that you don't
>> need RAID 1 for each brick: the metadata is stored on the arbiter
>> node and brick replacement is easy.
>>
>> 2. If you have an odd number of bricks (let's say 3, i.e. replication
>> factor 3) in your volume and you created neither an arbiter node nor
>> a quorum configuration, the entire load of keeping the volume
>> consistent resides on all 3 servers. Each of them is important, each
>> brick contains key information, and they need to cross-check each
>> other (that's what people usually do on their first try of gluster
>> :) ). In this case replacing a brick is a big pain, and RAID 1 is a
>> good option to have. That's the disadvantage, i.e. losing the space
>> and not having the JBOD option; the advantage is that you don't need
>> an additional arbiter node.
>>
>> 3. You have an odd number of bricks and a configured arbiter node.
>> In this case you can easily go with JBOD; however, a good practice
>> would be to have RAID 1 for the arbiter disks (tiny 128GB SSDs are
>> perfectly sufficient for volumes tens of TBs in size). See the
>> example commands right after this list.
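>>
>> As a rough sketch of scenarios 1 and 3 (host names and brick paths
>> are placeholders only), the arbiter is declared when the volume is
>> created:
>>
>>   gluster volume create myvol replica 3 arbiter 1 \
>>       server1:/data/brick1 server2:/data/brick1 \
>>       arbiter1:/data/arbiter-brick
>>   gluster volume start myvol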
>>
>> That's basically it
>>
>> The rest about reliability and setup scenarios you can find in the
>> gluster documentation; in particular, look for the quorum and arbiter
>> node configs and options.
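>>
>> For reference, these are the quorum options I have in mind (the
>> volume name is a placeholder):
>>
>>   gluster volume set myvol cluster.quorum-type auto
>>   gluster volume set myvol cluster.server-quorum-type server
>>   gluster volume get myvol all | grep quorum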
>>
>> Cheers
>>
>> Erekle
>>
>> P.S. What I was mentioning regarding good practice is mostly
>> related to the operation of gluster, not installation or deployment,
>> i.e. not the conceptual understanding of gluster (conceptually it's a
>> JBOD system).
>>
>> On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:
>>>
>>> Thanks for the clarification Erekle.
>>>
>>> However, I am surprised by this way of operating GlusterFS, as
>>> it adds another layer of complexity to the system (either hardware
>>> or software RAID) underneath the gluster config and increases the
>>> system's overall cost.
>>>
>>> An important point to consider: in a RAID configuration you already
>>> have space 'wasted' in order to build redundancy (whether RAID 1, 5,
>>> or 6). Then, when you put GlusterFS on top of several RAIDs, the
>>> data is replicated yet again, so the same data ends up consuming
>>> more space within each group of disks and once more across the
>>> RAIDs, depending on the Gluster configuration you have (with RAID 1
>>> and 2x replication the same data is stored 4 times).
>>>
>>> Yet another downside of having RAID (especially RAID 5 or 6) is
>>> that it considerably reduces write speed, as each group of disks
>>> ends up with roughly the write speed of a single disk, since all the
>>> disks of that group have to wait for each other to write as well.
>>>
>>> Therefore, if Gluster already replicates data, why does it create
>>> the big pain you mentioned? The data is replicated somewhere else
>>> and can still be retrieved both to serve clients and to reconstruct
>>> the equivalent disk when it is replaced.
>>>
>>> Fernando
>>>
>>>
>>> On 07/08/2017 10:26, Erekle Magradze wrote:
>>>>
>>>> Hi Fernando,
>>>>
>>>> Here is my experience: if you use a particular hard drive as a
>>>> brick for a gluster volume and it dies, i.e. it becomes
>>>> inaccessible, it is a huge hassle to discard that brick and
>>>> exchange it for another one, since gluster still tries to access
>>>> the broken brick, and that causes (at least it caused for me) a big
>>>> pain. Therefore it's better to have a RAID as the brick, i.e. RAID 1
>>>> (mirroring) for each brick. In that case, if a disk dies you can
>>>> easily exchange it and rebuild the RAID without going offline,
>>>> i.e. without switching off the volume, doing the brick
>>>> manipulations, and switching it back on.
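>>>>
>>>> For comparison, this is roughly what replacing a dead JBOD brick
>>>> looks like while the volume stays online (volume, host, and path
>>>> names are placeholders); with RAID 1 underneath you skip this and
>>>> simply swap the failed disk:
>>>>
>>>>   gluster volume replace-brick myvol \
>>>>       server1:/data/old-brick server1:/data/new-brick \
>>>>       commit force
>>>>   gluster volume heal myvol info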
>>>>
>>>> Cheers
>>>>
>>>> Erekle
>>>>
>>>>
>>>> On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:
>>>>>
>>>>> For any RAID 5 or 6 configuration I normally follow a simple golden
>>>>> rule which has given good results so far:
>>>>> - up to 4 disks RAID 5
>>>>> - 5 or more disks RAID 6
>>>>>
>>>>> However, I didn't really understand the recommendation to use
>>>>> any RAID with GlusterFS. I always thought that GlusterFS likes to
>>>>> work in JBOD mode and control the disks (bricks) directly, so you
>>>>> can create whatever distribution rule you wish, and if a single
>>>>> disk fails you just replace it and the data is obviously rebuilt
>>>>> from another replica. The only downside of using it this way is
>>>>> that the replication traffic will flow across all servers, but
>>>>> that is not much of a big issue.
>>>>>
>>>>> Can anyone elaborate on using RAID + GlusterFS versus JBOD +
>>>>> GlusterFS?
>>>>>
>>>>> Thanks
>>>>> Regards
>>>>> Fernando
>>>>>
>>>>>
>>>>> On 07/08/2017 03:46, Devin Acosta wrote:
>>>>>>
>>>>>> Moacir,
>>>>>>
>>>>>> I have recently installed multiple Red Hat Virtualization hosts
>>>>>> for several different companies, and have dealt with the Red Hat
>>>>>> Support Team in depth about the optimal configuration for
>>>>>> setting up GlusterFS most efficiently, and I wanted to share with
>>>>>> you what I learned.
>>>>>>
>>>>>> In general the Red Hat Virtualization team frowns upon using each
>>>>>> DISK of the system as just a JBOD. Sure, there is some protection
>>>>>> by having the data replicated; however, the recommendation is to
>>>>>> use RAID 6 (preferred), RAID 5, or at the very least RAID 1.
>>>>>>
>>>>>> Here is the direct quote from Red Hat when I asked about RAID and
>>>>>> bricks:
>>>>>>
>>>>>> "A typical Gluster configuration would use RAID underneath the
>>>>>> bricks. RAID 6 is most typical as it gives you 2 disk failure
>>>>>> protection, but RAID 5 could be used too. Once you have the
>>>>>> RAIDed bricks, you'd then apply the desired replication on top of
>>>>>> that. The most popular way of doing this would be distributed
>>>>>> replicated with 2x replication. In general you'll get better
>>>>>> performance with larger bricks. 12 drives is often a sweet spot.
>>>>>> Another option would be to create a separate tier using all SSDs."
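>>>>>>
>>>>>> As a sketch of that "distributed replicated with 2x replication"
>>>>>> layout over RAIDed bricks (host names and brick paths are
>>>>>> placeholders only; recent Gluster releases warn that plain replica
>>>>>> 2 is prone to split-brain and suggest adding an arbiter):
>>>>>>
>>>>>>   gluster volume create datavol replica 2 \
>>>>>>       server1:/bricks/raid6-a server2:/bricks/raid6-a \
>>>>>>       server1:/bricks/raid6-b server2:/bricks/raid6-b
>>>>>>   gluster volume start datavol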
>>>>>>
>>>>>> In order to do SSD tiering, from my understanding you would need
>>>>>> 1 x NVMe drive in each server, or a 4 x SSD hot tier (it needs to
>>>>>> be distributed-replicated for the hot tier if not using NVMe). So
>>>>>> with you only having 1 SSD drive in each server, I'd suggest maybe
>>>>>> looking into the NVMe option.
>>>>>>
>>>>>> Since you're using only 3 servers, what I'd probably suggest is to
>>>>>> do 2 replicas + an arbiter node. This setup actually doesn't
>>>>>> require the 3rd server to have big drives at all, as it only
>>>>>> stores metadata about the files and not actually a full copy.
>>>>>>
>>>>>> Please see the attached document that was given to me by Red Hat
>>>>>> to get more information on this. Hope this information helps you.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Devin Acosta, RHCA, RHVCA
>>>>>> Red Hat Certified Architect
>>>>>>
>>>>>> On August 6, 2017 at 7:29:29 PM, Moacir Ferreira
>>>>>> (moacirferreira at hotmail.com) wrote:
>>>>>>
>>>>>>> I am willing to assemble an oVirt "pod" made of 3 servers, each
>>>>>>> with 2 CPU sockets of 12 cores, 256GB RAM, 7 x 10K HDD, 1 SSD. The
>>>>>>> idea is to use GlusterFS to provide HA for the VMs. The 3
>>>>>>> servers have a dual 40Gb NIC and a dual 10Gb NIC. So my
>>>>>>> intention is to create a loop like a server triangle, using the
>>>>>>> 40Gb NICs for virtualization file (VM .qcow2) access and to
>>>>>>> move VMs around the pod (east/west traffic), while using the
>>>>>>> 10Gb interfaces for giving services to the outside world
>>>>>>> (north/south traffic).
>>>>>>>
>>>>>>>
>>>>>>> This said, my first question is: how should I deploy GlusterFS
>>>>>>> in such an oVirt scenario? My questions are:
>>>>>>>
>>>>>>>
>>>>>>> 1 - Should I create 3 RAID arrays (e.g. RAID 5), one on each oVirt
>>>>>>> node, and then create a GlusterFS volume using them?
>>>>>>>
>>>>>>> 2 - Or should I instead create a JBOD array made of all the
>>>>>>> servers' disks?
>>>>>>>
>>>>>>> 3 - What is the best Gluster configuration to provide for HA
>>>>>>> while not consuming too much disk space?
>>>>>>>
>>>>>>> 4 - Does an oVirt hypervisor pod like the one I am planning to
>>>>>>> build, and the virtualization environment, benefit from tiering
>>>>>>> when using an SSD disk? And if yes, will Gluster do it by default
>>>>>>> or do I have to configure it to do so?
>>>>>>>
>>>>>>>
>>>>>>> The bottom line: what is the good practice for using
>>>>>>> GlusterFS in small pods for enterprises?
>>>>>>>
>>>>>>>
>>>>>>> Your opinion/feedback will be really appreciated!
>>>>>>>
>>>>>>> Moacir
>>>>>>>
>>>
>>
>> --
>> Recogizer Group GmbH
>>
>> Dr.rer.nat. Erekle Magradze
>> Lead Big Data Engineering & DevOps
>> Rheinwerkallee 2, 53227 Bonn
>> Tel: +49 228 29974555
>>
>> E-Mail: erekle.magradze at recogizer.de
>> Web: www.recogizer.com
>>
>> Recogizer on LinkedIn: https://www.linkedin.com/company-beta/10039182/
>> Follow us on Twitter: https://twitter.com/recogizer
>>
>