[ovirt-users] Good practices
Erekle Magradze
erekle.magradze at recogizer.de
Mon Aug 7 21:11:00 UTC 2017
Hi Fernando,
Indeed, having an arbiter node is always a good idea, and it saves
a lot of costs.
Good luck with your setup.
Cheers
Erekle
On 07.08.2017 23:03, FERNANDO FREDIANI wrote:
>
> Thanks for the detailed answer Erekle.
>
> I conclude that it is worth it in any scenario to have an arbiter node in
> order to avoid wasting more disk space on RAID X plus Gluster replication
> on top of it. The cost seems much lower if you consider the running
> costs of the whole storage and compare them with the cost to build the
> arbiter node. Even having a fully redundant arbiter service with 2
> nodes would make it worthwhile on a larger deployment.
>
> Regards
> Fernando
>
> On 07/08/2017 17:07, Erekle Magradze wrote:
>>
>> Hi Fernando (sorry for misspelling your name, I used a different
>> keyboard),
>>
>> So let's go with the following scenarios:
>>
>> 1. Let's say you have two servers (replication factor 2), i.e. two
>> bricks per volume. In this case it is strongly recommended to have
>> an arbiter node, the metadata store that guarantees avoiding the
>> split-brain situation. For the arbiter you don't even need a disk
>> with lots of space; a tiny SSD is enough, but it should be hosted on
>> a separate server. The advantage of such a setup is that you don't
>> need RAID 1 for each brick: the metadata is stored on the arbiter
>> node and brick replacement is easy.
>>
>> 2. If you have an odd number of bricks (let's say 3, i.e. replication
>> factor 3) in your volume and you created neither an arbiter node nor
>> a quorum configuration, the entire load of keeping the volume
>> consistent resides on all 3 servers. Each of them is important, each
>> brick contains key information, and they need to cross-check each
>> other (that's what people usually do on their first try of gluster
>> :) ). In this case replacing a brick is a big pain, and RAID 1 is a
>> good option to have. That's the disadvantage, i.e. losing the space
>> and not having the JBOD option; the advantage is that you don't need
>> an additional arbiter node.
>>
>> 3. You have an odd number of bricks and a configured arbiter node.
>> In this case you can easily go with JBOD; however, a good practice
>> would be to have RAID 1 for the arbiter disks (tiny 128GB SSDs are
>> perfectly sufficient for volumes tens of TBs in size). See the
>> example commands right after this list.
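>>
>> As a rough sketch of scenarios 1 and 3 (host names and brick paths
>> are placeholders only), the arbiter is declared when the volume is
>> created:
>>
>>   gluster volume create myvol replica 3 arbiter 1 \
>>       server1:/data/brick1 server2:/data/brick1 \
>>       arbiter1:/data/arbiter-brick
>>   gluster volume start myvol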
>>
>> That's basically it
>>
>> The rest about reliability and setup scenarios you can find in the
>> gluster documentation; in particular, look for the quorum and arbiter
>> node configs and options.
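>>
>> For reference, these are the quorum options I have in mind (the
>> volume name is a placeholder):
>>
>>   gluster volume set myvol cluster.quorum-type auto
>>   gluster volume set myvol cluster.server-quorum-type server
>>   gluster volume get myvol all | grep quorum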
>>
>> Cheers
>>
>> Erekle
>>
>> P.S. What I was mentioning regarding good practice is mostly
>> related to the operation of gluster, not installation or deployment,
>> i.e. not the conceptual understanding of gluster (conceptually it's a
>> JBOD system).
>>
>> On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:
>>>
>>> Thanks for the clarification Erekle.
>>>
>>> However, I am surprised by this way of operating GlusterFS, as
>>> it adds another layer of complexity to the system (either hardware
>>> or software RAID) underneath the gluster config and increases the
>>> system's overall cost.
>>>
>>> An important point to consider: in a RAID configuration you already
>>> have space 'wasted' in order to build redundancy (whether RAID 1, 5,
>>> or 6). Then, when you put GlusterFS on top of several RAIDs, the
>>> data is replicated yet again, so the same data ends up consuming
>>> more space within each group of disks and once more across the
>>> RAIDs, depending on the Gluster configuration you have (with RAID 1
>>> and 2x replication the same data is stored 4 times).
>>>
>>> Yet another downside of having RAID (especially RAID 5 or 6) is
>>> that it considerably reduces write speed, as each group of disks
>>> ends up with roughly the write speed of a single disk, since all the
>>> disks of that group have to wait for each other to write as well.
>>>
>>> Therefore, if Gluster already replicates data, why does it create
>>> the big pain you mentioned? The data is replicated somewhere else
>>> and can still be retrieved both to serve clients and to reconstruct
>>> the equivalent disk when it is replaced.
>>>
>>> Fernando
>>>
>>>
>>> On 07/08/2017 10:26, Erekle Magradze wrote:
>>>>
>>>> Hi Fernando,
>>>>
>>>> Here is my experience: if you use a particular hard drive as a
>>>> brick for a gluster volume and it dies, i.e. it becomes
>>>> inaccessible, it is a huge hassle to discard that brick and
>>>> exchange it for another one, since gluster still tries to access
>>>> the broken brick, and that causes (at least it caused for me) a big
>>>> pain. Therefore it's better to have a RAID as the brick, i.e. RAID 1
>>>> (mirroring) for each brick. In that case, if a disk dies you can
>>>> easily exchange it and rebuild the RAID without going offline,
>>>> i.e. without switching off the volume, doing the brick
>>>> manipulations, and switching it back on.
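>>>>
>>>> For comparison, this is roughly what replacing a dead JBOD brick
>>>> looks like while the volume stays online (volume, host, and path
>>>> names are placeholders); with RAID 1 underneath you skip this and
>>>> simply swap the failed disk:
>>>>
>>>>   gluster volume replace-brick myvol \
>>>>       server1:/data/old-brick server1:/data/new-brick \
>>>>       commit force
>>>>   gluster volume heal myvol info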
>>>>
>>>> Cheers
>>>>
>>>> Erekle
>>>>
>>>>
>>>> On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:
>>>>>
>>>>> For any RAID 5 or 6 configuration I normally follow a simple golden
>>>>> rule which has given good results so far:
>>>>> - up to 4 disks RAID 5
>>>>> - 5 or more disks RAID 6
>>>>>
>>>>> However, I didn't really understand the recommendation to use
>>>>> any RAID with GlusterFS. I always thought that GlusterFS likes to
>>>>> work in JBOD mode and control the disks (bricks) directly, so you
>>>>> can create whatever distribution rule you wish, and if a single
>>>>> disk fails you just replace it and the data is obviously rebuilt
>>>>> from another replica. The only downside of using it this way is
>>>>> that the replication traffic will flow across all servers, but
>>>>> that is not much of a big issue.
>>>>>
>>>>> Can anyone elaborate on using RAID + GlusterFS versus JBOD +
>>>>> GlusterFS?
>>>>>
>>>>> Thanks
>>>>> Regards
>>>>> Fernando
>>>>>
>>>>>
>>>>> On 07/08/2017 03:46, Devin Acosta wrote:
>>>>>>
>>>>>> Moacir,
>>>>>>
>>>>>> I have recently installed multiple Red Hat Virtualization hosts
>>>>>> for several different companies, and have dealt with the Red Hat
>>>>>> Support Team in depth about the optimal configuration for
>>>>>> setting up GlusterFS most efficiently, and I wanted to share with
>>>>>> you what I learned.
>>>>>>
>>>>>> In general the Red Hat Virtualization team frowns upon using each
>>>>>> DISK of the system as just a JBOD. Sure, there is some protection
>>>>>> by having the data replicated; however, the recommendation is to
>>>>>> use RAID 6 (preferred), RAID 5, or at the very least RAID 1.
>>>>>>
>>>>>> Here is the direct quote from Red Hat when I asked about RAID and
>>>>>> bricks:
>>>>>>
>>>>>> "A typical Gluster configuration would use RAID underneath the
>>>>>> bricks. RAID 6 is most typical as it gives you 2 disk failure
>>>>>> protection, but RAID 5 could be used too. Once you have the
>>>>>> RAIDed bricks, you'd then apply the desired replication on top of
>>>>>> that. The most popular way of doing this would be distributed
>>>>>> replicated with 2x replication. In general you'll get better
>>>>>> performance with larger bricks. 12 drives is often a sweet spot.
>>>>>> Another option would be to create a separate tier using all SSDs."
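>>>>>>
>>>>>> As a sketch of that "distributed replicated with 2x replication"
>>>>>> layout over RAIDed bricks (host names and brick paths are
>>>>>> placeholders only; recent Gluster releases warn that plain replica
>>>>>> 2 is prone to split-brain and suggest adding an arbiter):
>>>>>>
>>>>>>   gluster volume create datavol replica 2 \
>>>>>>       server1:/bricks/raid6-a server2:/bricks/raid6-a \
>>>>>>       server1:/bricks/raid6-b server2:/bricks/raid6-b
>>>>>>   gluster volume start datavol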
>>>>>>
>>>>>> In order to do SSD tiering, from my understanding you would need
>>>>>> 1 x NVMe drive in each server, or a 4 x SSD hot tier (it needs to
>>>>>> be distributed-replicated for the hot tier if not using NVMe). So
>>>>>> with you only having 1 SSD drive in each server, I'd suggest maybe
>>>>>> looking into the NVMe option.
>>>>>>
>>>>>> Since you're using only 3 servers, what I'd probably suggest is to
>>>>>> do 2 replicas + an arbiter node. This setup actually doesn't
>>>>>> require the 3rd server to have big drives at all, as it only
>>>>>> stores metadata about the files and not actually a full copy.
>>>>>>
>>>>>> Please see the attached document that was given to me by Red Hat
>>>>>> to get more information on this. Hope this information helps you.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Devin Acosta, RHCA, RHVCA
>>>>>> Red Hat Certified Architect
>>>>>>
>>>>>> On August 6, 2017 at 7:29:29 PM, Moacir Ferreira
>>>>>> (moacirferreira at hotmail.com) wrote:
>>>>>>
>>>>>>> I am willing to assemble an oVirt "pod" made of 3 servers, each
>>>>>>> with 2 CPU sockets of 12 cores, 256GB RAM, 7 x 10K HDD, 1 SSD. The
>>>>>>> idea is to use GlusterFS to provide HA for the VMs. The 3
>>>>>>> servers have a dual 40Gb NIC and a dual 10Gb NIC. So my
>>>>>>> intention is to create a loop like a server triangle, using the
>>>>>>> 40Gb NICs for virtualization file (VM .qcow2) access and to
>>>>>>> move VMs around the pod (east/west traffic), while using the
>>>>>>> 10Gb interfaces for giving services to the outside world
>>>>>>> (north/south traffic).
>>>>>>>
>>>>>>>
>>>>>>> This said, my first question is: how should I deploy GlusterFS
>>>>>>> in such an oVirt scenario? My questions are:
>>>>>>>
>>>>>>>
>>>>>>> 1 - Should I create 3 RAID arrays (e.g. RAID 5), one on each oVirt
>>>>>>> node, and then create a GlusterFS volume using them?
>>>>>>>
>>>>>>> 2 - Or should I instead create a JBOD array made of all the
>>>>>>> servers' disks?
>>>>>>>
>>>>>>> 3 - What is the best Gluster configuration to provide for HA
>>>>>>> while not consuming too much disk space?
>>>>>>>
>>>>>>> 4 - Does an oVirt hypervisor pod like the one I am planning to
>>>>>>> build, and the virtualization environment, benefit from tiering
>>>>>>> when using an SSD disk? And if yes, will Gluster do it by default
>>>>>>> or do I have to configure it to do so?
>>>>>>>
>>>>>>>
>>>>>>> The bottom line: what is the good practice for using
>>>>>>> GlusterFS in small pods for enterprises?
>>>>>>>
>>>>>>>
>>>>>>> Your opinion/feedback will be really appreciated!
>>>>>>>
>>>>>>> Moacir
>>>>>>>
>>>
>>
>> --
>> Recogizer Group GmbH
>>
>> Dr.rer.nat. Erekle Magradze
>> Lead Big Data Engineering & DevOps
>> Rheinwerkallee 2, 53227 Bonn
>> Tel: +49 228 29974555
>>
>> E-Mail: erekle.magradze at recogizer.de
>> Web: www.recogizer.com
>>
>> Recogizer on LinkedIn: https://www.linkedin.com/company-beta/10039182/
>> Follow us on Twitter: https://twitter.com/recogizer
>>
>