[ovirt-users] Users Digest, Vol 71, Issue 37

Yaniv Kaul ykaul at redhat.com
Tue Aug 8 06:35:56 UTC 2017


On Tue, Aug 8, 2017 at 12:42 AM, Moacir Ferreira <moacirferreira at hotmail.com> wrote:

> Fabrice,
>
>
> If you choose to have jumbo frames everywhere, then when the traffic leaves
> your jumbo-frames-enabled network it will have to be fragmented back down to
> the destination MTU. Most datacenters provide services to the outside world
> where the MTU is 1500 bytes. In that case you will hurt performance, because
> your router will be doing the fragmentation. So I would always use jumbo
> frames in the datacenter for east/west traffic and the standard 1500 bytes
> for north/south traffic.
>

I doubt this would happen with modern TCP/IP stacks, at least for TCP
connections: they will most likely adjust to the path MTU using PMTUD. Of
course, this does not always work (it depends on the HW en route).
UDP packets might also fail miserably (get dropped), depending on the HW
en route, but UDP traffic (and specifically large packets) is not that
common these days.
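
For the curious, here is a minimal sketch of probing the path MTU the way
PMTUD does: send a DF-marked datagram and read back the kernel's estimate.
Linux-only; the numeric constants are the values from <linux/in.h> (not every
Python build exposes them by name), and the target host/port are placeholders.

    # Path-MTU probe sketch (Linux only, assumptions as noted above).
    import socket

    IP_MTU_DISCOVER = 10   # per-socket PMTU discovery mode
    IP_PMTUDISC_DO  = 2    # always set DF, never fragment locally
    IP_MTU          = 14   # read back the kernel's current path-MTU estimate

    def probe_path_mtu(host, port=9, payload=1400):   # placeholder host/port
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
        s.connect((host, port))              # connect() so IP_MTU has a destination
        try:
            s.send(b"\0" * payload)          # EMSGSIZE here means payload > path MTU
        except OSError:
            pass
        return s.getsockopt(socket.IPPROTO_IP, IP_MTU)

    print(probe_path_mtu("storage-node.example"))     # hypothetical peer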

Nevertheless, I don't see a huge advantage in enabling this for north-south
traffic, TBH, and the mysterious, random traffic-drop issues it may cause
are not worth it.
Y.

>
> Moacir
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 7 Aug 2017 21:50:36 +0200
> From: Fabrice Bacchella <fabrice.bacchella at orange.fr>
> To: FERNANDO FREDIANI <fernando.frediani at upx.com>
> Cc: users at ovirt.org
> Subject: Re: [ovirt-users] Good practices
> Message-ID: <4365E3F7-4C77-4FF5-8401-1CDA2F0029EE at orange.fr>
> Content-Type: text/plain; charset="windows-1252"
>
> >> Moacir: Yes! This is another reason to have separate networks for
> north/south and east/west. In that way I can use the standard MTU on the
> 10Gb NICs and jumbo frames on the file/move 40Gb NICs.
>
> Why not jumbo frames everywhere?
>
> ------------------------------
>
> Message: 2
> Date: Mon, 7 Aug 2017 16:52:40 -0300
> From: FERNANDO FREDIANI <fernando.frediani at upx.com>
> To: Fabrice Bacchella <fabrice.bacchella at orange.fr>
> Cc: users at ovirt.org
> Subject: Re: [ovirt-users] Good practices
> Message-ID: <40d044ae-a41d-082e-131a-bf5fb5503513 at upx.com>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> What you mentioned is a specific case, not a generic situation. The main
> point there is that RAID 5 or 6 impacts write performance compared with
> writing to only 2 given disks at a time. That was the comparison made.
>
> Fernando
>
>
> On 07/08/2017 16:49, Fabrice Bacchella wrote:
> >
> >> On 7 Aug 2017 at 17:41, FERNANDO FREDIANI <fernando.frediani at upx.com> wrote:
> >>
> >
> >> Yet another downside of having a RAID (especially RAID 5 or 6) is that
> >> it considerably reduces write speeds, as each group of disks ends up
> >> with the write speed of a single disk because all the other disks in
> >> that group have to wait for each other to write as well.
> >>
> >
> > That's not true if you have mid- to high-range hardware RAID. For
> > example, HP Smart Array controllers come with a flash-backed cache of
> > about 1 or 2 GB that hides that from the OS.
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 7 Aug 2017 22:05:19 +0200
> From: Erekle Magradze <erekle.magradze at recogizer.de>
> To: FERNANDO FREDIANI <fernando.frediani at upx.com>, users at ovirt.org
> Subject: Re: [ovirt-users] Good practices
> Message-ID: <bac362c7-daba-918c-f728-13e1a74d6cc9 at recogizer.de>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> Hi Fernando,
>
> So let's go through the following scenarios:
>
> 1. Say you have two servers (replication factor 2), i.e. two bricks per
> volume. In this case it is strongly recommended to have an arbiter node:
> the metadata store that guarantees you avoid a split-brain situation. The
> arbiter doesn't even need a disk with lots of space; a tiny SSD is enough,
> but it should be hosted on a separate server. The advantage of such a setup
> is that you don't need RAID 1 for each brick: the metadata is kept on the
> arbiter node and brick replacement is easy.
>
> 2. If you have an odd number of bricks (say 3, i.e. replication factor 3)
> in your volume and you created neither an arbiter node nor configured
> quorum, the entire load of keeping the volume consistent rests on all 3
> servers. Each of them is important, each brick contains key information,
> and they need to cross-check each other (that's what people usually do on
> their first try of gluster :) ). In this case replacing a brick is a big
> pain, and RAID 1 is a good option to have (the disadvantage is losing the
> space and giving up the JBOD option; the advantage is that you don't need
> an additional arbiter node).
>
> 3. You have an odd number of bricks and a configured arbiter node. In this
> case you can easily go with JBOD, although a good practice would be to have
> RAID 1 for the arbiter disks (tiny 128GB SSDs are perfectly sufficient for
> volumes of tens of TBs in size).
>
> That's basically it; a rough sketch of the space and quorum arithmetic for
> these setups is below.
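>
> To make that concrete, here is some back-of-the-envelope arithmetic for the
> setups above (plain math, not a gluster API call; the brick size is a
> placeholder):
>
> # Space/quorum arithmetic sketch for the scenarios above.
> def replica_set(data_bricks, arbiter=False, brick_tb=10.0):
>     members = data_bricks + (1 if arbiter else 0)       # bricks voting in the set
>     usable_tb = brick_tb                                 # each data brick holds a full copy
>     raw_tb = data_bricks * brick_tb
>     tolerable_failures = members - (members // 2 + 1)    # losses before quorum (>50%) is gone
>     return members, usable_tb, raw_tb, tolerable_failures
>
> print(replica_set(2, arbiter=True))  # scenario 1: replica 2 + arbiter -> 3 voters, 1 may fail
> print(replica_set(3))                # scenario 2/3: replica 3 -> 3 voters, 1 may fail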
>
> The rest, about reliability and setup scenarios, you can find in the
> gluster documentation; in particular, look for the quorum and arbiter node
> configs and options.
>
> Cheers
>
> Erekle
>
> P.S. What I was mentioning regarding good practice is mostly related to
> the operation of gluster, not installation or deployment, i.e. not the
> conceptual understanding of gluster (conceptually it's a JBOD system).
>
>
> On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:
> >
> > Thanks for the clarification Erekle.
> >
> > However, I am surprised by this way of operating GlusterFS, as it adds
> > another layer of complexity to the system (either hardware or software
> > RAID) underneath the gluster config and increases the system's overall
> > cost.
> >
> > An important point to consider is: in a RAID configuration you already
> > have space 'wasted' in order to build redundancy (RAID 1, 5, or 6). When
> > you then put GlusterFS on top of several RAIDs, the data is replicated
> > once more, so you end up with the same data consuming space within each
> > group of disks and again across the RAIDs, depending on the Gluster
> > configuration you have (with RAID 1 bricks the same data is stored 4
> > times).
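> >
> > To put rough numbers on that (purely illustrative, not from any specific
> > deployment):
> >
> > # Overall usable fraction = 1 / (RAID overhead factor x Gluster replica count).
> > def usable_fraction(raid_factor, replicas):
> >     # raid_factor = raw space / usable space of the RAID level
> >     # (2.0 for RAID 1, n/(n-2) for an n-disk RAID 6, 1.0 for JBOD)
> >     return 1.0 / (raid_factor * replicas)
> >
> > print(usable_fraction(2.0, 2))    # RAID 1 bricks + replica 2 -> 4 copies, 25% usable
> > print(usable_fraction(1.0, 3))    # JBOD bricks + replica 3   -> 3 copies, ~33% usable
> > print(usable_fraction(6 / 4, 2))  # 6-disk RAID 6 bricks + replica 2 -> ~33% usable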
> >
> > Yet another downside of having a RAID (especially RAID 5 or 6) is that
> > it considerably reduces write speeds, as each group of disks ends up
> > with the write speed of a single disk because all the other disks in
> > that group have to wait for each other to write as well.
> >
> > Therefore, if Gluster already replicates data, why does brick
> > replacement create the big pain you mentioned, given that the data is
> > replicated somewhere else and can still be retrieved both to serve
> > clients and to rebuild the equivalent disk when it is replaced?
> >
> > Fernando
> >
> >
> > On 07/08/2017 10:26, Erekle Magradze wrote:
> >>
> >> Hi Fernando,
> >>
> >> Here is my experience: if you use a particular hard drive as a brick
> >> for a gluster volume and it dies, i.e. it becomes inaccessible, it is a
> >> huge hassle to discard that brick and exchange it for another one, since
> >> gluster keeps trying to access the broken brick, which caused (at least
> >> for me) a big pain. It is therefore better to have a RAID as the brick,
> >> i.e. RAID 1 (mirroring) for each brick; in that case, if a disk is down
> >> you can easily exchange it and rebuild the RAID without going offline,
> >> i.e. without switching off the volume, doing the brick manipulations,
> >> and switching it back on.
> >>
> >> Cheers
> >>
> >> Erekle
> >>
> >>
> >> On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:
> >>>
> >>> For any RAID 5 or 6 configuration I normally follow a simple golden
> >>> rule which has given good results so far:
> >>> - up to 4 disks: RAID 5
> >>> - 5 or more disks: RAID 6
> >>>
> >>> However, I didn't really understand the recommendation to use any RAID
> >>> with GlusterFS. I always thought that GlusterFS likes to work in JBOD
> >>> mode and control the disks (bricks) directly, so you can create
> >>> whatever distribution rule you wish, and if a single disk fails you
> >>> just replace it and the data is obviously replicated from another one.
> >>> The only downside of using it this way is that the replication traffic
> >>> will flow across all servers, but that is not much of an issue.
> >>>
> >>> Can anyone elaborate on using RAID + GlusterFS versus JBOD + GlusterFS?
> >>>
> >>> Thanks
> >>> Regards
> >>> Fernando
> >>>
> >>>
> >>> On 07/08/2017 03:46, Devin Acosta wrote:
> >>>>
> >>>> Moacir,
> >>>>
> >>>> I have recently installed multiple Red Hat Virtualization hosts for
> >>>> several different companies, and have dealt with the Red Hat Support
> >>>> Team in depth about the optimal configuration for setting up GlusterFS
> >>>> most efficiently, and I wanted to share with you what I learned.
> >>>>
> >>>> In general, the Red Hat Virtualization team frowns upon using each
> >>>> disk of the system as just a JBOD. Sure, there is some protection from
> >>>> having the data replicated; however, the recommendation is to use
> >>>> RAID 6 (preferred) or RAID 5, or RAID 1 at the very least.
> >>>>
> >>>> Here is the direct quote from Red Hat when I asked about RAID and
> >>>> bricks:
> >>>>
> >>>> "A typical Gluster configuration would use RAID underneath the
> >>>> bricks. RAID 6 is most typical as it gives you 2-disk failure
> >>>> protection, but RAID 5 could be used too. Once you have the RAIDed
> >>>> bricks, you'd then apply the desired replication on top of that.
> >>>> The most popular way of doing this would be distributed replicated
> >>>> with 2x replication. In general you'll get better performance with
> >>>> larger bricks. 12 drives is often a sweet spot. Another option
> >>>> would be to create a separate tier using all SSDs."
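> >>>>
> >>>> As a side note, here is a small sketch of how a distributed-replicated
> >>>> volume groups bricks: consecutive bricks in the create command form
> >>>> one replica set, and files are then distributed across the sets. The
> >>>> host and brick paths below are hypothetical.
> >>>>
> >>>> def replica_sets(bricks, replica=2):
> >>>>     # consecutive bricks form one replica set
> >>>>     return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]
> >>>>
> >>>> bricks = ["host1:/b1", "host2:/b1", "host3:/b1",   # hypothetical bricks
> >>>>           "host1:/b2", "host2:/b2", "host3:/b2"]
> >>>> for rs in replica_sets(bricks):
> >>>>     print(rs)   # each set holds identical data; files spread across sets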
> >>>>
> >>>> To do SSD tiering, from my understanding you would need 1 x NVMe
> >>>> drive in each server, or 4 x SSDs for the hot tier (the hot tier needs
> >>>> to be distributed-replicated if not using NVMe). So with you having
> >>>> only 1 SSD drive in each server, I'd suggest maybe looking into the
> >>>> NVMe option.
> >>>>
> >>>> Since you're using only 3 servers, what I'd probably suggest is to do
> >>>> 2 replicas + arbiter node; this setup actually doesn't require the 3rd
> >>>> server to have big drives at all, as it only stores metadata about the
> >>>> files and not a full copy.
> >>>>
> >>>> Please see the attached document that was given to me by Red Hat for
> >>>> more information on this. Hope this information helps you.
> >>>>
> >>>> --
> >>>>
> >>>> Devin Acosta, RHCA, RHVCA
> >>>> Red Hat Certified Architect
> >>>>
> >>>> On August 6, 2017 at 7:29:29 PM, Moacir Ferreira
> >>>> (moacirferreira at hotmail.com) wrote:
> >>>>
> >>>>> I am willing to assemble an oVirt "pod" made of 3 servers, each with
> >>>>> 2 CPU sockets of 12 cores, 256GB RAM, 7 x 10K HDDs and 1 SSD. The
> >>>>> idea is to use GlusterFS to provide HA for the VMs. The 3 servers
> >>>>> each have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to
> >>>>> create a loop, like a server triangle, using the 40Gb NICs for
> >>>>> virtualization file (VM .qcow2) access and for moving VMs around the
> >>>>> pod (east/west traffic), while using the 10Gb interfaces for giving
> >>>>> services to the outside world (north/south traffic).
> >>>>>
> >>>>>
> >>>>> This said, my first question is: how should I deploy GlusterFS in
> >>>>> such an oVirt scenario? Specifically:
> >>>>>
> >>>>>
> >>>>> 1 - Should I create 3 RAID arrays (e.g. RAID 5), one on each oVirt
> >>>>> node, and then create a GlusterFS volume using them?
> >>>>>
> >>>>> 2 - Or should I instead create a JBOD array made of each server's
> >>>>> disks?
> >>>>>
> >>>>> 3 - What is the best Gluster configuration to provide HA while not
> >>>>> consuming too much disk space?
> >>>>>
> >>>>> 4 - Does an oVirt hypervisor pod like the one I am planning to build,
> >>>>> and the virtualization environment, benefit from tiering when using
> >>>>> an SSD disk? And if yes, will Gluster do it by default or do I have
> >>>>> to configure it to do so?
> >>>>>
> >>>>>
> >>>>> Bottom line: what is good practice for using GlusterFS in small
> >>>>> pods for enterprises?
> >>>>>
> >>>>>
> >>>>> Your opinion/feedback will be really appreciated!
> >>>>>
> >>>>> Moacir
> >>>>>
> >>>>> _______________________________________________
> >>>>> Users mailing list
> >>>>> Users at ovirt.org
> >>>>> http://lists.ovirt.org/mailman/listinfo/users
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Users mailing list
> >>>> Users at ovirt.org
> >>>> http://lists.ovirt.org/mailman/listinfo/users
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Users mailing list
> >>> Users at ovirt.org
> >>> http://lists.ovirt.org/mailman/listinfo/users
> >>
> >
>
> --
> Recogizer Group GmbH
>
> Dr.rer.nat. Erekle Magradze
> Lead Big Data Engineering & DevOps
> Rheinwerkallee 2, 53227 Bonn
> Tel: +49 228 29974555
>
> E-Mail erekle.magradze at recogizer.de
> Web: www.recogizer.com
>
> Recogizer auf LinkedIn https://www.linkedin.com/company-beta/10039182/
> Folgen Sie uns auf Twitter https://twitter.com/recogizer
>
> -----------------------------------------------------------------
> Recogizer Group GmbH
> Managing directors (Geschäftsführer): Oliver Habisch, Carsten Kreutze
> Commercial register (Handelsregister): Amtsgericht Bonn HRB 20724
> Registered office: Bonn; VAT ID no.: DE294195993
>
> This e-mail contains confidential and/or legally protected information.
> If you are not the intended recipient or have received this e-mail in
> error, please inform the sender immediately and delete this e-mail.
> Unauthorized copying as well as unauthorized forwarding of this e-mail
> and the information contained therein is not permitted.
>
>
> ------------------------------
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
> End of Users Digest, Vol 71, Issue 37
> *************************************
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>

