Re: [ovirt-users] Users Digest, Vol 71, Issue 37


Moacir Ferreira

7 Aug 2017

11:42 p.m.

Fabrice,

If you choose to use jumbo frames everywhere, then when traffic goes outside your jumbo-frame-enabled network it will have to be fragmented back down to the destination MTU. Most datacenters provide services to the outside world, where the MTU is 1500 bytes. In this case you will slow down your performance, because your router will be doing the fragmentation. So I would always use jumbo frames inside the datacenter for east/west traffic and the standard MTU (1500 bytes) for north/south traffic.

Moacir

----------------------------------------------------------------------

Message: 1
Date: Mon, 7 Aug 2017 21:50:36 +0200
From: Fabrice Bacchella <fabrice.bacchella@orange.fr>
To: FERNANDO FREDIANI <fernando.frediani@upx.com>
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Good practices
Message-ID: <4365E3F7-4C77-4FF5-8401-1CDA2F0029EE@orange.fr>

Moacir: Yes! This is another reason to have separate networks for north/south and east/west. That way I can use the standard MTU on the 10Gb NICs and jumbo frames on the 40Gb file/move NICs.

Why not jumbo frames everywhere?


Yaniv Kaul

8 Aug 2017

8:35 a.m.

New subject: Users Digest, Vol 71, Issue 37

On Tue, Aug 8, 2017 at 12:42 AM, Moacir Ferreira <moacirferreira@hotmail.com> wrote:

Fabrice,

If you choose to use jumbo frames everywhere, then when traffic goes outside your jumbo-frame-enabled network it will have to be fragmented back down to the destination MTU. Most datacenters provide services to the outside world, where the MTU is 1500 bytes. In this case you will slow down your performance, because your router will be doing the fragmentation. So I would always use jumbo frames inside the datacenter for east/west traffic and the standard MTU (1500 bytes) for north/south traffic.

I doubt this would happen with modern TCP/IP stacks, for TCP connections. The stack will most likely adjust to the path using PMTUD (Path MTU Discovery). Of course, this does not always work (it depends on the hardware en route). UDP packets might fail miserably too (be dropped), depending on the hardware en route, but UDP traffic (and specifically large UDP packets) is not that common these days. Nevertheless, I don't see a huge advantage in enabling jumbo frames for north/south traffic, TBH, and the mysterious, random traffic-drop issues it may cause are not worth it.

Y.
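To make the PMTUD point concrete, here is a minimal Python sketch, assuming a TCP sender that sets the DF bit and a path described as a hypothetical list of per-hop link MTUs (neither comes from the thread). A hop that cannot forward the packet drops it and reports its MTU back, and the sender shrinks its packet size until it fits end to end. It deliberately leaves out the case where en-route hardware filters those ICMP errors, which is exactly the "does not always work" caveat above.

```python
from typing import List, Optional, Tuple


def forward(size: int, path_mtus: List[int]) -> Optional[int]:
    """Send one DF-marked packet of `size` bytes along the path.

    Returns None if it reaches the destination, otherwise the MTU that the
    first too-small hop reports back (the value carried by an ICMP
    "Fragmentation Needed" or ICMPv6 "Packet Too Big" message)."""
    for link_mtu in path_mtus:
        if size > link_mtu:
            return link_mtu  # the router drops the packet and reports its MTU
    return None


def discover_path_mtu(local_mtu: int, path_mtus: List[int]) -> Tuple[int, int]:
    """Simplified PMTUD: start at the local interface MTU and shrink the packet
    each time an error comes back. Returns (path_mtu, probes_sent)."""
    size, probes = local_mtu, 0
    while True:
        probes += 1
        reported = forward(size, path_mtus)
        if reported is None:
            return size, probes
        size = reported  # strictly smaller than before, so the loop ends


if __name__ == "__main__":
    # Hypothetical path: two 9000-byte datacenter hops, then a 1500-byte WAN link.
    print(discover_path_mtu(9000, [9000, 9000, 1500]))
```

For that hypothetical path the call returns `(1500, 2)`: one dropped 9000-byte probe, then 1500-byte packets from then on, with no router fragmenting anything.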


Moacir

----------------------------------------------------------------------

Message: 1
Date: Mon, 7 Aug 2017 21:50:36 +0200
From: Fabrice Bacchella <fabrice.bacchella@orange.fr>
To: FERNANDO FREDIANI <fernando.frediani@upx.com>
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Good practices
Message-ID: <4365E3F7-4C77-4FF5-8401-1CDA2F0029EE@orange.fr>

Moacir: Yes! This is another reason to have separate networks for north/south and east/west. In that way I can use the standard MTU on the 10Gb NICs and jumbo frames on the file/move 40Gb NICs.

Why not jumbo frames everywhere?

------------------------------

Message: 2
Date: Mon, 7 Aug 2017 16:52:40 -0300
From: FERNANDO FREDIANI <fernando.frediani@upx.com>
To: Fabrice Bacchella <fabrice.bacchella@orange.fr>
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Good practices
Message-ID: <40d044ae-a41d-082e-131a-bf5fb5503513@upx.com>

What you mentioned is a specific case, not the general situation. The main point there is that RAID 5 or 6 impacts write performance compared to writing to only 2 given disks at a time. That was the comparison being made.

Fernando

On 07/08/2017 16:49, Fabrice Bacchella wrote:

On 7 Aug 2017 at 17:41, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
Yet another downside of having RAID (especially RAID 5 or 6) is that it considerably reduces write speed, as each group of disks ends up with the write speed of a single disk: all the other disks in the group have to wait for each other to write as well.

That's not true if you have mid- to high-range hardware RAID. For example, HP Smart Array controllers come with a flash cache of about 1 or 2 GB that hides that from the OS.
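The disagreement about RAID 5/6 write speed can be put into rough numbers with the common write-penalty rule of thumb for small random writes (RAID 1/10: 2 back-end I/Os per write, RAID 5: 4, RAID 6: 6). The sketch below assumes that rule and a hypothetical per-disk IOPS figure. It ignores controller caches such as the flash-backed cache Fabrice mentions, which absorb bursts and can hide the penalty until the cache fills.

```python
# Rule-of-thumb back-end I/Os caused by one small random front-end write.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def random_write_iops(level: str, disks: int, iops_per_disk: float) -> float:
    """Approximate sustained random-write IOPS of an array, with no cache."""
    if level not in WRITE_PENALTY:
        raise ValueError(f"unknown RAID level: {level}")
    return disks * iops_per_disk / WRITE_PENALTY[level]


if __name__ == "__main__":
    # 7 disks as in Moacir's servers; 140 IOPS per 10K rpm disk is an assumption.
    for level in ("raid0", "raid5", "raid6"):
        print(level, round(random_write_iops(level, 7, 140)))
```

With those assumed figures, RAID 5 gives about 245 random-write IOPS and RAID 6 about 163, against 980 for the same disks without parity.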


------------------------------

Message: 3
Date: Mon, 7 Aug 2017 22:05:19 +0200
From: Erekle Magradze <erekle.magradze@recogizer.de>
To: FERNANDO FREDIANI <fernando.frediani@upx.com>, users@ovirt.org
Subject: Re: [ovirt-users] Good practices
Message-ID: <bac362c7-daba-918c-f728-13e1a74d6cc9@recogizer.de>

Hi Fernando,

So let's go with the following scenarios:

1. Let's say you have two servers (replication factor 2), i.e. two bricks per volume. In this case it is strongly recommended to have an arbiter node, i.e. metadata storage that guarantees avoiding a split-brain situation. For the arbiter you don't even need a disk with lots of space; a tiny SSD is enough, but it must be hosted on a separate server. The advantage of such a setup is that you don't need RAID 1 for each brick: the metadata is stored on the arbiter node, and brick replacement is easy.

2. If you have an odd number of bricks (let's say 3, i.e. replication factor 3) in your volume, and you neither created an arbiter node nor configured quorum, then the entire load of keeping the volume consistent rests on all 3 servers. Each of them is important and each brick contains key information, so they need to cross-check each other (that's what people usually do on their first try with Gluster :) ). In this case replacing a brick is a big pain, and RAID 1 is a good option to have. The disadvantage is losing space and not having the JBOD option; the advantage is that you don't need an additional arbiter node.

3. You have an odd number of bricks and a configured arbiter node. In this case you can easily go with JBOD; however, good practice would be to have RAID 1 for the arbiter disks (tiny 128GB SSDs are perfectly sufficient for volumes tens of TBs in size).
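For the arbiter layouts above (and the "2 replicas + arbiter" setup suggested later in the thread), Gluster's CLI takes `replica 3 arbiter 1`, and the third brick of each replica set becomes the arbiter. The Python sketch below only assembles that command line as an argument list; the volume name, host names and brick paths are hypothetical, and nothing is executed.

```python
from typing import List


def arbiter_volume_cmd(volume: str, data_bricks: List[str], arbiter_brick: str) -> List[str]:
    """Build a `gluster volume create` command for one replica set made of
    two data bricks plus one arbiter brick (which stores metadata only)."""
    if len(data_bricks) != 2:
        raise ValueError("this sketch expects exactly two data bricks")
    # Gluster uses the third brick of each replica-3 set as the arbiter.
    return ["gluster", "volume", "create", volume,
            "replica", "3", "arbiter", "1",
            *data_bricks, arbiter_brick]


if __name__ == "__main__":
    cmd = arbiter_volume_cmd(
        "vmstore",                                           # hypothetical volume name
        ["host1:/gluster/brick1", "host2:/gluster/brick1"],  # hypothetical data bricks
        "host3:/gluster/arbiter1",                           # small SSD on a third server
    )
    print(" ".join(cmd))
```

Passing the list to `subprocess.run` on a node with the Gluster CLI would create the volume; the quorum options Erekle refers to are set afterwards with `gluster volume set`.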

That's basically it

You can find the rest, about reliability and setup scenarios, in the Gluster documentation; look especially for the quorum and arbiter node configurations and options.

Cheers

Erekle

P.S. What I mentioned regarding good practice relates mostly to operating Gluster, not to installing or deploying it, i.e. not to the conceptual understanding of Gluster (conceptually it's a JBOD system).

On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:

Thanks for the clarification, Erekle.

However, I am surprised by this way of operating GlusterFS, as it adds another layer of complexity to the system (either hardware or software RAID) beneath the Gluster configuration and increases the system's overall cost.

An important point to consider: in a RAID configuration you already "waste" space to build redundancy (RAID 1, 5 or 6). When you then run GlusterFS on top of several RAIDs, the data is replicated again, so the same data consumes extra space once within each group of disks and again across the RAIDs, depending on your Gluster configuration (with RAID 1 under a 2-way Gluster replica, the same data is stored 4 times).
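Fernando's space argument is multiplicative: every copy kept by the RAID layer is kept again by each Gluster replica. A minimal sketch, assuming that RAID 1 keeps a full copy on every disk of the mirror and that RAID 5/6 lose one/two disks' worth of capacity per array; the function name and example layouts are illustrative only, and an arbiter brick is left out because it holds only metadata.

```python
def usable_fraction(raid_level: str, disks_per_array: int, gluster_replicas: int) -> float:
    """Fraction of raw capacity left for data when RAID arrays serve as bricks
    and Gluster keeps `gluster_replicas` full copies on top of them."""
    if raid_level == "jbod":
        raid_eff = 1.0                                      # no RAID redundancy
    elif raid_level == "raid1":
        raid_eff = 1 / disks_per_array                      # every disk holds a full copy
    elif raid_level == "raid5":
        raid_eff = (disks_per_array - 1) / disks_per_array  # one disk of parity
    elif raid_level == "raid6":
        raid_eff = (disks_per_array - 2) / disks_per_array  # two disks of parity
    else:
        raise ValueError(f"unsupported level: {raid_level}")
    return raid_eff / gluster_replicas


if __name__ == "__main__":
    print(usable_fraction("raid1", 2, 2))   # RAID 1 mirrors under a 2-way replica
    print(usable_fraction("raid6", 12, 2))  # 12-disk RAID 6 bricks, 2x replication
    print(usable_fraction("jbod", 1, 3))    # plain disks, 3-way replica
```

The first case gives 0.25, i.e. each block stored 4 times as described above; 12-disk RAID 6 bricks with 2x replication keep about 42% of raw capacity, and plain disks with a 3-way replica about 33%.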

Yet another downside of having RAID (especially RAID 5 or 6) is that it considerably reduces write speed, as each group of disks ends up with the write speed of a single disk: all the other disks in the group have to wait for each other to write as well.

Therefore, if Gluster already replicates data, why does it create the big pain you mentioned? The data is replicated somewhere else, so it can still be retrieved both to serve clients and to reconstruct the equivalent disk when it is replaced.

Fernando

On 07/08/2017 10:26, Erekle Magradze wrote:

Hi Fernando,

Here is my experience: if you use a single hard drive as a brick for a Gluster volume and it dies, i.e. becomes inaccessible, it's a huge hassle to discard that brick and exchange it for another one, since Gluster still tries to access the broken brick, and that causes (at least it caused me) a big pain. It's therefore better to use a RAID as the brick, i.e. RAID 1 (mirroring) for each brick. Then, if a disk goes down, you can easily exchange it and rebuild the RAID without going offline, i.e. without switching off the volume, manipulating bricks and switching it back on.

Cheers

Erekle

On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:

For any RAID 5 or 6 configuration I normally follow a simple golden rule, which has given good results so far:
- up to 4 disks: RAID 5
- 5 or more disks: RAID 6
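The rule of thumb above fits in a tiny helper. This is a sketch only: the function name and the usable-capacity count it also returns are illustrative additions, not part of the original message.

```python
from typing import Tuple


def pick_raid_level(disks: int) -> Tuple[str, int]:
    """Apply 'up to 4 disks -> RAID 5, 5 or more -> RAID 6' and return the
    chosen level plus the number of disks' worth of usable capacity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    if disks <= 4:
        return "RAID 5", disks - 1  # one disk's worth of parity
    return "RAID 6", disks - 2      # two disks' worth of parity


if __name__ == "__main__":
    for n in (3, 4, 5, 7, 12):
        level, usable = pick_raid_level(n)
        print(f"{n} disks -> {level}, {usable} disks usable")
```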

However, I didn't really understand the recommendation to use any RAID with GlusterFS. I always thought that GlusterFS likes to work in JBOD mode and to control the disks (bricks) directly, so you can create whatever distribution rule you wish, and if a single disk fails you just replace it and it obviously gets the data replicated from another one. The only downside of using it this way is that the replication data will flow across all servers, but that is not a big issue.

Can anyone elaborate on using RAID + GlusterFS versus JBOD + GlusterFS?

Thanks Regards Fernando

On 07/08/2017 03:46, Devin Acosta wrote:

Moacir,

I have recently installed multiple Red Hat Virtualization hosts for several different companies, and have dealt in depth with the Red Hat Support Team about the optimal way to set up GlusterFS, and I wanted to share with you what I learned.

In general, the Red Hat Virtualization team frowns upon using each disk of the system as just a JBOD. Sure, there is some protection from having the data replicated; however, the recommendation is to use RAID 6 (preferred) or RAID 5, or RAID 1 at the very least.

Here is the direct quote from Red Hat when I asked about RAID and bricks:

"A typical Gluster configuration would use RAID underneath the bricks. RAID 6 is most typical as it gives you 2 disk failure protection, but RAID 5 could be used too. Once you have the RAIDed bricks, you'd then apply the desired replication on top of that. The most popular way of doing this would be distributed replicated with 2x replication. In general you'll get better performance with larger bricks. 12 drives is often a sweet spot. Another option would be to create a separate tier using all SSDs."
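In a distributed-replicated volume, Gluster groups bricks into replica sets in the order they are listed, so with 2x replication each consecutive pair mirrors each other. A small sanity check, assuming hypothetical `host:/path` brick names, that no replica set puts both copies on the same server:

```python
from typing import List


def replica_sets(bricks: List[str], replica: int) -> List[List[str]]:
    """Split an ordered brick list into consecutive replica sets."""
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def sets_span_hosts(bricks: List[str], replica: int) -> bool:
    """True if every replica set places its copies on distinct hosts."""
    for group in replica_sets(bricks, replica):
        hosts = [brick.split(":", 1)[0] for brick in group]
        if len(set(hosts)) != len(hosts):
            return False
    return True


if __name__ == "__main__":
    good = ["srv1:/b1", "srv2:/b1", "srv1:/b2", "srv2:/b2"]  # pairs cross servers
    bad = ["srv1:/b1", "srv1:/b2", "srv2:/b1", "srv2:/b2"]   # pairs stay on one server
    print(sets_span_hosts(good, 2), sets_span_hosts(bad, 2))
```

The `bad` ordering holds the same bricks, but losing `srv1` would take out both copies of the first replica set.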

In order to do SSD tiering, from my understanding you would need 1 x NVMe drive in each server, or 4 x SSDs for the hot tier (it needs to be distributed-replicated for the hot tier if not using NVMe). So with only 1 SSD drive in each server, I'd suggest maybe looking into the NVMe option.

Since you're using only 3 servers, what I'd probably suggest is to do 2 replicas + an arbiter node. This setup actually doesn't require the 3rd server to have big drives at all, as it only stores metadata about the files and not a full copy.

Please see the attached document that was given to me by Red Hat for more information on this. Hope this information helps you.

--

Devin Acosta, RHCA, RHVCA Red Hat Certified Architect

On August 6, 2017 at 7:29:29 PM, Moacir Ferreira (moacirferreira@hotmail.com) wrote:
I am planning to assemble an oVirt "pod" made of 3 servers, each with 2 CPU sockets of 12 cores, 256GB RAM, 7 HDDs (10K rpm) and 1 SSD. The idea is to use GlusterFS to provide HA for the VMs. The 3 servers each have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a loop, like a server triangle, using the 40Gb NICs for access to the virtualization files (the VMs' .qcow2 images) and for moving VMs around the pod (east/west traffic), while using the 10Gb interfaces to provide services to the outside world (north/south traffic).
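The "server triangle" is a full mesh of back-to-back links. A quick sketch (hypothetical host names) confirming that a 3-node full mesh needs exactly two ports per server, which one dual-port 40Gb NIC provides, and three cables in total:

```python
from itertools import combinations
from typing import Dict, List, Tuple


def full_mesh(nodes: List[str]) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Return the point-to-point links of a full mesh and the ports each node needs."""
    links = list(combinations(nodes, 2))
    ports = {node: 0 for node in nodes}
    for a, b in links:
        ports[a] += 1
        ports[b] += 1
    return links, ports


if __name__ == "__main__":
    links, ports = full_mesh(["node1", "node2", "node3"])
    print(len(links), ports)
```

In general each node needs n - 1 ports, so a fourth server would need three 40Gb ports per node, more than one dual-port NIC offers, and a switch would become the simpler option.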

That said, how should I deploy GlusterFS in such an oVirt scenario? My questions are:

1 - Should I create 3 RAID arrays (e.g. RAID 5), one on each oVirt node, and then create a GlusterFS volume using them?

2 - Or should I instead create a JBOD array made of all the servers' disks?

3 - What is the best Gluster configuration to provide for HA while not consuming too much disk space?

4 - Does an oVirt hypervisor pod like the one I am planning to build, and its virtualization environment, benefit from tiering when using an SSD? And if so, will Gluster do it by default, or do I have to configure it to do so?

Bottom line: what is good practice for using GlusterFS in small pods for enterprises?

Your opinion/feedback will be really appreciated!

Moacir

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users



-- Recogizer Group GmbH

Dr.rer.nat. Erekle Magradze
Lead Big Data Engineering & DevOps
Rheinwerkallee 2, 53227 Bonn
Tel: +49 228 29974555

E-Mail erekle.magradze@recogizer.de Web: www.recogizer.com

Recogizer on LinkedIn https://www.linkedin.com/company-beta/10039182/
Follow us on Twitter https://twitter.com/recogizer

-----------------------------------------------------------------
Recogizer Group GmbH
Managing Directors: Oliver Habisch, Carsten Kreutze
Commercial register: Amtsgericht Bonn HRB 20724
Registered office: Bonn; VAT ID: DE294195993

This e-mail contains confidential and/or legally protected information. If you are not the intended recipient or have received this e-mail in error, please notify the sender immediately and delete this e-mail. Unauthorized copying or forwarding of this e-mail and the information it contains is not permitted.


------------------------------


End of Users Digest, Vol 71, Issue 37 *************************************


Moacir Ferreira

11:49 a.m.


This is far more complex. A good NIC will have an offload engine (LSO, Large Segment Offload) and, if so, the NIC driver will report an MTU of 64K to the IP stack. The IP stack will then send data to the NIC as if the MTU were 64K, and the NIC will fragment it to the size of the "declared" MTU on the interface, so PMTUD will not be efficient in such a scenario. If all this takes place in the server, you get no problem. But if a standard router is configured to support 9K jumbo frames on one interface (e.g. the LAN connection) and 1500 on another (e.g. the WAN connection), then the router will be responsible for the fragmentation. However, most routers out there are not able to deal with this under high traffic demands.

Splitting the very intensive east/west traffic, like disk copies, VM moves, etc., from the "service" traffic will not only prevent contention but also fix this MTU problem.

Moacir
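As a rough illustration of the offload arithmetic (not of any particular NIC driver), the sketch below counts the wire packets produced when one 64 KiB send is cut into TCP segments. It assumes IPv4 and TCP headers without options (20 + 20 bytes), so the maximum segment size is the MTU minus 40 bytes; with segmentation offload the NIC does this split in hardware instead of the CPU.

```python
import math

IPV4_HEADER = 20  # bytes, no IP options (assumption)
TCP_HEADER = 20   # bytes, no TCP options (assumption)


def segments_for_send(payload_bytes: int, mtu: int) -> int:
    """Number of wire packets needed for one large send split into MSS-sized
    TCP segments, where MSS = MTU - IPv4 header - TCP header."""
    mss = mtu - IPV4_HEADER - TCP_HEADER
    return math.ceil(payload_bytes / mss)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, segments_for_send(64 * 1024, mtu))
```

Under these assumptions a 64 KiB send becomes 45 packets at MTU 1500 and 8 packets at MTU 9000, which is the per-packet saving jumbo frames buy on the east/west network.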

Fabrice Bacchella

1:23 p.m.


On 8 Aug 2017 at 11:49, Moacir Ferreira <moacirferreira@hotmail.com> wrote:

But if a standard router is configured to support 9K jumbo frames on one interface (e.g. the LAN connection) and 1500 on another (e.g. the WAN connection), then the router will be responsible for the fragmentation.

That only happens if the Don't Fragment (DF) bit is not set. Otherwise routers are not allowed to fragment; they send back a "packet too big" ICMP message instead. This is called path MTU discovery. To my knowledge the DF bit is usually set, and it is even mandatory on IPv6.
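Fabrice's point can be written as the decision a router makes when a packet is larger than the egress link MTU. This is a minimal sketch: the `Packet` type, `router_action` name and returned labels are hypothetical, while the behaviour follows the IPv4 DF-bit rule and the IPv6 rule that routers never fragment.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    size: int                    # total packet size in bytes
    ipv6: bool                   # True for IPv6, False for IPv4
    dont_fragment: bool = False  # IPv4 DF bit; not used for IPv6


def router_action(pkt: Packet, egress_mtu: int) -> str:
    """What a router does with a packet headed for a link of `egress_mtu` bytes."""
    if pkt.size <= egress_mtu:
        return "forward"
    if pkt.ipv6:
        # IPv6 routers never fragment; they drop and report Packet Too Big.
        return "drop + ICMPv6 Packet Too Big"
    if pkt.dont_fragment:
        # IPv4 with DF set: Destination Unreachable / Fragmentation Needed.
        return "drop + ICMP Fragmentation Needed"
    return "fragment"


if __name__ == "__main__":
    print(router_action(Packet(9000, ipv6=False, dont_fragment=True), 1500))
    print(router_action(Packet(9000, ipv6=False), 1500))
    print(router_action(Packet(9000, ipv6=True), 1500))
```

Only the IPv4-without-DF case leaves the fragmentation work to the router; the other two push the problem back to the sender through PMTUD.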

Moacir Ferreira

1:34 p.m.


True! But at some point in the network it may be necessary to bring the MTU down to 1500, for example if your data needs to cross the Internet. The border router between your LAN and the Internet will have to fragment a large frame back into normal-sized ones to send it over the Internet. This router will just "die" under heavy load.

Moacir
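For the IPv4 case where a router does fragment (DF not set), this sketch counts the work one jumbo packet creates. Assumptions: an IPv4 packet with a 20-byte header and no options; each fragment carries its own 20-byte header, and every fragment's payload except the last must be a multiple of 8 bytes.

```python
from typing import List

IPV4_HEADER = 20  # bytes, no IP options (assumption)


def fragment_sizes(packet_size: int, mtu: int) -> List[int]:
    """On-wire sizes of the IPv4 fragments produced from one packet."""
    payload = packet_size - IPV4_HEADER
    # Largest per-fragment payload that fits the MTU and is a multiple of 8.
    max_frag_payload = (mtu - IPV4_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(payload, max_frag_payload)
        sizes.append(chunk + IPV4_HEADER)  # each fragment gets its own header
        payload -= chunk
    return sizes


if __name__ == "__main__":
    frags = fragment_sizes(9000, 1500)
    print(len(frags), frags)
```

Under these assumptions every 9000-byte packet becomes seven packets on the 1500-byte link (six of 1500 bytes and one of 120), each with a header the router must build, which is the per-packet load at stake in this exchange.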

Fabrice Bacchella

2:37 p.m.


On 8 Aug 2017 at 13:34, Moacir Ferreira <moacirferreira@hotmail.com> wrote:

The border router between your LAN and the Internet will have to fragment a large frame back into normal-sized ones to send it over the Internet. This router will just "die" under heavy load.

The border router will behave like any other router in the world. If the DF bit is set (the common case), or if it's IPv6, it will not fragment but will send back an ICMP message instead.

Moacir Ferreira

2:53 p.m.



But if you receive a 9000-byte frame on an "input" interface that results in sending it out on an interface with a 1500-byte MTU, then, if you set the DF bit, the frame will just be dropped by the router. If you want your frame to "cross" a path with a different MTU, then you cannot set DF to 1. This is quite a simple and easy thing to demonstrate: just create a simple virtual lab with three Linux machines doing routing and test it. So, if your goal is to communicate over paths that may have an MTU lower than 9000, you had better make sure your server sends out frames that the path can support.

Moacir

________________________________
From: Fabrice Bacchella <fabrice.bacchella@orange.fr>
Sent: Tuesday, August 8, 2017 1:37 PM
To: Moacir Ferreira
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Users Digest, Vol 71, Issue 37

The border router will behave like any other router in the world. If the DF bit is set (the common case) or if it's IPv6, it will not fragment but will send an ICMP message.

On 8 Aug 2017, at 13:34, Moacir Ferreira <moacirferreira@hotmail.com> wrote:

True! But at some point in the network it may be necessary to use an MTU of 1500, for example if your data needs to cross the Internet. The border router between your LAN and the Internet will have to fragment a large frame back into normal-sized ones to send it over the Internet. This router will just "die" if you have a heavy load.

Moacir

________________________________
From: Fabrice Bacchella <fabrice.bacchella@orange.fr>
Sent: Tuesday, August 8, 2017 12:23 PM
To: Moacir Ferreira
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Users Digest, Vol 71, Issue 37

On 8 Aug 2017, at 11:49, Moacir Ferreira <moacirferreira@hotmail.com> wrote:

This is by far more complex. A good NIC will have an offload engine (LSO, Large Segment Offload) and, if so, the NIC driver will report an MTU of 64K to the IP stack. The IP stack will then send data to the NIC as if the MTU were 64K, and the NIC will fragment it to the size of the "declared" MTU on the interface, so PMTUD will not be efficient in such a scenario. If all this takes place in the server, then you have no problem. But if a standard router is configured to support 9K jumbo frames on one interface (e.g. the LAN connection) and 1500 on another (e.g. the WAN connection), then the router will be responsible for the fragmentation.

That happens only if the Don't Fragment bit is not set; otherwise routers are not allowed to fragment and instead send back a "packet too big" ICMP message. This is called path MTU discovery. To my knowledge, it's usually set, and it's even mandatory on IPv6.
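
The router behaviour argued over above can be modelled as a small decision function. This is a simplified sketch, not any real router's forwarding code: it assumes IPv4 with a 20-byte header and no options, and the names `forward` and `Decision` are hypothetical. It shows the three outcomes discussed in the thread: forward the packet as is, fragment it (only when DF is clear), or drop it and report the next-hop MTU back to the sender.

```python
from dataclasses import dataclass, field
from typing import List, Optional

IPV4_HEADER = 20  # assumption: IPv4 header without options


@dataclass
class Decision:
    action: str                                     # "forward", "fragment" or "drop"
    sizes: List[int] = field(default_factory=list)  # total length of each packet sent on
    icmp_mtu: Optional[int] = None                  # next-hop MTU reported to the sender on a drop


def forward(total_len: int, df: bool, egress_mtu: int, ipv6: bool = False) -> Decision:
    """Hypothetical model of what a router does with one packet on its egress link."""
    if total_len <= egress_mtu:
        return Decision("forward", [total_len])
    if df or ipv6:
        # Fragmenting is not allowed (IPv6 routers never fragment):
        # drop the packet and report the next-hop MTU via ICMP
        # ("Fragmentation Needed" for IPv4, "Packet Too Big" for IPv6).
        return Decision("drop", icmp_mtu=egress_mtu)
    # DF clear: split the payload; every fragment except the last
    # must carry a multiple of 8 bytes of payload.
    payload = total_len - IPV4_HEADER
    chunk = (egress_mtu - IPV4_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        part = min(chunk, payload)
        sizes.append(part + IPV4_HEADER)
        payload -= part
    return Decision("fragment", sizes)


print(forward(9000, df=False, egress_mtu=1500).sizes)  # [1500, 1500, 1500, 1500, 1500, 1500, 120]
print(forward(9000, df=True, egress_mtu=1500))         # Decision(action='drop', sizes=[], icmp_mtu=1500)
```

For a 9000-byte packet hitting a 1500-byte link with DF clear, this yields seven fragments (six of 1500 bytes and one of 120 bytes). With DF set, or for IPv6, the packet is dropped instead and only the ICMP report goes back.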

Fabrice Bacchella

3:50 p.m.


On 8 Aug 2017, at 14:53, Moacir Ferreira <moacirferreira@hotmail.com> wrote:

But if you receive a 9000-byte frame on an "input" interface that results in sending it out on an interface with a 1500-byte MTU, then, if you set the DF bit, the frame will just be dropped by the router.

The frame will be dropped, and the router will send an ICMP "packet too big" message to the sender. The sender's network stack will receive it, learn that the PMTU is lower, and retry with smaller packets; see https://en.wikipedia.org/wiki/Path_MTU_Discovery.
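
Fabrice's description of path MTU discovery can be simulated in a few lines. This is a sketch under simplifying assumptions: the path is given as a hypothetical list of link MTUs, every router on it returns its ICMP message, and nothing else (PMTU caching, timeouts, retransmission delays) is modelled. The function name `discover_pmtu` is illustrative only.

```python
from typing import List, Optional, Tuple


def discover_pmtu(path_mtus: List[int], first_size: int) -> Tuple[int, int]:
    """Simulate classic path MTU discovery with the DF bit always set.

    path_mtus: MTU of each link on the path, in order (hypothetical values).
    Returns (learned path MTU, number of packets sent).
    Assumes every router returns its ICMP "too big" message to the sender.
    """
    size = first_size
    sent = 0
    while True:
        sent += 1
        reported_mtu: Optional[int] = None
        for link_mtu in path_mtus:
            if size > link_mtu:
                # This router drops the packet and reports its next-hop MTU.
                reported_mtu = link_mtu
                break
        if reported_mtu is None:
            return size, sent   # the packet reached the destination
        size = reported_mtu     # the sender lowers its estimate and retries


print(discover_pmtu([9000, 9000, 1500], 9000))  # (1500, 2)
```

Here the first 9000-byte packet is lost at the 1500-byte link, and the retry at 1500 bytes gets through. Each retry costs a lost packet, which is why the sender caches the learned PMTU rather than rediscovering it for every packet.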

Yaniv Kaul

4:15 p.m.


On Tue, Aug 8, 2017 at 4:50 PM, Fabrice Bacchella <fabrice.bacchella@orange.fr> wrote:

On 8 Aug 2017, at 14:53, Moacir Ferreira <moacirferreira@hotmail.com> wrote:

But if you receive a 9000-byte frame on an "input" interface that results in sending it out on an interface with a 1500-byte MTU, then, if you set the DF bit, the frame will just be dropped by the router.

The frame will be dropped, and the router will send an ICMP "packet too big" message to the sender. The sender's network stack will receive it, learn that the PMTU is lower, and retry with smaller packets; see https://en.wikipedia.org/wiki/Path_MTU_Discovery.

If those ICMP messages are allowed through and are not dropped by a firewall in the middle... But I believe we have drifted a bit far from the original discussion. Y.

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
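
Yaniv's caveat describes the classic PMTUD "black hole": if a firewall drops the ICMP messages, a sender that keeps DF set never learns the lower MTU, and its large packets silently disappear. One mitigation, swapped in here rather than proposed in the thread, is Packetization Layer PMTUD (RFC 4821), which probes packet sizes and relies only on whether a probe was acknowledged. The sketch shows just its core binary search; `delivered` is a hypothetical stand-in for "send a probe of this size and check whether it was acknowledged", and the 1280-byte floor (the IPv6 minimum link MTU) is an assumption.

```python
from typing import Callable


def plpmtud_search(delivered: Callable[[int], bool], low: int = 1280, high: int = 9000) -> int:
    """Binary-search the largest packet size that reaches the peer.

    Core idea of Packetization Layer PMTUD (RFC 4821): only acknowledged or
    lost probes are used, never ICMP, so a firewall dropping ICMP does not
    break it. Assumes a probe of `low` bytes always gets through.
    """
    while low < high:
        probe = (low + high + 1) // 2
        if delivered(probe):
            low = probe        # acknowledged: the path carries at least `probe` bytes
        else:
            high = probe - 1   # silently lost: the path MTU is below `probe`
    return low


# Path with a 1500-byte bottleneck whose ICMP messages are filtered:
print(plpmtud_search(lambda size: size <= 1500))  # 1500
```

A real implementation spreads probes over time and tolerates ordinary packet loss; the search only illustrates why no ICMP feedback is needed.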

Moacir Ferreira

4:16 p.m.


Exactly, Fabrice! In this case the router will fragment the "bigger" MTU to fit the "smaller" MTU, but only when DF is not set. However, fragmentation on routers is done by the control plane, meaning you will overload the router CPU by doing too much fragmentation. On a good NIC, the MTU announced to the IP stack is very big (like 64 KB) because the offload engine will fragment this very large MTU and send it. But on this kind of NIC, the fragmentation is done by dedicated ASICs that do not require any CPU intervention. Just give it a try... Assemble a lab using Linux and you will see what I am trying to explain.

Moacir

________________________________
From: Fabrice Bacchella <fabrice.bacchella@orange.fr>
Sent: Tuesday, August 8, 2017 2:50 PM
To: Moacir Ferreira
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Users Digest, Vol 71, Issue 37

On 8 Aug 2017, at 14:53, Moacir Ferreira <moacirferreira@hotmail.com> wrote:

But if you receive a 9000-byte frame on an "input" interface that results in sending it out on an interface with a 1500-byte MTU, then, if you set the DF bit, the frame will just be dropped by the router.

The frame will be dropped, and the router will send an ICMP "packet too big" message to the sender. The sender's network stack will receive it, learn that the PMTU is lower, and retry with smaller packets; see https://en.wikipedia.org/wiki/Path_MTU_Discovery.
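
The offload behaviour Moacir mentions can be illustrated with a rough model of TCP Segmentation Offload (TSO, the TCP form of LSO). As it is usually implemented, the stack hands the NIC one large buffer (up to about 64 KB) and the NIC cuts it into MSS-sized TCP segments, each carrying its own IP and TCP headers, so strictly speaking the pieces are complete TCP segments rather than IP fragments. The function name `tso_segments` and the header sizes (IPv4 and TCP without options) are assumptions for illustration.

```python
from typing import List

IP_HEADER = 20   # assumption: IPv4 without options
TCP_HEADER = 20  # assumption: TCP without options (timestamps would add 12 bytes)


def tso_segments(buffer_len: int, mtu: int) -> List[int]:
    """Sizes of the packets a TSO-capable NIC would put on the wire for one
    large send buffer: MSS-sized TCP segments, each with its own headers."""
    mss = mtu - IP_HEADER - TCP_HEADER
    sizes = []
    remaining = buffer_len
    while remaining > 0:
        payload = min(mss, remaining)
        sizes.append(payload + IP_HEADER + TCP_HEADER)
        remaining -= payload
    return sizes


for mtu in (1500, 9000):
    print(mtu, len(tso_segments(64000, mtu)))  # "1500 44", then "9000 8"
```

For a 64,000-byte buffer this gives 44 packets at MTU 1500 (MSS 1460) and 8 packets at MTU 9000 (MSS 8960). Because the host CPU handles one buffer either way, the offload hides most of the per-packet cost on the sender, but every router and receiver on the path still sees the individual packets.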

Moacir Ferreira

4:24 p.m.


Sorry... I guess our discussion here is in line with the "good practices" discussion. For a long time I have seen a lot of mentions of having a front-end and a back-end network when dealing with distributed file systems like Gluster and Ceph. What I would like to hear from those who have already deployed oVirt in the field is the "real life" approach to good performance, as large file/memory transfers will strongly benefit from having a big MTU. However, a big MTU must be handled correctly, otherwise you may end up having the problem we are discussing.

Moacir

________________________________
From: Moacir Ferreira <moacirferreira@hotmail.com>
Sent: Tuesday, August 8, 2017 3:16 PM
To: Fabrice Bacchella
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Users Digest, Vol 71, Issue 37

Exactly, Fabrice! In this case the router will fragment the "bigger" MTU to fit the "smaller" MTU, but only when DF is not set. However, fragmentation on routers is done by the control plane, meaning you will overload the router CPU by doing too much fragmentation. On a good NIC, the MTU announced to the IP stack is very big (like 64 KB) because the offload engine will fragment this very large MTU and send it. But on this kind of NIC, the fragmentation is done by dedicated ASICs that do not require any CPU intervention. Just give it a try... Assemble a lab using Linux and you will see what I am trying to explain.

Moacir

________________________________
From: Fabrice Bacchella <fabrice.bacchella@orange.fr>
Sent: Tuesday, August 8, 2017 2:50 PM
To: Moacir Ferreira
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Users Digest, Vol 71, Issue 37

On 8 Aug 2017, at 14:53, Moacir Ferreira <moacirferreira@hotmail.com> wrote:

But if you receive a 9000-byte frame on an "input" interface that results in sending it out on an interface with a 1500-byte MTU, then, if you set the DF bit, the frame will just be dropped by the router.

The frame will be dropped, and the router will send an ICMP "packet too big" message to the sender. The sender's network stack will receive it, learn that the PMTU is lower, and retry with smaller packets; see https://en.wikipedia.org/wiki/Path_MTU_Discovery.
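
Moacir's point that large transfers benefit from a big MTU can be put into numbers with back-of-the-envelope arithmetic. The sketch assumes standard Ethernet framing (14-byte header, 4-byte FCS, 8-byte preamble and 12-byte inter-frame gap on the wire), IPv4 and TCP headers without options, and a 10 Gb/s link; the figures are illustrative, not measurements from the thread.

```python
ETH_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap, in bytes
IP_TCP_HEADERS = 20 + 20        # IPv4 + TCP, no options (assumption)


def goodput_fraction(mtu: int) -> float:
    """Share of the wire time that carries TCP payload for full-sized packets."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def packets_per_second(mtu: int, link_bps: float) -> float:
    """Full-sized packets per second needed to fill the link."""
    return link_bps / 8 / (mtu + ETH_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: goodput {goodput_fraction(mtu):.1%}, "
          f"{packets_per_second(mtu, 10e9):,.0f} packets/s at 10 Gb/s")
# MTU 1500: goodput 94.9%, 812,744 packets/s at 10 Gb/s
# MTU 9000: goodput 99.1%, 138,305 packets/s at 10 Gb/s
```

The gain in raw efficiency is only about four percentage points; the bigger win is roughly six times fewer packets per second for the hosts (and any device in the path) to process.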

10 comments

participants (3)

Fabrice Bacchella
Moacir Ferreira
Yaniv Kaul