
I am willing to assemble an oVirt "pod", made of 3 servers, each with 2 CPU sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a loop, like a server triangle, using the 40Gb NICs for virtualization file (VM .qcow2) access and to move VMs around the pod (east/west traffic), while using the 10Gb interfaces for giving services to the outside world (north/south traffic).

This said, my first question is: how should I deploy GlusterFS in such an oVirt scenario? My questions are:

1 - Should I create 3 RAID arrays (e.g. RAID 5), one on each oVirt node, and then create a GlusterFS volume using them?

2 - Instead, should I create a JBOD array made of all of each server's disks?

3 - What is the best Gluster configuration to provide HA while not consuming too much disk space?

4 - Does an oVirt hypervisor pod like the one I am planning to build, and the virtualization environment, benefit from tiering when using an SSD disk? And if yes, will Gluster do it by default or do I have to configure it to do so?

Bottom line: what is the good practice for using GlusterFS in small pods for enterprises?

Your opinion/feedback will be really appreciated!

Moacir
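The east/west vs. north/south split described above is usually achieved by giving each host a second hostname that resolves to its 40Gb interface and probing the Gluster peers with those names, so that all brick traffic stays on the 40Gb loop. A minimal sketch, not from the thread (the *-storage hostnames are illustrative):

  # run once from the first host; the -storage names are assumed to resolve to the 40Gb NICs
  gluster peer probe server2-storage.example.com
  gluster peer probe server3-storage.example.com
  gluster peer status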

1) RAID5 may be a performance hit.

2) I'd be inclined to do this as JBOD by creating a distributed disperse volume on each server. Something like:

echo gluster volume create dispersevol disperse-data 5 redundancy 2 \
$(for SERVER in a b c; do for BRICK in $(seq 1 5); do echo -e "server${SERVER}:/brick/brick-${SERVER}${BRICK}/brick \c"; done; done)

3) I think the above.

4) Gluster does support tiering, but IIRC you'd need the same number of SSDs as spindle drives. There may be another way to use the SSD as a fast cache.

Where are you putting the OS?

Hope I understood the question...

Thanks

On Sun, Aug 6, 2017 at 10:49 PM, Moacir Ferreira <moacirferreira@hotmail.com> wrote:
I am willing to assemble an oVirt "pod", made of 3 servers, each with 2 CPU sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a loop, like a server triangle, using the 40Gb NICs for virtualization file (VM .qcow2) access and to move VMs around the pod (east/west traffic), while using the 10Gb interfaces for giving services to the outside world (north/south traffic).

This said, my first question is: how should I deploy GlusterFS in such an oVirt scenario? My questions are:

1 - Should I create 3 RAID arrays (e.g. RAID 5), one on each oVirt node, and then create a GlusterFS volume using them?

2 - Instead, should I create a JBOD array made of all of each server's disks?

3 - What is the best Gluster configuration to provide HA while not consuming too much disk space?

4 - Does an oVirt hypervisor pod like the one I am planning to build, and the virtualization environment, benefit from tiering when using an SSD disk? And if yes, will Gluster do it by default or do I have to configure it to do so?

Bottom line: what is the good practice for using GlusterFS in small pods for enterprises?

Your opinion/feedback will be really appreciated!
Moacir
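For reference, the loop in Colin's command above only prints a brick list; expanded, the suggested command would look roughly like this (the servera/serverb/serverc hostnames and /brick/brick-* mount points come from the loop itself and are illustrative). One caveat, offered as a sketch rather than a recipe: a distributed dispersed volume expects the brick count to be a multiple of disperse-data + redundancy (7 here), so three hosts with five bricks each (15 bricks) would need the counts adjusted, and Gluster may also warn when several bricks of one disperse set sit on the same host:

  # what the echo prints; drop the leading "echo" in the loop to actually run it
  gluster volume create dispersevol disperse-data 5 redundancy 2 \
    servera:/brick/brick-a1/brick servera:/brick/brick-a2/brick servera:/brick/brick-a3/brick \
    servera:/brick/brick-a4/brick servera:/brick/brick-a5/brick \
    serverb:/brick/brick-b1/brick serverb:/brick/brick-b2/brick serverb:/brick/brick-b3/brick \
    serverb:/brick/brick-b4/brick serverb:/brick/brick-b5/brick \
    serverc:/brick/brick-c1/brick serverc:/brick/brick-c2/brick serverc:/brick/brick-c3/brick \
    serverc:/brick/brick-c4/brick serverc:/brick/brick-c5/brick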

Hi Colin,

I am in Portugal, so sorry for this late response. It is quite confusing for me, please consider:

1 - What if the RAID is done by the server's disk controller, not by software?

2 - For JBOD I am just using gdeploy to deploy it. However, I am not using the oVirt node GUI to do this.

3 - As the VM .qcow2 files are quite big, tiering would only help if made by an intelligent system that uses the SSD for chunks of data, not for the entire .qcow2 file. But I guess this is a problem everybody else has. So, do you know how tiering works in Gluster?

4 - I am putting the OS on the first disk. However, would you do differently?

Moacir

Hi

I just thought that you'd do hardware RAID if you had the controller or JBOD if you didn't. In hindsight, a server with 40Gbps NICs is pretty likely to have a hardware RAID controller. I've never done JBOD with hardware RAID. I think having a single gluster brick on hardware JBOD would be riskier than multiple bricks, each on a single disk, but that's not based on anything other than my prejudices.

I thought gluster tiering was for the most frequently accessed files, in which case all the VMs' disks would end up in the hot tier. However, I have been wrong before...

I just wanted to know where the OS was going as I didn't see it mentioned in the OP. Normally, I'd have the OS on a RAID1, but in your case that's a lot of wasted disk.

Honestly, I think Yaniv's answer was far better than my own and made the important point about having an arbiter.

Thanks

On Mon, Aug 7, 2017 at 5:56 PM, Moacir Ferreira <moacirferreira@hotmail.com> wrote:
Hi Colin,
I am in Portugal, so sorry for this late response. It is quite confusing for me, please consider:
1 - What if the RAID is done by the server's disk controller, not by software?
2 - For JBOD I am just using gdeploy to deploy it. However, I am not using the oVirt node GUI to do this.
3 - As the VM .qcow2 files are quite big, tiering would only help if made by an intelligent system that uses the SSD for chunks of data, not for the entire .qcow2 file. But I guess this is a problem everybody else has. So, do you know how tiering works in Gluster?
4 - I am putting the OS on the first disk. However, would you do differently?
Moacir
------------------------------
From: Colin Coe <colin.coe@gmail.com>
Sent: Monday, August 7, 2017 4:48 AM
To: Moacir Ferreira
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Good practices
1) RAID5 may be a performance hit.
2) I'd be inclined to do this as JBOD by creating a distributed disperse volume on each server. Something like
echo gluster volume create dispersevol disperse-data 5 redundancy 2 \
$(for SERVER in a b c; do for BRICK in $(seq 1 5); do echo -e "server${SERVER}:/brick/brick-${SERVER}${BRICK}/brick \c"; done; done)
3) I think the above.
4) Gluster does support tiering, but IIRC you'd need the same number of SSDs as spindle drives. There may be another way to use the SSD as a fast cache.
Where are you putting the OS?
Hope I understood the question...
Thanks
On Sun, Aug 6, 2017 at 10:49 PM, Moacir Ferreira <moacirferreira@hotmail.com> wrote:
I am willing to assemble an oVirt "pod", made of 3 servers, each with 2 CPU sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a loop, like a server triangle, using the 40Gb NICs for virtualization file (VM .qcow2) access and to move VMs around the pod (east/west traffic), while using the 10Gb interfaces for giving services to the outside world (north/south traffic).

This said, my first question is: how should I deploy GlusterFS in such an oVirt scenario? My questions are:

1 - Should I create 3 RAID arrays (e.g. RAID 5), one on each oVirt node, and then create a GlusterFS volume using them?

2 - Instead, should I create a JBOD array made of all of each server's disks?

3 - What is the best Gluster configuration to provide HA while not consuming too much disk space?

4 - Does an oVirt hypervisor pod like the one I am planning to build, and the virtualization environment, benefit from tiering when using an SSD disk? And if yes, will Gluster do it by default or do I have to configure it to do so?

Bottom line: what is the good practice for using GlusterFS in small pods for enterprises?

Your opinion/feedback will be really appreciated!
Moacir
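On the point above about another way to use the SSD as a fast cache: one block-level option is LVM cache (dm-cache) underneath each brick, which caches hot chunks of data rather than whole files, which is exactly what the concern about large .qcow2 files is getting at. A rough sketch only, assuming the SSD is /dev/sdb and the HDD-backed brick LV is gluster_vg/brick_data (all names illustrative):

  pvcreate /dev/sdb
  vgextend gluster_vg /dev/sdb
  # carve the SSD into a cache-data LV and a small cache-metadata LV
  lvcreate -L 400G -n brick_cache gluster_vg /dev/sdb
  lvcreate -L 4G -n brick_cache_meta gluster_vg /dev/sdb
  # bind them into a cache pool and attach it to the brick LV
  lvconvert --type cache-pool --poolmetadata gluster_vg/brick_cache_meta gluster_vg/brick_cache
  lvconvert --type cache --cachepool gluster_vg/brick_cache gluster_vg/brick_data

The default cache mode is writethrough, so a failing SSD does not cost data; writeback is faster but then the SSD becomes part of the brick's failure domain.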

Hi Colin,

Take a look at Devin's response. Also, read the doc he shared that gives some hints on how to deploy Gluster.

It looks like, if you want high performance, you should have the bricks created as RAID (5 or 6) by the server's disk controller and then assemble a JBOD GlusterFS on top of them. The attached document is Gluster specific and not for oVirt. But at this point I think that having an SSD will not be a plus, as when using the RAID controller Gluster will not be aware of the SSD. Regarding the OS, my idea is to have a RAID 1, made of 2 low cost HDDs, to install it.

So far, based on the information received, I should create a single RAID 5 or 6 on each server and then use this disk as a brick to create my Gluster cluster, made of 2 replicas + 1 arbiter. What is new for me is the detail that the arbiter does not need a lot of space as it only keeps metadata.

Thanks for your response!

Moacir

Moacir, I understand that if you do this type of configuration you will be severely impacted on storage performance, especially for writes. Even if you have a hardware RAID controller with write-back cache, you will have a significant performance penalty and may not fully use all the resources you mentioned you have.

Fernando

2017-08-07 10:03 GMT-03:00 Moacir Ferreira <moacirferreira@hotmail.com>:
Hi Colin,
Take a look at Devin's response. Also, read the doc he shared that gives some hints on how to deploy Gluster.
It looks like, if you want high performance, you should have the bricks created as RAID (5 or 6) by the server's disk controller and then assemble a JBOD GlusterFS on top of them. The attached document is Gluster specific and not for oVirt. But at this point I think that having an SSD will not be a plus, as when using the RAID controller Gluster will not be aware of the SSD. Regarding the OS, my idea is to have a RAID 1, made of 2 low cost HDDs, to install it.
So far, based on the information received, I should create a single RAID 5 or 6 on each server and then use this disk as a brick to create my Gluster cluster, made of 2 replicas + 1 arbiter. What is new for me is the detail that the arbiter does not need a lot of space as it only keeps metadata.
Thanks for your response! Moacir
------------------------------
From: Colin Coe <colin.coe@gmail.com>
Sent: Monday, August 7, 2017 12:41 PM
To: Moacir Ferreira
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Good practices
Hi
I just thought that you'd do hardware RAID if you had the controller or JBOD if you didn't. In hindsight, a server with 40Gbps NICs is pretty likely to have a hardware RAID controller. I've never done JBOD with hardware RAID. I think having a single gluster brick on hardware JBOD would be riskier than multiple bricks, each on a single disk, but that's not based on anything other than my prejudices.
I thought gluster tiering was for the most frequently accessed files, in which case all the VMs' disks would end up in the hot tier. However, I have been wrong before...
I just wanted to know where the OS was going as I didn't see it mentioned in the OP. Normally, I'd have the OS on a RAID1, but in your case that's a lot of wasted disk.
Honestly, I think Yaniv's answer was far better than my own and made the important point about having an arbiter.
Thanks
On Mon, Aug 7, 2017 at 5:56 PM, Moacir Ferreira <moacirferreira@hotmail.com> wrote:
Hi Colin,
I am in Portugal, so sorry for this late response. It is quite confusing for me, please consider:
1 - What if the RAID is done by the server's disk controller, not by software?
2 - For JBOD I am just using gdeploy to deploy it. However, I am not using the oVirt node GUI to do this.
3 - As the VM .qcow2 files are quite big, tiering would only help if made by an intelligent system that uses the SSD for chunks of data, not for the entire .qcow2 file. But I guess this is a problem everybody else has. So, do you know how tiering works in Gluster?
4 - I am putting the OS on the first disk. However, would you do differently?
Moacir
------------------------------
From: Colin Coe <colin.coe@gmail.com>
Sent: Monday, August 7, 2017 4:48 AM
To: Moacir Ferreira
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Good practices
1) RAID5 may be a performance hit.
2) I'd be inclined to do this as JBOD by creating a distributed disperse volume on each server. Something like
echo gluster volume create dispersevol disperse-data 5 redundancy 2 \
$(for SERVER in a b c; do for BRICK in $(seq 1 5); do echo -e "server${SERVER}:/brick/brick-${SERVER}${BRICK}/brick \c"; done; done)
3) I think the above.
4) Gluster does support tiering, but IIRC you'd need the same number of SSDs as spindle drives. There may be another way to use the SSD as a fast cache.
Where are you putting the OS?
Hope I understood the question...
Thanks
On Sun, Aug 6, 2017 at 10:49 PM, Moacir Ferreira <moacirferreira@hotmail.com> wrote:
I am willing to assemble an oVirt "pod", made of 3 servers, each with 2 CPU sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a loop, like a server triangle, using the 40Gb NICs for virtualization file (VM .qcow2) access and to move VMs around the pod (east/west traffic), while using the 10Gb interfaces for giving services to the outside world (north/south traffic).

This said, my first question is: how should I deploy GlusterFS in such an oVirt scenario? My questions are:

1 - Should I create 3 RAID arrays (e.g. RAID 5), one on each oVirt node, and then create a GlusterFS volume using them?

2 - Instead, should I create a JBOD array made of all of each server's disks?

3 - What is the best Gluster configuration to provide HA while not consuming too much disk space?

4 - Does an oVirt hypervisor pod like the one I am planning to build, and the virtualization environment, benefit from tiering when using an SSD disk? And if yes, will Gluster do it by default or do I have to configure it to do so?

Bottom line: what is the good practice for using GlusterFS in small pods for enterprises?

Your opinion/feedback will be really appreciated!
Moacir

On 8 Aug 2017, at 04:08, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
Even if you have a hardware RAID controller with write-back cache you will have a significant performance penalty and may not fully use all the resources you mentioned you have.
Nope. Again, from my experience with HP Smart Array and write-back cache: writes that go into the cache are even faster than reads that must go to the disks. Of course, if the writes are too fast and too big, they will overflow the cache. But today's controllers have multi-gigabyte caches; you must write a lot to fill them. And if you can afford 40Gb cards, you can afford a decent controller.

On Tue, Aug 8, 2017 at 9:16 AM, Fabrice Bacchella <fabrice.bacchella@orange.fr> wrote:
On 8 Aug 2017, at 04:08, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
Even if you have a hardware RAID controller with write-back cache you will have a significant performance penalty and may not fully use all the resources you mentioned you have.
Nope. Again, from my experience with HP Smart Array and write-back cache: writes that go into the cache are even faster than reads that must go to the disks. Of course, if the writes are too fast and too big, they will overflow the cache. But today's controllers have multi-gigabyte caches; you must write a lot to fill them. And if you can afford 40Gb cards, you can afford a decent controller.
The last sentence raises an excellent point: balance your resources. Don't spend a fortune on one component while another will end up being your bottleneck. Storage is usually the slowest link in the chain. I personally believe that spending the money on NVMe drives makes more sense than on 40Gb (except [1], which is suspiciously cheap!)

Y.

[1] http://a.co/4hsCTqG

On 8 Aug 2017, at 08:50, Yaniv Kaul <ykaul@redhat.com> wrote:

Storage is usually the slowest link in the chain. I personally believe that spending the money on NVMe drives makes more sense than on 40Gb (except [1], which is suspiciously cheap!)

Y.
[1] http://a.co/4hsCTqG

http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c04374078

It's supported on old Gen8 servers (G10 is coming). It must be coming from an attic.

That's down to the way RAID works, regardless of what 'super-ultra' powerful hardware controller you may have. RAID 5 or 6 will never have the same write performance as a RAID 10 or 0, for example. Writeback caches can deal with bursts well, but they have a limit, therefore there will always be a penalty compared to what else you could have.

If you have a continuous stream of data (a big VM deployment or a large data copy) there will be a continuous write, and that will likely fill up the cache, making the disks underneath the bottleneck. That's why in some other scenarios, like ZFS, people have multiple groups of RAID 6 (called RAIDZ2), which improves the write speeds for these types of scenarios.

In the scenario given in this thread, with just 3 servers, each with a RAID 6, there will be a hard limit on the write performance, especially for streamed data, no matter how much write-back your hardware controller can do.

Also, I agree the 40Gb NICs may not be used fully and 10Gb can do the job well, but if they were available at the beginning, why not use them.

Fernando

On 08/08/2017 03:16, Fabrice Bacchella wrote:
On 8 Aug 2017, at 04:08, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:

Even if you have a hardware RAID controller with write-back cache you will have a significant performance penalty and may not fully use all the resources you mentioned you have.
Nope. Again, from my experience with HP Smart Array and write-back cache: writes that go into the cache are even faster than reads that must go to the disks. Of course, if the writes are too fast and too big, they will overflow the cache. But today's controllers have multi-gigabyte caches; you must write a lot to fill them. And if you can afford 40Gb cards, you can afford a decent controller.
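The write-back cache debate above is easy to settle empirically on the actual hardware: a streaming write larger than the controller cache shows the sustained rate of the array underneath, while a short burst shows the cache. A sketch using fio (directory and sizes are illustrative):

  # sequential 1M writes, bypassing the page cache, sized well past the controller cache
  fio --name=seqwrite --directory=/gluster/brick1 --size=20G --bs=1M \
      --rw=write --direct=1 --ioengine=libaio --iodepth=16 --numjobs=1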

On Tue, 2017-08-08 at 10:24 -0300, FERNANDO FREDIANI wrote:
That's down to the way RAID works, regardless of what 'super-ultra' powerful hardware controller you may have. RAID 5 or 6 will never have the same write performance as a RAID 10 or 0, for example. Writeback caches can deal with bursts well, but they have a limit, therefore there will always be a penalty compared to what else you could have.

If you have a continuous stream of data (a big VM deployment or a large data copy) there will be a continuous write, and that will likely fill up the cache, making the disks underneath the bottleneck. That's why in some other scenarios, like ZFS, people have multiple groups of RAID 6 (called RAIDZ2), which improves the write speeds for these types of scenarios.
Just pointing out that it is commonly known as RAID 60, outside of the ZFS lingo: https://en.wikipedia.org/wiki/Nested_RAID_levels#RAID_60 /K
In the scenario given in this thread, with just 3 servers, each with a RAID 6, there will be a hard limit on the write performance, especially for streamed data, no matter how much write-back your hardware controller can do.
Also, I agree the 40Gb NICs may not be used fully and 10Gb can do the job well, but if they were available at the beginning, why not use them.
Fernando
On 08/08/2017 03:16, Fabrice Bacchella wrote:
On 8 Aug 2017, at 04:08, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:

Even if you have a hardware RAID controller with write-back cache you will have a significant performance penalty and may not fully use all the resources you mentioned you have.

Nope. Again, from my experience with HP Smart Array and write-back cache: writes that go into the cache are even faster than reads that must go to the disks. Of course, if the writes are too fast and too big, they will overflow the cache. But today's controllers have multi-gigabyte caches; you must write a lot to fill them. And if you can afford 40Gb cards, you can afford a decent controller.
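To illustrate the "multiple groups of RAID 6" / RAID 60 idea above with concrete commands (disk names are hypothetical, and the 7-disk hosts in this thread would not fit this exact layout): a ZFS pool built from two RAIDZ2 vdevs stripes writes across the groups, which is what nested RAID 60 does with hardware or md arrays:

  # two 5-disk RAIDZ2 groups in one pool; ZFS stripes across the vdevs
  zpool create tank \
    raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf \
    raidz2 /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk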

On 8 Aug 2017, at 15:24, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
That's down to the way RAID works, regardless of what 'super-ultra' powerful hardware controller you may have. RAID 5 or 6 will never have the same write performance as a RAID 10 or 0, for example. Writeback caches can deal with bursts well, but they have a limit, therefore there will always be a penalty compared to what else you could have.
Hardware RAID 5/6 can have better performance, on quite common hardware, than software RAID 0. I have seen many times, even on old servers, that write latency (hitting the cache) was smaller than read latency going directly to the disks. I'm not talking about 'super-ultra' powerful hardware: an HP Smart Array P440ar with 2 GB flash sells at 560€, public price. Not cheap, but not ultra powerful either.

It's now a matter of identifying the bottleneck, and how much money you can throw at it.

Fernando, I agree that RAID is not required here by common sense. The only point of setting up RAID is the lack of manageability of GlusterFS. So you just buy manageability at the cost of extra hardware and, in some scenarios, write performance. That is it.

On 08/08/2017, 16:24, "users-bounces@ovirt.org on behalf of FERNANDO FREDIANI" <users-bounces@ovirt.org on behalf of fernando.frediani@upx.com> wrote:

That's down to the way RAID works, regardless of what 'super-ultra' powerful hardware controller you may have. RAID 5 or 6 will never have the same write performance as a RAID 10 or 0, for example. Writeback caches can deal with bursts well, but they have a limit, therefore there will always be a penalty compared to what else you could have.

If you have a continuous stream of data (a big VM deployment or a large data copy) there will be a continuous write, and that will likely fill up the cache, making the disks underneath the bottleneck. That's why in some other scenarios, like ZFS, people have multiple groups of RAID 6 (called RAIDZ2), which improves the write speeds for these types of scenarios.

In the scenario given in this thread, with just 3 servers, each with a RAID 6, there will be a hard limit on the write performance, especially for streamed data, no matter how much write-back your hardware controller can do.

Also, I agree the 40Gb NICs may not be used fully and 10Gb can do the job well, but if they were available at the beginning, why not use them.

Fernando

On 08/08/2017 03:16, Fabrice Bacchella wrote:

>> On 8 Aug 2017, at 04:08, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
>> Even if you have a hardware RAID controller with write-back cache you will have a significant performance penalty and may not fully use all the resources you mentioned you have.
>
> Nope. Again, from my experience with HP Smart Array and write-back cache: writes that go into the cache are even faster than reads that must go to the disks. Of course, if the writes are too fast and too big, they will overflow the cache. But today's controllers have multi-gigabyte caches; you must write a lot to fill them. And if you can afford 40Gb cards, you can afford a decent controller.

--_000_DB6P190MB02801EA0892B38F503220896C88A0DB6P190MB0280EURP_ Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Fernando, Let's see what people say... But this is what I understood Red Hat says is = the best performance model. This is the main reason to open this discussion= because as long as I can see, some of you in the community, do not agree. But when I think about a "distributed file system", that can make any numbe= r of copies you want, it does not make sense using a RAIDed brick, what it = makes sense is to use JBOD. Moacir ________________________________ From: fernando.frediani@upx.com.br <fernando.frediani@upx.com.br> on behalf= of FERNANDO FREDIANI <fernando.frediani@upx.com> Sent: Tuesday, August 8, 2017 3:08 AM To: Moacir Ferreira Cc: Colin Coe; users@ovirt.org Subject: Re: [ovirt-users] Good practices Moacir, I understand that if you do this type of configuration you will be = severely impacted on storage performance, specially for writes. Even if you= have a Hardware RAID Controller with Writeback cache you will have a signi= ficant performance penalty and may not fully use all the resources you ment= ioned you have. Fernando 2017-08-07 10:03 GMT-03:00 Moacir Ferreira <moacirferreira@hotmail.com<mail= to:moacirferreira@hotmail.com>>: Hi Colin, Take a look on Devin's response. Also, read the doc he shared that gives so= me hints on how to deploy Gluster. It is more like that if you want high-performance you should have the brick= s created as RAID (5 or 6) by the server's disk controller and them assembl= e a JBOD GlusterFS. The attached document is Gluster specific and not for o= Virt. But at this point I think that having SSD will not be a plus as using= the RAID controller Gluster will not be aware of the SSD. Regarding the OS= , my idea is to have a RAID 1, made of 2 low cost HDDs, to install it. So far, based on the information received I should create a single RAID 5 o= r 6 on each server and then use this disk as a brick to create my Gluster c= luster, made of 2 replicas + 1 arbiter. What is new for me is the detail th= at the arbiter does not need a lot of space as it only keeps meta data. Thanks for your response! Moacir ________________________________ From: Colin Coe <colin.coe@gmail.com<mailto:colin.coe@gmail.com>> Sent: Monday, August 7, 2017 12:41 PM To: Moacir Ferreira Cc: users@ovirt.org<mailto:users@ovirt.org> Subject: Re: [ovirt-users] Good practices Hi I just thought that you'd do hardware RAID if you had the controller or JBO= D if you didn't. In hindsight, a server with 40Gbps NICs is pretty likely = to have a hardware RAID controller. I've never done JBOD with hardware RAI= D. I think having a single gluster brick on hardware JBOD would be riskier= than multiple bricks, each on a single disk, but thats not based on anythi= ng other than my prejudices. I thought gluster tiering was for the most frequently accessed files, in wh= ich case all the VMs disks would end up in the hot tier. However, I have b= een wrong before... I just wanted to know where the OS was going as I didn't see it mentioned i= n the OP. Normally, I'd have the OS on a RAID1 but in your case thats a lo= t of wasted disk. Honestly, I think Yaniv's answer was far better than my own and made the im= portant point about having an arbiter. Thanks On Mon, Aug 7, 2017 at 5:56 PM, Moacir Ferreira <moacirferreira@hotmail.com= <mailto:moacirferreira@hotmail.com>> wrote: Hi Colin, I am in Portugal, so sorry for this late response. 
It is quite confusing fo= r me, please consider: 1 - What if the RAID is done by the server's disk controller, not by softwa= re? 2 - For JBOD I am just using gdeploy to deploy it. However, I am not using = the oVirt node GUI to do this. 3 - As the VM .qcow2 files are quite big, tiering would only help if made b= y an intelligent system that uses SSD for chunks of data not for the entire= .qcow2 file. But I guess this is a problem everybody else has. So, Do you = know how tiering works in Gluster? 4 - I am putting the OS on the first disk. However, would you do differentl= y? Moacir ________________________________ From: Colin Coe <colin.coe@gmail.com<mailto:colin.coe@gmail.com>> Sent: Monday, August 7, 2017 4:48 AM To: Moacir Ferreira Cc: users@ovirt.org<mailto:users@ovirt.org> Subject: Re: [ovirt-users] Good practices 1) RAID5 may be a performance hit- 2) I'd be inclined to do this as JBOD by creating a distributed disperse vo= lume on each server. Something like echo gluster volume create dispersevol disperse-data 5 redundancy 2 \ $(for SERVER in a b c; do for BRICK in $(seq 1 5); do echo -e "server${SERV= ER}:/brick/brick-${SERVER}${BRICK}/brick \c"; done; done) 3) I think the above. 4) Gluster does support tiering, but IIRC you'd need the same number of SSD= as spindle drives. There may be another way to use the SSD as a fast cach= e. Where are you putting the OS? Hope I understood the question... Thanks On Sun, Aug 6, 2017 at 10:49 PM, Moacir Ferreira <moacirferreira@hotmail.co= m<mailto:moacirferreira@hotmail.com>> wrote: I am willing to assemble a oVirt "pod", made of 3 servers, each with 2 CPU = sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use Gluste= rFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and a dua= l 10Gb NIC. So my intention is to create a loop like a server triangle usin= g the 40Gb NICs for virtualization files (VMs .qcow2) access and to move VM= s around the pod (east /west traffic) while using the 10Gb interfaces for g= iving services to the outside world (north/south traffic). This said, my first question is: How should I deploy GlusterFS in such oVir= t scenario? My questions are: 1 - Should I create 3 RAID (i.e.: RAID 5), one on each oVirt node, and then= create a GlusterFS using them? 2 - Instead, should I create a JBOD array made of all server's disks? 3 - What is the best Gluster configuration to provide for HA while not cons= uming too much disk space? 4 - Does a oVirt hypervisor pod like I am planning to build, and the virtua= lization environment, benefits from tiering when using a SSD disk? And yes,= will Gluster do it by default or I have to configure it to do so? At the bottom line, what is the good practice for using GlusterFS in small = pods for enterprises? You opinion/feedback will be really appreciated! 
Moacir _______________________________________________ Users mailing list Users@ovirt.org<mailto:Users@ovirt.org> http://lists.ovirt.org/mailman/listinfo/users _______________________________________________ Users mailing list Users@ovirt.org<mailto:Users@ovirt.org> http://lists.ovirt.org/mailman/listinfo/users --_000_DB6P190MB02801EA0892B38F503220896C88A0DB6P190MB0280EURP_ Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable <html> <head> <meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Diso-8859-= 1"> <style type=3D"text/css" style=3D"display:none;"><!-- P {margin-top:0;margi= n-bottom:0;} --></style> </head> <body dir=3D"ltr"> <div id=3D"divtagdefaultwrapper" style=3D"font-size:12pt;color:#000000;font= -family:Calibri,Helvetica,sans-serif;" dir=3D"ltr"> <p>Fernando,</p> <p><br> </p> <p>Let's see what people say... But this is what I understood Red Hat says = is the best performance model. This is the main reason to open this discuss= ion because as long as I can see, some of you in the community, do not agre= e.<br> </p> <br> <p>But when I think about a "distributed file system", that can m= ake any number of copies you want, it does not make sense using a RAIDed br= ick, what it makes sense is to use JBOD.</p> <p><br> </p> <p>Moacir<br> </p> <br> <div style=3D"color: rgb(49, 55, 57);"> <hr tabindex=3D"-1" style=3D"display:inline-block; width:98%"> <div id=3D"divRplyFwdMsg" dir=3D"ltr"><font style=3D"font-size:11pt" face= =3D"Calibri, sans-serif" color=3D"#000000"><b>From:</b> fernando.frediani@u= px.com.br <fernando.frediani@upx.com.br> on behalf of FERNANDO FREDIA= NI <fernando.frediani@upx.com><br> <b>Sent:</b> Tuesday, August 8, 2017 3:08 AM<br> <b>To:</b> Moacir Ferreira<br> <b>Cc:</b> Colin Coe; users@ovirt.org<br> <b>Subject:</b> Re: [ovirt-users] Good practices</font> <div> </div> </div> <div> <div dir=3D"ltr"> <div>Moacir, I understand that if you do this type of configuration you wil= l be severely impacted on storage performance, specially for writes. Even i= f you have a Hardware RAID Controller with Writeback cache you will have a = significant performance penalty and may not fully use all the resources you mentioned you have.<br> <br> </div> Fernando<br> </div> <div class=3D"gmail_extra"><br> <div class=3D"gmail_quote">2017-08-07 10:03 GMT-03:00 Moacir Ferreira <span= dir=3D"ltr"> <<a href=3D"mailto:moacirferreira@hotmail.com" target=3D"_blank">moacirf= erreira@hotmail.com</a>></span>:<br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex; border-left:1= px #ccc solid; padding-left:1ex"> <div dir=3D"ltr"> <div id=3D"m_1874859859842763104divtagdefaultwrapper" dir=3D"ltr" style=3D"= font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif"> <p>Hi Colin,</p> <p><br> </p> <p>Take a look on Devin's response. Also, read the doc he shared that gives= some hints on how to deploy Gluster.</p> <p><br> </p> <p>It is more like that if you want high-performance you should have the br= icks created as RAID (5 or 6) by the server's disk controller and them= assemble a JBOD GlusterFS. The attached document is Gluster specific and n= ot for oVirt. But at this point I think that having SSD will not be a plus as using the RAID controller Gluster wi= ll not be aware of the SSD. 

Exactly Moacir, that is my point.

A proper distributed filesystem should not rely on any type of RAID, as it can provide its own redundancy without depending on any underlying layer (look at Ceph). Using RAID may help with management and, in certain scenarios, with replacing a faulty disk, but at a cost, and not a cheap one. That's why, in terms of resource savings, if replica 3 brings the issues mentioned, it is well worth having a small arbiter somewhere instead of wasting a significant amount of disk space.

Fernando

On 08/08/2017 06:09, Moacir Ferreira wrote:
Fernando,
Let's see what people say... But this is what I understood Red Hat says is the best performance model. This is the main reason to open this discussion: as far as I can see, some of you in the community do not agree.

But when I think about a "distributed file system" that can make any number of copies you want, it does not make sense to use a RAIDed brick; what makes sense is to use JBOD.
Moacir
------------------------------------------------------------------------ *From:* fernando.frediani@upx.com.br <fernando.frediani@upx.com.br> on behalf of FERNANDO FREDIANI <fernando.frediani@upx.com> *Sent:* Tuesday, August 8, 2017 3:08 AM *To:* Moacir Ferreira *Cc:* Colin Coe; users@ovirt.org *Subject:* Re: [ovirt-users] Good practices Moacir, I understand that if you do this type of configuration you will be severely impacted on storage performance, especially for writes. Even if you have a hardware RAID controller with write-back cache you will have a significant performance penalty and may not fully use all the resources you mentioned you have.
Fernando
2017-08-07 10:03 GMT-03:00 Moacir Ferreira <moacirferreira@hotmail.com <mailto:moacirferreira@hotmail.com>>:
Hi Colin,
Take a look at Devin's response. Also, read the doc he shared; it gives some hints on how to deploy Gluster.

It seems that if you want high performance you should have the bricks created as RAID (5 or 6) by the server's disk controller and then assemble a JBOD GlusterFS. The attached document is Gluster specific and not for oVirt. But at this point I think that having an SSD will not be a plus, as when using the RAID controller Gluster will not be aware of the SSD. Regarding the OS, my idea is to have a RAID 1, made of 2 low cost HDDs, to install it.

So far, based on the information received, I should create a single RAID 5 or 6 on each server and then use this disk as a brick to create my Gluster cluster, made of 2 replicas + 1 arbiter. What is new for me is the detail that the arbiter does not need a lot of space, as it only keeps metadata.
Thanks for your response!
Moacir
------------------------------------------------------------------------ *From:* Colin Coe <colin.coe@gmail.com <mailto:colin.coe@gmail.com>> *Sent:* Monday, August 7, 2017 12:41 PM
*To:* Moacir Ferreira *Cc:* users@ovirt.org <mailto:users@ovirt.org> *Subject:* Re: [ovirt-users] Good practices Hi
I just thought that you'd do hardware RAID if you had the controller or JBOD if you didn't. In hindsight, a server with 40Gbps NICs is pretty likely to have a hardware RAID controller. I've never done JBOD with hardware RAID. I think having a single gluster brick on hardware JBOD would be riskier than multiple bricks, each on a single disk, but that's not based on anything other than my prejudices.

I thought gluster tiering was for the most frequently accessed files, in which case all the VMs' disks would end up in the hot tier. However, I have been wrong before...

I just wanted to know where the OS was going as I didn't see it mentioned in the OP. Normally, I'd have the OS on a RAID1 but in your case that's a lot of wasted disk.
Honestly, I think Yaniv's answer was far better than my own and made the important point about having an arbiter.
Thanks
On Mon, Aug 7, 2017 at 5:56 PM, Moacir Ferreira <moacirferreira@hotmail.com <mailto:moacirferreira@hotmail.com>> wrote:
Hi Colin,
I am in Portugal, so sorry for this late response. It is quite confusing for me, please consider:
1 - What if the RAID is done by the server's disk controller, not by software?

2 - For JBOD I am just using gdeploy to deploy it. However, I am not using the oVirt node GUI to do this.

3 - As the VM .qcow2 files are quite big, tiering would only help if made by an intelligent system that uses SSD for chunks of data, not for the entire .qcow2 file. But I guess this is a problem everybody else has. So, do you know how tiering works in Gluster?
4 - I am putting the OS on the first disk. However, would you do differently?
Moacir
------------------------------------------------------------------------ *From:* Colin Coe <colin.coe@gmail.com <mailto:colin.coe@gmail.com>> *Sent:* Monday, August 7, 2017 4:48 AM *To:* Moacir Ferreira *Cc:* users@ovirt.org <mailto:users@ovirt.org> *Subject:* Re: [ovirt-users] Good practices 1) RAID5 may be a performance hit-
2) I'd be inclined to do this as JBOD by creating a distributed disperse volume on each server. Something like
echo gluster volume create dispersevol disperse-data 5 redundancy 2 \ $(for SERVER in a b c; do for BRICK in $(seq 1 5); do echo -e "server${SERVER}:/brick/brick-${SERVER}${BRICK}/brick \c"; done; done)
3) I think the above.
4) Gluster does support tiering, but IIRC you'd need the same number of SSD as spindle drives. There may be another way to use the SSD as a fast cache.
Where are you putting the OS?
Hope I understood the question...
Thanks
On Sun, Aug 6, 2017 at 10:49 PM, Moacir Ferreira <moacirferreira@hotmail.com <mailto:moacirferreira@hotmail.com>> wrote:
I am willing to assemble a oVirt "pod", made of 3 servers, each with 2 CPU sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a loop like a server triangle using the 40Gb NICs for virtualization files (VMs .qcow2) access and to move VMs around the pod (east /west traffic) while using the 10Gb interfaces for giving services to the outside world (north/south traffic).
This said, my first question is: How should I deploy GlusterFS in such oVirt scenario? My questions are:
1 - Should I create 3 RAID (i.e.: RAID 5), one on each oVirt node, and then create a GlusterFS using them?
2 - Instead, should I create a JBOD array made of all server's disks?
3 - What is the best Gluster configuration to provide for HA while not consuming too much disk space?
4 - Does a oVirt hypervisor pod like I am planning to build, and the virtualization environment, benefits from tiering when using a SSD disk? And yes, will Gluster do it by default or I have to configure it to do so?
At the bottom line, what is the good practice for using GlusterFS in small pods for enterprises?
You opinion/feedback will be really appreciated!
Moacir
_______________________________________________ Users mailing list Users@ovirt.org <mailto:Users@ovirt.org> http://lists.ovirt.org/mailman/listinfo/users <http://lists.ovirt.org/mailman/listinfo/users>
_______________________________________________ Users mailing list Users@ovirt.org <mailto:Users@ovirt.org> http://lists.ovirt.org/mailman/listinfo/users <http://lists.ovirt.org/mailman/listinfo/users>

On Mon, Aug 7, 2017 at 2:41 PM, Colin Coe <colin.coe@gmail.com> wrote:
Hi
I just thought that you'd do hardware RAID if you had the controller or JBOD if you didn't. In hindsight, a server with 40Gbps NICs is pretty likely to have a hardware RAID controller. I've never done JBOD with hardware RAID. I think having a single gluster brick on hardware JBOD would be riskier than multiple bricks, each on a single disk, but that's not based on anything other than my prejudices.
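As an illustration of that multiple-bricks layout (hostnames and mount points invented; each disk assumed to be formatted and mounted separately), a replica 3 volume with one brick per physical disk could be created roughly like this; consecutive bricks form a replica set, so each set spans the three hosts:

gluster volume create vmdata replica 3 \
  gfs1:/bricks/disk1/brick gfs2:/bricks/disk1/brick gfs3:/bricks/disk1/brick \
  gfs1:/bricks/disk2/brick gfs2:/bricks/disk2/brick gfs3:/bricks/disk2/brick
gluster volume start vmdata

Adding further disk3/disk4/... triples grows the distributed-replicate volume one disk per host at a time.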
I thought gluster tiering was for the most frequently accessed files, in which case all the VMs' disks would end up in the hot tier. However, I have been wrong before...
The most frequent shards may not be complete files. Y.
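For reference, sharding splits each image into fixed-size pieces, so only the busy pieces would land in a hot tier or cache. A quick way to check the shard settings on an existing volume (volume name invented):

gluster volume get vmdata features.shard
gluster volume get vmdata features.shard-block-size

IIRC the block size defaults to 64MB.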
I just wanted to know where the OS was going as I didn't see it mentioned in the OP. Normally, I'd have the OS on a RAID1 but in your case that's a lot of wasted disk.
Honestly, I think Yaniv's answer was far better than my own and made the important point about having an arbiter.
Thanks
On Mon, Aug 7, 2017 at 5:56 PM, Moacir Ferreira < moacirferreira@hotmail.com> wrote:
Hi Colin,
I am in Portugal, so sorry for this late response. It is quite confusing for me, please consider:
1 - What if the RAID is done by the server's disk controller, not by software?
2 - For JBOD I am just using gdeploy to deploy it. However, I am not using the oVirt node GUI to do this.
3 - As the VM .qcow2 files are quite big, tiering would only help if made by an intelligent system that uses SSD for chunks of data, not for the entire .qcow2 file. But I guess this is a problem everybody else has. So, do you know how tiering works in Gluster?
4 - I am putting the OS on the first disk. However, would you do differently?
Moacir
------------------------------ *From:* Colin Coe <colin.coe@gmail.com> *Sent:* Monday, August 7, 2017 4:48 AM *To:* Moacir Ferreira *Cc:* users@ovirt.org *Subject:* Re: [ovirt-users] Good practices
1) RAID5 may be a performance hit-
2) I'd be inclined to do this as JBOD by creating a distributed disperse volume on each server. Something like
echo gluster volume create dispersevol disperse-data 5 redundancy 2 \ $(for SERVER in a b c; do for BRICK in $(seq 1 5); do echo -e "server${SERVER}:/brick/brick-${SERVER}${BRICK}/brick \c"; done; done)
3) I think the above.
4) Gluster does support tiering, but IIRC you'd need the same number of SSD as spindle drives. There may be another way to use the SSD as a fast cache.
Where are you putting the OS?
Hope I understood the question...
Thanks
On Sun, Aug 6, 2017 at 10:49 PM, Moacir Ferreira < moacirferreira@hotmail.com> wrote:
I am willing to assemble a oVirt "pod", made of 3 servers, each with 2 CPU sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a loop like a server triangle using the 40Gb NICs for virtualization files (VMs .qcow2) access and to move VMs around the pod (east /west traffic) while using the 10Gb interfaces for giving services to the outside world (north/south traffic).
This said, my first question is: How should I deploy GlusterFS in such oVirt scenario? My questions are:
1 - Should I create 3 RAID (i.e.: RAID 5), one on each oVirt node, and then create a GlusterFS using them?
2 - Instead, should I create a JBOD array made of all server's disks?
3 - What is the best Gluster configuration to provide for HA while not consuming too much disk space?
4 - Does a oVirt hypervisor pod like I am planning to build, and the virtualization environment, benefits from tiering when using a SSD disk? And yes, will Gluster do it by default or I have to configure it to do so?
At the bottom line, what is the good practice for using GlusterFS in small pods for enterprises?
You opinion/feedback will be really appreciated!
Moacir
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

On Sun, Aug 6, 2017 at 5:49 PM, Moacir Ferreira <moacirferreira@hotmail.com> wrote:
I am willing to assemble a oVirt "pod", made of 3 servers, each with 2 CPU sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a loop like a server triangle using the 40Gb NICs for virtualization files (VMs .qcow2) access and to move VMs around the pod (east /west traffic) while using the 10Gb interfaces for giving services to the outside world (north/south traffic).
Very nice gear. How are you planning the network exactly? Without a switch, back-to-back? (sounds OK to me, just wanted to ensure this is what the 'dual' is used for). However, I'm unsure if you have the correct balance between the interface speeds (40g) and the disks (too many HDDs?).
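One way to do the switchless triangle, sketched with invented interface names and addresses, is to give each of the three back-to-back links its own /30; on server A, for the link towards server B (mirror it on B with the .2 address):

ip link set dev ens1f0 up
ip addr add 10.10.12.1/30 dev ens1f0

The same pattern repeats for the A-C and B-C links on the remaining 40Gb ports.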
This said, my first question is: How should I deploy GlusterFS in such oVirt scenario? My questions are:
1 - Should I create 3 RAID (i.e.: RAID 5), one on each oVirt node, and then create a GlusterFS using them?
I would assume RAID 1 for the operating system (you don't want a single point of failure there?) and the rest JBODs. The SSD will be used for caching, I reckon? (I personally would add more SSDs instead of HDDs, but it does depend on the disk sizes and your space requirements.)
2 - Instead, should I create a JBOD array made of all server's disks?
3 - What is the best Gluster configuration to provide for HA while not consuming too much disk space?
Replica 2 + Arbiter sounds good to me.
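For reference, a replica 3 arbiter 1 volume looks roughly like this (hostnames and brick paths invented); the arbiter brick stores only file names and metadata, so usable capacity stays close to plain replica 2 while still giving quorum:

gluster volume create vmstore replica 3 arbiter 1 \
  gfs1:/gluster/data/brick gfs2:/gluster/data/brick gfs3:/gluster/arbiter/brick
gluster volume start vmstore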
4 - Does a oVirt hypervisor pod like I am planning to build, and the virtualization environment, benefits from tiering when using a SSD disk? And yes, will Gluster do it by default or I have to configure it to do so?
Yes, I believe using lvmcache is the best way to go.
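A rough lvmcache sketch, assuming the brick LV sits in a volume group called gluster_vg and the SSD shows up as /dev/sdh (both names invented):

vgextend gluster_vg /dev/sdh
# size is illustrative; leave some room on the SSD for cache metadata
lvcreate --type cache-pool -L 400G -n brick_cache gluster_vg /dev/sdh
lvconvert --type cache --cachepool gluster_vg/brick_cache gluster_vg/brick_lv

After that the HDD-backed brick LV is transparently cached on the SSD.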
At the bottom line, what is the good practice for using GlusterFS in small pods for enterprises?
Don't forget jumbo frames. libgfapi (coming hopefully in 4.1.5). Sharding (enabled out of the box if you use a hyper-converged setup via gdeploy). Y.
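For the jumbo frames part, something like this on each 40Gb interface, plus a quick check that 9000-byte frames really make it across (interface name and peer address invented):

ip link set dev ens1f0 mtu 9000
# 8972 = 9000 minus 28 bytes of IP/ICMP headers; -M do forbids fragmentation
ping -M do -s 8972 -c 3 10.10.12.2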
You opinion/feedback will be really appreciated!
Moacir
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users


Moacir, I believe that to use the 3 servers directly connected to each other without a switch you have to have a bridge on each server for every 2 physical interfaces, to allow the traffic to pass through at layer 2 (is it possible to create this from the oVirt Engine web interface?). If your ovirtmgmt network is separate from the others (it really should be), that should be fine to do.

Fernando

On 07/08/2017 07:13, Moacir Ferreira wrote:
Hi, in-line responses.
Thanks,
Moacir
------------------------------------------------------------------------ *From:* Yaniv Kaul <ykaul@redhat.com> *Sent:* Monday, August 7, 2017 7:42 AM *To:* Moacir Ferreira *Cc:* users@ovirt.org *Subject:* Re: [ovirt-users] Good practices
On Sun, Aug 6, 2017 at 5:49 PM, Moacir Ferreira <moacirferreira@hotmail.com <mailto:moacirferreira@hotmail.com>> wrote:
I am willing to assemble a oVirt "pod", made of 3 servers, each with 2 CPU sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a loop like a server triangle using the 40Gb NICs for virtualization files (VMs .qcow2) access and to move VMs around the pod (east /west traffic) while using the 10Gb interfaces for giving services to the outside world (north/south traffic).
Very nice gear. How are you planning the network exactly? Without a switch, back-to-back? (sounds OK to me, just wanted to ensure this is what the 'dual' is used for). However, I'm unsure if you have the correct balance between the interface speeds (40g) and the disks (too many HDDs?).
Moacir: The idea is to have a very high performance network for the distributed file system and to prevent bottlenecks when we move one VM from a node to another. Using 40Gb NICs I can just connect the servers back-to-back. In this case I don't need the expensive 40Gb switch, I get very high speed and no contention between north/south and east/west traffic.
This said, my first question is: How should I deploy GlusterFS in such oVirt scenario? My questions are:
1 - Should I create 3 RAID (i.e.: RAID 5), one on each oVirt node, and then create a GlusterFS using them?
I would assume RAID 1 for the operating system (you don't want a single point of failure there?) and the rest JBODs. The SSD will be used for caching, I reckon? (I personally would add more SSDs instead of HDDs, but it does depend on the disk sizes and your space requirements.
Moacir: Yes, I agree that I need a RAID-1 for the OS. Now, generic JBOD or a JBOD assembled using RAID-5 "disks" created by the server's disk controller?
2 - Instead, should I create a JBOD array made of all server's disks?
3 - What is the best Gluster configuration to provide for HA while not consuming too much disk space?
Replica 2 + Arbiter sounds good to me. Moacir: I agree, and that is what I am using.
4 - Does a oVirt hypervisor pod like I am planning to build, and the virtualization environment, benefits from tiering when using a SSD disk? And yes, will Gluster do it by default or I have to configure it to do so?
Yes, I believe using lvmcache is the best way to go.
Moacir: Are you sure? I say that because the qcow2 files will be quite big. So if tiering is "file based" the SSD would have to be very, very big, unless Gluster tiering does it by "chunks of data".
At the bottom line, what is the good practice for using GlusterFS in small pods for enterprises?
Don't forget jumbo frames. libgfapi (coming hopefully in 4.1.5). Sharding (enabled out of the box if you use a hyper-converged setup via gdeploy). *Moacir:* Yes! This is another reason to have separate networks for north/south and east/west. In that way I can use the standard MTU on the 10Gb NICs and jumbo frames on the file/move 40Gb NICs.
Y.
You opinion/feedback will be really appreciated!
Moacir
_______________________________________________ Users mailing list Users@ovirt.org <mailto:Users@ovirt.org> http://lists.ovirt.org/mailman/listinfo/users <http://lists.ovirt.org/mailman/listinfo/users>
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Moacir, I believe that to use the 3 servers directly connected to each
other without a switch you have to have a bridge on each server for every
2 physical interfaces, to allow the traffic to pass through at layer 2
(is it possible to create this from the oVirt Engine web interface?). If
your ovirtmgmt network is separate from the others (as it really should
be), that should be fine to do.

Fernando

On 07/08/2017 07:13, Moacir Ferreira wrote:

Hi, in-line responses.

Thanks,
Moacir

________________________________
From: Yaniv Kaul <ykaul@redhat.com>
Sent: Monday, August 7, 2017 7:42 AM
To: Moacir Ferreira
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Good practices

On Sun, Aug 6, 2017 at 5:49 PM, Moacir Ferreira
<moacirferreira@hotmail.com> wrote:

> I am willing to assemble a oVirt "pod", made of 3 servers, each with
> 2 CPU sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to
> use GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb
> NIC and a dual 10Gb NIC. So my intention is to create a loop like a
> server triangle using the 40Gb NICs for virtualization files (VMs
> .qcow2) access and to move VMs around the pod (east/west traffic) while
> using the 10Gb interfaces for giving services to the outside world
> (north/south traffic).

Very nice gear. How are you planning the network exactly? Without a
switch, back-to-back? (Sounds OK to me, just wanted to ensure this is what
the 'dual' is used for.) However, I'm unsure if you have the correct
balance between the interface speeds (40g) and the disks (too many HDDs?).

Moacir: The idea is to have a very high performance network for the
distributed file system and to prevent bottlenecks when we move one VM
from a node to another. Using 40Gb NICs I can just connect the servers
back-to-back. In this case I don't need the expensive 40Gb switch, I get
very high speed and no contention between north/south and east/west
traffic.

> 1 - Should I create 3 RAID (i.e.: RAID 5), one on each oVirt node, and
> then create a GlusterFS using them?

I would assume RAID 1 for the operating system (you don't want a single
point of failure there?) and the rest JBODs. The SSD will be used for
caching, I reckon? (I personally would add more SSDs instead of HDDs, but
it does depend on the disk sizes and your space requirements.)

Moacir: Yes, I agree that I need a RAID-1 for the OS. Now, generic JBOD or
a JBOD assembled using RAID-5 "disks" created by the server's disk
controller?

> 2 - Instead, should I create a JBOD array made of all server's disks?
> 3 - What is the best Gluster configuration to provide for HA while not
> consuming too much disk space?

Replica 2 + Arbiter sounds good to me.

Moacir: I agree, and that is what I am using.

> 4 - Does a oVirt hypervisor pod like I am planning to build, and the
> virtualization environment, benefit from tiering when using a SSD disk?
> And yes, will Gluster do it by default or do I have to configure it to
> do so?

Yes, I believe using lvmcache is the best way to go.

Moacir: Are you sure? I say that because the qcow2 files will be quite
big. So if tiering is "file based" the SSD would have to be very, very
big, unless Gluster tiering does it by "chunks of data".

> At the bottom line, what is the good practice for using GlusterFS in
> small pods for enterprises?

Don't forget jumbo frames. libgfapi (coming hopefully in 4.1.5). Sharding
(enabled out of the box if you use a hyper-converged setup via gdeploy).

Moacir: Yes! This is another reason to have separate networks for
north/south and east/west. In that way I can use the standard MTU on the
10Gb NICs and jumbo frames on the file/move 40Gb NICs.

Y.
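As a point of reference on the lvmcache suggestion, here is a minimal
sketch of attaching the single SSD as a cache to the LV backing one brick.
The device names, VG name and sizes are purely hypothetical examples, not
taken from this thread:

  # assume /dev/sdb is the HDD (or RAID set) behind the brick and /dev/sdc is the SSD
  pvcreate /dev/sdb /dev/sdc
  vgcreate gluster_vg /dev/sdb /dev/sdc
  lvcreate -n brick1 -L 4T gluster_vg /dev/sdb            # data LV on the spinning disks
  lvcreate -n brick1_cache -L 180G gluster_vg /dev/sdc    # cache data LV on the SSD
  lvcreate -n brick1_cmeta -L 2G gluster_vg /dev/sdc      # cache metadata LV on the SSD
  lvconvert --type cache-pool --poolmetadata gluster_vg/brick1_cmeta gluster_vg/brick1_cache
  lvconvert --type cache --cachepool gluster_vg/brick1_cache gluster_vg/brick1
  mkfs.xfs -i size=512 /dev/gluster_vg/brick1             # XFS with 512-byte inodes, commonly recommended for bricks

The default cache mode is writethrough; writeback is faster for writes,
but then a lost SSD can cost you dirty blocks, so that is a trade-off to
make consciously.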

> Moacir: Yes! This is another reason to have separate networks for
> north/south and east/west. In that way I can use the standard MTU on the
> 10Gb NICs and jumbo frames on the file/move 40Gb NICs.

Why not jumbo frames everywhere?
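As a sketch of what the split-MTU setup could look like on each host, with
hypothetical interface names (in oVirt the MTU can also be set per logical
network in the engine, so it is applied whenever the host network is
configured):

  # 40Gb back-to-back links carrying gluster/migration traffic
  ip link set dev ens1f0 mtu 9000
  ip link set dev ens1f1 mtu 9000
  # 10Gb links facing the outside world stay at the default
  ip link set dev ens2f0 mtu 1500
  ip link set dev ens2f1 mtu 1500

Jumbo frames only pay off when every hop on the path supports them; on the
north/south side that is often not under your control, which is one reason
to keep the larger MTU confined to the storage segment.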

Moacir,

I have recently installed multiple Red Hat Virtualization hosts for
several different companies, and have dealt with the Red Hat Support Team
in depth about the optimal configuration for setting up GlusterFS most
efficiently, and I wanted to share with you what I learned.

In general, the Red Hat Virtualization team frowns upon using each DISK of
the system as just a JBOD. Sure, there is some protection by having the
data replicated; however, the recommendation is to use RAID 6 (preferred)
or RAID 5, or RAID 1 at the very least.

Here is the direct quote from Red Hat when I asked about RAID and bricks:

"A typical Gluster configuration would use RAID underneath the bricks.
RAID 6 is most typical as it gives you 2 disk failure protection, but
RAID 5 could be used too. Once you have the RAIDed bricks, you'd then
apply the desired replication on top of that. The most popular way of
doing this would be distributed replicated with 2x replication. In general
you'll get better performance with larger bricks. 12 drives is often a
sweet spot. Another option would be to create a separate tier using all
SSDs."

In order to do SSD tiering, from my understanding you would need 1 x NVMe
drive in each server, or 4 x SSD for the hot tier (it needs to be
distributed, replicated for the hot tier if not using NVMe). So with you
only having 1 SSD drive in each server, I'd suggest maybe looking into the
NVMe option.

Since you're using only 3 servers, what I'd probably suggest is to do
(2 Replicas + Arbiter Node); this setup actually doesn't require the 3rd
server to have big drives at all, as it only stores metadata about the
files and not actually a full copy.

Please see the attached document that was given to me by Red Hat to get
more information on this. Hope this information helps you.

--

Devin Acosta, RHCA, RHVCA
Red Hat Certified Architect
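For what it is worth, a minimal sketch of the "2 replicas + arbiter"
layout described above, with hypothetical host and brick paths (the
arbiter brick on host3 stores only file metadata, so a small disk is
enough):

  gluster volume create vmstore replica 3 arbiter 1 \
      host1:/gluster/brick1/vmstore \
      host2:/gluster/brick1/vmstore \
      host3:/gluster/arbiter1/vmstore
  gluster volume set vmstore group virt
  gluster volume start vmstore

The "group virt" option set applies the virtualization-tuned defaults
(eager locking, quorum settings, etc.); as far as I know this is
essentially what the hyper-converged gdeploy flow does as well.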

Devin,

Many, many thanks for your response. I will read the doc you sent and if I
still have questions I will post them here.

But why would I use a RAIDed brick if Gluster, by itself, already
"protects" the data by making replicas? You see, that is what is confusing
to me...

Thanks,
Moacir

For any RAID 5 or 6 configuration I normally follow a simple golden rule
which has given good results so far:
- up to 4 disks: RAID 5
- 5 or more disks: RAID 6

However, I didn't really understand the recommendation to use any RAID
with GlusterFS. I always thought that GlusterFS likes to work in JBOD mode
and control the disks (bricks) directly, so you can create whatever
distribution rule you wish, and if a single disk fails you just replace it
and the data it held is obviously replicated from another node. The only
downside of using it this way is that the replication traffic flows across
all servers, but that is not much of an issue.

Can anyone elaborate on RAID + GlusterFS versus JBOD + GlusterFS?

Thanks
Regards
Fernando
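Purely as an illustration of that rule of thumb with Linux software RAID
(device names are hypothetical; a hardware controller achieves the same
result):

  # 4 disks or fewer -> RAID 5 (run one or the other, depending on disk count)
  mdadm --create /dev/md/brick1 --level=5 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
  # 5 disks or more -> RAID 6
  mdadm --create /dev/md/brick1 --level=6 --raid-devices=6 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg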

Hi Fernando,

Here is my experience: if you consider a particular hard drive as a brick
for a Gluster volume and it dies, i.e. it becomes inaccessible, it's a
huge hassle to discard that brick and exchange it with another one, since
Gluster keeps trying to access the broken brick and that causes (at least
it caused for me) a big pain. Therefore it's better to have a RAID as the
brick, i.e. RAID 1 (mirroring) for each brick; in this case, if a disk is
down you can easily exchange it and rebuild the RAID without going
offline, i.e. without switching off the volume, doing brick manipulations
and switching it back on.

Cheers
Erekle
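To make the comparison concrete, the JBOD failure path Erekle describes
looks roughly like this (volume, host and brick names are hypothetical);
it works, but it is more steps than a transparent RAID rebuild and it
triggers a heal of everything that brick held:

  # after preparing an empty brick directory on the replacement disk
  gluster volume replace-brick vmstore \
      host2:/gluster/brick1/vmstore host2:/gluster/brick1new/vmstore \
      commit force
  gluster volume heal vmstore full
  gluster volume heal vmstore info     # repeat until no entries are pending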

Thanks for the clarification Erekle.

However, I am surprised by this way of operating GlusterFS, as it adds
another layer of complexity to the system (either a hardware or software
RAID) underneath the Gluster configuration and increases the system's
overall cost.

An important point to consider is: in a RAID configuration you already
have space 'wasted' in order to build redundancy (either RAID 1, 5, or 6).
Then, when you have GlusterFS on top of several RAIDs, the data is
replicated again, so you end up with the same data consuming more space
within each group of disks and again across the RAIDs, depending on the
Gluster configuration you have (in a RAID 1 config the same data is stored
4 times).

Yet another downside of having a RAID (especially RAID 5 or 6) is that it
reduces write speeds considerably, as each group of disks ends up with the
write speed of a single disk, since all the other disks of that group have
to wait for each other to write as well.

Therefore, if Gluster already replicates the data, why does it create the
big pain you mentioned? If the data is replicated somewhere else, it can
still be retrieved both to serve clients and to reconstruct the equivalent
disk when it is replaced.

Fernando
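A rough back-of-envelope on the space point, assuming purely as an example
1.2 TB drives and 6 data HDDs per node (OS disks set aside), 3 nodes:

  raw capacity per node: 6 x 1.2 TB                = 7.2 TB
  JBOD bricks, replica 2 + arbiter                 -> ~7.2 TB usable (each byte on 2 data nodes, arbiter keeps metadata only)
  RAID 6 bricks (6 disks -> ~4.8 TB per node),
    then replica 2 + arbiter                       -> ~4.8 TB usable
  RAID 1 bricks, then replica 2                    -> ~3.6 TB usable (each byte stored 4 times)

So the RAID variants buy local handling of a dead disk (no Gluster heal,
no brick juggling) at the price of extra raw space and, for parity RAID,
some write performance.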

On Mon, Aug 7, 2017 at 6:41 PM, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
Thanks for the clarification, Erekle.
However, I am surprised by this way of operating GlusterFS, as it adds another layer of complexity to the system (either a hardware or a software RAID) underneath the Gluster configuration and increases the system's overall cost.
It does, but with hardware-based RAID it's not a big deal. The complexity is mostly the stripe-size math, which I personally don't like to calculate.
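To make the stripe-size math concrete: the full stripe is the per-disk stripe unit multiplied by the number of data disks, and the brick filesystem is aligned to it. A rough sketch in shell (the numbers are only an example for a 7-disk RAID 6, not taken from this thread):

  STRIPE_UNIT_KB=256                      # per-disk stripe unit configured on the controller
  DATA_DISKS=$((7 - 2))                   # 7-disk RAID 6 -> 5 data disks
  FULL_STRIPE_KB=$((STRIPE_UNIT_KB * DATA_DISKS))
  echo "full stripe width: ${FULL_STRIPE_KB} KiB"
  # align the XFS brick to the RAID geometry (destructive: this formats /dev/sdX)
  mkfs.xfs -f -i size=512 -d su=${STRIPE_UNIT_KB}k,sw=${DATA_DISKS} /dev/sdX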
An important point to consider: in a RAID configuration you already have space 'wasted' in order to build redundancy (RAID 1, 5, or 6). When you then put GlusterFS on top of several RAIDs, the data is replicated again, so the same data ends up consuming more space inside each group of disks and again across the RAIDs, depending on the Gluster configuration you have (with RAID 1 underneath a replicated volume, the same data is stored 4 times).
Yet another downside of having RAID (especially RAID 5 or 6) is that it considerably reduces write speeds, as each group of disks ends up with the write speed of a single disk, because all the other disks of that group have to wait for each other to write as well.
Therefore, if Gluster already replicates the data, why does it create the big pain you mentioned? The data is replicated somewhere else and can still be retrieved, both to serve clients and to reconstruct the equivalent disk when it is replaced.
I think it's a matter of how fast you can replace a disk (over a long weekend?), how reliably you can do it (please, don't pull the wrong disk! I've seen it happen too many times!) and how much of a performance hit you are willing to accept while in degraded mode (and how long it takes to detect it: HDDs, unlike SSDs, die slowly. At least when an SSD dies, it dies a quick and determined death. HDDs may accumulate error after error and still function). Y.
Fernando
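On the detection point, a periodic SMART poll is one low-effort way to notice an HDD that is dying slowly rather than failing outright. A minimal sketch (device names are placeholders):

  for DEV in /dev/sd{a..g}; do
      echo "== ${DEV} =="
      smartctl -H "${DEV}"                                              # overall health verdict
      smartctl -A "${DEV}" | grep -Ei 'reallocated|pending|uncorrect'   # early-warning counters
  done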
On 07/08/2017 10:26, Erekle Magradze wrote:
Hi Fernando,
Here is my experience: if you use a particular hard drive as a brick for a Gluster volume and it dies, i.e. it becomes inaccessible, it's a huge hassle to discard that brick and exchange it with another one, since Gluster still tries to access the broken brick and that causes (at least it caused for me) a big pain. Therefore it's better to have a RAID set as the brick, i.e. RAID 1 (mirroring) for each brick. In this case, if a disk is down you can easily exchange it and rebuild the RAID without going offline, i.e. without switching off the volume, doing the brick manipulations, and switching it back on.
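For what it's worth, the swap Erekle describes is normally done online with replace-brick; a minimal sketch, with hypothetical volume, host, and brick paths:

  # retire the dead brick and bring in the new, empty one in a single step;
  # the self-heal daemon then rebuilds the data onto the new brick
  gluster volume replace-brick vmstore \
      server1:/gluster/brick-old/vmstore \
      server1:/gluster/brick-new/vmstore \
      commit force
  gluster volume heal vmstore info        # watch the heal progress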
Cheers
Erekle
On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:
For any RAID 5 or 6 configuration I normally follow a simple golden rule which has given good results so far:
- up to 4 disks: RAID 5
- 5 or more disks: RAID 6
However, I didn't really understand the recommendation to use any RAID with GlusterFS. I always thought that GlusterFS likes to work in JBOD mode and control the disks (bricks) directly, so you can create whatever distribution rule you wish, and if a single disk fails you just replace it and the data is obviously replicated from another brick. The only downside of using it this way is that the replication traffic will flow across all servers, but that is not much of an issue.
Can anyone elaborate on using RAID + GlusterFS versus JBOD + GlusterFS?
Thanks
Regards
Fernando
On 07/08/2017 03:46, Devin Acosta wrote:
Moacir,
I have recently installed multiple Red Hat Virtualization hosts for several different companies, and have dealt with the Red Hat Support Team in depth about optimal configuration in regards to setting up GlusterFS most efficiently and I wanted to share with you what I learned.
In general the Red Hat Virtualization team frowns upon using each disk of the system as just a JBOD. Sure, there is some protection by having the data replicated; however, the recommendation is to use RAID 6 (preferred), RAID 5, or at the very least RAID 1.
Here is the direct quote from Red Hat when I asked about RAID and Bricks:
*"A typical Gluster configuration would use RAID underneath the bricks. RAID 6 is most typical as it gives you 2 disk failure protection, but RAID 5 could be used too. Once you have the RAIDed bricks, you'd then apply the desired replication on top of that. The most popular way of doing this would be distributed replicated with 2x replication. In general you'll get better performance with larger bricks. 12 drives is often a sweet spot. Another option would be to create a separate tier using all SSD’s.” *
*In order to do SSD tiering, from my understanding you would need 1 x NVMe drive in each server, or 4 x SSDs for the hot tier (the hot tier needs to be distributed-replicated if not using NVMe). So with you only having 1 SSD drive in each server, I'd suggest maybe looking into the NVMe option.*
*Since you're using only 3 servers, what I'd probably suggest is to do 2 replicas + arbiter node; this setup actually doesn't require the 3rd server to have big drives at all, as it only stores metadata about the files and not a full copy.*
*Please see the attached document that was given to me by Red Hat for more information on this. Hope this information helps you.*
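The "2 replicas + arbiter" layout maps to Gluster's arbiter volume type. A minimal sketch with hypothetical host and brick names; the arbiter brick on server3 holds only file metadata, so a small SSD is enough:

  gluster volume create vmstore replica 3 arbiter 1 \
      server1:/gluster/bricks/vmstore \
      server2:/gluster/bricks/vmstore \
      server3:/gluster/arbiter/vmstore
  gluster volume start vmstore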
--
Devin Acosta, RHCA, RHVCA
Red Hat Certified Architect
On August 6, 2017 at 7:29:29 PM, Moacir Ferreira ( moacirferreira@hotmail.com) wrote:
I am willing to assemble a oVirt "pod", made of 3 servers, each with 2 CPU sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a loop like a server triangle using the 40Gb NICs for virtualization files (VMs .qcow2) access and to move VMs around the pod (east /west traffic) while using the 10Gb interfaces for giving services to the outside world (north/south traffic).
This said, my first question is: How should I deploy GlusterFS in such oVirt scenario? My questions are:
1 - Should I create 3 RAID (i.e.: RAID 5), one on each oVirt node, and then create a GlusterFS using them?
2 - Instead, should I create a JBOD array made of all server's disks?
3 - What is the best Gluster configuration to provide for HA while not consuming too much disk space?
4 - Does a oVirt hypervisor pod like I am planning to build, and the virtualization environment, benefits from tiering when using a SSD disk? And yes, will Gluster do it by default or I have to configure it to do so?
At the bottom line, what is the good practice for using GlusterFS in small pods for enterprises?
You opinion/feedback will be really appreciated!
Moacir

On 7 August 2017 at 17:41, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
Yet another downside of having a RAID (especially RAID 5 or 6) is that it reduces considerably the write speeds, as each group of disks will end up having the write speed of a single disk because all other disks of that group have to wait for each other to write as well.
That's not true if you have a medium- to high-range hardware RAID. For example, HP Smart Array controllers come with a flash cache of about 1 or 2 GB that hides that from the OS.
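One way to check how much the controller cache actually masks the parity-write penalty is a quick synthetic write test on the RAID-backed filesystem. A rough sketch with fio (mount point and sizes are placeholders; compare the result with and without the controller write cache enabled):

  fio --name=raid-write-test --directory=/bricks/raid6-0 --size=4g \
      --rw=randwrite --bs=64k --direct=1 --iodepth=16 --numjobs=1 \
      --runtime=60 --time_based --group_reporting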

What you mentioned is a specific case and not the general situation. The main point is that RAID 5 or 6 impacts write performance compared with writing to only two given disks at a time. That was the comparison being made.
Fernando
On 07/08/2017 16:49, Fabrice Bacchella wrote:
On 7 August 2017 at 17:41, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
Yet another downside of having a RAID (especially RAID 5 or 6) is that it reduces considerably the write speeds, as each group of disks will end up having the write speed of a single disk because all other disks of that group have to wait for each other to write as well.
That's not true if you have a medium- to high-range hardware RAID. For example, HP Smart Array controllers come with a flash cache of about 1 or 2 GB that hides that from the OS.

Hi Fernando,

So let's go through the following scenarios:

1. Let's say you have two servers (replication factor 2), i.e. two bricks per volume. In this case it is strongly recommended to have an arbiter node, the metadata storage that guarantees avoiding the split-brain situation. For the arbiter you don't even need a disk with lots of space; a tiny SSD is enough, but it should be hosted on a separate server. The advantage of such a setup is that you don't need RAID 1 for each brick: the metadata is stored on the arbiter node and brick replacement is easy.

2. If you have an odd number of bricks (let's say 3, i.e. replication factor 3) in your volume and you created neither the arbiter node nor configured the quorum, then the entire load for keeping the volume consistent resides on all 3 servers. Each of them is important and each brick contains key information, and they need to cross-check each other (that's what people usually do on their first try of Gluster :) ). In this case replacing a brick is a big pain, and RAID 1 is a good option to have (that's the disadvantage, i.e. losing the space and not having the JBOD option); the advantage is that you don't have to have an additional arbiter node.

3. You have an odd number of bricks and a configured arbiter node. In this case you can easily go with JBOD; however, a good practice would be to have RAID 1 for the arbiter disks (tiny 128GB SSDs are perfectly sufficient for volumes tens of TBs in size).

That's basically it. The rest about reliability and setup scenarios you can find in the Gluster documentation; especially look for the quorum and arbiter node configs and options.

Cheers

Erekle

P.S. What I was mentioning regarding good practice is mostly related to the operation of Gluster, not installation or deployment, i.e. not the conceptual understanding of Gluster (conceptually it's a JBOD system).
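As a pointer to the quorum settings Erekle refers to, a minimal sketch of how they are typically set (the volume name is hypothetical):

  # client-side quorum: allow writes only while a majority of each replica set is up
  gluster volume set vmstore cluster.quorum-type auto
  # server-side quorum: stop bricks on a node that loses contact with the majority of peers
  gluster volume set vmstore cluster.server-quorum-type server
  gluster volume set all cluster.server-quorum-ratio 51%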
--
Recogizer Group GmbH
Dr.rer.nat. Erekle Magradze
Lead Big Data Engineering & DevOps
Rheinwerkallee 2, 53227 Bonn
Tel: +49 228 29974555
E-Mail: erekle.magradze@recogizer.de
Web: www.recogizer.com


Thanks for the detailed answer, Erekle.

I conclude that in any scenario it is worth having an arbiter node, in order to avoid wasting even more disk space on RAID X plus Gluster replication on top of it. The cost seems much lower if you compare the running costs of the whole storage with the cost of building the arbiter node. Even a fully redundant arbiter service with 2 nodes would be worthwhile on a larger deployment.

Regards
Fernando

On 07/08/2017 17:07, Erekle Magradze wrote:
Hi Fernando (sorry for misspelling your name, I used a different keyboard),
So let's go with the following scenarios:
1. Let's say you have two servers (replication factor 2), i.e. two bricks per volume. In this case it is strongly recommended to have an arbiter node: a metadata store that guarantees you avoid split-brain situations. For the arbiter you don't even need a disk with lots of space; a tiny SSD is enough, but it has to be hosted on a separate server. The advantage of such a setup is that you don't need RAID 1 for each brick: the metadata is stored on the arbiter node and brick replacement is easy.
2. If you have an odd number of bricks (let's say 3, i.e. replication factor 3) in your volume and you created neither an arbiter node nor configured quorum, then the entire load of keeping the volume consistent rests on all 3 servers. Each of them is important and each brick contains key information, and they need to cross-check each other (that's what people usually do on their first try of Gluster :) ). In this case replacing a brick is a big pain, so RAID 1 per brick is a good option (that's the disadvantage, i.e. losing the space and not having the JBOD option); the advantage is that you don't have to have an additional arbiter node.
3. You have an odd number of bricks and a configured arbiter node. In this case you can easily go with JBOD; however, a good practice would be to have RAID 1 for the arbiter disks (tiny 128GB SSDs are perfectly sufficient for volumes tens of TBs in size).
That's basically it.

The rest about reliability and setup scenarios you can find in the Gluster documentation; in particular, look for the quorum and arbiter node configs and options.
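For illustration, a minimal sketch of scenario 3 with the gluster CLI could look like the following (the volume name, hostnames and brick paths are placeholders, not taken from this thread, and defaults vary between Gluster releases, so treat it as a starting point rather than a recipe):

  # Replica 3 volume whose third brick in each set is a small arbiter
  # (it stores only file names and metadata, not file data):
  gluster volume create vmstore replica 3 arbiter 1 \
      server1:/bricks/data/vmstore \
      server2:/bricks/data/vmstore \
      server3:/bricks/arbiter/vmstore
  gluster volume start vmstore

  # The quorum options referred to above:
  gluster volume set vmstore cluster.quorum-type auto
  gluster volume set vmstore cluster.server-quorum-type server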
Cheers
Erekle
P.S. What I was mentioning regarding good practice is mostly related to the operation of Gluster, not installation or deployment, i.e. not the conceptual understanding of Gluster (conceptually it's a JBOD system).
On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:
Thanks for the clarification Erekle.
However, I am surprised by this way of operating GlusterFS, as it adds another layer of complexity to the system (either a hardware or a software RAID) underneath the Gluster configuration and increases the system's overall cost.
An important point to consider: in a RAID configuration you already have space 'wasted' in order to build redundancy (whether RAID 1, 5, or 6). When you then put GlusterFS on top of several RAIDs, the data gets replicated again, so the same data ends up consuming space once within each group of disks and once more across the RAIDs, depending on the Gluster configuration you use (with RAID 1 underneath and 2x Gluster replication the same data is stored 4 times).
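To put rough numbers on that overhead (a back-of-the-envelope comparison assuming the 7 data HDDs per server from the original post, and ignoring the arbiter's tiny footprint):
- RAID 1 bricks + 2x Gluster replication: every byte is stored 4 times, so about 25% of the raw capacity is usable.
- RAID 6 over 7 disks (5 data + 2 parity) + 2x replication: 5/7 divided by 2, roughly 36% usable.
- JBOD bricks + replica 3: about 33% usable.
- JBOD bricks + replica 2 + arbiter: about 50% usable.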
Yet another downside of RAID (especially RAID 5 or 6) is that it considerably reduces write speeds, since each group of disks ends up with roughly the write speed of a single disk, as all the disks in the group have to wait for each other to complete a write.
Therefore, if Gluster already replicates data, why does it create the big pain you mentioned when a brick dies, given that the data is replicated somewhere else and can still be retrieved both to serve clients and to rebuild the equivalent disk when it is replaced?
Fernando
On 07/08/2017 10:26, Erekle Magradze wrote:
Hi Frenando,
Here is my experience: if you use a single hard drive as a brick for a Gluster volume and it dies, i.e. becomes inaccessible, it's a huge hassle to discard that brick and exchange it for another one, since Gluster still tries to access the broken brick and that causes (at least it caused for me) a big pain. It's therefore better to have a RAID as the brick, i.e. RAID 1 (mirroring) for each brick. In that case, if a disk goes down you can easily exchange it and rebuild the RAID without going offline, i.e. without switching off the volume, doing brick manipulations and switching it back on.
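For reference, the brick replacement being described is normally done along these lines (volume, host and path names are placeholders); the subsequent heal copies the data back from the surviving replicas:

  # Swap the dead brick for a new, empty one and trigger a full heal:
  gluster volume replace-brick vmstore \
      server2:/bricks/data/vmstore \
      server2:/bricks/data-new/vmstore \
      commit force
  gluster volume heal vmstore full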
Cheers
Erekle
On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:
For any RAID 5 or 6 configuration I normally follow a simple golden rule which has given good results so far:
- up to 4 disks: RAID 5
- 5 or more disks: RAID 6
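Purely as an illustration of that rule applied to the 7-HDD servers discussed here (device names are invented; a hardware RAID controller would replace the mdadm step, and Red Hat's Gluster guides additionally put LVM thin pools underneath the filesystem for snapshot support):

  # Software RAID 6 across the seven 10K HDDs:
  mdadm --create /dev/md0 --level=6 --raid-devices=7 /dev/sd[b-h]

  # XFS is the usual choice for Gluster bricks; -i size=512 leaves room
  # for Gluster's extended attributes in the inode:
  mkfs.xfs -i size=512 /dev/md0
  mkdir -p /bricks/data
  mount /dev/md0 /bricks/data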
However, I didn't really understand the recommendation to use any RAID with GlusterFS. I always thought that GlusterFS prefers to work in JBOD mode and control the disks (bricks) directly, so you can create whatever distribution rule you wish, and if a single disk fails you just replace it and the data is obviously re-replicated from another brick. The only downside of working this way is that the replication traffic flows across all servers, but that is not much of an issue.
Can anyone elaborate on RAID + GlusterFS versus JBOD + GlusterFS?
Thanks Regards Fernando
On 07/08/2017 03:46, Devin Acosta wrote:
Moacir,
I have recently installed multiple Red Hat Virtualization hosts for several different companies, and have dealt with the Red Hat Support Team in depth about the optimal configuration for setting up GlusterFS most efficiently, so I wanted to share with you what I learned.
In general the Red Hat Virtualization team frowns upon using each disk of the system as just a JBOD. Sure, there is some protection from having the data replicated; however, the recommendation is to use RAID 6 (preferred) or RAID 5, or RAID 1 at the very least.
Here is the direct quote from Red Hat when I asked about RAID and bricks:

"A typical Gluster configuration would use RAID underneath the bricks. RAID 6 is most typical as it gives you 2 disk failure protection, but RAID 5 could be used too. Once you have the RAIDed bricks, you'd then apply the desired replication on top of that. The most popular way of doing this would be distributed replicated with 2x replication. In general you'll get better performance with larger bricks. 12 drives is often a sweet spot. Another option would be to create a separate tier using all SSD's."
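In CLI terms, the "distributed replicated with 2x replication" layout from that quote would be built roughly like this (two RAID-backed bricks per host, all names invented for illustration):

  # Bricks are grouped into replica pairs in the order they are listed,
  # giving two replica sets distributed across the hosts:
  gluster volume create vmstore replica 2 \
      server1:/bricks/raid1/vmstore server2:/bricks/raid1/vmstore \
      server1:/bricks/raid2/vmstore server2:/bricks/raid2/vmstore
  # Note: plain replica 2 is prone to split-brain; the arbiter variant
  # discussed elsewhere in this thread avoids that.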
In order to do SSD tiering, from my understanding you would need 1 x NVMe drive in each server, or a 4 x SSD hot tier (the hot tier needs to be distributed-replicated if not using NVMe). So with you only having 1 SSD drive in each server, I'd suggest maybe looking into the NVMe option.

Since you're using only 3 servers, what I'd probably suggest is to do 2 Replicas + Arbiter Node; this setup actually doesn't require the 3rd server to have big drives at all, as it only stores metadata about the files and not a full copy.

Please see the attached document that was given to me by Red Hat for more information on this. Hope this information helps you.
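For completeness, in the Gluster releases current at the time (the tiering feature was later deprecated) an SSD hot tier was attached with something like the following; brick names are placeholders, and dm-cache/lvmcache on the brick device is a commonly used alternative to Gluster-level tiering:

  # Attach a replicated hot tier built from the SSD bricks:
  gluster volume tier vmstore attach replica 2 \
      server1:/bricks/ssd/vmstore server2:/bricks/ssd/vmstore
  # Detach again with: gluster volume tier vmstore detach start|commit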
--
Devin Acosta, RHCA, RHVCA
Red Hat Certified Architect
On August 6, 2017 at 7:29:29 PM, Moacir Ferreira (moacirferreira@hotmail.com <mailto:moacirferreira@hotmail.com>) wrote:
I am willing to assemble a oVirt "pod", made of 3 servers, each with 2 CPU sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a loop like a server triangle using the 40Gb NICs for virtualization files (VMs .qcow2) access and to move VMs around the pod (east /west traffic) while using the 10Gb interfaces for giving services to the outside world (north/south traffic).
This said, my first question is: How should I deploy GlusterFS in such oVirt scenario? My questions are:
1 - Should I create 3 RAID (i.e.: RAID 5), one on each oVirt node, and then create a GlusterFS using them?
2 - Instead, should I create a JBOD array made of all server's disks?
3 - What is the best Gluster configuration to provide for HA while not consuming too much disk space?
4 - Does a oVirt hypervisor pod like I am planning to build, and the virtualization environment, benefits from tiering when using a SSD disk? And yes, will Gluster do it by default or I have to configure it to do so?
At the bottom line, what is the good practice for using GlusterFS in small pods for enterprises?
You opinion/feedback will be really appreciated!
Moacir
-- Recogizer Group GmbH
Dr.rer.nat. Erekle Magradze Lead Big Data Engineering & DevOps Rheinwerkallee 2, 53227 Bonn Tel: +49 228 29974555
E-Mail: erekle.magradze@recogizer.de Web: www.recogizer.com
Recogizer on LinkedIn: https://www.linkedin.com/company-beta/10039182/ Follow us on Twitter: https://twitter.com/recogizer

Hi Fernando,

Indeed, having an arbiter node is always a good idea, and it saves a lot on costs.

Good luck with your setup.

Cheers
Erekle

On 07.08.2017 23:03, FERNANDO FREDIANI wrote:
Thanks for the detailed answer, Erekle.

I conclude that in any scenario it is worth having an arbiter node, in order to avoid wasting even more disk space on RAID X plus Gluster replication on top of it. The cost seems much lower if you compare the running costs of the whole storage with the cost of building the arbiter node. Even a fully redundant arbiter service with 2 nodes would be worthwhile on a larger deployment.
Regards Fernando

On Tue, Aug 8, 2017 at 12:03 AM, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
Thanks for the detailed answer, Erekle.

I conclude that in any scenario it is worth having an arbiter node, in order to avoid wasting even more disk space on RAID X plus Gluster replication on top of it. The cost seems much lower if you compare the running costs of the whole storage with the cost of building the arbiter node. Even a fully redundant arbiter service with 2 nodes would be worthwhile on a larger deployment.
Note that although you get the same consistency as a replica 3 setup, a 2+arbiter setup only gives you the data availability of a replica 2 setup. That may or may not be OK for your high availability requirements. Y.
Regards Fernando On 07/08/2017 17:07, Erekle Magradze wrote:
Hi Fernando (sorry for misspelling your name, I used a different keyboard),
So let's go with the following scenarios:
1. Let's say you have two servers (replication factor 2), i.e. two bricks per volume. In this case it is strongly recommended to have an arbiter node: the metadata store that guarantees you avoid split-brain situations. For the arbiter you don't even need a disk with lots of space; a tiny SSD is enough, but it has to be hosted on a separate server. The advantage of such a setup is that you don't need RAID 1 for each brick, the metadata is kept on the arbiter node, and brick replacement is easy.
2. If you have an odd number of bricks (let's say 3, i.e. replication factor 3) in your volume and you created neither an arbiter node nor configured the quorum, then the entire load for keeping the volume consistent rests on all 3 servers. Each of them is important, each brick contains key information, and they need to cross-check each other (that's what people usually end up with on their first try of gluster :) ). In this case replacing a brick is a big pain, and RAID 1 is a good option to have (that's the disadvantage, i.e. losing the space and not having the JBOD option); the advantage is that you don't have to have an additional arbiter node.
3. You have an odd number of bricks and a configured arbiter node. In this case you can easily go with JBOD; however, a good practice would be to have RAID 1 for the arbiter disks (tiny 128GB SSDs are perfectly sufficient for volumes tens of TBs in size).
That's basically it
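For illustration, a replica 2 + arbiter volume as in scenario 1 would be created with something along these lines (the volume name, hostnames and brick paths are just placeholders, not taken from this thread):

gluster volume create vmstore replica 3 arbiter 1 \
  host1:/gluster/brick1/vmstore \
  host2:/gluster/brick1/vmstore \
  host3:/gluster/arbiter/vmstore
gluster volume start vmstore

The first two bricks hold full copies of the data; the arbiter brick on the third server stores only file names and metadata, which is why a small SSD is enough for it.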
The rest about reliability and setup scenarios you can find in the gluster documentation; look especially for the quorum and arbiter node configuration options.
Cheers
Erekle P.S. What I was mentioning regarding good practice is mostly related to the operation of gluster, not to installation or deployment, i.e. not to the conceptual understanding of gluster (conceptually it's a JBOD system).
On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:
Thanks for the clarification Erekle.
However I am surprised by this way of operating from GlusterFS, as it adds another layer of complexity to the system (either a hardware or software RAID) underneath the gluster configuration and increases the system's overall cost.
An important point to consider is: in a RAID configuration you already have space 'wasted' in order to build redundancy (either RAID 1, 5, or 6). When you then put GlusterFS on top of several RAIDs, the data is replicated once more, so the same data ends up consuming space both inside each group of disks and again across the RAIDs, depending on the Gluster configuration you have (in a RAID 1 config the same data is stored 4 times).
Yet another downside of having a RAID (especially RAID 5 or 6) is that it considerably reduces write speeds, as each group of disks ends up with the write speed of a single disk, because all the other disks of that group have to wait for each other to write as well.
Therefore, if Gluster already replicates data, why does a failed disk create the big pain you mentioned, given that the data is replicated somewhere else and can still be retrieved both to serve clients and to rebuild the equivalent disk when it is replaced?
Fernando
On 07/08/2017 10:26, Erekle Magradze wrote:
Hi Frenando,
Here is my experience: if you use a particular hard drive as a brick for a gluster volume and it dies, i.e. it becomes inaccessible, it's a huge hassle to discard that brick and exchange it for another one, since gluster still tries to access the broken brick and that causes (at least it caused for me) a big pain. Therefore it's better to have a RAID as the brick, i.e. RAID 1 (mirroring) for each brick; in that case if a disk is down you can easily exchange it and rebuild the RAID without going offline, i.e. without switching off the volume, doing brick manipulations, and switching it back on.
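For reference, swapping a dead brick for a new one in a replicated volume is normally done with replace-brick; a rough sketch (volume name and brick paths are placeholders):

gluster volume replace-brick vmstore \
  host1:/bricks/failed-disk/vmstore \
  host1:/bricks/new-disk/vmstore \
  commit force
gluster volume heal vmstore info

After the replace-brick, self-heal copies the data from the surviving replicas onto the new brick, and the heal info command shows what is still pending.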
Cheers
Erekle
On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:
For any RAID 5 or 6 configuration I normally follow a simple golden rule which has given good results so far:
- up to 4 disks: RAID 5
- 5 or more disks: RAID 6
However I didn't really understand the recommendation to use any RAID with GlusterFS. I always thought that GlusterFS likes to work in JBOD mode and control the disks (bricks) directly, so you can create whatever distribution rule you wish, and if a single disk fails you just replace it and its data is obviously replicated from another one. The only downside of using it this way is that the replication traffic will flow across all servers, but that is not much of an issue.
Can anyone elaborate on using RAID + GlusterFS versus JBOD + GlusterFS?
Thanks Regards Fernando
On 07/08/2017 03:46, Devin Acosta wrote:
Moacir,
I have recently installed multiple Red Hat Virtualization hosts for several different companies, and have dealt with the Red Hat Support Team in depth about optimal configuration in regards to setting up GlusterFS most efficiently and I wanted to share with you what I learned.
In general the Red Hat Virtualization team frowns upon using each DISK of the system as just a JBOD. Sure, there is some protection by having the data replicated; however, the recommendation is to use RAID 6 (preferred) or RAID 5, or RAID 1 at the very least.
Here is the direct quote from Red Hat when I asked about RAID and Bricks:
*"A typical Gluster configuration would use RAID underneath the bricks. RAID 6 is most typical as it gives you 2 disk failure protection, but RAID 5 could be used too. Once you have the RAIDed bricks, you'd then apply the desired replication on top of that. The most popular way of doing this would be distributed replicated with 2x replication. In general you'll get better performance with larger bricks. 12 drives is often a sweet spot. Another option would be to create a separate tier using all SSD’s.” *
In order to do SSD tiering, from my understanding you would need 1 x NVMe drive in each server, or 4 x SSD for the hot tier (it needs to be distributed, replicated for the hot tier if not using NVMe). So with you only having 1 SSD drive in each server, I'd suggest maybe looking into the NVMe option.
Since you're using only 3 servers, what I'd probably suggest is to do (2 Replicas + Arbiter Node); this setup actually doesn't require the 3rd server to have big drives at all, as it only stores metadata about the files and not actually a full copy.
Please see the attached document that was given to me by Red Hat to get more information on this. Hope this information helps you.
--
Devin Acosta, RHCA, RHVCA Red Hat Certified Architect
-- Recogizer Group GmbH
Dr.rer.nat. Erekle Magradze Lead Big Data Engineering & DevOps Rheinwerkallee 2, 53227 Bonn Tel: +49 228 29974555 <+49%20228%2029974555>
E-Mail erekle.magradze@recogizer.de Web: www.recogizer.com
Recogizer auf LinkedIn https://www.linkedin.com/company-beta/10039182/ Folgen Sie uns auf Twitter https://twitter.com/recogizer
----------------------------------------------------------------- Recogizer Group GmbH Managing directors: Oliver Habisch, Carsten Kreutze Commercial register: Amtsgericht Bonn HRB 20724 Registered office: Bonn; VAT ID no.: DE294195993
This e-mail contains confidential and/or legally protected information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail. Unauthorized copying and unauthorized disclosure of this e-mail and of the information it contains are not permitted.

Hi Devin,
Please consider that for the OS I have a RAID 1. Now, let's say I use RAID 5 to assemble a single disk on each server. In this case, the SSD will not make any difference, right? I guess that for the SSD to be usable it should not be part of the RAID 5. In that case I could create a logical volume made of the RAIDed brick and then extend it using the SSD, i.e., using gdeploy:
[disktype]
jbod
....
[pv1]
action=create
devices=sdb,sdc
wipefs=yes
ignore_vg_errors=no
[vg1]
action=create
vgname=gluster_vg_jbod
pvname=sdb
ignore_vg_errors=no
[vg2]
action=extend
vgname=gluster_vg_jbod
pvname=sdc
ignore_vg_errors=no
But will Gluster be able to auto-detect and use this SSD brick for tiering? Do I have to do some other configuration? Also, as the VM files (.qcow2) are quite big, will I benefit from tiering at all? Or is this wrong and my approach should be a different one?
Thanks,
Moacir
You attach the SSD as a hot tier with a gluster command. I don't think that gdeploy or the oVirt GUI can do it.
The gluster docs and Red Hat docs explain tiering quite well.
/Johan
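For the record, attaching and later detaching the hot tier looks roughly like this (the volume name and brick paths are placeholders, and the exact syntax depends on the gluster version in use):

gluster volume tier vmstore attach replica 3 \
  host1:/gluster/ssd/vmstore \
  host2:/gluster/ssd/vmstore \
  host3:/gluster/ssd/vmstore
gluster volume tier vmstore status
gluster volume tier vmstore detach start
gluster volume tier vmstore detach commit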


Thanks Johan, you brought "light" into my darkness! I went looking for the GlusterFS tiering how-to and it looks quite simple to attach a SSD as a hot tier. For those willing to read about it, go here: http://blog.gluster.org/2016/03/automated-tiering-in-gluster/
Now, I still have a question: VMs are made of very large .qcow2 files. My understanding is that files in Gluster are kept all together on a single brick. If so, I will not benefit from tiering, as a single SSD will not be big enough to fit all my large VM .qcow2 files. This would not be true if Gluster could store "blocks" of data that compose a large file spread over several bricks. But if I am not wrong, this is one of the key differences between GlusterFS and Ceph. Can you comment?
Moacir
________________________________ From: Johan Bernhardsson <johan@kafit.se> Sent: Tuesday, August 8, 2017 7:03 AM To: Moacir Ferreira; Devin Acosta; users@ovirt.org Subject: Re: [ovirt-users] Good practices
You attach the ssd as a hot tier with a gluster command. I don't think =
gdeploy or ovirt gui can do it.
The gluster docs and redhat docs explains tiering quite good.
/Johan
On August 8, 2017 07:06:42 Moacir Ferreira <moacirferreira@hotmail.com>= wrote:
Hi Devin,
Please consider that for the OS I have a RAID 1. Now, lets say I use RA= ID 5=20 to assemble a single disk on each server. In this case, the SSD will no= t=20 make any difference, right? I guess that to be possible to use it, the = SSD=20 should not be part of the RAID 5. In this case I could create a logical= =20 volume made of the RAIDed brick and then extend it using the SSD. I.e.:= =20 Using gdeploy:
[disktype]
jbod
....
[pv1]
action=3Dcreate
devices=3Dsdb, sdc
wipefs=3Dyes
ignore_vg_erros=3Dno
[vg1]
action=3Dcreate
vgname=3Dgluster_vg_jbod
pvname=3Dsdb
ignore_vg_erros=3Dno
[vg2]
action=3Dextend
vgname=3Dgluster_vg_jbod
pvname=3Dsdc
ignore_vg_erros=3Dno
But will Gluster be able to auto-detect and use this SSD brick for tier= ing?=20 Do I have to do some other configurations? Also, as the VM files (.qcow= 2)=20 are quite big will I benefit from tiering? This is wrong and my approac= h=20 should be other?
Thanks,
Moacir
________________________________ From: Devin Acosta <devin@pabstatencio.com> Sent: Monday, August 7, 2017 7:46 AM To: Moacir Ferreira; users@ovirt.org Subject: Re: [ovirt-users] Good practices
Moacir,
I have recently installed multiple Red Hat Virtualization hosts for several different companies, and have dealt with the Red Hat Support Team in depth about the optimal configuration for setting up GlusterFS efficiently, and I wanted to share with you what I learned.
In general, the Red Hat Virtualization team frowns upon using each disk of the system as just a JBOD. Sure, there is some protection by having the data replicated; however, the recommendation is to use RAID 6 (preferred), RAID 5, or RAID 1 at the very least.
Here is the direct quote from Red Hat when I asked about RAID and bricks:

"A typical Gluster configuration would use RAID underneath the bricks. RAID 6 is most typical as it gives you 2 disk failure protection, but RAID 5 could be used too. Once you have the RAIDed bricks, you'd then apply the desired replication on top of that. The most popular way of doing this would be distributed replicated with 2x replication. In general you'll get better performance with larger bricks. 12 drives is often a sweet spot. Another option would be to create a separate tier using all SSD's."
In order to do SSD tiering, from my understanding you would need 1 x NVMe drive in each server, or 4 x SSDs for the hot tier (it needs to be distributed-replicated for the hot tier if not using NVMe). So with you only having 1 SSD drive in each server, I'd suggest maybe looking into the NVMe option.
Since you're using only 3 servers, what I'd probably suggest is to do 2 replicas + an arbiter node. This setup actually doesn't require the 3rd server to have big drives at all, as it only stores metadata about the files and not a full copy.
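
As an illustration of that 2 replicas + arbiter layout on three servers (the volume name and brick paths are invented; the arbiter brick on server3 can sit on a small disk since it only stores metadata):

# server1 and server2 hold full copies, server3 holds only metadata
gluster volume create vmstore replica 3 arbiter 1 \
    server1:/gluster/brick1/vmstore \
    server2:/gluster/brick1/vmstore \
    server3:/gluster/arbiter1/vmstore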
Please see the attached document that was given to me by Red Hat to get more information on this. Hope this information helps you.
--
Devin Acosta, RHCA, RHVCA
Red Hat Certified Architect
________________________________
From: Johan Bernhardsson <johan@kafit.se>
Sent: Tuesday, August 8, 2017 11:24 AM
To: Moacir Ferreira; Devin Acosta; users@ovirt.org
Subject: Re: [ovirt-users] Good practices

On oVirt, Gluster uses sharding, so all large files are broken up into small pieces on the Gluster bricks.

/Johan
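
If you want to check or tune that on a volume, sharding is controlled by volume options. A sketch, with a made-up volume name; 512MB is just a shard size commonly used for VM images rather than the Gluster default, and sharding settings should not be changed on a volume that already holds data:

# Is sharding enabled on the volume?
gluster volume get vmstore features.shard

# Enable it and pick a larger shard size for big .qcow2 files
gluster volume set vmstore features.shard on
gluster volume set vmstore features.shard-block-size 512MB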
Thanks once again, Johan!

What would be your approach: JBOD straight, or JBOD made of RAIDed bricks?

Moacir
participants (10)

- Colin Coe
- Devin Acosta
- Erekle Magradze
- Fabrice Bacchella
- FERNANDO FREDIANI
- Johan Bernhardsson
- Karli Sjöberg
- Moacir Ferreira
- Pavel Gashev
- Yaniv Kaul