Just wanted to comment on this again. Today I rebuilt my oVirt environment
as I wanted to change the disk/volume layout one final time before making use
of the cluster. I downloaded the most recent oVirt Node image linked from
the oVirt site and used Cockpit to deploy. Again, the Gluster config
defaults were not set to optimize for virt. My Gluster performance was as
bad as the first time around: 5Mb/sec on dd tests. After optimizing the volumes
for virt store it increased by 10x. If these settings are supposed to be
applied by default, it does not appear to be working as intended.
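One way to check whether the virt settings actually landed after a Cockpit deploy is to compare the volume's current options with the Gluster group file that, as far as I know, "optimize for virt" draws on. A minimal sketch, assuming the group file lives at /var/lib/glusterd/groups/virt and that "gluster volume get VOL all" prints a two-column Option/Value table (confirm both on your own version). check_group_applied is a hypothetical helper name and VOL is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a gluster group file (key=value per line)
# with the "Option  Value" table from `gluster volume get <VOL> all`.
# Prints one MISMATCH line per differing key; returns 1 if any differ.
check_group_applied() {
    local group_file=$1 current_file=$2
    awk '
        # First file (the group file): remember the wanted value per key.
        NR == FNR {
            if ($0 ~ /^[[:space:]]*(#|$)/) next   # skip comments and blank lines
            eq = index($0, "=")
            if (eq == 0) next
            want[substr($0, 1, eq - 1)] = substr($0, eq + 1)
            next
        }
        # Second file: skip the header rows, then record option -> value.
        $1 == "Option" || $1 ~ /^-+$/ { next }
        NF >= 2 { have[$1] = $2 }
        END {
            bad = 0
            for (k in want) {
                found = (k in have) ? have[k] : "<unset>"
                if (found != want[k]) {
                    printf "MISMATCH %s: expected %s, found %s\n", k, want[k], found
                    bad = 1
                }
            }
            exit bad
        }
    ' "$group_file" "$current_file"
}

# Possible usage on a gluster host (VOL is a placeholder volume name):
#   check_group_applied /var/lib/glusterd/groups/virt <(gluster volume get VOL all)
```

No output and a zero exit status means every key in the group file matches the live volume.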
- Jayme
On Mon, Aug 6, 2018 at 12:15 AM, Darrell Budic <budic(a)onholyground.com>
wrote:
The defaults are a queue depth of 1000 and 1 thread. Recommended settings
are going to depend on what kind of hardware you’re running it on, load,
and memory as much as or more than disk type/speed, from my experience.
I’d probably recommend a number of threads equal to half my total CPU core
count, leaving the other half for actually serving data. Unless
it’s hyper-converged; then I’d keep it to 1 or 2, since those CPUs would
also be serving VMs. For the queue depth, I don’t have any good ideas other
than using the default 1000. Especially if you don’t have a huge storage
system, it won’t make a big difference. One other thing I’m not sure of is
whether that’s threads per SHD; if it is, you get one per volume and might
want to limit it even more. My reasoning is that if Gluster can max your CPU,
it’s got high enough settings for those two vars :)
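The rule of thumb above can be written down as a small helper. This is a sketch of one person's heuristic, not an official formula; suggest_shd_max_threads is a hypothetical name, and starting hyper-converged hosts at 1 (rather than 2) is an assumption within the suggested range.

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding the heuristic above for cluster.shd-max-threads:
#   dedicated storage server -> about half the CPU cores (at least 1)
#   hyper-converged host     -> 1 (raise to 2 only if heals are too slow)
# Arguments: <core count> <hyperconverged: yes|no>
suggest_shd_max_threads() {
    local cores=$1 hyperconverged=$2 n
    if [ "$hyperconverged" = "yes" ]; then
        n=1
    else
        n=$(( cores / 2 ))
        if [ "$n" -lt 1 ]; then n=1; fi
    fi
    echo "$n"
}

# Example on the local machine (nproc counts logical CPUs, including
# hyperthreads; pass the physical core count if you prefer):
#   suggest_shd_max_threads "$(nproc)" yes
```

The queue length is left alone here, since the suggestion above is simply to keep the default of 1000.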
And these numbers are relative. I was testing with 8/10000 after a post in
gluster-users suggested it helped speed up healing time, and I found it
took my systems about 4 or 5 hours to heal fully after rebooting a server.
BUT they also starved my VMs for IOPS and eventually corrupted some disks
due to I/O timeouts and failures. VMs would pause all the time as well. I
have about 60 VMs on the main cluster of 3 standalone servers, several
volumes. Ugh. With 1/1000 it takes about 6 hours to fully heal after a
reboot, but no VM thrashing on disk and nothing’s been corrupted since. Note
that it’s actually healing all that time, but at least one node will be
maxing its CPU (even with 1/1000) comparing files and making sure they are
synced with the other servers.
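To watch a heal like this drain after a reboot, one option is to total the per-brick counts from "gluster volume heal VOL info". A minimal sketch, assuming each brick's section contains a "Number of entries: N" line (the format I have seen; check yours). pending_heal_entries is a hypothetical name and VOL a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the per-brick "Number of entries: N" lines that
# `gluster volume heal <VOL> info` prints. Reads that output on stdin;
# lines whose count is not a number are ignored.
pending_heal_entries() {
    awk -F': *' '/^Number of entries:/ && $2 ~ /^[0-9]+/ { sum += $2 }
                 END { print sum + 0 }'
}

# Possible usage (VOL is a placeholder), polling once a minute:
#   while :; do
#       printf '%s %s\n' "$(date +%T)" "$(gluster volume heal VOL info | pending_heal_entries)"
#       sleep 60
#   done
```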
oVirt devs, if you’ve made 8/10000 the default in the optimized settings or
the Cockpit setup, I think you’re doing most folks a disservice unless they
have massive servers.
------------------------------
*From:* Jayme <jaymef(a)gmail.com>
*Subject:* Re: [ovirt-users] Tuning and testing GlusterFS performance
*Date:* August 5, 2018 at 2:21:00 PM EDT
*To:* William Dossett
*Cc:* Darrell Budic; users
I can't imagine too many problems with such a minor update. I've been doing
updates on oVirt for a while (non-Gluster) and haven't had too many
problems.
On Sun, Aug 5, 2018, 2:49 PM William Dossett, <william.dossett(a)gmail.com>
wrote:
> Ah. OK… mine are the H710s and yes, I had to do virtual drives at RAID 0.
> I’ve got my first templates up and running now anyway, getting ready to
> demo this to mgmt. late this week or early next. Hoping to get some budget
> for flash drives after that.
>
> They got quotes in for renewing our VMware licensing last week… ½ a
> million! So I have a fairly interested audience 😊
>
> Pretty sure with some cash I can get the performance we need using
> flash. The other thing will be upgrades… going to see how the upgrade
> from 4.2.4 to 4.2.5 goes this week. Classically this is where open source
> has failed me in the past, but this is feeling much more like a finished
> product than it used to.
>
> Regards
>
> Bill
>
> *From:* Jayme <jaymef(a)gmail.com>
> *Sent:* Sunday, August 5, 2018 10:18 AM
> *To:* William Dossett <william.dossett(a)gmail.com>
> *Cc:* Darrell Budic <budic(a)onholyground.com>; users <users(a)ovirt.org>
> *Subject:* Re: [ovirt-users] Tuning and testing GlusterFS performance
>
> I'm using H310s, which are known to have crap queue depth. I'm using them
> because they are one of the only PERCs that allow you to do both RAID and
> passthrough JBOD instead of having to JBOD using individual RAID 0s. They
> should be fine, but could bottleneck during an intensive brick rebuild in
> addition to regular volume activity.
>
> On Sun, Aug 5, 2018, 1:06 PM William Dossett, <william.dossett(a)gmail.com>
> wrote:
>
> I think PERCs have a queue depth of 31, if that’s of any help… fairly common
> with that level of controller.
>
> *From:* Jayme <jaymef(a)gmail.com>
> *Sent:* Sunday, August 5, 2018 9:50 AM
> *To:* Darrell Budic <budic(a)onholyground.com>
> *Cc:* William Dossett <william.dossett(a)gmail.com>; users <users(a)ovirt.org>
> *Subject:* Re: [ovirt-users] Tuning and testing GlusterFS performance
>
> I would have to assume so, because I have not manually modified any
> Gluster volume settings after performing gdeploy via Cockpit. What would
> you recommend these values be set to, and does the fact that I am running
> SSDs make any difference in this regard? I've been a bit concerned about
> how a rebuild might affect performance, as the RAID controllers in these
> servers don't have a large queue depth.
>
> On Sun, Aug 5, 2018, 12:07 PM Darrell Budic, <budic(a)onholyground.com>
> wrote:
>
> It set these by default?
>
> cluster.shd-wait-qlength: 10000
>
> cluster.shd-max-threads: 8
>
> In my experience, these are WAY too high and will degrade performance to
> the point of causing problems on decently used volumes during a heal. If
> these are being set by the HCI installer, I’d recommend changing them.
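Since the options can be changed on a live volume, putting the self-heal daemon back to the defaults quoted in this thread (1 thread, queue length 1000) is two "gluster volume set" calls. A minimal sketch; reset_shd_tuning and the DRY_RUN switch are hypothetical conveniences, and VOL is a placeholder.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: set the self-heal daemon back to 1 thread and a
# queue length of 1000 (overridable via the 2nd and 3rd arguments).
reset_shd_tuning() {
    local vol=$1 threads=${2:-1} qlen=${3:-1000}
    run gluster volume set "$vol" cluster.shd-max-threads "$threads"
    run gluster volume set "$vol" cluster.shd-wait-qlength "$qlen"
}

# Possible usage (VOL is a placeholder); read the values back afterwards:
#   DRY_RUN=1 reset_shd_tuning VOL    # preview
#   reset_shd_tuning VOL
#   gluster volume get VOL cluster.shd-max-threads
#   gluster volume get VOL cluster.shd-wait-qlength
```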
>
> ------------------------------
>
> *From:* Jayme <jaymef(a)gmail.com>
> *Subject:* [ovirt-users] Re: Tuning and testing GlusterFS performance
> *Date:* August 4, 2018 at 10:31:30 AM EDT
> *To:* William Dossett
> *Cc:* users
>
> Yes, the volume options can be changed on the fly after creation, no
> problem. Good luck!
>
> On Sat, Aug 4, 2018, 11:23 AM William Dossett, <william.dossett(a)gmail.com>
> wrote:
>
> Hey, thanks! Good catch! Going to have to take a look at that; will be
> working on it this weekend. Hopefully we can do this post creation.
>
> Thanks again
>
> Bill
>
> *From:* Jayme <jaymef(a)gmail.com>
> *Sent:* Thursday, August 2, 2018 5:56 PM
> *To:* William Dossett <william.dossett(a)gmail.com>
> *Cc:* users <users(a)ovirt.org>
> *Subject:* Re: [ovirt-users] Tuning and testing GlusterFS performance
>
> Bill,
>
> I thought I'd let you (and others) know this, as it might save you some
> headaches. I found that my performance problem was resolved by clicking the
> "Optimize for Virt Store" option in the volume settings of the hosted
> engine (for the data volume). This one change alone increased my I/O
> performance by 10x. I don't know why this would not be set or
> recommended by default, but I'm glad I found it!
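For anyone who prefers the CLI, my understanding is that "Optimize for Virt Store" roughly corresponds to applying Gluster's "virt" option group and setting brick ownership to the vdsm/kvm IDs (36). Treat that mapping as an assumption; the UI may set other options too. It is also worth reading /var/lib/glusterd/groups/virt on your own hosts, since in some Gluster releases that group also sets cluster.shd-max-threads and cluster.shd-wait-qlength, the heal settings discussed elsewhere in this thread. optimize_for_virt is a hypothetical name and VOL a placeholder.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper approximating "Optimize for Virt Store" from the CLI.
optimize_for_virt() {
    local vol=$1
    run gluster volume set "$vol" group virt             # applies the options in the "virt" group file
    run gluster volume set "$vol" storage.owner-uid 36   # uid 36 = vdsm on oVirt hosts
    run gluster volume set "$vol" storage.owner-gid 36   # gid 36 = kvm on oVirt hosts
}

# Possible usage (VOL is a placeholder):
#   DRY_RUN=1 optimize_for_virt VOL   # preview the commands
#   optimize_for_virt VOL
```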
>
> - James
>
> On Thu, Aug 2, 2018 at 2:32 PM, William Dossett <
> william.dossett(a)gmail.com> wrote:
>
> Yeah, I am just ramping up here, but this project is mostly on my own
> time and money, hence no SSDs for Gluster… I’ve already blown close to $500
> of my own money on 10Gb Ethernet cards and SFPs on eBay, as my company
> frowns on us getting good deals for equipment on eBay and would rather go
> to their preferred supplier, where $500 wouldn’t even buy half a 10Gb CNA
> ☹. But I believe in this project and it feels like it is getting ready
> for showtime. If I can demo this in a few weeks and get some interest, I’ll
> be asking them to reimburse me, that’s for sure!
>
> Hopefully going to get some of the other work off my plate and work on
> this later this afternoon; will let you know any findings.
>
> Regards
>
> Bill
>
> *From:* Jayme <jaymef(a)gmail.com>
> *Sent:* Thursday, August 2, 2018 11:07 AM
> *To:* William Dossett <william.dossett(a)gmail.com>
> *Cc:* users <users(a)ovirt.org>
> *Subject:* Re: [ovirt-users] Tuning and testing GlusterFS performance
>
> Bill,
>
> Appreciate the feedback and would be interested to hear some of your
> results. I'm a bit worried about what I'm seeing so far on a very stock
> 3-node HCI setup: 8mb/sec on that dd test mentioned in the original post
> from within a VM (which may be explained by bad testing methods or some
> other configuration considerations). But what is more worrisome to me is
> that I tried another dd test to time creating a 32GB file. It was taking a
> long time, so I exited the process, and the VM basically locked up on me. I
> couldn't access it or the console and eventually had to do a hard shutdown
> of the VM to recover.
>
> I don't plan to host many VMs, probably around 15. They aren't super
> demanding servers, but some do read/write big directories, such as working
> with GitHub repos and large node_modules folders, rsyncs of fairly large
> dirs, etc. I'm definitely going to have to do a lot more testing before I
> can be assured enough to put any important VMs on this cluster.
>
> - James
>
> On Thu, Aug 2, 2018 at 1:54 PM, William Dossett <
> william.dossett(a)gmail.com> wrote:
>
> I usually look at IOPS using IOMeter… you usually want several workers
> running reads and writes in different threads at the same time. You can
> run Dynamo on a Linux instance and then connect it to a Windows GUI running
> IOMeter to give you stats. I was getting around 250 IOPS on JBOD SATA
> 7200 rpm drives, which isn’t bad for cheap and cheerful SATA drives.
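An IOPS figure only turns into MB/s once you fix a block size, which is why an IOMeter IOPS number and a dd MB/s number are hard to compare directly. A small sketch of the arithmetic (throughput = IOPS x block size); iops_to_mbps is a hypothetical name, and the 4 KiB block size in the examples is an assumption, since the IOMeter access spec isn't stated.

```shell
#!/usr/bin/env bash
# Throughput in MB/s (10^6 bytes) = IOPS x block size in bytes / 10^6.
# Arguments: <IOPS> <block size in KiB>
iops_to_mbps() {
    local iops=$1 block_kib=$2
    awk -v i="$iops" -v b="$block_kib" 'BEGIN { printf "%.1f\n", i * b * 1024 / 1000000 }'
}
```

For example, iops_to_mbps 250 4 gives 1.0 and iops_to_mbps 3000 4 gives 12.3, whereas the dd test in the original post writes 1 MiB blocks, so its MB/s figure reflects far fewer, much larger I/Os.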
>
> As I said, I’ve worked with HCI in VMware now for a couple of years,
> intensely this last year, when we had some defective Dell hardware and
> were trying to diagnose the problem. Since then the hardware has been
> completely replaced with an all-flash solution. When I got the all-flash
> solution I used IOMeter on it and was only getting around 3000 IOPS on
> enterprise flash disks… not exactly stellar, but OK for one VM. The trick
> there was the scale-out. There is a VMware Fling called HCIBench. It's very
> cool in that you spin up one VM and then it spawns 40 more VMs across the
> cluster. I could then use vSAN Observer, and it showed my hosts were
> actually doing 30K IOPS on average, which is absolutely stellar
> performance.
>
> Anyway, the moral of the story there was that your one VM may seem like it's
> quick, but not what you would expect from flash… but as you add more VMs
> to the cluster and they are all running workloads, it scales out beautifully
> and the read/write speed does not slow down as you add more load. I’m
> hoping that’s what we are going to see with Gluster.
>
> Also, you are using mb nomenclature below: is that Mb or MB? I am sort
> of assuming MB, megabytes per second… it does not seem very fast. I’m
> probably not going to get to work more on my cluster today as I’ve got
> other projects that I need to get done on time, but I want to try and get
> some templates up and running and do some more testing either tomorrow or
> this weekend and see what I get in just basic writing MB/s and let you know.
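The difference between the two is a factor of 8 (one byte is eight bits). GNU dd reports its rate in byte units such as "MB/s", so reading a dd result as megabits understates it eightfold. For scale, a 10GbE link (10,000 Mb/s) tops out around 1250 MB/s before protocol overhead. The function names below are hypothetical.

```shell
#!/usr/bin/env bash
# 1 byte = 8 bits, so MB/s x 8 = Mb/s (megabits per second) and vice versa.
mbytes_to_mbits() { awk -v x="$1" 'BEGIN { printf "%g\n", x * 8 }'; }
mbits_to_mbytes() { awk -v x="$1" 'BEGIN { printf "%g\n", x / 8 }'; }
```

For example, mbytes_to_mbits 8 gives 64, and mbits_to_mbytes 10000 gives 1250.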
>
> Regards
>
> Bill
>
> *From:* Jayme <jaymef(a)gmail.com>
> *Sent:* Thursday, August 2, 2018 8:12 AM
> *To:* users <users(a)ovirt.org>
> *Subject:* [ovirt-users] Tuning and testing GlusterFS performance
>
> So I've finally completed my first HCI build using the below
> configuration:
>
> 3x Dell PowerEdge R720
> 2x 2.9 GHz 8-core E5-2690
> 256GB RAM
> 2x 250GB SSD RAID 1 (boot/OS)
> 2x 2TB SSD JBOD passthrough (used for Gluster bricks)
> 1GbE NIC for management, 10GbE NIC for Gluster
>
> Using replica 3 with no arbiter.
>
> Installed the latest version of oVirt available at the time, 4.2.5.
> Created the recommended volumes (with an additional data volume on the
> second SSD). Not using VDO.
>
> The first thing I did was set up the GlusterFS network on 10GbE and set it
> to be used for GlusterFS and migration traffic.
>
> I've set up a single test VM using CentOS 7 minimal on the default "x-large
> instance" profile.
>
> Within this VM, if I do a very basic write test using something like:
>
> dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync
>
> I'm seeing quite slow speeds, only 8mb/sec.
>
> If I do the same from one of the hosts' Gluster mounts, i.e.
>
> host1: /rhev/data-center/mnt/glusterSD/HOST:data
>
> I get about 30mb/sec (which still seems fairly low?)
>
> Am I testing incorrectly here? Is there anything I should be tuning on
> the Gluster volumes to increase performance with SSDs? Where can I find
> out where the bottleneck is here, or is this expected performance of
> Gluster?
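One limitation of the dd test above is that it is a single sequential writer with 1 MiB blocks, close to the opposite of many VMs doing mixed I/O (the point made about IOMeter workers and scale-out). A rough sketch that runs several copies of the same dd write concurrently and reports the aggregate rate. It assumes bash, GNU dd (status=none) and GNU date (%N); parallel_dd_write is a hypothetical name, and this is a crude stand-in, not a substitute for a proper multi-worker benchmark.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run N concurrent dd writers (same flags as the
# single-stream test) in DIR and print the aggregate rate in MB/s (10^6 bytes).
# Arguments: <dir> [jobs, default 4] [MiB per job, default 256]
parallel_dd_write() {
    local dir=$1 jobs=${2:-4} mib=${3:-256} start end i
    start=$(date +%s%N)                       # nanoseconds since the epoch (GNU date)
    for (( i = 0; i < jobs; i++ )); do
        dd bs=1M count="$mib" if=/dev/zero of="$dir/ddtest.$i" \
           conv=fdatasync status=none &
    done
    wait                                      # wait for every writer to finish
    end=$(date +%s%N)
    rm -f "$dir"/ddtest.*                     # clean up the test files
    # total bytes / elapsed seconds / 10^6
    awk -v n="$jobs" -v m="$mib" -v ns="$(( end - start ))" \
        'BEGIN { printf "%.1f\n", n * m * 1048576 / (ns / 1e9) / 1e6 }'
}

# Possible usage inside the VM or on a host's Gluster mount:
#   parallel_dd_write /path/on/the/volume 4 256
```

Comparing 1 job against 4 or 8 on the same mount gives a rough idea of whether throughput scales with concurrency or is capped per stream.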
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZZVZP6MYCAHU4DHWRLL3VVZTW5DKKBUV/