
On Mon, Aug 6, 2018 at 10:17 PM, Jayme <jaymef@gmail.com> wrote:
Just wanted to comment on this again. Today I rebuilt my oVirt environment, as I wanted to change the disk/volume layout one final time before making use of the cluster. I downloaded the most recent oVirt Node image linked off the oVirt site and used cockpit to deploy. Again, the gluster config defaults were not set to optimize for virt. My gluster performance was as bad as the first time, around 5Mb/sec on dd tests. After optimizing the volumes for virt store it increased by 10x. If these settings are supposed to be applied by default, it does not appear to be working as intended.
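(For reference, a throughput test along the lines described above could look roughly like the following; the exact dd command isn't given in the thread, and the mount path and file name are placeholders.)

    # write 1 GiB with direct I/O to the gluster-backed mount and report throughput
    dd if=/dev/zero of=/mnt/glustervol/testfile bs=1M count=1024 oflag=direct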
The "optimize for virt" from the oVirt UI sets the group "virt" profile on volume. This is essentially what the deployment from cockpit does too - in addition, it also enables the o-direct writes on gluster volume (vis network.remote-dio = disable). This was done to solve an issue found with fsync taking upto 30s. If you see that without the network.remote-dio option, you're able to run workloads without issues, can you open a bug to change the defaults - and we'll investigate further. thanks!
- Jayme
On Mon, Aug 6, 2018 at 12:15 AM, Darrell Budic <budic@onholyground.com> wrote:
The defaults are a queue depth of 1000 and 1 thread. Recommended settings are going to depend on the kind of hardware you’re running it on, load, and memory as much as or more than on disk type/speed, in my experience.
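(Assuming these are the self-heal daemon options being discussed - cluster.shd-max-threads and cluster.shd-wait-qlength, which isn't stated explicitly here - the current values can be checked with something like the following; "myvol" is a placeholder.)

    gluster volume get myvol cluster.shd-max-threads
    gluster volume get myvol cluster.shd-wait-qlength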
I’d probably recommend a number of threads equal to half my total CPU core count, leaving the other half for actually serving data. Unless it’s hyper-converged - then I’d keep it to 1 or 2, since those CPUs would also be serving VMs. For the queue depth, I don’t have any good ideas other than using the default 1000. Especially if you don’t have a huge storage system, it won’t make a big difference. One other thing I’m not sure of is whether that’s threads per SHD; if it is, you get one per volume and might want to limit it even more. My reasoning is that if gluster can max your CPU, it’s got high enough settings for those two vars :)
And these numbers are relative: I was testing with 8/10000 after a post on gluster-users suggested it helped speed up healing, and I found it took my systems about 4 or 5 hours to heal fully after rebooting a server.
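(If the 8/10000 above refers to those same two options - an assumption, since the option names aren't spelled out in the message - setting them would look roughly like this, with "myvol" again a placeholder.)

    # 8 self-heal threads, queue length of 10000
    gluster volume set myvol cluster.shd-max-threads 8
    gluster volume set myvol cluster.shd-wait-qlength 10000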