On Mon, Aug 6, 2018 at 10:17 PM, Jayme <jaymef@gmail.com> wrote:
Just wanted to comment on this again.  Today I rebuilt my oVirt environment
as I wanted to change the disk/volume layout one final time before making use
of the cluster.  I downloaded the most recent oVirt Node image linked off
the oVirt site and used cockpit to deploy.  Again, the gluster config
defaults were not set to optimize for virt.  My gluster performance was as
bad as the first time, around 5Mb/sec on dd tests.  After optimizing volumes
for virt store it increased by 10x.  If these settings are supposed to be
applied by default, it does not appear to be working as intended.
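
For reference, a sequential write test of the sort mentioned here can be run
with dd against the gluster-backed mount; the path, block size, and count
below are only placeholders, not the exact values used:

    # write 1 GiB with O_DIRECT so the page cache does not hide gluster throughput
    dd if=/dev/zero of=/mnt/glustervol/ddtest.img bs=1M count=1024 oflag=direct
    # remove the test file afterwards
    rm -f /mnt/glustervol/ddtest.img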

The "optimize for virt" from the oVirt UI sets the group "virt" profile on volume. This is essentially what the deployment from cockpit does too - in addition, it also enables the o-direct writes on gluster volume (vis network.remote-dio = disable). This was done to solve an issue found with fsync taking upto 30s.
If you see that without the network.remote-dio option, you're able to run workloads without issues, can you open a bug to change the defaults - and we'll investigate further.
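
For anyone who wants to check or apply this by hand, the equivalent gluster CLI
is roughly the following (the volume name "data" is only a placeholder):

    # apply the predefined "virt" option group to the volume
    gluster volume set data group virt
    # inspect the remote-dio setting, then disable it as the cockpit deployment does
    # (disabling it lets O_DIRECT writes pass through to the bricks)
    gluster volume get data network.remote-dio
    gluster volume set data network.remote-dio disable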

thanks!


- Jayme

On Mon, Aug 6, 2018 at 12:15 AM, Darrell Budic <budic@onholyground.com>
wrote:

> The defaults are a queue depth of 1000 and 1 thread. Recommended settings
> are going to depend on what kind of hardware you’re running it on, load,
> and memory as much as or more than disk type/speed, in my experience.
>
> I’d probably recommend a number of threads equal to half my total CPU core
> count, leaving the other half for handling actually serving data. Unless
> it’s hyper-converged, in which case I’d keep it to 1 or 2, since those CPUs
> would also be serving VMs. For the queue depth, I don’t have any good ideas
> other than using the default 1000. Especially if you don’t have a huge
> storage system, it won’t make a big difference. One other thing I’m not sure
> of is whether that’s threads per SHD; if it is, you get one per volume and
> might want to limit it even more. My reason is that if gluster can max your
> CPU, it’s got high enough settings for those two vars :) (example commands
> for both options follow below)
>
> And these numbers are relative; I was testing with 8/10000 after a post on
> gluster-users suggested it helped speed up healing time, and I found it
> took my systems about 4 or 5 hours to heal fully after rebooting a server
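
For reference, the two settings discussed above are the self-heal daemon
options, which can be inspected and changed per volume roughly like this (the
volume name "data" is a placeholder, and 8/10000 are just the example values
Darrell mentions):

    # show the current values (thread count defaults to 1, queue length to ~1000)
    gluster volume get data cluster.shd-max-threads
    gluster volume get data cluster.shd-wait-qlength
    # the 8 threads / 10000 queue length combination mentioned above
    gluster volume set data cluster.shd-max-threads 8
    gluster volume set data cluster.shd-wait-qlength 10000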