I strongly believe the FUSE mount is the real reason for poor performance in HCI, and these minor gluster and other tweaks won't satisfy most people seeking I/O performance. Enabling libgfapi is probably the best option. Red Hat has recently closed bug reports related to libgfapi as WONTFIX, and one comment suggests that libgfapi was not showing enough of a performance improvement to bother with, which appears to contradict what many oVirt users are seeing. It's confusing to me why making libgfapi the default option is not being given any priority.

https://bugzilla.redhat.com/show_bug.cgi?id=1465810 

"We do not plan to enable libgfapi for oVirt/RHV. We did not find enough performance improvement justification for it"

On Tue, Mar 24, 2020 at 3:34 PM Alex McWhirter <alex@triadic.us> wrote:
Red Hat also recommends a shard size of 512MB; it's actually the only
shard size they support. Also check the chunk size on the LVM thin pools
backing the bricks, it should be at least 2MB. Note that changing the shard
size only applies to new VM disks created after the change. Changing the
chunk size requires making a new brick.
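
For example, something along these lines (an untested sketch; the volume,
VG and thin-pool names are placeholders):

gluster volume set myvol features.shard-block-size 512MB
# check the chunk size of the thin pool backing a brick
lvs -o +chunksize gluster_vg
# a new pool for a new brick could be created with e.g.
lvcreate --type thin-pool --chunksize 2m -L 1T -n gluster_thinpool gluster_vg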

libgfapi brings a huge performance boost; in my opinion it's almost a
necessity unless you have a ton of extra disk speed / network
throughput. Just be aware of the caveats.
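
If it helps, a hedged sketch of how that is usually switched on at the
engine side (the LibgfApiSupported key and the --cver value are from my
memory of the oVirt 4.2/4.3 docs, so double-check for your version):

engine-config -s LibgfApiSupported=true --cver=4.3
systemctl restart ovirt-engine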

On 2020-03-24 14:12, Strahil Nikolov wrote:
> On March 24, 2020 7:33:16 PM GMT+02:00, Darrell Budic
> <budic@onholyground.com> wrote:
>> Christian,
>>
>> Adding on to Strahil’s notes, make sure you’re using jumbo MTUs on
>> servers and client host nodes. Making sure you’re using appropriate
>> disk schedulers on hosts and VMs is important, worth double checking
>> that it’s doing what you think it is. If you are only HCI, gluster’s
>> choose-local on is a good thing, but try
>>
>> cluster.choose-local: false
>> cluster.read-hash-mode: 3
>>
>> if you have separate servers or nodes which are not HCI, to allow it to
>> spread reads over multiple nodes.
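>>
>> An untested sketch of the matching commands (the volume name 'myvol' is
>> a placeholder):
>>
>> gluster volume set myvol cluster.choose-local false
>> gluster volume set myvol cluster.read-hash-mode 3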
>>
>> Test out these settings if you have lots of RAM and cores on your
>> servers; they work well for me, with 20 cores and 64GB RAM per server,
>> under my load:
>>
>> performance.io-thread-count: 64
>> performance.low-prio-threads: 32
>>
>> these are worth testing for your workload.
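>>
>> To see what a volume currently uses before and after changing these
>> (again, 'myvol' is a placeholder):
>>
>> gluster volume get myvol performance.io-thread-count
>> gluster volume get myvol performance.low-prio-threads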
>>
>> If you’re running VMs with these, test out libgfapi connections; it’s
>> significantly better for IO latency than plain fuse mounts, if you can
>> tolerate the issues. The biggest one at the moment is that you can’t
>> take snapshots of the VMs with it enabled (as of March).
>>
>> If you have tuned available, I use throughput-performance on my
>> servers, guest-host on my VM nodes, and throughput-performance on some
>> HCI ones.
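>>
>> e.g., assuming tuned is installed:
>>
>> tuned-adm profile throughput-performance
>> tuned-adm active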
>>
>>
>> I’d test without the fips-rchecksum setting; that may be creating
>> extra work for your servers.
>>
>> If you mounted individual bricks, check that you disabled barriers on
>> them at mount if appropriate.
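>>
>> For instance (only if your kernel/XFS still accepts the option; newer
>> kernels have dropped 'nobarrier', and the mount point here is just a
>> placeholder):
>>
>> mount -o remount,nobarrier /gluster_bricks/data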
>>
>> Hope it helps,
>>
>>  -Darrell
>>
>>> On Mar 24, 2020, at 6:23 AM, Strahil Nikolov <hunter86_bg@yahoo.com>
>> wrote:
>>>
>>> On March 24, 2020 11:20:10 AM GMT+02:00, Christian Reiss
>> <email@christian-reiss.de> wrote:
>>>> Hey Strahil,
>>>>
>>>> seems you're the go-to guy for pretty much all my issues. I thank
>> you
>>>> for this and your continued support. Much appreciated.
>>>>
>>>>
>>>> 200MB/s reads, however, seems more like a broken config or
>>>> malfunctioning gluster than something requiring performance tweaks. I
>>>> enabled profiling, so I have real-life data available. But seriously,
>>>> even without tweaks I would like (need) 4 times those numbers; 800MB/s
>>>> write speed is okay-ish, given that the 10Gbit backbone can be the
>>>> limiting factor.
>>>>
>>>> We are running BigCouch/CouchDB applications that really, really need
>>>> IO. Not in throughput but in response times. 200MB/s is just way off.
>>>>
>>>> It feels as if gluster can/should do more, natively.
>>>>
>>>> -Chris.
>>>>
>>>> On 24/03/2020 06:17, Strahil Nikolov wrote:
>>>>> Hey Chris,
>>>>>
>>>>> You've got some options (see the command sketch after this list):
>>>>> 1. To speed up reads in HCI, you can use the option
>>>>> cluster.choose-local: on
>>>>> 2. You can adjust the server and client event-threads.
>>>>> 3. You can use NFS Ganesha (which connects to all servers via
>>>>> libgfapi) as an NFS server. In that case you have to use some
>>>>> clustering like ctdb or pacemaker. Note: disable cluster.choose-local
>>>>> if you use this one.
>>>>> 4. You can try the built-in NFS, although it's deprecated (NFS
>>>>> Ganesha is fully supported).
>>>>> 5. Create a gluster profile during the tests. I have seen numerous
>>>>> improperly selected tests, so test with a real-world workload.
>>>>> Synthetic tests are not good.
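>>>>>
>>>>> A minimal, untested sketch of the matching commands (the volume name
>>>>> 'myvol' is a placeholder; size the thread counts to your CPUs):
>>>>>
>>>>> gluster volume set myvol cluster.choose-local on
>>>>> gluster volume set myvol server.event-threads 4
>>>>> gluster volume set myvol client.event-threads 8
>>>>> # profile the volume while the real workload is running
>>>>> gluster volume profile myvol start
>>>>> gluster volume profile myvol info
>>>>> gluster volume profile myvol stop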
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>
>>> Hey Chris,
>>>
>>> What type is your VM?
>>> Try the 'High Performance' one (there is good RH documentation on
>>> that topic).
>>>
>>> If the DB load were directly on gluster, you could use the settings
>>> in the 'db-workload' group (under /var/lib/glusterd/groups/) to
>>> optimize that, but I'm not sure whether this will bring any
>>> performance gain inside a VM.
>>>
>>> 1. Check the VM disk scheduler. Use 'noop'/'none' (which one depends on
>>> whether multiqueue is enabled) to let the hypervisor aggregate the I/O
>>> requests from multiple VMs.
>>> Next, set the 'noop'/'none' disk scheduler on the hosts - these two are
>>> optimal for SSDs and NVMe disks (if I recall correctly you are using
>>> SSDs). See the command sketch after point 6.
>>>
>>> 2. Disable C-states on the host and guest (there are a lot of articles
>>> about that).
>>>
>>> 3. Enable MTU 9000 on the hypervisor (gluster node).
>>>
>>> 4. You can try setting/unsetting the tunables in the db-workload
>>> group and running benchmarks with a real workload.
>>>
>>> 5. Some users reported that enabling TCP offload on the hosts gave a
>>> huge improvement in gluster performance - you can try that.
>>> Of course there are mixed feelings, as others report that disabling
>>> it improves performance. I guess it is workload specific.
>>>
>>> 6. You can try tuning 'performance.read-ahead' on your
>>> gluster volume.
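>>>
>>> A rough, untested sketch of points 1-5 (the device 'sda', the NIC 'em1'
>>> and the volume 'myvol' are placeholders):
>>>
>>> # 1. check and set the disk scheduler, in the VM and on the host
>>> cat /sys/block/sda/queue/scheduler
>>> echo none > /sys/block/sda/queue/scheduler
>>> # 2. C-states are usually limited via BIOS or kernel cmdline, e.g.
>>> #    processor.max_cstate=1 intel_idle.max_cstate=0
>>> # 3. jumbo frames on the storage/gluster NIC (switch must allow it)
>>> ip link set dev em1 mtu 9000
>>> # 4. apply the shipped db-workload option group to a volume
>>> gluster volume set myvol group db-workload
>>> # 5. inspect and toggle TCP offloads, then compare with your workload
>>> ethtool -k em1
>>> ethtool -K em1 tso off gso off gro off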
>>>
>>> Here are some settings from some users (from an old e-mail):
>>>
>>> performance.read-ahead: on
>>> performance.stat-prefetch: on
>>> performance.flush-behind: on
>>> performance.client-io-threads: on
>>> performance.write-behind-window-size: 64MB (the shard size)
>>>
>>>
>>>
>>> For a 48-core host:
>>>
>>> server.event-threads: 4
>>> client.event-threads: 8
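>>>
>>> An untested sketch for applying a batch of such options to a volume
>>> named 'myvol' (a placeholder):
>>>
>>> while read -r opt val; do
>>>     gluster volume set myvol "$opt" "$val"
>>> done <<'EOF'
>>> performance.write-behind-window-size 64MB
>>> server.event-threads 4
>>> client.event-threads 8
>>> EOF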
>>>
>>> Your event-threads seem to be too high. And yes, the documentation
>>> explains it, but without an example it becomes more confusing.
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>>
>
> When talking about mounts, you can avoid SELinux lookups via the
> 'context=system_u:object_r:glusterd_brick_t:s0' mount option on all
> bricks.
> This way the kernel will reduce the requests to the bricks.
>
> Also, 'noatime' is a default mount option (relatime is also a good one)
> for HCI gluster bricks.
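>
> For illustration, an untested fstab sketch for one brick (device path and
> mount point are placeholders):
>
> /dev/gluster_vg/gluster_lv /gluster_bricks/data xfs inode64,noatime,context="system_u:object_r:glusterd_brick_t:s0" 0 0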
>
>
> It seems you have a lot of checks to do :)
>
> Best Regards,
> Strahil Nikolov
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/PDG34NP2ADFL6P2W5CEMZFN4EA4Z5F2P/