Christian,
I've been following along with interest, as I've also been trying everything I can to improve gluster performance in my HCI cluster. My issue is mostly latency-related, and my workloads are typically small-file operations, which have been especially challenging.
A couple of things:
1. About the MTU: did you also enable jumbo frames at the switch level (if applicable)? I have jumbo frames enabled but honestly didn't see much of an impact from doing so.
2. About libgfapi: it's actually quite simple to enable, at least if you want to do some testing. It can be enabled on the hosted engine using engine-config, i.e. engine-config -s LibgfApiSupported=true. In my experience you can do this while VMs are running, and they won't pick up the new config until they are powered off and started again, so you are able to test it out on a single VM. Again, as others have mentioned, this is not a default option in oVirt because there are known bugs with the libgfapi implementation. Some people have worked around these bugs in various ways but, like you, I am not willing to do so in a production environment. Still, I think it's very much worth running some tests on a VM with libgfapi enabled and comparing against the default fuse mount.
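
In case it helps, here is a rough sketch of how I would check both points from the shell. Treat it as a starting point under a few assumptions rather than a recipe: the peer host name, the 9000-byte MTU and the cluster level 4.3 are placeholders I made up, the ping flags are the Linux iputils ones, and the engine restart reflects my understanding that engine-config changes only take effect after ovirt-engine is restarted. Nothing runs until you call one of the functions.

```shell
#!/bin/sh
# Sketch only: host names, MTU and cluster level are placeholders, not values from this thread.

# ICMP payload size that exactly fills a given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload() {
    echo $(( $1 - 28 ))
}

# Ping a storage peer with the don't-fragment bit set (Linux iputils ping).
# If the switch or any hop has a smaller MTU, the ping fails instead of
# being silently fragmented, which shows whether jumbo frames work end to end.
check_jumbo() {
    peer=$1
    mtu=${2:-9000}
    ping -M do -c 3 -s "$(icmp_payload "$mtu")" "$peer"
}

# Run on the hosted engine VM: show the current value, enable libgfapi for
# one cluster compatibility level, then restart the engine (assumed to be
# needed for the change to take effect).
enable_libgfapi() {
    cver=${1:-4.3}
    engine-config -g LibgfApiSupported
    engine-config -s LibgfApiSupported=true --cver="$cver"
    systemctl restart ovirt-engine
}

# Examples (hypothetical peer name and cluster level):
# check_jumbo gluster-node2.example.com 9000
# enable_libgfapi 4.3
```

After that, power one test VM off and start it again so it picks up the new setting, and leave the rest on fuse for comparison.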