
Hi all,

Seems that I am seeing light at the end of the tunnel! I have done the following:

Added the option to /etc/glusterfs/glusterd.vol:
option rpc-auth-allow-insecure on
and restarted glusterd.

Then set the volume option:
gluster volume set vms server.allow-insecure on

Then disabled the option:
gluster volume set vms performance.readdir-ahead off
which seems to have been enabled while I was desperately testing gluster options.

Then I added the storage domain again with glusterfs (not NFS). This led to a really big performance boost on VM writes. The dd went from 10MB/s to 60MB/s, and the VM disk benchmarks from a few MB/s to 80 - 100MB/s! It seems the two missing options above did the trick for gluster.

When checking the VM XML I still see the disk defined as file and not network, so I am not sure that libgfapi is used by ovirt. I will also create a new VM, just in case, and check its XML again. Is there any way to confirm the use of libgfapi?

On Fri, Sep 8, 2017 at 10:08 AM, Abi Askushi <rightkicktech@gmail.com> wrote:
I don't see any other bottleneck. CPUs are quite idle. Seems that the load is mostly due to high latency on IO.
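(For reference, a quick way to see whether IO latency rather than CPU is the limiting factor is extended iostat output; run it on the hypervisor and on each brick host while the test is going on.)

    # per-device latency and utilisation, refreshed every 2 seconds
    iostat -x 2
    # high await / w_await with mostly idle CPUs points at storage latency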
Reading further the gluster docs:
https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.5/libgfapi%20with%20qemu%20libvirt.md
I see that I am missing the following options:
/etc/glusterfs/glusterd.vol : option rpc-auth-allow-insecure on
gluster volume set vms server.allow-insecure on
It says that the above allows qemu to use libgfapi. However, when checking the VM XML, I don't see any gluster protocol in the disk definition:
<disk type='file' device='disk' snapshot='no'>
  <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
  <source file='/rhev/data-center/00000001-0001-0001-0001-000000000311/94741028-e765-4300-a618-c3eeb7dbb7c8/images/222a1312-5efa-4731-8914-9a9d24dccba5/d691e6b3-c8e7-4820-9042-555d30c8a21b'/>
  <target dev='sda' bus='scsi'/>
  <serial>222a1312-5efa-4731-8914-9a9d24dccba5</serial>
  <boot order='1'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
While at gluster docs it mentions the below type of disk:
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='gluster' name='distrepvol/vm3.img'>
    <host name='10.70.37.106' port='24007'/>
  </source>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
Does the above indicate that ovirt/qemu is not using libgfapi but only the FUSE mount? That could be the reason for such slow performance.
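(As a rough check, the following is only a sketch of how one could verify whether a running VM goes through libgfapi or the FUSE mount; the VM name is a placeholder, and the engine-config key may depend on the oVirt version.)

    # on the host running the VM: dump the live domain XML and look at the disk type
    virsh -r dumpxml <vm-name> | grep -A 3 "<disk"
    # libgfapi: type='network' with <source protocol='gluster' .../>
    # FUSE:     type='file' with a /rhev/data-center/... path

    # the qemu command line tells the same story if libgfapi is in use
    ps -ef | grep qemu-kvm | grep -o "gluster://[^ ,]*"

    # on the engine, check whether libgfapi support is enabled at all
    engine-config -g LibgfApiSupported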
On Thu, Sep 7, 2017 at 1:47 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Thu, Sep 7, 2017 at 12:52 PM, Abi Askushi <rightkicktech@gmail.com> wrote:
On Thu, Sep 7, 2017 at 10:30 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Thu, Sep 7, 2017 at 10:06 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Wed, Sep 6, 2017 at 6:08 PM, Abi Askushi <rightkicktech@gmail.com> wrote:
For a first idea I use:
dd if=/dev/zero of=testfile bs=1GB count=1
This is an incorrect way to test performance, for several reasons:
1. You are not using oflag=direct, so you are not using direct IO but the page cache.
2. It's unrealistic - it is very uncommon to write large blocks of zeros (sometimes during FS creation or wiping), and certainly not 1GB at a time.
3. It is a single thread of IO - again, unrealistic for a VM's IO.
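(For a slightly more honest quick test, something along these lines at least bypasses the page cache and uses a more VM-like block size; it is still single-threaded, so only a rough indicator.)

    # 4k blocks, direct IO, 1 GiB total
    dd if=/dev/zero of=testfile bs=4k count=262144 oflag=direct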
I forgot to mention that I include oflag=direct in my tests. I agree
though that dd is not the correct way to test, hence I mentioned I just use it to get a first feel. More tests are done within the VM benchmarking its disk IO (with tools like IOmeter).
I suggest using fio and such. See https://github.com/pcuzner/fio-tools
for example.
Do you have any recommended config file to use for VM workload?
Desktops and Servers VMs behave quite differently, so not really. But the 70/30 job is typically a good baseline.
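(A minimal fio job along these lines approximates a 70/30 random read/write VM workload; directory, size, iodepth and numjobs are placeholders to adjust for the environment.)

    [global]
    ioengine=libaio
    direct=1
    bs=4k
    rw=randrw
    rwmixread=70
    iodepth=16
    numjobs=4
    runtime=120
    time_based=1
    group_reporting=1

    [vm-like]
    ; placeholder - point this at the gluster data domain mount
    directory=/rhev/data-center/mnt
    size=2g

Run with: fio vm70-30.fio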
When testing on the gluster mount point using the above command I hardly get 10MB/s. (At the same time the network traffic hardly reaches 100Mbit).
When testing outside of gluster (for example at /root) I get 600 - 700MB/s.
That's very fast - from 4 disks doing RAID5? Impressive (unless you use caching!). Are those HDDs or SSDs/NVMe?
These are SAS disks. But there is also a RAID controller with 1GB cache.
When I mount the gluster volume with NFS and test on it I get 90 - 100 MB/s (almost 10x the gluster results), which is the max I can get considering I have only a 1 Gbit network for the storage.
Also, when using glusterfs the general VM performance is very poor, and disk write benchmarks show that it is at least 4 times slower than when the VM is hosted on the same data store mounted via NFS.
I don't know why I am hitting such a significant performance penalty; every possible tweak that I was able to find out there made no difference in performance.
The hardware I am using is pretty decent for the intended purposes: 3 nodes, each with 32 GB of RAM, 16 physical CPU cores, and 2 TB of storage in RAID5 (4 disks), of which 1.5 TB is sliced for the ovirt data store where VMs are stored.
I forgot to ask: why are you using RAID 5 with 4 disks and not RAID 10? Same usable capacity, higher performance, same protection and faster recovery, I believe.
Correction: there are 5 disks of 600GB each. The main reason for going with RAID 5 was capacity. With RAID 10 I can use only 4 of them and get only 1.1 TB usable; with RAID 5 I get 2.2 TB usable. I agree that going with RAID 10 (plus one additional drive, for 6 drives total) would be better, but this is what I have now.
Y.
You have not mentioned your NIC speeds. Please ensure all work well, with 10g. Is the network dedicated for Gluster traffic? How are they connected?
I have mentioned that I have 1 Gbit dedicated for the storage. A separate network is used for this, with a dedicated 1 Gbit switch. The throughput has been confirmed between all nodes with iperf.
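(For the record, a parallel run between two of the nodes, roughly like the following with iperf3, is what confirms the link; the hostname is a placeholder.)

    # on one node
    iperf3 -s
    # on another node: 4 parallel streams for 30 seconds
    iperf3 -c gluster1 -P 4 -t 30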
Oh.... With 1Gb, you can't get more than 100+MBps...
I know 10Gbit would be better, but when using native gluster at ovirt the network pipe was hardly reaching 100Mbps, thus the bottleneck was gluster and not the network. If I can saturate 1Gbit and still have performance issues, then I may consider going with 10Gbit. With NFS on top of gluster I see traffic reaching 800Mbit when testing with dd, which is much better.
Agreed. Do you see the bottleneck elsewhere? CPU?
The gluster configuration is the following:
Which version of Gluster are you using?
The version is 3.8.12
I think it's a very old release (near end of life?). I warmly suggest 3.10.x or 3.12. There are performance improvements (AFAIR) in both. Y.
Volume Name: vms
Type: Replicate
Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster0:/gluster/vms/brick
Brick2: gluster1:/gluster/vms/brick
Brick3: gluster2:/gluster/vms/brick (arbiter)
Options Reconfigured:
nfs.export-volumes: on
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: on
I think this should be off.
performance.low-prio-threads: 32
network.remote-dio: off
I think this should be enabled.
cluster.eager-lock: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
features.shard-block-size: 64MB
I'm not sure if this should not be 512MB. I don't remember the last resolution on this. Y.
performance.client-io-threads: on
client.event-threads: 4
server.event-threads: 4
performance.write-behind-window-size: 4MB
performance.cache-size: 1GB
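(A minimal sketch of the volume set commands for the changes suggested above, assuming the volume name vms as in this output; check the defaults of your gluster version before applying, and note that the shard block size is generally only changed on a fresh/empty volume.)

    gluster volume set vms performance.stat-prefetch off
    gluster volume set vms network.remote-dio enable
    # only if the 512MB suggestion applies and no VM images exist on the volume yet
    gluster volume set vms features.shard-block-size 512MB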
I have been playing with all of the above with very little difference in the performance I was getting.
In case I can provide any other details let me know.
What is your tuned profile?
the tuned profile is virtual-host
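(To verify or set it, roughly:)

    tuned-adm active
    tuned-adm profile virtual-host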
At the moment I have already switched to gluster-based NFS, but I have a similar setup with 2 nodes (again with relatively good hardware) where the data store is mounted through gluster, on which I can check any tweaks or improvements.
Thanx
On Wed, Sep 6, 2017 at 5:32 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
> On Wed, Sep 6, 2017 at 3:32 PM, Abi Askushi <rightkicktech@gmail.com> wrote:
>
>> Hi All,
>>
>> I've been playing with the ovirt self hosted engine setup and I even use it
>> in production for several VMs. The setup I have is 3 servers with gluster
>> storage in replica 2+1 (1 arbiter).
>> The data storage domain where VMs are stored is mounted with
>> gluster through ovirt. The performance I get for the VMs is very low and I
>> was thinking to switch and mount the same storage through NFS instead of
>> glusterfs.
>
> I don't see how it'll improve performance.
> I suggest you share the gluster configuration (as well as the
> storage HW) so we can understand why the performance is low.
> Y.
>
>> The only thing I am hesitant about is how I can ensure high availability
>> of the storage when I lose one server? I was thinking to have at
>> /etc/hosts something like below:
>>
>> 10.100.100.1 nfsmount
>> 10.100.100.2 nfsmount
>> 10.100.100.3 nfsmount
>>
>> then use nfsmount as the server name when adding this domain
>> through the ovirt GUI.
>> Are there any other more elegant solutions? What do you do for such
>> cases?
>> Note: gluster has the backup-volfile-servers mount option which provides a lean
>> way to have redundancy on the mount point and I am using this when mounting
>> with glusterfs.
>>
>> Thanx
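(For reference, the backup-volfile-servers option mentioned in the quoted note is a glusterfs-fuse mount option; a mount along these lines is what it looks like, with hostnames, volume name and mount point as placeholders.)

    mount -t glusterfs -o backup-volfile-servers=gluster1:gluster2 gluster0:/vms /mnt/vms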