Hi all,

It seems I am seeing light at the end of the tunnel!

I have done the below:

Added the option
/etc/glusterfs/glusterd.vol :
option rpc-auth-allow-insecure on

restarted glusterd

Then set the volume option:
gluster volume set vms server.allow-insecure on

Then disabled the option:
gluster volume set vms performance.readdir-ahead off

which seems to have been enabled when desperately testing gluster options.

Then I added the storage domain again with glusterfs (not NFS).
This led to a really big performance boost on VM writes.

The dd went from 10MB/s to 60MB/s!
And the VM disk benchmarks went from a few MB/s to 80 - 100MB/s!

It seems that the two missing options above did the trick for gluster.

When checking the VM XML I still see the disk defined as file and not network. I am not sure that libgfapi is used by ovirt.
I will also set up a new VM, just in case, to check the XML again.
Is there any way to confirm the use of libgfapi?

On Fri, Sep 8, 2017 at 10:08 AM, Abi Askushi <rightkicktech@gmail.com> wrote:
I don't see any other bottleneck. CPUs are quite idle. Seems that the load is mostly due to high latency on IO.

Reading further the gluster docs:


I see that I am missing the following options:

/etc/glusterfs/glusterd.vol :
option rpc-auth-allow-insecure on

gluster volume set vms server.allow-insecure on

It says that the above allows qemu to use libgfapi.
When checking the VM XML, I don't see any gluster protocol at the disk drive:

<disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/00000001-0001-0001-0001-000000000311/94741028-e765-4300-a618-c3eeb7dbb7c8/images/222a1312-5efa-4731-8914-9a9d24dccba5/d691e6b3-c8e7-4820-9042-555d30c8a21b'/>
      <target dev='sda' bus='scsi'/>
      <serial>222a1312-5efa-4731-8914-9a9d24dccba5</serial>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>



The gluster docs, however, mention the following type of disk:

<disk type='network' device='disk'>
       <driver name='qemu' type='raw' cache='none'/>
       <source protocol='gluster' name='distrepvol/vm3.img'>
            <host name='10.70.37.106' port='24007'/>
        </source>
       <target dev='vda' bus='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>


Does the above indicate that ovirt/qemu is not using libgfapi but FUSE only? 
This could be the reason for such slow performance.
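
The comparison between the two disk definitions can be scripted. Below is a minimal Python sketch, assuming the domain XML is available as a string (for example saved from `virsh dumpxml`). The function name `classify_disks` and the labels it returns are made up for illustration. It only applies the heuristic discussed here: a disk of type 'network' with protocol 'gluster' goes through qemu's gluster driver (libgfapi), while a disk of type 'file' is opened as a path on a mounted filesystem (FUSE or NFS).

```python
import xml.etree.ElementTree as ET


def classify_disks(domain_xml: str) -> dict:
    """Map each disk target (e.g. 'sda') to a rough access-path label."""
    root = ET.fromstring(domain_xml)
    result = {}
    for disk in root.iter("disk"):
        target = disk.find("target")
        dev = target.get("dev") if target is not None else "unknown"
        source = disk.find("source")
        protocol = source.get("protocol") if source is not None else None
        if disk.get("type") == "network" and protocol == "gluster":
            result[dev] = "libgfapi"       # qemu talks to gluster directly
        elif disk.get("type") == "file":
            result[dev] = "mounted-file"   # path on a FUSE or NFS mount
        else:
            result[dev] = "other"
    return result


# Trimmed-down versions of the two definitions quoted in this thread.
SAMPLE = """
<domain>
  <devices>
    <disk type='file' device='disk'>
      <source file='/rhev/data-center/example/image'/>
      <target dev='sda' bus='scsi'/>
    </disk>
    <disk type='network' device='disk'>
      <source protocol='gluster' name='distrepvol/vm3.img'>
        <host name='10.70.37.106' port='24007'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

print(classify_disks(SAMPLE))
```

A result of "mounted-file" for every disk would suggest that the VM is not using libgfapi, which is what the first definition above shows.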


On Thu, Sep 7, 2017 at 1:47 PM, Yaniv Kaul <ykaul@redhat.com> wrote:


On Thu, Sep 7, 2017 at 12:52 PM, Abi Askushi <rightkicktech@gmail.com> wrote:


On Thu, Sep 7, 2017 at 10:30 AM, Yaniv Kaul <ykaul@redhat.com> wrote:


On Thu, Sep 7, 2017 at 10:06 AM, Yaniv Kaul <ykaul@redhat.com> wrote:


On Wed, Sep 6, 2017 at 6:08 PM, Abi Askushi <rightkicktech@gmail.com> wrote:
For a first idea I use:

dd if=/dev/zero of=testfile bs=1GB count=1

This is an incorrect way to test performance, for various reasons:
1. You are not using oflag=direct , thus not using DirectIO, but using cache.
2. It's unrealistic - it is very uncommon to write large blocks of zeros (sometimes during FS creation or wiping). Certainly not 1GB
3. It is a single thread of IO - again, unrealistic for VM's IO.

I forgot to mention that I include oflag=direct in my tests. I agree though that dd is not the correct way to test, hence I mentioned I just use it to get a first feel. More tests are done within the VM benchmarking its disk IO (with tools like IOmeter).

I suggest using fio and such. See https://github.com/pcuzner/fio-tools for example.
Do you have any recommended config file to use for VM workload?

Desktops and Servers VMs behave quite differently, so not really. But the 70/30 job is typically a good baseline.
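
As a starting point, a 70/30 random read/write job can be written by hand. The Python sketch below only generates the text of a fio job file. The block size, queue depth, job count, file size, runtime and the test directory are illustrative assumptions, not values taken from fio-tools or from this thread, and the helper name `fio_7030_job` is hypothetical.

```python
def fio_7030_job(directory: str, size: str = "1g", runtime_s: int = 60) -> str:
    """Return the text of a fio job file with a 70% read / 30% write mix."""
    lines = [
        "[global]",
        "ioengine=libaio",      # asynchronous IO on Linux
        "direct=1",             # bypass the page cache, like dd oflag=direct
        f"directory={directory}",
        f"size={size}",
        f"runtime={runtime_s}",
        "time_based",
        "group_reporting",
        "",
        "[randrw-7030]",
        "rw=randrw",            # mixed random reads and writes
        "rwmixread=70",         # 70% reads, hence 30% writes
        "bs=4k",                # assumed block size
        "iodepth=16",           # assumed queue depth
        "numjobs=4",            # several workers instead of a single thread
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # /mnt/gluster-test is a placeholder for the mount point under test.
    print(fio_7030_job("/mnt/gluster-test"))
```

Saving the output to a file and passing that file to fio would run the job against the chosen directory. Unlike the single dd stream, this issues small random IO from several workers in parallel.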
 




When testing on the gluster mount point using the above command I hardly get 10MB/s. (At the same time the network traffic hardly reaches 100Mbit.)

When testing outside of gluster (for example at /root) I get 600 - 700MB/s.

That's very fast - from 4 disks doing RAID5? Impressive (unless you use caching!). Are those HDDs or SSDs/NVMe?
 
These are SAS disks. But there is also a RAID controller with 1GB cache.


When I mount the gluster volume with NFS and test on it I get 90 - 100 MB/s, (almost 10x from gluster results) which is the max I can get considering I have only 1 Gbit network for the storage.

Also, when using glusterfs the general VM performance is very poor, and disk write benchmarks show that it is at least 4 times slower than when the VM is hosted on the same data store mounted over NFS.

I don't know why I am hitting such a significant performance penalty, and every possible tweak that I was able to find out there did not make any difference to the performance.

The hardware I am using is pretty decent for the purposes intended:
3 nodes, each node having 32 MB of RAM, 16 physical CPU cores, 2 TB of storage in RAID5 (4 disks), of which 1.5 TB are sliced for the data store of ovirt where VMs are stored.

I forgot to ask why are you using RAID 5 with 4 disks and not RAID 10? Same usable capacity, higher performance, same protection and faster recovery, I believe.
Correction: there are 5 disks of 600GB each. The main reason for going with RAID 5 was the capacity. With RAID 10 I can use only 4 of them and get only 1.1 TB usable; with RAID 5 I get 2.2 TB usable. I agree that going with RAID 10 (plus one additional drive, for 6 drives in total) would be better, but this is what I have now.
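
The capacity figures above match the usual RAID formulas, assuming the reported sizes are in binary units (TiB) while the disks are sold in decimal GB. A small Python sketch of the arithmetic follows; the helper name `usable_tib` is made up for illustration.

```python
def usable_tib(disks: int, disk_gb: float, level: str) -> float:
    """Usable capacity in TiB for equally sized disks of disk_gb decimal GB."""
    if level == "raid5":
        data_disks = disks - 1          # one disk's worth of parity
    elif level == "raid10":
        if disks % 2:
            raise ValueError("RAID 10 needs an even number of disks")
        data_disks = disks // 2         # every block is mirrored once
    else:
        raise ValueError(f"unsupported level: {level}")
    return data_disks * disk_gb * 1e9 / 2**40


print(f"RAID 5,  5 x 600GB: {usable_tib(5, 600, 'raid5'):.2f} TiB")
print(f"RAID 10, 4 x 600GB: {usable_tib(4, 600, 'raid10'):.2f} TiB")
print(f"RAID 10, 6 x 600GB: {usable_tib(6, 600, 'raid10'):.2f} TiB")
```

With these assumptions RAID 5 on 5 disks comes out at roughly 2.2 TiB and RAID 10 on 4 disks at roughly 1.1 TiB, in line with the numbers quoted above.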

Y.


You have not mentioned your NIC speeds. Please ensure all work well, with 10g.
Is the network dedicated for Gluster traffic? How are they connected?
 
I have mentioned that I have 1 Gbit dedicated for the storage. A different network is used for this and a dedicated 1Gbit switch. The throughput has been confirmed between all nodes with iperf.

Oh.... With 1Gb, you can't get more than 100+MBps...
 
I know 10Gbit would be better, but when using native gluster in ovirt the network pipe was hardly reaching 100Mbps, thus the bottleneck was gluster and not the network. If I can saturate 1Gbit and still have performance issues, then I may consider going to 10Gbit. With NFS on top of gluster I see traffic reaching 800Mbit when testing with dd, which is much better.
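
To relate the link rates above to the dd results, here is a minimal Python sketch of the unit conversion. It ignores protocol overhead (an assumption that makes the figures upper bounds), and the function name is hypothetical.

```python
def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert a link rate in Mbit/s to MB/s (8 bits per byte, no overhead)."""
    return mbit_per_s / 8


for label, rate in [
    ("1 Gbit link, theoretical ceiling", 1000),
    ("traffic seen with NFS on top of gluster", 800),
    ("traffic seen with native gluster", 100),
]:
    print(f"{label}: {rate} Mbit/s = {mbit_to_mbyte_per_s(rate)} MB/s")
```

The 800 Mbit/s case corresponds to 100 MB/s and the 100 Mbit/s case to 12.5 MB/s, which lines up with the 90 - 100 MB/s and roughly 10 MB/s dd results reported earlier in the thread.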

Agreed. Do you see the bottleneck elsewhere? CPU?
 


The gluster configuration is the following:

Which version of Gluster are you using?
 
The version is  3.8.12 

I think it's a very old release (near end of life?). I warmly suggest 3.10.x or 3.12.
There are performance improvements (AFAIR) in both.
Y.
 

Volume Name: vms
Type: Replicate
Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster0:/gluster/vms/brick
Brick2: gluster1:/gluster/vms/brick
Brick3: gluster2:/gluster/vms/brick (arbiter)
Options Reconfigured:
nfs.export-volumes: on
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: on

I think this should be off.
 
performance.low-prio-threads: 32
network.remote-dio: off

I think this should be enabled.
 
cluster.eager-lock: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
features.shard-block-size: 64MB

I'm not sure if this should not be 512MB.  I don't remember the last resolution on this.
Y.
 
performance.client-io-threads: on
client.event-threads: 4
server.event-threads: 4
performance.write-behind-window-size: 4MB
performance.cache-size: 1GB

I have been playing with all of the above, with very little difference in the performance I was getting.

In case I can provide any other details let me know.

What is your tuned profile?
 
the tuned profile is virtual-host

At the moment I already switched to gluster based NFS but I have a similar setup with 2 nodes  where the data store is mounted through gluster (and again relatively good hardware) where I might check any tweaks or improvements on this setup.

Thanx


On Wed, Sep 6, 2017 at 5:32 PM, Yaniv Kaul <ykaul@redhat.com> wrote:


On Wed, Sep 6, 2017 at 3:32 PM, Abi Askushi <rightkicktech@gmail.com> wrote:
Hi All,

I've been playing with an ovirt self hosted engine setup, and I even use it in production for several VMs. The setup I have is 3 servers with gluster storage in replica 2+1 (1 arbiter).
The data storage domain where VMs are stored is mounted with gluster through ovirt. The performance I get for the VMs is very low, and I was thinking of switching to mounting the same storage through NFS instead of glusterfs.

I don't see how it'll improve performance.
I suggest you share the gluster configuration (as well as the storage HW) so we can understand why the performance is low.
Y.
 

The only thing I am hesitant about is how I can ensure high availability of the storage when I lose one server. I was thinking of having something like the below in /etc/hosts:

10.100.100.1 nfsmount
10.100.100.2 nfsmount
10.100.100.3 nfsmount

then use nfsmount as the server name when adding this domain through ovirt GUI.
Are there any other more elegant solutions? What do you do for such cases?
Note: gluster has the backup-volfile-servers mount option, which provides a lean way to have redundancy on the mount point, and I am using this when mounting with glusterfs.

Thanx

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users