Gluster volume slower than raid1 zpool speed

Hi, can anyone help me with the performance of my 3-node Gluster on ZFS (set up with one arbiter)? Write performance in the single VM I have on it (the one with the engine) is 50% worse than a single bare-metal disk. I have enabled "Optimize for virt store". I run a 1 Gbps network with MTU 1500; could this be the write performance killer? Is this to be expected from a 2x HDD ZFS RAID 1 on each node, in a 3-node setup with an arbiter? Should I move to RAID 5 or 6 instead? Should I add an SSD cache to the RAID 1 zpools? What are your thoughts, and what would you do to optimize this setup? I would like to run ZFS with Gluster and I can live with a little performance loss, but not that much.

On 11/23/2020 5:56 AM, Harry O wrote:
Hi, can anyone help me with the performance of my 3-node Gluster on ZFS (set up with one arbiter)? Write performance in the single VM I have on it (the one with the engine) is 50% worse than a single bare-metal disk. I have enabled "Optimize for virt store". I run a 1 Gbps network with MTU 1500; could this be the write performance killer?
It usually is. Remember that the data has to be written both to the local brick and to the other nodes (though in the case of the arbiter it's just the metadata). So 1 Gb/s is going to be slower than local SATA speed. This is not a Gluster issue; you will find it with all distributed file systems.
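As a rough illustration of the ceiling involved: 1 Gbps is about 125 MB/s of raw bandwidth, and closer to 110 MB/s of usable payload after protocol overhead. In a replica 2 + arbiter volume with the client running on one of the storage nodes, every write crosses the wire once to the remote data brick (plus a small metadata update to the arbiter), so sequential writes top out somewhere around 100-110 MB/s before any Gluster or disk overhead is counted, which lines up with the ~90 MB/s figures reported later in the thread.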
Is this to be expected from a 2x HDD ZFS RAID 1 on each node, in a 3-node setup with an arbiter? Should I move to RAID 5 or 6 instead? Should I add an SSD cache to the RAID 1 zpools? What are your thoughts, and what would you do to optimize this setup? I would like to run ZFS with Gluster and I can live with a little performance loss, but not that much.
You don't mention numbers, so we don't know your definition of a "little" loss. There IS tuning that can be done in Gluster, but the 1G network is going to be the bottleneck in your current setup. Consider adding Ethernet cards/ports and using bonding (or teamd). I am a fan of teamd, which ships with the Red Hat and Ubuntu distros; it is very easy to set up and manage, and you get some high availability as a bonus. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/htm... You *will* see an immediate improvement. MTU 9000 (jumbo frames) can also help a bit. Of course 10G or better networking would be optimal. -wk
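For reference, a minimal sketch of the kind of teamd setup WK is describing, done with NetworkManager's nmcli (the team name, port interface names and IP address below are placeholders, and the runner can be activebackup, loadbalance or lacp depending on what the switch supports):

# create the team interface
nmcli con add type team con-name team0 ifname team0 config '{"runner": {"name": "loadbalance"}}'
# enslave two physical NICs to the team (interface names are examples)
nmcli con add type team-slave con-name team0-port1 ifname enp1s0 master team0
nmcli con add type team-slave con-name team0-port2 ifname enp2s0 master team0
# address the team on the storage network and bring it up
nmcli con mod team0 ipv4.addresses 192.168.10.11/24 ipv4.method manual
nmcli con up team0
# check runner and port state
teamdctl team0 state

Note that jumbo frames (MTU 9000) only pay off if every NIC and switch port in the storage path is raised together; a mismatch tends to perform worse than the default 1500.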

Unfortunately I didn't get any improvement by upgrading the network.

Bare metal (zfs raid1 zvol):
dd if=/dev/zero of=/gluster_bricks/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.6471 s, 68.6 MB/s

CentOS VM on gluster volume:
dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 36.8618 s, 29.1 MB/s

Does this performance look normal?

Any reason to use the dsync flag? Do you have a real workload to test with?
Best Regards,
Strahil Nikolov

On 25.11.2020 at 10:29 +0000 (Wed), Harry O wrote:
Unfortunately I didn't get any improvement by upgrading the network.
Bare metal (zfs raid1 zvol):
dd if=/dev/zero of=/gluster_bricks/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.6471 s, 68.6 MB/s
CentOS VM on gluster volume:
dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 36.8618 s, 29.1 MB/s
Does this performance look normal?
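On Strahil's point about testing with a real workload: a single 1 GiB dd write with oflag=dsync is a very coarse sequential test and says little about the small, mixed I/O a VM actually generates. A hedged sketch of a more representative benchmark with fio (the file name, block size, queue depth and runtime below are only example values):

fio --name=vmwrite --filename=/var/tmp/fio.test --size=1G \
    --bs=64k --rw=randwrite --ioengine=libaio --direct=1 \
    --iodepth=16 --numjobs=1 --runtime=60 --time_based --group_reporting

Running the same job inside the VM and directly on a brick host makes the Gluster and network overhead easier to isolate than one-off dd runs.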

No, that doesn't look right.

I have a testbed cluster that has a single 1G network (1500 MTU). It is replica 2 + arbiter on top of 7200 rpm spinning drives formatted with XFS. This cluster runs Gluster 6.10 on Ubuntu 18 on some Dell i5-2xxx boxes that were lying around.

It uses a stock 'virt' group tuning which provides the following:

root@onetest2:~/datastores/101# cat /var/lib/glusterd/groups/virt
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.low-prio-threads=32
network.remote-dio=enable
cluster.eager-lock=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.data-self-heal-algorithm=full
cluster.locking-scheme=granular
cluster.shd-max-threads=8
cluster.shd-wait-qlength=10000
features.shard=on
user.cifs=off
cluster.choose-local=off
client.event-threads=4
server.event-threads=4
performance.client-io-threads=on

I show the following results on your test. Note: the cluster is actually doing some work, with 3 VMs running doing monitoring things.

The bare metal performance is as follows:

root@onetest2:/# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.0783 s, 96.9 MB/s
root@onetest2:/# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.5047 s, 93.3 MB/s

Moving over to the Gluster mount I show the following:

root@onetest2:~/datastores/101# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.4582 s, 93.7 MB/s
root@onetest2:~/datastores/101# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.2034 s, 88.0 MB/s

So a little performance hit with Gluster, but almost insignificant given that other things were going on.

I don't know if you are in a VM environment, but if so you could try the virt tuning:

gluster volume set VOLUME group virt

Unfortunately, I know little about ZFS so I can't comment on its performance, but your Gluster results should be closer to the bare metal performance.

Also note I am using an arbiter, so that is less work than replica 3. With a true replica 3 I would expect the Gluster results to be lower, maybe as low as the 60-70 MB/s range.

-wk

On 11/25/2020 2:29 AM, Harry O wrote:
Unfortunately I didn't get any improvement by upgrading the network.
Bare metal (zfs raid1 zvol):
dd if=/dev/zero of=/gluster_bricks/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.6471 s, 68.6 MB/s
CentOS VM on gluster volume:
dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 36.8618 s, 29.1 MB/s
Does this performance look normal?

The virt settings (highly recommended for virtualization usage) enable SHARDING. ONCE ENABLED, NEVER EVER DISABLE SHARDING!!!
Best Regards,
Strahil Nikolov

On 25.11.2020 at 16:34 -0800 (Wed), WK wrote:
No, that doesn't look right.
I have a testbed cluster that has a single 1G network (1500 mtu)
it is replica 2 + arbiter on top of 7200 rpm spinning drives formatted with XFS
This cluster runs Gluster 6.10 on Ubuntu 18 on some Dell i5-2xxx boxes that were lying around.
it uses a stock 'virt' group tuning which provides the following:
root@onetest2:~/datastores/101# cat /var/lib/glusterd/groups/virt
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.low-prio-threads=32
network.remote-dio=enable
cluster.eager-lock=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.data-self-heal-algorithm=full
cluster.locking-scheme=granular
cluster.shd-max-threads=8
cluster.shd-wait-qlength=10000
features.shard=on
user.cifs=off
cluster.choose-local=off
client.event-threads=4
server.event-threads=4
performance.client-io-threads=on
I show the following results on your test. Note: the cluster is actually doing some work, with 3 VMs running doing monitoring things.
The bare metal performance is as follows:
root@onetest2:/# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.0783 s, 96.9 MB/s
root@onetest2:/# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.5047 s, 93.3 MB/s
Moving over to the Gluster mount I show the following:
root@onetest2:~/datastores/101# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.4582 s, 93.7 MB/s
root@onetest2:~/datastores/101# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.2034 s, 88.0 MB/s
So a little performance hit with Gluster but almost insignificant given that other things were going on.
I don't know if you are in a VM environment but if so you could try the virt tuning.
gluster volume set VOLUME group virt
Unfortunately, I know little about ZFS so I can't comment on its performance, but your Gluster results should be closer to the bare metal performance.
Also note I am using an Arbiter, so that is less work than Replica 3. With a true Replica 3 I would expect the Gluster results to be lower, maybe as low as 60-70 MB/s range
-wk
On 11/25/2020 2:29 AM, Harry O wrote:
Unfortunately I didn't get any improvement by upgrading the network.
Bare metal (zfs raid1 zvol):
dd if=/dev/zero of=/gluster_bricks/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.6471 s, 68.6 MB/s
CentOS VM on gluster volume:
dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 36.8618 s, 29.1 MB/s
Does this performance look normal?

I would love to see something similar to your performance numbers, WK.

Here are my gluster volume options and info:

[root@ovirtn1 ~]# gluster v info vmstore

Volume Name: vmstore
Type: Replicate
Volume ID: stuff
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirtn1.5ervers.lan:/gluster_bricks/vmstore/vmstore
Brick2: ovirtn2.5ervers.lan:/gluster_bricks/vmstore/vmstore
Brick3: ovirtn3.5ervers.lan:/gluster_bricks/vmstore/vmstore (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: on

Does it look like sharding is on, Strahil Nikolov?

Running "gluster volume set vmstore group virt" had no effect.

I don't know why I ended up using the dsync flag. For a real-workload test, I ran CrystalDiskMark on a Windows VM; these are the results: https://gofile.io/d/7nOeEL

On Thu, Nov 26, 2020 at 1:54 PM Harry O <harryo.dk@gmail.com> wrote:
I would love to see something similar to your performance numbers, WK. Here are my gluster volume options and info:
[root@ovirtn1 ~]# gluster v info vmstore
Volume Name: vmstore
Type: Replicate
Volume ID: stuff
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirtn1.5ervers.lan:/gluster_bricks/vmstore/vmstore
Brick2: ovirtn2.5ervers.lan:/gluster_bricks/vmstore/vmstore
Brick3: ovirtn3.5ervers.lan:/gluster_bricks/vmstore/vmstore (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
If this is on, that means sharding is enabled, with the default shard size of 64 MB.
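A quick way to confirm this from the CLI (a small sketch, using the volume name vmstore from the output above):

gluster volume get vmstore features.shard
gluster volume get vmstore features.shard-block-size

If features.shard reports "on" and the block size is unchanged, shards are being created at the default 64 MB size.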
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
I think this option (network.remote-dio) should be disabled for direct I/O to take effect, since I can see performance.strict-o-direct is on. Try disabling it; it may help.
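For reference, the change Ritesh is suggesting would look roughly like the following (a sketch only; these options interact with the virt/oVirt profile, so test on a non-critical volume first):

gluster volume set vmstore network.remote-dio disable
gluster volume set vmstore performance.strict-o-direct on
# verify the new values
gluster volume get vmstore network.remote-dio
gluster volume get vmstore performance.strict-o-direct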
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: on
Does it look like sharding is on, Strahil Nikolov?
Running "gluster volume set vmstore group virt" had no effect.
I don't know why I ended up using the dsync flag. For a real-workload test, I ran CrystalDiskMark on a Windows VM; these are the results: https://gofile.io/d/7nOeEL

New results from the CentOS VM on vmstore:

[root@host2 ~]# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 26.6353 s, 40.3 MB/s
[root@host2 ~]# rm -rf /test12.img
[root@host2 ~]#
[root@host2 ~]# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 61.4851 s, 17.5 MB/s
[root@host2 ~]# rm -rf /test12.img
[root@host2 ~]#
[root@host2 ~]# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 28.2097 s, 38.1 MB/s
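When repeated runs swing this much (40 down to 17 and back up to 38 MB/s), one thing worth ruling out is background self-heal traffic competing for the same disks and network. A hedged check, again assuming the volume name vmstore (the summary form needs a reasonably recent Gluster release):

gluster volume heal vmstore info summary
# or the full per-brick list of entries still pending heal
gluster volume heal vmstore info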

Well, I just reviewed my previous test and realized that I made a mistake on the gluster mount test. I had up-arrowed through the shell history and used of="/test12.img" instead of "./test12", which meant I was testing on the bare metal root partition even though I had cd'ed into the Gluster mount. My "actual" Gluster results on the 1G network with 7200 rpm drives are in the 20-24 MB/s range using your values. Sorry for the confusion. Your results are consistent.
-wk

On 11/26/2020 12:29 AM, Harry O wrote:
New results from the CentOS VM on vmstore:
[root@host2 ~]# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 26.6353 s, 40.3 MB/s
[root@host2 ~]# rm -rf /test12.img
[root@host2 ~]#
[root@host2 ~]# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 61.4851 s, 17.5 MB/s
[root@host2 ~]# rm -rf /test12.img
[root@host2 ~]#
[root@host2 ~]# dd if=/dev/zero of=/test12.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 28.2097 s, 38.1 MB/s
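A small habit that avoids the kind of mix-up WK describes above is to confirm which filesystem the target path actually sits on immediately before each run, for example:

# show the mount backing the current directory (run from the test directory)
findmnt -T .
df -h .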

For that workload (using that particular test with dsync), that is what I saw on the mounted Gluster volume, given the 7200 rpm drives and simple 1G network. Next week I'll make a point of running your test with bonded ethernet to see if that improves things.

Note: our testing uses the following:

for size in `echo 50M 10M 1M`
do
echo 'starting'
pwd
echo "$size"
dd if=/dev/zero of=./junk bs=$size count=100 oflag=direct; rm ./junk
done

so we are doing multiple copies of much smaller files, and this is what I see on that kit:

SIZE = 50M
1.01 0.84 0.77 2/388 28977
100+0 records in
100+0 records out
5242880000 bytes (5.2 GB, 4.9 GiB) copied, 70.262 s, 74.6 MB/s

SIZE = 10M
3.88 1.79 1.11 2/400 29336
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 15.8082 s, 66.3 MB/s

SIZE = 1M
3.93 1.95 1.18 1/394 29616
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.67975 s, 62.4 MB/s

With teamd (bonding) I would expect an approximately 40-50% speed increase (which is why I didn't catch my error earlier, as I am used to seeing values in the 80s).

On 11/26/2020 11:11 PM, Harry O wrote:
So my gluster performance results are expected?

Ok guys, now my setup is like this:
2 x servers with 5 x 4TB 7200 rpm drives in raidz1 and a 10G storage network (MTU 9000) in each - these hold my gluster_bricks folders
1 x SFF workstation with 2 x 50GB SSDs in a ZFS mirror - my gluster_bricks folder for the arbiter

My gluster vol info looks like this:

Volume Name: vmstore
Type: Replicate
Volume ID: 7deac39b-3109-4229-b99f-afa50fc8d5a1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirtn1.5erverssan.lan:/gluster_bricks/vmstore/vmstore
Brick2: ovirtn2.5erverssan.lan:/gluster_bricks/vmstore/vmstore
Brick3: ovirtn3.5erverssan.lan:/gluster_bricks/vmstore/vmstore (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.strict-o-direct: off
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: on

And my test results look like this:

starting on engine /tmp
50M
dd: error writing './junk': No space left on device
40+0 records in
39+0 records out
2044723200 bytes (2.0 GB, 1.9 GiB) copied, 22.1341 s, 92.4 MB/s
starting /tmp
10M
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 11.4612 s, 91.5 MB/s
starting /tmp
1M
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.602421 s, 174 MB/s

starting on node1 /gluster_bricks
50M
100+0 records in
100+0 records out
5242880000 bytes (5.2 GB, 4.9 GiB) copied, 40.8802 s, 128 MB/s
starting /gluster_bricks
10M
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 7.49434 s, 140 MB/s
starting /gluster_bricks
1M
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.164098 s, 639 MB/s

starting on node2 /gluster_bricks
50M
100+0 records in
100+0 records out
5242880000 bytes (5.2 GB, 4.9 GiB) copied, 22.0764 s, 237 MB/s
starting /gluster_bricks
10M
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 4.32239 s, 243 MB/s
starting /gluster_bricks
1M
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0584058 s, 1.8 GB/s

I don't know why my ZFS arrays perform differently; it's the same drives with the same config. Is this performance normal or bad? I think it is too bad, hmm... Any tips or tricks for this?
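When two pools built from the same drives show such different brick-level numbers (128 vs 237 MB/s on the 50M run), comparing the pool and dataset properties on node1 and node2 side by side is a reasonable first step. A hedged sketch, with the pool and dataset names as placeholders:

# pool health, vdev layout, and any scrub/resilver in progress
zpool status -v
# alignment shift chosen at pool creation (should match on both nodes)
zpool get ashift <pool>
# dataset properties that strongly affect write throughput
zfs get compression,sync,recordsize,atime,xattr <pool>/<dataset>
# live per-vdev throughput while a test is running
zpool iostat -v <pool> 5

Differences in ashift, compression or sync behaviour between the two nodes would be the first things to look at.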
Participants (5):
- Harry O
- Ritesh Chikatwar
- Strahil Nikolov
- WK
- wkmail