On Wed, Sep 8, 2021 at 12:15 PM Mathieu Valois <mvalois(a)teicee.com> wrote:
Sorry for double post but I don't know if this mail has been
received.
Hello everyone,
I know this issue has already been discussed on this mailing list, but none
of the proposed solutions satisfies me.
Here is my situation: I have 3 hyperconverged Gluster oVirt nodes, each with
6 network interfaces bonded in pairs (management, VMs and gluster). The
gluster network is on a dedicated bond whose 2 interfaces are directly
connected to the 2 other oVirt nodes. Gluster is apparently using it:
# gluster volume status vmstore
Status of volume: vmstore
Gluster process                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-ov1:/gluster_bricks/vmstore/vmstore  49152     0          Y       3019
Brick gluster-ov2:/gluster_bricks/vmstore/vmstore  49152     0          Y       3009
Brick gluster-ov3:/gluster_bricks/vmstore/vmstore
where 'gluster-ov{1,2,3}' are domain names referencing nodes on the
gluster network. This network has 10 Gbps capacity:
# iperf3 -c gluster-ov3
Connecting to host gluster-ov3, port 5201
[  5] local 10.20.0.50 port 46220 connected to 10.20.0.51 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00  sec  1.16 GBytes  9.92 Gbits/sec   17    900 KBytes
[  5]   1.00-2.00  sec  1.15 GBytes  9.90 Gbits/sec    0    900 KBytes
[  5]   2.00-3.00  sec  1.15 GBytes  9.90 Gbits/sec    4    996 KBytes
[  5]   3.00-4.00  sec  1.15 GBytes  9.90 Gbits/sec    1    996 KBytes
[  5]   4.00-5.00  sec  1.15 GBytes  9.89 Gbits/sec    0    996 KBytes
[  5]   5.00-6.00  sec  1.15 GBytes  9.90 Gbits/sec    0    996 KBytes
[  5]   6.00-7.00  sec  1.15 GBytes  9.90 Gbits/sec    0    996 KBytes
[  5]   7.00-8.00  sec  1.15 GBytes  9.91 Gbits/sec    0    996 KBytes
[  5]   8.00-9.00  sec  1.15 GBytes  9.90 Gbits/sec    0    996 KBytes
[  5]   9.00-10.00 sec  1.15 GBytes  9.90 Gbits/sec    0    996 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00 sec  11.5 GBytes  9.90 Gbits/sec   22         sender
[  5]   0.00-10.04 sec  11.5 GBytes  9.86 Gbits/sec              receiver
iperf Done.
The network seems fine.
However, VMs stored on the vmstore gluster volume have poor write
performance, oscillating between 100 KB/s and 30 MB/s. I almost always
observe a write spike (180 Mbps) at the beginning, until around 500 MB has
been written; then it falls drastically to 10 MB/s, sometimes even less
(100 KB/s). The hypervisors have 32 threads (2 sockets, 8 cores per socket,
2 threads per core).
Here are the volume settings:
Volume Name: vmstore
Type: Replicate
Volume ID: XXX
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
This looks like a replica 3 volume. In this case the VM writes everything
3 times, once per replica. The writes are done in parallel, but the data
is sent over the wire 2-3 times (e.g. 2 if one of the bricks is on the
local host).
You may get better performance with replica 2 + arbiter:
https://gluster.readthedocs.io/en/latest/Administrator-Guide/arbiter-volu...
In this case data is written only to 2 bricks, and the arbiter brick holds
only metadata.
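A back-of-envelope calculation (an illustration, assuming the dedicated
10 Gbit/s link is the bottleneck and all 3 bricks are remote) shows why the
replica count caps client write throughput:

```shell
# With replica 3 and no local brick, each byte crosses the client's NIC
# 3 times, so the best case is roughly 1/3 of the link bandwidth.
awk 'BEGIN { printf "%.2f Gbit/s\n", 10 / 3 }'
```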
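For reference, an arbiter volume is created with the `arbiter` keyword. This
is only a sketch for a fresh volume (the host names and brick paths mirror
the ones above and would need adapting; an existing volume cannot simply be
recreated in place without migrating the data):

```shell
# Sketch: replica 3 with 1 arbiter - the third brick (gluster-ov3 here)
# stores only metadata, so data crosses the wire twice instead of 3 times.
gluster volume create vmstore replica 3 arbiter 1 \
    gluster-ov1:/gluster_bricks/vmstore/vmstore \
    gluster-ov2:/gluster_bricks/vmstore/vmstore \
    gluster-ov3:/gluster_bricks/vmstore/vmstore
```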
Transport-type: tcp
Bricks:
Brick1: gluster-ov1:/gluster_bricks/vmstore/vmstore
Brick2: gluster-ov2:/gluster_bricks/vmstore/vmstore
Brick3: gluster-ov3:/gluster_bricks/vmstore/vmstore
Options Reconfigured:
performance.io-thread-count: 32 # was 16 by default.
cluster.granular-entry-heal: enable
storage.owner-gid: 36
storage.owner-uid: 36
cluster.lookup-optimize: off
server.keepalive-count: 5
server.keepalive-interval: 2
server.keepalive-time: 10
server.tcp-user-timeout: 20
network.ping-timeout: 30
server.event-threads: 4
client.event-threads: 8 # was 4 by default
cluster.choose-local: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
performance.strict-o-direct: on
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
When I naively write directly to the logical volume, which sits on a
hardware RAID5 array of 3 disks, performance is much better:
# dd if=/dev/zero of=a bs=4M count=2048
2048+0 records in
2048+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 17.2485 s, 498 MB/s
# /dev/urandom as input gives around 200 MB/s
There are a few issues with this test:
- you don't use oflag=direct or conv=fsync, so this may test copying data
to the host page cache instead of writing data to storage
- it tests only sequential writes, which is the best case for any kind of
storage
- it uses synchronous I/O: every write waits for the previous write to
complete
- it uses a single process
- 2g is too small and may just test your cache performance
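A more honest dd run (still sequential and single-process, but at least
bypassing the page cache) would look like the sketch below; note that
oflag=direct can fail on filesystems without O_DIRECT support, so adapt as
needed:

```shell
# Bypass the page cache with O_DIRECT and flush at the end with fsync,
# so the reported rate reflects the storage rather than host memory.
dd if=/dev/zero of=dd-test.bin bs=4M count=2048 oflag=direct conv=fsync
```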
Try testing with fio instead; attached is a fio script that tests
sequential and random I/O with various queue depths.
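The attached bench.fio is not reproduced here; a job file of roughly this
shape (contents assumed, not the actual attachment) exercises sequential
and random writes at different queue depths:

```ini
; Hypothetical sketch, NOT the attached bench.fio.
[global]
ioengine=libaio
direct=1
size=4g
runtime=60
time_based=1

[seq-write-qd1]
rw=write
bs=1m
iodepth=1

[rand-write-qd16]
stonewall
rw=randwrite
bs=4k
iodepth=16
```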
You can use it like this:
fio --filename=/path/to/fio.data --output=test.out bench.fio
Test both on the host, and in the VM. This will give you more detailed
results that may help to evaluate the issue, and it may help Gluster
folks to advise on tuning your storage.
Nir