Hi,
I've been using oVirt for years and have just discovered a rather strange
issue that causes extremely high iowait when using NFSv3 storage.
Here's a quick test on a CentOS 7.6 VM; the behaviour is the same on any of
our oVirt 4.2.x nodes (5 nodes in total, all showing the same results):
# CentOS VM
$ dd if=/dev/zero of=TEST02 bs=1M count=3000
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 141.649 s, 22.2 MB/s
# iostat output
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdb               0.00     0.00    1.00   50.00     0.00    23.02   924.39   121.62 2243.47 2301.00 2242.32  19.61 100.00
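(For reference, the iostat numbers here and below come from a second shell
inside the VM while dd was running, using something along the lines of
"iostat -xm 1 vdb".)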
As you can see, latency is beyond bad for both read and write requests
(await well over 2 seconds). During this test the underlying NFS storage
server was practically idle: its disks were barely doing anything and its
iowait was very low.
However, when using oflag=direct the test shows a completely different result:
# CentOS VM
$ dd if=/dev/zero of=TEST02 bs=1M count=3000 oflag=direct
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 21.0724 s, 149 MB/s
# iostat output
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdb               0.00     0.00    4.00  483.00     0.02   161.00   677.13     2.90    5.96    0.00    6.01   1.99  97.10
This test shows the *expected* performance of this small oVirt setup.
Notice how the latency stays healthy (await around 6 ms) even though the
throughput is now roughly 7x higher.
I think this second test shows two things: the NFS storage itself is fast
enough, and there is no networking/switch issue either.
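The only real difference between the two runs is buffered vs. direct I/O.
To keep page-cache timing effects out of the buffered numbers, a variant
that flushes the data before dd reports its result is probably also worth
running (I haven't done that yet), roughly:
# CentOS VM, buffered write, but fdatasync before dd prints the timing
$ dd if=/dev/zero of=TEST02 bs=1M count=3000 conv=fdatasync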
Still, under normal conditions (buffered I/O, i.e. without oflag=direct)
read/write operations are really slow and iowait inside the VM goes through
the roof.
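One more data point I still want to collect: the same kind of dd run directly
on one of the oVirt nodes, writing into the NFS data domain mount, to see
whether the slowness already shows up on the hypervisor or only inside the
guests. Something like this (mount path from memory, the exact directory name
will differ):
# oVirt node, writing straight into the NFS storage domain mount
$ cd /rhev/data-center/mnt/<nfs-server>:_<export>
$ dd if=/dev/zero of=ddtest.tmp bs=1M count=3000 conv=fdatasync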
Do these results make sense to anyone? Any hints on how to find out what's
wrong here? Any tests I should run, or sysctls/tunables that would make
sense to look at?
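In case it helps, I can also post the NFS mount options the nodes ended up
with and the guest's writeback tunables; I'd gather those with something like:
# oVirt node: NFS mount options as seen by the kernel
$ nfsstat -m
# CentOS VM: page-cache writeback settings
$ sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs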
FWIW, the iperf result between the oVirt node and the NFS storage server
looks good:
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 10.9 GBytes 9.36 Gbits/sec
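(That was a plain 10-second TCP test, roughly "iperf -c <nfs-server-ip>" on
the node against "iperf -s" on the storage server.)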
Regards
- Frank