Hi,
Thank you, Alex. I was looking for some optimisation settings as well, since
I am pretty much in the same boat, using SSD-based distributed-replicated
volumes across 12 hosts.
Could anyone else (maybe even someone from the oVirt or RHEV team) validate
these settings or add some other tweaks, so we can use them as a standard?
Thank you very much again!
On Mon, Apr 15, 2019, 05:56 Alex McWhirter <alex(a)triadic.us> wrote:
On 2019-04-14 20:27, Jim Kusznir wrote:
Hi all:
I've had I/O performance problems pretty much since the beginning of using
oVirt. I've applied several upgrades as time went on, but strangely, none
of them have alleviated the problem. VM disk I/O is still very slow to the
point that running VMs is often painful; it notably affects nearly all my
VMs, and makes me leery of starting any more. I'm currently running 12 VMs
and the hosted engine on the stack.
My configuration started out with 1Gbps networking and hyperconverged
gluster running on a single SSD on each node. It worked, but I/O was
painfully slow. I also started running out of space, so I added an SSHD on
each node, created another gluster volume, and moved VMs over to it. I
also ran that on a dedicated 1Gbps network. I had recurring disk failures
(it seems the disks only lasted about 3-6 months; I warrantied all three at
least once, and some twice, before giving up). I suspect the Dell PERC 6/i
was partly to blame; the RAID card refused to see/acknowledge the disk, but
plugging it into a normal PC showed no signs of problems. In any case,
performance on that storage was notably bad, even though the gig-e
interface was rarely taxed.
I nonetheless put in 10Gbps Ethernet and moved all the storage onto that,
as several people here said that 1Gbps just wasn't fast enough. Some
aspects improved a bit, but disk I/O is still slow. And I was still having
problems with the SSHD data gluster volume eating disks, so I bought a
dedicated NAS server (a Supermicro 12-disk dedicated FreeNAS NFS storage
system on 10Gbps Ethernet) and set that up. I found that it was actually
FASTER than the SSD-based gluster volume, but still slow. Lately it's been
getting slower, too... I don't know why. The FreeNAS server reports network
loads around 4MB/s on its 10GbE interface, so it's not network constrained.
At 4MB/s, I'd sure hope the 12-spindle SAS interface wasn't constrained
either..... (and disk I/O operations on the NAS itself complete much
faster).
So, running a test on my NAS against an ISO file I haven't accessed in
months:
# dd
if=en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_x64_dvd_x15-59754.iso
of=/dev/null bs=1024k count=500
500+0 records in
500+0 records out
524288000 bytes transferred in 2.459501 secs (213168465 bytes/sec)
Running it on one of my hosts:
root@unifi:/home/kusznir# time dd if=/dev/sda of=/dev/null bs=1024k
count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 7.21337 s, 72.7 MB/s
(I don't know if this is a true apples-to-apples comparison, as I don't
have a large file inside this VM's image.) Even this is faster than I
often see.
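A rough way to make a test like this more apples-to-apples (assuming GNU dd
on the Linux side; the bs/count values just mirror the run above) would be
to bypass the page cache with direct I/O, e.g.:

# read straight from the device, skipping the page cache
dd if=/dev/sda of=/dev/null bs=1024k count=500 iflag=direct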
I have a VoIP phone server running as a VM. Voicemail and other
recordings usually fail due to I/O issues opening and writing the files.
Often, the first 4 or so seconds of the recording are missed; sometimes the
entire thing just fails. I didn't use to have this problem, but it's
definitely been getting worse. I finally bit the bullet and ordered a
physical server dedicated to my VoIP system... But I still want to figure
out why I'm having all these I/O problems. I read on the list about people
running 30+ VMs... I feel that my I/O can't take any more VMs with any
semblance of reliability. We have a QuickBooks server on here too
(Windows), and the performance is abysmal; my CPA is charging me extra
because of all the lost staff time waiting on the system to respond and
generate reports.....
I'm at my wits' end... I started with gluster on SSD with a 1Gbps network,
migrated to a 10Gbps network, and now to a dedicated high-performance NAS
box over NFS, and I still have performance issues..... I don't know how to
troubleshoot the issue any further, but I've never had these kinds of
issues when I was playing with other VM technologies. I'd like to get to
the point where I can resell virtual servers to customers, but I can't do
so with my current performance levels.
I'd greatly appreciate help troubleshooting this further.
--Jim
I've been working on optimizing the same. This is where I'm at currently.
Gluster volume settings (a sketch for applying them follows the list):
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.write-behind-window-size: 64MB
performance.flush-behind: on
performance.stat-prefetch: on
server.event-threads: 4
client.event-threads: 8
performance.io-thread-count: 32
network.ping-timeout: 30
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
storage.owner-gid: 36
storage.owner-uid: 36
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: off
transport.address-family: inet
nfs.disable: off
performance.client-io-threads: on
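A minimal sketch of applying the above (the volume name "data1" is just a
placeholder; substitute your own):

VOL=data1
gluster volume set $VOL performance.write-behind-window-size 64MB
gluster volume set $VOL performance.flush-behind on
gluster volume set $VOL server.event-threads 4
gluster volume set $VOL client.event-threads 8
# ...and so on for the remaining options listed above, then verify with:
gluster volume get $VOL all | grep -E 'event-threads|write-behind'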
sysctl options (a sketch for persisting them follows the list):
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_congestion_control=htcp
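To persist these across reboots, something like the following (the file name
is just an example; htcp also needs the tcp_htcp module available):

cat > /etc/sysctl.d/90-storage-net.conf <<'EOF'
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_congestion_control = htcp
EOF
sysctl -p /etc/sysctl.d/90-storage-net.conf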
Custom /sbin/ifup-local file; Storage is the bridge name, which sits on top
of bond2 (ens3f0/ens3f1):
#!/bin/bash
# Runs after an interface comes up; only acts on the Storage bridge.
case "$1" in
Storage)
    # disable checksum/segmentation offloads on the underlying NICs
    /sbin/ethtool -K ens3f0 tx off rx off tso off gso off
    /sbin/ethtool -K ens3f1 tx off rx off tso off gso off
    # raise transmit queue lengths on the NICs, the bond, and the bridge
    /sbin/ip link set dev ens3f0 txqueuelen 10000
    /sbin/ip link set dev ens3f1 txqueuelen 10000
    /sbin/ip link set dev bond2 txqueuelen 10000
    /sbin/ip link set dev Storage txqueuelen 10000
    ;;
*)
    ;;
esac
exit 0
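To sanity-check that the settings actually stick after an ifup, something
like:

ethtool -k ens3f0 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'
ip link show dev bond2 | grep qlen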
I still have some latency issues, but my writes are up to 264MB/s
sequential on HDDs.
Output of CrystalDiskMark on a Windows 10 VM:
Sequential Read (Q= 32,T= 1) : 688.536 MB/s
Sequential Write (Q= 32,T= 1) : 264.254 MB/s
Random Read 4KiB (Q= 8,T= 8) : 176.069 MB/s [ 42985.6 IOPS]
Random Write 4KiB (Q= 8,T= 8) : 63.217 MB/s [ 15433.8 IOPS]
Random Read 4KiB (Q= 32,T= 1) : 159.598 MB/s [ 38964.4 IOPS]
Random Write 4KiB (Q= 32,T= 1) : 54.212 MB/s [ 13235.4 IOPS]
Random Read 4KiB (Q= 1,T= 1) : 3.488 MB/s [ 851.6 IOPS]
Random Write 4KiB (Q= 1,T= 1) : 3.006 MB/s [ 733.9 IOPS]
Also, enabling libgfapi on the engine was the best performance option I
ever tweaked; it easily doubled reads/writes.
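A sketch of enabling it (the LibgfApiSupported key and the --cver value are
assumptions to verify against your oVirt version; run on the engine VM and
restart the engine afterwards):

# allow VMs to access Gluster disks via libgfapi instead of the FUSE mount
engine-config -s LibgfApiSupported=true --cver=4.2
systemctl restart ovirt-engine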