
I'm trying to get optimal iSCSI performance. We're a heavy iSCSI shop, with a 10G network.
I'm experimenting with SSDs, and the performance in oVirt is way, way less than I would have hoped. More than an order of magnitude slower.
Here's a datapoint. I'm running filebench, with the OLTP workload.
First, I run it on one of the hosts, which has an SSD directly attached. I create an XFS filesystem (on a VG "device" on top of the SSD), mount it with noatime, and run the benchmark.
37166: 74.084: IO Summary: 3746362 ops, 62421.629 ops/s, (31053/31049 r/w), 123.6mb/s, 161us cpu/op, 1.1ms latency
I then unmount it, make the exact same device an iSCSI target, and create a storage domain with it. I then create a disk for a VM running *on the same host*, and run the benchmark. The same thing: filebench, OLTP workload, XFS filesystem, noatime.
13329: 91.728: IO Summary: 153548 ops, 2520.561 ops/s, (1265/1243 r/w), 4.9mb/s, 289us cpu/op, 88.4ms latency
62,000 ops/s vs 2,500 ops/s. What????
Someone might be tempted to say, "try making the device directly available, AS a device, to the VM". Unfortunately, this is not an option. My goal is specifically to put together a new, high-performing storage domain that I can use for database devices in VMs.
I'm not expecting the same 62,000 ops/s, but I was expecting at *least* 5,000. Ideally more like 10,000.
--
Philip Brown | Sr. Linux System Administrator | Medata, Inc.
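To put a number on the gap between the two summary lines in this post, a small script can parse them and compute the ratio. This is only a sketch: it is not part of filebench, and the regular expression and the names `SUMMARY_RE` and `parse_summary` are assumptions based solely on the line format as it appears in this thread.

```python
import re

# Assumed layout of a filebench "IO Summary" line, taken from the lines
# quoted in this thread (not from filebench documentation).
SUMMARY_RE = re.compile(
    r"IO Summary:\s*(?P<ops>\d+) ops,\s*"
    r"(?P<ops_per_s>[\d.]+) ops/s,\s*"
    r"\((?P<reads>\d+)/(?P<writes>\d+) r/w\),\s*"
    r"(?P<mb_per_s>[\d.]+)mb/s,\s*"
    r"(?P<cpu_us>\d+)us cpu/op,\s*"
    r"(?P<latency_ms>[\d.]+)ms latency"
)


def parse_summary(line):
    """Return the numeric fields of one IO Summary line as floats."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not an IO Summary line: {line!r}")
    return {name: float(value) for name, value in match.groupdict().items()}


# The two result lines from the post above.
host = parse_summary(
    "37166: 74.084: IO Summary: 3746362 ops, 62421.629 ops/s, "
    "(31053/31049 r/w), 123.6mb/s, 161us cpu/op, 1.1ms latency"
)
vm = parse_summary(
    "13329: 91.728: IO Summary: 153548 ops, 2520.561 ops/s, "
    "(1265/1243 r/w), 4.9mb/s, 289us cpu/op, 88.4ms latency"
)

slowdown = host["ops_per_s"] / vm["ops_per_s"]
latency_ratio = vm["latency_ms"] / host["latency_ms"]
print(f"throughput ratio host/VM: {slowdown:.1f}x")
print(f"latency ratio VM/host: {latency_ratio:.1f}x")
```

The same function can be fed the output of a `grep Summary` over several result files to compare runs side by side.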

On Mon, Jul 20, 2020 at 8:51 PM Philip Brown <pbrown@medata.com> wrote:
I'm trying to get optimal iscsi performance. We're a heavy iscsi shop, with 10g net.
I'm experimenting with SSDs, and the performance in oVirt is way, way less than I would have hoped. More than an order of magnitude slower.
Here's a datapoint. I'm running filebench, with the OLTP workload.
Did you try fio? https://fio.readthedocs.io/en/latest/fio_doc.html I think this is the most common and advanced tool for such tests.
First, I run it on one of the hosts, which has an SSD directly attached. I create an XFS filesystem (on a VG "device" on top of the SSD), mount it with noatime, and run the benchmark.
37166: 74.084: IO Summary: 3746362 ops, 62421.629 ops/s, (31053/31049 r/w), 123.6mb/s, 161us cpu/op, 1.1ms latency
What do you get if you login to the target on the host and access the LUN directly on the host? If you create a file system on the LUN and mount it on the host?
I then unmount it, and make the exact same device an iscsi target, and create a storage domain with it. I then create a disk for a VM running *on the same host*, and run the benchmark.
What kind of disk? thin? preallocated?
The same thing: filebench, oltp workload, xfs filesystem, noatime.
13329: 91.728: IO Summary: 153548 ops, 2520.561 ops/s, (1265/1243 r/w), 4.9mb/s, 289us cpu/op, 88.4ms latency
4.9mb/s looks very low. Are you testing very small random writes?
62,000 ops/s vs 2500 ops/s.
what????
Someone might be tempted to say, "try making the device directly available, AS a device, to the VM". Unfortunately, this is not an option. My goal is specifically to put together a new, high-performing storage domain that I can use for database devices in VMs.
This is something to discuss with qemu folks. oVirt is just an easy way to manage VMs.
Please attach the VM XML using:
    virsh -r dumpxml vm-name-or-id
And the qemu command line from:
    /var/log/libvirt/qemu/vm-name.log
I think you will get the best performance using direct LUN. Storage domain is best if you want to use features provided by storage domain. If your important feature is performance, you want to connect the storage in the most direct way to your VM.
Mordechai, did we do any similar performance tests in our lab? Do you have example results?
Nir

On Mon, Jul 20, 2020, 23:42 Nir Soffer <nsoffer@redhat.com> wrote:
I think you will get the best performance using direct LUN.
Is direct LUN using the QEMU iSCSI initiator, or SG_IO, and if so is it using /dev/sg or has that been fixed? SG_IO is definitely not going to be the fastest, especially with /dev/sg.
Storage domain is best if you want to use features provided by storage domain. If your important feature is performance, you want to connect the storage in the most direct way to your VM.
Agreed but you want a virtio-blk device, not SG_IO; direct LUN with SG_IO is only recommended if you want to do clustering and other stuff that requires SCSI-level access. Paolo
Mordechai, did we do any similar performance tests in our lab? Do you have example results?
Nir

FYI, I just tried it with direct LUN. It is as bad or worse. I don't know about SG_IO vs. the QEMU initiator, but here are the results.
15223: 62.824: IO Summary: 83751 ops, 1387.166 ops/s, (699/681 r/w), 2.7mb/s, 619us cpu/op, 281.4ms latency
15761: 62.268: IO Summary: 77610 ops, 1287.908 ops/s, (649/632 r/w), 2.5mb/s, 686us cpu/op, 283.0ms latency
16397: 61.812: IO Summary: 94065 ops, 1563.781 ops/s, (806/750 r/w), 3.0mb/s, 894us cpu/op, 217.3ms latency
----- Original Message -----
From: "Paolo Bonzini" <pbonzini@redhat.com>
To: "Nir Soffer" <nsoffer@redhat.com>
Cc: "Philip Brown" <pbrown@medata.com>, "users" <users@ovirt.org>, "qemu-block" <qemu-block@nongnu.org>, "Stefan Hajnoczi" <stefanha@redhat.com>, "Sergio Lopez Pascual" <slp@redhat.com>, "Mordechai Lehrer" <mlehrer@redhat.com>
Sent: Monday, July 20, 2020 3:46:39 PM
Subject: Re: [ovirt-users] very very bad iscsi performance

AAAAAH! My apologies. It seemed very odd, so I reviewed, and discovered that I messed up my testing of direct LUN. The updated results are improved from my previous email, but not any better than going through a normal storage domain.
18156: 61.714: IO Summary: 110396 ops, 1836.964 ops/s, (921/907 r/w), 3.6mb/s, 949us cpu/op, 27.3ms latency
17095: 61.794: IO Summary: 123458 ops, 2052.922 ops/s, (1046/996 r/w), 4.0mb/s, 858us cpu/op, 60.4ms latency
----- Original Message -----
From: "Philip Brown" <pbrown@medata.com>
To: "Paolo Bonzini" <pbonzini@redhat.com>
Cc: "Nir Soffer" <nsoffer@redhat.com>, "users" <users@ovirt.org>, "qemu-block" <qemu-block@nongnu.org>, "Stefan Hajnoczi" <stefanha@redhat.com>, "Sergio Lopez Pascual" <slp@redhat.com>, "Mordechai Lehrer" <mlehrer@redhat.com>
Sent: Monday, July 20, 2020 4:30:32 PM
Subject: Re: [ovirt-users] very very bad iscsi performance

Do you have NICs that support iSCSI? If so, I guess you can use hardware offloading. What is the MTU size? Latency is usually the killer of any performance; what is your round-trip time?
Best Regards,
Strahil Nikolov
On 21 July 2020 at 2:37:10 GMT+03:00, Philip Brown <pbrown@medata.com> wrote:
AAAAAH! my apologies. It seemed very odd, so I reviewed, and discovered that I messed up my testing of direct lun.
updated results are improved from my previous email, but not any better than going through normal storage domain.
18156: 61.714: IO Summary: 110396 ops, 1836.964 ops/s, (921/907 r/w), 3.6mb/s, 949us cpu/op, 27.3ms latency
17095: 61.794: IO Summary: 123458 ops, 2052.922 ops/s, (1046/996 r/w), 4.0mb/s, 858us cpu/op, 60.4ms latency
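The question about round-trip time can be made concrete with Little's law, which relates the number of requests in flight to throughput and latency. The sketch below is a generic back-of-the-envelope calculation, not a measurement from this setup; the function name `max_iops` and the latency and queue-depth values are made up for illustration.

```python
def max_iops(in_flight, latency_s):
    """Upper bound on I/O operations per second by Little's law.

    With `in_flight` requests outstanding and each one taking
    `latency_s` seconds end to end, throughput cannot exceed
    in_flight / latency_s.
    """
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return in_flight / latency_s


# Hypothetical per-request latencies (seconds) and queue depths.
for latency_ms in (0.1, 0.5, 1.0):
    for depth in (1, 64):
        bound = max_iops(depth, latency_ms / 1000.0)
        print(f"latency {latency_ms} ms, {depth:>2} in flight: "
              f"at most {bound:,.0f} IOPS")
```

The point of the arithmetic is that a workload with few outstanding requests is bounded by per-request latency long before it is bounded by link bandwidth, which is why the round-trip time matters more than the 10G figure.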

Yes, I am testing small writes. "OLTP workload" means simulation of OLTP database access.
You asked me to test the speed of iSCSI from another host, which is very reasonable. So here are the results, run from another node in the oVirt cluster. The setup is:
- exact same VG device, exported via iSCSI
- mounted directly on another physical host running CentOS 7, rather than in a VM running on it
- literally the same filesystem, again mounted noatime
I ran the same OLTP workload. This setup gives the following results over 2 runs.
grep Summary oltp.iscsimount.?
oltp.iscsimount.1:35906: 63.433: IO Summary: 648762 ops, 10811.365 ops/s, (5375/5381 r/w), 21.4mb/s, 475us cpu/op, 1.3ms latency
oltp.iscsimount.2:36830: 61.072: IO Summary: 824557 ops, 13741.050 ops/s, (6844/6826 r/w), 27.2mb/s, 429us cpu/op, 1.1ms latency
As requested, I attach the virsh output and the qemu log.

On Tue, Jul 21, 2020 at 2:20 AM Philip Brown <pbrown@medata.com> wrote:
As requested, I attach virsh output, and qemu log
What we see in your logs:
You are using:
- thin disk - qcow2 image on logical volume
- virtio-scsi
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/fddf9b00-6c80-4be2-85f1-4b3852d93786/images/47af0207-8b51-4a59-a93e-fddf9ed56d44/743550ef-7670-4556-8d7f-4d6fcfd5eb70'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>47af0207-8b51-4a59-a93e-fddf9ed56d44</serial>
      <boot order='1'/>
      <alias name='ua-47af0207-8b51-4a59-a93e-fddf9ed56d44'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    -object iothread,id=iothread1 \
    -device virtio-scsi-pci,iothread=iothread1,id=ua-a50f193d-fa74-419d-bf03-f5a2677acd2a,bus=pci.0,addr=0x5 \
    -drive file=/rhev/data-center/mnt/blockSD/87cecd83-d6c8-4313-9fad-12ea32768703/images/47af0207-8b51-4a59-a93e-fddf9ed56d44/743550ef-7670-4556-8d7f-4d6fcfd5eb70,format=qcow2,if=none,id=drive-ua-47af0207-8b51-4a59-a93e-fddf9ed56d44,serial=47af0207-8b51-4a59-a93e-fddf9ed56d44,werror=stop,rerror=stop,cache=none,aio=native \
This is the most flexible option oVirt has, but not the default.
A known issue with such a disk is possible pausing of the VM when the disk becomes full, if oVirt cannot extend the underlying logical volume fast enough. It can be mitigated by using larger chunks in vdsm.
We recommend these settings if you are going to use VMs with heavy I/O with thin disks:
    # cat /etc/vdsm/vdsm.conf.d/99-local.conf
    [irs]
    # Together with volume_utilization_chunk_mb, set the minimal free
    # space before a thin provisioned block volume is extended. Use lower
    # values to extend earlier.
    # default value:
    # volume_utilization_percent = 50
    volume_utilization_percent = 25
    # Size of extension chunk in megabytes, and together with
    # volume_utilization_percent, set the free space limit. Use higher
    # values to extend in bigger chunks.
    # default value:
    # volume_utilization_chunk_mb = 1024
    volume_utilization_chunk_mb = 4096
With this configuration, when free space on the disk is 1 GiB, oVirt will extend the disk by 4 GiB. So your disk may be up to 5 GiB larger than the used space, but if the VM is writing data very fast, the chance of pausing is reduced.
If you want to reduce the chance of pausing your database in the most busy times to zero, using a preallocated disk is the way.
In oVirt 4.4 you can check this option when creating a disk:
    [x] Enable Incremental Backup
With:
    Allocation Policy: [Preallocated]
You will get a preallocated disk in the specified size, using qcow2 format. This gives you the option to use incremental backup and faster disk operations in oVirt (since qemu-img does not need to read the entire disk), and avoids the pausing issue. It may also defeat thin provisioning, but if your backend storage supports thin provisioning anyway it does not matter.
To get the best performance for the database use case, a preallocated volume should be better.
Please try to benchmark:
- raw preallocated disk
- using virtio instead of virtio-scsi
If your database can use multiple disks, you may get better performance by adding multiple disks and using one iothread per disk.
See also this interesting talk about storage performance from 2017: https://events19.lfasiallc.com/wp-content/uploads/2017/11/Storage-Performanc...

Thank you for the analysis. I have some further comments:
First off, filebench pre-writes the files before doing OLTP benchmarks, so I don't think the thin provisioning is at play here. I will double check this, but if you don't hear otherwise, please presume that is the case :)
Secondly, I am surprised at your recommendation to use virtio instead of virtio-scsi, since the writeup for virtio-scsi claims it has equivalent performance in general, and adds better scaling: https://www.ovirt.org/develop/release-management/features/storage/virtio-scs...
As far as your suggestion for using multiple disks for scaling higher: we are using an SSD. Isn't the whole advantage of using SSD drives that you can get the IOPS performance of 10 drives out of a single drive? We certainly get that using it natively, outside of a VM. So it would be nice to see performance approaching that within an oVirt VM.
----- Original Message -----
From: "Nir Soffer" <nsoffer@redhat.com>
To: "Philip Brown" <pbrown@medata.com>
Cc: "users" <users@ovirt.org>, "qemu-block" <qemu-block@nongnu.org>, "Stefan Hajnoczi" <stefanha@redhat.com>, "Paolo Bonzini" <pbonzini@redhat.com>, "Sergio Lopez Pascual" <slp@redhat.com>, "Mordechai Lehrer" <mlehrer@redhat.com>, "Kevin Wolf" <kwolf@redhat.com>
Sent: Tuesday, July 21, 2020 4:23:36 AM
Subject: [BULK] Re: [ovirt-users] very very bad iscsi performance

On Tue, Jul 21, 2020 at 07:14:53AM -0700, Philip Brown wrote:
Thank you for the analysis. I have some further comments:
First off, filebench pre-writes the files before doing OLTP benchmarks, so I don't think the thin provisioning is at play here. I will double check this, but if you don't hear otherwise, please presume that is the case :)
Secondly, I am surprised at your recommendation to use virtio instead of virtio-scsi, since the writeup for virtio-scsi claims it has equivalent performance in general, and adds better scaling: https://www.ovirt.org/develop/release-management/features/storage/virtio-scs...
As far as your suggestion for using multiple disks for scaling higher: we are using an SSD. Isn't the whole advantage of using SSD drives that you can get the IOPS performance of 10 drives out of a single drive? We certainly get that using it natively, outside of a VM. So it would be nice to see performance approaching that within an oVirt VM.
Hi,
At first glance it appears that the filebench OLTP workload does not use O_DIRECT, so this isn't a measurement of pure disk I/O performance:
https://github.com/filebench/filebench/blob/master/workloads/oltp.f
If you suspect that disk performance is the issue please run a benchmark that bypasses the page cache using O_DIRECT.
The fio setting is direct=1.
Here is an example fio job for 70% read/30% write 4KB random I/O:
    [global]
    filename=/path/to/device
    runtime=120
    ioengine=libaio
    direct=1
    ramp_time=10 # start measuring after warm-up time
    [read]
    readwrite=randrw
    rwmixread=70
    rwmixwrite=30
    iodepth=64
    blocksize=4k
(Based on https://blog.vmsplice.net/2017/11/common-disk-benchmarking-mistakes.html)
Stefan
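For comparing runs of a job like the one above across the host, the iSCSI mount and the VM, it can help to have fio write JSON (for example with `fio --output-format=json --output=result.json job.fio`) and pull out the IOPS figures with a script. The following is a minimal sketch, not something posted in this thread: the function name `summarize_fio` is hypothetical, the sample document is made up purely to show the shape of the input, and only the `jobs`, `jobname`, `read.iops` and `write.iops` fields of fio's JSON output are assumed.

```python
import json


def summarize_fio(json_text):
    """Map each fio job name to its read, write and total IOPS."""
    report = json.loads(json_text)
    summary = {}
    for job in report["jobs"]:
        read_iops = job["read"]["iops"]
        write_iops = job["write"]["iops"]
        summary[job["jobname"]] = {
            "read_iops": read_iops,
            "write_iops": write_iops,
            "total_iops": read_iops + write_iops,
        }
    return summary


# Made-up sample in the assumed shape; these are NOT measured results.
SAMPLE = """
{"jobs": [{"jobname": "read",
           "read": {"iops": 7000.0},
           "write": {"iops": 3000.0}}]}
"""

for name, stats in summarize_fio(SAMPLE).items():
    print(name, stats)
```

Running the same job file in each environment and diffing the summaries keeps the comparison limited to one variable at a time.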

I'm in the middle of a priority issue right now, so I can't take time out to rerun the bench, but...
Usually in that kind of situation, if you don't turn on sync-to-disk on every write, you get benchmarks that are artificially HIGH. Forcing O_DIRECT slows throughput down. Don't you think the results are bad enough already? :-}
----- Original Message -----
From: "Stefan Hajnoczi" <stefanha@redhat.com>
To: "Philip Brown" <pbrown@medata.com>
Cc: "Nir Soffer" <nsoffer@redhat.com>, "users" <users@ovirt.org>, "qemu-block" <qemu-block@nongnu.org>, "Paolo Bonzini" <pbonzini@redhat.com>, "Sergio Lopez Pascual" <slp@redhat.com>, "Mordechai Lehrer" <mlehrer@redhat.com>, "Kevin Wolf" <kwolf@redhat.com>
Sent: Thursday, July 23, 2020 6:09:39 AM
Subject: Re: [BULK] Re: [ovirt-users] very very bad iscsi performance

Getting meaningful results is more important than getting good results. If the benchmark is not meaningful, it is not useful towards fixing the issue.
Did you try virtio-blk with direct LUN?
Paolo
On Thu, Jul 23, 2020, 16:35 Philip Brown <pbrown@medata.com> wrote:
I'm in the middle of a priority issue right now, so I can't take time out to rerun the bench, but... Usually in that kind of situation, if you don't turn on sync-to-disk on every write, you get benchmarks that are artificially HIGH. Forcing O_DIRECT slows throughput down. Don't you think the results are bad enough already? :-}

On Thu, Jul 23, 2020 at 07:25:14AM -0700, Philip Brown wrote:
Usually in that kind of situation, if you don't turn on sync-to-disk on every write, you get benchmarks that are artificially HIGH. Forcing O_DIRECT slows throughput down. Don't you think the results are bad enough already? :-}
The results that were posted do not show iSCSI performance in isolation so it's hard to diagnose the problem.
The page cache is used when the O_DIRECT flag is absent. I/O is not sent to the disk at all when it can be fulfilled from the page cache in memory. Therefore the benchmark is not an accurate indicator of disk I/O performance.
In addition to this, page cache behavior depends on various factors such as available free memory, operating system implementation and version, etc. This makes it hard to compare results across VMs, different machines, etc.
Stefan
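The difference between a buffered read and an O_DIRECT read can be seen at the system-call level. The sketch below is illustrative only and Linux-specific: the function name `read_first_block` and the 4096-byte block size are assumptions, and opening with `os.O_DIRECT` may fail with EINVAL on filesystems that do not support it (some tmpfs setups, for example).

```python
import mmap
import os

BLOCK_SIZE = 4096  # assumed block size; O_DIRECT needs aligned I/O


def read_first_block(path, direct):
    """Read up to BLOCK_SIZE bytes from the start of `path`.

    With direct=False the kernel may serve the data from the page
    cache. With direct=True the file is opened with O_DIRECT, so the
    read bypasses the page cache and goes to the device.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT  # Linux-only flag
    fd = os.open(path, flags)
    try:
        # An anonymous mmap is page-aligned, which satisfies the
        # buffer alignment requirement of O_DIRECT.
        with mmap.mmap(-1, BLOCK_SIZE) as buf:
            count = os.readv(fd, [buf])
            return buf[:count]
    finally:
        os.close(fd)
```

Timing repeated calls with `direct=False` and `direct=True` on the same file shows how much of a benchmark result can come from memory rather than from the storage path under test.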

I think that the best test is to:
0. Change only one thing in the infrastructure
1. Automatically create your VM
2. Install the necessary application on the VM from point 1
3. Restore the state of the app from backup
4. Run a typical workload on the app - for example a bunch of queries that are pushed against a typical DB
5. Measure performance during point 4 (for example time of execution)
6. Start over
Anything else is a waste of time.
Best Regards,
Strahil Nikolov
On 24 July 2020 at 13:26:18 GMT+03:00, Stefan Hajnoczi <stefanha@redhat.com> wrote:
On Thu, Jul 23, 2020 at 07:25:14AM -0700, Philip Brown wrote:
Usually in that kind of situation, if you don't turn on sync-to-disk on every write, you get benchmarks that are artificially HIGH. Forcing O_DIRECT slows throughput down. Don't you think the results are bad enough already? :-}
The results that were posted do not show iSCSI performance in isolation so it's hard to diagnose the problem.
The page cache is used when the O_DIRECT flag is absent. I/O is not sent to the disk at all when it can be fulfilled from the page cache in memory. Therefore the benchmark is not an accurate indicator of disk I/O performance.
In addition to this, page cache behavior depends on various factors such as available free memory, operating system implementation and version, etc. This makes it hard to compare results across VMs, different machines, etc.
Stefan
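The measuring part of the application-level test loop described in this message (points 4 to 6) could be scripted along these lines. This is a rough sketch under stated assumptions: `measure` and the `workload` callable are hypothetical names, and the VM creation, installation and restore steps are left out because they depend entirely on the environment.

```python
import statistics
import time


def measure(workload, runs=5):
    """Run `workload` several times and report wall-clock statistics.

    `workload` is any zero-argument callable, for example a function
    that pushes a fixed batch of queries against the restored database.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


# Placeholder workload standing in for the real application queries.
print(measure(lambda: sum(range(100_000)), runs=3))
```

Reporting the median together with the minimum and maximum makes it easier to see whether a single infrastructure change moved the result by more than the run-to-run noise.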
participants (5)
- Nir Soffer
- Paolo Bonzini
- Philip Brown
- Stefan Hajnoczi
- Strahil Nikolov