Poor performance of oVirt Export Domain

Hello,

I deployed a dedicated server (fs05.holding.com) on CentOS 7.5 and created a VDO volume on it. A local write test into the VDO volume on this server gives an acceptable result:

# dd if=/dev/zero of=/mnt/vdo-vd1/nfs/testfile count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 6.26545 s, 81.7 MB/s

The VDO volume is attached to the oVirt 4.2.5 cluster as an Export Domain via NFS.

I'm seeing a problem with low performance of the Export Domain: snapshots of virtual machines are copied to it very slowly, at approximately 6-8 MB/s. At the same time, if I run a write test in the mounted NFS directory on any of the oVirt cluster hosts, I get about 50-70 MB/s:

# dd if=/dev/zero of=/rhev/data-center/mnt/fs05.holding.com:_mnt_vdo-vd1_nfs_ovirt-vm-backup/testfile count=10000000
10000000+0 records in
10000000+0 records out
5120000000 bytes (5.1 GB) copied, 69.5506 s, 73.6 MB/s

Help me understand the reason for the slow copying to the Export Domain in oVirt.

On Fri, Aug 3, 2018 at 12:12 PM <Aleksey.I.Maksimov@yandex.ru> wrote:
Hello
I deployed a dedicated server (fs05.holding.com) on CentOS 7.5 and created a VDO volume on it. The local write test in the VDO volume on this server gives us an acceptable result
# dd if=/dev/zero of=/mnt/vdo-vd1/nfs/testfile count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 6.26545 s, 81.7 MB/s
This is not a good test for copying images:
- You are not using direct I/O.
- You are using a block size of 512 bytes, which is way too small.
- You don't sync at the end of the transfer.
- You don't copy a real image; reading zeros takes no time, while reading a real image does. This alone can cut dd throughput in half.

A better way to test this is:

dd if=/path/to/src of=/path/to/dst bs=8M count=1280 iflag=direct oflag=direct conv=fsync status=progress

This does not optimize copying of the sparse parts of the image. For that you can use qemu-img, which is what oVirt uses. The command oVirt runs is:

qemu-img convert -p -f raw|qcow2 -O raw|qcow2 -t none -T none /path/to/src /path/to/dst
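For example, putting both suggestions together against the paths mentioned in this thread could look like the sketch below (the source location, file size, and raw image format are assumptions made for illustration only):

# 1) Create a ~1 GiB test file with non-zero data, so that reads are not free.
#    SRC is a placeholder for a location on the source storage.
SRC=/path/to/source-storage/src.img
DST=/rhev/data-center/mnt/fs05.holding.com:_mnt_vdo-vd1_nfs_ovirt-vm-backup/dst.img
dd if=/dev/urandom of="$SRC" bs=8M count=128 conv=fsync

# 2) Plain copy with direct I/O, a large block size and a final sync
#    (status=progress needs coreutils >= 8.24; drop it if unavailable).
dd if="$SRC" of="$DST" bs=8M iflag=direct oflag=direct conv=fsync status=progress

# 3) Sparse-aware copy, the way oVirt copies disks (raw-to-raw in this sketch).
qemu-img convert -p -f raw -O raw -t none -T none "$SRC" "$DST"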
The disk capacity of the VDO volume is connected to the oVirt 4.2.5 cluster as the Export Domain via NFS.
I'm seeing a problem with the low performance of the Export Domain. Snapshots of virtual machines are copied very slowly to the Export Domain, approximately 6-8 MB/s.
This is very, very low throughput.

Can you give more details?
- The source domain: how is it connected (iSCSI/FC)?
- The destination domain: how is it connected (NFS 4.2? 1G NIC? 10G NIC?)
- The source image: can you attach the output of the following (an illustrative example of this output is shown below)?

  qemu-img map --output json /path/to/src

- If the source image is on block storage, please copy it to a file system supporting sparseness over NFS 4.2 using:

  qemu-img convert -p -f raw -O raw -t none -T none /path/to/src /path/to/dst

  (if the image is qcow2, replace "raw" with "qcow2")
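For reference, qemu-img map --output json on a raw image typically prints a JSON array of extents like the one below (all values here are invented for illustration). Extents reported with "zero": true are holes that do not need to be copied, so the mix of data and zero extents largely determines how long a sparse-aware copy takes:

qemu-img map --output json /path/to/src
[{ "start": 0, "length": 67108864, "depth": 0, "zero": false, "data": true, "offset": 0},
{ "start": 67108864, "length": 134217728, "depth": 0, "zero": true, "data": false, "offset": 67108864},
{ "start": 201326592, "length": 1946157056, "depth": 0, "zero": false, "data": true, "offset": 201326592}]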
At the same time, if I try to run a write test in a mounted NFS directory on any of the oVirt cluster hosts, I get about 50-70 MB/s.
# dd if=/dev/zero of=/rhev/data-center/mnt/fs05.holding.com:_mnt_vdo-vd1_nfs_ovirt-vm-backup/testfile count=10000000
10000000+0 records in
10000000+0 records out
5120000000 bytes (5.1 GB) copied, 69.5506 s, 73.6 MB/s
Again, not a good way to test.

This sounds like https://bugzilla.redhat.com/1511891 (the bug may be private).

Finally, can you provide detailed commands to reproduce your setup, so we can reproduce it in the lab?
- how you created the VDO volume
- how you created the file system on this volume
- NFS version/configuration on the server
- info about the server
- info about the network
- info about the host

Nir
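Most of the details asked for above can be collected on a CentOS 7 NFS server and an oVirt host with commands along these lines (a sketch only; the volume name vdo-vd1 and mount point are taken from the paths in this thread, and em1 is a placeholder NIC name):

# On the NFS server: VDO volume, file system and export configuration
vdo status --name=vdo-vd1
vdostats --human-readable
lsblk
xfs_info /mnt/vdo-vd1            # or tune2fs -l <device> if the file system is ext4
cat /etc/exports

# On the oVirt host: negotiated NFS mount options and NIC settings
nfsstat -m
ethtool em1                      # replace em1 with the actual NIC name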

Hello Nir,

Thanks for the advice on using dd. I found the main reason for the low performance: the problem was a faulty battery in the cache module of the disk shelf. We were also using a low-performance RAID configuration.

Thanks for the help.