On Thu, Nov 11, 2021 at 4:33 AM Pascal D <pascal(a)butterflyit.com> wrote:
> I have been trying to figure out why cloning a VM and creating a template from ovirt is
> so slow. I am using ovirt 4.3.10 over NFS. My NFS server is running NFS 4 over RAID10 with
> SSD disks over a 10G network and 9000 MTU.
>
> Theoretically I should be writing a 50GB file in around 1m30s. A direct copy of an image
> to another image on the same host, run from the SPM host server, takes 6m34s. Cloning
> from ovirt takes around 29m.
>
> So quite a big difference. Therefore I started investigating and found that ovirt
> launches a qemu-img process with no source and target cache. Thinking that could be the
> issue, I changed the cache mode to writeback and was able to run the exact same command
> in 8m14s, over 3 times faster. I haven't yet tried other parameters like -o
> preallocation=metadata
-o preallocation=metadata may work for files; we don't use it since it is not
compatible with block storage (it requires allocating the entire volume upfront).
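If you want to experiment with it anyway on file storage, it is just another output
option. A sketch with placeholder paths (src.qcow2 and dst.qcow2 stand for the full
/rhev/... paths; this is not the exact command vdsm runs):

  qemu-img convert -p -t none -T none -f qcow2 src.qcow2 \
      -O qcow2 -o compat=1.1,preallocation=metadata dst.qcow2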
> but was wondering why no cache was selected and how to change it to
> use cache writeback
We don't use the host page cache. There are several issues:

- Reading stale data after another host changes an image on shared storage
  (this should probably not happen with NFS).
- Writing through the page cache pollutes it with data that is unlikely to be
  needed, since VMs also do not use the page cache (for other reasons), so
  during the copy memory that should be used by your VMs may be reclaimed.
- The kernel likes to buffer a huge amount of data and then flush too much data
  at the same time. This causes delays in accessing storage during flushing,
  which may break sanlock leases, since sanlock must have access to storage to
  update the storage leases.
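For reference, the writeback run described above is presumably the same command with
only the cache flags changed (placeholder paths; a sketch, not the exact command that
was run):

  qemu-img convert -p -t writeback -T writeback -f qcow2 src.qcow2 \
      -O qcow2 -o compat=1.1 dst.qcow2

This drops O_DIRECT, so the copy goes through the host page cache, which is exactly
what causes the issues listed above.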
We improved copy performance a few years ago using the -W option, which allows
unordered (concurrent) writes. This can speed up copies to block storage (iSCSI/FC)
up to 6 times [1]. When we tested this with NFS we did not see a big improvement,
so we did not enable it there. It is also recommended to use -W only for raw
preallocated disks, since otherwise it may cause fragmentation.
You can try to change this in vdsm/storage/sd.py:
    def recommends_unordered_writes(self, format):
        """
        Return True if unordered writes are recommended for copying an image
        using format to this storage domain.

        Unordered writes improve copy performance but are recommended only
        for preallocated devices and raw format.
        """
        return format == sc.RAW_FORMAT and not self.supportsSparseness
This allows -W only on raw preallocated disks. So it will not be used for
raw-sparse (NFS thin) or qcow2-sparse (snapshots on NFS), or for
qcow2 on block storage.
We use unordered writes for any disk in ovirt-imageio, and other tools
like nbdcopy
also always enable unordered writes, so maybe we should enable it in all cases.
To enable unordered writes for any volume, change this to:
    def recommends_unordered_writes(self, format):
        """
        Allow unordered writes for any image format on this storage domain.
        """
        return True
If you want to enable this only for file storage (NFS, GlusterFS, LocalFS, POSIX),
add this method in vdsm/storage/nfsSD.py:
class FileStorageDomainManifest(sd.StorageDomainManifest):
    ...

    def recommends_unordered_writes(self, format):
        """
        Override StorageDomainManifest to allow unordered writes also for
        qcow2 and raw sparse images.
        """
        return True
Please report how it works for you.
If this gives good results, file a bug to enable the option. I think we can enable
this based on vdsm configuration, so it will be easy to disable the option if it
causes trouble with some storage domain types or image formats.
> command launched by ovirt:
> /usr/bin/qemu-img convert -p -t none -T none -f qcow2
> /rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/21f438fb-0c0e-4bdc-abb3-64a7e033cff6/c256a972-4328-4833-984d-fa8e62f76be8
> -O qcow2 -o compat=1.1
> /rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/5a90515c-066d-43fb-9313-5c7742f68146/ed6dc60d-1d6f-48b6-aa6e-0e7fb1ad96b9
With the change suggested, this command will become:
/usr/bin/qemu-img convert -p -t none -T none -f qcow2
/rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/21f438fb-0c0e-4bdc-abb3-64a7e033cff6/c256a972-4328-4833-984d-fa8e62f76be8
-O qcow2 -o compat=1.1 -W
/rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/5a90515c-066d-43fb-9313-5c7742f68146/ed6dc60d-1d6f-48b6-aa6e-0e7fb1ad96b9
You can run this command in the shell, without modifying vdsm, to see how it
affects performance.
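For example, timing both variants (SRC and DST stand for the full /rhev/... paths
above; use a scratch destination volume you can overwrite):

  time /usr/bin/qemu-img convert -p -t none -T none -f qcow2 SRC -O qcow2 -o compat=1.1 DST
  time /usr/bin/qemu-img convert -p -t none -T none -f qcow2 SRC -O qcow2 -o compat=1.1 -W DST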
[1]
https://bugzilla.redhat.com/1511891#c57
Nir