
On Wed, Sep 30, 2020 at 1:49 PM Tomáš Golembiovský <tgolembi@redhat.com> wrote:
Hi,
currently, when we run virt-sparsify on a VM, or a user runs a VM with discard enabled, and the disk is on block storage in qcow2 format, the results are not reflected in oVirt. The blocks get discarded, the storage can reuse them and reports correct allocation statistics, but oVirt does not. In oVirt one can still see the original allocation for the disk and the storage domain, as it was before the blocks were discarded. This is super-confusing to users, because when they check after running virt-sparsify and see the same values, they think sparsification is not working, which is not true.
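For reference, the two cases mentioned can be reproduced roughly like this (the disk path is illustrative, and the second case assumes the guest disk was attached with discard enabled):

# Case 1: sparsify a disk image from the host (the image must not be in use):
virt-sparsify --in-place /path/to/disk.qcow2

# Case 2: from inside a running guest, trim all mounted filesystems:
fstrim -av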
This may be a documentation issue. This is a known limitation of oVirt thin provisioned storage: we allocate space as needed, but we release the space only when a volume is deleted.
This all seems to be because of the LVM layout we have on block storage domains. The feature page for discard [1] suggests it could be solved by running lvreduce, but this does not seem to be true. When blocks are discarded, the qcow2 image does not necessarily change its apparent size; the discarded blocks do not have to be at the end of the image. So running lvreduce is likely to remove valuable data.
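To illustrate the point (the device path is just a placeholder; the numbers are the ones from the test later in this thread): what bounds a safe lvreduce is where the image data actually ends, which qemu-img check reports, not how much data was discarded inside the image:

# Where the qcow2 data actually ends on the LV (path is a placeholder):
qemu-img check /dev/VG_UUID/LV_UUID | grep 'Image end offset'
# Image end offset: 5381423104
#
# lvreduce is only safe down to this offset; reducing by the amount of
# discarded data can cut off clusters that are still referenced.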
We have an API to (safely) reduce a volume to the optimal size:
http://ovirt.github.io/ovirt-engine-api-model/master/#services/disk/methods/...

Reducing images depends on the qcow2 image-end-offset. We can tell the highest offset used by an inactive disk:
https://github.com/oVirt/vdsm/blob/24f646383acb615b090078fc7aeddaf7097afe57/...
and reduce the logical volume to this size.

But this will not work, since the qcow2 image-end-offset is not decreased by virt-sparsify --in-place. So it is true that sparsify releases unused space at the storage level, but it does not decrease the qcow2 image end offset, so we cannot reduce the logical volumes.
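Roughly, the reduction amounts to the sketch below. This is not the actual vdsm code; the device path is the one from the test further down, and it assumes the 128 MiB extent size oVirt uses for block storage domains. The lvreduce line is left commented out on purpose, since it is only safe down to image-end-offset:

lv=/dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1

# Highest offset used by the qcow2 image:
end_offset=$(qemu-img check --output=json "$lv" |
    sed -n 's/.*"image-end-offset": \([0-9]*\).*/\1/p')

# Round up to whole extents (oVirt block domains use 128 MiB extents):
extent=$((128 * 1024 * 1024))
new_size=$(( (end_offset + extent - 1) / extent * extent ))

# What the reduce boils down to (do not run by hand on a live setup):
# lvreduce --force --size "${new_size}b" 27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1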
At the moment I don't see how we could achieve the correct values. If anyone has any idea, feel free to share it. The only option seems to be to switch to LVM thin pools. Do we have any plans to do that?
No, thin pools do not support clustering, this can be used only on a single host. oVirt lvm based volumes are accessed on multiple hosts at the same time.

Here is an example sparsify test showing the issue:

Before writing data to the new disk

guest:

# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  104M  9.9G   2% /data

storage:

$ ls -lhs /home/target/2/00
2.1G -rw-r--r--. 1 root root 100G Oct 2 00:57 /home/target/2/00

host:

# qemu-img info /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
image: /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
168/163840 = 0.10% allocated, 0.60% fragmented, 0.00% compressed clusters
Image end offset: 12582912

After writing a 5g file to the file system on this disk in the guest:

guest:

$ dd if=/dev/zero bs=8M count=640 of=/data/test oflag=direct conv=fsync status=progress

# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  5.2G  4.9G  52% /data

storage:

$ ls -lhs /home/target/2/00
7.1G -rw-r--r--. 1 root root 100G Oct 2 01:06 /home/target/2/00

host:

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
Image end offset: 5381423104

After deleting the 5g file:

guest:

# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  104M  9.9G   2% /data

storage:

$ ls -lhs /home/target/2/00
7.1G -rw-r--r--. 1 root root 100G Oct 2 01:12 /home/target/2/00

host:

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
Image end offset: 5381423104

After sparsifying the disk:

storage:

$ qemu-img check /var/tmp/download.qcow2
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 11927552

$ ls -lhs /home/target/2/00
2.1G -rw-r--r--. 1 root root 100G Oct 2 01:14 /home/target/2/00

host:

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 4822138880

Allocation decreased from 50% to 0.1%, but the image end offset decreased only from 5381423104 to 4822138880 (-10.5%).

I don't know if this is a behavior change in virt-sparsify or qemu, or if it was always like that.

We had an old and unused sparsifyVolume API in vdsm before 4.4. It did not use --in-place and was very complicated because of this. But I think it would work in this case, since qemu-img convert will drop the unallocated areas.

For example, after downloading the sparsified disk, we get:

$ qemu-img check download.qcow2
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 11927552

Kevin, is this the expected behavior or a bug in qemu? The disk I tested is a single qcow2 image without a backing file, so theoretically qemu can deallocate all the discarded clusters.

Nir
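For completeness, the convert-based flow boils down to something like this (file names are illustrative); qemu-img convert writes only the allocated clusters to the destination image, which is why the freshly converted copy has a small end offset:

$ qemu-img convert -f qcow2 -O qcow2 sparsified-disk.qcow2 compact.qcow2
$ qemu-img check compact.qcow2    # the end offset now reflects the real allocation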