
On Fri, Oct 02, 2020 at 01:57:04AM +0300, Nir Soffer wrote:
On Wed, Sep 30, 2020 at 1:49 PM Tomáš Golembiovský <tgolembi@redhat.com> wrote:
Hi,
currently, when we run virt-sparsify on a VM, or a user runs a VM with discard enabled, and the disk is a qcow2 volume on block storage, the results are not reflected in oVirt. The blocks get discarded, the storage can reuse them and reports correct allocation statistics, but oVirt does not. In oVirt one can still see the original allocation for the disk and the storage domain, as it was before the blocks were discarded. This is super-confusing to users, because when they check after running virt-sparsify and see the same values, they think sparsification is not working. Which is not true.
This may be a documentation issue. This is a known limitation of oVirt thin provisioned storage: we allocate space as needed, but we release the space only when a volume is deleted.
It all seems to be a consequence of the LVM layout we use on the storage domain. The feature page for discard [1] suggests it could be solved by running lvreduce, but this does not seem to be true. When blocks are discarded, the qcow2 image does not necessarily shrink its apparent size: the freed clusters do not have to be at the end of the image. So running lvreduce is likely to destroy valuable data.
We have an API to (safely) reduce a volume to optimal size: http://ovirt.github.io/ovirt-engine-api-model/master/#services/disk/methods/...
Reducing images depends on the qcow2 image-end-offset. We can tell the highest offset used by an inactive disk: https://github.com/oVirt/vdsm/blob/24f646383acb615b090078fc7aeddaf7097afe57/...
and reduce the logical volume to this size.
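Mechanically, that reduce step would parse the image end offset out of `qemu-img check --output=json` and round it up to the next LVM extent boundary before shrinking the LV. A minimal sketch (the 128 MiB extent size and the JSON sample are assumptions for illustration, not values from this thread):

```python
import json

EXTENT_SIZE = 128 * 1024 ** 2  # assumed VG extent size; verify with vgs -o vg_extent_size

def optimal_size(check_json, extent_size=EXTENT_SIZE):
    """Smallest extent-aligned size (bytes) that still covers the qcow2 data."""
    end_offset = json.loads(check_json)["image-end-offset"]
    extents = -(-end_offset // extent_size)  # ceiling division
    return extents * extent_size

# A trimmed qemu-img check --output=json result (illustrative values):
sample = '{"image-end-offset": 5381423104, "check-errors": 0}'
# optimal_size(sample) would be the new size passed to lvreduce.
```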
But this will not work, since the qcow2 image-end-offset is not decreased by
virt-sparsify --in-place
Right - this doesn't "defragment" the qcow2 file, i.e. move clusters to the beginning - so (except by accident) it won't make the qcow2 file smaller. Virt-sparsify in copying mode will actually do what you want, but it is obviously much more heavyweight and complex to use.
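A toy model makes the difference concrete (purely illustrative, not qemu's actual data structures): in-place discard frees clusters but moves nothing, so the end offset stays at the highest surviving cluster, while a copy rewrites survivors contiguously from the start:

```python
CLUSTER = 65536  # qcow2 default cluster size

def end_offset(allocated):
    """End offset = end of the highest cluster still in use in the file."""
    return max(allocated) + CLUSTER if allocated else 0

def discard_in_place(allocated, freed):
    """Like virt-sparsify --in-place: clusters are freed, nothing is moved."""
    return allocated - freed

def copy_compact(allocated):
    """Like copying mode: surviving clusters are rewritten from offset 0."""
    return {i * CLUSTER for i in range(len(allocated))}

image = {i * CLUSTER for i in range(100)}     # 100 allocated clusters
freed = {i * CLUSTER for i in range(10, 99)}  # free most of the middle
after = discard_in_place(image, freed)        # end offset unchanged
compacted = copy_compact(after)               # end offset shrinks to 11 clusters
```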
So it is true that sparsify releases unused space at the storage level, but it does not decrease the qcow2 image allocation, so we cannot reduce the logical volumes.
At the moment I don't see how we could achieve the correct values. If anyone has any ideas, feel free to entertain me. The only option seems to be switching to LVM thin pools. Do we have any plans to do that?
No, thin pools do not support clustering; they can be used only on a single host. oVirt's LVM-based volumes are accessed on multiple hosts at the same time.
Here is an example sparsify test showing the issue:
Before writing data to new disk
guest:
# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  104M  9.9G   2% /data
storage:
$ ls -lhs /home/target/2/00
2.1G -rw-r--r--. 1 root root 100G Oct  2 00:57 /home/target/2/00
host:
# qemu-img info /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
image: /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
168/163840 = 0.10% allocated, 0.60% fragmented, 0.00% compressed clusters
Image end offset: 12582912
After writing a 5G file to the file system on this disk in the guest:
guest:
$ dd if=/dev/zero bs=8M count=640 of=/data/test oflag=direct conv=fsync status=progress
# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  5.2G  4.9G  52% /data
storage:
$ ls -lhs /home/target/2/00
7.1G -rw-r--r--. 1 root root 100G Oct  2 01:06 /home/target/2/00
host:
# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
Image end offset: 5381423104
After deleting the 5G file:
guest:
# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  104M  9.9G   2% /data
storage:
$ ls -lhs /home/target/2/00
7.1G -rw-r--r--. 1 root root 100G Oct  2 01:12 /home/target/2/00
host:
# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
Image end offset: 5381423104
After sparsifying disk:
storage:
$ qemu-img check /var/tmp/download.qcow2
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 11927552
$ ls -lhs /home/target/2/00
2.1G -rw-r--r--. 1 root root 100G Oct  2 01:14 /home/target/2/00
host:
# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 4822138880
Allocation decreased from 50% to 0.1%, but the image end offset decreased only from 5381423104 to 4822138880 (about -10.4%).
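The gap can be checked directly from the two qemu-img check outputs above:

```python
before = 5381423104  # image end offset before sparsify
after = 4822138880   # image end offset after sparsify
reduction = (before - after) / before  # fraction of the LV we could reclaim
# round(reduction * 100, 1) -> 10.4
```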
I don't know if this is a behavior change in virt-sparsify or qemu or it was always like that.
AFAIK nothing in virt-sparsify --in-place or qemu has changed here.
We had an old, unused sparsifyVolume API in vdsm before 4.4. It did not use --in-place and was very complicated because of this. But I think it would work in this case, since qemu-img convert will drop the unallocated areas.
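The copying flow boils down to a `qemu-img convert` into a fresh image: convert skips areas that read as unallocated or zero, so the new qcow2 starts out compact. A sketch of the command line (the paths are illustrative, and this omits the snapshot/locking dance the old vdsm API had to do around it):

```python
def convert_cmd(src, dst):
    # qemu-img convert writes only allocated, non-zero data to the target,
    # so the resulting qcow2 has a minimal image end offset.
    return ["qemu-img", "convert", "-O", "qcow2", src, dst]
```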
For example after downloading the sparsified disk, we get:
$ qemu-img check download.qcow2
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 11927552
Kevin, is this the expected behavior or a bug in qemu?
The disk I tested is a single qcow2 image without a backing file, so theoretically qemu can deallocate all the discarded clusters.
Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many powerful
monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top