
On Wed, Sep 30, 2020 at 1:49 PM Tomáš Golembiovský <tgolembi@redhat.com> wrote:
Hi,
currently, when we run virt-sparsify on a VM, or a user runs a VM with discard enabled, and the disk is on block storage in qcow2 format, the results are not reflected in oVirt. The blocks get discarded, the storage can reuse them and reports correct allocation statistics, but oVirt does not. In oVirt one can still see the original allocation for the disk and the storage domain, as it was before the blocks were discarded. This is super-confusing to users, because when they check after running virt-sparsify and see the same values, they think sparsification is not working, which is not true.
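For reference, the two cases mentioned can be reproduced roughly like this (the disk path is illustrative, and the second case assumes the guest disk was attached with discard enabled):

# Case 1: sparsify a disk image from the host (the image must not be in use):
virt-sparsify --in-place /path/to/disk.qcow2

# Case 2: from inside a running guest, trim all mounted filesystems:
fstrim -av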
This may be a documentation issue. This is a known limitation of oVirt thin provisioned storage: we allocate space as needed, but we release the space only when a volume is deleted.
This all seems to be because of the LVM layout we have on block storage domains. The feature page for discard [1] suggests it could be solved by running lvreduce, but this does not seem to be true. When blocks are discarded, the qcow2 image does not necessarily change its apparent size; the discarded blocks do not have to be at the end of the image. So running lvreduce is likely to remove valuable data.
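To illustrate the point (the device path is just a placeholder; the numbers are the ones from the test later in this thread): what bounds a safe lvreduce is where the image data actually ends, which qemu-img check reports, not how much data was discarded inside the image:

# Where the qcow2 data actually ends on the LV (path is a placeholder):
qemu-img check /dev/VG_UUID/LV_UUID | grep 'Image end offset'
# Image end offset: 5381423104
#
# lvreduce is only safe down to this offset; reducing by the amount of
# discarded data can cut off clusters that are still referenced.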
We have an API to (safely) reduce a volume to the optimal size:
http://ovirt.github.io/ovirt-engine-api-model/master/#services/disk/methods/...

Reducing images depends on the qcow2 image-end-offset. We can tell the highest offset used by an inactive disk:
https://github.com/oVirt/vdsm/blob/24f646383acb615b090078fc7aeddaf7097afe57/...
and reduce the logical volume to this size.

But this will not work, since the qcow2 image-end-offset is not decreased by virt-sparsify --in-place. So it is true that sparsify releases unused space at the storage level, but it does not decrease the qcow2 image end offset, so we cannot reduce the logical volumes.
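Roughly, the reduction amounts to the sketch below. This is not the actual vdsm code; the device path is the one from the test further down, and it assumes the 128 MiB extent size oVirt uses for block storage domains. The lvreduce line is left commented out on purpose, since it is only safe down to image-end-offset:

lv=/dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1

# Highest offset used by the qcow2 image:
end_offset=$(qemu-img check --output=json "$lv" |
    sed -n 's/.*"image-end-offset": \([0-9]*\).*/\1/p')

# Round up to whole extents (oVirt block domains use 128 MiB extents):
extent=$((128 * 1024 * 1024))
new_size=$(( (end_offset + extent - 1) / extent * extent ))

# What the reduce boils down to (do not run by hand on a live setup):
# lvreduce --force --size "${new_size}b" 27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1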
At the moment I don't see how we could achieve the correct values. If anyone has any idea, feel free to share it. The only option seems to be to switch to LVM thin pools. Do we have any plans to do that?
No, thin pools do not support clustering, this can be used only on a single host. oVirt lvm based volumes are accessed on multiple hosts at the same time.

Here is an example sparsify test showing the issue:

Before writing data to the new disk

guest:

# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  104M  9.9G   2% /data

storage:

$ ls -lhs /home/target/2/00
2.1G -rw-r--r--. 1 root root 100G Oct 2 00:57 /home/target/2/00

host:

# qemu-img info /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
image: /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
168/163840 = 0.10% allocated, 0.60% fragmented, 0.00% compressed clusters
Image end offset: 12582912

After writing a 5g file to the file system on this disk in the guest:

guest:

$ dd if=/dev/zero bs=8M count=640 of=/data/test oflag=direct conv=fsync status=progress

# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  5.2G  4.9G  52% /data

storage:

$ ls -lhs /home/target/2/00
7.1G -rw-r--r--. 1 root root 100G Oct 2 01:06 /home/target/2/00

host:

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
Image end offset: 5381423104

After deleting the 5g file:

guest:

# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  104M  9.9G   2% /data

storage:

$ ls -lhs /home/target/2/00
7.1G -rw-r--r--. 1 root root 100G Oct 2 01:12 /home/target/2/00

host:

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
Image end offset: 5381423104

After sparsifying the disk:

storage:

$ qemu-img check /var/tmp/download.qcow2
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 11927552

$ ls -lhs /home/target/2/00
2.1G -rw-r--r--. 1 root root 100G Oct 2 01:14 /home/target/2/00

host:

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 4822138880

Allocation decreased from 50% to 0.1%, but the image end offset decreased only from 5381423104 to 4822138880 (-10.5%).

I don't know if this is a behavior change in virt-sparsify or qemu, or if it was always like that.

We had an old and unused sparsifyVolume API in vdsm before 4.4. It did not use --in-place and was very complicated because of this. But I think it would work in this case, since qemu-img convert will drop the unallocated areas.

For example, after downloading the sparsified disk, we get:

$ qemu-img check download.qcow2
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 11927552

Kevin, is this the expected behavior or a bug in qemu? The disk I tested is a single qcow2 image without a backing file, so theoretically qemu can deallocate all the discarded clusters.

Nir
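For completeness, the convert-based flow boils down to something like this (file names are illustrative); qemu-img convert writes only the allocated clusters to the destination image, which is why the freshly converted copy has a small end offset:

$ qemu-img convert -f qcow2 -O qcow2 sparsified-disk.qcow2 compact.qcow2
$ qemu-img check compact.qcow2    # the end offset now reflects the real allocation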