[ovirt-devel] Reporting vm disk stats "truesize" and "apparentsize"?

Nir Soffer nsoffer at redhat.com
Sun Sep 18 16:16:15 UTC 2016


On Sat, Sep 17, 2016 at 3:14 AM, Nir Soffer <nsoffer at redhat.com> wrote:

> On Fri, Sep 16, 2016 at 10:49 AM, Michal Skrivanek <michal.skrivanek at redhat.com> wrote:
>
>>
>> > On 15 Sep 2016, at 18:18, Nir Soffer <nsoffer at redhat.com> wrote:
>> >
>> > Hi all,
>> >
>> > Vdsm reports apparentsize and truesize disk stats [1] when getting
>> > VM stats (every 15 seconds?). These values are updated every
>> > 60 seconds in vdsm.
>> >
>> > To collect the values, we run risky storage APIs in the vdsm virt thread
>> > pool, and we want to avoid this [2], since one slow or broken domain
>> > can cause the entire virt thread pool to get stuck and cause VMs
>> > using other (healthy) storage domains to become non-responsive.
>> >
>> > This can also break thin-provisioned disks on block storage, since they
>> > also depend on the virt thread pool. So one bad NFS storage domain
>> > can cause VMs using only block storage to be paused.
>> >
>> > If neither of these values is used by anyone, we would like to
>> > stop reporting them.
>>
>> There are only 2 consumers of the monitoring: engine and mom.
>> git grep reveals that "apparentsize" is used only for importing HE.
>> "truesize" is used there too, but it is also used in the engine storage
>> code as the actual size of the disk.
>>
>> >
>> > If the values are used, we need to find a safer way to report them,
>> > probably in the storage thread pool, or maybe we can get these values
>> > from libvirt using bulk sampling.
>>
>> You could have dropped apparentsize right away, but the HE
>> import code expects that field and won't be happy if it is missing.
>> The monitoring code would work but fill in 0 for the actual size.
>>
>
> Always returning 0 would be a nice prank for the storage team :-)
>
> Looking at the bulk stats, we already have the required info from libvirt:
>
>  {'bcfa00d3-78a7-40c9-990e-5ffac8886ce0': {'balloon.current': 1048576L,
>                                           'balloon.maximum': 1048576L,
>                                           'block.0.allocation': 0L,
>                                           'block.0.fl.reqs': 0L,
>                                           'block.0.fl.times': 0L,
>                                           'block.0.name': 'hdc',
>                                           'block.0.physical': 0L,
>                                           'block.0.rd.bytes': 152L,
>                                           'block.0.rd.reqs': 4L,
>                                           'block.0.rd.times': 539801L,
>                                           'block.0.wr.bytes': 0L,
>                                           'block.0.wr.reqs': 0L,
>                                           'block.0.wr.times': 0L,
>                                           'block.1.allocation': 131005952L,
>                                           'block.1.capacity': 8589934592L,
>                                           'block.1.fl.reqs': 68L,
>                                           'block.1.fl.times': 1894725112L,
>                                           'block.1.name': 'vda',
>                                            'block.1.path': '/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/5f35b5c0-17d7-4475-9125-e97f1cdb06f9/images/c54e7894-b1dc-4f23-9ff5-1836259adc6d/133db162-6c6a-4e82-baae-9ae0e7e3885d',
>                                           'block.1.physical': 1073741824L,
>                                           'block.1.rd.bytes': 123849728L,
>                                           'block.1.rd.reqs': 7979L,
>                                           'block.1.rd.times': 10655381303L,
>                                           'block.1.wr.bytes': 16762880L,
>                                           'block.1.wr.reqs': 455L,
>                                           'block.1.wr.times': 6021639149L,
>                                           'block.2.allocation': 0L,
>                                           'block.2.capacity': 21474836480L,
>                                           'block.2.fl.reqs': 0L,
>                                           'block.2.fl.times': 0L,
>                                           'block.2.name': 'vdb',
>                                            'block.2.path': '/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/bb85ee2f-d674-489f-9377-3eb1f176e8fb/images/b59304f3-d19d-40dd-9f04-8c2df37ef6d3/4df47a96-8a1b-436e-8a3e-3a638f119b48',
>                                           'block.2.physical': 21474836480L,
>                                           'block.2.rd.bytes': 1389056L,
>                                           'block.2.rd.reqs': 331L,
>                                           'block.2.rd.times': 160943568L,
>                                           'block.2.wr.bytes': 0L,
>                                           'block.2.wr.reqs': 0L,
>                                           'block.2.wr.times': 0L,
>                                           'block.count': 3,
>                                           'cpu.system': 19090000000L,
>                                           'cpu.time': 53480823390L,
>                                           'cpu.user': 4650000000L,
>                                           'net.0.name': 'vnet0',
>                                           'net.0.rx.bytes': 2595857L,
>                                           'net.0.rx.drop': 0L,
>                                           'net.0.rx.errs': 0L,
>                                           'net.0.rx.pkts': 39957L,
>                                           'net.0.tx.bytes': 17041L,
>                                           'net.0.tx.drop': 0L,
>                                           'net.0.tx.errs': 0L,
>                                           'net.0.tx.pkts': 177L,
>                                           'net.count': 1,
>                                           'state.reason': 1,
>                                           'state.state': 1,
>                                           'vcpu.0.state': 1,
>                                           'vcpu.0.time': 43040000000L,
>                                           'vcpu.0.wait': 0L,
>                                           'vcpu.current': 1,
>                                           'vcpu.maximum': 16}}
>
> So we can extract the values from the stats cache, matching them by
> drive.path.
>
> We are already doing this for block.*.rd.bytes etc.
>
> Francesco, what do you think?
>

I checked this in https://gerrit.ovirt.org/64093.

Unfortunately, we cannot use it, since the libvirt allocation value
is not compatible with truesize.

truesize is:
- file storage: number of blocks * block size
- block storage: size of the LV
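To make the file-storage definition concrete, here is a minimal sketch. It assumes truesize means the allocated blocks multiplied by the 512-byte unit that `st_blocks` is counted in on Linux, and apparentsize means `st_size`; `file_volume_sizes` is a hypothetical helper, not vdsm's implementation. Block storage (the LV size) is not covered.

```python
import os
import tempfile

# st_blocks is counted in 512-byte units on Linux, independent of st_blksize.
STAT_BLOCK_SIZE = 512


def file_volume_sizes(path):
    """Hypothetical helper: return (apparentsize, truesize) in bytes for a
    volume stored as a regular file.

    apparentsize: the logical file size (st_size).
    truesize: space actually allocated (st_blocks * 512), which is lower
    than apparentsize for a sparse (thin) file.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * STAT_BLOCK_SIZE


if __name__ == '__main__':
    # A 1 GiB sparse file: large apparent size, little or no allocation.
    with tempfile.NamedTemporaryFile() as f:
        f.truncate(1 << 30)
        f.flush()
        print(file_volume_sizes(f.name))
```

On filesystems that support sparse files, truesize for the example file will typically be far below its 1 GiB apparentsize.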

Also, allocation is available only if qemu has written something
to a volume. When starting a VM with a chain of volumes, all
volumes have allocation=0 except the top volume in the boot
disk, which is not very useful.
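For reference, matching a drive to its bulk-stats entry by path looks roughly like this. It is a minimal sketch: `block_stats_for_path` is a hypothetical helper, not the code in the gerrit change. With the sample stats quoted above, vdb reports allocation 0 even though libvirt reports a physical size of 20 GiB.

```python
def block_stats_for_path(vm_stats, path):
    """Hypothetical helper: return allocation/capacity/physical for the
    disk whose block.<i>.path equals path, or None if nothing matches."""
    for i in range(vm_stats.get('block.count', 0)):
        prefix = 'block.%d.' % i
        if vm_stats.get(prefix + 'path') == path:
            return {
                'allocation': vm_stats.get(prefix + 'allocation'),
                'capacity': vm_stats.get(prefix + 'capacity'),
                'physical': vm_stats.get(prefix + 'physical'),
            }
    return None


# Subset of the libvirt bulk stats quoted above, using Python 3 ints
# (the original dump is a Python 2 repr, hence the "L" suffixes).
VDA = ('/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/'
       '5f35b5c0-17d7-4475-9125-e97f1cdb06f9/images/'
       'c54e7894-b1dc-4f23-9ff5-1836259adc6d/'
       '133db162-6c6a-4e82-baae-9ae0e7e3885d')
VDB = ('/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/'
       'bb85ee2f-d674-489f-9377-3eb1f176e8fb/images/'
       'b59304f3-d19d-40dd-9f04-8c2df37ef6d3/'
       '4df47a96-8a1b-436e-8a3e-3a638f119b48')
stats = {
    'block.count': 3,
    'block.0.name': 'hdc',
    'block.0.allocation': 0,
    'block.0.physical': 0,
    'block.1.name': 'vda',
    'block.1.path': VDA,
    'block.1.allocation': 131005952,
    'block.1.capacity': 8589934592,
    'block.1.physical': 1073741824,
    'block.2.name': 'vdb',
    'block.2.path': VDB,
    'block.2.allocation': 0,
    'block.2.capacity': 21474836480,
    'block.2.physical': 21474836480,
}

print(block_stats_for_path(stats, VDA))
# {'allocation': 131005952, 'capacity': 8589934592, 'physical': 1073741824}
print(block_stats_for_path(stats, VDB))
# {'allocation': 0, 'capacity': 21474836480, 'physical': 21474836480}
```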

So we will have to use the storage APIs that do the right thing
for each storage type, but run them in a way that cannot affect
unrelated VMs.
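A minimal sketch of one such approach, assuming a per-domain design that was not agreed on in this thread: each storage domain gets its own daemon thread that refreshes volume sizes into a cache, and VM stats only read the cache. A slow or broken domain then stalls only its own thread. `DomainSizeMonitor` and the `query_size` callback are hypothetical names; `query_size` stands in for whatever storage call returns (apparentsize, truesize).

```python
import threading
import time


class DomainSizeMonitor:
    """Hypothetical sketch: refresh volume sizes per storage domain in
    background threads; readers only ever touch the in-memory cache."""

    def __init__(self, query_size, interval=60.0):
        # query_size(domain_id, path) -> (apparentsize, truesize); may block.
        self._query_size = query_size
        self._interval = interval
        self._lock = threading.Lock()
        self._cache = {}    # volume path -> (apparentsize, truesize)
        self._volumes = {}  # domain id -> set of volume paths
        self._threads = {}  # domain id -> refresh thread

    def add_volume(self, domain_id, path):
        with self._lock:
            self._volumes.setdefault(domain_id, set()).add(path)
            if domain_id in self._threads:
                return
            thread = threading.Thread(
                target=self._refresh_loop, args=(domain_id,), daemon=True)
            self._threads[domain_id] = thread
        thread.start()

    def _refresh_loop(self, domain_id):
        while True:
            with self._lock:
                paths = list(self._volumes[domain_id])
            for path in paths:
                try:
                    # Called without holding the lock, so a hung domain
                    # blocks only this thread.
                    sizes = self._query_size(domain_id, path)
                except OSError:
                    continue  # keep the last known value
                with self._lock:
                    self._cache[path] = sizes
            time.sleep(self._interval)

    def sizes(self, path):
        """Return the cached (apparentsize, truesize), or None if the
        volume has not been measured yet. Never calls storage."""
        with self._lock:
            return self._cache.get(path)
```

The default interval mirrors the 60-second update period mentioned earlier in the thread. A thread stuck in storage I/O is never reclaimed here; the sketch only keeps it from blocking other domains or the VM stats path.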

Nir


>
>
>
>>
>> >
>> > Please let us know if these values are used in engine/dwh.
>> >
>> > [1] https://github.com/oVirt/vdsm/blob/master/lib/vdsm/virt/vmstats.py#L364
>> > [2] https://gerrit.ovirt.org/59801
>> >
>> > Nir
>> > _______________________________________________
>> > Devel mailing list
>> > Devel at ovirt.org
>> > http://lists.ovirt.org/mailman/listinfo/devel
>>
>>
>