[ovirt-devel] Reporting vm disk stats "truesize" and "apparentsize"?
Michal Skrivanek
michal.skrivanek at redhat.com
Mon Sep 19 08:50:39 UTC 2016
> On 18 Sep 2016, at 18:16, Nir Soffer <nsoffer at redhat.com> wrote:
>
> On Sat, Sep 17, 2016 at 3:14 AM, Nir Soffer <nsoffer at redhat.com> wrote:
> On Fri, Sep 16, 2016 at 10:49 AM, Michal Skrivanek <michal.skrivanek at redhat.com> wrote:
>
> > On 15 Sep 2016, at 18:18, Nir Soffer <nsoffer at redhat.com> wrote:
> >
> > Hi all,
> >
> > Vdsm reports apparentsize and truesize disk stats [1] when getting
> > vms stats (every 15 seconds?). These values are updated every
> > 60 seconds in vdsm.
> >
> > To collect the values, we run risky storage APIs in the vdsm virt thread
> > pool, and we want to avoid this [2], since one slow or broken domain
> > can cause the entire virt thread pool to get stuck and cause vms
> > using other (healthy) storage domains to become unresponsive.
> >
> > This can also break thin-provisioned disks on block storage, since they
> > also depend on the virt thread pool. So one bad NFS storage domain
> > can cause vms using only block storage to be paused.
> >
> > If both of these values are not used by anyone, we would like to
> > stop reporting them.
>
> There are only 2 consumers of the monitoring: engine and mom.
> git grep reveals that "apparentsize" is used only for importing HE.
> "truesize" too, but additionally it is used in engine storage code as the actual size of the disk.
>
> >
> > If the values are used, we need to find a safer way to report them,
> > probably in storage thread pool, or maybe we can get these values
> > from libvirt using bulk sampling.
>
> You would have been able to drop apparentsize right away, but the HE import code expects that field and won’t be happy if it is missing.
> The monitoring code would still work, but it would fill in 0 for the actual size.
>
> Always returning 0 would be a nice prank for the storage team :-)
What about apparentsize? It seems to be unused except for the HE import.
>
> Looking in bulk stats, we already have the required info from libvirt:
>
> {'bcfa00d3-78a7-40c9-990e-5ffac8886ce0': {'balloon.current': 1048576L,
> 'balloon.maximum': 1048576L,
> 'block.0.allocation': 0L,
> 'block.0.fl.reqs': 0L,
> 'block.0.fl.times': 0L,
> 'block.0.name': 'hdc',
> 'block.0.physical': 0L,
> 'block.0.rd.bytes': 152L,
> 'block.0.rd.reqs': 4L,
> 'block.0.rd.times': 539801L,
> 'block.0.wr.bytes': 0L,
> 'block.0.wr.reqs': 0L,
> 'block.0.wr.times': 0L,
> 'block.1.allocation': 131005952L,
> 'block.1.capacity': 8589934592L,
> 'block.1.fl.reqs': 68L,
> 'block.1.fl.times': 1894725112L,
> 'block.1.name': 'vda',
> 'block.1.path': '/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/5f35b5c0-17d7-4475-9125-e97f1cdb06f9/images/c54e7894-b1dc-4f23-9ff5-1836259adc6d/133db162-6c6a-4e82-baae-9ae0e7e3885d',
> 'block.1.physical': 1073741824L,
> 'block.1.rd.bytes': 123849728L,
> 'block.1.rd.reqs': 7979L,
> 'block.1.rd.times': 10655381303L,
> 'block.1.wr.bytes': 16762880L,
> 'block.1.wr.reqs': 455L,
> 'block.1.wr.times': 6021639149L,
> 'block.2.allocation': 0L,
> 'block.2.capacity': 21474836480L,
> 'block.2.fl.reqs': 0L,
> 'block.2.fl.times': 0L,
> 'block.2.name': 'vdb',
> 'block.2.path': '/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/bb85ee2f-d674-489f-9377-3eb1f176e8fb/images/b59304f3-d19d-40dd-9f04-8c2df37ef6d3/4df47a96-8a1b-436e-8a3e-3a638f119b48',
> 'block.2.physical': 21474836480L,
> 'block.2.rd.bytes': 1389056L,
> 'block.2.rd.reqs': 331L,
> 'block.2.rd.times': 160943568L,
> 'block.2.wr.bytes': 0L,
> 'block.2.wr.reqs': 0L,
> 'block.2.wr.times': 0L,
> 'block.count': 3,
> 'cpu.system': 19090000000L,
> 'cpu.time': 53480823390L,
> 'cpu.user': 4650000000L,
> 'net.0.name': 'vnet0',
> 'net.0.rx.bytes': 2595857L,
> 'net.0.rx.drop': 0L,
> 'net.0.rx.errs': 0L,
> 'net.0.rx.pkts': 39957L,
> 'net.0.tx.bytes': 17041L,
> 'net.0.tx.drop': 0L,
> 'net.0.tx.errs': 0L,
> 'net.0.tx.pkts': 177L,
> 'net.count': 1,
> 'state.reason': 1,
> 'state.state': 1,
> 'vcpu.0.state': 1,
> 'vcpu.0.time': 43040000000L,
> 'vcpu.0.wait': 0L,
> 'vcpu.current': 1,
> 'vcpu.maximum': 16}}
>
> So we can extract the values from the stats cache, matching them to drives by drive.path.
>
> We are already doing this for block.*.rd.bytes etc.
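
As a rough illustration of the matching described above (not the actual vdsm code), the flat bulk stats dict could be indexed by drive path roughly like this. The function name disk_stats_by_path and the shortened path in the sample are hypothetical; only the "block.<i>.<field>" key layout is taken from the dump.

```python
def disk_stats_by_path(domain_stats):
    """Index libvirt bulk block stats by drive path.

    domain_stats is the flat per-domain dict from the dump above,
    with "block.count" and "block.<i>.<field>" keys.
    """
    result = {}
    for i in range(domain_stats.get("block.count", 0)):
        prefix = "block.%d." % i
        path = domain_stats.get(prefix + "path")
        if path is None:
            # Some entries (block.0 / hdc in the dump) report no path.
            continue
        result[path] = {
            "allocation": domain_stats.get(prefix + "allocation"),
            "capacity": domain_stats.get(prefix + "capacity"),
            "physical": domain_stats.get(prefix + "physical"),
        }
    return result


# Abbreviated copy of the dump above; the path is a hypothetical short one.
sample = {
    "block.count": 2,
    "block.0.name": "hdc",
    "block.0.allocation": 0,
    "block.0.physical": 0,
    "block.1.name": "vda",
    "block.1.path": "/rhev/data-center/pool/domain/images/img/vol",
    "block.1.allocation": 131005952,
    "block.1.capacity": 8589934592,
    "block.1.physical": 1073741824,
}
stats = disk_stats_by_path(sample)
```

A drive object would then look up its own entry with stats.get(drive.path) and treat a missing entry as "no data".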
>
> Francesco, what do you think?
>
> I checked this in https://gerrit.ovirt.org/64093.
>
> Unfortunately, we cannot use it, since libvirt allocation value
> is not compatible with truesize.
>
> truesize is:
> - file storage: number of blocks * block size
> - block storage: size of lv
>
> Also, allocation is available only if qemu has written something
> to a volume. When starting a vm with a chain of volumes, all
> volumes have allocation=0 except the top volume in the boot
> disk, which is not very useful.
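
For the file storage case, "number of blocks * block size" can be sketched with a plain stat call. This is an assumption about how the two values relate, not a copy of the vdsm storage code: it assumes Linux, where st_blocks is counted in 512-byte units, and the helper name file_volume_sizes is made up for the example.

```python
import os
import tempfile


def file_volume_sizes(path):
    """Return apparentsize and truesize for a file-based volume.

    apparentsize is the logical file length; truesize is the space
    actually allocated (st_blocks is in 512-byte units on Linux).
    """
    st = os.stat(path)
    return {
        "apparentsize": st.st_size,
        "truesize": st.st_blocks * 512,
    }


# A sparse 1 MiB file: the apparent size is 1 MiB, while the allocated
# size depends on the filesystem and is typically much smaller.
with tempfile.NamedTemporaryFile() as f:
    f.truncate(1024 ** 2)
    sizes = file_volume_sizes(f.name)
```

The stat call is exactly the kind of operation that can hang on a broken NFS mount, which is why where it runs matters.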
>
> So we will have to use the storage apis that do the right thing
> for the storage type, but run them in a way that cannot affect
> unrelated vms.
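
One possible shape for "cannot affect unrelated vms", purely as a sketch and not what vdsm does: give each storage domain its own single-worker pool and wait with a timeout, so a stuck call only blocks sampling for that domain. PerDomainSampler and slow_query are hypothetical names, and the timeout value is arbitrary.

```python
import concurrent.futures
import time


class PerDomainSampler:
    """Run size queries in one single-worker pool per storage domain."""

    def __init__(self, timeout=1.0):
        self._timeout = timeout
        self._pools = {}    # domain id -> executor
        self._pending = {}  # domain id -> last submitted future

    def sample(self, domain_id, func, *args):
        """Return func(*args), or None if the domain is slow or stuck."""
        pending = self._pending.get(domain_id)
        if pending is not None and not pending.done():
            # The previous call on this domain has not returned yet;
            # do not queue more work behind it.
            return None
        pool = self._pools.get(domain_id)
        if pool is None:
            pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            self._pools[domain_id] = pool
        future = pool.submit(func, *args)
        self._pending[domain_id] = future
        try:
            return future.result(timeout=self._timeout)
        except concurrent.futures.TimeoutError:
            return None


def slow_query(delay):
    """Stand-in for a storage call that hangs on a bad domain."""
    time.sleep(delay)
    return 0
```

With this layout a hung call on one domain leaves the workers of other domains free; the caller still waits up to the timeout, so a real implementation would more likely read cached values instead of waiting.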
>
> Nir
>
>
>
>
> >
> > Please update if these values are used in engine/dwh.
> >
> > [1] https://github.com/oVirt/vdsm/blob/master/lib/vdsm/virt/vmstats.py#L364
> > [2] https://gerrit.ovirt.org/59801
> >
> > Nir