<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sat, Sep 17, 2016 at 3:14 AM, Nir Soffer <span dir="ltr"><<a href="mailto:nsoffer@redhat.com" target="_blank">nsoffer@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-">On Fri, Sep 16, 2016 at 10:49 AM, Michal Skrivanek <span dir="ltr"><<a href="mailto:michal.skrivanek@redhat.com" target="_blank">michal.skrivanek@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span><br>
> On 15 Sep 2016, at 18:18, Nir Soffer <<a href="mailto:nsoffer@redhat.com" target="_blank">nsoffer@redhat.com</a>> wrote:<br>
><br>
> Hi all,<br>
><br>
> Vdsm reports apparentsize and truesize disk stats [1] when getting<br>
> vms stats (every 15 seconds?). These values are updated every<br>
> 60 seconds in vdsm.<br>
><br>
> To collect the values, we run risky storage apis in vdsm virt thread<br>
> pool, and we want to avoid this [2] since one slow or broken domain<br>
> can cause the entire virt thread pool to get stuck and cause vms<br>
> using other (healthy) storage domains to become unresponsive.<br>
><br>
> These can also break block storage thin provisioned disks, since they<br>
> also depend on the virt thread pool. So one bad NFS storage domain<br>
> can cause vms using only block storage to be paused.<br>
><br>
> If both of these values are not used by anyone, we would like to<br>
> stop reporting them.<br>
<br>
</span>there are only 2 consumers of the monitoring, engine and mom.<br>
git grep reveals that "apparentsize" is used only for importing HE.<br>
"truesize" too, but it is additionally used in the engine storage code as the actual size of the disk.<br>
<span><br>
><br>
> If the values are used, we need to find a safer way to report them,<br>
> probably in storage thread pool, or maybe we can get these values<br>
> from libvirt using bulk sampling.<br>
<br>
</span>You could drop apparentsize right away, except that the HE import code expects that field and won’t be happy if it is missing.<br>
The monitoring code would work but fill in 0 for the actual size<br></blockquote><div><br></div></span><div>Always returning 0 would be a nice prank on the storage team :-)</div><div><br></div><div>Looking at the bulk stats, we already have the required info from libvirt:</div><div><br></div><div><div> {'bcfa00d3-78a7-40c9-990e-5ffac8886ce0': {'balloon.current': 1048576L,</div>
<div> 'balloon.maximum': 1048576L,</div>
<div> 'block.0.allocation': 0L,</div>
<div> 'block.0.fl.reqs': 0L,</div>
<div> 'block.0.fl.times': 0L,</div>
<div> 'block.0.name': 'hdc',</div>
<div> 'block.0.physical': 0L,</div>
<div> 'block.0.rd.bytes': 152L,</div>
<div> 'block.0.rd.reqs': 4L,</div>
<div> 'block.0.rd.times': 539801L,</div>
<div> 'block.0.wr.bytes': 0L,</div>
<div> 'block.0.wr.reqs': 0L,</div>
<div> 'block.0.wr.times': 0L,</div>
<div> 'block.1.allocation': 131005952L,</div>
<div> 'block.1.capacity': 8589934592L,</div>
<div> 'block.1.fl.reqs': 68L,</div>
<div> 'block.1.fl.times': 1894725112L,</div>
<div> 'block.1.name': 'vda',</div>
<div> 'block.1.path': '/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/5f35b5c0-17d7-4475-9125-e97f1cdb06f9/images/c54e7894-b1dc-4f23-9ff5-1836259adc6d/133db162-6c6a-4e82-baae-9ae0e7e3885d',</div>
<div> 'block.1.physical': 1073741824L,</div>
<div> 'block.1.rd.bytes': 123849728L,</div>
<div> 'block.1.rd.reqs': 7979L,</div>
<div> 'block.1.rd.times': 10655381303L,</div>
<div> 'block.1.wr.bytes': 16762880L,</div>
<div> 'block.1.wr.reqs': 455L,</div>
<div> 'block.1.wr.times': 6021639149L,</div>
<div> 'block.2.allocation': 0L,</div>
<div> 'block.2.capacity': 21474836480L,</div>
<div> 'block.2.fl.reqs': 0L,</div>
<div> 'block.2.fl.times': 0L,</div>
<div> 'block.2.name': 'vdb',</div>
<div> 'block.2.path': '/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/bb85ee2f-d674-489f-9377-3eb1f176e8fb/images/b59304f3-d19d-40dd-9f04-8c2df37ef6d3/4df47a96-8a1b-436e-8a3e-3a638f119b48',</div>
<div> 'block.2.physical': 21474836480L,</div>
<div> 'block.2.rd.bytes': 1389056L,</div>
<div> 'block.2.rd.reqs': 331L,</div>
<div> 'block.2.rd.times': 160943568L,</div>
<div> 'block.2.wr.bytes': 0L,</div>
<div> 'block.2.wr.reqs': 0L,</div>
<div> 'block.2.wr.times': 0L,</div>
<div> 'block.count': 3,</div>
<div> 'cpu.system': 19090000000L,</div>
<div> 'cpu.time': 53480823390L,</div>
<div> 'cpu.user': 4650000000L,</div>
<div> 'net.0.name': 'vnet0',</div>
<div> 'net.0.rx.bytes': 2595857L,</div>
<div> 'net.0.rx.drop': 0L,</div>
<div> 'net.0.rx.errs': 0L,</div>
<div> 'net.0.rx.pkts': 39957L,</div>
<div> 'net.0.tx.bytes': 17041L,</div>
<div> 'net.0.tx.drop': 0L,</div>
<div> 'net.0.tx.errs': 0L,</div>
<div> 'net.0.tx.pkts': 177L,</div>
<div> 'net.count': 1,</div>
<div> 'state.reason': 1,</div>
<div> 'state.state': 1,</div>
<div> 'vcpu.0.state': 1,</div>
<div> 'vcpu.0.time': 43040000000L,</div>
<div> 'vcpu.0.wait': 0L,</div>
<div> 'vcpu.current': 1,</div>
<div> 'vcpu.maximum': 16}}</div></div><div><br></div><div>So we can extract the values from the stats cache, matching them using drive.path.</div><div><br></div><div>We are already doing this for block.*.rd.bytes etc.</div><div><br></div><div>Francesco, what do you think?</div></div></div></div></blockquote><div><br></div><div>I checked this in <a href="https://gerrit.ovirt.org/64093">https://gerrit.ovirt.org/64093</a>.</div><div><br></div><div>Unfortunately, we cannot use it, since the libvirt allocation value</div><div>is not compatible with truesize.</div><div><br></div><div>truesize is:</div><div>- file storage: number of blocks * block size</div><div>- block storage: size of lv</div><div><br></div><div>Also, allocation is available only if qemu has written something</div><div>to a volume.
When starting a vm with a chain of volumes, all</div><div>volumes have allocation=0 except the top volume in the boot</div><div>disk, not very useful.</div><div><br></div><div>So we will have to use the storage apis that do the right thing</div><div>for the storage type, but run them in a way that cannot affect</div><div>unrelated vms.</div><div><br></div><div>Nir</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<span><br>
><br>
> Please update if these values are used in engine/dwh.<br>
><br>
> [1] <a href="https://github.com/oVirt/vdsm/blob/master/lib/vdsm/virt/vmstats.py#L364" rel="noreferrer" target="_blank">https://github.com/oVirt/vdsm/blob/master/lib/vdsm/virt/vmstats.py#L364</a><br>
> [2] <a href="https://gerrit.ovirt.org/59801" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/59801</a><br>
><br>
> Nir<br>
</span>> ______________________________<wbr>_________________<br>
> Devel mailing list<br>
> <a href="mailto:Devel@ovirt.org" target="_blank">Devel@ovirt.org</a><br>
> <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/devel</a><br>
<br>
</blockquote></span></div><br></div></div>
</blockquote></div><br></div></div>
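<div><br></div><div>The truesize definition for file storage given in the reply above (number of blocks * block size) can be sketched in Python with os.stat(). This is only an illustration and not vdsm code: file_volume_size and FileVolumeSize are hypothetical names, and mapping apparentsize to st_size is an assumption. The sketch relies on st_blocks counting 512-byte units, as the Python documentation describes for platforms that provide it (for example Linux).</div>
<pre>
```python
import os
from collections import namedtuple

# Hypothetical container used only in this sketch; not a vdsm type.
FileVolumeSize = namedtuple("FileVolumeSize", "apparentsize truesize")

# os.stat() reports st_blocks in 512-byte units on platforms that
# provide it, independent of the filesystem's own block size.
ST_BLOCK_UNIT = 512


def file_volume_size(path):
    """Return the apparent and true size, in bytes, of a file volume.

    apparentsize: the file length (st_size); assumed mapping.
    truesize: the allocated space, number of blocks * block size.
    """
    st = os.stat(path)
    return FileVolumeSize(
        apparentsize=st.st_size,
        truesize=st.st_blocks * ST_BLOCK_UNIT,
    )
```
</pre>
<div>On a freshly truncated sparse file, apparentsize is the full file length while truesize is usually close to 0; the exact value depends on the filesystem. Because os.stat() can block on an unresponsive NFS mount, a call like this is the kind of storage access the thread wants to keep out of the virt thread pool.</div>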