[ovirt-devel] [VDSM] scalable sampling benchmark

Nir Soffer nsoffer at redhat.com
Fri Aug 1 13:10:08 UTC 2014


----- Original Message -----
> From: "Francesco Romani" <fromani at redhat.com>
> To: devel at ovirt.org
> Sent: Friday, August 1, 2014 2:07:00 PM
> Subject: [ovirt-devel] [VDSM] scalable sampling benchmark
> 
> Hi everyone,
> 
> here's the followup for
> http://lists.ovirt.org/pipermail/devel/2014-July/008332.html
> 
> Platform setup (HW/SW/storage) is described here.
> 
> Let's start with the RHEL 6.5 graphs, as I need to recheck how
> scalable_sampling behaves on RHEL 7.
> These are the first graphs, on 100 VMs; more will come for 200 and
> 400 VMs.
> 
> What was tested here is the scalable_sampling branch, considering:
> * gerrit.ovirt.org/#/c/29980/13 (latest version)
> * http://gerrit.ovirt.org/#/c/30751/1 (included)
> 
> Find attached graphs for CPU and memory profiling.
> Some stats on RHEL 6.5:
> 
> master cpu usage:
>                         samples% below
>             average%    10%     25%     50%     75%
> libvirt     74.101      0.083   0.083   3.172   52.922
> vdsm        44.211      3.506   33.556  70.618  84.641
> total       30.504      0.000   9.599   99.750  100.000
> 
> scalable_sampling cpu usage:
> 
>                         samples% below
>             average%    10%     25%     50%     75%
> libvirt     58.835      0.000   0.591   28.270  86.160

I wonder if we are using libvirt correctly - maybe our thread pool is
too small, keeping tasks waiting in our queue instead of letting
libvirt process them concurrently?

Can you check the size of the libvirt thread pool, and increase our sampling
pool to the same value?
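
For reference, libvirtd's worker pool is configured via the
min_workers/max_workers settings in /etc/libvirt/libvirtd.conf;
settings left commented out mean the built-in defaults apply. A quick
sketch for reading the configured values (simplistic parsing, just
for a sanity check):

# Sketch: read libvirtd's worker pool limits from its config file.
# Settings left commented out fall back to libvirt's defaults.
import re

def libvirtd_workers(path="/etc/libvirt/libvirtd.conf"):
    workers = {}
    pattern = re.compile(r'^\s*(min_workers|max_workers)\s*=\s*(\d+)')
    with open(path) as f:
        for line in f:
            m = pattern.match(line)
            if m:
                workers[m.group(1)] = int(m.group(2))
    return workers

print(libvirtd_workers())

If our sampling pool is smaller than max_workers, tasks will pile up
in our queue even when libvirt could still accept more work.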

> vdsm        65.146      0.084   10.549  49.030  71.055

This looks 47% worse than the current code. We need a profile to understand why.

Are you sure that we are running the same number of samplers in both cases?
Did you compare the logs?
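
A quick way to compare would be counting the sampling lines in each
log, for example with something like this (the pattern is only a
placeholder; adjust it to the actual per-sample log message):

import re
import sys

# Placeholder pattern: replace with the actual per-sample log message.
PATTERN = re.compile(r'sampling')

def count_samples(path):
    with open(path) as f:
        return sum(1 for line in f if PATTERN.search(line))

for path in sys.argv[1:]:
    print('%s: %d matching lines' % (path, count_samples(path)))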

Maybe we should create a simpler standalone benchmark simulating what
vdsm does; hopefully yappi will not crash running it, and if it does,
the benchmark can be used by the yappi developers to fix the crash.
Something like the sketch below.
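
Just a sketch - sample_vm() stands in for the real libvirt stats
calls, and the pool size, vm count and intervals are made-up numbers:

# Standalone benchmark sketch: a fixed thread pool running periodic
# "sampling" rounds, profiled with yappi.
import threading
import time
import yappi

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2 (RHEL 6.5)

NUM_VMS = 100
POOL_SIZE = 8
INTERVAL = 2.0      # seconds between sampling rounds
ROUNDS = 10

def sample_vm(vm_id):
    # Placeholder for the real per-vm libvirt calls.
    time.sleep(0.01)

def worker(tasks):
    while True:
        vm_id = tasks.get()
        if vm_id is None:      # sentinel: shut the worker down
            return
        try:
            sample_vm(vm_id)
        finally:
            tasks.task_done()

def main():
    tasks = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks,))
               for _ in range(POOL_SIZE)]
    for t in threads:
        t.start()

    yappi.start()
    for _ in range(ROUNDS):
        for vm_id in range(NUM_VMS):
            tasks.put(vm_id)
        tasks.join()           # wait until the round is fully sampled
        time.sleep(INTERVAL)
    yappi.stop()

    for _ in threads:          # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()

    yappi.get_func_stats().print_all()

if __name__ == "__main__":
    main()

Running it with different POOL_SIZE values should also help answer
the thread pool question above.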

> total       29.390      0.000   24.473  99.325  100.000
> 
> memory usage (RSS, megabytes), in numbers:
> 
>                     average     minimum     maximum
> master              262         254         264
> scalable_sampling   143         133         147

So this seems to save a lot of memory, but costs more cpu. I wonder if
this is a better tradeoff - systems have huge amounts of memory, but
much more limited cpu.

Before 3.5, we were using gigabytes of memory for the remote file
handlers on NFS (replaced by ioprocess), and I never heard anyone
complaining about it, but we had a lot of issues with excessive cpu
usage.

Let's see how this looks with 200 VMs and with RHEL 7 or Fedora.

Nir


