[ovirt-devel] [VDSM] [JSONRPC] early, coarse grained benchmarks

Piotr Kliczewski piotr.kliczewski at gmail.com
Fri Nov 14 07:52:38 UTC 2014


On Thu, Nov 13, 2014 at 9:35 PM, Nir Soffer <nsoffer at redhat.com> wrote:
> ----- Original Message -----
>> From: "Francesco Romani" <fromani at redhat.com>
>> To: "engine-devel at ovirt.org" <devel at ovirt.org>
>> Sent: Thursday, November 13, 2014 6:47:04 PM
>> Subject: Re: [ovirt-devel] [VDSM] [JSONRPC] early, coarse grained benchmarks
>>
>> ----- Original Message -----
>> > From: "Piotr Kliczewski" <piotr.kliczewski at gmail.com>
>> > To: "Francesco Romani" <fromani at redhat.com>
>> > Cc: "engine-devel at ovirt.org" <devel at ovirt.org>
>> > Sent: Monday, November 10, 2014 4:06:38 PM
>> > Subject: Re: [ovirt-devel] [VDSM] [JSONRPC] early, coarse grained
>> > benchmarks
>> >
>> > Great work!
>> >
>> > On Mon, Nov 10, 2014 at 3:50 PM, Francesco Romani <fromani at redhat.com>
>> > wrote:
>> > > Hi everyone,
>> > >
>> > > I'd like to share initial profiles of VDSM comparing
>> > > XMLRPC and JSONRPC, plus two patches optimizing the latter.
>> > >
>> > > While the low-level profiler (yappi) is now working nicely,
>> > > and this is a good building block, I'm not really satisfied with
>> > > the test case - although it is a fair approximation of reality;
>> > > most importantly, I'm not satisfied with the benchmark driver, which
>> > > is too coarse-grained.
>> > >
>> > > The benchmark driver is a simple python script using the REST APIs
>> > > of Engine.
>> > >
>> > > I'd like to be able to bypass Engine, but vdsClient is not good enough:
>> > > no JSONRPC option yet.
>> > >
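
(The benchmark driver itself is not shown in this thread; for illustration,
a minimal REST-based driver could look like the sketch below. The Engine URL,
the credentials and the 'requests' dependency are assumptions, not part of
the actual script.)

    import time
    import requests

    ENGINE_API = "https://engine.example.com/api"   # hypothetical Engine address
    AUTH = ("admin@internal", "password")           # hypothetical credentials

    def time_list_vms():
        # Issue one "list VMs" call through Engine and return its wall time.
        start = time.time()
        resp = requests.get(ENGINE_API + "/vms", auth=AUTH, verify=False)
        resp.raise_for_status()
        return time.time() - start

    samples = [time_list_vms() for _ in range(300)]
    print("average=%.0fms min=%.0fms max=%.0fms" % (
        sum(samples) / len(samples) * 1000,
        min(samples) * 1000,
        max(samples) * 1000))
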
>> >
>> > Once I have a bit of time I will work on it.
>>
>> Well, it's been on my list for a long time as well, so no pressure.
>>
>> I'm profiling four major flows at the moment, and improving tools along the
>> way.
>>
>> Here's my series, still being worked on and not yet ready for review (still
>> in draft for this reason):
>>
>> http://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:profile_runtime,n,z
>>
>> Here we have
>> 1. one patch to start/stop profiling at runtime using vdsClient
>
> Note that stopping yappi may segfault, as yappi uses unsafe iteration
> on a Python internal linked list without locking.
>
> If we add such an option, it should work only from a local connection.
>
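
(For reference, the runtime start/stop hooks boil down to yappi calls along
these lines; this is a sketch of the idea, not the actual patch, and per the
warning above a real hook should be reachable only from a local connection.)

    import yappi

    def start_profiling():
        # Use the CPU clock so time blocked on I/O is not counted.
        yappi.set_clock_type("cpu")
        yappi.start()

    def stop_profiling(path):
        # Stopping yappi on a live process may segfault (see above),
        # so this must be invoked only by a trusted, local caller.
        yappi.stop()
        yappi.get_func_stats().save(path, type="pstat")
        yappi.clear_stats()
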
>> 2. one simple (simplistic?) benchmark runner script
>>
>> More will come as the need arises. A proper wiki page about how to profile VDSM
>> is planned as well.
>>
>> The flows I'm testing are
>> 1. VM startup
>> 2. VM shutdown
>> 3. VM list
>> 4. VM getAllVmStats
>>
>> with focus on the VDSM side, not the whole flow; thus, I'm intentionally
>> leaving Engine out of the picture.
>>
>> 1 and 2 because of https://bugzilla.redhat.com/show_bug.cgi?id=1155146
>> 3 and 4 because of https://bugzilla.redhat.com/show_bug.cgi?id=1110835
>>
>> So far, the profile of #2 has been the most puzzling, as no clear bottleneck
>> has emerged yet. A new, deeper round is scheduled ASAP.
>> I found optimizations that affect 1, 3 and 4, especially with JSONRPC.
>> Unfortunately, I can't yet test
>> with vdsClient (see above). Will follow up on that ASAP.
>>
>> Patches, in order of readiness for review:
>>
>> http://gerrit.ovirt.org/#/c/35024/
>> http://gerrit.ovirt.org/#/c/34989/
>> http://gerrit.ovirt.org/#/c/35139 (DRAFT)
>> http://gerrit.ovirt.org/#/c/35140 (DRAFT)
>>
>> Let's see the effect of these on flow #3.
>> Scenario:
>> RHEL 6.5 on a minidell running VDSM master + patches, with 150 VMs, each with 1
>
> What is the cpu usage and load during the test? In previous tests you had a nice
> graph with this data.
>
>> VCPU, 16 MB (megabytes) of RAM and a 1 GB HDD
>> on iSCSI. Run 300 times:
>> "vdsClient -s 0 list table"
>
> This is not a good test - you just overload vdsm. Instead, run this verb at a more
> typical rate (like Engine does) and limit the test by running time, not by the
> number of calls.
>
> Running like this will show that the patched version needs less cpu and completes more
> calls during the test duration (e.g. 5 minutes).
>
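
(A duration-bound, fixed-rate runner could look like the sketch below; the
one-call-per-second default is only an assumption about what an "engine-like"
rate means.)

    import time

    def run_fixed_rate(call, duration=300, interval=1.0):
        # Issue 'call' once per 'interval' for 'duration' seconds and
        # return the response times; the number of completed calls is
        # then a result of the test, not an input.
        deadline = time.time() + duration
        samples = []
        while time.time() < deadline:
            start = time.time()
            call()
            elapsed = time.time() - start
            samples.append(elapsed)
            # Sleep only for whatever is left of the interval.
            time.sleep(max(0, interval - elapsed))
        return samples
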
>>
>> vanilla VDSM:
>> average=6985ms min=3220ms max=11880ms
>> response time: percentage
>> <3000ms: 0.0
>> <3500ms: 8.3
>> <4000ms: 18.3
>> <4500ms: 23.0
>> <5000ms: 27.7
>> <5500ms: 31.0
>> <6000ms: 36.3
>> <6500ms: 41.0
>> <7000ms: 49.0
>> <7500ms: 54.7
>> <8000ms: 59.0
>> <8500ms: 66.0
>> <9000ms: 71.7
>> <9500ms: 79.7
>> <10000ms: 88.3
>> <10500ms: 96.3
>> <11000ms: 99.0
>> <11500ms: 99.7
>> <12000ms: 100.0
>>
>> patched VDSM:
>
> What is "patched"? which patches are applied?
>
>> average=5761ms min=2830ms max=11450ms
>> response time: percentage
>> <3000ms: 0.3
>> <3500ms: 6.7
>> <4000ms: 20.7
>> <4500ms: 30.7
>> <5000ms: 39.7
>> <5500ms: 48.0
>> <6000ms: 56.3
>> <6500ms: 66.3
>> <7000ms: 72.3
>> <7500ms: 80.0
>> <8000ms: 87.7
>> <8500ms: 94.3
>> <9000ms: 97.7
>> <9500ms: 98.0
>> <10000ms: 98.7
>> <10500ms: 99.3
>> <11000ms: 99.7
>> <11500ms: 100.0
>> <12000ms: 100.0
>>
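
(The cumulative breakdowns above can be reproduced from the raw samples with
a few lines; this sketch assumes the samples are already in milliseconds.)

    def cumulative_breakdown(samples_ms, start=3000, stop=12000, step=500):
        # For each threshold, print the percentage of samples below it,
        # matching the "<Xms: percentage" rows above.
        total = float(len(samples_ms))
        for threshold in range(start, stop + step, step):
            below = sum(1 for s in samples_ms if s < threshold)
            print("<%dms: %.1f" % (threshold, 100 * below / total))
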
>> See attachment for an oocalc spreadsheet with raw samples and a graph.
>>
>> Profiles:
>> Vanilla VDSM:
>>    243874010 function calls (349761179 primitive calls) in 2612.654 seconds
>>
>>    Ordered by: internal time
>>    List reduced from 883 to 20 due to restriction <20>
>>
>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>> 196905/49111117  500.567    0.003 1002.618    0.000 copy.py:144(deepcopy)
>> 120996/4106203   217.146    0.002  998.929    0.000 copy.py:251(_deepcopy_dict)
>>  39443308  192.752    0.000  217.518    0.000 copy.py:261(_keep_alive)
>>       859  149.007    0.173  149.466    0.174 storage_mailbox.py:621(SPM_MailMonitor._handleRequests)
>> 601/1891502  147.901    0.246  579.963    0.000 xmlrpclib.py:765(Marshaller.dump_struct)
>>  65493648  128.666    0.000  169.473    0.000 string.py:511(replace)
>> 7858586/32213416   95.784    0.000  209.567    0.000 encoder.py:284(JSONEncoder._iterencode)
>>  21831216   89.865    0.000  259.338    0.000 xmlrpclib.py:167(escape)
>> 7858587/20679836   80.741    0.000  196.755    0.000 encoder.py:213(JSONEncoder._iterencode_dict)
>> 601/12515060   66.632    0.111  579.970    0.000 xmlrpclib.py:683(Marshaller.__dump)
>>    227968   50.938    0.000   59.783    0.000 __init__.py:764(UserGroupEnforcingHandler.emit)
>>    155279   45.445    0.000    5.345    0.000 posixpath.py:94(splitext)
>>    133463   45.351    0.000  104.987    0.001 __init__.py:1175(Logger.handle)
>>   7290900   45.228    0.000  149.552    0.000 xmlrpclib.py:744(Marshaller.dump_unicode)
>>       601   43.538    0.072    1.056    0.002 socket.py:196(_socketobject.accept)
>>  35095608   31.967    0.000   31.967    0.000 copy.py:197(_deepcopy_atomic)
>>     50498   24.912    0.000   55.671    0.001 inspect.py:247(getmembers)
>> 7479142/8217142   24.764    0.000  150.280    0.000 encoder.py:180(JSONEncoder._iterencode_list)
>>     75750   19.871    0.000   75.819    0.001 vm.py:2271(Vm._getRunningVmStats)
>>   4447686   17.603    0.000   17.603    0.000 sampling.py:323(AdvancedStatsFunction.interval)
>>
>>
>> Patched VDSM:
>>    165586271 function calls (181657572 primitive calls) in 2406.080 seconds
>>
>>    Ordered by: internal time
>>    List reduced from 1045 to 20 due to restriction <20>
>>
>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>> 51229/51594  198.728    0.004  184.402    0.004 Monitor.py:182(GuestMonitor._should_run)
>>    234244  183.784    0.001  570.998    0.002 vm.py:2274(Vm._getRunningVmStats)
>> 647/1891502  173.119    0.268  671.222    0.000 xmlrpclib.py:765(Marshaller.dump_struct)
>> 66247525/66247536  150.364    0.000  192.790    0.000 string.py:511(replace)
>>       653  118.867    0.182  119.219    0.183 storage_mailbox.py:621(SPM_MailMonitor._handleRequests)
>> 22052164/22082512   90.992    0.000  294.362    0.000 xmlrpclib.py:167(escape)
>> 637/12640708   75.551    0.119  671.442    0.000 xmlrpclib.py:683(Marshaller.__dump)
>> 5096229/5096259   66.229    0.000   64.007    0.000 utils.py:296(convertToStr)
>>   7200900   49.677    0.000  161.784    0.000 xmlrpclib.py:744(Marshaller.dump_unicode)
>>   1318320   39.141    0.000   39.141    0.000 vm.py:610(compute_latency)
>> 57937/57940   37.975    0.001   80.458    0.001 inspect.py:247(getmembers)
>>    234244   36.289    0.000  155.929    0.001 vm.py:519(VmStatsThread._getDiskStats)
>>    234244   35.013    0.000   46.974    0.000 vm.py:2359(Vm._getGraphicsStats)
>>    365102   24.819    0.000  106.858    0.000 glob.py:18(iglob)
>>    234244   23.588    0.000   32.139    0.000 vm.py:402(VmStatsThread._getBalloonStats)
>>    439440   22.295    0.000   22.295    0.000 vm.py:595(_calcDiskRate)
>>    439440   22.178    0.000   61.320    0.000 vm.py:607(_calcDiskLatency)
>>    834132   21.402    0.000   21.402    0.000 vm.py:1311(Vm._get_lastStatus)
>>   1071684   19.455    0.000   24.374    0.000 pthreading.py:77(Lock.release)
>>   2961805   18.907    0.000   61.027    0.000 xmlrpclib.py:737(Marshaller.dump_string)
>>
>
> This looks strange:
>
> vanila vdsm:   75750   19.871    0.000   75.819    0.001  vm.py:2271(Vm._getRunningVmStats)
> patched vdsm: 234244  183.784    0.001  570.998    0.002  vm.py:2274(Vm._getRunningVmStats)
>
> The second profile is doing about 3 times more Vm._getRunningVmStats calls,
> which take about 10 times more cpu time (?!).
>
> We don't have any info on the load - maybe vdsm is overloaded in both runs,
> which could explain why it does 1/3 of the work of the patched version; the
> patched version is probably overloaded as well, which could explain the much
> higher cpu time.
>
> I suggest checking the cpu usage, and *not* profiling runs where vdsm uses more
> than 100% cpu.
>
> Let's profile a much lower load to understand where the time is spent.
>
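
(One way to do that check: sample vdsm's CPU usage from /proc during the run.
A minimal sketch, assuming the pid is already known and that the process name
contains no spaces, so the naive field split is safe:)

    import os
    import time

    def cpu_percent(pid, interval=1.0):
        # Read utime+stime from /proc/<pid>/stat twice and convert the
        # delta to a percentage of one CPU core over the interval.
        def ticks():
            with open("/proc/%d/stat" % pid) as f:
                fields = f.read().split()
            return int(fields[13]) + int(fields[14])  # utime + stime
        hz = os.sysconf("SC_CLK_TCK")
        before = ticks()
        time.sleep(interval)
        return (ticks() - before) * 100.0 / hz / interval
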
>> Open points:
>> * profile remaining flows again and report like this (in progress for
>> getAllVmStats)
>> * add wiki page
>> * find out why the list verb has such a high minimum cost
>> * profile JSONRPC (needs patches to vdsClient)
>
> Comparing jsonrpc would be very interesting - but we don't have a jsonrpc
> client yet, so you'd better use Engine for this instead of vdsClient.
>

I can help reduce Engine noise so we can profile using both protocols.

>
> Nir
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel


