[ovirt-devel] [VDSM] [JSONRPC] early, coarse grained benchmarks
Piotr Kliczewski
piotr.kliczewski at gmail.com
Fri Nov 14 07:52:38 UTC 2014
On Thu, Nov 13, 2014 at 9:35 PM, Nir Soffer <nsoffer at redhat.com> wrote:
> ----- Original Message -----
>> From: "Francesco Romani" <fromani at redhat.com>
>> To: "engine-devel at ovirt.org" <devel at ovirt.org>
>> Sent: Thursday, November 13, 2014 6:47:04 PM
>> Subject: Re: [ovirt-devel] [VDSM] [JSONRPC] early, coarse grained benchmarks
>>
>> ----- Original Message -----
>> > From: "Piotr Kliczewski" <piotr.kliczewski at gmail.com>
>> > To: "Francesco Romani" <fromani at redhat.com>
>> > Cc: "engine-devel at ovirt.org" <devel at ovirt.org>
>> > Sent: Monday, November 10, 2014 4:06:38 PM
>> > Subject: Re: [ovirt-devel] [VDSM] [JSONRPC] early, coarse grained
>> > benchmarks
>> >
>> > Great work!
>> >
>> > On Mon, Nov 10, 2014 at 3:50 PM, Francesco Romani <fromani at redhat.com>
>> > wrote:
>> > > Hi everyone,
>> > >
>> > > I'd like to share initial profiles of VDSM comparing
>> > > XMLRPC and JSONRPC, plus two patches optimizing the latter.
>> > >
>> > > While the low-level profiler (yappi) is now working nicely,
>> > > and this is a good building block, I'm not really satisfied with
>> > > the test case - although it is a fair approximation of reality;
>> > > most importantly, I'm not satisfied with the benchmark driver, which
>> > > is too coarse grained.
>> > >
>> > > The benchmark driver is a simple python script using the REST APIs
>> > > of Engine.
>> > >
>> > > I'd like to be able to bypass Engine, but vdsClient is not good enough:
>> > > no JSONRPC option yet.
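>> > >
>> > > In a nutshell, the driver does little more than this (URL and credentials
>> > > are placeholders, not my actual setup):
>> > >
>> > >     # Minimal sketch of the REST-based driver; hostname and credentials are fake.
>> > >     import time
>> > >     import requests
>> > >
>> > >     ENGINE = "https://engine.example.com/ovirt-engine/api"
>> > >     AUTH = ("admin@internal", "password")
>> > >
>> > >     def time_list_vms(runs=300):
>> > >         samples = []
>> > >         for _ in range(runs):
>> > >             start = time.time()
>> > >             resp = requests.get(ENGINE + "/vms", auth=AUTH, verify=False)
>> > >             resp.raise_for_status()
>> > >             samples.append((time.time() - start) * 1000.0)  # milliseconds
>> > >         return samples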
>> > >
>> >
>> > Once I have a bit of time I will work on it.
>>
>> Well, it's been on my list for a long time as well, so no pressure.
>>
>> I'm profiling four major flows at the moment, and improving tools along the
>> way.
>>
>> Here's my series, still being worked on and not yet ready for review (hence
>> still in draft):
>>
>> http://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:profile_runtime,n,z
>>
>> Here we have
>> 1. one patch to start/stop profiling at runtime using vdsClient
>
> Note that stopping yappi may segfault, as yappi uses unsafe iteration
> over a Python-internal linked list without locking.
>
> If we add such an option, it should work only from a local connection.
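>
> For reference, such a runtime toggle boils down to something like this sketch
> (verb names, the builtins flag and the save path are illustrative only), and
> the stop() call is exactly where the crash can happen:
>
>     import yappi
>
>     def profileStart():
>         # safe to call repeatedly; only starts the profiler once
>         if not yappi.is_running():
>             yappi.start(builtins=False)
>
>     def profileStop(path='/var/run/vdsm/vdsmd.prof'):
>         # yappi.stop() walks interpreter-internal thread state without
>         # locking, so stopping a busy vdsm can crash it
>         yappi.stop()
>         yappi.get_func_stats().save(path, type='pstat')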
>
>> 2. one simple (simplistic?) benchmark runner script
>>
>> More will come as the need arises. A proper wiki page about how to profile VDSM
>> is planned as well.
>>
>> The flows I'm testing are
>> 1. VM startup
>> 2. VM shutdown
>> 3. VM list
>> 4. VM getAllVmStats
>>
>> with focus on the VDSM side, not the whole flow; thus, I'm intentionally ruling
>> Engine out of the picture.
>>
>> 1 and 2 because of https://bugzilla.redhat.com/show_bug.cgi?id=1155146
>> 3 and 4 because of https://bugzilla.redhat.com/show_bug.cgi?id=1110835
>>
>> So far, the profile of #2 has been the most puzzling, as no clear bottleneck
>> has emerged yet. A new, deeper round is scheduled ASAP.
>> I found optimizations that affect 1, 3 and 4, especially with JSONRPC.
>> Unfortunately, I can't yet test with JSONRPC through vdsClient (see above).
>> I will follow up on that ASAP.
>>
>> Patches, in order of readiness for review:
>>
>> http://gerrit.ovirt.org/#/c/35024/
>> http://gerrit.ovirt.org/#/c/34989/
>> http://gerrit.ovirt.org/#/c/35139 (DRAFT)
>> http://gerrit.ovirt.org/#/c/35140 (DRAFT)
>>
>> Let's see the effect of these on the #3 flow.
>> Scenario:
>> RHEL 6.5 on a minidell running VDSM master + patches, with 150 VMs, each with 1
>
> What is the cpu usage and load during the test? In previous tests you had a nice
> graph with this data.
>
>> VCPU, 16 MB (megs) of RAM and a 1 GB HDD on iSCSI. Run 300 times:
>> "vdsClient -s 0 list table"
>
> This is not a good test - you just overload vdsm. Instead, run this verb at a more
> typical rate (like engine does) and limit the test by running time, not by the
> number of calls.
>
> Running like this will show that the patched version needs less cpu and completes
> more calls during the test duration (e.g. 5 minutes).
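>
> Something along these lines (the 2s interval and 5 minute duration below are
> just examples, not engine's actual polling rate):
>
>     import subprocess
>     import time
>
>     def run_benchmark(duration=300, interval=2.0):
>         """Call the list verb at a fixed rate for `duration` seconds."""
>         samples = []
>         deadline = time.time() + duration
>         while time.time() < deadline:
>             start = time.time()
>             subprocess.check_output(["vdsClient", "-s", "0", "list", "table"])
>             samples.append((time.time() - start) * 1000.0)  # milliseconds
>             # sleep for the remainder of the interval instead of hammering vdsm
>             time.sleep(max(0.0, interval - (time.time() - start)))
>         return samples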
>
>>
>> vanilla VDSM:
>> average=6985ms min=3220ms max=11880ms
>> response time: percentage
>> <3000ms: 0.0
>> <3500ms: 8.3
>> <4000ms: 18.3
>> <4500ms: 23.0
>> <5000ms: 27.7
>> <5500ms: 31.0
>> <6000ms: 36.3
>> <6500ms: 41.0
>> <7000ms: 49.0
>> <7500ms: 54.7
>> <8000ms: 59.0
>> <8500ms: 66.0
>> <9000ms: 71.7
>> <9500ms: 79.7
>> <10000ms: 88.3
>> <10500ms: 96.3
>> <11000ms: 99.0
>> <11500ms: 99.7
>> <12000ms: 100.0
>>
>> patched VDSM:
>
> What is "patched"? Which patches are applied?
>
>> average=5761ms min=2830ms max=11450ms
>> response time: percentage
>> <3000ms: 0.3
>> <3500ms: 6.7
>> <4000ms: 20.7
>> <4500ms: 30.7
>> <5000ms: 39.7
>> <5500ms: 48.0
>> <6000ms: 56.3
>> <6500ms: 66.3
>> <7000ms: 72.3
>> <7500ms: 80.0
>> <8000ms: 87.7
>> <8500ms: 94.3
>> <9000ms: 97.7
>> <9500ms: 98.0
>> <10000ms: 98.7
>> <10500ms: 99.3
>> <11000ms: 99.7
>> <11500ms: 100.0
>> <12000ms: 100.0
>>
>> See the attachment for an OOCalc spreadsheet with raw samples and a graph.
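>>
>> For the record, the cumulative percentages above are computed from the raw
>> samples simply like this (sketch; bucket bounds hardcoded to match the tables):
>>
>>     def cumulative_table(samples_ms, step=500, limit=12000):
>>         # samples_ms is the list of per-call response times in milliseconds
>>         total = float(len(samples_ms))
>>         for bound in range(3000, limit + 1, step):
>>             below = sum(1 for s in samples_ms if s < bound)
>>             print("<%dms: %.1f" % (bound, 100.0 * below / total))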
>>
>> Profiles:
>> Vanilla VDSM:
>> 243874010 function calls (349761179 primitive calls) in 2612.654 seconds
>>
>>    Ordered by: internal time
>>    List reduced from 883 to 20 due to restriction <20>
>>
>>    ncalls            tottime  percall  cumtime   percall  filename:lineno(function)
>>    196905/49111117   500.567  0.003    1002.618  0.000    copy.py:144(deepcopy)
>>    120996/4106203    217.146  0.002    998.929   0.000    copy.py:251(_deepcopy_dict)
>>    39443308          192.752  0.000    217.518   0.000    copy.py:261(_keep_alive)
>>    859               149.007  0.173    149.466   0.174    storage_mailbox.py:621(SPM_MailMonitor._handleRequests)
>>    601/1891502       147.901  0.246    579.963   0.000    xmlrpclib.py:765(Marshaller.dump_struct)
>>    65493648          128.666  0.000    169.473   0.000    string.py:511(replace)
>>    7858586/32213416  95.784   0.000    209.567   0.000    encoder.py:284(JSONEncoder._iterencode)
>>    21831216          89.865   0.000    259.338   0.000    xmlrpclib.py:167(escape)
>>    7858587/20679836  80.741   0.000    196.755   0.000    encoder.py:213(JSONEncoder._iterencode_dict)
>>    601/12515060      66.632   0.111    579.970   0.000    xmlrpclib.py:683(Marshaller.__dump)
>>    227968            50.938   0.000    59.783    0.000    __init__.py:764(UserGroupEnforcingHandler.emit)
>>    155279            45.445   0.000    5.345     0.000    posixpath.py:94(splitext)
>>    133463            45.351   0.000    104.987   0.001    __init__.py:1175(Logger.handle)
>>    7290900           45.228   0.000    149.552   0.000    xmlrpclib.py:744(Marshaller.dump_unicode)
>>    601               43.538   0.072    1.056     0.002    socket.py:196(_socketobject.accept)
>>    35095608          31.967   0.000    31.967    0.000    copy.py:197(_deepcopy_atomic)
>>    50498             24.912   0.000    55.671    0.001    inspect.py:247(getmembers)
>>    7479142/8217142   24.764   0.000    150.280   0.000    encoder.py:180(JSONEncoder._iterencode_list)
>>    75750             19.871   0.000    75.819    0.001    vm.py:2271(Vm._getRunningVmStats)
>>    4447686           17.603   0.000    17.603    0.000    sampling.py:323(AdvancedStatsFunction.interval)
>>
>>
>> Patched VDSM:
>> 165586271 function calls (181657572 primitive calls) in 2406.080 seconds
>>
>>    Ordered by: internal time
>>    List reduced from 1045 to 20 due to restriction <20>
>>
>>    ncalls              tottime  percall  cumtime  percall  filename:lineno(function)
>>    51229/51594         198.728  0.004    184.402  0.004    Monitor.py:182(GuestMonitor._should_run)
>>    234244              183.784  0.001    570.998  0.002    vm.py:2274(Vm._getRunningVmStats)
>>    647/1891502         173.119  0.268    671.222  0.000    xmlrpclib.py:765(Marshaller.dump_struct)
>>    66247525/66247536   150.364  0.000    192.790  0.000    string.py:511(replace)
>>    653                 118.867  0.182    119.219  0.183    storage_mailbox.py:621(SPM_MailMonitor._handleRequests)
>>    22052164/22082512   90.992   0.000    294.362  0.000    xmlrpclib.py:167(escape)
>>    637/12640708        75.551   0.119    671.442  0.000    xmlrpclib.py:683(Marshaller.__dump)
>>    5096229/5096259     66.229   0.000    64.007   0.000    utils.py:296(convertToStr)
>>    7200900             49.677   0.000    161.784  0.000    xmlrpclib.py:744(Marshaller.dump_unicode)
>>    1318320             39.141   0.000    39.141   0.000    vm.py:610(compute_latency)
>>    57937/57940         37.975   0.001    80.458   0.001    inspect.py:247(getmembers)
>>    234244              36.289   0.000    155.929  0.001    vm.py:519(VmStatsThread._getDiskStats)
>>    234244              35.013   0.000    46.974   0.000    vm.py:2359(Vm._getGraphicsStats)
>>    365102              24.819   0.000    106.858  0.000    glob.py:18(iglob)
>>    234244              23.588   0.000    32.139   0.000    vm.py:402(VmStatsThread._getBalloonStats)
>>    439440              22.295   0.000    22.295   0.000    vm.py:595(_calcDiskRate)
>>    439440              22.178   0.000    61.320   0.000    vm.py:607(_calcDiskLatency)
>>    834132              21.402   0.000    21.402   0.000    vm.py:1311(Vm._get_lastStatus)
>>    1071684             19.455   0.000    24.374   0.000    pthreading.py:77(Lock.release)
>>    2961805             18.907   0.000    61.027   0.000    xmlrpclib.py:737(Marshaller.dump_string)
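>>
>> (Both tables are plain pstats reports over the saved profile data, along
>> these lines - the path is illustrative:)
>>
>>     import pstats
>>
>>     stats = pstats.Stats('/var/run/vdsm/vdsmd.prof')
>>     stats.sort_stats('time').print_stats(20)  # top 20 by internal time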
>>
>
> This looks strange:
>
> vanilla vdsm: 75750   19.871 0.000  75.819 0.001 vm.py:2271(Vm._getRunningVmStats)
> patched vdsm: 234244 183.784 0.001 570.998 0.002 vm.py:2274(Vm._getRunningVmStats)
>
> The second profile is doing about 3 times more Vm._getRunningVmStats calls,
> which takes 10 times more cpu time(?!).
>
> We don't have any info on the load - maybe vdsm is overloaded in both runs, which
> would explain why the vanilla version does 1/3 of the work of the patched version;
> the patched version is probably overloaded as well, which would explain its much
> higher cpu time.
>
> I suggest checking the cpu usage, and *not* making profiles where vdsm uses more
> than 100% cpu.
>
> Let's profile a much lower load to understand where the time is spent.
>
>> Open points:
>> * profile remaining flows again and report like this (in progress for
>> getAllVmStats)
>> * add wiki page
>> * find out why the minimum response time for the list verb is so high
>> * profile JSONRPC (needs patches to vdsClient)
>
> Comparing jsonrpc would be very interesting - but we don't have a jsonrpc
> client yet, so you'd better use the engine for this instead of vdsClient.
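>
> For what it's worth, the payload itself is trivial - a standalone client would
> only need to send a JSON-RPC 2.0 request like the sketch below (method name
> taken from the schema, if I recall correctly); the real work is the transport
> framing and TLS underneath:
>
>     import json
>     import uuid
>
>     request = json.dumps({
>         "jsonrpc": "2.0",
>         "method": "Host.getVMList",
>         "params": {},
>         "id": str(uuid.uuid4()),
>     })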
>
I can help reduce the Engine noise so we can profile using both protocols.
>
> Nir
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel