[ovirt-devel] [VDSM] [JSONRPC] early, coarse grained benchmarks
Nir Soffer
nsoffer at redhat.com
Thu Nov 13 20:35:30 UTC 2014
----- Original Message -----
> From: "Francesco Romani" <fromani at redhat.com>
> To: "engine-devel at ovirt.org" <devel at ovirt.org>
> Sent: Thursday, November 13, 2014 6:47:04 PM
> Subject: Re: [ovirt-devel] [VDSM] [JSONRPC] early, coarse grained benchmarks
>
> ----- Original Message -----
> > From: "Piotr Kliczewski" <piotr.kliczewski at gmail.com>
> > To: "Francesco Romani" <fromani at redhat.com>
> > Cc: "engine-devel at ovirt.org" <devel at ovirt.org>
> > Sent: Monday, November 10, 2014 4:06:38 PM
> > Subject: Re: [ovirt-devel] [VDSM] [JSONRPC] early, coarse grained
> > benchmarks
> >
> > Great work!
> >
> > On Mon, Nov 10, 2014 at 3:50 PM, Francesco Romani <fromani at redhat.com>
> > wrote:
> > > Hi everyone,
> > >
> > > I'd like to share initial profiles of VDSM comparing
> > > XMLRPC and JSONRPC, plus two patches optimizing the latter.
> > >
> > > While the low-level profiler (yappi) is now working nicely,
> > > and this is a good building block, I'm not really satisfied with
> > > the test case - although it is a fair approximation of reality;
> > > most importantly, I'm not satisfied with the benchmark driver,
> > > which is too coarse grained.
> > >
> > > The benchmark driver is a simple python script using the REST APIs
> > > of Engine.
> > >
> > > I'd like to be able to bypass Engine, but vdsClient is not good enough:
> > > no JSONRPC option yet.
> > >
> >
> > Once I have a bit of time I will work on it.
>
> Well, it's been on my list for a long time as well, so no pressure.
>
> I'm profiling four major flows at the moment, and improving tools along the
> way.
>
> Here's my series, still being worked on and not yet ready for review (still
> draft for this reason)
>
> http://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:profile_runtime,n,z
>
> Here we have
> 1. one patch to start/stop profiling at runtime using vdsClient
Note that stopping yappi may segfault, as yappi uses unsafe iteration
over a Python internal linked list without locking.
If we add such an option, it should work only from a local connection.
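As a side note, a runtime toggle could be as small as this - a minimal
sketch on top of yappi's public API, not the actual patch (the function
names and the output path below are made up):

    import yappi

    def start_profiling(clock='cpu'):
        # the clock type must be set before yappi.start()
        yappi.set_clock_type(clock)
        yappi.start(builtins=False)

    def stop_profiling(path='/var/run/vdsm/vdsmd.prof'):
        # stopping is the risky part mentioned above: yappi walks
        # interpreter-internal structures, so expose this only to a
        # local, trusted connection.
        yappi.stop()
        stats = yappi.get_func_stats()
        stats.save(path, type='pstat')  # readable later with pstats
        yappi.clear_stats()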
> 2. one simple (simplistic?) benchmark runner script
>
> more will come as need arises. A proper wiki page about how to profile VDSM
> is planned as well.
>
> The flows which I'm testing are
> 1. VM startup
> 2. VM shutdown
> 3. VM list
> 4. VM getAllVmStats
>
> with focus on the VDSM side, not the whole flow; thus, I'm intentionally
> ruling Engine out of the picture.
>
> 1 and 2 because of https://bugzilla.redhat.com/show_bug.cgi?id=1155146
> 3 and 4 because of https://bugzilla.redhat.com/show_bug.cgi?id=1110835
>
> So far, the profile of #2 was the most puzzling, as no clear bottleneck
> has emerged yet. A new, deeper round is scheduled ASAP.
> I found optimizations that affect 1, 3, 4, especially with JSONRPC.
> Unfortunately, I can't yet test
> with vdsClient (see above). Will follow up on that ASAP.
>
> Patches, in order of readiness for review:
>
> http://gerrit.ovirt.org/#/c/35024/
> http://gerrit.ovirt.org/#/c/34989/
> http://gerrit.ovirt.org/#/c/35139 (DRAFT)
> http://gerrit.ovirt.org/#/c/35140 (DRAFT)
>
> Let's see the effect of these on flow #3.
> Scenario:
> RHEL 6.5 on minidell running VDSM master + patches, with 150 VMs each one 1
What is the cpu usage and load during the test? In previous tests you had
a nice graph with this data.
> VCPU, 16 MB (megs) RAM and 1 GB HDD on iSCSI. Run 300 times
> "vdsClient -s 0 list table"
This is not a good test - you just overload vdsm. Instead, run this verb at a
more typical rate (like engine does) and limit the test by the running time,
not by the number of calls.
Running like this will show that the patched version needs less cpu and
completes more calls during the test duration (e.g. 5 minutes).
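Something along these lines would do - a rough sketch, not a polished tool;
the 15 second interval is only an example of an "engine-like" polling period,
and the command line is the one from your test:

    import os
    import subprocess
    import time

    DURATION = 300      # seconds (5 minutes)
    INTERVAL = 15       # seconds between calls
    CMD = ['vdsClient', '-s', '0', 'list', 'table']

    devnull = open(os.devnull, 'w')
    deadline = time.time() + DURATION
    latencies = []

    while time.time() < deadline:
        started = time.time()
        subprocess.check_call(CMD, stdout=devnull)
        elapsed = time.time() - started
        latencies.append(elapsed)
        # keep the call rate fixed: sleep whatever is left of the interval
        time.sleep(max(0, INTERVAL - elapsed))

    print('completed %d calls, average latency %.2fs, max %.2fs' % (
        len(latencies), sum(latencies) / len(latencies), max(latencies)))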
>
> vanilla VDSM:
> average=6985ms min=3220ms max=11880ms
> response time: percentage
> <3000ms: 0.0
> <3500ms: 8.3
> <4000ms: 18.3
> <4500ms: 23.0
> <5000ms: 27.7
> <5500ms: 31.0
> <6000ms: 36.3
> <6500ms: 41.0
> <7000ms: 49.0
> <7500ms: 54.7
> <8000ms: 59.0
> <8500ms: 66.0
> <9000ms: 71.7
> <9500ms: 79.7
> <10000ms: 88.3
> <10500ms: 96.3
> <11000ms: 99.0
> <11500ms: 99.7
> <12000ms: 100.0
>
> patched VDSM:
What is "patched"? which patches are applied?
> average=5761ms min=2830ms max=11450ms
> response time: percentage
> <3000ms: 0.3
> <3500ms: 6.7
> <4000ms: 20.7
> <4500ms: 30.7
> <5000ms: 39.7
> <5500ms: 48.0
> <6000ms: 56.3
> <6500ms: 66.3
> <7000ms: 72.3
> <7500ms: 80.0
> <8000ms: 87.7
> <8500ms: 94.3
> <9000ms: 97.7
> <9500ms: 98.0
> <10000ms: 98.7
> <10500ms: 99.3
> <11000ms: 99.7
> <11500ms: 100.0
> <12000ms: 100.0
>
> See attachment for an oocalc spreadsheet with raw samples and a graph
>
> Profiles:
> Vanilla VDSM:
> 243874010 function calls (349761179 primitive calls) in 2612.654 seconds
>
> Ordered by: internal time
> List reduced from 883 to 20 due to restriction <20>
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 196905/49111117 500.567 0.003 1002.618 0.000  copy.py:144(deepcopy)
> 120996/4106203 217.146 0.002 998.929 0.000  copy.py:251(_deepcopy_dict)
> 39443308 192.752 0.000 217.518 0.000  copy.py:261(_keep_alive)
> 859 149.007 0.173 149.466 0.174  storage_mailbox.py:621(SPM_MailMonitor._handleRequests)
> 601/1891502 147.901 0.246 579.963 0.000  xmlrpclib.py:765(Marshaller.dump_struct)
> 65493648 128.666 0.000 169.473 0.000  string.py:511(replace)
> 7858586/32213416 95.784 0.000 209.567 0.000  encoder.py:284(JSONEncoder._iterencode)
> 21831216 89.865 0.000 259.338 0.000  xmlrpclib.py:167(escape)
> 7858587/20679836 80.741 0.000 196.755 0.000  encoder.py:213(JSONEncoder._iterencode_dict)
> 601/12515060 66.632 0.111 579.970 0.000  xmlrpclib.py:683(Marshaller.__dump)
> 227968 50.938 0.000 59.783 0.000  __init__.py:764(UserGroupEnforcingHandler.emit)
> 155279 45.445 0.000 5.345 0.000  posixpath.py:94(splitext)
> 133463 45.351 0.000 104.987 0.001  __init__.py:1175(Logger.handle)
> 7290900 45.228 0.000 149.552 0.000  xmlrpclib.py:744(Marshaller.dump_unicode)
> 601 43.538 0.072 1.056 0.002  socket.py:196(_socketobject.accept)
> 35095608 31.967 0.000 31.967 0.000  copy.py:197(_deepcopy_atomic)
> 50498 24.912 0.000 55.671 0.001  inspect.py:247(getmembers)
> 7479142/8217142 24.764 0.000 150.280 0.000  encoder.py:180(JSONEncoder._iterencode_list)
> 75750 19.871 0.000 75.819 0.001  vm.py:2271(Vm._getRunningVmStats)
> 4447686 17.603 0.000 17.603 0.000  sampling.py:323(AdvancedStatsFunction.interval)
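By the way, once the yappi stats are saved in pstat format, this kind of
"top 20 by internal time" view can be regenerated offline with the standard
pstats module (the file name below is made up):

    import pstats

    stats = pstats.Stats('/tmp/vdsm-vanilla.prof')
    stats.sort_stats('tottime').print_stats(20)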
>
>
> Patched VDSM:
> 165586271 function calls (181657572 primitive calls) in 2406.080 seconds
>
> Ordered by: internal time
> List reduced from 1045 to 20 due to restriction <20>
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 51229/51594 198.728 0.004 184.402 0.004  Monitor.py:182(GuestMonitor._should_run)
> 234244 183.784 0.001 570.998 0.002  vm.py:2274(Vm._getRunningVmStats)
> 647/1891502 173.119 0.268 671.222 0.000  xmlrpclib.py:765(Marshaller.dump_struct)
> 66247525/66247536 150.364 0.000 192.790 0.000  string.py:511(replace)
> 653 118.867 0.182 119.219 0.183  storage_mailbox.py:621(SPM_MailMonitor._handleRequests)
> 22052164/22082512 90.992 0.000 294.362 0.000  xmlrpclib.py:167(escape)
> 637/12640708 75.551 0.119 671.442 0.000  xmlrpclib.py:683(Marshaller.__dump)
> 5096229/5096259 66.229 0.000 64.007 0.000  utils.py:296(convertToStr)
> 7200900 49.677 0.000 161.784 0.000  xmlrpclib.py:744(Marshaller.dump_unicode)
> 1318320 39.141 0.000 39.141 0.000  vm.py:610(compute_latency)
> 57937/57940 37.975 0.001 80.458 0.001  inspect.py:247(getmembers)
> 234244 36.289 0.000 155.929 0.001  vm.py:519(VmStatsThread._getDiskStats)
> 234244 35.013 0.000 46.974 0.000  vm.py:2359(Vm._getGraphicsStats)
> 365102 24.819 0.000 106.858 0.000  glob.py:18(iglob)
> 234244 23.588 0.000 32.139 0.000  vm.py:402(VmStatsThread._getBalloonStats)
> 439440 22.295 0.000 22.295 0.000  vm.py:595(_calcDiskRate)
> 439440 22.178 0.000 61.320 0.000  vm.py:607(_calcDiskLatency)
> 834132 21.402 0.000 21.402 0.000  vm.py:1311(Vm._get_lastStatus)
> 1071684 19.455 0.000 24.374 0.000  pthreading.py:77(Lock.release)
> 2961805 18.907 0.000 61.027 0.000  xmlrpclib.py:737(Marshaller.dump_string)
>
This looks strange:
vanilla vdsm: 75750 19.871 0.000 75.819 0.001 vm.py:2271(Vm._getRunningVmStats)
patched vdsm: 234244 183.784 0.001 570.998 0.002 vm.py:2274(Vm._getRunningVmStats)
The second profile is doing about 3 times more Vm._getRunningVmStats calls,
which take about 10 times more cpu time(?!).
We don't have any info on the load - maybe vdsm is overloaded in both runs,
which could explain why the vanilla version does 1/3 of the work of the
patched version, but the patched version is probably overloaded as well,
which could explain its much higher cpu time.
I suggest checking the cpu usage, and do *not* take profiles where vdsm uses
more than 100% cpu.
Let's profile a much lower load to understand where the time is taken.
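For the cpu usage, even something crude is enough to tell whether vdsm is
saturated during the run - pidstat from sysstat would do, or a little
sampler like this (a sketch; it assumes the usual vdsm pidfile path and a
USER_HZ of 100):

    import time

    HERTZ = 100     # USER_HZ, see getconf CLK_TCK
    INTERVAL = 5    # seconds between samples

    def cpu_ticks(pid):
        with open('/proc/%d/stat' % pid) as f:
            fields = f.read().split()
        # fields 14 and 15 (1-based) are utime and stime
        return int(fields[13]) + int(fields[14])

    pid = int(open('/var/run/vdsm/vdsmd.pid').read())
    prev = cpu_ticks(pid)
    while True:
        time.sleep(INTERVAL)
        cur = cpu_ticks(pid)
        # >= 100% means at least one core is saturated
        print('vdsm cpu: %.0f%%' % ((cur - prev) * 100.0 / (HERTZ * INTERVAL)))
        prev = cur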
> Open points:
> * profile remaining flows again and report like this (in progress for
> getAllVmStats)
> * add wiki page
> * find out why there is such a high minimum price for the list verb
> * profile JSONRPC (needs patches to vdsClient)
Comparing jsonrpc would be very interesting - but we don't have a jsonrpc
client yet, so you'd better use the engine for this instead of vdsClient.
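For example, a tiny driver over the engine REST API can time a flow end to
end while the engine talks jsonrpc to the host - a sketch, with the URL,
credentials and CA path as placeholders:

    import time
    import requests

    ENGINE = 'https://engine.example.com/ovirt-engine/api'
    AUTH = ('admin@internal', 'password')
    CA = '/etc/pki/ovirt-engine/ca.pem'

    started = time.time()
    resp = requests.get(ENGINE + '/vms', auth=AUTH, verify=CA,
                        headers={'Accept': 'application/xml'})
    resp.raise_for_status()
    print('listed vms in %.2fs' % (time.time() - started))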
Nir