[ovirt-devel] Doctor Rest PostgreSQL report

Fri Oct 2 07:34:03 UTC 2015

On 1 Oct 2015, at 23:17, Martin Betak wrote:

> 
> 
> 
> 
> ----- Original Message -----
>> From: "Piotr Kliczewski" <pkliczew at redhat.com>
>> To: "Martin Betak" <mbetak at redhat.com>
>> Cc: "engine-devel at ovirt.org" <devel at ovirt.org>, "Vojtech Szocs" <vszocs at redhat.com>, "Einav Cohen"
>> <ecohen at redhat.com>
>> Sent: Thursday, October 1, 2015 6:34:02 PM
>> Subject: Re: [ovirt-devel] Doctor Rest PostgreSQL report
>> 
>> On Thu, Oct 1, 2015 at 5:21 PM, Martin Betak <mbetak at redhat.com> wrote:
>> 
>>> 
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>>> From: "Piotr Kliczewski" <pkliczew at redhat.com>
>>>> To: "Martin Betak" <mbetak at redhat.com>
>>>> Cc: "engine-devel at ovirt.org" <devel at ovirt.org>
>>>> Sent: Thursday, October 1, 2015 5:08:39 PM
>>>> Subject: Re: [ovirt-devel] Doctor Rest PostgreSQL report
>>>> 
>>>> Martin,
>>>> 
>>>> Looking at the stats you generated I can see that there is almost no
>>>> difference for CPU and memory. Load seems to be at the same level for
>>>> both. I tried to understand the differences by looking at the number of
>>>> calls and total time (top 10) but there was almost no difference. The
>>>> only slight difference I can see is in avg time. It seems that we
>>>> haven't generated enough load to see a significant improvement. How many
>>>> clients did you use during testing?
>>> 
>>> This benchmark was without any clients. This was just to demonstrate the
>>> load Doctor generates on the DB in the context of the existing DB load
>>> from the backend (mostly VM/Host monitoring), which was I believe one of
>>> Barak's major concerns: how expensive it is when we *don't* have
>>> thousands of connected users.
>>> 
>>> 
>> I am not sure what we are testing here. I can see a lot of queries which
>> were generated by host monitoring. You said that you had around 200+ hosts
>> and you ran a full dump for doctor every 5 seconds. Next you moved to
>> 1 second intervals.
>> It seems that host monitoring generated enough load to hide any impact of
>> doctor rest queries.

That's my impression as well. The setup is a bit small for my liking; I'd definitely like to see how it looks with a few hundred more hosts and ~10000 VMs. That's something we should still be able to reach on a laptop.

More importantly, I would like to figure out the worst query in the report, which consumes a really excessive amount of time (30+ minutes during a 20 minute run).
Another thing I want to clear up is the impact of the "list" command, since Martin ran on xml-rpc… that might be behind the crazy query. Since we don't really need it in an idle test, we can just increase "list" to some high number.

Either way, the additional doctor rest load indeed seems tolerable, but we need a few more profiles and a better understanding of why we see what we see.
I would recommend combining the testing with Roman's tool/idea, which he is about to send out as well :)
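
For comparing further profiles, here is a minimal sketch of how the "Doctor share of total DB time" figure discussed in this thread could be computed from per-statement statistics. It assumes rows of (statement name, calls, total time in ms), roughly what a statement-statistics report exposes. The StatementStat type, the function name and all numbers below are made up for illustration and are not taken from the attached reports.

```python
from typing import Iterable, NamedTuple, Set


class StatementStat(NamedTuple):
    """One row of per-statement statistics (hypothetical shape)."""
    query: str
    calls: int
    total_time_ms: float


def doctor_overhead_percent(stats: Iterable[StatementStat],
                            doctor_queries: Set[str]) -> float:
    """Return the percentage of total DB time spent in the given queries."""
    rows = list(stats)
    total = sum(row.total_time_ms for row in rows)
    if total == 0:
        return 0.0
    doctor_total = sum(row.total_time_ms for row in rows
                       if row.query in doctor_queries)
    return 100.0 * doctor_total / total


# Illustrative, made-up numbers only.
sample = [
    StatementStat("GetVmsRunningOnVds", 240_000, 1_800_000.0),
    StatementStat("GetAllFromVms", 240, 10_000.0),
    StatementStat("GetAllFromVds", 240, 8_000.0),
    StatementStat("other", 50_000, 182_000.0),
]

overhead = doctor_overhead_percent(sample, {"GetAllFromVms", "GetAllFromVds"})
print(f"Doctor overhead: {overhead:.2f}% of total DB time")
```

Note that total time is summed over all connections, so a single statement's total can exceed the wall-clock length of the run when calls execute concurrently.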

Thanks,
michal

> 
> Yes, this is exactly what we wanted to estimate: whether even the most naive
> implementation of the Doctor connector imposes a significant impact on the
> overall system. Essentially, how much we have to pay to get all the benefits
> of Doctor Rest.
> 
> As you say, from these measurements it would seem that the price is negligible.
> 
> Martin
> 
>> 
>> 
>> 
>>>> 
>>>> I am not sure how much it would take but what do you think about using
>>>> doctor as a caching service for the engine and running a few tests?
>>>> I wonder what would be the result of such a PoC.
>>> 
>>> I would like to move to more real-world tests as soon as we have at least
>>> some working prototype of the next-gen UI (most significantly the new
>>> Dashboard).
>>> 
>>> I understand that the UI discussions (adding Vojtech to CC) have progressed
>>> recently so as soon as we have some actual frontend code to work with I
>>> would like to hook it up to Doctor and make some end-to-end measurements.
>>> 
>>> Martin
>>> 
>>>> 
>>>> Thanks,
>>>> Piotr
>>>> 
>>>> 
>>>> On Thu, Oct 1, 2015 at 3:25 PM, Martin Betak <mbetak at redhat.com> wrote:
>>>> 
>>>>> Hi All,
>>>>> 
>>>>> so I installed a few more plugins for pgCluu and PostgreSQL itself.
>>>>> Now I also have the overall system load and total numbers for specific
>>>>> queries.
>>>>> 
>>>>> If you look at the report, database 'engine' and 'Statement statistics',
>>>>> we can clearly see that the overwhelming majority of DB time is spent in
>>>>> the GetVmsRunningOnVds() stored procedure.
>>>>> 
>>>>> Turning on Doctor Rest with a 5 second full-dump interval, you can see
>>>>> that the calls used by DoctorCacheManager (GetAllFromVms,
>>>>> GetAllFromVds....) struggle to add up to even 1% of the overall load.
>>>>> 
>>>>> You can also see under 'System' (CPU and memory statistics) that those
>>>>> are largely unaffected by running the Doctor service alongside the engine.
>>>>> 
>>>>> I also tried setting the full update interval of Doctor to 1 second to
>>>>> see how this would go (so essentially doing a full dump of business
>>>>> entities each second) and this moved the overall Doctor overhead to
>>>>> ~2.5% of total DB load.
>>>>> 
>>>>> Of course, for UI purposes an interval in the range of 3-5 seconds
>>>>> should be more than acceptable in my opinion.
>>>>> 
>>>>> As always - questions and remarks are more than welcome :-)
>>>>> 
>>>>> Best regards,
>>>>> 
>>>>> Martin
>>>>> 
>>>>> ----- Original Message -----
>>>>>> From: "Martin Betak" <mbetak at redhat.com>
>>>>>> To: "Piotr Kliczewski" <pkliczew at redhat.com>
>>>>>> Cc: "engine-devel at ovirt.org" <devel at ovirt.org>
>>>>>> Sent: Wednesday, September 30, 2015 4:52:12 PM
>>>>>> Subject: Re: [ovirt-devel] Doctor Rest PostgreSQL report
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>>> From: "Piotr Kliczewski" <pkliczew at redhat.com>
>>>>>>> To: "Martin Betak" <mbetak at redhat.com>
>>>>>>> Cc: "engine-devel at ovirt.org" <devel at ovirt.org>, "Eli Mesika"
>>>>>>> <emesika at redhat.com>, "Martin Perina"
>>>>>>> <mperina at redhat.com>
>>>>>>> Sent: Wednesday, September 30, 2015 3:29:28 PM
>>>>>>> Subject: Re: Doctor Rest PostgreSQL report
>>>>>>> 
>>>>>>> Martin,
>>>>>>> 
>>>>>>> For me it would be great to understand how CPU and memory change over
>>>>>>> time for the engine. I would like to see the same for the doctor service.
>>>>>>> I was not able to find it but it would be great to understand how many
>>>>>>> queries there were for both tests and how long it took to run them.
>>>>>> 
>>>>>> Yes, right now I'm looking for other tools to provide me exactly with
>>>>>> that. I just wanted to share the preliminary aggregated statistics.
>>>>>> 
>>>>>>> 
>>>>>>> It would be good to understand the implications of running doctor on
>>>>>>> the same machine as the engine and on a separate machine.
>>>>>>> 
>>>>>> 
>>>>>> Indeed, this is precisely what I'm testing. The attached reports were
>>>>>> with Doctor running on the same machine as the engine.
>>>>>> 
>>>>>>> Thanks,
>>>>>>> Piotr
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Sep 30, 2015 at 3:13 PM, Martin Betak <mbetak at redhat.com> wrote:
>>>>>>> 
>>>>>>>> Hi All,
>>>>>>>> 
>>>>>>>> I performed a stress test using a FakeVDSM environment with 200+
>>>>>>>> hosts and 500+ VMs.
>>>>>>>> 
>>>>>>>> Attached are the generated HTML reports for this environment.
>>>>>>>> In both cases I tried to simulate some random load using the existing
>>>>>>>> webadmin. In the '_doctor' case the simple connector from [1] was
>>>>>>>> running *in addition to* the legacy UI.
>>>>>>>> 
>>>>>>>> I used the pgCluu tool [2], which may be useful
>>>>>>>> for DB experts for some further insight.
>>>>>>>> 
>>>>>>>> I wanted to send this out as soon as possible so we can better
>>>>>>>> analyze our current performance and the possible impact Doctor Rest
>>>>>>>> integration would have on the system.
>>>>>>>> 
>>>>>>>> Please feel free to review the attached reports and/or suggest
>>>>>>>> other ways/tools to better benchmark the DB load caused by Doctor
>>>>>>>> Rest.
>>>>>>>> 
>>>>>>>> Thank you very much.
>>>>>>>> 
>>>>>>>> Best regards
>>>>>>>> 
>>>>>>>> Martin
>>>>>>>> 
>>>>>>>> [1] https://gerrit.ovirt.org/#/c/45233/
>>>>>>>> [2] http://pgcluu.darold.net/
>>>>>>> 
>>>>>> _______________________________________________
>>>>>> Devel mailing list
>>>>>> Devel at ovirt.org
>>>>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 