Hi everyone,

For the upcoming oVirt 4.1, the UX team has focused heavily on webadmin performance improvements. There have been some reports [1] [2] of UI sluggishness in both 3.6 and 4.0, usually after the browser had been open for some time, and usually in scale environments.

After some research, we determined that the primary cause of this sluggishness was memory leaks. 

We embarked on several weeks of hunting down memory leak bugs and squashing them. Alexander Wels and Vojtech Szocs led this work, and I helped test the performance of each patch as they created them. As they created patches to squash leaks, performance kept getting better and better. Today we've merged the last of our patches [*], and I'm happy to announce that we are now seeing much better UI performance on 4.1-master and 4.0.6.

Over the course of several hours with the browser window open, users should see no sluggishness at all.

[*] This last patch switches oVirt from using de-rpc to gwt-rpc in the frontend. This improves performance, and it also allows us to upgrade to GWT 2.8, which we had previously been blocked on.

If you're interested in UI performance testing, continue reading. If not, you can stop here :)

.....

To verify our performance improvements, we took some simple measurements using Selenium WebDriver. The tests were unscientific, but very helpful. We ran a WebDriver flow through oVirt that clicked some buttons and tabs and refreshed some grids, and we repeated that flow anywhere from a few hundred to a few thousand times. The tests were run using stubbed hosts (ovirt-vdsmfake) so that only the engine and UI were under test.
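For anyone who wants to try something similar, here is a minimal sketch of that kind of measurement loop in Python. It is not the script we actually used. The name time_loops and the flow callable are made up for illustration; in a real run, flow would be a function that drives the browser through Selenium WebDriver (click tabs, refresh grids, and so on).

```python
import time
from typing import Callable, List


def time_loops(flow: Callable[[], None], loops: int) -> List[float]:
    """Run `flow` `loops` times and return each loop's duration in milliseconds.

    `flow` is a hypothetical stand-in for one pass through the UI,
    for example a function that drives a browser via Selenium WebDriver.
    """
    durations_ms: List[float] = []
    for _ in range(loops):
        start = time.perf_counter()  # monotonic, high-resolution clock
        flow()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms
```

Plotting the returned list against loop number gives the kind of response-time curve described below: flat if the UI is healthy, climbing if it is degrading.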

Below are the important takeaways. The x-axis is time, and each point on a graph is one loop through the same WebDriver flow. The y-axes marked (ms) are response times, and memory is in MB.

In this graph, we compare oVirt 4.1 with and without our most impactful patch applied. As you can see, with the patch applied, response time stays flat for 200 loops of my test script over the course of 18 and 43 minutes. Without the patch applied, response time quickly degraded such that 200 loops of my test script took 1 hr 2 minutes vs. 18 minutes with the patch applied -- roughly a 70% reduction in total run time (1 - 18/62 is about 0.71)!
Inline image 1
In this graph [ignore the spike], we tested oVirt hard for 6 hours 25 minutes (2000 loops). As you can see, the response times stay relatively flat over 6 hours! This is a great improvement. Do note that the memory is still growing, albeit much more slowly now. You can see towards the end of this run, maybe around hour 5, that the deviation in response time starts to go up (the line thickens). Takeaway: maybe refresh your browser after many hours of having webadmin open. But this is a stress test -- I'm betting users won't notice this slowdown after even 6 hours of regular webadmin use or idling.
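If you would rather put numbers on "flat" and "the line thickens" than eyeball a graph, one rough approach is to compare the first and last stretch of a run. The sketch below is an assumption about how one might do that, not something we ran for this email; compare_windows and its window parameter are hypothetical names.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def compare_windows(response_ms: Sequence[float], window: int) -> Dict[str, float]:
    """Compare the first and last `window` samples of a run.

    A rising mean suggests response times are degrading; a rising
    spread (population standard deviation) corresponds to the plotted
    line getting thicker.
    """
    if window <= 0 or len(response_ms) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    first = response_ms[:window]
    last = response_ms[-window:]
    return {
        "first_mean": mean(first),
        "last_mean": mean(last),
        "first_spread": pstdev(first),
        "last_spread": pstdev(last),
    }
```

A run where last_mean stays close to first_mean but last_spread grows would match what the graph shows near the end: the average holds steady while individual loops become less consistent.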


Last, here is a graph that shows gwt-rpc performing slightly better than de-rpc. Memory consumption is about the same -- gwt-rpc is just a faster rpc implementation.


Reply with any questions or concerns. Thanks!

Best wishes,
Greg

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1368101
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1388462

--
Greg Sheremeta, MBA
Red Hat, Inc.
Sr. Software Engineer
gshereme@redhat.com