REST API data aggregation

24 Mar 2017

Hi All,

for quite some time I have been more or less involved in the development of
various UIs for oVirt based entirely on oVirt's REST API, ranging from
the quite mature moVirt [1] through some cockpit extensions to a young and
experimental user portal replacement [2].

One issue we hit over and over again is the missing data aggregation. In
the 3.x era, moVirt used the detail=something parameter of the
API to get the disks and nics of the VMs, something like:

GET /ovirt-engine/api/vms
Accept: application/json; detail=disks

This allowed us to store this data in a local database, leading to a great user
experience. Since this feature was removed in the 4.x API [3],
we needed to resort to a different solution: when the user selects the VM
detail, we start loading the disks and nics and hope the user
will not be fast enough to see the delay. The UX is slightly worse but
kinda acceptable.
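
For illustration, the 3.x-style request shown above could be built roughly like this in Python. This is only a sketch: the helper name build_vm_list_request and the host engine.example.com are made up for the example, and the detail parameter itself is no longer available in the 4.x API.

```python
import urllib.request


def build_vm_list_request(base_url, detail):
    """Build a GET request for the VM collection with a 3.x-style detail parameter."""
    return urllib.request.Request(
        base_url + "/ovirt-engine/api/vms",
        headers={"Accept": f"application/json; detail={detail}"},
        method="GET",
    )


# engine.example.com is a placeholder host, not a real engine.
request = build_vm_list_request("https://engine.example.com", "disks")
```

The point is that one request carried both the VM list and the embedded disks, so the client could fill its local database without follow-up calls.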

We hit this issue harder in the new user portal [2], because we already
have the VM cached and show the whole VM in one screen. So, if you pick it,
you will get its details immediately.
But, since we don't have all the details, we need to do an additional call
(two actually) to load this data, and they start to appear later.
So, something which would be very fast and smooth starts to feel sluggish.

Recently, we hit this issue again, which forced us to sacrifice the UX even
more - it is the "console in use" feature of the user portal.
The use case is this:
- if the console is already taken by some user, there are complications if
another user tries to take it as well (I will skip the details about the
settings and permissions involved, but long story short, the user will
probably not be allowed to connect to it. The "probably" is the key here:
since we can not make any intelligent decision in advance, we can only warn
the user that the console is taken).
- in the current GWT user portal, if the VM's console is taken, it is shown
on the VM's "box" that "console is taken". This was a highly requested
feature
- to get this information using the current REST API, we need to go to the
/vms/<vmid>/sessions subcollection. To get this for all VMs, we would be
doing N queries per poll, which we can not afford
- so the current PR [4] will probably end up only checking it on the
attempt to connect to the console and warning the user. Maybe it will also
be shown in the VM details. But the UX will suffer significantly when the
user is looking for a VM which has a free console (e.g. trying one by one
until some opens, or looking at the details one by one to see if the warning
appears (with a delay))
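
To make the "N queries per poll" cost concrete, here is a minimal Python sketch. The function poll_sessions and the fake fetch callable are hypothetical and only stand in for a real HTTP client; the response body is treated as opaque.

```python
def poll_sessions(vm_ids, fetch):
    """Query the sessions subcollection of every VM, one request per VM."""
    return {
        vm_id: fetch(f"/ovirt-engine/api/vms/{vm_id}/sessions")
        for vm_id in vm_ids
    }


# Stand-in for a real HTTP client: it only records each requested path.
requested = []


def fake_fetch(path):
    requested.append(path)
    return []  # placeholder response body


result = poll_sessions(["vm-1", "vm-2", "vm-3"], fake_fetch)
print(len(requested))  # one request per VM, on every poll
```

With a few hundred VMs and a poll every few seconds, this per-VM fan-out is the part we can not afford.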

I understand that embedding the details of the VM to the response comes
with a cost, namely:
- performance hit
- complexity of the API code
- the "cleanness" of REST suffers

But I think we should seriously consider providing some option for data
aggregation.

I know this has been discussed many times with no result, but I think it is
time to bring this topic up again. I'll try to summarize the (failed)
attempts made so far:
- the detail=<something> parameter with ad-hoc embedding of data. This was
there and was removed in 4.0 [3]
- the DoctorREST project - i.e. a proxy above the current api. The idea was
to create a service which would be independent of the engine itself, would
locally poll the engine's REST API, store all data in a local (mongo)DB and
provide a rich api with aggregations, projections and push
notifications. This polling of everything to get the data to DoctorREST
proved to be pretty costly, so a more invasive approach of pushing data
from the engine to doctor has also been discussed [5]. Neither of these two
approaches has been accepted (too complicated, too invasive).
- writing some custom ad-hoc servlet serving only the purpose of one frontend
- this is actually there for the dashboard, but it is not a generic
solution for the other frontends and we really should not develop custom
"APIs" for every frontend
- there were some other proposals discussed (some 3rd party solutions etc)
but I think none of them made it even to a PoC

So, now I would like to try again and start small to get at least some
benefit. I see 2 paths we could try:
1: embed something which burns us immediately, e.g. the /sessions into VMs.
I really liked the ;detail=sessions approach, could we bring it back?
2: add some tiny service which would just accept a list of queries, execute
them locally (but using real HTTP requests) and return the results in one
bulk. A naive implementation, just to give a sense of what I mean, would be
a shell script which gets a list of strings like
"https://localhost/ovirt-engine/api/vms/123/sessions", iterates over them,
does a curl request for each, mangles the results into one string and returns
it (credits for this idea to msivak). Easy to implement, with the possibility
to add projections later to save some bandwidth. But the API would still be
hammered by a bunch of queries; only the network roundtrips would be saved.
3: any other simple approaches?
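
To give a better sense of option 2, here is a naive sketch in Python instead of shell + curl. All names (http_get, run_batch) are made up for illustration, authentication and TLS settings are left out, and it is not meant as a proposal for the real implementation:

```python
import json
import urllib.request


def http_get(url):
    """Fetch one URL with a real HTTP request and return the body as text."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def run_batch(urls, fetch=http_get):
    """Execute each query in order and return all results as one JSON string."""
    results = []
    for url in urls:
        try:
            results.append({"url": url, "body": fetch(url)})
        except Exception as exc:  # keep going so one failure does not sink the batch
            results.append({"url": url, "error": str(exc)})
    return json.dumps(results)
```

The client would send one request with the list of URLs and get one response back. The engine still serves every query separately, so only the client-side roundtrips are saved.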

I honestly prefer the first approach. It is not beautiful, it is not
REST-ful, but it is easy to implement, very pragmatic and useful.
What do you think?

Thank you and sorry for the long mail :)
Tomas

[1]: https://github.com/oVirt/moVirt
[2]: https://github.com/oVirt/ovirt-web-ui
[3]: https://gerrit.ovirt.org/#/c/61260
[4]: https://github.com/oVirt/ovirt-web-ui/pull/106/
[5]: https://gerrit.ovirt.org/#/c/45233/

Tomas Jelinek
