Top posting, sorry.
There are a few things I'd like to clarify, regarding this subject:
1. Data aggregation, as requested now by Tomas, and by other people in
the past.
We used to have that 'detail' parameter, to aggregate certain very
specific types of data, in particular to aggregate VM disks and NICs. We
removed that in version 4 of the API because the implementation was
extremely inefficient, from the engine point of view. An innocent
request like this:
GET /ovirt-engine/api/vms?detail=+disks,+nics
Would generate, with the implementation we used to have, 1 query for the
VMs and then as many queries for disks and NICs as VMs in the system. In
our scale test environments, for example, with approx 4000 VMs and 10000
disks, that would take more than 20 hours to execute.
In addition, we didn't have in the past any mechanism to make this
available in a generic way, because the API had no knowledge of
what 'details' are.
In version 4 of the API we introduced a formal (kind of) specification
of the API (a.k.a. the model), and it includes knowledge about what
'links' are. For example, the specification of the VM type contains this:
@Link DiskAttachment[] diskAttachments();
@Link Nic[] nics();
With this information we are now in a position where we can implement
this in a generic way.
We intend to implement this using a mechanism similar to the existing
'detail' parameter:
GET /ovirt-engine/api/vms/123?follow=disk_attachments,nics
The naive implementation of this is to let the API call itself. For
example, when the user requests to follow the 'disk_attachments' detail
the API can just call itself to get that:
GET /ovirt-engine/api/vms/123/disk_attachments
However, we can't use that naive approach: if we do, we end up with the
1+C*N query problem described before (one query for the N VMs, plus one
query per VM for each of the C followed links). We need to use specific
implementations for certain frequent use cases, like VMs+disks+nics, and
that needs work in the API and in the backend.
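To make the difference in query counts concrete, here is a minimal
sketch in Python. It is not the engine code: the in-memory tables, the
QueryCounter class and the follow_naive/follow_batched functions are all
hypothetical names, invented only to count "queries" for the naive
approach versus a batched one.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the engine database.
VMS = [{"id": str(i)} for i in range(1, 6)]
DISK_ATTACHMENTS = [{"id": f"d{i}", "vm_id": str(1 + i % 5)} for i in range(12)]
NICS = [{"id": f"n{i}", "vm_id": str(1 + i % 5)} for i in range(7)]
TABLES = {"disk_attachments": DISK_ATTACHMENTS, "nics": NICS}


class QueryCounter:
    """Counts how many 'queries' are run against the fake tables."""

    def __init__(self):
        self.count = 0

    def select(self, rows, predicate=lambda row: True):
        self.count += 1
        return [row for row in rows if predicate(row)]


def follow_naive(db, links):
    """One query for the VMs, then one query per VM per followed link."""
    vms = [dict(vm) for vm in db.select(VMS)]
    for vm in vms:
        for link in links:
            vm[link] = db.select(TABLES[link], lambda r, v=vm: r["vm_id"] == v["id"])
    return vms


def follow_batched(db, links):
    """One query for the VMs, then one query per followed link."""
    vms = [dict(vm) for vm in db.select(VMS)]
    for link in links:
        by_vm = defaultdict(list)
        for row in db.select(TABLES[link]):
            by_vm[row["vm_id"]].append(row)
        for vm in vms:
            vm[link] = by_vm[vm["id"]]
    return vms


links = ["disk_attachments", "nics"]
naive_db, batched_db = QueryCounter(), QueryCounter()
naive = follow_naive(naive_db, links)
batched = follow_batched(batched_db, links)
print(naive_db.count, batched_db.count)  # 1+C*N versus 1+C queries
```

Both functions return the same data. Only the number of queries
differs, and the batched count no longer grows with the number of VMs.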
Tomas, if you want to help move this forward, please open an RFE and
make sure it gets attention.
2. Reuse of TLS sessions.
The expensive part of creating a TLS session is the generation of
the shared session key. That can be avoided if both the server and the
client are careful and reuse the session, using the session cache
mechanism built into TLS itself. The web servers that we use (Apache
and Undertow) do implement this mechanism, and so do most of our
clients. Make sure that your client uses it as well. In Java this is
achieved by reusing the SSLContext. We already do that for the engine to
VDSM communication, for example. In JavaScript the browser already takes
care of this.
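For clients written in other languages the idea is the same. Below is a
rough sketch using Python's standard ssl module, as an illustration
only. The shared_context, open_tls and connect_twice names are made up
for this example. Whether the second handshake is really resumed depends
on the server and on the TLS version (with TLS 1.3 the session ticket
may only arrive after some application data has been exchanged), so
treat the result as something to measure, not a guarantee.

```python
import functools
import socket
import ssl


@functools.lru_cache(maxsize=None)
def shared_context():
    """Create the client SSLContext once; every caller gets the same object."""
    return ssl.create_default_context()


def open_tls(host, port=443, session=None):
    """Open a TLS connection, offering a previously saved session if given."""
    raw = socket.create_connection((host, port), timeout=10)
    # A saved session can only be offered through the context that created it,
    # which is why the context itself has to be shared.
    return shared_context().wrap_socket(raw, server_hostname=host, session=session)


def connect_twice(host, port=443):
    """Connect twice and report whether the second handshake was resumed."""
    with open_tls(host, port) as first:
        saved = first.session
    with open_tls(host, port, session=saved) as second:
        return second.session_reused
```

Calling connect_twice("engine.example.com") against a real server (the
host name is a placeholder) returns True when the session was reused.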
3. Parallelism and latency.
A typical problem that we have is that we send many requests to the
server. For example, to retrieve user sessions for a set of VMs we tend
to send many requests like this:
GET /ovirt-engine/api/vms/1/sessions
GET /ovirt-engine/api/vms/2/sessions
GET /ovirt-engine/api/vms/3/sessions
...
And we do that in a synchronous way: send one, wait for the result, send
another one, wait for the result, etc. This means that we don't take
advantage of the parallelism of the server and that we add to each
request the network round trip time. So if we have N requests, we have
to wait at least N*RTT.
The web servers that we use support multiple connections, and the
protocol that we use, HTTP, supports pipe-lining. This means that you
can send multiple requests in parallel, and that you can send multiple
requests without waiting for the response. To give you an idea of the
improvement that can be achieved, we recently added asynchronous request
support to the Ruby SDK, with multiple connections and pipe-lining. In
our scale testing environment that reduced the time to collect a
complete inventory from approx 30 min to approx 2 min. Here you have an
example:
https://github.com/oVirt/ovirt-engine-sdk-ruby/blob/master/sdk/examples/a...
So make sure that you take advantage of that in your clients. Sadly
pipe-lining is disabled by default in most browsers, so this isn't
helpful for JavaScript applications.
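To illustrate the N*RTT effect without a real server, here is a small
Python sketch. It is a simulation under stated assumptions: the
fetch_sessions function is a stand-in that only sleeps for a made-up
round trip time, and the parallel version uses a thread pool (that is,
multiple connections), not HTTP pipelining.

```python
import time
from concurrent.futures import ThreadPoolExecutor

RTT = 0.05  # simulated network round trip, in seconds (an assumption)


def fetch_sessions(vm_id):
    """Stand-in for GET /ovirt-engine/api/vms/{vm_id}/sessions."""
    time.sleep(RTT)  # pretend to wait for the server
    return {"vm": vm_id, "sessions": []}


def fetch_serial(vm_ids):
    """Send one request, wait for the result, send the next one."""
    return [fetch_sessions(vm_id) for vm_id in vm_ids]


def fetch_parallel(vm_ids, connections=8):
    """Send the requests concurrently, one worker per simulated connection."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        return list(pool.map(fetch_sessions, vm_ids))


def timed(func, *args):
    """Run func and return its result together with the elapsed time."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


vm_ids = list(range(1, 9))
serial_result, serial_time = timed(fetch_serial, vm_ids)
parallel_result, parallel_time = timed(fetch_parallel, vm_ids)
print(f"serial {serial_time:.2f}s, parallel {parallel_time:.2f}s")
```

The serial version waits at least N*RTT, while the parallel one
overlaps the waits, so its total time stays close to a single RTT as
long as there are enough connections.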
4. HTTP/2 support.
The application server that we use, WildFly, supports HTTP/2, including
ALPN, out of the box, since version 10.1. We need a mechanism to enable it:
core: Add support for enabling HTTP/2
https://gerrit.ovirt.org/74621
And then we need to get Apache out of the way, for API traffic, at
least. I think that is something we can do in the context of the engine
"podification" effort.
However, note that HTTP/2 won't have that big an impact on performance
for applications that continue to use a synchronous/serial style of
interaction with the API.
On 03/24/2017 11:16 PM, Yaniv Kaul wrote:
On Fri, Mar 24, 2017 at 8:57 PM, Martin Sivak <msivak(a)redhat.com
<mailto:msivak@redhat.com>> wrote:
> Current Apache used has only experimental module for it.
> Undertow is supposed to have a better support. I wonder when/if we can drop
> Apache...
The last info I have about that from mperina is that we need Apache
for kerberos support atm.
I don't think we need it - I remember reading that Undertow does support
it as well.
The only issue is that there are probably 10 people in the world who
know how to configure Undertow for Kerberos, while many do for Apache.
And since we leave it for the user to configure...
Y.
Martin
On Fri, Mar 24, 2017 at 5:30 PM, Yaniv Kaul <ykaul(a)redhat.com
<mailto:ykaul@redhat.com>> wrote:
>
>
> On Fri, Mar 24, 2017 at 6:43 PM, Martin Sivak <msivak(a)redhat.com
<mailto:msivak@redhat.com>> wrote:
>>
>> > 2: you can have more api gateways (e.g. more apis) tailored for
every
>> > frontend. I don't think we need this - the current API serves
us pretty
>> > well
>> > in every FE Im involved in. The only thing which I miss is the data
>> > aggregation.
>>
>> So it does not serve us well. Aggregation of data is one the usual
>> points of using the gateway.
>> Yes microservices are affected by this indeed, but so are we because
>> implementing the aggregation directly in the current engine API layer
>> is hard.
>>
>> > So I would go back to the original topic of this thread - do
some small
>> > change which has a chance to be merged to the project and helps
us where
>> > it
>> > hurts.
>
>
> I'm wondering if very specific additional REST API calls can suffice.
> For example, a 'Get VM + disks + NIC' API call seems reasonable to
add for
> the various clients who commonly need it.
>
>>
>> Can a simple HTTP/2 to HTTP/AJP gateway be the simplest solution? Our
>> Apache might even have a module for it already.
>
>
> Current Apache used has only experimental module for it.
> Undertow is supposed to have a better support. I wonder when/if we
can drop
> Apache...
> Y.
>
>>
>> That way you can multiplex all the REST calls using a single tcp
>> connection (and a single SSL negotiation).
>>
>> A custom SSO enabled service like that might be even better as it
>> would be able to skip the authentication
>> layers too and that would lower the engine load. But I am not sure it
>> is possible with the current codebase.
>>
>> Martin
>>
>> On Fri, Mar 24, 2017 at 4:22 PM, Tomas Jelinek
<tjelinek(a)redhat.com <mailto:tjelinek@redhat.com>>
>> wrote:
>> >
>> >
>> > On Fri, Mar 24, 2017 at 3:58 PM, Martin Sivak
<msivak(a)redhat.com <mailto:msivak@redhat.com>> wrote:
>> >>
>> >> > I feel like every REST API I've ever worked with has had
the
>> >> > aggregation
>> >> > +
>> >> > projection problem. It's like we're trying to use REST
as a
>> >> > replacement
>> >> > for
>> >> > SQL -- but the logic that executes the "SQL" lives
in a
browser now,
>> >> > and
>> >> > it
>> >> > used to live on a server close to the DB. And REST isn't
expressive
>> >> > for
>> >> > selecting data like SQL is.
>> >>
>> >> The current industry solution I know about is called API gateway..
>> >> most of the big players have internal API with lots of low
level stuff
>> >> and then couple of external API gateways tailored to what the
client
>> >> needs.
>> >>
>> >>
http://microservices.io/patterns/apigateway.html
<
http://microservices.io/patterns/apigateway.html> (check the backend
>> >> for frontend section)
>> >>
>> >> This trend is also visible when you think about services that
offer
>> >> API gateway management and billing like
>> >>
https://aws.amazon.com/api-gateway/
<
https://aws.amazon.com/api-gateway/> or our very own
>> >>
https://www.3scale.net/
>> >
>> >
>> > right, but the api gateway solves 2 problems:
>> >
>> > 1: if you have a microservice architecture it is hard for
frontend to
>> > talk
>> > to 20 different moving services. So the gateway hides this
complexity
>> > behind
>> > it. This is not the problem we have.
>> >
>> > 2: you can have more api gateways (e.g. more apis) tailored for
every
>> > frontend. I don't think we need this - the current API serves
us pretty
>> > well
>> > in every FE Im involved in. The only thing which I miss is the data
>> > aggregation.
>> >
>> > So I would go back to the original topic of this thread - do
some small
>> > change which has a chance to be merged to the project and helps
us where
>> > it
>> > hurts.
>> >
>> >>
>> >>
>> >>
>> >> Martin
>> >>
>> >> On Fri, Mar 24, 2017 at 3:47 PM, Greg Sheremeta
<gshereme(a)redhat.com <mailto:gshereme@redhat.com>>
>> >> wrote:
>> >> > I feel like every REST API I've ever worked with has had
the
>> >> > aggregation
>> >> > +
>> >> > projection problem. It's like we're trying to use REST
as a
>> >> > replacement
>> >> > for
>> >> > SQL -- but the logic that executes the "SQL" lives
in a
browser now,
>> >> > and
>> >> > it
>> >> > used to live on a server close to the DB. And REST isn't
expressive
>> >> > for
>> >> > selecting data like SQL is.
>> >> >
>> >> > There must be some industry solution to this "I want to
do
SQL over
>> >> > REST"
>> >> > problem.
>> >> >
>> >> > On Fri, Mar 24, 2017 at 5:54 AM, Martin Sivak
<msivak(a)redhat.com <mailto:msivak@redhat.com>>
>> >> > wrote:
>> >> >>
>> >> >> > for quite some time I have been more or less involved
in
>> >> >> > development
>> >> >> > of
>> >> >> > various UIs for oVirt based entirely on the
oVirt's REST API
>> >> >> > ranging
>> >> >> > from
>> >> >> > the quite mature moVirt [1] through some cockpit
extensions to a
>> >> >> > young
>> >> >> > and
>> >> >> > experimental user portal replacement [2].
>> >> >>
>> >> >> oVirt optimizer has the same issue..
>> >> >>
>> >> >> > 2: add some tiny service which would just accept a
list of
>> >> >> > queries,
>> >> >> > execute
>> >> >> > them locally (but using real HTTP requests) and
return in one
>> >> >> > bulk. A
>> >> >> > naive
>> >> >> > implementation just to give a sense of what I mean
of
this would
>> >> >> > be a
>> >> >> > shell
>> >> >> > script getting list of strings like
>> >> >> >
"https://localhost/ovirt-engine/api/vms/123/sessions
<
https://localhost/ovirt-engine/api/vms/123/sessions>" iterate over
>> >> >> > them
>> >> >> > and
>> >> >> > do a curl request for each, mangle the results into
one
string and
>> >> >> > return
>> >> >> > (credits for this idea to msivak). Easy to
implement,
possibility
>> >> >> > to
>> >> >> > add
>> >> >> > also projections later to save some bandwidth. But
the
API would
>> >> >> > anyway
>> >> >> > be
>> >> >> > hammered by bunch of queries, only the network
roundtrip
would be
>> >> >> > saved.
>> >> >>
>> >> >> The biggest cost for (especially mobile) clients is the
cost of
>> >> >> establishing new SSL connection. SSL is also pretty
expensive on the
>> >> >> server side.
>> >> >>
>> >> >> So running the aggregation service on the ovirt-engine
machine
>> >> >> (behind
>> >> >> Apache) means the client will do a single SSL request
with
list of N
>> >> >> urls and the local "reverse-proxy" will perform
single
>> >> >> authentication
>> >> >> and N plain HTTP requests (or even better - AJP). It
won't
remove
>> >> >> any
>> >> >> time from the actual command run time, but it will reduce
protocol
>> >> >> overhead.
>> >> >>
>> >> >> I think this is the simplest first step that requires
almost no
>> >> >> change
>> >> >> to existing infrastructure.
>> >> >>
>> >> >> --
>> >> >> Martin Sivak
>> >> >> SLA / oVirt
>> >> >>
>> >> >> On Fri, Mar 24, 2017 at 10:20 AM, Tomas Jelinek
>> >> >> <tjelinek(a)redhat.com
<mailto:tjelinek@redhat.com>>
>> >> >> wrote:
>> >> >> > Hi All,
>> >> >> >
>> >> >> > for quite some time I have been more or less involved
in
>> >> >> > development
>> >> >> > of
>> >> >> > various UIs for oVirt based entirely on the
oVirt's REST API
>> >> >> > ranging
>> >> >> > from
>> >> >> > the quite mature moVirt [1] through some cockpit
extensions to a
>> >> >> > young
>> >> >> > and
>> >> >> > experimental user portal replacement [2].
>> >> >> >
>> >> >> > One issue we hit over and over again is the missing
data
>> >> >> > aggregation.
>> >> >> > In
>> >> >> > the
>> >> >> > 3.x era we used to use in moVirt the
detail=something
>> >> >> > api to get the disks and nics of the VM, something
like:
>> >> >> >
>> >> >> > GET /ovirt-engine/api/vms
>> >> >> > Accept: application/json; detail=disks
>> >> >> >
>> >> >> > This allowed us to store this data in local database
leading to
>> >> >> > great
>> >> >> > user
>> >> >> > experience. Since this feature has been removed in
4.x
API [3]
>> >> >> > we needed to fall back to a different solution: when the VM
>> >> >> > detail is selected by the user, start loading the disks and
>> >> >> > nics and hope the user will not be fast enough to see the
>> >> >> > delay. The UX is slightly worse but kinda acceptable.
>> >> >> >
>> >> >> > We hit this issue harder in the new user portal [2],
because we
>> >> >> > already
>> >> >> > have
>> >> >> > the VM cached and show the whole VM in one screen.
So, if
you pick
>> >> >> > it,
>> >> >> > you
>> >> >> > will get its details immediately.
>> >> >> > But, since you don't have all the details, we
need to do an
>> >> >> > additional
>> >> >> > call
>> >> >> > (two actually) to load this data and they start to
appear
later.
>> >> >> > So, something which would be very fast and smooth
starts
to feel
>> >> >> > sluggish.
>> >> >> >
>> >> >> > Recently, we hit this issue again which forced us to
sacrifice the
>> >> >> > UX
>> >> >> > even
>> >> >> > more - it is the "console in use" feature
of user portal.
>> >> >> > The use case is this:
>> >> >> > - if the console is already taken by some user, there are
>> >> >> > complications if another user tries to take it as well (I will
>> >> >> > avoid details about the settings and permissions involved, but
>> >> >> > long story short, the user will probably not be allowed to
>> >> >> > connect to it. The "probably" is the key here: since we can
>> >> >> > not make any intelligent decision in advance, we can only warn
>> >> >> > the user that the console is taken).
>> >> >> > - in the current GWT user portal, if the VM's
console is
taken, it
>> >> >> > is
>> >> >> > shown
>> >> >> > on the VM's "box" that "console is
taken". This was a highly
>> >> >> > requested
>> >> >> > feature
>> >> >> > - to get this information using the current REST API,
we
need to
>> >> >> > go
>> >> >> > to
>> >> >> > the
>> >> >> > /vms/<vmid>/sessions subcollection. To get this
for all
VMs, it
>> >> >> > would
>> >> >> > be
>> >> >> > doing N queries per poll which we can not afford
>> >> >> > - so the current PR [4] will probably end up to only
check it on
>> >> >> > the
>> >> >> > attempt
>> >> >> > to connect to the console warning the user. Maybe it
will
be also
>> >> >> > shown
>> >> >> > in
>> >> >> > Vm details. But the UX in case the user will look for
a
VM which
>> >> >> > has
>> >> >> > free
>> >> >> > console will suffer significantly (e.g. try one by
one
until some
>> >> >> > opens
>> >> >> > or
>> >> >> > look at details one by one to see if the warning
appears
(with a
>> >> >> > delay))
>> >> >> >
>> >> >> > I understand that embedding the details of the VM to
the
response
>> >> >> > comes
>> >> >> > with
>> >> >> > a cost, namely:
>> >> >> > - performance hit
>> >> >> > - complexity of the API code
>> >> >> > - the "cleanness" of REST suffers
>> >> >> >
>> >> >> > But I think we should seriously consider to provide
some
option to
>> >> >> > data
>> >> >> > aggregation.
>> >> >> >
>> >> >> > I know this has been discussed many times with no
result,
but I
>> >> >> > think
>> >> >> > it
>> >> >> > is
>> >> >> > time to bring this topic up again. I'll try to
summarize the
>> >> >> > (failed)
>> >> >> > attempts tried so far:
>> >> >> > - the detail=<something> parameter with ad-hoc
embedding
of data.
>> >> >> > This
>> >> >> > has
>> >> >> > been there and removed in 4.0 [3]
>> >> >> > - the DoctorREST project - e.g. a proxy above the
current
api. The
>> >> >> > idea
>> >> >> > was
>> >> >> > to create a service which will be independent of the
engine
>> >> >> > itself,
>> >> >> > will
>> >> >> > locally poll the engine's REST, store all data in
local
(mongo)DB
>> >> >> > and
>> >> >> > provide a rich api with aggregations and projections
and push
>> >> >> > notifications.
>> >> >> > This polling of everything to get the data to DoctorREST proved
>> >> >> > to be pretty costly, so a more invasive approach of pushing
>> >> >> > data from the engine to the doctor has also been discussed [5].
>> >> >> > Neither of these two approaches has been accepted (too
>> >> >> > complicated, too invasive).
>> >> >> > - writing some custom ad-hoc servlet serving only a
purpose of one
>> >> >> > frontend
>> >> >> > - this is actually there for the dashboard, but it is
not a
>> >> >> > generic
>> >> >> > solution
>> >> >> > for the other frontends and we really should not
develop
custom
>> >> >> > "APIs"
>> >> >> > for
>> >> >> > every frontend
>> >> >> > - there were some other proposals discussed (some 3rd party
>> >> >> > solutions etc.) but I think none of them made it even to a PoC
>> >> >> >
>> >> >> > So, now I would try again and try small to get at
least some
>> >> >> > benefit.
>> >> >> > I
>> >> >> > see
>> >> >> > 2 paths we could try:
>> >> >> > 1: embed something which burns us immediately, e.g. the
>> >> >> > /sessions into VMs. I really liked the ;detail=sessions
>> >> >> > approach, could we bring it back?
>> >> >> > 2: add some tiny service which would just accept a
list of
>> >> >> > queries,
>> >> >> > execute
>> >> >> > them locally (but using real HTTP requests) and
return in one
>> >> >> > bulk. A
>> >> >> > naive
>> >> >> > implementation just to give a sense of what I mean
of
this would
>> >> >> > be a
>> >> >> > shell
>> >> >> > script getting list of strings like
>> >> >> >
"https://localhost/ovirt-engine/api/vms/123/sessions
<
https://localhost/ovirt-engine/api/vms/123/sessions>" iterate over
>> >> >> > them
>> >> >> > and
>> >> >> > do a curl request for each, mangle the results into
one
string and
>> >> >> > return
>> >> >> > (credits for this idea to msivak). Easy to
implement,
possibility
>> >> >> > to
>> >> >> > add
>> >> >> > also projections later to save some bandwidth. But
the
API would
>> >> >> > anyway
>> >> >> > be
>> >> >> > hammered by bunch of queries, only the network
roundtrip
would be
>> >> >> > saved.
>> >> >> > 3: any other simple approaches?
>> >> >> >
>> >> >> > I honestly prefer the first approach. It is not
beautiful, it is
>> >> >> > not
>> >> >> > REST-ful, but it is easy to implement, very pragmatic
and
useful.
>> >> >> > What do you think?
>> >> >> >
>> >> >> > Thank you and sorry for the long mail :)
>> >> >> > Tomas
>> >> >> >
>> >> >> > [1]:
https://github.com/oVirt/moVirt
<
https://github.com/oVirt/moVirt>
>> >> >> > [2]:
https://github.com/oVirt/ovirt-web-ui
<
https://github.com/oVirt/ovirt-web-ui>
>> >> >> > [3]:
https://gerrit.ovirt.org/#/c/61260
<
https://gerrit.ovirt.org/#/c/61260>
>> >> >> > [4]:
https://github.com/oVirt/ovirt-web-ui/pull/106/
<
https://github.com/oVirt/ovirt-web-ui/pull/106/>
>> >> >> > [5]:
https://gerrit.ovirt.org/#/c/45233/
<
https://gerrit.ovirt.org/#/c/45233/>
>> >> >> >
>> >> >> >
>> >> >> > _______________________________________________
>> >> >> > Devel mailing list
>> >> >> > Devel(a)ovirt.org <mailto:Devel@ovirt.org>
>> >> >> >
http://lists.ovirt.org/mailman/listinfo/devel
<
http://lists.ovirt.org/mailman/listinfo/devel>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Greg Sheremeta, MBA
>> >> > Red Hat, Inc.
>> >> > Sr. Software Engineer
>> >> > gshereme(a)redhat.com <mailto:gshereme@redhat.com>
>> >
>> >
>
>