[ovirt-devel] Caching of data from the database done properly

Sat Jul 9 08:44:25 UTC 2016

Hi Martin,

Great feedback. Thanks for the clarifications.

On Thu, Jul 7, 2016 at 3:25 PM, Martin Mucha <mmucha at redhat.com> wrote:
> Hi,
>
> some of information in mail are not exactly true. Namely MacPoolPerCluster *does not do caching*, it does not even have DB layer structures it could cache. So how it works is: pool has configuration upon which it initialize itself. After that, it looks into db for all used MACs, which currently happens to be querying all MAC addresses of all VmNics. So this is initialized from data in DB, but it does not cache them. Clients [of pool] asks pool for MAC address, which is then used somewhere without pool supervision. I don't want to question this design, and I'm not saying that it wouldn't be possible to move it's logic to db layer, but once, long, long time ago someone decided this to be done on bll, and so it is on bll layer.
>

had another look at the MacPoolPerCluster source. You are right. It is
caching some calculations and not database data. I agree that this
should not be in the dao layer or the database. Sorry for the wrong
accusations regarding the MacPoolPerCluster class.

> I understand, that these might come as a problem in arquillian testing, but that's to be resolved, since, not all singletons are for caching. And even if they are, testing framework should be able to cope with such common beans, we shouldn't limit ourselves not to use singletons. Therefore, I wouldn't invest into changing these 'caches', but into allowing more complex setups in our testing. If it's not possible, then 'reset' method is second best solution — we have to use write lock as suggested in cr, and then it should be fine.

For the context: [1]
Having singletons is fine from my perspective too. It is just about
caching data from the database. Spring offeres the @DirtiesContext (A
little bit nicer than with Arquillian where as far as I have seen you
would create different test cases and do new @Deployments). But I
prefer to reset these rare singletons explicitly in a base class for
every test. Otherwise it is always very hard to track down possible
side effects in the class because you did not set up a new context.

For me tests are first class citizens of an application, so having a
way to reinitialize singletons directly is what I prefer. When it is
about caching from the database it is normally not needed since you
just disable the database cache during the tests.

>
> About drawbacks:
> ad 1) yes, this comes as an extra problem, one has to deal with tx on his own, that's true. This wasn't part of original solution, but it should be fixed already.

As long as it is really just the last resort I am fine with it.

> ad 2) No. Caching done correctly is done closest to the consumer. I assume you can similarly ruin hibernate l2 cache via accessing data through different channel. But that's kinda common to all caches — if you bypass them, you'll corrupt them. So do not bypass them, or in this case, use them as they was designed. As it have been said, you ask pool for mac, or inform it, that you're going to use your own, and then use it. It means, that it's designed to actually bypass it on all writes. Therefore if someone writes a code using MAC without prior notification to the pool about such action, it would be a problem. To avoid this problem there has to be bigger refactor — pool would have to persist MAC addresses somehow and not vmNicDao, or if moved entirelly to db layer, there would have to be trigger on vmnic table or something like that...

I missed the calculation part in the macPoolPerCluster, so this is ok,
most of my comments do not apply there now. Of course you can cache
wherever you have to to meet the requirements. Still the best thing
you can have is that the loaded entities, when cached in higher layers
get evicted too when you change something in the lower layers (e.g.
DAO). This is the normal expectation and what all the hibernate level2
cache is about. Since we don't use all these fancy caches which do all
the hard lifting for us it would be even more complex for us to do the
caching on higher layers right.

> ad 3) It was requested, to have at least tens of millions of macs in pool. Forget about loop, initializing this structure for given clusterId is not acceptable even once per request. Loading that structure is quite cheap (now), but not that cheap to allow what you ask for. Moving whole stuff to db layer would be probably beneficial (when it was originally implemented), but it's worthless doing it now.
>

It is really just about caching from the database. Moving logic to the
dao layer or the database is definitely what I had in mind.

> About suggestions: neither of them applies to MacPoolPerCluster — point (3) for example: since pool is not a simple cache of db structure, and does not have corresponding data in db layer, it cannot cache writes and it even does not do any writes at all...

Not caching writes yourself is just a general thing. Regarding the
MacPoolPerCluster I was referring to the constraints checks. But again
most of it does not really apply now to that class anyway.

>
> ——
> Surely I can imagine better implementation of this, but it'd require bigger changes whose benefits aren't worth of cost of this change. (I hope that) we have to deal with it. This was original design, and since testing framework (with changed caches or without) should deal with singletons etc. anyways, it's not worthy to change it. If there aren't any better options(I don't know), reinitializing bean can be used (and I apologize for blocking that). I'd avoid bigger rewrites in this area.

For the context: [1]

Hoping for a continuation of the discussion :)

Roman

>
> M.
>
>
> ----- Original Message -----
>> On Mon, Jul 4, 2016 at 11:58 PM, Roman Mohr <rmohr at redhat.com> wrote:
>> > Hi Everyone,
>> >
>> > I wanted to discuss a practice which seems to be pretty common in the
>> > engine which I find very limiting, dangerous and for some things it
>> > can even be a blocker.
>> >
>> > There are several places in the engine where we are using maps as
>> > cache in singletons to avoid reloading data from the database. Two
>> > prominent ones are the QuotaManager[1] and the MacPoolPerCluster[2].
>> >
>> > While it looks tempting to just use a map as cache, add some locks
>> > around it and create an injectable singleton, this has some drawbacks:
>> >
>> > 1) We have an autoritative source for our data and it offers
>> > transactions to take care of inconsistencies or parallel updates.
>> > Doing all that in a service again duplicates this.
>> > 2) Caching on the service layer is definitely not a good idea. It can
>> > introduce unwanted side effects when someone invokes the DAOs
>> > directly.
>> > 3) The point is more about the question if a cache is really needed:
>> > Do I just want that cache because I find it convenient to do a
>> > #getMacPoolForCluster(Guid clusterId) in a loop instead of just
>> > loading it once before the loop, or do my usage requirements really
>> > force me to use a cache?
>> >
>> > If you really need a cache, consider the following:
>> >
>> > 1) Do the caching on the DAO layer. This guarantees the best
>> > consistency across the data.
>> > 2) Yes this means either locking in the DAOs or a transactional cache.
>> > But before you complain, think about what in [1] and [2] is done. We
>> > do exactly that there, so the complexity is already introduced anyway.
>> > 3) Since we are working with transactions, a custom cache should NEVER
>> > cache writes (really just talking about our use case here). This makes
>> > checks for existing IDs before adding an entity or similar checks
>> > unnecessary, don't duplicate constraint checks like in [2].
>> > 4) There should always be a way to disable the cache (even if it is
>> > just for testing).
>> > 5) If I can't convince you to move the cache to the DAO layer, still
>> > add a way to disable the cache.
>> >
>>
>> I forgot to mention one thing: There are of course cases where
>> something is loaded on startup. Mostly things which can have multiple
>> sources.
>> For instance for the application configuration itself it is pretty
>> common, or like in the scheduler the scheduling policies where some
>> are Java only,
>> some are coming from other sources. It is still good
>>
>> But for normal business entities accessing parts of it through
>> services and parts of it through services is not the best thing to do
>> (if constructiong the whole business entity out of multiple daos is
>> complex, Repositories can help, but the cache should still be in the
>> dao layer).
>> I hope  you get what I mean.
>>
>> > For as long as there is no general caching solution with something
>> > like ehcache or infinispan, in my eyes such small things matter a lot
>> > to keep a project maintainable.
>> >
>> > That are some of the best practices I have seen around caching
>> > database data. It would be great if we could agree on something like
>> > that. Maybe there is already an agreement on something and I am just
>> > not aware of it.
>> >
>> > Looking forward to hear your feedback.
>> >
>> > Roman
>> >
>> > [1]
>> > https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/quota/QuotaManager.java
>> > [2]
>> > https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/macpool/MacPoolPerCluster.java
>> _______________________________________________
>> Devel mailing list
>> Devel at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/devel
>>

[1] https://gerrit.ovirt.org/#/c/59912/