Hi,
some of the information in the mail is not exactly true. Namely, MacPoolPerCluster *does
not do caching*; it does not even have DB-layer structures it could cache. How it works is:
the pool has a configuration upon which it initializes itself. After that, it looks into
the DB for all used MACs, which currently happens to be a query for all MAC addresses of
all VmNics. So it is initialized from data in the DB, but it does not cache that data.
Clients [of the pool] ask the pool for a MAC address, which is then used somewhere without
the pool's supervision. I don't want to question this design, and I'm not saying that it
wouldn't be possible to move its logic to the DB layer, but once, long, long ago, someone
decided this should be done in the BLL, and so it lives on the BLL layer.
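To illustrate the flow described above, here is a minimal sketch (with hypothetical names; the real MacPoolPerCluster is seeded from VmNicDao and is more involved): the pool is seeded once from the MACs already present in the DB, and afterwards tracks allocations only in memory:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified sketch of the initialization flow described above.
class MacPoolSketch {
    private final Set<String> usedMacs = new HashSet<>();

    // Seed the pool from the MACs already present in the DB (all VmNic MACs).
    // Note: this reads the DB data once at startup; it is not a cache of DB rows.
    MacPoolSketch(Collection<String> macsAlreadyInDb) {
        usedMacs.addAll(macsAlreadyInDb);
    }

    // A client asks the pool for a free MAC; what the client does with it
    // afterwards is outside the pool's supervision.
    String allocate() {
        for (long i = 0; ; i++) {
            String candidate = String.format("00:1a:4a:00:%02x:%02x",
                    (i >> 8) & 0xff, i & 0xff);
            if (usedMacs.add(candidate)) {
                return candidate;
            }
        }
    }
}
```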
I understand that these might come up as a problem in Arquillian testing, but that is to
be resolved, since not all singletons are for caching. And even if they are, the testing
framework should be able to cope with such common beans; we shouldn't limit ourselves
to avoiding singletons. Therefore, I wouldn't invest in changing these
'caches', but in allowing more complex setups in our testing. If that's not
possible, then a 'reset' method is the second-best solution; we have to use a write lock
as suggested in the CR, and then it should be fine.
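For the 'reset' approach, a sketch (hypothetical names, not the actual engine code) of guarding reinitialization with the pool's write lock, so in-flight readers never observe a half-rebuilt structure:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: a reset method for tests, guarded by a write lock
// as suggested in the code review.
class ResettablePool {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<String> usedMacs = new HashSet<>();

    boolean isUsed(String mac) {
        lock.readLock().lock();
        try {
            return usedMacs.contains(mac);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Reinitialize the bean between tests: take the write lock so no reader
    // or allocator observes the pool while it is being rebuilt.
    void reset(Collection<String> macsFromDb) {
        lock.writeLock().lock();
        try {
            usedMacs.clear();
            usedMacs.addAll(macsFromDb);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```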
About drawbacks:
ad 1) Yes, this comes as an extra problem: one has to deal with transactions on one's
own, that's true. This wasn't part of the original solution, but it should be fixed
already.
ad 2) No. Caching done correctly is done closest to the consumer. I assume you can
similarly ruin the Hibernate L2 cache by accessing data through a different channel. But
that's common to all caches: if you bypass them, you'll corrupt them. So do not
bypass them, or in this case, use them as they were designed. As has been said, you
ask the pool for a MAC, or inform it that you're going to use your own, and then use it.
That means the design actually expects all writes to bypass the pool. Therefore, if
someone writes code that uses a MAC without first notifying the pool of such an action,
that would be a problem. Avoiding it would require a bigger refactor: the pool would have
to persist MAC addresses itself instead of VmNicDao, or, if the logic were moved entirely
to the DB layer, there would have to be a trigger on the vm_nic table or something like
that...
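The contract described above can be sketched like this (hypothetical method names; the real pool's API differs):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the usage contract: either ask the pool for a MAC,
// or tell it up front about a user-defined MAC before using it. Persisting
// the MAC (via VmNicDao in the real code) then happens outside the pool.
class MacPoolContract {
    private final Set<String> usedMacs = new HashSet<>();
    private long next = 1;

    // Option 1: let the pool pick a free MAC.
    String allocateNext() {
        String mac;
        do {
            mac = String.format("00:1a:4a:00:%02x:%02x",
                    (next >> 8) & 0xff, next & 0xff);
            next++;
        } while (!usedMacs.add(mac));
        return mac;
    }

    // Option 2: bring your own MAC, but register it with the pool first.
    // Returns false if it is already in use -- using it anyway would be
    // exactly the "bypassing the pool" problem described above.
    boolean registerUserDefinedMac(String mac) {
        return usedMacs.add(mac);
    }
}
```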
ad 3) It was requested that a pool hold at least tens of millions of MACs. Forget about
the loop: initializing this structure for a given clusterId is not acceptable even once
per request. Loading that structure is quite cheap (now), but not cheap enough to allow
what you ask for. Moving the whole thing to the DB layer would probably have been
beneficial (when it was originally implemented), but it is not worth doing now.
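The cost argument can be sketched as follows (hypothetical names): the per-cluster structure is built at most once from the DB, and later lookups are cheap, so a #getMacPoolForCluster call in a loop must never trigger reinitialization:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the cost argument: the per-cluster pool is built
// once (imagine a scan over tens of millions of VmNic rows) and afterwards
// looked up cheaply, never reinitialized per request or per loop iteration.
class MacPoolRegistry {
    private final Map<String, List<String>> poolsByCluster = new HashMap<>();
    private int initializations = 0; // counts expensive (re)builds

    // Expensive: stands in for loading all used MACs from the DB.
    private List<String> initializeFromDb(String clusterId) {
        initializations++;
        return new ArrayList<>();
    }

    // Cheap after startup: the structure is built at most once per cluster.
    List<String> getMacPoolForCluster(String clusterId) {
        return poolsByCluster.computeIfAbsent(clusterId, this::initializeFromDb);
    }

    int initializationCount() {
        return initializations;
    }
}
```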
About the suggestions: none of them applies to MacPoolPerCluster. Take point (3), for
example: since the pool is not a simple cache of a DB structure and does not have
corresponding data in the DB layer, it cannot cache writes; it does not do any writes at
all...
——
Surely I can imagine a better implementation of this, but it would require bigger changes
whose benefits are not worth the cost. I'm afraid we have to live with it. This was the
original design, and since the testing framework (with changed caches or without)
has to deal with singletons etc. anyway, it is not worth changing. If there
aren't any better options (I don't know of any), reinitializing the bean can be used
(and I apologize for blocking that). I'd avoid bigger rewrites in this area.
M.
----- Original Message -----
On Mon, Jul 4, 2016 at 11:58 PM, Roman Mohr <rmohr(a)redhat.com>
wrote:
> Hi Everyone,
>
> I wanted to discuss a practice which seems to be pretty common in the
> engine which I find very limiting, dangerous and for some things it
> can even be a blocker.
>
> There are several places in the engine where we are using maps as
> cache in singletons to avoid reloading data from the database. Two
> prominent ones are the QuotaManager[1] and the MacPoolPerCluster[2].
>
> While it looks tempting to just use a map as cache, add some locks
> around it and create an injectable singleton, this has some drawbacks:
>
> 1) We have an authoritative source for our data and it offers
> transactions to take care of inconsistencies or parallel updates.
> Doing all that in a service again duplicates this.
> 2) Caching on the service layer is definitely not a good idea. It can
> introduce unwanted side effects when someone invokes the DAOs
> directly.
> 3) The point is more about the question if a cache is really needed:
> Do I just want that cache because I find it convenient to do a
> #getMacPoolForCluster(Guid clusterId) in a loop instead of just
> loading it once before the loop, or do my usage requirements really
> force me to use a cache?
>
> If you really need a cache, consider the following:
>
> 1) Do the caching on the DAO layer. This guarantees the best
> consistency across the data.
> 2) Yes this means either locking in the DAOs or a transactional cache.
> But before you complain, think about what in [1] and [2] is done. We
> do exactly that there, so the complexity is already introduced anyway.
> 3) Since we are working with transactions, a custom cache should NEVER
> cache writes (really just talking about our use case here). This makes
> checks for existing IDs before adding an entity or similar checks
> unnecessary, don't duplicate constraint checks like in [2].
> 4) There should always be a way to disable the cache (even if it is
> just for testing).
> 5) If I can't convince you to move the cache to the DAO layer, still
> add a way to disable the cache.
>
I forgot to mention one thing: there are of course cases where
something is loaded on startup, mostly things which can have multiple
sources.
For instance, this is pretty common for the application configuration
itself, or, like in the scheduler, the scheduling policies, where some
are Java-only and some come from other sources. In such cases loading
on startup is still fine.
But for normal business entities, accessing parts of them through
services and parts through the DAOs directly is not the best thing to do
(if constructing the whole business entity out of multiple DAOs is
complex, repositories can help, but the cache should still be in the
DAO layer).
I hope you get what I mean.
> For as long as there is no general caching solution with something
> like ehcache or infinispan, in my eyes such small things matter a lot
> to keep a project maintainable.
>
> Those are some of the best practices I have seen around caching
> database data. It would be great if we could agree on something like
> that. Maybe there is already an agreement on something and I am just
> not aware of it.
>
> Looking forward to hearing your feedback.
>
> Roman
>
> [1]
>
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules...
> [2]
>
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules...
_______________________________________________
Devel mailing list
Devel(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel