[ovirt-devel] Writing SQL queries in DAO code

Thu Mar 30 07:31:28 UTC 2017

On Wed, Mar 29, 2017 at 4:23 PM, Martin Mucha <mmucha at redhat.com> wrote:

> Hi,
>
> I didn't want to step into this, but I was asked by dan to express my
> opinion about this. I'm not pushing for change (since I believe it will be
> blocked), I'm just trying to help us focus. We should have some questions
> answered prior to saying what we want to do...
>
> Current state of accessing db is that there's fired lots of unnecessary db
> calls, just because there isn't easy way to do them correctly, someone is
> lazy, whatever, that isn't important. The blame was laid on invalid code
> review. Now let me remember maintainer Kolesnik, who during CR *insisted*
> on creating N+1 problem[1] when accessing db, to save ±10 lines of code in
> dal layer. So Eli is right, our code suck because of improper code revision
> process, but sometimes it is maintainers who wants invalid code to be
> pushed to simplify db access code. Therefore we cannot rely on review
> process, this is real issue we have. I'm fine with it, this is how we do
> things. But we also have (sometimes (a lot)) degraded performance because
> of it.
>
> We should have answered following questions:
>
> 1) is performance even important at all? Should we optimize db access? I'm
> assuming "yes" should be the answer. If it's not the answer, we can skip
> next questions.
>
> 2) can someone responsible for whole project provide timings and count of
> queries? This should easily bring crosshair on commands requiring attention.
>   2.1) customer is issuing synchronous command. What is maximum number of
> queries this command (plus all transitive internal commands) should fire?
> What's the number for asynchronous command?
>   2.2) same for time; what is the maximum time for synchronous and
> asynchronous command?
>
> 3) lets say that we found command which requires optimization (and that
> should not be hard). If we go back to mentioned N+1 problem, it's rather
> easy to avoid it for one association(yet we did not do it even there), but
> it's not that easy for multiple joins [1]. But it's superbly easy to
> achieve it using ORM tool, even with still using only named queries, even
> native ones defined outside of java. Supposing we want to optimize found
> command, what would dal maintainers consider as a acceptable optimization?
> Is there even possibility to not use stored procedures? I believe saying
> simple "no", if that's the case, is valid response, which can save time. If
> there is theoretical possibility, what criteria must be met so that we can
> consider replacing most trivial stored procedures with some tool in our
> project? There are lots of tools/frameworks, based on your criteria we
> might find some, which would work best...
>
> —————
>
> My generic opinion would be to stick with sp and allowing to use something
> less heavy for simplest queries/stuff. It could beJPA using ORM and
> entities. Or named queries (still using entities) or nativequeries (which
> does not use entities, but plain sql) and both ends up creating prepared
> statements on server startup. Or we could use some lighter orm. But I'd
> definitely avoid writing any new homebrewed approach, this isn't a sane
> option.
>

In that case , if we will not use SP , we still will have to secure the
data (for example , hidden columns for some users like in the VDS view)

>
> [1] example: Lets discuss query association A—>B—>C, 5xA, each A has 5B
> etc. In our typical dao this will mean 1+5+5*5=31 queries. Same with
> properly annotated ORM sums up to 1 or 3 queries.
>
>
> M.
>
> On Mon, Mar 27, 2017 at 5:38 PM, Yevgeny Zaspitsky <yzaspits at redhat.com>
> wrote:
>
>>
>>
>> On Mon, Mar 27, 2017 at 5:30 PM, Yevgeny Zaspitsky <yzaspits at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, Mar 27, 2017 at 12:45 PM, Eli Mesika <emesika at redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Mar 27, 2017 at 12:26 PM, Yevgeny Zaspitsky <
>>>> yzaspits at redhat.com> wrote:
>>>>
>>>>> > I don't think that looking in SQL in Java code is clear than
>>>>> looking in a SP code
>>>>>
>>>>> Looking in SQL isn't the problem I'm trying to solve, but finding that.
>>>>>
>>>>> > 1) You will pass more data on the wire instead of calling a SP with
>>>>> parameters
>>>>>
>>>>> Passing the SQL string shouldn't be a problem unless that is very long
>>>>> one (several K) and then we probably do something wrong.
>>>>>
>>>>> > 2) Your data that is passed on the wire is exposed to attacks since
>>>>> you will have to implement DB security in the engine level (for example
>>>>> hidden columns).
>>>>>
>>>>> IIUC currently querying tables/views with hidden columns is
>>>>> implemented in a SP that consist of at least 2 SQL's, so those aren't good
>>>>> candidates for my proposal and will stay as is.
>>>>> BTW how other projects resolve the security problem? AFAIK usually
>>>>> hidden columns are hidden by defining a proper view that do not include
>>>>> those.
>>>>>
>>>>
>>>> That's a bad solution as long as your data is passed unmasked on the
>>>> wire 
>>>>
>>>
>>> In some cases we do send a sensitive info in the current solution that
>>> uses SP's.
>>> Should we use (maybe already) a kind of secured connection to DB?
>>>
>>> Database security should be done in the database level and you will not
>>>> be able to do that without using SPs
>>>>
>>>
>>> I'm not a DB security expert, but from my experience from my previous
>>> jobs none of them used "SP's only" approach, in fact they didn't use SP's
>>> at all.
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> > 3) Changes in the SQL code done in patches may be more complicated
>>>>> to track
>>>>>
>>>>> Most of SQL code changes involve changes in a DAO too, so this
>>>>> shouldn't be an issue.
>>>>>
>>>> It is !!! , changes in DAO are easy to track since it is a Java code ,
>>>> you are suggesting to write the SQL itself 
>>>>
>>>
>>> If you need to update a Java code together with SQL, then how much
>>> difference is in seeing the change in a single file or in two separate ones?
>>> Frankly speaking I do not get what do you mean by "track". If you're
>>> about following what a DAO method does, then my approach simplifies the job.
>>>
>>>>
>>>>
>>>>>
>>>>> > 4) SQL Injection
>>>>>
>>>>> Again, how other projects resolve the security problem? Internet is
>>>>> full of articles/blogs of why stored procedures should be avoided ("avoid
>>>>> 
>>>>> stored procedures" Google query returned ~6.6M results), so I guess
>>>>> there are some other approaches for the security issue if that exists.
>>>>>
>>>>
>>>> "use stored procedures" Google query returned About 4,840,000 results 
>>>>
>>>> How many of those add "do not" to the search term? Anyhow avoid gives
>>> more results...
>>>
>>
>> Again, I'm not a DB security expert, but IIUC using prepared statements
>> with parameters (what my patches do) doesn't allow a SQL injection. On the
>> other side the existing SearchDao.getAllWithQuery implementors are exposed
>> to SQL injection attempts. Am I right?
>>
>>
>>>>
>>>>>
>>>>> > 5) I think that SP performs better than SQL inside Java
>>>>>
>>>>> Do we have a measurement of how much better?
>>>>> One of my intentions for this proposal is that people evidently avoid
>>>>> creating neat SQL queries and re-use the existing ones, which has much
>>>>> bigger performance impact. I guess that the biggest limit here is how
>>>>> complicated that procedure is in the engine code.
>>>>>
>>>>
>>>> That's why we have code review process and we should nack such
>>>> patches , so you think that if people are not familiar with Java8 syntax
>>>> for example we should move this code to be performed by a "easier"
>>>> mechanism ?
>>>> If people are not doing the right thing , we have gerrit for that, we
>>>> can comment , nack , whatever to make the code good
>>>>
>>>
>>> In a perfect word you're right, proper reviews should've stop that. But
>>> in the hard reality of oVirt code base it doesn't happen somehow... My
>>> proposal bases on the current situation of oVirt code that was created by
>>> using all Gerrit features.
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 27, 2017 at 11:09 AM, Eli Mesika <emesika at redhat.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 27, 2017 at 1:56 AM, Moti Asayag <masayag at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Mar 26, 2017 at 5:12 PM, Yevgeny Zaspitsky <
>>>>>>> yzaspits at redhat.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Recently I had a task of performance improvement in one of our
>>>>>>>> network related flows and had some hard time following our DAL code and one
>>>>>>>> of the outcomes of the task was defining a couple of new quite simple, but
>>>>>>>> neat queries.
>>>>>>>> When I came to coding those new queries I remembered how hard was
>>>>>>>> following the existing DAL code, I decided to make my own new methods
>>>>>>>> clearer. So I created [1] and [2] patches.
>>>>>>>>
>>>>>>>> Everything is quite standard there beside the fact that they do not
>>>>>>>> use any stored procedure, but use SQL directly, IMHO by that they save time
>>>>>>>> that I spent in trying to follow what a DAO method does. Looking into the
>>>>>>>> method code you get the understanding of what this method is all about:
>>>>>>>>
>>>>>>>>    - no looking for a stored procedure name that is buried in the
>>>>>>>>    DAO class hierarchy
>>>>>>>>    - no looking for the SP definition
>>>>>>>>
>>>>>>>>
>>>>>>> There are additional pros and cons for the suggestion to consider:
>>>>>>>
>>>>>>> Pros:
>>>>>>>
>>>>>>>    1. No need to run engine-setup after changing DB related code
>>>>>>>    (in case of SQL inside Java).
>>>>>>>
>>>>>>> Cons:
>>>>>>>
>>>>>>>    1. DAO files might become very long.
>>>>>>>    2. If you decide to return the business entity associated with
>>>>>>>    the DAO as a returned object, you won't know as a caller which fields to
>>>>>>>    expect to be populated, which lead to 3:
>>>>>>>    3. An inflation of business entities to represent partial
>>>>>>>    populated business entity or inflation of mappers inflation (this will be
>>>>>>>    required for SP as well).
>>>>>>>    4. SQL code inside of Java:
>>>>>>>    1. Beside of the fact that a multi-line concatenated string that
>>>>>>>       cannot be easily copied and run with psql, it means that we should compile
>>>>>>>       the code in order to test the change (vs building with DEV_REBUILD=0 which
>>>>>>>       only package the SQL file).
>>>>>>>       2. No syntax highlighting when performing code review. i.e. I
>>>>>>>       don't think reviewing a patch such as
>>>>>>>       https://gerrit.ovirt.org/#/c/66729/10/packaging/dbscripts/ne
>>>>>>>       twork_sp.sql
>>>>>>>       <https://gerrit.ovirt.org/#/c/66729/10/packaging/dbscripts/network_sp.sql>
>>>>>>>       would be more clear inside a java file.
>>>>>>>       3. The user permissions management is implemented on DB
>>>>>>>       level. That means that SQL will be more complex (more concatenated java
>>>>>>>       lines).
>>>>>>>    5. Stored procedure are cached by project's code. See
>>>>>>>    SimpleJdbcCallsHandler.getCall(), while the
>>>>>>>    NamedParameterJdbcTemplate are cached by spring's code which seems less
>>>>>>>    optimal (sync all calls using synchronized vs using ConcurrentHashMap as in
>>>>>>>    SP cache).
>>>>>>>    6. With the NamedParametersJdbcTemplate there is no use of the
>>>>>>>    DbEngineDialect. What's the impact of it ?
>>>>>>>
>>>>>>> So besides 5 and 6, the rest is a matter of style. I'd like to hear
>>>>>>> more opinions from other members.
>>>>>>>
>>>>>>
>>>>>> I agree with all you wrote Moti
>>>>>> I don't think that looking in SQL in Java code is clear than looking
>>>>>> in a SP code
>>>>>> Additionally
>>>>>> 1) You will pass more data on the wire instead of calling a SP with
>>>>>> parameters
>>>>>> 2) Your data that is passed on the wire is exposed to attacks since
>>>>>> you will have to implement DB security in the engine level (for example
>>>>>> hidden columns)
>>>>>> 3) Changes in the SQL code done in patches may be more complicated to
>>>>>> track
>>>>>> 4) SQL Injection
>>>>>> 5) I think that SP performs better than SQL inside Java
>>>>>>
>>>>>> I see no real reason to replace the SPs with SQL code , SP is just a
>>>>>> container for SQL code
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Moti
>>>>>>>
>>>>>>> So I'd like to propose moving towards this approach in general in
>>>>>>>> all cases when nothing beyond a simple SQL is needed (no stored procedure
>>>>>>>> programming language is needed).
>>>>>>>> From my experience with the performance improvement task it looks
>>>>>>>> like people avoid adding new queries for a specific need of a flow, instead
>>>>>>>> they use the existing general ones (e.g. dao.getAllForX()) and do the
>>>>>>>> actual join in the bll code.
>>>>>>>> IMHO the proposed approach would simplify adding new specific
>>>>>>>> queries and by that would prevent a decent part of performance issues in
>>>>>>>> the future.
>>>>>>>>
>>>>>>>> I do not propose changing all existing SP's to inline queries in a
>>>>>>>> once, but to allow adding new queries this way. Also we might consider
>>>>>>>> converting relatively simple SP's to inline SQL statements later in a
>>>>>>>> graduate way.
>>>>>>>>
>>>>>>>> [1] - https://gerrit.ovirt.org/#/c/74456
>>>>>>>> [2] - https://gerrit.ovirt.org/#/c/74458
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Yevgeny
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Devel mailing list
>>>>>>>> Devel at ovirt.org
>>>>>>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Moti
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Devel mailing list
>>>>>>> Devel at ovirt.org
>>>>>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>> _______________________________________________
>> Devel mailing list
>> Devel at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/devel
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ovirt.org/pipermail/devel/attachments/20170330/7bc674cb/attachment-0001.html>