Re: [Engine-devel] short research on postgresql enums

15 Dec 2011


      ----- Original Message -----
...
From: "Mike Kolesnik" <mkolesni@redhat.com>
To: "Itamar Heim" <iheim@redhat.com>
Cc: engine-devel@ovirt.org
Sent: Thursday, December 15, 2011 1:50:05 PM
Subject: Re: [Engine-devel] short research on postgresql enums
----- Original Message -----
...
On 12/14/2011 10:51 AM, Roy Golan wrote:
...
On Mon 12 Dec 2011 10:43:39 AM IST, Laszlo Hornyak wrote:
...
----- Original Message -----
...
From: "Itamar Heim"<iheim@redhat.com>
To: "Maor"<mlipchuk@redhat.com>
Cc: engine-devel@ovirt.org, "Laszlo
Hornyak"<lhornyak@redhat.com>
Sent: Sunday, December 11, 2011 4:57:57 PM
Subject: Re: [Engine-devel] short research on postgresql enums
On 12/11/2011 05:54 PM, Maor wrote:
...
On 12/09/2011 10:43 PM, Laszlo Hornyak wrote:
> Hi,
>
> I looked into Mike's database patch (
> http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's
> comment on it. I have seen another patches related to enums
> and
> how they are stored in the database. I made a quick test to
> compare between varchar and enum and the results are here:
> http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
+1 on the post!
Thank you :)
...
If you could update it with benchmark data vs. INTEGER also (as used
today) I think it'll give us a better perspective as to what to
expect.
I did not do that test because I was quite sure that the integer == to the enum since they both take 4 bytes. But I will give it a try.
I am wondering why enum is 4 bytes, 2 bytes would be just enough. That is 2^16 possible values, I can't imagine an enum exceeding that range.

So in fact when moving from integer to varchar, we have to give up some performance :-( but IMO this is a neccesary step if we want to move toward JPA/hibernate without using the evil ordinals.
I think it will pay back once we can build on JPA features such as caching.
...
From what I see there isn't a big gain from using enums for your
sample data, so what we can do it start with VARCHAR and switch to
enum in cases where we actually need performance/size improvement.
...
...
...
...
...
>
>
> IMO enums could be a good solution, but changing enums could
> be
> a
> pain under postgres 9.1. So what if we could use varchar now
> and
> migrate to enum once postgres 9.1 replaces the older
> installations :)
>
> Laszlo
+1 liked the article.
I wonder, how common are database enums among different DB
providers,
for supporting DB migration in the future if needed.
that's a very good point, considering we are still on the path
of
moving
to hibernate.
I did not do a test with hibernate yet, but since a postgresql
enum
column is read/written as character data, I believe hibernate
would
not see a difference between enum and varchar. On other RDBMS'
these
columns could be just plain varchar columns. So we can look at
this as
a PostfgreSQL-specific optimization.
I agree that if we want to use this on any sort of DB, this should be
transparent to the layer above.
Unfortunately, it is not the case for Hibernate as there is a bug
open on it:
  https://hibernate.onjira.com/browse/HHH-5188
In short, what's going on is that EnumType of Hibernate is using
rs.getObject() to get the enum column value, expecting the
underlying Driver to cast it to String/int for it (maybe there's a
problem with set, too).
You can see it at line 85:
https://github.com/hibernate/hibernate-core/blob/master/hibernate-core/src/m...
So for PGSQL enums, the rs.getObject() will actually return an
instance of PGObject which needs to be treated specially, which is
not quite good if we would want to use Hibernate.
However, currently we're not using Hibernate, so when we do switch to
it (if ever) then either this issue will be fixed, or we can have a
workaround for it for postgres.
...
...
...
JPA supports ordinal or name (
http://docs.oracle.com/javaee/6/api/javax/persistence/EnumType.html
).
The current JDBC DAO's persist using the custom logic written
into
the
enums. I am not sure this is portable to JPA. So if we move to
JPA, we
will have a choice:
- persist using ordinal - I think we would want to avoid this,
since
the code we have in enums and the DAO classes is to avoid this
+1 avoid in any case. Ordinal is very fragile and thus
unreliable.
afaik we are not using ordinal at all today. we are using the the
"value" of the enum (all enums are enumerated numericaly regardless
of
ordinal)
...
...
- persist using name - in this case the column can be an enum on
postgresql, which actually has the same performance as the
ordinal
values (and the custom DAO code), or it can be a varchar (if
enum
is
not supported by the rdbms) and it can be plain varchar, which
performs somewhat under the integers.
do we need to watch out from long enum-member names in that case?
Since it's VARCHAR, you can have it at very big lengths, and the DB
will optimize the storage anyway.
...
...
...
...
...
I also searched a little bit on the subject and found the
following
article:
http://www.designmagick.com/article/29/PostgreSQL-Enum-Types
Postgres 8.4.7 supports the CHECK function, which validate
specific
values. (not sure how is the performance).
but maybe we can use it, if we choose to adapt the enum idea
later.
The check is a nice-to-have IMHO, since just adding it to each enum
field will complicate things and degrade insert/update performance.
If we get it with the enum type, its good, if not then it's a layer
of protection that we can probably live without.
Another option is to have the checks working only on development DB,
since if the check fails it's definitely a bug and it's cool for
your system to crash&burn but on customer's DB I'd even avoid it.
Regards,
Mike
_______________________________________________
Engine-devel mailing list
Engine-devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/engine-devel