[Engine-devel] short research on postgresql enums

Laszlo Hornyak lhornyak at redhat.com
Thu Dec 15 13:34:45 UTC 2011



----- Original Message -----
> From: "Mike Kolesnik" <mkolesni at redhat.com>
> To: "Itamar Heim" <iheim at redhat.com>
> Cc: engine-devel at ovirt.org
> Sent: Thursday, December 15, 2011 1:50:05 PM
> Subject: Re: [Engine-devel] short research on postgresql enums
> 
> 
> 
> ----- Original Message -----
> > On 12/14/2011 10:51 AM, Roy Golan wrote:
> > > On Mon 12 Dec 2011 10:43:39 AM IST, Laszlo Hornyak wrote:
> > >>
> > >>
> > >> ----- Original Message -----
> > >>> From: "Itamar Heim"<iheim at redhat.com>
> > >>> To: "Maor"<mlipchuk at redhat.com>
> > >>> Cc: engine-devel at ovirt.org, "Laszlo
> > >>> Hornyak"<lhornyak at redhat.com>
> > >>> Sent: Sunday, December 11, 2011 4:57:57 PM
> > >>> Subject: Re: [Engine-devel] short research on postgresql enums
> > >>>
> > >>> On 12/11/2011 05:54 PM, Maor wrote:
> > >>>> On 12/09/2011 10:43 PM, Laszlo Hornyak wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> I looked into Mike's database patch (
> > >>>>> http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's
> > >>>>> comment on it. I have seen another patches related to enums
> > >>>>> and
> > >>>>> how they are stored in the database. I made a quick test to
> > >>>>> compare between varchar and enum and the results are here:
> > >>>>> http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
> 
> +1 on the post!

Thank you :)

> If you could update it with benchmark data vs. INTEGER also (as used
> today) I think it'll give us a better perspective as to what to
> expect.

I did not do that test because I was quite sure that the integer == to the enum since they both take 4 bytes. But I will give it a try.
I am wondering why enum is 4 bytes, 2 bytes would be just enough. That is 2^16 possible values, I can't imagine an enum exceeding that range.

So in fact when moving from integer to varchar, we have to give up some performance :-( but IMO this is a neccesary step if we want to move toward JPA/hibernate without using the evil ordinals.
I think it will pay back once we can build on JPA features such as caching.

> 
> From what I see there isn't a big gain from using enums for your
> sample data, so what we can do it start with VARCHAR and switch to
> enum in cases where we actually need performance/size improvement.
> 
> > >>>>>
> > >>>>>
> > >>>>> IMO enums could be a good solution, but changing enums could
> > >>>>> be
> > >>>>> a
> > >>>>> pain under postgres 9.1. So what if we could use varchar now
> > >>>>> and
> > >>>>> migrate to enum once postgres 9.1 replaces the older
> > >>>>> installations :)
> > >>>>>
> > >>>>> Laszlo
> > >>>> +1 liked the article.
> > >>>> I wonder, how common are database enums among different DB
> > >>>> providers,
> > >>>> for supporting DB migration in the future if needed.
> > >>>
> > >>> that's a very good point, considering we are still on the path
> > >>> of
> > >>> moving
> > >>> to hibernate.
> > >>
> > >> I did not do a test with hibernate yet, but since a postgresql
> > >> enum
> > >> column is read/written as character data, I believe hibernate
> > >> would
> > >> not see a difference between enum and varchar. On other RDBMS'
> > >> these
> > >> columns could be just plain varchar columns. So we can look at
> > >> this as
> > >> a PostfgreSQL-specific optimization.
> 
> I agree that if we want to use this on any sort of DB, this should be
> transparent to the layer above.
> Unfortunately, it is not the case for Hibernate as there is a bug
> open on it:
>   https://hibernate.onjira.com/browse/HHH-5188
> 
> In short, what's going on is that EnumType of Hibernate is using
> rs.getObject() to get the enum column value, expecting the
> underlying Driver to cast it to String/int for it (maybe there's a
> problem with set, too).
> You can see it at line 85:
> https://github.com/hibernate/hibernate-core/blob/master/hibernate-core/src/main/java/org/hibernate/type/EnumType.java
> 
> So for PGSQL enums, the rs.getObject() will actually return an
> instance of PGObject which needs to be treated specially, which is
> not quite good if we would want to use Hibernate.
> 
> However, currently we're not using Hibernate, so when we do switch to
> it (if ever) then either this issue will be fixed, or we can have a
> workaround for it for postgres.
> 
> > >>
> > >> JPA supports ordinal or name (
> > >> http://docs.oracle.com/javaee/6/api/javax/persistence/EnumType.html
> > >> ).
> > >> The current JDBC DAO's persist using the custom logic written
> > >> into
> > >> the
> > >> enums. I am not sure this is portable to JPA. So if we move to
> > >> JPA, we
> > >> will have a choice:
> > >> - persist using ordinal - I think we would want to avoid this,
> > >> since
> > >> the code we have in enums and the DAO classes is to avoid this
> > > +1 avoid in any case. Ordinal is very fragile and thus
> > > unreliable.
> > 
> > afaik we are not using ordinal at all today. we are using the the
> > "value" of the enum (all enums are enumerated numericaly regardless
> > of
> > ordinal)
> > 
> > >> - persist using name - in this case the column can be an enum on
> > >> postgresql, which actually has the same performance as the
> > >> ordinal
> > >> values (and the custom DAO code), or it can be a varchar (if
> > >> enum
> > >> is
> > >> not supported by the rdbms) and it can be plain varchar, which
> > >> performs somewhat under the integers.
> > > do we need to watch out from long enum-member names in that case?
> 
> Since it's VARCHAR, you can have it at very big lengths, and the DB
> will optimize the storage anyway.
> 
> > >>
> > >>>
> > >>>>
> > >>>> I also searched a little bit on the subject and found the
> > >>>> following
> > >>>> article:
> > >>>> http://www.designmagick.com/article/29/PostgreSQL-Enum-Types
> > >>>> Postgres 8.4.7 supports the CHECK function, which validate
> > >>>> specific
> > >>>> values. (not sure how is the performance).
> > >>>> but maybe we can use it, if we choose to adapt the enum idea
> > >>>> later.
> 
> The check is a nice-to-have IMHO, since just adding it to each enum
> field will complicate things and degrade insert/update performance.
> If we get it with the enum type, its good, if not then it's a layer
> of protection that we can probably live without.
> 
> Another option is to have the checks working only on development DB,
> since if the check fails it's definitely a bug and it's cool for
> your system to crash&burn but on customer's DB I'd even avoid it.
> 
> Regards,
> Mike
> _______________________________________________
> Engine-devel mailing list
> Engine-devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/engine-devel
> 



More information about the Engine-devel mailing list