[Engine-devel] short research on postgresql enums

Mike Kolesnik mkolesni at redhat.com
Thu Dec 15 12:50:05 UTC 2011



----- Original Message -----
> On 12/14/2011 10:51 AM, Roy Golan wrote:
> > On Mon 12 Dec 2011 10:43:39 AM IST, Laszlo Hornyak wrote:
> >>
> >>
> >> ----- Original Message -----
> >>> From: "Itamar Heim"<iheim at redhat.com>
> >>> To: "Maor"<mlipchuk at redhat.com>
> >>> Cc: engine-devel at ovirt.org, "Laszlo Hornyak"<lhornyak at redhat.com>
> >>> Sent: Sunday, December 11, 2011 4:57:57 PM
> >>> Subject: Re: [Engine-devel] short research on postgresql enums
> >>>
> >>> On 12/11/2011 05:54 PM, Maor wrote:
> >>>> On 12/09/2011 10:43 PM, Laszlo Hornyak wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I looked into Mike's database patch (
> >>>>> http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's
> >>>>> comment on it. I have seen another patches related to enums and
> >>>>> how they are stored in the database. I made a quick test to
> >>>>> compare between varchar and enum and the results are here:
> >>>>> http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html

+1 on the post!
If you could update it with benchmark data vs. INTEGER also (as used today) I think it'll give us a better perspective as to what to expect.

>From what I see there isn't a big gain from using enums for your sample data, so what we can do it start with VARCHAR and switch to enum in cases where we actually need performance/size improvement.

> >>>>>
> >>>>>
> >>>>> IMO enums could be a good solution, but changing enums could be
> >>>>> a
> >>>>> pain under postgres 9.1. So what if we could use varchar now
> >>>>> and
> >>>>> migrate to enum once postgres 9.1 replaces the older
> >>>>> installations :)
> >>>>>
> >>>>> Laszlo
> >>>> +1 liked the article.
> >>>> I wonder, how common are database enums among different DB
> >>>> providers,
> >>>> for supporting DB migration in the future if needed.
> >>>
> >>> that's a very good point, considering we are still on the path of
> >>> moving
> >>> to hibernate.
> >>
> >> I did not do a test with hibernate yet, but since a postgresql
> >> enum
> >> column is read/written as character data, I believe hibernate
> >> would
> >> not see a difference between enum and varchar. On other RDBMS'
> >> these
> >> columns could be just plain varchar columns. So we can look at
> >> this as
> >> a PostfgreSQL-specific optimization.

I agree that if we want to use this on any sort of DB, this should be transparent to the layer above.
Unfortunately, it is not the case for Hibernate as there is a bug open on it:
  https://hibernate.onjira.com/browse/HHH-5188

In short, what's going on is that EnumType of Hibernate is using rs.getObject() to get the enum column value, expecting the underlying Driver to cast it to String/int for it (maybe there's a problem with set, too).
You can see it at line 85: https://github.com/hibernate/hibernate-core/blob/master/hibernate-core/src/main/java/org/hibernate/type/EnumType.java

So for PGSQL enums, the rs.getObject() will actually return an instance of PGObject which needs to be treated specially, which is not quite good if we would want to use Hibernate.

However, currently we're not using Hibernate, so when we do switch to it (if ever) then either this issue will be fixed, or we can have a workaround for it for postgres.

> >>
> >> JPA supports ordinal or name (
> >> http://docs.oracle.com/javaee/6/api/javax/persistence/EnumType.html
> >> ).
> >> The current JDBC DAO's persist using the custom logic written into
> >> the
> >> enums. I am not sure this is portable to JPA. So if we move to
> >> JPA, we
> >> will have a choice:
> >> - persist using ordinal - I think we would want to avoid this,
> >> since
> >> the code we have in enums and the DAO classes is to avoid this
> > +1 avoid in any case. Ordinal is very fragile and thus unreliable.
> 
> afaik we are not using ordinal at all today. we are using the the
> "value" of the enum (all enums are enumerated numericaly regardless
> of
> ordinal)
> 
> >> - persist using name - in this case the column can be an enum on
> >> postgresql, which actually has the same performance as the ordinal
> >> values (and the custom DAO code), or it can be a varchar (if enum
> >> is
> >> not supported by the rdbms) and it can be plain varchar, which
> >> performs somewhat under the integers.
> > do we need to watch out from long enum-member names in that case?

Since it's VARCHAR, you can have it at very big lengths, and the DB will optimize the storage anyway.

> >>
> >>>
> >>>>
> >>>> I also searched a little bit on the subject and found the
> >>>> following
> >>>> article:
> >>>> http://www.designmagick.com/article/29/PostgreSQL-Enum-Types
> >>>> Postgres 8.4.7 supports the CHECK function, which validate
> >>>> specific
> >>>> values. (not sure how is the performance).
> >>>> but maybe we can use it, if we choose to adapt the enum idea
> >>>> later.

The check is a nice-to-have IMHO, since just adding it to each enum field will complicate things and degrade insert/update performance. If we get it with the enum type, its good, if not then it's a layer of protection that we can probably live without.

Another option is to have the checks working only on development DB, since if the check fails it's definitely a bug and it's cool for your system to crash&burn but on customer's DB I'd even avoid it.

Regards,
Mike



More information about the Devel mailing list