
----- Original Message -----
From: "Mike Kolesnik" <mkolesni@redhat.com> To: "Itamar Heim" <iheim@redhat.com> Cc: engine-devel@ovirt.org Sent: Thursday, December 15, 2011 1:50:05 PM Subject: Re: [Engine-devel] short research on postgresql enums
----- Original Message -----
On 12/14/2011 10:51 AM, Roy Golan wrote:
On Mon 12 Dec 2011 10:43:39 AM IST, Laszlo Hornyak wrote:
----- Original Message -----
From: "Itamar Heim"<iheim@redhat.com> To: "Maor"<mlipchuk@redhat.com> Cc: engine-devel@ovirt.org, "Laszlo Hornyak"<lhornyak@redhat.com> Sent: Sunday, December 11, 2011 4:57:57 PM Subject: Re: [Engine-devel] short research on postgresql enums
On 12/11/2011 05:54 PM, Maor wrote:
On 12/09/2011 10:43 PM, Laszlo Hornyak wrote: > Hi, > > I looked into Mike's database patch ( > http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's > comment on it. I have seen another patches related to enums > and > how they are stored in the database. I made a quick test to > compare between varchar and enum and the results are here: > http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
+1 on the post!
Thank you :)
If you could update it with benchmark data vs. INTEGER also (as used today) I think it'll give us a better perspective as to what to expect.
I did not do that test because I was quite sure that the integer == to the enum since they both take 4 bytes. But I will give it a try. I am wondering why enum is 4 bytes, 2 bytes would be just enough. That is 2^16 possible values, I can't imagine an enum exceeding that range. So in fact when moving from integer to varchar, we have to give up some performance :-( but IMO this is a neccesary step if we want to move toward JPA/hibernate without using the evil ordinals. I think it will pay back once we can build on JPA features such as caching.
From what I see there isn't a big gain from using enums for your sample data, so what we can do it start with VARCHAR and switch to enum in cases where we actually need performance/size improvement.
> > > IMO enums could be a good solution, but changing enums could > be > a > pain under postgres 9.1. So what if we could use varchar now > and > migrate to enum once postgres 9.1 replaces the older > installations :) > > Laszlo +1 liked the article. I wonder, how common are database enums among different DB providers, for supporting DB migration in the future if needed.
that's a very good point, considering we are still on the path of moving to hibernate.
I did not do a test with hibernate yet, but since a postgresql enum column is read/written as character data, I believe hibernate would not see a difference between enum and varchar. On other RDBMS' these columns could be just plain varchar columns. So we can look at this as a PostfgreSQL-specific optimization.
I agree that if we want to use this on any sort of DB, this should be transparent to the layer above. Unfortunately, it is not the case for Hibernate as there is a bug open on it: https://hibernate.onjira.com/browse/HHH-5188
In short, what's going on is that EnumType of Hibernate is using rs.getObject() to get the enum column value, expecting the underlying Driver to cast it to String/int for it (maybe there's a problem with set, too). You can see it at line 85: https://github.com/hibernate/hibernate-core/blob/master/hibernate-core/src/m...
So for PGSQL enums, the rs.getObject() will actually return an instance of PGObject which needs to be treated specially, which is not quite good if we would want to use Hibernate.
However, currently we're not using Hibernate, so when we do switch to it (if ever) then either this issue will be fixed, or we can have a workaround for it for postgres.
JPA supports ordinal or name ( http://docs.oracle.com/javaee/6/api/javax/persistence/EnumType.html ). The current JDBC DAO's persist using the custom logic written into the enums. I am not sure this is portable to JPA. So if we move to JPA, we will have a choice: - persist using ordinal - I think we would want to avoid this, since the code we have in enums and the DAO classes is to avoid this
+1 avoid in any case. Ordinal is very fragile and thus unreliable.
afaik we are not using ordinal at all today. we are using the the "value" of the enum (all enums are enumerated numericaly regardless of ordinal)
- persist using name - in this case the column can be an enum on postgresql, which actually has the same performance as the ordinal values (and the custom DAO code), or it can be a varchar (if enum is not supported by the rdbms) and it can be plain varchar, which performs somewhat under the integers. do we need to watch out from long enum-member names in that case?
Since it's VARCHAR, you can have it at very big lengths, and the DB will optimize the storage anyway.
I also searched a little bit on the subject and found the following article: http://www.designmagick.com/article/29/PostgreSQL-Enum-Types Postgres 8.4.7 supports the CHECK function, which validate specific values. (not sure how is the performance). but maybe we can use it, if we choose to adapt the enum idea later.
The check is a nice-to-have IMHO, since just adding it to each enum field will complicate things and degrade insert/update performance. If we get it with the enum type, its good, if not then it's a layer of protection that we can probably live without.
Another option is to have the checks working only on development DB, since if the check fails it's definitely a bug and it's cool for your system to crash&burn but on customer's DB I'd even avoid it.
Regards, Mike _______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel