[Engine-devel] short research on postgresql enums

Hi, I looked into Mike's database patch ( http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's comment on it. I have seen another patches related to enums and how they are stored in the database. I made a quick test to compare between varchar and enum and the results are here: http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html IMO enums could be a good solution, but changing enums could be a pain under postgres 9.1. So what if we could use varchar now and migrate to enum once postgres 9.1 replaces the older installations :) Laszlo

----- Original Message -----
From: "Laszlo Hornyak" <lhornyak@redhat.com> To: engine-devel@ovirt.org Sent: Friday, December 9, 2011 10:43:43 PM Subject: [Engine-devel] short research on postgresql enums
Hi,
I looked into Mike's database patch ( http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's comment on it. I have seen another patches related to enums and how they are stored in the database. I made a quick test to compare between varchar and enum and the results are here: http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
IMO enums could be a good solution, but changing enums could be a pain under postgres 9.1. So what if we could use varchar now and migrate to enum once postgres 9.1 replaces the older installations :)
+1
Laszlo _______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel

On 12/10/2011 09:02 PM, Eli Mesika wrote:
----- Original Message -----
From: "Laszlo Hornyak"<lhornyak@redhat.com> To: engine-devel@ovirt.org Sent: Friday, December 9, 2011 10:43:43 PM Subject: [Engine-devel] short research on postgresql enums
Hi,
I looked into Mike's database patch ( http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's comment on it. I have seen another patches related to enums and how they are stored in the database. I made a quick test to compare between varchar and enum and the results are here: http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
IMO enums could be a good solution, but changing enums could be a pain under postgres 9.1. So what if we could use varchar now and migrate to enum once postgres 9.1 replaces the older installations :)
+1
several considerations: 1. for existing enums - their numbering is part of the "api" (for example, audit log even id. 2. they are also part of the "api" of the data warehouse. 3. space is a critical difference when you think of the data warehouse rather than on the operational db. that alone is reason enough to have these as numeric. 4. we need to provide i18n - not sure if this will be easier in numeric or string form. 5. if 9.1 will support enums, and currently all enums are stored as numeric - why not keep it consistent until moving to 9.1? however, worth remembering the fact a platform component brings a shiny new feature is not always a reason to try it before it matures a bit: a. it may be buggy (though it is fair to try and help it mature as well). b. it may not be available in the various distributions yet, making it harder to consume the code.

On 12/09/2011 10:43 PM, Laszlo Hornyak wrote:
Hi,
I looked into Mike's database patch ( http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's comment on it. I have seen another patches related to enums and how they are stored in the database. I made a quick test to compare between varchar and enum and the results are here: http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
IMO enums could be a good solution, but changing enums could be a pain under postgres 9.1. So what if we could use varchar now and migrate to enum once postgres 9.1 replaces the older installations :)
Laszlo _______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel +1 liked the article. I wonder, how common are database enums among different DB providers, for supporting DB migration in the future if needed.
I also searched a little bit on the subject and found the following article: http://www.designmagick.com/article/29/PostgreSQL-Enum-Types Postgres 8.4.7 supports the CHECK function, which validate specific values. (not sure how is the performance). but maybe we can use it, if we choose to adapt the enum idea later.

On 12/11/2011 05:54 PM, Maor wrote:
On 12/09/2011 10:43 PM, Laszlo Hornyak wrote:
Hi,
I looked into Mike's database patch ( http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's comment on it. I have seen another patches related to enums and how they are stored in the database. I made a quick test to compare between varchar and enum and the results are here: http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
IMO enums could be a good solution, but changing enums could be a pain under postgres 9.1. So what if we could use varchar now and migrate to enum once postgres 9.1 replaces the older installations :)
Laszlo _______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel +1 liked the article. I wonder, how common are database enums among different DB providers, for supporting DB migration in the future if needed.
that's a very good point, considering we are still on the path of moving to hibernate.
I also searched a little bit on the subject and found the following article: http://www.designmagick.com/article/29/PostgreSQL-Enum-Types Postgres 8.4.7 supports the CHECK function, which validate specific values. (not sure how is the performance). but maybe we can use it, if we choose to adapt the enum idea later.
_______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel

----- Original Message -----
From: "Itamar Heim" <iheim@redhat.com> To: "Maor" <mlipchuk@redhat.com> Cc: engine-devel@ovirt.org, "Laszlo Hornyak" <lhornyak@redhat.com> Sent: Sunday, December 11, 2011 4:57:57 PM Subject: Re: [Engine-devel] short research on postgresql enums
On 12/11/2011 05:54 PM, Maor wrote:
Hi,
I looked into Mike's database patch ( http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's comment on it. I have seen another patches related to enums and how they are stored in the database. I made a quick test to compare between varchar and enum and the results are here: http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
IMO enums could be a good solution, but changing enums could be a pain under postgres 9.1. So what if we could use varchar now and migrate to enum once postgres 9.1 replaces the older installations :)
Laszlo _______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel +1 liked the article. I wonder, how common are database enums among different DB
On 12/09/2011 10:43 PM, Laszlo Hornyak wrote: providers, for supporting DB migration in the future if needed.
that's a very good point, considering we are still on the path of moving to hibernate.
I did not do a test with hibernate yet, but since a postgresql enum column is read/written as character data, I believe hibernate would not see a difference between enum and varchar. On other RDBMS' these columns could be just plain varchar columns. So we can look at this as a PostfgreSQL-specific optimization. JPA supports ordinal or name ( http://docs.oracle.com/javaee/6/api/javax/persistence/EnumType.html ). The current JDBC DAO's persist using the custom logic written into the enums. I am not sure this is portable to JPA. So if we move to JPA, we will have a choice: - persist using ordinal - I think we would want to avoid this, since the code we have in enums and the DAO classes is to avoid this - persist using name - in this case the column can be an enum on postgresql, which actually has the same performance as the ordinal values (and the custom DAO code), or it can be a varchar (if enum is not supported by the rdbms) and it can be plain varchar, which performs somewhat under the integers.
I also searched a little bit on the subject and found the following article: http://www.designmagick.com/article/29/PostgreSQL-Enum-Types Postgres 8.4.7 supports the CHECK function, which validate specific values. (not sure how is the performance). but maybe we can use it, if we choose to adapt the enum idea later.
_______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel

On Mon 12 Dec 2011 10:43:39 AM IST, Laszlo Hornyak wrote:
----- Original Message -----
From: "Itamar Heim"<iheim@redhat.com> To: "Maor"<mlipchuk@redhat.com> Cc: engine-devel@ovirt.org, "Laszlo Hornyak"<lhornyak@redhat.com> Sent: Sunday, December 11, 2011 4:57:57 PM Subject: Re: [Engine-devel] short research on postgresql enums
On 12/11/2011 05:54 PM, Maor wrote:
Hi,
I looked into Mike's database patch ( http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's comment on it. I have seen another patches related to enums and how they are stored in the database. I made a quick test to compare between varchar and enum and the results are here: http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
IMO enums could be a good solution, but changing enums could be a pain under postgres 9.1. So what if we could use varchar now and migrate to enum once postgres 9.1 replaces the older installations :)
Laszlo _______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel +1 liked the article. I wonder, how common are database enums among different DB
On 12/09/2011 10:43 PM, Laszlo Hornyak wrote: providers, for supporting DB migration in the future if needed.
that's a very good point, considering we are still on the path of moving to hibernate.
I did not do a test with hibernate yet, but since a postgresql enum column is read/written as character data, I believe hibernate would not see a difference between enum and varchar. On other RDBMS' these columns could be just plain varchar columns. So we can look at this as a PostfgreSQL-specific optimization.
JPA supports ordinal or name ( http://docs.oracle.com/javaee/6/api/javax/persistence/EnumType.html ). The current JDBC DAO's persist using the custom logic written into the enums. I am not sure this is portable to JPA. So if we move to JPA, we will have a choice: - persist using ordinal - I think we would want to avoid this, since the code we have in enums and the DAO classes is to avoid this
+1 avoid in any case. Ordinal is very fragile and thus unreliable.
- persist using name - in this case the column can be an enum on postgresql, which actually has the same performance as the ordinal values (and the custom DAO code), or it can be a varchar (if enum is not supported by the rdbms) and it can be plain varchar, which performs somewhat under the integers. do we need to watch out from long enum-member names in that case?
I also searched a little bit on the subject and found the following article: http://www.designmagick.com/article/29/PostgreSQL-Enum-Types Postgres 8.4.7 supports the CHECK function, which validate specific values. (not sure how is the performance). but maybe we can use it, if we choose to adapt the enum idea later.
_______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
_______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel

On 12/14/2011 10:51 AM, Roy Golan wrote:
On Mon 12 Dec 2011 10:43:39 AM IST, Laszlo Hornyak wrote:
----- Original Message -----
From: "Itamar Heim"<iheim@redhat.com> To: "Maor"<mlipchuk@redhat.com> Cc: engine-devel@ovirt.org, "Laszlo Hornyak"<lhornyak@redhat.com> Sent: Sunday, December 11, 2011 4:57:57 PM Subject: Re: [Engine-devel] short research on postgresql enums
On 12/11/2011 05:54 PM, Maor wrote:
Hi,
I looked into Mike's database patch ( http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's comment on it. I have seen another patches related to enums and how they are stored in the database. I made a quick test to compare between varchar and enum and the results are here: http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
IMO enums could be a good solution, but changing enums could be a pain under postgres 9.1. So what if we could use varchar now and migrate to enum once postgres 9.1 replaces the older installations :)
Laszlo _______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel +1 liked the article. I wonder, how common are database enums among different DB
On 12/09/2011 10:43 PM, Laszlo Hornyak wrote: providers, for supporting DB migration in the future if needed.
that's a very good point, considering we are still on the path of moving to hibernate.
I did not do a test with hibernate yet, but since a postgresql enum column is read/written as character data, I believe hibernate would not see a difference between enum and varchar. On other RDBMS' these columns could be just plain varchar columns. So we can look at this as a PostfgreSQL-specific optimization.
JPA supports ordinal or name ( http://docs.oracle.com/javaee/6/api/javax/persistence/EnumType.html ). The current JDBC DAO's persist using the custom logic written into the enums. I am not sure this is portable to JPA. So if we move to JPA, we will have a choice: - persist using ordinal - I think we would want to avoid this, since the code we have in enums and the DAO classes is to avoid this
+1 avoid in any case. Ordinal is very fragile and thus unreliable.
afaik we are not using ordinal at all today. we are using the the "value" of the enum (all enums are enumerated numericaly regardless of ordinal)
- persist using name - in this case the column can be an enum on postgresql, which actually has the same performance as the ordinal values (and the custom DAO code), or it can be a varchar (if enum is not supported by the rdbms) and it can be plain varchar, which performs somewhat under the integers. do we need to watch out from long enum-member names in that case?
I also searched a little bit on the subject and found the following article: http://www.designmagick.com/article/29/PostgreSQL-Enum-Types Postgres 8.4.7 supports the CHECK function, which validate specific values. (not sure how is the performance). but maybe we can use it, if we choose to adapt the enum idea later.
_______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
_______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel

----- Original Message -----
On 12/14/2011 10:51 AM, Roy Golan wrote:
On Mon 12 Dec 2011 10:43:39 AM IST, Laszlo Hornyak wrote:
----- Original Message -----
From: "Itamar Heim"<iheim@redhat.com> To: "Maor"<mlipchuk@redhat.com> Cc: engine-devel@ovirt.org, "Laszlo Hornyak"<lhornyak@redhat.com> Sent: Sunday, December 11, 2011 4:57:57 PM Subject: Re: [Engine-devel] short research on postgresql enums
On 12/11/2011 05:54 PM, Maor wrote:
On 12/09/2011 10:43 PM, Laszlo Hornyak wrote:
Hi,
I looked into Mike's database patch ( http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's comment on it. I have seen another patches related to enums and how they are stored in the database. I made a quick test to compare between varchar and enum and the results are here: http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
+1 on the post! If you could update it with benchmark data vs. INTEGER also (as used today) I think it'll give us a better perspective as to what to expect.
From what I see there isn't a big gain from using enums for your sample data, so what we can do it start with VARCHAR and switch to enum in cases where we actually need performance/size improvement.
IMO enums could be a good solution, but changing enums could be a pain under postgres 9.1. So what if we could use varchar now and migrate to enum once postgres 9.1 replaces the older installations :)
Laszlo
+1 liked the article. I wonder, how common are database enums among different DB providers, for supporting DB migration in the future if needed.
that's a very good point, considering we are still on the path of moving to hibernate.
I did not do a test with hibernate yet, but since a postgresql enum column is read/written as character data, I believe hibernate would not see a difference between enum and varchar. On other RDBMS' these columns could be just plain varchar columns. So we can look at this as a PostfgreSQL-specific optimization.
I agree that if we want to use this on any sort of DB, this should be transparent to the layer above. Unfortunately, it is not the case for Hibernate as there is a bug open on it: https://hibernate.onjira.com/browse/HHH-5188 In short, what's going on is that EnumType of Hibernate is using rs.getObject() to get the enum column value, expecting the underlying Driver to cast it to String/int for it (maybe there's a problem with set, too). You can see it at line 85: https://github.com/hibernate/hibernate-core/blob/master/hibernate-core/src/m... So for PGSQL enums, the rs.getObject() will actually return an instance of PGObject which needs to be treated specially, which is not quite good if we would want to use Hibernate. However, currently we're not using Hibernate, so when we do switch to it (if ever) then either this issue will be fixed, or we can have a workaround for it for postgres.
JPA supports ordinal or name ( http://docs.oracle.com/javaee/6/api/javax/persistence/EnumType.html ). The current JDBC DAO's persist using the custom logic written into the enums. I am not sure this is portable to JPA. So if we move to JPA, we will have a choice: - persist using ordinal - I think we would want to avoid this, since the code we have in enums and the DAO classes is to avoid this
+1 avoid in any case. Ordinal is very fragile and thus unreliable.
afaik we are not using ordinal at all today. we are using the the "value" of the enum (all enums are enumerated numericaly regardless of ordinal)
- persist using name - in this case the column can be an enum on postgresql, which actually has the same performance as the ordinal values (and the custom DAO code), or it can be a varchar (if enum is not supported by the rdbms) and it can be plain varchar, which performs somewhat under the integers. do we need to watch out from long enum-member names in that case?
Since it's VARCHAR, you can have it at very big lengths, and the DB will optimize the storage anyway.
I also searched a little bit on the subject and found the following article: http://www.designmagick.com/article/29/PostgreSQL-Enum-Types Postgres 8.4.7 supports the CHECK function, which validate specific values. (not sure how is the performance). but maybe we can use it, if we choose to adapt the enum idea later.
The check is a nice-to-have IMHO, since just adding it to each enum field will complicate things and degrade insert/update performance. If we get it with the enum type, its good, if not then it's a layer of protection that we can probably live without. Another option is to have the checks working only on development DB, since if the check fails it's definitely a bug and it's cool for your system to crash&burn but on customer's DB I'd even avoid it. Regards, Mike

----- Original Message -----
From: "Mike Kolesnik" <mkolesni@redhat.com> To: "Itamar Heim" <iheim@redhat.com> Cc: engine-devel@ovirt.org Sent: Thursday, December 15, 2011 1:50:05 PM Subject: Re: [Engine-devel] short research on postgresql enums
----- Original Message -----
On 12/14/2011 10:51 AM, Roy Golan wrote:
On Mon 12 Dec 2011 10:43:39 AM IST, Laszlo Hornyak wrote:
----- Original Message -----
From: "Itamar Heim"<iheim@redhat.com> To: "Maor"<mlipchuk@redhat.com> Cc: engine-devel@ovirt.org, "Laszlo Hornyak"<lhornyak@redhat.com> Sent: Sunday, December 11, 2011 4:57:57 PM Subject: Re: [Engine-devel] short research on postgresql enums
On 12/11/2011 05:54 PM, Maor wrote:
On 12/09/2011 10:43 PM, Laszlo Hornyak wrote: > Hi, > > I looked into Mike's database patch ( > http://gerrit.ovirt.org/#change,500 ) today and read Yaniv's > comment on it. I have seen another patches related to enums > and > how they are stored in the database. I made a quick test to > compare between varchar and enum and the results are here: > http://dummywarhead.blogspot.com/2011/12/postgresql-enums-vs-varchar.html
+1 on the post!
Thank you :)
If you could update it with benchmark data vs. INTEGER also (as used today) I think it'll give us a better perspective as to what to expect.
I did not do that test because I was quite sure that the integer == to the enum since they both take 4 bytes. But I will give it a try. I am wondering why enum is 4 bytes, 2 bytes would be just enough. That is 2^16 possible values, I can't imagine an enum exceeding that range. So in fact when moving from integer to varchar, we have to give up some performance :-( but IMO this is a neccesary step if we want to move toward JPA/hibernate without using the evil ordinals. I think it will pay back once we can build on JPA features such as caching.
From what I see there isn't a big gain from using enums for your sample data, so what we can do it start with VARCHAR and switch to enum in cases where we actually need performance/size improvement.
> > > IMO enums could be a good solution, but changing enums could > be > a > pain under postgres 9.1. So what if we could use varchar now > and > migrate to enum once postgres 9.1 replaces the older > installations :) > > Laszlo +1 liked the article. I wonder, how common are database enums among different DB providers, for supporting DB migration in the future if needed.
that's a very good point, considering we are still on the path of moving to hibernate.
I did not do a test with hibernate yet, but since a postgresql enum column is read/written as character data, I believe hibernate would not see a difference between enum and varchar. On other RDBMS' these columns could be just plain varchar columns. So we can look at this as a PostfgreSQL-specific optimization.
I agree that if we want to use this on any sort of DB, this should be transparent to the layer above. Unfortunately, it is not the case for Hibernate as there is a bug open on it: https://hibernate.onjira.com/browse/HHH-5188
In short, what's going on is that EnumType of Hibernate is using rs.getObject() to get the enum column value, expecting the underlying Driver to cast it to String/int for it (maybe there's a problem with set, too). You can see it at line 85: https://github.com/hibernate/hibernate-core/blob/master/hibernate-core/src/m...
So for PGSQL enums, the rs.getObject() will actually return an instance of PGObject which needs to be treated specially, which is not quite good if we would want to use Hibernate.
However, currently we're not using Hibernate, so when we do switch to it (if ever) then either this issue will be fixed, or we can have a workaround for it for postgres.
JPA supports ordinal or name ( http://docs.oracle.com/javaee/6/api/javax/persistence/EnumType.html ). The current JDBC DAO's persist using the custom logic written into the enums. I am not sure this is portable to JPA. So if we move to JPA, we will have a choice: - persist using ordinal - I think we would want to avoid this, since the code we have in enums and the DAO classes is to avoid this
+1 avoid in any case. Ordinal is very fragile and thus unreliable.
afaik we are not using ordinal at all today. we are using the the "value" of the enum (all enums are enumerated numericaly regardless of ordinal)
- persist using name - in this case the column can be an enum on postgresql, which actually has the same performance as the ordinal values (and the custom DAO code), or it can be a varchar (if enum is not supported by the rdbms) and it can be plain varchar, which performs somewhat under the integers. do we need to watch out from long enum-member names in that case?
Since it's VARCHAR, you can have it at very big lengths, and the DB will optimize the storage anyway.
I also searched a little bit on the subject and found the following article: http://www.designmagick.com/article/29/PostgreSQL-Enum-Types Postgres 8.4.7 supports the CHECK function, which validate specific values. (not sure how is the performance). but maybe we can use it, if we choose to adapt the enum idea later.
The check is a nice-to-have IMHO, since just adding it to each enum field will complicate things and degrade insert/update performance. If we get it with the enum type, its good, if not then it's a layer of protection that we can probably live without.
Another option is to have the checks working only on development DB, since if the check fails it's definitely a bug and it's cool for your system to crash&burn but on customer's DB I'd even avoid it.
Regards, Mike _______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
participants (6)
-
Eli Mesika
-
Itamar Heim
-
Laszlo Hornyak
-
Maor
-
Mike Kolesnik
-
Roy Golan