OVIRT_ENGINE_DWH org.postgresql.util.PSQLException:ERROR: more than one row returned by a subquery used as an expression

After many attempts and many uninstalls, (wiping of the ovirt_engine_history database, removal of the associated database user, removal of all ovirt-engine-dwh folders and configs in /etc, and so on...) and reinstalls, I can't figure out how to get Data Warehouse to work properly. The dashboard isn't getting populated with data and this error message constantly shows in Alerts and Events in the Web Administration console: "ETL service sampling has encountered an error. Please consult the service log for more details." Also, the file /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log shows the following error repeatedly: 2017-XX-XX 13:45:01|BFAPtJ|3TsCll|B5fdZD|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|6|Java Exception|tRunJob_1|java.lang.RuntimeException:Child job running failed|1 Exception in component tJDBCInput_9 org.postgresql.util.PSQLException: ERROR: more than one row returned by a subquery used as an expression at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:403) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:283) at ovirt_engine_dwh.configurationsync_4_1.ConfigurationSync.tJDBCInput_9Process(ConfigurationSync.java:3727) at ovirt_engine_dwh.configurationsync_4_1.ConfigurationSync$2.run(ConfigurationSync.java:60001) 2017-XX-XX 13:45:01|wqfyxb|3TsCll|76BFD3|OVIRT_ENGINE_DWH|ConfigurationSync|Default|6|Java Exception|tJDBCInput_9|org.postgresql.util.PSQLException:ERROR: more than one row returned by a subquery used as an expression|1 Exception in component tRunJob_1 java.lang.RuntimeException: Child job running failed at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_1Process(SampleRunJobs.java:1198) at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_4Process(SampleRunJobs.java:1000) at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tJDBCConnection_2Process(SampleRunJobs.java:767) at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tJDBCConnection_1Process(SampleRunJobs.java:642) at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs$2.run(SampleRunJobs.java:2683) 2017-XX-XX 13:45:01|76BFD3|3TsCll|FlSijV|OVIRT_ENGINE_DWH|SampleRunJobs|Default|6|Java Exception|tRunJob_1|java.lang.RuntimeException:Child job running failed|1 Exception in component tRunJob_1 java.lang.RuntimeException: Child job running failed at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tRunJob_1Process(SampleTimeKeepingJob.java:6067) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_2Process(SampleTimeKeepingJob.java:5809) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_1Process(SampleTimeKeepingJob.java:4444) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_2Process(SampleTimeKeepingJob.java:4319) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tRowGenerator_2Process(SampleTimeKeepingJob.java:4188) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_3Process(SampleTimeKeepingJob.java:3593) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_5Process(SampleTimeKeepingJob.java:2977) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_4Process(SampleTimeKeepingJob.java:2295) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_3Process(SampleTimeKeepingJob.java:1649) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob$2.run(SampleTimeKeepingJob.java:11363) I just finished updating to oVirt 4.1.5 earlier today. My Internet searches have not provided me with any further insight. Has anyone else experienced this issue? Has anyone else resolved this issue? What should I be looking at to start debugging this issue (a section of the database perhaps...)? Thanks, Charles

On Wed, Aug 23, 2017 at 9:06 PM, Charles Gruener <cjg9411@rit.edu> wrote:
After many attempts and many uninstalls, (wiping of the ovirt_engine_history database, removal of the associated database user, removal of all ovirt-engine-dwh folders and configs in /etc, and so on...) and reinstalls, I can't figure out how to get Data Warehouse to work properly. The dashboard isn't getting populated with data and this error message constantly shows in Alerts and Events in the Web Administration console:
"ETL service sampling has encountered an error. Please consult the service log for more details."
Also, the file /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log shows the following error repeatedly:
2017-XX-XX 13:45:01|BFAPtJ|3TsCll|B5fdZD|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|6|Java Exception|tRunJob_1|java.lang.RuntimeException:Child job running failed|1 Exception in component tJDBCInput_9 org.postgresql.util.PSQLException: ERROR: more than one row returned by a subquery used as an expression at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:403) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:283) at ovirt_engine_dwh.configurationsync_4_1.ConfigurationSync.tJDBCInput_9Process(ConfigurationSync.java:3727)
This is the first line in the above stack trace that is from dwh (not from postgresql libraries). If you check the sources looking at this file around line 3727, you see: String dbquery_tJDBCInput_9 = "SELECT\n user_id,\n first_name,\n last_name,\n domain,\n username,\n department,\n user_role_title,\n email,\n external_id,\n active,\n create_date,\n update_date\nFROM dwh_users_history_view"; globalMap.put("tJDBCInput_9_QUERY", dbquery_tJDBCInput_9); java.sql.ResultSet rs_tJDBCInput_9 = null; try { rs_tJDBCInput_9 = stmt_tJDBCInput_9 .executeQuery(dbquery_tJDBCInput_9); So I guess the query selects stuff from dwh_users_history_view. This view is defined in the engine sources (not dwh), if you search its name you find it at packaging/dbscripts/create_dwh_views.sql: CREATE OR REPLACE VIEW dwh_users_history_view AS SELECT user_id, name AS first_name, surname AS last_name, DOMAIN, username, department, '' AS user_role_title, email, external_id, TRUE AS active, _create_date AS create_date, _update_date AS update_date FROM users WHERE ( ( _create_date > ( SELECT var_datetime FROM dwh_history_timekeeping WHERE (var_name = 'lastSync') ) ) OR ( _update_date > ( SELECT var_datetime FROM dwh_history_timekeeping AS history_timekeeping_1 WHERE (var_name = 'lastSync') ) ) ); So, as you see, it has subqueries selecting from dwh_history_timekeeping, which is a table in the engine db used by dwh to track various stuff about its operation.
at ovirt_engine_dwh.configurationsync_4_1.ConfigurationSync$2.run(ConfigurationSync.java:60001) 2017-XX-XX 13:45:01|wqfyxb|3TsCll|76BFD3|OVIRT_ENGINE_DWH|ConfigurationSync|Default|6|Java Exception|tJDBCInput_9|org.postgresql.util.PSQLException:ERROR: more than one row returned by a subquery used as an expression|1 Exception in component tRunJob_1 java.lang.RuntimeException: Child job running failed at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_1Process(SampleRunJobs.java:1198) at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_4Process(SampleRunJobs.java:1000) at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tJDBCConnection_2Process(SampleRunJobs.java:767) at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tJDBCConnection_1Process(SampleRunJobs.java:642) at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs$2.run(SampleRunJobs.java:2683) 2017-XX-XX 13:45:01|76BFD3|3TsCll|FlSijV|OVIRT_ENGINE_DWH|SampleRunJobs|Default|6|Java Exception|tRunJob_1|java.lang.RuntimeException:Child job running failed|1 Exception in component tRunJob_1 java.lang.RuntimeException: Child job running failed at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tRunJob_1Process(SampleTimeKeepingJob.java:6067) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_2Process(SampleTimeKeepingJob.java:5809) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_1Process(SampleTimeKeepingJob.java:4444) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_2Process(SampleTimeKeepingJob.java:4319) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tRowGenerator_2Process(SampleTimeKeepingJob.java:4188) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_3Process(SampleTimeKeepingJob.java:3593) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_5Process(SampleTimeKeepingJob.java:2977) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_4Process(SampleTimeKeepingJob.java:2295) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_3Process(SampleTimeKeepingJob.java:1649) at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob$2.run(SampleTimeKeepingJob.java:11363)
I just finished updating to oVirt 4.1.5 earlier today. My Internet searches have not provided me with any further insight. Has anyone else experienced this issue? Has anyone else resolved this issue? What should I be looking at to start debugging this issue (a section of the database perhaps...)?
You can try looking at the dwh_history_timekeeping table in the engine (not dwh) database: su - postgres -c 'psql engine -c "select * from dwh_history_timekeeping;"' Most likely you'll find there more than one line with var_name 'lastSync'. How this happened is quite interesting/useful to know, because it should not normally happen, and is most likely a bug. If you can reproduce this, please file a bug with relevant details. Thanks! How to fix? If you do not need your engine, just remove everything and try again. If you do, you can try removing some of the duplicate lines. If they are different, you'll have to use your own judgment to choose the correct ones, or share the content and we can try to help. Best,
Thanks,
Charles _______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
-- Didi

Thank you so much for the detailed response!
You can try looking at the dwh_history_timekeeping table in the engine (not dwh) database:
su - postgres -c 'psql engine -c "select * from dwh_history_timekeeping;"'
Most likely you'll find there more than one line with var_name 'lastSync’.
And that I most certainly did. There were double of each var_name. I simply deleted one of each of the pairs that made the most sense to delete, reran engine-setup and all appears to be working now!
How this happened is quite interesting/useful to know, because it should not normally happen, and is most likely a bug. If you can reproduce this, please file a bug with relevant details. Thanks!
I’m pretty sure this was a self-inflicted issue as a while back when things broke, we actually had two oVirt heads running but we didn’t catch it for a while. Basically, we migrated from a VM running the head (on a separate VM solution) to a hardware solution. Someone ended up turning the VM back on and it started wreaking havoc on our installation. This was likely a leftover from that condition. If it happens to return, I’ll be sure to file a bug. One last question: Data for the Storage section of the Global Utilization part of the dashboard is empty. We are using Ceph via Cinder for our storage. Is that the issue? Side note: we are now being bitten by this bug - https://bugzilla.redhat.com/show_bug.cgi?id=1465825 Thanks again for the assistance. Charles

On Fri, Aug 25, 2017 at 10:47 PM, Charles Gruener <cjg9411@rit.edu> wrote:
Thank you so much for the detailed response!
Glad it helped :-)
You can try looking at the dwh_history_timekeeping table in the engine (not dwh) database:
su - postgres -c 'psql engine -c "select * from dwh_history_timekeeping;"'
Most likely you'll find there more than one line with var_name 'lastSync’.
And that I most certainly did. There were double of each var_name. I simply deleted one of each of the pairs that made the most sense to delete, reran engine-setup and all appears to be working now!
How this happened is quite interesting/useful to know, because it should not normally happen, and is most likely a bug. If you can reproduce this, please file a bug with relevant details. Thanks!
I’m pretty sure this was a self-inflicted issue as a while back when things broke, we actually had two oVirt heads running but we didn’t catch it for a while. Basically, we migrated from a VM running the head (on a separate VM solution) to a hardware solution.
How? Using engine-backup? Some kind of duplication/imaging?
Someone ended up turning the VM back on and it started wreaking havoc on our installation.
Ouch. This is one of my bad dreams re backup/restore/migration. We try to emphasize in various guides that you must stop and disable the engine service on the old machine. If you can think of anything that could have further helped in your own situation/flow/case, do not hesitate to ping us! Saying "This was obviously our own fault, the software was just fine" is helpful only to some extent. That said, I did not hear so far about exactly-same cases, although this does not mean there aren't any.
This was likely a leftover from that condition.
Can you think how, exactly? Can't tell exactly from your emails, but it seems you had engine+dwh on the same machine. Did you have DBs on a separate machine (which is not the default)? If so, it makes sense. The two machines' processes' both updated the same DBs. But if you did use local DBs, and accessed them with host 'localhost' (which is the default), the above should not have happened. Each machine would then write to its own DBs. This still quite bad - because you are then having two engines talking to the same hosts - but in a different way. Also: if I were you, I'd not keep trusting this system. If it works, fine. It might break in the future - the above is definitely not part of the design, not tested, not supported, etc. If at all possible, perhaps consider reinstalling from scratch (not engine-backup backup/restore). You can import the existing storage domains, if they are not damaged as well. Can't even tell you how to test this. If the individual VMs' disks seem ok, you might backup/restore these.
If it happens to return, I’ll be sure to file a bug.
Very well.
One last question: Data for the Storage section of the Global Utilization part of the dashboard is empty. We are using Ceph via Cinder for our storage. Is that the issue?
I really have no idea, but it sounds reasonable. If you do not find an existing open bug/RFE, please open one. Or start a new thread on this list with a suitable subject header.
Side note: we are now being bitten by this bug - https://bugzilla.redhat.com/show_bug.cgi?id=1465825
Thanks again for the assistance.
Charles
Best, -- Didi
participants (2)
-
Charles Gruener
-
Yedidyah Bar David