Postgres stuck at 100% CPU after Engine upgrade from 4.2.8 to 4.3.7

Hi oVirt,

Since the oVirt upgrade from 4.2.8 to 4.3.7, the SHE postgres has been running at 100% CPU with four processes, all related to the ovirt_engine_history DB:

  PID USER     PR NI   VIRT   RES   SHR S  %CPU %MEM    TIME+ COMMAND
 2617 postgres 20  0 372668 30876 26352 R 100.0  0.2 63:40.19 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(50226) INSERT
 3554 postgres 20  0 368680 15120 11716 R 100.0  0.1 56:45.72 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(50448) BIND
 2623 postgres 20  0 368068 17640 14896 R  99.3  0.1 42:35.36 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(50228) BIND
 2880 postgres 20  0 368680 20404 17004 R  99.0  0.1 65:13.96 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(50252) BIND

The output of select * from pg_stat_activity; gives me the following (one block per backend; every row has backend_type 'client backend'):

pid 2617 | datid 16401 | db ovirt_engine_history | usesysid 16386 | user ovirt_engine_history | client 127.0.0.1:50226
  backend_start 2019-12-09 13:37:07.779261+01 | xact_start 2019-12-09 13:37:09.555704+01
  query_start 2019-12-09 13:39:04.915431+01 | state_change 2019-12-09 13:39:04.915432+01
  state active | backend_xid 260332629 | backend_xmin 260332629
  query: INSERT INTO vm_disks_usage_samples_history (history_datetime,vm_id,disks_usage) VALUES ($1,$2,$3)

pid 2623 | datid 16401 | db ovirt_engine_history | usesysid 16386 | user ovirt_engine_history | client 127.0.0.1:50228
  backend_start 2019-12-09 13:37:07.820982+01 | xact_start 2019-12-09 14:00:00.042448+01
  query_start 2019-12-09 14:00:00.997228+01 | state_change 2019-12-09 14:00:00.997229+01
  state active | backend_xid 260333458 | backend_xmin 260332629
  query: SELECT history_datetime, vm_id, disks_usage
           FROM vm_disks_usage_samples_history
          WHERE history_datetime >= '2019-12-09 07:00:00.000000+0100'
            AND history_datetime < '2019-12-09 08:00:00.000000+0100'
          ORDER BY history_id DESC, vm_id

pid 2624 | datid 16401 | db ovirt_engine_history | usesysid 16386 | user ovirt_engine_history | client 127.0.0.1:50230
  backend_start 2019-12-09 13:37:07.826391+01 | state_change 2019-12-09 13:37:07.835886+01
  state idle | wait_event Client/ClientRead

pid 2625 | datid 16401 | db ovirt_engine_history | usesysid 16386 | user ovirt_engine_history | client 127.0.0.1:50232
  backend_start 2019-12-09 13:37:07.847262+01 | state_change 2019-12-09 13:37:07.850961+01
  state idle | wait_event Client/ClientRead

pid 6275 | datid 13214 | db postgres | usesysid 10 | user postgres | application psql | client_port -1
  backend_start 2019-12-09 14:37:13.890996+01 | xact_start 2019-12-09 14:46:36.439887+01
  query_start 2019-12-09 14:46:36.439887+01 | state_change 2019-12-09 14:46:36.439897+01
  state active | backend_xmin 260332629
  query: select * from pg_stat_activity;

Any ideas why this could be happening? Kindly awaiting your reply.

-----
kind regards / met vriendelijke groet

Marko Vrgotic
ActiveVideo
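For reference, the same picture can be pulled from pg_stat_activity more selectively than with select *. A minimal sketch, using only the stock PostgreSQL catalog columns already visible in the output above (nothing oVirt-specific is assumed):

    -- non-idle backends in the DWH database, longest-running statement first
    SELECT pid, state, wait_event_type, wait_event,
           now() - query_start AS runtime,
           left(query, 80)     AS query
      FROM pg_stat_activity
     WHERE datname = 'ovirt_engine_history'
       AND state <> 'idle'
     ORDER BY query_start;

Sorting by query_start makes it easy to see which of the busy backends has been executing its current statement the longest, and whether it is waiting on anything (wait_event) or simply burning CPU.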

Hi Marko,

Aren't there some deadlocks present? Try the queries here: https://wiki.postgresql.org/wiki/Lock_Monitoring

Please also check the DWH log: /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log
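The queries on that wiki page all boil down to pairing each backend that waits for a lock with the backend holding it. A minimal sketch of the same idea, not copied from the wiki, assuming PostgreSQL 9.6 or newer where pg_blocking_pids() is available (older releases need the pg_locks self-join shown on the page):

    -- one row per (blocked backend, backend blocking it) pair;
    -- an empty result means nobody is currently waiting on a lock
    SELECT w.pid   AS blocked_pid,
           w.query AS blocked_query,
           b.pid   AS blocking_pid,
           b.query AS blocking_query
      FROM pg_stat_activity w
      JOIN pg_stat_activity b
        ON b.pid = ANY (pg_blocking_pids(w.pid));

If this returns nothing while the backends stay at 100% CPU, the time is going into query execution rather than lock waits, which is exactly when the dwhd log becomes the more interesting place to look.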
Best regards,
Lucie

On 12/9/19 2:50 PM, Vrgotic, Marko wrote:
Hi oVirt,
Since the oVirt upgrade from 4.2.8 to 4.3.7, the SHE postgres has been running at 100% CPU with four processes, all related to the ovirt_engine_history DB:
[...]
Any ideas why this could be happening?
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/N2O2A4WQCTFTNA...
-- Lucie Leistnerova Senior Quality Engineer, QE Cloud, RHVM Red Hat EMEA IRC: lleistne @ #rhev-qe

Hi Lucie,

My apologies for not replying sooner. While waiting for a reply, I decided to execute the SHE restore exercise in order to get familiar with the restore procedure; therefore, I cannot provide additional details. Regardless, thank you for providing me a way to check for deadlocks.

If I may ask: in case there was a deadlock, is there a common or standard way to resolve it, or would that actually require digging deeper into the DB and process schemas?

Kindly awaiting your reply.

Marko Vrgotic

Sent from my iPhone

On 10 Dec 2019, at 08:10, Lucie Leistnerova <lleistne@redhat.com> wrote:
Hi Marko,
aren't there some deadlocks present? Try the queries here https://wiki.postgresql.org/wiki/Lock_Monitoring
Please also check the dwh log - /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log
Best regards, Lucie
[...]

Hi Marko,

On 12/16/19 9:50 AM, Vrgotic, Marko wrote:
Hi Lucie,
My apologies for not replying sooner. While waiting for a reply, I decided to execute the SHE restore exercise in order to get familiar with the restore procedure; therefore, I cannot provide additional details. Regardless, thank you for providing me a way to check for deadlocks.
No problem :) But DWH should not create deadlocks; that sounds suspicious. We test upgrades in regression, but I'm not sure whether we would catch such issues. I'll check.
If I may ask: in case there was a deadlock, is there a common or standard way to resolve it, or would that actually require digging deeper into the DB and process schemas?
Just restarting the ovirt-engine-dwhd service should clear the locks.
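If a single blocking session ever has to be cleared from the database side instead, PostgreSQL's standard admin functions can be pointed at the PID reported by pg_stat_activity or by the lock query above; a sketch, with the PID taken from the earlier top output purely as an example:

    -- ask backend 2617 to cancel its current statement
    SELECT pg_cancel_backend(2617);
    -- or, if cancelling is not enough, terminate the whole session
    SELECT pg_terminate_backend(2617);

Restarting the ovirt-engine-dwhd service, as suggested above, is still the cleaner option, since it also resets the collector's own state rather than just killing its database session.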
Kindly awaiting your reply.
Marko Vrgotic Sent from my iPhone
On 10 Dec 2019, at 08:10, Lucie Leistnerova <lleistne@redhat.com> wrote:
Hi Marko,
aren't there some deadlocks present?
Try the queries here https://wiki.postgresql.org/wiki/Lock_Monitoring
Please also check dwh log - /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log
Best regards, Lucie
On 12/9/19 2:50 PM, Vrgotic, Marko wrote:
[...]
-- Lucie Leistnerova Senior Quality Engineer, QE Cloud, RHVM Red Hat EMEA IRC: lleistne @ #rhev-qe