Hi,
We're running oVirt 4.3.8. Although this is a problem we've had for a
long time (I would say since 4.0.x), I have decided to ask for help in
case anything can be done.
Our environment is heavily used by users at our University (about 3000
users). Our oVirt infrastructure currently has 1928 virtual machines,
882 of which are running. We have a separate physical machine for the
manager node, and the problem is that this machine is very, very slow,
even though it has (from my point of view) enough physical resources to
run efficiently.
By slow I mean that even SSH access to it takes about 10 seconds, not
just the admin/user portals. Every operation (entering the VM portal,
starting a VM, opening a console...) takes long enough to make the
experience uncomfortable for our users.
The manager machine's specifications are:
18 GB of RAM
1 processor with 12 CPUs
300 GB SCSI disk, local storage. No Storage Domain is stored on this
machine.
Currently, the most resource-consuming processes are:
28802 ovirt 20 0 3302876 1,5g 25988 S 1,0 8,7 22:46.36 ovirt-engine -server -XX:+TieredCompilation -Xms1024M -Xmx1024M -Xss1M -Djava.aw+
28701 ovirt 20 0 5465432 807704 13140 S 5,9 4,4 3:15.53 ovirt-engine-dwhd -Dorg.ovirt.engine.dwh.settings=/tmp/tmp8wtnTA/settings.proper+
# free -m
              total        used        free      shared  buff/cache   available
Mem:          17886        6669        2734         255        8482       10625
Swap:          6143        1002        5141
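To put the `free -m` figures above in proportion, here is a minimal Python sketch that computes the share of RAM reported as available and the share of swap in use. It is only an illustration: the `parse_free` helper and the `SAMPLE` string are hypothetical names, and the sample is simply the output pasted above.

```python
def parse_free(output: str) -> dict:
    """Parse `free -m` output into {row label: {column name: MiB}}."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    columns = lines[0].split()  # total, used, free, shared, buff/cache, available
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has only three values; zip() stops at the shorter side.
        table[label.rstrip(":")] = dict(zip(columns, map(int, values)))
    return table


# Sample copied from the output above (values in MiB).
SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:          17886        6669        2734         255        8482       10625
Swap:          6143        1002        5141
"""

stats = parse_free(SAMPLE)
mem, swap = stats["Mem"], stats["Swap"]
available_pct = 100 * mem["available"] / mem["total"]
swap_used_pct = 100 * swap["used"] / swap["total"]
print(f"available: {available_pct:.1f}% of RAM")
print(f"swap used: {swap_used_pct:.1f}% of swap")
```

With the numbers above this gives roughly 59% of RAM available and about 16% of swap in use, so about 1 GB has been swapped out even though memory is not exhausted.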
There are also a lot of postgresql processes:
postgres 2186 0.0 0.0 261812 4136 ? Ss jun05 37:39 /opt/rh/rh-postgresql10/root/usr/bin/postmaster -D /var/opt/rh/rh-postgresql10/lib/pgsql/data
postgres 3176 0.0 0.0 216084 656 ? Ss jun05 0:00 postgres: logger process
postgres 3290 0.0 0.2 262204 37476 ? Ss jun05 77:31 postgres: checkpointer process
postgres 3291 0.3 0.2 262052 36960 ? Ss jun05 754:02 postgres: writer process
postgres 3292 0.0 0.0 261812 1988 ? Ss jun05 100:51 postgres: wal writer process
postgres 3293 0.0 0.2 262380 36748 ? Ss jun05 21:58 postgres: autovacuum launcher process
postgres 3294 0.1 0.0 219216 1412 ? Ds jun05 335:41 postgres: stats collector process
postgres 3295 0.0 0.0 262220 1460 ? Ss jun05 0:11 postgres: bgworker: logical replication launcher
postgres 3393 0.0 0.2 265792 40452 ? Ss jun05 15:36 postgres: engine engine ::1(51664) idle
postgres 6105 0.3 0.0 271532 15976 ? Ds 13:01 0:00 postgres: autovacuum worker process ovirt_engine_history
postgres 6216 0.2 0.0 263864 11440 ? Ss 13:02 0:00 postgres: autovacuum worker process engine
postgres 6245 0.0 0.0 262888 6212 ? Ss 13:02 0:00 postgres: engine engine 127.0.0.1(42400) idle
postgres 6246 0.0 0.0 262844 3256 ? Ss 13:02 0:00 postgres: autovacuum worker process template1
postgres 18815 0.0 0.0 262912 5852 ? Ss nov01 0:00 postgres: django django 127.0.0.1(59564) idle
postgres 23148 0.0 0.2 266052 43024 ? Ss oct28 9:01 postgres: engine engine 127.0.0.1(59714) idle
postgres 23149 0.0 0.0 262980 6820 ? Ss oct28 0:00 postgres: engine engine 127.0.0.1(59716) idle
postgres 28784 0.0 0.0 262816 3492 ? Ss 12:02 0:00 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(39470) idle
postgres 28785 0.0 0.0 262816 3496 ? Ss 12:02 0:00 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(39472) idle
postgres 28921 0.8 0.7 375152 146452 ? Ss 12:03 0:30 postgres: engine engine 127.0.0.1(39484) idle
postgres 29007 3.3 0.7 369348 142184 ? Ds 12:03 1:58 postgres: engine engine 127.0.0.1(39498) SELECT
postgres 29009 5.0 0.3 294736 72776 ? Ss 12:03 2:58 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(39500) idle
postgres 29048 0.0 0.0 263664 12176 ? Ss 12:03 0:00 postgres: engine engine 127.0.0.1(39530) idle
postgres 29064 0.6 0.8 384936 157852 ? Ss 12:03 0:24 postgres: engine engine 127.0.0.1(39532) idle
postgres 29065 1.2 0.7 358944 137028 ? Ss 12:03 0:43 postgres: engine engine 127.0.0.1(39534) idle
postgres 29066 1.9 0.7 355800 132800 ? Ss 12:03 1:08 postgres: engine engine 127.0.0.1(39536) idle
postgres 29067 1.0 0.7 370688 140504 ? Ss 12:03 0:37 postgres: engine engine 127.0.0.1(39538) idle
postgres 29068 1.7 0.7 360984 139584 ? Ss 12:03 1:02 postgres: engine engine 127.0.0.1(39540) idle
postgres 29069 1.3 0.7 358140 136268 ? Ss 12:03 0:48 postgres: engine engine 127.0.0.1(39542) idle
postgres 29070 3.7 0.7 372064 141724 ? Ss 12:03 2:13 postgres: engine engine 127.0.0.1(39544) idle
postgres 29071 0.8 0.6 337224 114132 ? Ss 12:03 0:31 postgres: engine engine 127.0.0.1(39546) idle
postgres 29072 0.5 0.6 336616 114276 ? Ss 12:03 0:18 postgres: engine engine 127.0.0.1(39548) idle
postgres 29073 1.7 0.7 357872 134540 ? Ss 12:03 1:02 postgres: engine engine 127.0.0.1(39550) idle
postgres 29139 3.5 0.7 361244 134176 ? Ss 12:03 2:07 postgres: engine engine 127.0.0.1(39572) idle
postgres 29140 3.0 0.7 367884 138372 ? Ss 12:03 1:46 postgres: engine engine 127.0.0.1(39570) idle
postgres 29141 1.0 0.9 391984 169204 ? Ss 12:03 0:38 postgres: engine engine 127.0.0.1(39574) idle
postgres 29142 1.3 0.7 358820 136140 ? Ss 12:03 0:48 postgres: engine engine 127.0.0.1(39576) idle
postgres 29143 1.6 0.8 385000 156592 ? Ss 12:03 0:58 postgres: engine engine 127.0.0.1(39578) idle
postgres 29144 3.6 0.9 403740 174552 ? Ss 12:03 2:09 postgres: engine engine 127.0.0.1(39580) idle
postgres 29145 3.4 0.7 355800 128876 ? Ss 12:03 2:01 postgres: engine engine 127.0.0.1(39582) idle
postgres 29146 1.8 0.8 380884 157828 ? Ss 12:03 1:06 postgres: engine engine 127.0.0.1(39586) idle
postgres 29147 3.5 0.6 353228 123200 ? Ss 12:03 2:05 postgres: engine engine 127.0.0.1(39584) idle
postgres 29148 0.9 0.8 383452 156268 ? Ss 12:03 0:34 postgres: engine engine 127.0.0.1(39588) idle
postgres 29149 1.3 0.7 365552 139612 ? Ss 12:03 0:47 postgres: engine engine 127.0.0.1(39590) idle
postgres 29150 1.2 0.7 363448 136564 ? Ss 12:03 0:43 postgres: engine engine 127.0.0.1(39592) SELECT
postgres 29151 1.4 0.7 365940 139184 ? Ss 12:03 0:51 postgres: engine engine 127.0.0.1(39594) idle
postgres 29152 1.2 0.7 358328 134268 ? Ss 12:03 0:44 postgres: engine engine 127.0.0.1(39596) idle
postgres 29402 1.3 0.7 362044 139812 ? Ss 12:04 0:47 postgres: engine engine 127.0.0.1(39698) idle
postgres 29403 2.1 0.8 377760 154556 ? Ss 12:04 1:15 postgres: engine engine 127.0.0.1(39700) idle
postgres 29404 2.2 0.7 369160 144600 ? Ss 12:04 1:18 postgres: engine engine 127.0.0.1(39702) idle
postgres 32230 0.1 0.5 336252 101912 ? Ss 12:19 0:04 postgres: engine engine 127.0.0.1(40564) idle
postgres 32232 3.4 0.6 357644 127732 ? Ss 12:19 1:29 postgres: engine engine 127.0.0.1(40566) idle
postgres 32234 0.0 0.1 270924 19836 ? Ss 12:19 0:00 postgres: engine engine 127.0.0.1(40568) idle
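To make the long listing above easier to digest, here is a minimal Python sketch. It assumes `ps aux`-style columns (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND) with each process on a single line. The `summarize` function and the five-line `SAMPLE` excerpt are illustrative names, not an oVirt or PostgreSQL tool. It counts client backends per database and activity, and collects the PIDs whose STAT begins with `D` (uninterruptible sleep, which usually means the process is waiting on I/O).

```python
from collections import Counter


def summarize(ps_lines):
    """Return (Counter of (database, activity) for client backends,
    list of PIDs of postgres processes in uninterruptible sleep)."""
    backends = Counter()
    d_state = []
    for line in ps_lines:
        # Ten whitespace-separated columns, then the full command text.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[10].startswith("postgres: "):
            continue
        pid, stat, title = fields[1], fields[7], fields[10]
        if stat.startswith("D"):
            d_state.append(int(pid))
        parts = title[len("postgres: "):].split()
        # Client backends look like "<user> <db> <host>(<port>) <activity>";
        # background workers (autovacuum, writer, ...) have no "(port)" field.
        if len(parts) >= 4 and "(" in parts[2]:
            backends[(parts[1], " ".join(parts[3:]))] += 1
    return backends, d_state


# A few lines copied from the listing above, one process per line.
SAMPLE = """\
postgres 3294 0.1 0.0 219216 1412 ? Ds jun05 335:41 postgres: stats collector process
postgres 6105 0.3 0.0 271532 15976 ? Ds 13:01 0:00 postgres: autovacuum worker process ovirt_engine_history
postgres 28921 0.8 0.7 375152 146452 ? Ss 12:03 0:30 postgres: engine engine 127.0.0.1(39484) idle
postgres 29007 3.3 0.7 369348 142184 ? Ds 12:03 1:58 postgres: engine engine 127.0.0.1(39498) SELECT
postgres 29009 5.0 0.3 294736 72776 ? Ss 12:03 2:58 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(39500) idle
""".splitlines()

backends, d_state = summarize(SAMPLE)
for (database, activity), count in sorted(backends.items()):
    print(f"{database:22} {activity:8} {count}")
print("uninterruptible sleep (D):", d_state)
```

On the live system the same function could be fed from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.splitlines()` instead of `SAMPLE`. A run of `D` states that persists across several samples would be one hint that the processes are waiting on the disk rather than on CPU or RAM.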
Actually, I'm not really sure why this machine is so slow. In terms of
RAM and CPU, it still has plenty of free resources.
Could someone help track this down and offer some advice on how to
improve the user experience in this case?
Thanks!