Really slow manager node
by Nicolás
Hi,
We're running oVirt 4.3.8. Although we've had this problem for a long
time (I would say since 4.0.x), I decided to ask for help in case
anything can be done.
Our environment is heavily used at our University (about 3000 users).
Our oVirt infrastructure currently has 1928 virtual machines, 882 of
which are running. We have a separate physical machine for the manager
node, and the problem is that this machine is very, very slow, even
though it has (from my point of view) enough physical resources to run
efficiently.
By slow I mean that even SSH access to it takes about 10 seconds, not
just the admin/user portals. Any operation (entering the VM portal,
starting a VM, opening a console...) takes long enough to make the
experience uncomfortable for our users.
The manager machine's specs are:
18GB of RAM
1 processor with 12 CPUs
300GB SCSI disk, local storage. No Storage Domain is stored on this
machine.
Currently, the most resource-consuming processes are:
28802 ovirt 20 0 3302876 1,5g 25988 S 1,0 8,7 22:46.36
ovirt-engine -server -XX:+TieredCompilation -Xms1024M -Xmx1024M -Xss1M
-Djava.aw+
28701 ovirt 20 0 5465432 807704 13140 S 5,9 4,4 3:15.53
ovirt-engine-dwhd
-Dorg.ovirt.engine.dwh.settings=/tmp/tmp8wtnTA/settings.proper+
# free -m
              total        used        free      shared  buff/cache   available
Mem:          17886        6669        2734         255        8482       10625
Swap:          6143        1002        5141
There are also a lot of postgresql processes:
postgres 2186 0.0 0.0 261812 4136 ? Ss jun05 37:39
/opt/rh/rh-postgresql10/root/usr/bin/postmaster -D
/var/opt/rh/rh-postgresql10/lib/pgsql/data
postgres 3176 0.0 0.0 216084 656 ? Ss jun05 0:00
postgres: logger process
postgres 3290 0.0 0.2 262204 37476 ? Ss jun05 77:31
postgres: checkpointer process
postgres 3291 0.3 0.2 262052 36960 ? Ss jun05 754:02
postgres: writer process
postgres 3292 0.0 0.0 261812 1988 ? Ss jun05 100:51
postgres: wal writer process
postgres 3293 0.0 0.2 262380 36748 ? Ss jun05 21:58
postgres: autovacuum launcher process
postgres 3294 0.1 0.0 219216 1412 ? Ds jun05 335:41
postgres: stats collector process
postgres 3295 0.0 0.0 262220 1460 ? Ss jun05 0:11
postgres: bgworker: logical replication launcher
postgres 3393 0.0 0.2 265792 40452 ? Ss jun05 15:36
postgres: engine engine ::1(51664) idle
postgres 6105 0.3 0.0 271532 15976 ? Ds 13:01 0:00
postgres: autovacuum worker process ovirt_engine_history
postgres 6216 0.2 0.0 263864 11440 ? Ss 13:02 0:00
postgres: autovacuum worker process engine
postgres 6245 0.0 0.0 262888 6212 ? Ss 13:02 0:00
postgres: engine engine 127.0.0.1(42400) idle
postgres 6246 0.0 0.0 262844 3256 ? Ss 13:02 0:00
postgres: autovacuum worker process template1
postgres 18815 0.0 0.0 262912 5852 ? Ss nov01 0:00
postgres: django django 127.0.0.1(59564) idle
postgres 23148 0.0 0.2 266052 43024 ? Ss oct28 9:01
postgres: engine engine 127.0.0.1(59714) idle
postgres 23149 0.0 0.0 262980 6820 ? Ss oct28 0:00
postgres: engine engine 127.0.0.1(59716) idle
postgres 28784 0.0 0.0 262816 3492 ? Ss 12:02 0:00
postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(39470) idle
postgres 28785 0.0 0.0 262816 3496 ? Ss 12:02 0:00
postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(39472) idle
postgres 28921 0.8 0.7 375152 146452 ? Ss 12:03 0:30
postgres: engine engine 127.0.0.1(39484) idle
postgres 29007 3.3 0.7 369348 142184 ? Ds 12:03 1:58
postgres: engine engine 127.0.0.1(39498) SELECT
postgres 29009 5.0 0.3 294736 72776 ? Ss 12:03 2:58
postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(39500) idle
postgres 29048 0.0 0.0 263664 12176 ? Ss 12:03 0:00
postgres: engine engine 127.0.0.1(39530) idle
postgres 29064 0.6 0.8 384936 157852 ? Ss 12:03 0:24
postgres: engine engine 127.0.0.1(39532) idle
postgres 29065 1.2 0.7 358944 137028 ? Ss 12:03 0:43
postgres: engine engine 127.0.0.1(39534) idle
postgres 29066 1.9 0.7 355800 132800 ? Ss 12:03 1:08
postgres: engine engine 127.0.0.1(39536) idle
postgres 29067 1.0 0.7 370688 140504 ? Ss 12:03 0:37
postgres: engine engine 127.0.0.1(39538) idle
postgres 29068 1.7 0.7 360984 139584 ? Ss 12:03 1:02
postgres: engine engine 127.0.0.1(39540) idle
postgres 29069 1.3 0.7 358140 136268 ? Ss 12:03 0:48
postgres: engine engine 127.0.0.1(39542) idle
postgres 29070 3.7 0.7 372064 141724 ? Ss 12:03 2:13
postgres: engine engine 127.0.0.1(39544) idle
postgres 29071 0.8 0.6 337224 114132 ? Ss 12:03 0:31
postgres: engine engine 127.0.0.1(39546) idle
postgres 29072 0.5 0.6 336616 114276 ? Ss 12:03 0:18
postgres: engine engine 127.0.0.1(39548) idle
postgres 29073 1.7 0.7 357872 134540 ? Ss 12:03 1:02
postgres: engine engine 127.0.0.1(39550) idle
postgres 29139 3.5 0.7 361244 134176 ? Ss 12:03 2:07
postgres: engine engine 127.0.0.1(39572) idle
postgres 29140 3.0 0.7 367884 138372 ? Ss 12:03 1:46
postgres: engine engine 127.0.0.1(39570) idle
postgres 29141 1.0 0.9 391984 169204 ? Ss 12:03 0:38
postgres: engine engine 127.0.0.1(39574) idle
postgres 29142 1.3 0.7 358820 136140 ? Ss 12:03 0:48
postgres: engine engine 127.0.0.1(39576) idle
postgres 29143 1.6 0.8 385000 156592 ? Ss 12:03 0:58
postgres: engine engine 127.0.0.1(39578) idle
postgres 29144 3.6 0.9 403740 174552 ? Ss 12:03 2:09
postgres: engine engine 127.0.0.1(39580) idle
postgres 29145 3.4 0.7 355800 128876 ? Ss 12:03 2:01
postgres: engine engine 127.0.0.1(39582) idle
postgres 29146 1.8 0.8 380884 157828 ? Ss 12:03 1:06
postgres: engine engine 127.0.0.1(39586) idle
postgres 29147 3.5 0.6 353228 123200 ? Ss 12:03 2:05
postgres: engine engine 127.0.0.1(39584) idle
postgres 29148 0.9 0.8 383452 156268 ? Ss 12:03 0:34
postgres: engine engine 127.0.0.1(39588) idle
postgres 29149 1.3 0.7 365552 139612 ? Ss 12:03 0:47
postgres: engine engine 127.0.0.1(39590) idle
postgres 29150 1.2 0.7 363448 136564 ? Ss 12:03 0:43
postgres: engine engine 127.0.0.1(39592) SELECT
postgres 29151 1.4 0.7 365940 139184 ? Ss 12:03 0:51
postgres: engine engine 127.0.0.1(39594) idle
postgres 29152 1.2 0.7 358328 134268 ? Ss 12:03 0:44
postgres: engine engine 127.0.0.1(39596) idle
postgres 29402 1.3 0.7 362044 139812 ? Ss 12:04 0:47
postgres: engine engine 127.0.0.1(39698) idle
postgres 29403 2.1 0.8 377760 154556 ? Ss 12:04 1:15
postgres: engine engine 127.0.0.1(39700) idle
postgres 29404 2.2 0.7 369160 144600 ? Ss 12:04 1:18
postgres: engine engine 127.0.0.1(39702) idle
postgres 32230 0.1 0.5 336252 101912 ? Ss 12:19 0:04
postgres: engine engine 127.0.0.1(40564) idle
postgres 32232 3.4 0.6 357644 127732 ? Ss 12:19 1:29
postgres: engine engine 127.0.0.1(40566) idle
postgres 32234 0.0 0.1 270924 19836 ? Ss 12:19 0:00
postgres: engine engine 127.0.0.1(40568) idle
I'm really not sure why this machine is so slow. In terms of RAM and
CPU, it still has plenty of free resources.
Could someone help track this down and offer some advice on how to
optimize the user experience in this case?
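One possible starting point, offered only as a sketch: a few processes in the listing above are in state `D` (uninterruptible sleep, usually waiting on I/O), which free RAM and CPU figures do not reflect. The Python snippet below tallies the STAT column of `ps aux`-style output. It assumes each process sits on a single unwrapped line, and `count_states` is a hypothetical helper, not part of any oVirt tooling.

```python
from collections import Counter

def count_states(ps_lines):
    """Tally the first letter of the STAT column (8th field of `ps aux` output)."""
    counts = Counter()
    for line in ps_lines:
        # Fields: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and malformed lines
        counts[fields[7][0]] += 1
    return counts

# Three entries from the listing above, re-joined onto single lines.
sample = [
    "postgres 3294 0.1 0.0 219216 1412 ? Ds jun05 335:41 postgres: stats collector process",
    "postgres 29007 3.3 0.7 369348 142184 ? Ds 12:03 1:58 postgres: engine engine 127.0.0.1(39498) SELECT",
    "postgres 29064 0.6 0.8 384936 157852 ? Ss 12:03 0:24 postgres: engine engine 127.0.0.1(39532) idle",
]
print(count_states(sample))
```

If the `D` count stays high over repeated samples, disk latency on the manager machine would be worth checking before CPU or memory.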
Thanks!
4 years, 3 months
easy way to migrate to new engine
by jb
Hello everybody!
At the moment I have a standalone engine and two nodes. The storage
type is NFS.
Now I want to set up a 3-node self-hosted cluster with GlusterFS and
import my VMs from the old engine.
I know there is a guide for migrating from a standalone to a
self-hosted engine, but there are so many changes (new subnet/DNS
names, different type of storage, new CPU type) that I would like to
skip this migration guide.
Is there a different way to migrate my VMs? Mounting my old storage and
moving the disks to Gluster would be nice, but I guess I cannot just
mount a storage with existing VM images.
Best regards
Jonathan
4 years, 3 months
Re: Problem with Cluster-wise BIOS Settings in oVirt 4.4
by Vladislav S
PLEASE HELP
I changed the type in the cluster settings and now my hosted-engine machine
won't start! There are no backups. How can I manually change the parameters
of the machine under the hood so that it starts and I can get oVirt working
again?
Error in logs: XML error: The device at PCI address 0000:00:02.0 cannot be
plugged into the PCI controller with index='0'. It requires a controller
that accepts a pcie-root-port
4 years, 3 months
Root Password Reset
by ilirdaka@live.com
Hello!
I'm new to oVirt. One of my customers has a 2-node oVirt cluster but does not have the password to access it!
Is there any way to reset that password?
Best regards,
Ilir
4 years, 3 months
Async release for oVirt 4.4.3
by Sandro Bonazzola
On November 16th the oVirt project released an async update to the
following packages:
- ovirt-ansible-collection-1.2.2-1
- ovirt-engine-4.4.3.12
- ovirt-engine-sdk 4.4.7
Fixing the following bugs:
- [BZ 1884599 <https://bugzilla.redhat.com/1884599>] - Hosted-engine
  deploy is failing with error "Unable to access credentials
  /etc/pki/vdsm/libvirt-vnc/ca-cert.pem"
- [BZ 1894111 <https://bugzilla.redhat.com/1894111>] - After host upgrade
  from 4.4.2 to 4.4.3 there are still packages left to upgrade
- [BZ 1894758 <https://bugzilla.redhat.com/1894758>] - [DR] Remote data
  sync to the secondary site never completes
- Support enum URL parameters in Python SDK
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
4 years, 3 months
Migrate old storage domains to new oVirt
by Joris Dobbelsteen
Hi All,
I still have a single-node (all-in-one) oVirt 3.5 setup that has worked great and uses a local storage domain. I’m migrating to an oVirt 4.4 hosted setup as I need to replace disks.
However, since this is a single-node setup, the migration is a bit trickier than usual.
My plan is to move the physical disks over, mount the old local storage domains and use them in oVirt 4.4, for example to move the images.
Is this still possible with old domains? Or is there something special that should be taken care of?
Since the original domains are local, is there anything to take care of there? For example, can a hosted-engine setup run in a “local” datacentre?
Best regards,
Joris Dobbelsteen
4 years, 3 months
oVirt Node Crash
by Anton Louw
Hi Everybody,
I have built a new host which has been running fine for the last couple of days. I noticed today that the host crashed, but it is not giving me a reason why.
It happened at 13:45 today, but I have included some time before that in the logs as well.
Is there something I am missing here?
Thanks
Anton Louw
Cloud Engineer: Storage and Virtualization
______________________________________
D: 087 805 1572 | M: N/A
A: Rutherford Estate, 1 Scott Street, Waverley, Johannesburg
anton.louw(a)voxtelecom.co.za
www.vox.co.za
4 years, 3 months
Backporting of Fixes
by Gillingham, Eric J (US 393D)
I'm still running oVirt 4.3 because some of our hardware will require extra effort to move to 4.4, which we're not quite ready to do yet. I'm currently hitting what I believe to be https://bugzilla.redhat.com/show_bug.cgi?id=1820998, which is fixed in 4.4. Is there a process to request a backport, or should I just open a new bug against 4.3?
Thank You
- Eric
4 years, 3 months
oVirt 4.4 vnic profile to substitute vdsm mac spoof hook
by Gianluca Cecchi
Hello,
I have an OpenStack Queens environment that is composed of VMs inside oVirt.
Initially this oVirt environment was on 4.3, and on the director I had to set
the vdsm mac spoof hook to be able to give DHCP addresses to the overcloud
nodes.
I tried something at that time with vNIC profiles but I was not able to
get it to work as expected, so I used the hook.
Now I have migrated this same environment to 4.4.2 and I see that vdsm mac
spoof is no longer an option:
https://bugzilla.redhat.com/show_bug.cgi?id=1703840
I tried some combinations of vNIC profiles on the director vNIC, but it does
not seem to work as a DHCP server as it did before.
I see DHCPDISCOVER from the overcloud node in the director's messages (while
before I saw DHCPREQUEST) and I see the DHCPOFFER from the director, but no
DHCPACK for the overcloud node follows...
On the director I have the br-ctlplane vswitch using its interface eth1:
ovs-vsctl show
. . .
Bridge br-ctlplane
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port br-ctlplane
Interface br-ctlplane
type: internal
Port "eth1"
Interface "eth1"
Port phy-br-ctlplane
Interface phy-br-ctlplane
type: patch
options: {peer=int-br-ctlplane}
ovs_version: "2.11.0"
[root@director ~]# ovs-ofctl show br-ctlplane
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000566f3d480014
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src
mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(eth1): addr:56:6f:3d:48:00:13
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(phy-br-ctlplane): addr:76:76:8b:08:d4:2f
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-ctlplane): addr:56:6f:3d:48:00:14
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
[root@director ~]#
[root@director ~]# ip addr show dev br-ctlplane
6: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UNKNOWN group default qlen 1000
link/ether 56:6f:3d:48:00:14 brd ff:ff:ff:ff:ff:ff
inet 172.23.0.220/24 brd 172.23.0.255 scope global br-ctlplane
valid_lft forever preferred_lft forever
inet 172.23.0.222/32 scope global br-ctlplane
valid_lft forever preferred_lft forever
inet 172.23.0.221/32 scope global br-ctlplane
valid_lft forever preferred_lft forever
inet6 fe80::546f:3dff:fe48:14/64 scope link
valid_lft forever preferred_lft forever
[root@director ~]#
The eth1 interface is configured in oVirt on vlan23.
The overcloud nodes have one vnic on vlan 23.
I have two dnsmasq processes on director, for ironic and for overcloud
nodes:
nobody 1527 1 0 Nov14 ? 00:00:00 /sbin/dnsmasq
--conf-file=/etc/ironic-inspector/dnsmasq.conf
nobody 3288 1 0 Nov14 ? 00:00:00 dnsmasq --no-hosts
--no-resolv
--pid-file=/var/lib/neutron/dhcp/22a5739b-acb6-4400-8247-080a66895f1a/pid
--dhcp-hostsfile=/var/lib/neutron/dhcp/22a5739b-acb6-4400-8247-080a66895f1a/host
--addn-hosts=/var/lib/neutron/dhcp/22a5739b-acb6-4400-8247-080a66895f1a/addn_hosts
--dhcp-optsfile=/var/lib/neutron/dhcp/22a5739b-acb6-4400-8247-080a66895f1a/opts
--dhcp-leasefile=/var/lib/neutron/dhcp/22a5739b-acb6-4400-8247-080a66895f1a/leases
--dhcp-match=set:ipxe,175 --local-service --bind-dynamic
--dhcp-range=set:subnet-d48e002b-f269-4ed5-a96e-9aeba0b119b9,172.23.0.0,static,255.255.255.0,86400s
--dhcp-option-force=option:mtu,1500 --dhcp-lease-max=256
--conf-file=/etc/dnsmasq-ironic.conf --domain=localdomain
In the director's messages when booting the overcloud node:
Nov 15 00:42:45 director dnsmasq-dhcp[3288]: DHCPDISCOVER(tap9fdd5920-62)
172.23.0.232 56:6f:3d:48:00:3c
Nov 15 00:42:45 director dnsmasq-dhcp[1527]: 3792136019
DHCPDISCOVER(br-ctlplane) 56:6f:3d:48:00:3c ignored
Nov 15 00:42:45 director dnsmasq-dhcp[3288]: DHCPOFFER(tap9fdd5920-62)
172.23.0.232 56:6f:3d:48:00:3c
Nov 15 00:42:45 director dnsmasq-dhcp[3288]: DHCPOFFER(tap9fdd5920-62)
172.23.0.232 56:6f:3d:48:00:3c
In oVirt 4.3, with the same setup and the VDSM MAC spoof hook configured, I
saw this in the director's messages when booting the overcloud nodes:
Oct 18 04:50:05 director dnsmasq-dhcp[3230]: DHCPREQUEST(tap9fdd5920-62)
172.23.0.232 56:6f:3d:48:00:3c
Oct 18 04:50:05 director dnsmasq-dhcp[3230]: DHCPACK(tap9fdd5920-62)
172.23.0.232 56:6f:3d:48:00:3c host-172-23-0-232
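To compare the two log excerpts above mechanically, a small Python sketch like the following can report which stages of the usual DHCP exchange (DISCOVER, OFFER, REQUEST, ACK) never show up for a given MAC address. It assumes each syslog entry sits on one line. `missing_stages` and the regular expression are illustrative only, written against the dnsmasq lines quoted here rather than any official log format.

```python
import re
from collections import defaultdict

# Matches dnsmasq-dhcp entries such as
# "... dnsmasq-dhcp[3288]: DHCPOFFER(tap9fdd5920-62) 172.23.0.232 56:6f:3d:48:00:3c"
# An optional numeric transaction ID may precede the message type.
DHCP_RE = re.compile(
    r"dnsmasq-dhcp\[\d+\]: (?:\d+ )?(DHCP[A-Z]+)\(([^)]+)\)"
    r".*?((?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
)

HANDSHAKE = ["DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK"]

def missing_stages(log_lines):
    """Return {mac: [handshake stages never logged for that MAC]}."""
    seen = defaultdict(set)
    for line in log_lines:
        match = DHCP_RE.search(line)
        if match:
            seen[match.group(3)].add(match.group(1))
    return {
        mac: [stage for stage in HANDSHAKE if stage not in stages]
        for mac, stages in seen.items()
    }

# The oVirt 4.4 entries quoted above, re-joined onto single lines.
sample = [
    "Nov 15 00:42:45 director dnsmasq-dhcp[3288]: DHCPDISCOVER(tap9fdd5920-62) 172.23.0.232 56:6f:3d:48:00:3c",
    "Nov 15 00:42:45 director dnsmasq-dhcp[1527]: 3792136019 DHCPDISCOVER(br-ctlplane) 56:6f:3d:48:00:3c ignored",
    "Nov 15 00:42:45 director dnsmasq-dhcp[3288]: DHCPOFFER(tap9fdd5920-62) 172.23.0.232 56:6f:3d:48:00:3c",
]
print(missing_stages(sample))
```

For these entries the sketch reports DHCPREQUEST and DHCPACK as missing, which matches the observation that the exchange stops after the offer.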
Questions:
- Which network filter should I set to make this work?
- If I edit the vNIC of a running VM and change its profile, does the
change take effect dynamically at runtime?
- If I go to Networks --> vNIC Profiles --> Edit vNIC Profile and change
the network filter value of a profile, does that take effect dynamically
for running VMs whose vNICs use that profile?
Thanks for any insight.
Gianluca
4 years, 3 months
Failed to add new host to oVirt 4.4.3.11-1.el8
by Winfried de Heiden
Hi all,
Adding a freshly installed oVirt node to oVirt 4.4.3.11-1.el8 fails;
engine.log shows:
2020-11-13 10:05:40,777+01 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-5)
[80e3f255-f8b9-4d5f-9be2-ebb59fec28db] EVENT_ID:
VDS_ANSIBLE_INSTALL_STARTED(560), Ansible host-deploy playbook execution
has started on host Bigvirt.
2020-11-13 10:05:40,784+01 ERROR
[org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor]
(EE-ManagedThreadFactory-engine-Thread-5)
[80e3f255-f8b9-4d5f-9be2-ebb59fec28db] Exception: Failed to read the
runner-service response. Unexpected character ('<' (code 60)): expected
a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: org.apache.http.conn.EofSensorInputStream@6cc9953a; line:
1, column: 2]
2020-11-13 10:05:40,785+01 ERROR
[org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand]
(EE-ManagedThreadFactory-engine-Thread-5)
[80e3f255-f8b9-4d5f-9be2-ebb59fec28db] Host installation failed for host
'df87842f-0852-4a3a-898b-2b724b068199', 'Bigvirt': Failed to execute
Ansible host-deploy role: Failed to read the runner-service response.
Unexpected character ('<' (code 60)): expected a valid value (number,
String, array, object, 'true', 'false' or 'null')
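The `Unexpected character ('<' (code 60))` message is the kind of error a JSON parser gives when the body it receives starts with `<`, which is typical of an HTML or XML page rather than JSON. The log does not show what the runner service actually returned, so the response body below is purely hypothetical. The Python sketch only reproduces that class of failure with the standard `json` module, and `first_parse_error` is an illustrative helper.

```python
import json

def first_parse_error(body):
    """Return (offending character, 1-based column) if body is not valid JSON, else None."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return body[exc.pos:exc.pos + 1], exc.colno
    return None

# Hypothetical response body: an HTML error page where JSON was expected.
html_body = "<html><body>error page</body></html>"
print(first_parse_error(html_body))

# A well-formed JSON body parses without error.
print(first_parse_error('{"status": "ok"}'))
```

If this is what is happening, looking at the raw response from the runner service endpoint (for example a proxy or web server error page) may say more than the engine log does.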
Anyone got a clue?
Winfried
4 years, 3 months