storage high latency, sanlock errors, cluster instability
by Jonathan Baecker
Hello everybody,
We run a 3-node self-hosted cluster with GlusterFS. I had a lot of
problems upgrading oVirt from 4.4.10 to 4.5.0.2, and now we have cluster
instability.
First I will write down the problems I had while upgrading, so you get the
bigger picture:
* The engine update went fine.
* But the nodes I could not update because of a wrong imgbase version,
  so I did a manual update to 4.5.0.1 and later to 4.5.0.2. The first
  time after updating they still booted into 4.4.10, so I did a reinstall.
* Then after the second reboot I ended up in emergency mode. After a
  long search I figured out that lvm.conf now uses *use_devicesfile*,
  but with the wrong filters. So I commented that out and added the old
  filters back. I did this on all 3 nodes (a sketch of the change is
  below, after this list).
* Then in Cockpit I saw errors on all nodes like:
  |ovs|00077|stream_ssl|ERR|Private key must be configured to use SSL|
  To fix that I ran *vdsm-tool ovn-config [engine IP] ovirtmgmt*, and
  later in the web interface I chose "Enroll Certificate" for every node.
* Between upgrading the nodes I was a bit too fast migrating all
  running VMs, including the HostedEngine, from one host to another,
  and the hosted engine crashed once. But it came back after a few
  minutes and since then the engine has been running normally.
* Then I finished the upgrade by updating the cluster compatibility
  version to 4.7.
* I noticed some unsynced volume warnings, but because I had seen these
  after upgrades in the past too, I thought they would disappear after
  some time. The next day they were still there, so I put the nodes
  into maintenance mode again and restarted the glusterd service. After
  some time the sync warnings were gone (the heal state can also be
  checked directly; see the sketch below).
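A minimal sketch of the lvm.conf change described in the item above; the device path in the filter is only an example and has to match the actual local disks and Gluster bricks:

# show the relevant settings
grep -nE 'use_devicesfile|^[[:space:]]*filter' /etc/lvm/lvm.conf
# in the devices { } section, roughly:
#   use_devicesfile = 0
#   filter = ["a|^/dev/sdb|", "a|^/dev/mapper/|", "r|.*|"]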
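And a hedged example of checking the unsynced entries directly instead of only waiting; the volume name vmstore is an assumption and would have to be repeated for the engine/data volumes:

# show pending heal entries per brick
gluster volume heal vmstore info summary
# kick off a heal if entries are pending
gluster volume heal vmstore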
So now the actual problem:
Since then the cluster has been unstable. I get different errors and
warnings, like:
* VM [name] is not responding
* HA VMs get migrated out of nowhere
* VM migrations can fail
* VM backups with snapshotting and export take very long
* VMs sometimes get very slow
* Storage domain vmstore experienced a high latency of 9.14251
* ovs|00001|db_ctl_base|ERR|no key "dpdk-init" in Open_vSwitch record
  "." column other_config
* 489279 [1064359]: s8 renewal error -202 delta_length 10 last_success
489249
* 444853 [2243175]: s27 delta_renew read timeout 10 sec offset 0
/rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/ids
* 471099 [2243175]: s27 delta_renew read timeout 10 sec offset 0
/rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/ids
* many of: 424035 [2243175]: s27 delta_renew long write time XX sec
I will attach the sanlock.log messages and vdsm.log here; a few commands
to narrow down the storage-side latency are sketched below.
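To narrow things down on the storage side, something like the following could be run on the host that logs the renewal errors; the volume name vmstore is taken from the mount path above, the rest is only a sketch:

# sanlock's own view of its lockspaces and renewal state
sanlock client status
# per-brick latency statistics for the volume
gluster volume profile vmstore start
gluster volume profile vmstore info
# a direct read of the ids file shows whether plain reads really stall for >10 s
time dd if=/rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/ids of=/dev/null bs=1M iflag=direct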
Is there a way I can fix these issues?
Regards!
Jonathan
Install OKD 4.10 with Custom oVirt Certificate
by Fredrik Arneving
Hi,
I've set up and run Installer-Provisioned Installations of OKD on several occasions with OKD versions 4.4 - 4.8 on my oVirt (4.3?)/4.4 platform. However, after installing a custom certificate for my self-hosted oVirt engine I have problems getting the installation of OKD 4.10 (and 4.8) to complete. Is this a known problem with a known solution I can read up on somewhere?
The install takes three times as long as the working ones did before, and when I look at the pods and cluster operators, the "authentication" ones are in a bad state. I can use the KUBECONFIG environment variable to list pods and interact with the environment, but "oc login" fails with "unknown issuer".
I had the choice of a "full install" of my custom cert or just the GUI/Web part, and I chose the latter. When installing the custom cert I followed the official RHV documentation that an oVirt user had pointed to in a forum. Whatever certs I didn't change seemed to have worked before, so I would be surprised if the solution is to go for the "full install". In all other cases (like my Foreman server and my FreeIPA server) oVirt works just fine with its custom cert.
Since I've done this before, I'm pretty sure I've correctly followed the OKD installation instructions. What's new is the custom oVirt hosted-engine cert. Is there detailed documentation on exactly which certificates from my oVirt installation should be added to the "additionalTrustBundle" in OKD to make it work? In my previous working installations I added the custom root CA since I needed it for other purposes, but maybe I need to add some other internal oVirt CA?
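For reference, a minimal sketch of what adding the oVirt internal CA to the trust bundle could look like; engine.example.org is a placeholder for the engine FQDN, and whether this particular CA is the missing piece is exactly the open question:

# fetch the engine's internal CA certificate in PEM form
curl -k 'https://engine.example.org/ovirt-engine/services/pki-resource?resource=ca-certificate&format=X509-PEM-CA' -o ovirt-internal-ca.pem
# then paste its contents (plus the custom root CA already in use) into
# install-config.yaml under:
#   additionalTrustBundle: |
#     -----BEGIN CERTIFICATE-----
#     ...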
I'm currently running oVirt version "4.4.10.7-1.el8" on CentOS Stream release 8 and OKD version "4.10.0-0.okd-2022-03-07-131213". No hardware changes between working installations and failed ones.
Any hints on how to solve this would be appreciated
why so many such logs ?
by tommy
In the new 4.5 version we see a lot of OVN synchronization entries in the engine log, very frequently, which we did not see in previous versions.
Is this a new feature?
about the bridge of the host
by tommy
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovirtmgmt state UP group default qlen 1000
link/ether 08:00:27:94:4d:e8 brd ff:ff:ff:ff:ff:ff
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 9e:5d:8f:94:00:86 brd ff:ff:ff:ff:ff:ff
4: br-int: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether ea:20:e5:c3:d6:31 brd ff:ff:ff:ff:ff:ff
5: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 08:00:27:94:4d:e8 brd ff:ff:ff:ff:ff:ff
inet 10.1.1.7/24 brd 10.1.1.255 scope global noprefixroute ovirtmgmt
valid_lft forever preferred_lft forever
21: ip_vti0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 1e:cb:bf:02:f7:33 brd ff:ff:ff:ff:ff:ff
What are items 3/4/5/21/22 used for? (I know item 5.)
Are they all bridges?
The output of brctl show indicates that only ovirtmgmt and ;vdsmdummy; are bridges.
[root@host1 ~]# brctl show
bridge name bridge id STP enabled interfaces
;vdsmdummy; 8000.000000000000 no
ovirtmgmt 8000.080027944de8 no enp0s3
[root@host1 ~]#
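For what it's worth, brctl only knows about kernel (Linux) bridges; ovs-system and br-int are Open vSwitch devices (br-int is OVN's integration bridge), ;vdsmdummy; is a dummy bridge created by VDSM, and ip_vti0 is just the base device that appears when the ip_vti tunnel module is loaded. The OVS side can be listed like this:

# list Open vSwitch bridges and their ports
ovs-vsctl show
# detailed type information for a single device
ip -d link show br-int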
infiniband for VM traffic
by Roberto Bertucci
Hi all,
I am trying to use a Mellanox 100G InfiniBand interface (EoIB) for VM traffic.
When trying to configure the hosts to use it, I get an error, and in vdsm.log I see:
The bridge ovirtib cannot use IP over InfiniBand interface ib0 as port. Please use RoCE interface instead.
ib0 is configured with an IP address and is working correctly; it is used to mount NFS directories on the cluster nodes.
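As the error above says, VDSM will not enslave an IPoIB device to a VM bridge, so for VM traffic the port would have to present itself as Ethernet (RoCE). A hedged sketch, assuming a ConnectX VPI adapter and the Mellanox firmware tools (mft) installed; the /dev/mst device name is hypothetical, and switching the link type also removes ib0, so the NFS mounts would need to move as well:

mst start
mlxconfig -d /dev/mst/mt4119_pciconf0 query | grep LINK_TYPE
# 1 = InfiniBand, 2 = Ethernet; the change takes effect after a reboot
mlxconfig -d /dev/mst/mt4119_pciconf0 set LINK_TYPE_P1=2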
Did anybody face this issue?
Thank you all for help.
VM HostedEngine is down with error
by souvaliotimaria@mail.com
Hello everyone,
I have a replica 2 + arbiter installation, and this morning the Hosted Engine gave the following error in the UI and resumed on a different node (node3) than the one it was originally running on (node1). (The original node has more memory than the one it ended up on, but the latter had a better memory usage percentage at the time.) Also, the only way I discovered that the migration had happened and that there was an Error in Events was that I logged in to the oVirt web interface for a routine inspection. Besides that, everything was working properly and still is.
The error that popped is the following:
VM HostedEngine is down with error. Exit message: internal error: qemu unexpectedly closed the monitor:
2020-09-01T06:49:20.749126Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2020-09-01T06:49:20.927274Z qemu-kvm: -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,id=ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,bootindex=1,write-cache=on: Failed to get "write" lock
Is another process using the image?.
From what I could gather, this concerns the following snippet from HostedEngine.xml; it's the virtio disk of the Hosted Engine:
<disk type='file' device='disk' snapshot='no'>
<driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads' iothread='1'/>
<source file='/var/run/vdsm/storage/80f6e393-9718-4738-a14a-64cf43c3d8c2/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7'>
<seclabel model='dac' relabel='no'/>
</source>
<target dev='vda' bus='virtio'/>
<serial>d5de54b6-9f8e-4fba-819b-ebf6780757d2</serial>
<alias name='ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
I've tried looking into the logs and at sar output, but I couldn't find anything to relate to the above errors or to determine why this happened. Is this a Gluster or a QEMU problem?
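The "Failed to get "write" lock" message generally means some other process still had the disk image open when node3 tried to start the engine (often a qemu left behind on the original node). A hedged sketch of what could be checked on each host, using the image UUID from the XML snippet above:

# any qemu domains libvirt still knows about on this host
virsh -r list --all
# any process that still holds the HE disk image open
lsof 2>/dev/null | grep d5de54b6-9f8e-4fba-819b-ebf6780757d2
# sanlock's view of its lockspaces and resources
sanlock client status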
The Hosted Engine had been manually migrated to node1 five days earlier.
Is there a standard practice I could follow to determine what happened and secure my system?
Thank you very much for your time,
Maria Souvalioti
Local Disk Usage
by mert tuncsav
Hello All,
We have I/O performance issues for some systems on oVirt. The storage type is NFS in a shared data center. We would like to use a local disk as a secondary data domain to deploy VMs in the shared data center. Is there any chance to configure that? We couldn't find any solution. Do you have suggestions?
Regards
adding host to cluster with ovs switch failing
by ravi k
Hello all,
I'm facing a strange error. I was able to add a host to a Linux-bridge-based cluster. However, if I try adding the host to a cluster with the OVS switch type, it fails. I can see that nmstate was able to create the ovirtmgmt bridge as well. At that point both the ovirtmgmt and the bond0.vlan interfaces have the IP assigned. It then fails and rolls back the config. A workaround I found to work was to add the host to a Linux bridge cluster first and then change the cluster to an OVS cluster.
Here's some background about the setup. The host is an AMD EPYC with OEL 8.6 installed. The OLVM manager is a standalone VM at 4.4.8. We have a bond0, and the IP is assigned to the bond0.1222 interface. The interfaces are in an LACP bond on the switch as well. I enabled debug logging in NetworkManager in the hope of finding some clues, but couldn't.
I know 4.4 is EOL. As this is a user mailing list, I thought I'd reach out in the hope that someone has seen a similar issue.
supervdsm log
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,336::plugin::172::root::(apply_changes) Nispor: desired network state {'name': 'bond0', 'type': 'bond', 'state': 'up', 'mac-address': 'e4:3d:1a:82:9f:c0', 'link-aggregation': {'port': ['ens10f0np0', 'ens5f0np0'], 'options': {'ad_actor_sys_prio': 65535, 'ad_actor_system': '00:00:00:00:00:00', 'ad_select': 'stable', 'ad_user_port_key': 0, 'all_slaves_active': 'dropped', 'arp_all_targets': 'any', 'arp_interval': 0, 'arp_validate': 'none', 'downdelay': 0, 'lacp_rate': 'slow', 'miimon': 100, 'min_links': 0, 'updelay': 0, 'use_carrier': True, 'xmit_hash_policy': 'layer2', 'arp_ip_target': ''}, 'mode': '802.3ad'}, 'ipv4': {'enabled': False}, 'ipv6': {'enabled': False}, 'mtu': 1500, 'lldp': {'enabled': False}, 'accept-all-mac-addresses': False, '_brport_options': {'name': 'bond0'}, '_controller': 'vdsmbr_6SMdIi3B', '_controller_type': 'ovs-bridge'}
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,336::plugin::172::root::(apply_changes) Nispor: desired network state {'name': 'ovirtmgmt', 'type': 'ovs-interface', 'state': 'up', 'mtu': 1500, 'ipv4': {'enabled': True, 'address': [{'ip': '10.129.221.19', 'prefix-length': 24}], 'dhcp': False, '_dns': {'server': ['10.150.5.100', '10.229.0.60'], 'search': [], '_priority': 0}, '_routes': [{'table-id': 329647082, 'destination': '0.0.0.0/0', 'next-hop-address': '10.129.221.1', 'next-hop-interface': 'ovirtmgmt'}, {'table-id': 329647082, 'destination': '10.129.221.0/24', 'next-hop-address': '10.129.221.19', 'next-hop-interface': 'ovirtmgmt'}, {'table-id': 254, 'destination': '0.0.0.0/0', 'next-hop-address': '10.129.221.1', 'next-hop-interface': 'ovirtmgmt'}], '_route_rules': [{'ip-from': '', 'ip-to': '10.129.221.0/24', 'priority': 3200, 'route-table': 329647082}, {'ip-from': '10.129.221.0/24', 'ip-to': '', 'priority': 3200, 'route-table': 329647082}]}, 'ipv6': {'enabled': False, '_routes':
[], '_route_rules': []}, 'mac-address': 'E4:3D:1A:82:9F:C0', '_brport_options': {'name': 'ovirtmgmt', 'vlan': {'mode': 'access', 'tag': 1222}}, '_controller': 'vdsmbr_6SMdIi3B', '_controller_type': 'ovs-bridge'}
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,336::plugin::172::root::(apply_changes) Nispor: desired network state {'name': 'vdsmbr_6SMdIi3B', 'state': 'up', 'type': 'ovs-bridge', 'bridge': {'port': [{'name': 'bond0'}, {'name': 'ovirtmgmt', 'vlan': {'mode': 'access', 'tag': 1222}}]}, 'ipv6': {'enabled': False}}
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,340::context::148::root::(register_async) Async action: Update profile uuid:d8c57758-f784-44f4-a33a-c050ec50b9b9 iface:bond0 type:bond started
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,340::context::148::root::(register_async) Async action: Add profile: 623b6249-7cfa-4813-9ef6-4870ec6f3a79, iface:bond0, type:ovs-port started
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,340::context::148::root::(register_async) Async action: Add profile: ed8f5cae-5400-42fd-a72e-645e1fa61a39, iface:ovirtmgmt, type:ovs-interface started
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,341::context::148::root::(register_async) Async action: Add profile: 3572b137-2091-4825-b418-4d6966430cc1, iface:ovirtmgmt, type:ovs-port started
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,341::context::148::root::(register_async) Async action: Add profile: bd45447d-f241-4d14-bf5b-28c3966c011d, iface:vdsmbr_6SMdIi3B, type:ovs-bridge started
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,343::context::157::root::(finish_async) Async action: Update profile uuid:d8c57758-f784-44f4-a33a-c050ec50b9b9 iface:bond0 type:bond finished
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,349::context::157::root::(finish_async) Async action: Add profile: 623b6249-7cfa-4813-9ef6-4870ec6f3a79, iface:bond0, type:ovs-port finished
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,350::context::157::root::(finish_async) Async action: Add profile: ed8f5cae-5400-42fd-a72e-645e1fa61a39, iface:ovirtmgmt, type:ovs-interface finished
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,350::context::157::root::(finish_async) Async action: Add profile: 3572b137-2091-4825-b418-4d6966430cc1, iface:ovirtmgmt, type:ovs-port finished
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,350::context::157::root::(finish_async) Async action: Add profile: bd45447d-f241-4d14-bf5b-28c3966c011d, iface:vdsmbr_6SMdIi3B, type:ovs-bridge finished
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,350::context::148::root::(register_async) Async action: Activate profile uuid:bd45447d-f241-4d14-bf5b-28c3966c011d iface:vdsmbr_6SMdIi3B type: ovs-bridge started
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,352::active_connection::201::root::(_activate_profile_callback) Connection activation initiated: iface=vdsmbr_6SMdIi3B type=ovs-bridge con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,355::active_connection::339::root::(_activation_progress_check) Connection activation succeeded: iface=vdsmbr_6SMdIi3B, type=ovs-bridge, con_state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>, dev_state=<enum NM_DEVICE_STATE_IP_CONFIG of type NM.DeviceState>, state_flags=<flags NM_ACTIVATION_STATE_FLAG_IS_MASTER | NM_ACTIVATION_STATE_FLAG_LAYER2_READY of type NM.ActivationStateFlags>
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,355::context::157::root::(finish_async) Async action: Activate profile uuid:bd45447d-f241-4d14-bf5b-28c3966c011d iface:vdsmbr_6SMdIi3B type: ovs-bridge finished
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,355::context::148::root::(register_async) Async action: Reapply device config: bond0 bond d8c57758-f784-44f4-a33a-c050ec50b9b9 started
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,358::device::83::root::(_reapply_callback) Device reapply failed on bond0 bond: error=nm-device-error-quark: Can't reapply changes to '802-3-ethernet.cloned-mac-address' setting (3), Fallback to device activation
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,358::context::148::root::(register_async) Async action: Activate profile uuid:d8c57758-f784-44f4-a33a-c050ec50b9b9 iface:bond0 type: bond started
MainProcess|jsonrpc/3::DEBUG::2022-05-26 14:32:25,360::active_connection::201::root::(_activate_profile_callback) Connection activation initiated: iface=bond0 type=bond con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
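The only hard error in the snippet above is the reapply failure of '802-3-ethernet.cloned-mac-address' on bond0, after which nmstate falls back to a full re-activation of the bond. Whether that is what breaks the host-deploy here is only a guess, but the property can at least be inspected and cleared with nmcli before retrying (the connection name bond0 is an assumption):

# show the cloned MAC currently stored on the bond profile
nmcli -g 802-3-ethernet.cloned-mac-address connection show bond0
# clearing it (or setting it explicitly to the bond's real MAC) might avoid
# the reapply conflict seen in the log
nmcli connection modify bond0 802-3-ethernet.cloned-mac-address ""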
Regards,
Ravi
Single-machine hosted-engine routing is not working
by Paul-Erik Törrönen
Hello,
I have oVirt 4.4 (latest that can be installed on RockyLinux 8.5)
running on a laptop with a self-hosted engine.
The setup was working fine after installation, but once I rebooted
(after having shut down all the VMs including the hosted engine), I can
no longer reach the oVirt console from any other computer on the same
subnet. The hosted engine does respond to ping from the host machine.
Logging onto the hosted engine from the serial console, I can only ping
the host machine. Any other address on the subnet is unreachable.
This seems to be some internal oVirt routing issue between the host and
the virtual machine, since stopping the firewall service makes no
difference, neither on the host nor on the hosted engine.
The host address is 192.168.42.2 and the hosted engine is 192.168.42.250.
broker.log says:
engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM
is up on this host with healthy engine
cpu_load_no_engine::142::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load)
System load total=0.0250, engine=0.0028, non-engine=0.0222
network::88::network.Network::(action) Successfully verified network status
mem_free::51::mem_free.MemFree::(action) memFree: 26884
mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt
in up state
engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM
is up on this host with healthy engine
agent.log says:
states::406::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine vm running on localhost
hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineUp (score: 3400)
ovn-controller.log says:
reconnect|INFO|ssl:192.168.42.250:6642: connected
ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt:
connecting to switch
rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt:
connecting...
rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
ovs-vswitchd.log says:
connmgr|INFO|br-int: added service controller
"punix:/var/run/openvswitch/br-int.mgmt"
bridge|INFO|ovs-vswitchd (Open vSwitch) 2.11.8
memory|INFO|68900 kB peak resident set size after 10.0 seconds
memory|INFO|handlers:5 ofconns:2 ports:1 revalidators:3 rules:9
connmgr|INFO|br-int<->unix#0: 6 flow_mods 10 s ago (5 adds, 1 deletes)
The only actual error on the host is in ovsdb-server.log:
jsonrpc|WARN|unix#13: receive error: Connection reset by peer
reconnect|WARN|unix#13: connection dropped (Connection reset by peer)
What else should I look at in order to figure out why the host no longer
routes packets correctly from and to the hosted engine?
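Since the hosted engine sits on the ovirtmgmt bridge, traffic towards the rest of the subnet is plain layer-2 forwarding rather than routing, so one thing worth checking is whether the frames ever leave the bridge. A rough sketch (the uplink NIC name is whichever interface is enslaved to ovirtmgmt on this host):

# confirm both the physical uplink and the engine's vnet tap are bridge ports
ip link show master ovirtmgmt
bridge link show
# watch whether ARP/ICMP from the engine actually reaches the physical NIC
tcpdump -ni ovirtmgmt arp or icmp
tcpdump -ni <uplink-nic> arp or icmp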
Poltsi