Re: 4.3.x upgrade and issues with OVN
by Weber, Charles (NIH/NIA/IRP) [E]
Configuration and versions are below.
The log files will follow in the next email.
2 sites
First site: Bayview
4 nodes, BL460 Gen9 with 4 x 10G NICs
Nodes 1-3 have not been changed since the 4.3.2 upgrade. These nodes have the network sync issue and cannot migrate VMs.
OS Version:
RHEL - 7 - 6.1810.2.el7.centos
OS Description:
CentOS Linux 7 (Core)
Kernel Version:
3.10.0 - 957.10.1.el7.x86_64
KVM Version:
2.12.0 - 18.el7_6.3.1
LIBVIRT Version:
libvirt-4.5.0-10.el7_6.6
VDSM Version:
vdsm-4.30.11-1.el7
SPICE Version:
0.14.0 - 6.el7_6.1
CEPH Version:
librbd1-10.2.5-4.el7
Open vSwitch Version:
openvswitch-2.10.1-3.el7
Kernel Features:
PTI: 1, IBRS: 0, RETP: 1
VNC Encryption:
Disabled
I evacuated node 4 and updated it to 4.3.3. It still had issues, so I tried a fresh install from the oVirt Node ISO. That install does not have the network sync issue. However, I cannot upgrade all the nodes without scheduling cluster downtime.
OS Version:
RHEL - 7 - 6.1810.2.el7.centos
OS Description:
oVirt Node 4.3.3.1
Kernel Version:
3.10.0 - 957.10.1.el7.x86_64
KVM Version:
2.12.0 - 18.el7_6.3.1
LIBVIRT Version:
libvirt-4.5.0-10.el7_6.6
VDSM Version:
vdsm-4.30.13-1.el7
SPICE Version:
0.14.0 - 6.el7_6.1
CEPH Version:
librbd1-10.2.5-4.el7
Open vSwitch Version:
openvswitch-2.10.1-3.el7
Kernel Features:
PTI: 1, IBRS: 0, RETP: 1, SSBD: 3
VNC Encryption:
Disabled
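To make the differences between the two host summaries above easier to spot, here is a minimal sketch that compares the fields as reported in the engine. The dictionaries are copied by hand from the listings in this email (only a subset of the fields), and the helper name diff_hosts is hypothetical, not part of any oVirt tool.

```python
# Values copied from the host summaries in this email (subset of fields).
NODES_1_3 = {
    "OS Description": "CentOS Linux 7 (Core)",
    "Kernel Version": "3.10.0 - 957.10.1.el7.x86_64",
    "LIBVIRT Version": "libvirt-4.5.0-10.el7_6.6",
    "VDSM Version": "vdsm-4.30.11-1.el7",
    "Open vSwitch Version": "openvswitch-2.10.1-3.el7",
    "Kernel Features": "PTI: 1, IBRS: 0, RETP: 1",
}

NODE_4 = {
    "OS Description": "oVirt Node 4.3.3.1",
    "Kernel Version": "3.10.0 - 957.10.1.el7.x86_64",
    "LIBVIRT Version": "libvirt-4.5.0-10.el7_6.6",
    "VDSM Version": "vdsm-4.30.13-1.el7",
    "Open vSwitch Version": "openvswitch-2.10.1-3.el7",
    "Kernel Features": "PTI: 1, IBRS: 0, RETP: 1, SSBD: 3",
}


def diff_hosts(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for field, (old, new) in diff_hosts(NODES_1_3, NODE_4).items():
        print(f"{field}: {old!r} -> {new!r}")
```

With the values above, only the OS description, the VDSM version, and the kernel features differ. The kernel, libvirt, and Open vSwitch packages are the same on all four nodes.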
1 engine on a DL360 Gen10, running 4.3.3.5-1.el7 on CentOS 7, patched up to today
Storage: Dell Unity iSCSI
What I see in the engine is that the first 3 nodes all have the public network listed as out of sync with the DC. I can migrate from node 4 to the other 3 nodes, but I cannot migrate off the other 3 nodes. I also cannot sync the network on those three nodes. There have been no recent network changes.
After the 4.3.3 upgrade I initially found some curious OVN errors in the log file on nodes 3 and 4. Nodes 1 and 2 do not have these errors. However, the engine did have 2 extra OVN ports defined.
ovs-vsctl show
be10cd3d-85fe-4985-9635-f447bfbc5e25
    Bridge br-int
        fail_mode: secure
        Port "ovn-877214-0"
            Interface "ovn-877214-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="137.187.160.14"}
                error: "could not add network device ovn-877214-0 to ofproto (File exists)"
        Port "ovn-48e040-0"
            Interface "ovn-48e040-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="137.187.160.18"}
        Port "ovn-d6eaa1-0"
            Interface "ovn-d6eaa1-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="137.187.160.13"}
                error: "could not add network device ovn-d6eaa1-0 to ofproto (File exists)"
        Port br-int
            Interface br-int
                type: internal
        Port "ovn-f0f789-0"
            Interface "ovn-f0f789-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="137.187.160.13"}
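For anyone who wants to check the same thing on their own hosts, here is a rough sketch that scans saved "ovs-vsctl show" output for interfaces reporting an error and for geneve tunnels that share a remote_ip. The sample text is the output above. The function names are hypothetical, and the idea that a duplicate tunnel to the same remote_ip is related to the "File exists" error is only my assumption, not something confirmed in the logs.

```python
import re
from collections import defaultdict

# Output of "ovs-vsctl show" as pasted in this email.
SAMPLE = '''be10cd3d-85fe-4985-9635-f447bfbc5e25
Bridge br-int
fail_mode: secure
Port "ovn-877214-0"
Interface "ovn-877214-0"
type: geneve
options: {csum="true", key=flow, remote_ip="137.187.160.14"}
error: "could not add network device ovn-877214-0 to ofproto (File exists)"
Port "ovn-48e040-0"
Interface "ovn-48e040-0"
type: geneve
options: {csum="true", key=flow, remote_ip="137.187.160.18"}
Port "ovn-d6eaa1-0"
Interface "ovn-d6eaa1-0"
type: geneve
options: {csum="true", key=flow, remote_ip="137.187.160.13"}
error: "could not add network device ovn-d6eaa1-0 to ofproto (File exists)"
Port br-int
Interface br-int
type: internal
Port "ovn-f0f789-0"
Interface "ovn-f0f789-0"
type: geneve
options: {csum="true", key=flow, remote_ip="137.187.160.13"}
'''


def parse_interfaces(text):
    """Collect name, type, remote_ip and error for each Interface entry."""
    interfaces = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # indentation is ignored, so either layout works
        m = re.match(r'Interface "?([^"]+)"?$', line)
        if m:
            current = {"name": m.group(1), "type": None,
                       "remote_ip": None, "error": None}
            interfaces.append(current)
        elif current is None:
            continue
        elif line.startswith("type:"):
            current["type"] = line.split(":", 1)[1].strip()
        elif line.startswith("options:"):
            ip = re.search(r'remote_ip="([^"]+)"', line)
            if ip:
                current["remote_ip"] = ip.group(1)
        elif line.startswith("error:"):
            current["error"] = line.split(":", 1)[1].strip().strip('"')
    return interfaces


def duplicate_tunnels(interfaces):
    """Map each remote_ip used by more than one interface to those names."""
    by_ip = defaultdict(list)
    for iface in interfaces:
        if iface["remote_ip"]:
            by_ip[iface["remote_ip"]].append(iface["name"])
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}


if __name__ == "__main__":
    found = parse_interfaces(SAMPLE)
    for iface in found:
        if iface["error"]:
            print("error on", iface["name"], "->", iface["remote_ip"])
    print("shared remote_ip:", duplicate_tunnels(found))
```

On the output above this flags ovn-877214-0 and ovn-d6eaa1-0 as the interfaces in error, and shows that ovn-d6eaa1-0 and ovn-f0f789-0 both point at 137.187.160.13.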
Second site: Harbor
I upgraded to 4.3.2 and stopped any further upgrades until I could deal with the migration and VDSMD restart issues.
3 nodes, Supermicro 1U with 2 x 10G NICs
All 3 nodes are the same and from the 4.3.2 update.
OS Version:
RHEL - 7 - 6.1810.2.el7.centos
OS Description:
CentOS Linux 7 (Core)
Kernel Version:
3.10.0 - 957.10.1.el7.x86_64
KVM Version:
2.12.0 - 18.el7_6.3.1
LIBVIRT Version:
libvirt-4.5.0-10.el7_6.6
VDSM Version:
vdsm-4.30.11-1.el7
SPICE Version:
0.14.0 - 6.el7_6.1
GlusterFS Version:
[N/A]
CEPH Version:
librbd1-10.2.5-4.el7
Open vSwitch Version:
openvswitch-2.10.1-3.el7
Kernel Features:
PTI: 1, IBRS: 0, RETP: 1
VNC Encryption:
Disabled
1 engine on a DL360 Gen10 with 2 x 10G NICs, running 4.3.2.1-1.el7
Storage: Dell Unity iSCSI
The only issue at the Harbor site while running 4.3.2 is that when I migrate a VM off a node, the migration never finishes in the engine until I restart VDSMD on the original host. The site did not exhibit this issue with 4.2.x.