I have a better solution. I am currently migrating all VMs over to dc01-node01, and then I'll format the affected host to fix the partitioning as well.
In theory the OVN SB DB will be fixed once the host is reinstalled. If not, we can then check whether there is a stale entry on the oVirt host where the SB DB is managed.
Do you agree with this?
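As a quick cross-check for the self-pointing geneve tunnel reported in the quoted message below, here is a minimal Python sketch that scans the text printed by "ovs-vsctl show" for tunnel interfaces whose remote_ip is the host's own name. The function name find_self_tunnels and the indentation of the sample are my own assumptions; only the port names and remote_ip values come from the output in this thread. It only reads text and does not change anything in OVS or OVN.

```python
import re

# An "Interface <name>" stanza header, with or without quotes around the name.
_INTERFACE_RE = re.compile(r'^\s*Interface\s+"?([^"\s]+)"?\s*$')
# The remote_ip option inside an "options: {...}" line.
_REMOTE_IP_RE = re.compile(r'remote_ip="([^"]+)"')


def find_self_tunnels(show_output, local_names):
    """Return (interface, remote_ip) pairs whose remote_ip is one of local_names.

    local_names is whatever the host is known as (hostname, FQDN, IPs);
    the comparison is case-insensitive.
    """
    local = {name.lower() for name in local_names}
    current_iface = None
    hits = []
    for line in show_output.splitlines():
        header = _INTERFACE_RE.match(line)
        if header:
            current_iface = header.group(1)
            continue
        remote = _REMOTE_IP_RE.search(line)
        if remote and current_iface and remote.group(1).lower() in local:
            hits.append((current_iface, remote.group(1)))
    return hits


# Shortened sample; the layout is reconstructed, the values are from this thread.
SAMPLE = '''\
Bridge br-int
    Port "ovn-95ccb0-0"
        Interface "ovn-95ccb0-0"
            type: geneve
            options: {csum="true", key=flow, remote_ip="dc01-host01"}
    Port "ovn-be3abc-0"
        Interface "ovn-be3abc-0"
            type: geneve
            options: {csum="true", key=flow, remote_ip="dc01-host02"}
    Port "vnet6"
        Interface "vnet6"
'''

if __name__ == "__main__":
    print(find_self_tunnels(SAMPLE, ["dc01-host02"]))
```

If the list is non-empty after the host has been reinstalled, that would point towards a stale chassis entry in the SB DB rather than something local to the host.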
On Wed, Sep 16, 2020 at 1:00 PM Dominik Holler <dholler@redhat.com> wrote:Maybe because of a duplicated entry in the ovn sb db?Can you please stop the ovn-ctrontroller on this host, remove the host from the ovn sb db, ensure it is gone and restart the ovn-controller on the host?On Wed, Sep 16, 2020 at 11:55 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:Hi DominikJust saw the below on host dc01-host02ovs-vsctl showf3b13557-dfb4-45a4-b6af-c995ccf68720Bridge br-intPort "ovn-95ccb0-0"Interface "ovn-95ccb0-0"type: geneveoptions: {csum="true", key=flow, remote_ip="dc01-host01"}Port "vnet10"Interface "vnet10"Port "vnet11"Interface "vnet11"Port "vnet0"Interface "vnet0"Port "vnet9"Interface "vnet9"Port "vnet8"Interface "vnet8"Port br-intInterface br-inttype: internalPort "vnet12"Interface "vnet12"Port "ovn-be3abc-0"Interface "ovn-be3abc-0"type: geneveoptions: {csum="true", key=flow, remote_ip="dc01-host02"}Port "vnet7"Interface "vnet7"Port "ovn-c4b238-0"Interface "ovn-c4b238-0"type: geneveoptions: {csum="true", key=flow, remote_ip="dc02-host01"}Port "vnet6"Interface "vnet6"ovs_version: "2.11.0"Why would this node establish a geneve tunnel to himself?Other nodes do not exhibit this behavior.On Wed, Sep 16, 2020 at 12:21 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:Hi DominikBelow is the output of the ovs-vsctl list interface_uuid : bdaf92c1-4389-4ddf-aab0-93975076ebb2admin_state : upbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : []cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : fullerror : []external_ids : {attached-mac="56:6f:77:61:00:02", iface-id="5d03a7a5-82a1-40f9-b50c-353a26167fa3", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"}ifindex : 34ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 1link_speed : 10000000link_state : uplldp : {}mac : []mac_in_use : "fe:6f:77:61:00:02"mtu : 1442mtu_request : []name : "vnet6"ofport : 2ofport_request : []options : 
{}other_config : {}statistics : {collisions=0, rx_bytes=10828495, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=117713, tx_bytes=20771797, tx_dropped=0, tx_errors=0, tx_packets=106954}status : {driver_name=tun, driver_version="1.6", firmware_version=""}type : ""_uuid : bad80911-3993-4085-a0b0-962b6c9156cdadmin_state : upbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : []cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : []error : []external_ids : {}ifindex : 39ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 0link_speed : []link_state : uplldp : {}mac : []mac_in_use : "fe:37:52:c4:cb:03"mtu : []mtu_request : []name : "ovn-c4b238-0"ofport : 7ofport_request : []options : {csum="true", key=flow, remote_ip="192.168.121.164"}other_config : {}statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0}status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up}type : geneve_uuid : 8e7705d1-0b9d-4e30-8277-c339e7e1c27aadmin_state : upbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : []cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : fullerror : []external_ids : {attached-mac="56:6f:77:61:00:0d", iface-id="b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7", iface-status=active, vm-id="8d73f333-bca4-4b32-9b87-2e7ee07eda84"}ifindex : 28ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 1link_speed : 10000000link_state : uplldp : {}mac : []mac_in_use : "fe:6f:77:61:00:0d"mtu : 1442mtu_request : []name : "vnet0"ofport : 1ofport_request : []options : {}other_config : {}statistics : {collisions=0, rx_bytes=20609787, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=104535, tx_bytes=10830007, tx_dropped=0, tx_errors=0, tx_packets=117735}status : {driver_name=tun, driver_version="1.6", firmware_version=""}type : ""_uuid : 
86dcc68a-63e4-4445-9373-81c1f4502c17admin_state : upbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : []cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : fullerror : []external_ids : {attached-mac="56:6f:77:61:00:10", iface-id="4e8d5636-4110-41b2-906d-f9b04c2e62cd", iface-status=active, vm-id="9a002a9b-5f09-4def-a531-d50ff683470b"}ifindex : 40ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 1link_speed : 10000000link_state : uplldp : {}mac : []mac_in_use : "fe:6f:77:61:00:10"mtu : 1442mtu_request : []name : "vnet11"ofport : 10ofport_request : []options : {}other_config : {}statistics : {collisions=0, rx_bytes=3311352, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=51012, tx_bytes=5514116, tx_dropped=0, tx_errors=0, tx_packets=103456}status : {driver_name=tun, driver_version="1.6", firmware_version=""}type : ""_uuid : e8d5e4a2-b9a0-4146-8d98-34713cb443deadmin_state : upbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : []cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : fullerror : []external_ids : {attached-mac="56:6f:77:61:00:15", iface-id="b88de6e4-6d77-4e42-b734-4cc676728910", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"}ifindex : 37ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 1link_speed : 10000000link_state : uplldp : {}mac : []mac_in_use : "fe:6f:77:61:00:15"mtu : 1442mtu_request : []name : "vnet9"ofport : 5ofport_request : []options : {}other_config : {}statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4500, tx_dropped=0, tx_errors=0, tx_packets=74}status : {driver_name=tun, driver_version="1.6", firmware_version=""}type : ""_uuid : 6a2974b3-cd72-4688-a630-0a7e9c779b21admin_state : upbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : 
[]cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : fullerror : []external_ids : {attached-mac="56:6f:77:61:00:17", iface-id="64681036-26e2-41d7-b73f-ab5302610145", iface-status=active, vm-id="bf0dc78c-dad5-41a0-914c-ae0da0f9a388"}ifindex : 41ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 1link_speed : 10000000link_state : uplldp : {}mac : []mac_in_use : "fe:6f:77:61:00:17"mtu : 1442mtu_request : []name : "vnet12"ofport : 11ofport_request : []options : {}other_config : {}statistics : {collisions=0, rx_bytes=5513640, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=103450, tx_bytes=3311868, tx_dropped=0, tx_errors=0, tx_packets=51018}status : {driver_name=tun, driver_version="1.6", firmware_version=""}type : ""_uuid : 44498e54-f122-41a0-a41a-7a88ba2dba9badmin_state : downbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : []cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : []error : []external_ids : {}ifindex : 7ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 0link_speed : []link_state : downlldp : {}mac : []mac_in_use : "32:0a:69:67:07:4f"mtu : 1442mtu_request : []name : br-intofport : 65534ofport_request : []options : {}other_config : {}statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=326, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0}status : {driver_name=openvswitch}type : internal_uuid : e2114584-8ceb-43d6-817b-e457738ead8aadmin_state : upbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : []cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : fullerror : []external_ids : {attached-mac="56:6f:77:61:00:03", iface-id="16162721-c815-4cd8-ab57-f22e6e482c7f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"}ifindex : 
35ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 1link_speed : 10000000link_state : uplldp : {}mac : []mac_in_use : "fe:6f:77:61:00:03"mtu : 1442mtu_request : []name : "vnet7"ofport : 3ofport_request : []options : {}other_config : {}statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4730, tx_dropped=0, tx_errors=0, tx_packets=77}status : {driver_name=tun, driver_version="1.6", firmware_version=""}type : ""_uuid : ee16943e-d145-4080-893f-464098a6388fadmin_state : upbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : []cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : []error : []external_ids : {}ifindex : 39ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 0link_speed : []link_state : uplldp : {}mac : []mac_in_use : "1e:50:3f:a8:42:d1"mtu : []mtu_request : []name : "ovn-be3abc-0"ofport : 8ofport_request : []options : {csum="true", key=flow, remote_ip="DC01-host02"}other_config : {}statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0}status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up}type : geneve_uuid : 86a229be-373e-4c43-b2f1-6190523ed73aadmin_state : upbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : []cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : fullerror : []external_ids : {attached-mac="56:6f:77:61:00:1c", iface-id="12d829c3-64eb-44bc-a0bd-d7219991f35f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"}ifindex : 38ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 1link_speed : 10000000link_state : uplldp : {}mac : []mac_in_use : "fe:6f:77:61:00:1c"mtu : 1442mtu_request : []name : "vnet10"ofport : 6ofport_request : []options : {}other_config : {}statistics : {collisions=0, rx_bytes=117912, rx_crc_err=0, 
rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2195, tx_bytes=4204, tx_dropped=0, tx_errors=0, tx_packets=66}status : {driver_name=tun, driver_version="1.6", firmware_version=""}type : ""_uuid : fa4b8d96-bffe-4b56-930e-0e7fcc5f68acadmin_state : upbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : []cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : []error : []external_ids : {}ifindex : 39ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 0link_speed : []link_state : uplldp : {}mac : []mac_in_use : "7a:28:24:eb:ec:d2"mtu : []mtu_request : []name : "ovn-95ccb0-0"ofport : 9ofport_request : []options : {csum="true", key=flow, remote_ip="DC01-host01"}other_config : {}statistics : {rx_bytes=0, rx_packets=0, tx_bytes=12840478, tx_packets=224029}status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up}type : geneve_uuid : 5e3df5c7-958c-491d-8d41-0ae83c613f1dadmin_state : upbfd : {}bfd_status : {}cfm_fault : []cfm_fault_status : []cfm_flap_count : []cfm_health : []cfm_mpid : []cfm_remote_mpids : []cfm_remote_opstate : []duplex : fullerror : []external_ids : {attached-mac="56:6f:77:61:00:06", iface-id="9a6cc189-0934-4468-97ae-09f90fa4598d", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"}ifindex : 36ingress_policing_burst: 0ingress_policing_rate: 0lacp_current : []link_resets : 1link_speed : 10000000link_state : uplldp : {}mac : []mac_in_use : "fe:6f:77:61:00:06"mtu : 1442mtu_request : []name : "vnet8"ofport : 4ofport_request : []options : {}other_config : {}statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=8829812, tx_dropped=0, tx_errors=0, tx_packets=154540}status : {driver_name=tun, driver_version="1.6", firmware_version=""}type : ""I've identified which VMs have these MAC addresses but i do not see any "conflict" with any other VM's MAC 
address. I really do not understand why these would create a conflict.

On Wed, Sep 16, 2020 at 12:06 PM Dominik Holler <dholler@redhat.com> wrote:

On Tue, Sep 15, 2020 at 6:53 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:

So a new test-net was created under DC01 and was shown in the networks tab under both DC01 and DC02.
I believe networks are for some reason duplicated across DCs, maybe for future use? I don't know.
If one tries to delete the network from the other DC, one gets an error, while if it is deleted from the DC where it was initially created, it gets deleted from both.

In oVirt a logical network is an entity in a data center. If the automatic synchronization is enabled on the ovirt-provider-ovn entity in oVirt Engine, the OVN networks are reflected to all data centers. If you do not like this, you can disable the automatic synchronization of the ovirt-provider-ovn in Admin Portal.

From the DC01-node02 I get the following errors:
2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
And this repeats forever.

Looks like the southbound db is confused.
Can you try to delete all chassis listed by
sudo ovn-sbctl show
via
sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dev-host0
?
If the script remove_chassis.sh is
not installed, you can use [...] instead.

Can you please also share the output of
ovs-vsctl list Interface
on the host which produced the logfile above?

The connection to ovn-sbctl is OK and the geneve tunnels are shown correctly under ovs-vsctl.
VMs are still not able to ping each other.

On Tue, Sep 15, 2020 at 7:22 PM Dominik Holler <dholler@redhat.com> wrote:

On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:

Hi Dominik

Fixed the issue. Thanks.
I believe /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf needed an update as well.
The package is upgraded to the latest version.
For some reason the TLS certificate was in conflict with the ovn provider details, I would bet the "host" entry.
Once the provider was updated with the following, it functioned perfectly:

Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: https://dc02-ovirt01.testdomain.com:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: HTTPS
Host Name: dc02-ovirt01.testdomain.com
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"

So now the geneve tunnels are established and the OVN provider is working.
But VMs on the same VM network still do not communicate when they run on different hosts.
So if we have a VM network test-net on both dc01-host01 and dc01-host02, and each host has a VM with an IP address on that network, the VMs should communicate directly.
But traffic does not reach the other side.

Can you create a new external network, with port security disabled, and an IPv4 subnet?
If the VMs get an IP address via DHCP, ovn is working, and they should be able to ping each other, too.
If not, there should be a helpful entry in the ovn-controller.log of the host the VM is running on.

On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler <dholler@redhat.com> wrote:

Can you try again with:
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=random_test
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com

Please note that https-enabled should match the HTTP or HTTPS setting of the ovirt-provider-ovn configuration in oVirt Engine.
So if the ovirt-provider-ovn entity in Engine is on HTTP, the config file should use
https-enabled=false

On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:

This is the updated one:
# This file is automatically generated by engine-setup. Please do not edit manually
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=true
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=random_text
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
[AUTH]
auth-plugin=auth.plugins.static_token:NoAuthPlugin

However, it still does not connect.
It prompts for the certificate but then fails and prompts to see the log, but the ovirt-provider-ovn.log does not list anything.

Yes, we've had oVirt for about a year now, starting from about version 4.1.

This might explain the trouble.
Upgrade of ovirt-provider-ovn should work flawlessly starting from oVirt 4.2.

On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler <dholler@redhat.com> wrote:

On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:

There is a file with the below entries

Impressive, do you know when this config file was created and if it was manually modified?
Is this an upgrade from oVirt 4.1?

[root@dc02-ovirt01 log]# cat /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf
# This file is automatically generated by engine-setup. Please do not edit manually
[OVN REMOTE]
ovn-remote=tcp:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=random_test
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com

The only entry missing is [AUTH], and under [SSL] https-enabled is false. Should I edit this file, or is that going to break everything?

Changing the file should improve things, but better create a backup in another directory before modification.
The only required change is
from
ovn-remote=tcp:127.0.0.1:6641
to
ovn-remote=ssl:127.0.0.1:6641

On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler <dholler@redhat.com> wrote:

On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:

Hi Dominik

That immediately fixed the geneve tunnels between all hosts.
Thanks for the feedback.

However, the ovn provider is now broken.
After fixing the networks we tried to move a VM to DC01-host01, so we powered it down and simply configured it to run on dc01-node01.
While checking the logs on the oVirt engine I noticed the below:
Failed to synchronize networks of Provider ovirt-provider-ovn.

The ovn-provider configured on the engine is the below:
Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: http://localhost:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: HTTP
Host Name: dc02-ovirt01
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"

In the past this was deleted by an engineer and recreated as per the documentation, and it worked. Do we need to update something due to the SSL on the ovn?

Is there a file in /etc/ovirt-provider-ovn/conf.d/ ?
engine-setup should have created one.
If the file is missing, for testing purposes, you can create a file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf :
[PROVIDER]
provider-host=REPLACE_WITH_FQDN
[SSL]
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
https-enabled=true
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[AUTH]
auth-plugin=auth.plugins.static_token:NoAuthPlugin
[NETWORK]
port-security-enabled-default=True

and restart the ovirt-provider-ovn service.

From the ovn-provider logs the below is generated after a service restart and when the start VM is triggered
2020-09-15 15:07:33,579 root Starting server
2020-09-15 15:07:33,579 root Version: 1.2.29-1
2020-09-15 15:07:33,579 root Build date: 20191217125241
2020-09-15 15:07:33,579 root Githash: cb5a80d
2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 Request: GET /v2.0/ports
2020-09-15 15:08:26,582 root Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1
Traceback (most recent call last):
  File "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in _handle_request
    method, path_parts, content
  File "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in handle_request
    return self.call_response_handler(handler, content, parameters)
  File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line 35, in call_response_handler
    with NeutronApi() as ovn_north:
  File "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__
    self.ovsidl, self.idl = ovn_connection.connect()
  File "/usr/share/ovirt-provider-ovn/ovn_connection.py", line 46, in connect
    ovnconst.OVN_NORTHBOUND
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 127, in from_server
    helper = idlutils.get_schema_helper(connection_string, schema_name)
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 128, in get_schema_helper
    'err': os.strerror(err)})
Exception: Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1

When i update the ovn provider from the GUI to have https://localhost:9696/ and HTTPS as the protocol the test fails.

On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler <dholler@redhat.com> wrote:

On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:

Hi Dominik

When these commands are used on the ovirt-engine host the output is the one depicted in your email.
For your reference see
also below:
[root@ath01-ovirt01 certs]# ovn-nbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-nbctl get-connection
ptcp:6641
[root@ath01-ovirt01 certs]# ovn-sbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-sbctl get-connection
read-write role="" ptcp:6642

^^^ the line above points to the problem: ovn-central is configured to use plain TCP without ssl.
engine-setup usually configures ovn-central to use SSL. That the files /etc/pki/ovirt-engine/keys/ovn-* exist shows that engine-setup was triggered correctly. Looks like the ovn db was dropped somehow, this should not happen.
This can be fixed manually by executing the following commands on engine's machine:
ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem
ovn-sbctl set-connection pssl:6642

The /var/log/openvswitch/ovn-controller.log on the hosts should tell that br-int.mgmt is connected now.

When i try the above commands on the node hosts the following happens:

[root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.p12
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.p12

ovn-nbctl get-ssl / get-connection
ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database connection failed (No such file or directory)
The above i believe is expected since no northbound connections should be established from the host nodes.

ovn-sbctl get-ssl /get-connection
The output is stuck till i terminate it.

Yes, the ovn-* commands works only on engine's machine, which has the role ovn-central.
On the hosts, there is only the ovn-controller, which connects the ovn southbound to openvswitch on the host.

For the requested logs the below are found in the ovsdb-server-sb.log
2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: connection dropped (Protocol error)
2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: connection dropped (Protocol error)
2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: connection dropped (Protocol error)
2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: connection dropped (Protocol error)
2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: error parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: received SSL data on JSON-RPC channel
2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: connection dropped (Protocol error)
2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: connection dropped (Protocol error)
2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: connection dropped (Protocol error)
2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: connection dropped (Protocol error)
2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate
2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: error parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate
2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: received SSL data on JSON-RPC channel
2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: connection dropped (Protocol error)
2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: connection dropped (Protocol error)
2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: connection dropped (Protocol error)
2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: connection dropped (Protocol error)
2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: connection dropped (Protocol error)
2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: error parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: received SSL data on JSON-RPC channel
2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: connection dropped (Protocol error)
2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: connection dropped (Protocol error)

How can we fix these SSL errors?

I addressed this above.

I thought vdsm did the certificate provisioning on the host nodes as to communicate to the engine host node.

Yes, this seems to work in your scenario, just the SSL configuration on the ovn-central was lost.

On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler <dholler@redhat.com> wrote:

Looks still like the ovn-controller on the host has problems communicating with ovn-southbound.
Are there any hints in /var/log/openvswitch/*.log, especially in /var/log/openvswitch/ovsdb-server-sb.log ?
Can you please check the output of
ovn-nbctl get-ssl
ovn-nbctl get-connection
ovn-sbctl get-ssl
ovn-sbctl get-connection
ls -l /etc/pki/ovirt-engine/keys/ovn-*
it should be similar to
[root@ovirt-43 ~]# ovn-nbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ovirt-43 ~]# ovn-nbctl get-connection
pssl:6641:[::]
[root@ovirt-43 ~]# ovn-sbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ovirt-43 ~]# ovn-sbctl get-connection
read-write role="" pssl:6642:[::]
[root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
-rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
-rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.p12
-rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
-rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.p12

On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:

I did a restart of the ovn-controller, this is the output of the ovn-controller.log
2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log
2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL reconnected, force recompute.
2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL reconnected, force recompute.
2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: waiting 2 seconds before reconnect
2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: waiting 4 seconds before reconnect
2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: continuing to reconnect in the background but suppressing further logging

I have also run
vdsm-tool ovn-config OVIRT_ENGINE_IP OVIRTMGMT_NETWORK_DC
This is how the OVIRT_ENGINE_IP is provided to the ovn-controller; I can redo it if you want.
After the restart of the ovn-controller the oVirt engine still shows only two geneve connections, one with DC01-host02 and one with DC02-host01.
Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144"
hostname: "dc02-host01"
Encap geneve
ip: "DC02-host01_IP"
options: {csum="true"}
Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c"
hostname: "DC01-host02"
Encap geneve
ip: "DC01-host02"
options: {csum="true"}

I've re-done the vdsm-tool command and nothing changed, again, with the same errors as after the systemctl restart ovn-controller.

On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler <dholler@redhat.com> wrote:

Please include ovirt-users list in your reply, to share the knowledge and experience with the community!

On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:

OK, below is the output per node and DC.

DC01
node01
[root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
"ssl:OVIRT_ENGINE_IP:6642"
[root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
geneve
[root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
"OVIRTMGMT_IP_DC01-NODE01"
node02
[root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
"ssl:OVIRT_ENGINE_IP:6642"
[root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
geneve
[root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
"OVIRTMGMT_IP_DC01-NODE02"

DC02
node01
[root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
"ssl:OVIRT_ENGINE_IP:6642"
[root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
geneve
[root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
"OVIRTMGMT_IP_DC02-NODE01"

Looks good.

DC01 node01 and node02 share the same VM networks, and VMs deployed on top of them cannot talk to VMs on the other hypervisor.

Maybe there is a hint in ovn-controller.log on dc01-node02?
Maybe restarting ovn-controller creates more helpful log messages?
You can also try restarting the ovn configuration on all hosts by executing
vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP
on each host, this would trigger internally.

So I would expect to see the same output for node01 to have a geneve tunnel to node02 and vice versa.

Me too.

On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler <dholler@redhat.com> wrote:

On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:

Hi Dominik

OVN is selected as the default network provider on the clusters and the hosts.

Sounds good.
This configuration is already required when the host is added to oVirt Engine, because OVN is configured during this step.

The "ovn-sbctl show" works on the ovirt engine and shows only two hosts, 1 per DC.
Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144"
hostname: "dc01-node02"
Encap geneve
ip: "X.X.X.X"
options: {csum="true"}
Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c"
hostname: "dc02-node1"
Encap geneve
ip: "A.A.A.A"
options: {csum="true"}
The new node is not listed (dc01-node1).
When executed on the nodes, the same command (ovn-sbctl show) times out on all nodes.
The /var/log/openvswitch/ovn-controller.log on all nodes lists
2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: unexpected SSL connection close

Can you please compare the output of
ovs-vsctl --no-wait get open . external-ids:ovn-remote
ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
of the working hosts, e.g. dc01-node02, and the failing host dc01-node1?
This should point us to the relevant difference in the configuration.
Please include ovirt-users list in your reply, to share the knowledge and experience with the community.

Thank you
Best regards
Konstantinos Betsis

On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler <dholler@redhat.com> wrote:

On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B <k.betsis@gmail.com> wrote:

Hi all
We have a small installation based on OVIRT 4.3.
1 Cluster is based on Centos 7 and the other on OVIRT NG Node image.
The environment was stable till an upgrade took place a couple of months ago.
As such we had to re-install one of the Centos 7 nodes and start from scratch.

To trigger the automatic configuration of the host, it is required to configure ovirt-provider-ovn as the default network provider for the cluster before adding the host to oVirt.

Even though the installation completed successfully and VMs are created, the following are not working as expected:
1. ovn geneve tunnels are not established with the other Centos 7 node in the cluster.
2. Centos 7 node is configured by ovirt engine however no geneve tunnel is established when "ovn-sbctl show" is issued on the engine.

Does "ovn-sbctl show" list the hosts?

3. no flows are shown on the engine on port 6642 for the ovs db.
Does anyone have any experience on how to troubleshoot OVN on ovirt?

/var/log/openvswitch/ovn-controller.log on the host should contain a helpful hint.

Thank you
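
When looking for hints in the logs, it can help to collapse the repeated warnings first. The excerpts quoted in this thread all follow a pipe-delimited layout (timestamp|sequence|module|level|message), so a small script can count how often each warning occurs. The following is only a rough sketch under that assumption; the names parse_line, strip_peer and summarize_warnings are made up for illustration and are not part of any OVS or oVirt tool.

```python
import re
from collections import Counter

# Layout assumed from the excerpts in this thread:
#   timestamp|sequence|module|level|message
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z)\|(?P<seq>\d+)\|"
    r"(?P<module>[^|]+)\|(?P<level>[A-Z]+)\|(?P<msg>.*)$"
)

# Leading "ssl:HOST:PORT: ", "tcp:HOST:PORT: " or "unix:PATH: " part of a message.
PEER_PREFIX = re.compile(r"^(?:ssl|tcp|unix):\S+: ")


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def strip_peer(message):
    """Drop the peer prefix so the same problem on different peers groups together."""
    return PEER_PREFIX.sub("", message)


def summarize_warnings(lines):
    """Count WARN and ERR messages per (module, message without peer prefix)."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["level"] in ("WARN", "ERR"):
            counts[(record["module"], strip_peer(record["msg"]))] += 1
    return counts


# Lines copied from the log excerpts quoted in this thread.
SAMPLE = """\
2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: received SSL data on JSON-RPC channel
2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: connection dropped (Protocol error)
""".splitlines()

# Print the most frequent warnings first.
for (module, message), count in summarize_warnings(SAMPLE).most_common():
    print(f"{count:3d}  {module}: {message}")
```

To use it on a real host, the SAMPLE list could be replaced with the lines read from /var/log/openvswitch/ovn-controller.log or /var/log/openvswitch/ovsdb-server-sb.log.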
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EKFITPR72LBPA5A43WWW/