Re: OVN Geneve tunnels not being established

I did a restart of the ovn-controller; this is the output of the ovn-controller.log:

2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log
2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL reconnected, force recompute.
2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL reconnected, force recompute.
2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: waiting 2 seconds before reconnect
2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: waiting 4 seconds before reconnect
2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: continuing to reconnect in the background but suppressing further logging

I have also done the vdsm-tool ovn-config OVIRT_ENGINE_IP OVIRTMGMT_NETWORK_DC. This is how the OVIRT_ENGINE_IP is provided to the ovn-controller; I can redo it if you want.

After the restart of the ovn-controller the oVirt Engine still shows only two geneve connections, one with DC01-host02 and one with DC02-host01:

Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144"
    hostname: "dc02-host01"
    Encap geneve
        ip: "DC02-host01_IP"
        options: {csum="true"}
Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c"
    hostname: "DC01-host02"
    Encap geneve
        ip: "DC01-host02"
        options: {csum="true"}

I've re-done the vdsm-tool command and nothing changed, again, with the same errors as after systemctl restart ovn-controller.

On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler <dholler@redhat.com> wrote:
Please include ovirt-users list in your reply, to share the knowledge and experience with the community!
On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
OK, below is the output per node and DC.
DC01 node01
[root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
"ssl:*OVIRT_ENGINE_IP*:6642"
[root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
geneve
[root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
"*OVIRTMGMT_IP_DC01-NODE01*"
node02
[root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
"ssl:*OVIRT_ENGINE_IP*:6642"
[root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
geneve
[root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
"*OVIRTMGMT_IP_DC01-NODE02*"
DC02 node01
[root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
"ssl:*OVIRT_ENGINE_IP*:6642"
[root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
geneve
[root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
"*OVIRTMGMT_IP_DC02-NODE01*"
Looks good.
DC01 node01 and node02 share the same VM networks, and VMs deployed on top of them cannot talk to VMs on the other hypervisor.
Maybe there is a hint in ovn-controller.log on dc01-node02? Maybe restarting ovn-controller creates more helpful log messages?
You can also try to restart the OVN configuration on all hosts by executing vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP on each host; this would trigger
https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... internally.
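As a concrete sketch (the IP placeholders are the ones used in this thread, not literal values), on dc01-node01 this would look like:

  vdsm-tool ovn-config OVIRT_ENGINE_IP OVIRTMGMT_IP_DC01-NODE01
  systemctl restart ovn-controller
  ovs-vsctl --no-wait get open . external-ids:ovn-remote    # should print "ssl:OVIRT_ENGINE_IP:6642"

The explicit ovn-controller restart should not be necessary, but it does no harm and produces fresh log messages to look at.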
So, given the same output for node01, I would expect it to have a geneve tunnel to node02 and vice versa.
Me too.
On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler <dholler@redhat.com> wrote:
On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
OVN is selected as the default network provider on the clusters and the hosts.
Sounds good. This configuration is required already when the host is added to oVirt Engine, because OVN is configured during this step.
The "ovn-sbctl show" works on the ovirt engine and shows only two hosts, 1 per DC.
Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" hostname: "dc01-node02" Encap geneve ip: "X.X.X.X" options: {csum="true"} Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" hostname: "dc02-node1" Encap geneve ip: "A.A.A.A" options: {csum="true"}
The new node is not listed (dc01-node1).
When the same command (ovn-sbctl show) is executed on the nodes, it times out on all of them.
The /var/log/openvswitch/ovn-controller.log on all nodes repeatedly lists:
2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
Can you please compare the output of
ovs-vsctl --no-wait get open . external-ids:ovn-remote
ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
of the working hosts, e.g. dc01-node02, and the failing host dc01-node1? This should point us to the relevant difference in the configuration.
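If it helps, the three values can be collected from all hosts in one pass; a sketch, assuming root SSH access to the hosts named in this thread:

  for h in dc01-node01 dc01-node02 dc02-node01; do
      echo "== $h =="
      ssh root@"$h" 'for k in ovn-remote ovn-encap-type ovn-encap-ip; do
          ovs-vsctl --no-wait get open . external-ids:$k; done'
  done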
Please include ovirt-users list in your reply, to share the knowledge and experience with the community.
Thank you
Best regards
Konstantinos Betsis
On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler <dholler@redhat.com> wrote:
On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B <k.betsis@gmail.com> wrote:
Hi all
We have a small installation based on oVirt 4.3. One cluster is based on CentOS 7 and the other on the oVirt Node NG image.
The environment was stable until an upgrade took place a couple of months ago. As such we had to re-install one of the CentOS 7 nodes and start from scratch.
To trigger the automatic configuration of the host, it is required to configure ovirt-provider-ovn as the default network provider for the cluster before adding the host to oVirt.
Even though the installation completed successfully and VMs are created, the following are not working as expected:
1. OVN geneve tunnels are not established with the other CentOS 7 node in the cluster.
2. The CentOS 7 node is configured by oVirt Engine, however no geneve tunnel is established when "ovn-sbctl show" is issued on the engine.
Does "ovn-sbctl show" list the hosts?
3. No flows are shown on the engine on port 6642 for the OVS DB.
Does anyone have any experience on how to troubleshoot OVN on ovirt?
/var/log/openvswitch/ovn-controller.log on the host should contain a helpful hint.
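A few checks that go with this hint, as a sketch (only commands and paths already discussed in this thread, plus a generic socket listing):

  # on a host: are there geneve tunnel ports on the integration bridge?
  ovs-vsctl show | grep -B2 'type: geneve'

  # on a host: what does ovn-controller say about its southbound connection?
  tail -n 20 /var/log/openvswitch/ovn-controller.log

  # on the engine: is the southbound DB listening on 6642?
  ss -tlnp | grep 6642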
Thank you

Looks still like the ovn-controller on the host has problems communicating with ovn-southbound. Are there any hints in /var/log/openvswitch/*.log, especially in /var/log/openvswitch/ovsdb-server-sb.log?

Can you please check the output of

ovn-nbctl get-ssl
ovn-nbctl get-connection
ovn-sbctl get-ssl
ovn-sbctl get-connection
ls -l /etc/pki/ovirt-engine/keys/ovn-*

It should be similar to:

[root@ovirt-43 ~]# ovn-nbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ovirt-43 ~]# ovn-nbctl get-connection
pssl:6641:[::]
[root@ovirt-43 ~]# ovn-sbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ovirt-43 ~]# ovn-sbctl get-connection
read-write role="" pssl:6642:[::]
[root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
-rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
-rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.p12
-rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
-rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.p12

Hi Dominik

When these commands are used on the ovirt-engine host, the output is the one depicted in your email. For your reference see also below:

[root@ath01-ovirt01 certs]# ovn-nbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-nbctl get-connection
ptcp:6641
[root@ath01-ovirt01 certs]# ovn-sbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-sbctl get-connection
read-write role="" ptcp:6642
[root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.p12
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.p12

When I try the above commands on the node hosts, the following happens:

ovn-nbctl get-ssl / get-connection
ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database connection failed (No such file or directory)

The above I believe is expected, since no northbound connections should be established from the host nodes.

ovn-sbctl get-ssl / get-connection
The output is stuck until I terminate it.

For the requested logs, the below are found in the ovsdb-server-sb.log:

2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: connection dropped (Protocol error)
2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: connection dropped (Protocol error)
2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: connection dropped (Protocol error)
2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: connection dropped (Protocol error)
2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: error parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: received SSL data on JSON-RPC channel
2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: connection dropped (Protocol error)
2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: connection dropped (Protocol error)
2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: connection dropped (Protocol error)
2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: connection dropped (Protocol error)
2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate
2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: error parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate
2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: received SSL data on JSON-RPC channel
2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: connection dropped (Protocol error)
2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: connection dropped (Protocol error)
2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: connection dropped (Protocol error)
2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: connection dropped (Protocol error)
2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: connection dropped (Protocol error)
2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: error parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: received SSL data on JSON-RPC channel
2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: connection dropped (Protocol error)
2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: connection dropped (Protocol error)

How can we fix these SSL errors? I thought vdsm did the certificate provisioning on the host nodes so they can communicate with the engine host node.

On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
When these commands are used on the ovirt-engine host the output is the one depicted in your email. For your reference see also below:
[root@ath01-ovirt01 certs]# ovn-nbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-nbctl get-connection
ptcp:6641
[root@ath01-ovirt01 certs]# ovn-sbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-sbctl get-connection
read-write role="" ptcp:6642
^^^ The line above points to the problem: ovn-central is configured to use plain TCP without SSL. engine-setup usually configures ovn-central to use SSL. That the files /etc/pki/ovirt-engine/keys/ovn-* exist shows that engine-setup was triggered correctly. Looks like the OVN DB configuration was dropped somehow; this should not happen.

This can be fixed manually by executing the following commands on the engine's machine:

ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem
ovn-sbctl set-connection pssl:6642

The /var/log/openvswitch/ovn-controller.log on the hosts should then tell that br-int.mgmt is connected.
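To verify the change took effect, something like the following could be run afterwards (a sketch using only commands already mentioned in this thread):

  # on the engine: both connections should now report pssl
  ovn-nbctl get-connection
  ovn-sbctl get-connection

  # on a host: ovn-controller should reconnect over SSL within a few seconds
  tail -n 20 /var/log/openvswitch/ovn-controller.log

  # on the engine: the missing chassis should now appear
  ovn-sbctl show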
[root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.p12
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.p12

When I try the above commands on the node hosts, the following happens:
ovn-nbctl get-ssl / get-connection
ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database connection failed (No such file or directory)
The above I believe is expected, since no northbound connections should be established from the host nodes.

ovn-sbctl get-ssl / get-connection
The output is stuck until I terminate it.
Yes, the ovn-* commands work only on the engine's machine, which has the ovn-central role. On the hosts there is only ovn-controller, which connects the OVN southbound DB to Open vSwitch on the host.
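A quick way to see this split in practice, as a sketch (the unit names are those of the CentOS 7 openvswitch-ovn packages and may differ on other builds):

  # on the engine (ovn-central role): northd plus the NB/SB ovsdb servers
  systemctl status ovn-northd

  # on a host: only the local controller
  systemctl status ovn-controller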
How can we fix these SSL errors?
I addressed this above.
I thought vdsm did the certificate provisioning on the host nodes as to communicate to the engine host node.
Yes, this seems to work in your scenario, just the SSL configuration on the ovn-central was lost.

Hi Dominik

That immediately fixed the geneve tunnels between all hosts. However, the OVN provider is now broken. After fixing the networks we tried to move a VM to DC01-host01, so we powered it down and simply configured it to run on dc01-node01. While checking the logs on the oVirt engine I noticed the below:

Failed to synchronize networks of Provider ovirt-provider-ovn.

The OVN provider configured on the engine is the below:

Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: http://localhost:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: HTTP
Host Name: dc02-ovirt01
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"

In the past this provider was deleted by an engineer and recreated as per the documentation, and it worked. Do we need to update something due to the SSL on the OVN?

From the ovn-provider logs, the below is generated after a service restart and when the VM start is triggered:

2020-09-15 15:07:33,579 root Starting server
2020-09-15 15:07:33,579 root Version: 1.2.29-1
2020-09-15 15:07:33,579 root Build date: 20191217125241
2020-09-15 15:07:33,579 root Githash: cb5a80d
2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 Request: GET /v2.0/ports
2020-09-15 15:08:26,582 root Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1
Traceback (most recent call last):
  File "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in _handle_request
    method, path_parts, content
  File "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in handle_request
    return self.call_response_handler(handler, content, parameters)
  File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line 35, in call_response_handler
    with NeutronApi() as ovn_north:
  File "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__
    self.ovsidl, self.idl = ovn_connection.connect()
  File "/usr/share/ovirt-provider-ovn/ovn_connection.py", line 46, in connect
    ovnconst.OVN_NORTHBOUND
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 127, in from_server
    helper = idlutils.get_schema_helper(connection_string, schema_name)
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 128, in get_schema_helper
    'err': os.strerror(err)})
Exception: Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1

When I update the OVN provider from the GUI to have https://localhost:9696/ and HTTPS as the protocol, the test fails.
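The traceback suggests ovirt-provider-ovn still talks plain TCP to the northbound DB on 127.0.0.1:6641, which no longer works after the DB connection was switched to pssl. A sketch of how this could be confirmed on the engine machine; the provider config directory is the usual engine-setup location and is an assumption here:

  # plain TCP to the NB DB should now be refused
  ovsdb-client list-dbs tcp:127.0.0.1:6641

  # check which NB address/protocol the provider is configured to use
  grep -ri 'ovn-remote' /etc/ovirt-provider-ovn/

  # after adjusting the provider configuration, restart it
  systemctl restart ovirt-provider-ovn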
"*OVIRTMGMT_IP_DC01-NODE02*"
DC02 node01
[root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote "ssl:*OVIRT_ENGINE_IP*:6642" [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type geneve [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
"*OVIRTMGMT_IP_DC02-NODE01*"
Looks good.
DC01 node01 and node02 share the same VM networks and VMs deployed on top of them cannot talk to VM on the other hypervisor.
Maybe there is a hint on ovn-controller.log on dc01-node02 ? Maybe restarting ovn-controller creates more helpful log messages?
You can also try restart the ovn configuration on all hosts by executing vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP on each host, this would trigger
https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... internally.
So I would expect to see the same output for node01 to have a geneve tunnel to node02 and vice versa.
Me too.
On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler <dholler@redhat.com> wrote:
> > > On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis < > k.betsis@gmail.com> wrote: > >> Hi Dominik >> >> OVN is selected as the default network provider on the clusters and >> the hosts. >> >> > sounds good. > This configuration is required already during the host is added to > oVirt Engine, because OVN is configured during this step. > > >> The "ovn-sbctl show" works on the ovirt engine and shows only two >> hosts, 1 per DC. >> >> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >> hostname: "dc01-node02" >> Encap geneve >> ip: "X.X.X.X" >> options: {csum="true"} >> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >> hostname: "dc02-node1" >> Encap geneve >> ip: "A.A.A.A" >> options: {csum="true"} >> >> >> The new node is not listed (dc01-node1). >> >> When executed on the nodes the same command (ovn-sbctl show) >> times-out on all nodes..... >> >> The output of the /var/log/openvswitch/ovn-conntroller.log lists on >> all logs >> >> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: >> unexpected SSL connection close >> >> >> > Can you please compare the output of > > ovs-vsctl --no-wait get open . external-ids:ovn-remote > ovs-vsctl --no-wait get open . external-ids:ovn-encap-type > ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip > > of the working hosts, e.g. dc01-node02, and the failing host > dc01-node1? > This should point us the relevant difference in the configuration. > > Please include ovirt-users list in your replay, to share > the knowledge and experience with the community. > > > >> Thank you >> Best regards >> Konstantinos Betsis >> >> >> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler <dholler@redhat.com> >> wrote: >> >>> >>> >>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B <k.betsis@gmail.com> >>> wrote: >>> >>>> Hi all >>>> >>>> We have a small installation based on OVIRT 4.3. >>>> 1 Cluster is based on Centos 7 and the other on OVIRT NG Node >>>> image. >>>> >>>> The environment was stable till an upgrade took place a couple of >>>> months ago. >>>> As such we had to re-install one of the Centos 7 node and start >>>> from scratch. >>>> >>> >>> To trigger the automatic configuration of the host, it is required >>> to configure ovirt-provider-ovn as the default network provider for the >>> cluster before adding the host to oVirt. >>> >>> >>>> Even though the installation completed successfully and VMs are >>>> created, the following are not working as expected: >>>> 1. ovn geneve tunnels are not established with the other Centos 7 >>>> node in the cluster. >>>> 2. Centos 7 node is configured by ovirt engine however no geneve >>>> tunnel is established when "ovn-sbctl show" is issued on the engine. >>>> >>> >>> Does "ovn-sbctl show" list the hosts? >>> >>> >>>> 3. no flows are shown on the engine on port 6642 for the ovs db. >>>> >>>> Does anyone have any experience on how to troubleshoot OVN on >>>> ovirt? >>>> >>>> >>> /var/log/openvswitch/ovncontroller.log on the host should contain >>> a helpful hint. >>> >>> >>> >>>> Thank you >>>> _______________________________________________ >>>> Users mailing list -- users@ovirt.org >>>> To unsubscribe send an email to users-leave@ovirt.org >>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html >>>> oVirt Code of Conduct: >>>> https://www.ovirt.org/community/about/community-guidelines/ >>>> List Archives: >>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EK... >>>> >>>

On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
That immediately fixed the geneve tunnels between all hosts.
thanks for the feedback.
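For reference, a quick way to confirm the tunnels from both sides (a sketch; the grep pattern is only an example):

# on the engine (ovn-central): every host should now be listed as a chassis
ovn-sbctl show
# on each host: ovn-controller creates geneve tunnel ports on br-int
ovs-vsctl show | grep -B 2 -A 2 'type: geneve'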
However, the ovn provider is now broken. After fixing the networks we tried to move a VM to DC01-host01, so we powered it down and simply configured it to run on dc01-node01.
While checking the logs on the ovirt engine I noticed the following: Failed to synchronize networks of Provider ovirt-provider-ovn.
The ovn-provider configuration on the engine is the below:
Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: http:localhost:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: hTTP
Host Name: dc02-ovirt01
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"
In the past this was deleted by an engineer and recreated as per the documentation, and it worked. Do we need to update something now that SSL is enabled on the OVN side?
Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? engine-setup should have created one. If the file is missing, for testing purposes, you can create a file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf :
[PROVIDER]
provider-host=REPLACE_WITH_FQDN
[SSL]
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
https-enabled=true
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[AUTH]
auth-plugin=auth.plugins.static_token:NoAuthPlugin
[NETWORK]
port-security-enabled-default=True
and restart the ovirt-provider-ovn service.
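After creating or changing the file, something like this applies it and gives a quick sanity check (a sketch; 9696 and 35357 are the API and auth ports mentioned in this thread):

systemctl restart ovirt-provider-ovn
systemctl status ovirt-provider-ovn --no-pager
# the provider should be listening on 9696 (API) and 35357 (auth)
ss -tlnp | grep -E ':9696|:35357'
# last provider log lines, in case the restart failed
journalctl -u ovirt-provider-ovn -n 50 --no-pager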
From the ovn-provider logs, the below is generated after a service restart and when the VM start is triggered:
2020-09-15 15:07:33,579 root Starting server
2020-09-15 15:07:33,579 root Version: 1.2.29-1
2020-09-15 15:07:33,579 root Build date: 20191217125241
2020-09-15 15:07:33,579 root Githash: cb5a80d
2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 Request: GET /v2.0/ports
2020-09-15 15:08:26,582 root Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1
Traceback (most recent call last):
  File "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in _handle_request
    method, path_parts, content
  File "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in handle_request
    return self.call_response_handler(handler, content, parameters)
  File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line 35, in call_response_handler
    with NeutronApi() as ovn_north:
  File "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__
    self.ovsidl, self.idl = ovn_connection.connect()
  File "/usr/share/ovirt-provider-ovn/ovn_connection.py", line 46, in connect
    ovnconst.OVN_NORTHBOUND
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 127, in from_server
    helper = idlutils.get_schema_helper(connection_string, schema_name)
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 128, in get_schema_helper
    'err': os.strerror(err)})
Exception: Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1
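The traceback shows the provider still dialing the northbound DB over plain TCP on 6641, while ovn-central only accepts SSL there after the fix above. A quick way to see the mismatch on the engine (a sketch):

# what ovn-central accepts for the northbound DB (pssl:6641 after the set-connection fix)
ovn-nbctl get-connection
# what ovsdb-server has actually bound on 6641
ss -tlnp | grep ':6641'
# what the provider is configured to use (should be ssl:..., not tcp:...)
grep -r ovn-remote /etc/ovirt-provider-ovn/conf.d/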
When I update the ovn provider from the GUI to use https://localhost:9696/ and HTTPS as the protocol, the test fails.
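To separate the endpoint question from the OVN connection problem, the provider API can be poked directly on the engine (a sketch; with https-enabled=false it answers plain HTTP, with https-enabled=true the HTTPS variant applies):

# while https-enabled=false in the provider config
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9696/v2.0/networks
# once https-enabled=true; -k skips verification, assuming the certificate is issued for the engine FQDN rather than localhost
curl -sk -o /dev/null -w '%{http_code}\n' https://localhost:9696/v2.0/networks
# any HTTP status (even 401) means the service answers; "connection refused" means it does not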
On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler <dholler@redhat.com> wrote:
On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
When these commands are used on the ovirt-engine host the output is the one depicted in your email. For your reference see also below:
[root@ath01-ovirt01 certs]# ovn-nbctl get-ssl Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer CA Certificate: /etc/pki/ovirt-engine/ca.pem Bootstrap: false [root@ath01-ovirt01 certs]# ovn-nbctl get-connection ptcp:6641
[root@ath01-ovirt01 certs]# ovn-sbctl get-ssl Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer CA Certificate: /etc/pki/ovirt-engine/ca.pem Bootstrap: false [root@ath01-ovirt01 certs]# ovn-sbctl get-connection read-write role="" ptcp:6642
^^^ the line above points to the problem: ovn-central is configured to use plain TCP without ssl. engine-setup usually configures ovn-central to use SSL. That the files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, that engine-setup was triggered correctly. Looks like the ovn db was dropped somehow, this should not happen. This can be fixed manually by executing the following commands on engine's machine: ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem ovn-nbctl set-connection pssl:6641 ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem ovn-sbctl set-connection pssl:6642
The /var/log/openvswitch/ovn-controller.log on the hosts should tell that br-int.mgmt is connected now.
[root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-* -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass -rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.p12 -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass -rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.p12
When i try the above commands on the node hosts the following happens: ovn-nbctl get-ssl / get-connection ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database connection failed (No such file or directory) The above i believe is expected since no northbound connections should be established from the host nodes.
ovn-sbctl get-ssl /get-connection The output is stuck till i terminate it.
Yes, the ovn-* commands works only on engine's machine, which has the role ovn-central. On the hosts, there is only the ovn-controller, which connects the ovn southbound to openvswitch on the host.
For the requested logs the below are found in the ovsdb-server-sb.log
2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: connection dropped (Protocol error) 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: connection dropped (Protocol error) 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: connection dropped (Protocol error) 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: connection dropped (Protocol error) 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: error parsing stream: line 0, column 0, byte 0: invalid character U+0016 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: received SSL data on JSON-RPC channel 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: connection dropped (Protocol error) 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: connection dropped (Protocol error) 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: connection dropped (Protocol error) 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: connection dropped (Protocol error) 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: error parsing stream: line 0, column 0, byte 0: invalid character U+0016 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: received SSL data on JSON-RPC channel 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: connection dropped (Protocol error) 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: connection dropped (Protocol error) 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: connection dropped (Protocol error) 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: connection dropped (Protocol error) 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: connection dropped (Protocol error) 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: error parsing stream: line 0, column 0, byte 0: invalid character U+0016 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: received SSL data on JSON-RPC channel 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: connection dropped (Protocol error) 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: connection dropped (Protocol error)
How can we fix these SSL errors?
I addressed this above.
I thought vdsm did the certificate provisioning on the host nodes as to communicate to the engine host node.
Yes, this seems to work in your scenario, just the SSL configuration on the ovn-central was lost.
On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler <dholler@redhat.com> wrote:
Looks still like the ovn-controller on the host has problems communicating with ovn-southbound.
Are there any hints in /var/log/openvswitch/*.log, especially in /var/log/openvswitch/ovsdb-server-sb.log ?
Can you please check the output of
ovn-nbctl get-ssl ovn-nbctl get-connection ovn-sbctl get-ssl ovn-sbctl get-connection ls -l /etc/pki/ovirt-engine/keys/ovn-*
it should be similar to
[root@ovirt-43 ~]# ovn-nbctl get-ssl Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer CA Certificate: /etc/pki/ovirt-engine/ca.pem Bootstrap: false [root@ovirt-43 ~]# ovn-nbctl get-connection pssl:6641:[::] [root@ovirt-43 ~]# ovn-sbctl get-ssl Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer CA Certificate: /etc/pki/ovirt-engine/ca.pem Bootstrap: false [root@ovirt-43 ~]# ovn-sbctl get-connection read-write role="" pssl:6642:[::] [root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-* -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass -rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.p12 -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass -rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.p12
On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
I did a restart of the ovn-controller, this is the output of the ovn-controller.log
2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting... 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL reconnected, force recompute. 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting... 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL reconnected, force recompute. 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: unexpected SSL connection close 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error) 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting... 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: unexpected SSL connection close 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error) 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: waiting 2 seconds before reconnect 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting... 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: unexpected SSL connection close 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error) 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: waiting 4 seconds before reconnect 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting... 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: unexpected SSL connection close 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error) 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: continuing to reconnect in the background but suppressing further logging
I have also done the vdsm-tool ovn-config OVIRT_ENGINE_IP OVIRTMGMT_NETWORK_DC This is how the OVIRT_ENGINE_IP is provided in the ovn controller, i can redo it if you wan.
After the restart of the ovn-controller the OVIRT ENGINE still shows only two geneve connections one with DC01-host02 and DC02-host01. Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" hostname: "dc02-host01" Encap geneve ip: "DC02-host01_IP" options: {csum="true"} Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" hostname: "DC01-host02" Encap geneve ip: "DC01-host02" options: {csum="true"}
I've re-done the vdsm-tool command and nothing changed.... again....with the same errors as the systemctl restart ovn-controller
On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler <dholler@redhat.com> wrote:
Please include ovirt-users list in your reply, to share the knowledge and experience with the community!
On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
> Ok below the output per node and DC > DC01 > node01 > > [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . > external-ids:ovn-remote > "ssl:*OVIRT_ENGINE_IP*:6642" > [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . > external-ids:ovn-encap-type > geneve > [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . > external-ids:ovn-encap-ip > > "*OVIRTMGMT_IP_DC01-NODE01*" > > node02 > > [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . > external-ids:ovn-remote > "ssl:*OVIRT_ENGINE_IP*:6642" > [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . > external-ids:ovn-encap-type > geneve > [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . > external-ids:ovn-encap-ip > > "*OVIRTMGMT_IP_DC01-NODE02*" > > DC02 > node01 > > [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . > external-ids:ovn-remote > "ssl:*OVIRT_ENGINE_IP*:6642" > [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . > external-ids:ovn-encap-type > geneve > [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . > external-ids:ovn-encap-ip > > "*OVIRTMGMT_IP_DC02-NODE01*" > > Looks good.
> DC01 node01 and node02 share the same VM networks and VMs deployed > on top of them cannot talk to VM on the other hypervisor. >
Maybe there is a hint on ovn-controller.log on dc01-node02 ? Maybe restarting ovn-controller creates more helpful log messages?
You can also try restart the ovn configuration on all hosts by executing vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP on each host, this would trigger
https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... internally.
> So I would expect to see the same output for node01 to have a geneve > tunnel to node02 and vice versa. > > Me too.
> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler <dholler@redhat.com> > wrote: > >> >> >> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis < >> k.betsis@gmail.com> wrote: >> >>> Hi Dominik >>> >>> OVN is selected as the default network provider on the clusters >>> and the hosts. >>> >>> >> sounds good. >> This configuration is required already during the host is added to >> oVirt Engine, because OVN is configured during this step. >> >> >>> The "ovn-sbctl show" works on the ovirt engine and shows only two >>> hosts, 1 per DC. >>> >>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>> hostname: "dc01-node02" >>> Encap geneve >>> ip: "X.X.X.X" >>> options: {csum="true"} >>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>> hostname: "dc02-node1" >>> Encap geneve >>> ip: "A.A.A.A" >>> options: {csum="true"} >>> >>> >>> The new node is not listed (dc01-node1). >>> >>> When executed on the nodes the same command (ovn-sbctl show) >>> times-out on all nodes..... >>> >>> The output of the /var/log/openvswitch/ovn-conntroller.log lists >>> on all logs >>> >>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: >>> unexpected SSL connection close >>> >>> >>> >> Can you please compare the output of >> >> ovs-vsctl --no-wait get open . external-ids:ovn-remote >> ovs-vsctl --no-wait get open . external-ids:ovn-encap-type >> ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip >> >> of the working hosts, e.g. dc01-node02, and the failing host >> dc01-node1? >> This should point us the relevant difference in the configuration. >> >> Please include ovirt-users list in your replay, to share >> the knowledge and experience with the community. >> >> >> >>> Thank you >>> Best regards >>> Konstantinos Betsis >>> >>> >>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler < >>> dholler@redhat.com> wrote: >>> >>>> >>>> >>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B < >>>> k.betsis@gmail.com> wrote: >>>> >>>>> Hi all >>>>> >>>>> We have a small installation based on OVIRT 4.3. >>>>> 1 Cluster is based on Centos 7 and the other on OVIRT NG Node >>>>> image. >>>>> >>>>> The environment was stable till an upgrade took place a couple >>>>> of months ago. >>>>> As such we had to re-install one of the Centos 7 node and start >>>>> from scratch. >>>>> >>>> >>>> To trigger the automatic configuration of the host, it is >>>> required to configure ovirt-provider-ovn as the default network provider >>>> for the cluster before adding the host to oVirt. >>>> >>>> >>>>> Even though the installation completed successfully and VMs are >>>>> created, the following are not working as expected: >>>>> 1. ovn geneve tunnels are not established with the other Centos >>>>> 7 node in the cluster. >>>>> 2. Centos 7 node is configured by ovirt engine however no geneve >>>>> tunnel is established when "ovn-sbctl show" is issued on the engine. >>>>> >>>> >>>> Does "ovn-sbctl show" list the hosts? >>>> >>>> >>>>> 3. no flows are shown on the engine on port 6642 for the ovs db. >>>>> >>>>> Does anyone have any experience on how to troubleshoot OVN on >>>>> ovirt? >>>>> >>>>> >>>> /var/log/openvswitch/ovncontroller.log on the host should contain >>>> a helpful hint. 
>>>> >>>> >>>> >>>>> Thank you >>>>> _______________________________________________ >>>>> Users mailing list -- users@ovirt.org >>>>> To unsubscribe send an email to users-leave@ovirt.org >>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html >>>>> oVirt Code of Conduct: >>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>> List Archives: >>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EK... >>>>> >>>>

There is a file with the below entries:

[root@dc02-ovirt01 log]# cat /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf
# This file is automatically generated by engine-setup. Please do not edit manually
[OVN REMOTE]
ovn-remote=tcp:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_test*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com

The only entry missing is the [AUTH] section, and under [SSL] https-enabled is false. Should I edit this file, or is that going to break everything?

On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
That immediately fixed the geneve tunnels between all hosts.
thanks for the feedback.
However, the ovn provider is not broken. After fixing the networks we tried to move a VM to the DC01-host01 so we powered it down and simply configured it to run on dc01-node01.
While checking the logs on the ovirt engine i noticed the below: Failed to synchronize networks of Provider ovirt-provider-ovn.
The ovn-provider configure on the engine is the below: Name: ovirt-provider-ovn Description: oVirt network provider for OVN Type: External Network Provider Network Plugin: oVirt Network Provider for OVN Automatic Synchronization: Checked Unmanaged: Unchecked Provider URL: http:localhost:9696 Requires Authentication: Checked Username: admin@internal Password: "The admin password" Protocol: hTTP Host Name: dc02-ovirt01 API Port: 35357 API Version: v2.0 Tenant Name: "Empty"
In the past this was deleted by an engineer and recreated as per the documentation, and it worked. Do we need to update something due to the SSL on the ovn?
Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? engine-setup should have created one. If the file is missing, for testing purposes, you can create a file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf : [PROVIDER] provider-host=REPLACE_WITH_FQDN [SSL] ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem https-enabled=true [OVN REMOTE] ovn-remote=ssl:127.0.0.1:6641 [AUTH] auth-plugin=auth.plugins.static_token:NoAuthPlugin [NETWORK] port-security-enabled-default=True
and restart the ovirt-provider-ovn service.
From the ovn-provider logs the below is generated after a service restart and when the start VM is triggered
2020-09-15 15:07:33,579 root Starting server 2020-09-15 15:07:33,579 root Version: 1.2.29-1 2020-09-15 15:07:33,579 root Build date: 20191217125241 2020-09-15 15:07:33,579 root Githash: cb5a80d 2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 Request: GET /v2.0/ports 2020-09-15 15:08:26,582 root Could not retrieve schema from tcp: 127.0.0.1:6641: Unknown error -1 Traceback (most recent call last): File "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in _handle_request method, path_parts, content File "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in handle_request return self.call_response_handler(handler, content, parameters) File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line 35, in call_response_handler with NeutronApi() as ovn_north: File "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__ self.ovsidl, self.idl = ovn_connection.connect() File "/usr/share/ovirt-provider-ovn/ovn_connection.py", line 46, in connect ovnconst.OVN_NORTHBOUND File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 127, in from_server helper = idlutils.get_schema_helper(connection_string, schema_name) File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 128, in get_schema_helper 'err': os.strerror(err)}) Exception: Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1
When i update the ovn provider from the GUI to have https://localhost:9696/ and HTTPS as the protocol the test fails.
On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler <dholler@redhat.com> wrote:
On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
When these commands are used on the ovirt-engine host the output is the one depicted in your email. For your reference see also below:
[root@ath01-ovirt01 certs]# ovn-nbctl get-ssl Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer CA Certificate: /etc/pki/ovirt-engine/ca.pem Bootstrap: false [root@ath01-ovirt01 certs]# ovn-nbctl get-connection ptcp:6641
[root@ath01-ovirt01 certs]# ovn-sbctl get-ssl Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer CA Certificate: /etc/pki/ovirt-engine/ca.pem Bootstrap: false [root@ath01-ovirt01 certs]# ovn-sbctl get-connection read-write role="" ptcp:6642
^^^ the line above points to the problem: ovn-central is configured to use plain TCP without ssl. engine-setup usually configures ovn-central to use SSL. That the files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, that engine-setup was triggered correctly. Looks like the ovn db was dropped somehow, this should not happen. This can be fixed manually by executing the following commands on engine's machine: ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem ovn-nbctl set-connection pssl:6641 ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem ovn-sbctl set-connection pssl:6642
The /var/log/openvswitch/ovn-controller.log on the hosts should tell that br-int.mgmt is connected now.
[root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-* -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass -rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.p12 -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass -rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.p12
When i try the above commands on the node hosts the following happens: ovn-nbctl get-ssl / get-connection ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database connection failed (No such file or directory) The above i believe is expected since no northbound connections should be established from the host nodes.
ovn-sbctl get-ssl /get-connection The output is stuck till i terminate it.
Yes, the ovn-* commands works only on engine's machine, which has the role ovn-central. On the hosts, there is only the ovn-controller, which connects the ovn southbound to openvswitch on the host.
For the requested logs the below are found in the ovsdb-server-sb.log
2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: connection dropped (Protocol error) 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: connection dropped (Protocol error) 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: connection dropped (Protocol error) 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: connection dropped (Protocol error) 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: error parsing stream: line 0, column 0, byte 0: invalid character U+0016 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: received SSL data on JSON-RPC channel 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: connection dropped (Protocol error) 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: connection dropped (Protocol error) 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: connection dropped (Protocol error) 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: connection dropped (Protocol error) 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: error parsing stream: line 0, column 0, byte 0: invalid character U+0016 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: received SSL data on JSON-RPC channel 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: connection dropped (Protocol error) 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: connection dropped (Protocol error) 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: connection dropped (Protocol error) 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: connection dropped (Protocol error) 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: connection dropped (Protocol error) 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: error parsing stream: line 0, column 0, byte 0: invalid character U+0016 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: received SSL data on JSON-RPC channel 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: connection dropped (Protocol error) 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: connection dropped (Protocol error)
How can we fix these SSL errors?
I addressed this above.
I thought vdsm did the certificate provisioning on the host nodes as to communicate to the engine host node.
Yes, this seems to work in your scenario, just the SSL configuration on the ovn-central was lost.
On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler <dholler@redhat.com> wrote:
Looks still like the ovn-controller on the host has problems communicating with ovn-southbound.
Are there any hints in /var/log/openvswitch/*.log, especially in /var/log/openvswitch/ovsdb-server-sb.log ?
Can you please check the output of
ovn-nbctl get-ssl ovn-nbctl get-connection ovn-sbctl get-ssl ovn-sbctl get-connection ls -l /etc/pki/ovirt-engine/keys/ovn-*
it should be similar to
[root@ovirt-43 ~]# ovn-nbctl get-ssl Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer CA Certificate: /etc/pki/ovirt-engine/ca.pem Bootstrap: false [root@ovirt-43 ~]# ovn-nbctl get-connection pssl:6641:[::] [root@ovirt-43 ~]# ovn-sbctl get-ssl Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer CA Certificate: /etc/pki/ovirt-engine/ca.pem Bootstrap: false [root@ovirt-43 ~]# ovn-sbctl get-connection read-write role="" pssl:6642:[::] [root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-* -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass -rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.p12 -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass -rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.p12
On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
I did a restart of the ovn-controller, this is the output of the ovn-controller.log
2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting... 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL reconnected, force recompute. 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting... 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL reconnected, force recompute. 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: unexpected SSL connection close 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error) 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting... 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: unexpected SSL connection close 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error) 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: waiting 2 seconds before reconnect 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting... 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: unexpected SSL connection close 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error) 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: waiting 4 seconds before reconnect 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting... 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: unexpected SSL connection close 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error) 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: continuing to reconnect in the background but suppressing further logging
I have also done the vdsm-tool ovn-config OVIRT_ENGINE_IP OVIRTMGMT_NETWORK_DC This is how the OVIRT_ENGINE_IP is provided in the ovn controller, i can redo it if you wan.
After the restart of the ovn-controller the OVIRT ENGINE still shows only two geneve connections one with DC01-host02 and DC02-host01. Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" hostname: "dc02-host01" Encap geneve ip: "DC02-host01_IP" options: {csum="true"} Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" hostname: "DC01-host02" Encap geneve ip: "DC01-host02" options: {csum="true"}
I've re-done the vdsm-tool command and nothing changed.... again....with the same errors as the systemctl restart ovn-controller
On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler <dholler@redhat.com> wrote:
> Please include ovirt-users list in your reply, to share > the knowledge and experience with the community! > > On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis < > k.betsis@gmail.com> wrote: > >> Ok below the output per node and DC >> DC01 >> node01 >> >> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . >> external-ids:ovn-remote >> "ssl:*OVIRT_ENGINE_IP*:6642" >> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . >> external-ids:ovn-encap-type >> geneve >> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . >> external-ids:ovn-encap-ip >> >> "*OVIRTMGMT_IP_DC01-NODE01*" >> >> node02 >> >> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . >> external-ids:ovn-remote >> "ssl:*OVIRT_ENGINE_IP*:6642" >> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >> external-ids:ovn-encap-type >> geneve >> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >> external-ids:ovn-encap-ip >> >> "*OVIRTMGMT_IP_DC01-NODE02*" >> >> DC02 >> node01 >> >> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . >> external-ids:ovn-remote >> "ssl:*OVIRT_ENGINE_IP*:6642" >> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >> external-ids:ovn-encap-type >> geneve >> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >> external-ids:ovn-encap-ip >> >> "*OVIRTMGMT_IP_DC02-NODE01*" >> >> > Looks good. > > >> DC01 node01 and node02 share the same VM networks and VMs deployed >> on top of them cannot talk to VM on the other hypervisor. >> > > Maybe there is a hint on ovn-controller.log on dc01-node02 ? Maybe > restarting ovn-controller creates more helpful log messages? > > You can also try restart the ovn configuration on all hosts by > executing > vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP > on each host, this would trigger > > https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... > internally. > > >> So I would expect to see the same output for node01 to have a >> geneve tunnel to node02 and vice versa. >> >> > Me too. > > >> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler <dholler@redhat.com> >> wrote: >> >>> >>> >>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis < >>> k.betsis@gmail.com> wrote: >>> >>>> Hi Dominik >>>> >>>> OVN is selected as the default network provider on the clusters >>>> and the hosts. >>>> >>>> >>> sounds good. >>> This configuration is required already during the host is added to >>> oVirt Engine, because OVN is configured during this step. >>> >>> >>>> The "ovn-sbctl show" works on the ovirt engine and shows only two >>>> hosts, 1 per DC. >>>> >>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>> hostname: "dc01-node02" >>>> Encap geneve >>>> ip: "X.X.X.X" >>>> options: {csum="true"} >>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>> hostname: "dc02-node1" >>>> Encap geneve >>>> ip: "A.A.A.A" >>>> options: {csum="true"} >>>> >>>> >>>> The new node is not listed (dc01-node1). >>>> >>>> When executed on the nodes the same command (ovn-sbctl show) >>>> times-out on all nodes..... >>>> >>>> The output of the /var/log/openvswitch/ovn-conntroller.log lists >>>> on all logs >>>> >>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: >>>> unexpected SSL connection close >>>> >>>> >>>> >>> Can you please compare the output of >>> >>> ovs-vsctl --no-wait get open . external-ids:ovn-remote >>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-type >>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip >>> >>> of the working hosts, e.g. dc01-node02, and the failing host >>> dc01-node1? 
>>> This should point us the relevant difference in the configuration. >>> >>> Please include ovirt-users list in your replay, to share >>> the knowledge and experience with the community. >>> >>> >>> >>>> Thank you >>>> Best regards >>>> Konstantinos Betsis >>>> >>>> >>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler < >>>> dholler@redhat.com> wrote: >>>> >>>>> >>>>> >>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B < >>>>> k.betsis@gmail.com> wrote: >>>>> >>>>>> Hi all >>>>>> >>>>>> We have a small installation based on OVIRT 4.3. >>>>>> 1 Cluster is based on Centos 7 and the other on OVIRT NG Node >>>>>> image. >>>>>> >>>>>> The environment was stable till an upgrade took place a couple >>>>>> of months ago. >>>>>> As such we had to re-install one of the Centos 7 node and start >>>>>> from scratch. >>>>>> >>>>> >>>>> To trigger the automatic configuration of the host, it is >>>>> required to configure ovirt-provider-ovn as the default network provider >>>>> for the cluster before adding the host to oVirt. >>>>> >>>>> >>>>>> Even though the installation completed successfully and VMs are >>>>>> created, the following are not working as expected: >>>>>> 1. ovn geneve tunnels are not established with the other Centos >>>>>> 7 node in the cluster. >>>>>> 2. Centos 7 node is configured by ovirt engine however no >>>>>> geneve tunnel is established when "ovn-sbctl show" is issued on the engine. >>>>>> >>>>> >>>>> Does "ovn-sbctl show" list the hosts? >>>>> >>>>> >>>>>> 3. no flows are shown on the engine on port 6642 for the ovs db. >>>>>> >>>>>> Does anyone have any experience on how to troubleshoot OVN on >>>>>> ovirt? >>>>>> >>>>>> >>>>> /var/log/openvswitch/ovncontroller.log on the host should >>>>> contain a helpful hint. >>>>> >>>>> >>>>> >>>>>> Thank you >>>>>> _______________________________________________ >>>>>> Users mailing list -- users@ovirt.org >>>>>> To unsubscribe send an email to users-leave@ovirt.org >>>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html >>>>>> oVirt Code of Conduct: >>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>> List Archives: >>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EK... >>>>>> >>>>>

On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
There is a file with the below entries
Impressive, do you know when this config file was created and if it was manually modified? Is this an upgrade from oVirt 4.1?
[root@dc02-ovirt01 log]# cat /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf # This file is automatically generated by engine-setup. Please do not edit manually [OVN REMOTE] ovn-remote=tcp:127.0.0.1:6641 [SSL] https-enabled=false ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass [OVIRT] ovirt-sso-client-secret=*random_test* ovirt-host=https://dc02-ovirt01.testdomain.com:443 ovirt-sso-client-id=ovirt-provider-ovn ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem [NETWORK] port-security-enabled-default=True [PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
The only entry missing is the [AUTH] and under [SSL] the https-enabled is false. Should I edit this in this file or is this going to break everything?
Changing the file should improve things, but better create a backup in another directory before the modification. The only required change is from
ovn-remote=tcp:127.0.0.1:6641
to
ovn-remote=ssl:127.0.0.1:6641
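Something like the following does exactly that change, with the backup suggested above (a sketch; the backup location is arbitrary):

cp -a /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf /root/10-setup-ovirt-provider-ovn.conf.bak
sed -i 's|^ovn-remote=tcp:127.0.0.1:6641$|ovn-remote=ssl:127.0.0.1:6641|' /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf
systemctl restart ovirt-provider-ovn
# the provider log should no longer show "Could not retrieve schema from tcp:127.0.0.1:6641"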
On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
That immediately fixed the geneve tunnels between all hosts.
thanks for the feedback.
However, the ovn provider is not broken. After fixing the networks we tried to move a VM to the DC01-host01 so we powered it down and simply configured it to run on dc01-node01.
While checking the logs on the ovirt engine i noticed the below: Failed to synchronize networks of Provider ovirt-provider-ovn.
The ovn-provider configure on the engine is the below: Name: ovirt-provider-ovn Description: oVirt network provider for OVN Type: External Network Provider Network Plugin: oVirt Network Provider for OVN Automatic Synchronization: Checked Unmanaged: Unchecked Provider URL: http:localhost:9696 Requires Authentication: Checked Username: admin@internal Password: "The admin password" Protocol: hTTP Host Name: dc02-ovirt01 API Port: 35357 API Version: v2.0 Tenant Name: "Empty"
In the past this was deleted by an engineer and recreated as per the documentation, and it worked. Do we need to update something due to the SSL on the ovn?
Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? engine-setup should have created one. If the file is missing, for testing purposes, you can create a file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf : [PROVIDER] provider-host=REPLACE_WITH_FQDN [SSL] ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem https-enabled=true [OVN REMOTE] ovn-remote=ssl:127.0.0.1:6641 [AUTH] auth-plugin=auth.plugins.static_token:NoAuthPlugin [NETWORK] port-security-enabled-default=True
and restart the ovirt-provider-ovn service.
From the ovn-provider logs the below is generated after a service restart and when the start VM is triggered
2020-09-15 15:07:33,579 root Starting server 2020-09-15 15:07:33,579 root Version: 1.2.29-1 2020-09-15 15:07:33,579 root Build date: 20191217125241 2020-09-15 15:07:33,579 root Githash: cb5a80d 2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 Request: GET /v2.0/ports 2020-09-15 15:08:26,582 root Could not retrieve schema from tcp: 127.0.0.1:6641: Unknown error -1 Traceback (most recent call last): File "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in _handle_request method, path_parts, content File "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in handle_request return self.call_response_handler(handler, content, parameters) File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line 35, in call_response_handler with NeutronApi() as ovn_north: File "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__ self.ovsidl, self.idl = ovn_connection.connect() File "/usr/share/ovirt-provider-ovn/ovn_connection.py", line 46, in connect ovnconst.OVN_NORTHBOUND File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 127, in from_server helper = idlutils.get_schema_helper(connection_string, schema_name) File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 128, in get_schema_helper 'err': os.strerror(err)}) Exception: Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1
When i update the ovn provider from the GUI to have https://localhost:9696/ and HTTPS as the protocol the test fails.
On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler <dholler@redhat.com> wrote:
On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
When these commands are used on the ovirt-engine host the output is the one depicted in your email. For your reference see also below:
[root@ath01-ovirt01 certs]# ovn-nbctl get-ssl Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer CA Certificate: /etc/pki/ovirt-engine/ca.pem Bootstrap: false [root@ath01-ovirt01 certs]# ovn-nbctl get-connection ptcp:6641
[root@ath01-ovirt01 certs]# ovn-sbctl get-ssl Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer CA Certificate: /etc/pki/ovirt-engine/ca.pem Bootstrap: false [root@ath01-ovirt01 certs]# ovn-sbctl get-connection read-write role="" ptcp:6642
^^^ the line above points to the problem: ovn-central is configured to use plain TCP without ssl. engine-setup usually configures ovn-central to use SSL. That the files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, that engine-setup was triggered correctly. Looks like the ovn db was dropped somehow, this should not happen. This can be fixed manually by executing the following commands on engine's machine: ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem ovn-nbctl set-connection pssl:6641 ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem ovn-sbctl set-connection pssl:6642
The /var/log/openvswitch/ovn-controller.log on the hosts should tell that br-int.mgmt is connected now.
[root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-* -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass -rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.p12 -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass -rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.p12
When i try the above commands on the node hosts the following happens: ovn-nbctl get-ssl / get-connection ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database connection failed (No such file or directory) The above i believe is expected since no northbound connections should be established from the host nodes.
ovn-sbctl get-ssl /get-connection The output is stuck till i terminate it.
Yes, the ovn-* commands works only on engine's machine, which has the role ovn-central. On the hosts, there is only the ovn-controller, which connects the ovn southbound to openvswitch on the host.
For the requested logs the below are found in the ovsdb-server-sb.log
2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: connection dropped (Protocol error) 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: connection dropped (Protocol error) 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: connection dropped (Protocol error) 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: connection dropped (Protocol error) 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: error parsing stream: line 0, column 0, byte 0: invalid character U+0016 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: received SSL data on JSON-RPC channel 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: connection dropped (Protocol error) 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: connection dropped (Protocol error) 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: connection dropped (Protocol error) 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: connection dropped (Protocol error) 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: error parsing stream: line 0, column 0, byte 0: invalid character U+0016 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: received SSL data on JSON-RPC channel 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: connection dropped (Protocol error) 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: connection dropped (Protocol error) 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: connection dropped (Protocol error) 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: connection dropped (Protocol error) 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: connection dropped (Protocol error) 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: error parsing stream: line 0, column 0, byte 0: invalid character U+0016 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: received SSL data on JSON-RPC channel 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: connection dropped (Protocol error) 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: connection dropped (Protocol error)
How can we fix these SSL errors?
I addressed this above.
I thought vdsm did the certificate provisioning on the host nodes so that they can communicate with the engine host node.
Yes, this seems to work in your scenario, just the SSL configuration on the ovn-central was lost.
On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler <dholler@redhat.com> wrote:
It still looks like the ovn-controller on the host has problems communicating with ovn-southbound.
Are there any hints in /var/log/openvswitch/*.log, especially in /var/log/openvswitch/ovsdb-server-sb.log ?
Can you please check the output of
ovn-nbctl get-ssl
ovn-nbctl get-connection
ovn-sbctl get-ssl
ovn-sbctl get-connection
ls -l /etc/pki/ovirt-engine/keys/ovn-*
it should be similar to
[root@ovirt-43 ~]# ovn-nbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ovirt-43 ~]# ovn-nbctl get-connection
pssl:6641:[::]
[root@ovirt-43 ~]# ovn-sbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ovirt-43 ~]# ovn-sbctl get-connection
read-write role="" pssl:6642:[::]
[root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
-rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
-rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.p12
-rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
-rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.p12

This is the updated one:

# This file is automatically generated by engine-setup. Please do not edit manually
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=true
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_text*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
[AUTH]
auth-plugin=auth.plugins.static_token:NoAuthPlugin

However, it still does not connect. It prompts for the certificate, then fails and points to the log, but the ovirt-provider-ovn.log does not list anything.
Yes, we've had oVirt for about a year now, starting from about version 4.1.

On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
There is a file with the below entries
Impressive, do you know when this config file was created and if it was manually modified? Is this an upgrade from oVirt 4.1?
[root@dc02-ovirt01 log]# cat /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf
# This file is automatically generated by engine-setup. Please do not edit manually
[OVN REMOTE]
ovn-remote=tcp:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_test*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
The only entry missing is [AUTH], and under [SSL] https-enabled is false. Should I edit this file, or is that going to break everything?
Changing the file should improve things, but better create a backup into another directory before modification. The only required change is from ovn-remote=tcp:127.0.0.1:6641 to ovn-remote=ssl:127.0.0.1:6641.
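A minimal sketch of that change, assuming the file path shown above (back up to another directory first, as suggested):

# keep a copy outside conf.d so it is not picked up by the provider
cp -a /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf /root/
# switch the northbound remote from plain TCP to SSL
sed -i 's|^ovn-remote=tcp:|ovn-remote=ssl:|' /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf
systemctl restart ovirt-provider-ovn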
On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
That immediately fixed the geneve tunnels between all hosts.
thanks for the feedback.
However, the OVN provider is now broken. After fixing the networks we tried to move a VM to DC01-host01, so we powered it down and simply configured it to run on dc01-node01.
While checking the logs on the oVirt engine I noticed the below:
Failed to synchronize networks of Provider ovirt-provider-ovn.
The ovn-provider configured on the engine is the below:

Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: http:localhost:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: HTTP
Host Name: dc02-ovirt01
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"
In the past this was deleted by an engineer and recreated as per the documentation, and it worked. Do we need to update something due to the SSL on the ovn?
Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? engine-setup should have created one.
If the file is missing, for testing purposes, you can create a file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf :

[PROVIDER]
provider-host=REPLACE_WITH_FQDN
[SSL]
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
https-enabled=true
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[AUTH]
auth-plugin=auth.plugins.static_token:NoAuthPlugin
[NETWORK]
port-security-enabled-default=True
and restart the ovirt-provider-ovn service.
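A quick way to confirm the provider came back cleanly after the restart (the systemd unit name is assumed to be ovirt-provider-ovn):

systemctl restart ovirt-provider-ovn
systemctl --no-pager status ovirt-provider-ovn
# recent provider messages without needing the exact log file path
journalctl -u ovirt-provider-ovn -n 50 --no-pager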
From the ovn-provider logs, the below is generated after a service restart and when the VM start is triggered:
2020-09-15 15:07:33,579 root Starting server
2020-09-15 15:07:33,579 root Version: 1.2.29-1
2020-09-15 15:07:33,579 root Build date: 20191217125241
2020-09-15 15:07:33,579 root Githash: cb5a80d
2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 Request: GET /v2.0/ports
2020-09-15 15:08:26,582 root Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1
Traceback (most recent call last):
  File "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in _handle_request
    method, path_parts, content
  File "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in handle_request
    return self.call_response_handler(handler, content, parameters)
  File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line 35, in call_response_handler
    with NeutronApi() as ovn_north:
  File "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__
    self.ovsidl, self.idl = ovn_connection.connect()
  File "/usr/share/ovirt-provider-ovn/ovn_connection.py", line 46, in connect
    ovnconst.OVN_NORTHBOUND
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 127, in from_server
    helper = idlutils.get_schema_helper(connection_string, schema_name)
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 128, in get_schema_helper
    'err': os.strerror(err)})
Exception: Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1
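The traceback shows the provider still dialing tcp:127.0.0.1:6641 while the northbound DB was switched to SSL. A small sketch to confirm the mismatch on the engine:

# what the provider is configured to dial
grep -h '^ovn-remote' /etc/ovirt-provider-ovn/conf.d/*.conf
# what the northbound DB is actually listening on (pssl:6641 after the fix below)
ovn-nbctl get-connection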
When I update the ovn provider from the GUI to use https://localhost:9696/ and HTTPS as the protocol, the test fails.
On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler <dholler@redhat.com> wrote:
On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis < k.betsis@gmail.com> wrote:
Hi Dominik
When these commands are used on the ovirt-engine host the output is the one depicted in your email. For your reference see also below:
[root@ath01-ovirt01 certs]# ovn-nbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-nbctl get-connection
ptcp:6641
[root@ath01-ovirt01 certs]# ovn-sbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-sbctl get-connection
read-write role="" ptcp:6642
^^^ The line above points to the problem: ovn-central is configured to use plain TCP without SSL. engine-setup usually configures ovn-central to use SSL. The fact that the files /etc/pki/ovirt-engine/keys/ovn-* exist shows that engine-setup was triggered correctly. It looks like the OVN DB configuration was dropped somehow; this should not happen. This can be fixed manually by executing the following commands on the engine's machine:

ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem
ovn-sbctl set-connection pssl:6642
The /var/log/openvswitch/ovn-controller.log on the hosts should tell that br-int.mgmt is connected now.
[root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.p12
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.p12
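After applying the set-ssl/set-connection fix above, a short verification sketch (engine side first, then any host):

# engine: both OVN DBs should now report pssl listeners
ovn-nbctl get-connection
ovn-sbctl get-connection
ss -tlnp | grep -E ':(6641|6642)'
# host: ovn-controller should reconnect over SSL and attach to br-int
grep -E 'ssl:.*:6642|br-int' /var/log/openvswitch/ovn-controller.log | tail -n 5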

Can you try again with:

[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_test*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com

Please note that https-enabled should match the HTTP or HTTPS protocol of the ovirt-provider-ovn configuration in oVirt Engine. So if the ovirt-provider-ovn entity in Engine is on HTTP, the config file should use https-enabled=false.

On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Yes we've got ovirt for about a year now from about version 4.1
This might explain the trouble. Upgrade of ovirt-provider-ovn should work flawlessly starting from oVirt 4.2.
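To check which provider build is actually installed on the engine (standard oVirt package names, queried with rpm):

rpm -q ovirt-provider-ovn ovirt-engine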

Hi Dominik

Fixed the issue.
I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf needed an update as well. The package is upgraded to the latest version.
Once the provider was updated with the following, it functioned perfectly:

Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: https:dc02-ovirt01.testdomain.com:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: HTTPS
Host Name: dc02-ovirt01.testdomain.com
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"

For some reason the TLS certificate was in conflict with the ovn provider details; I would bet on the "host" entry.

So now geneve tunnels are established and the OVN provider is working.
But VMs still do not communicate on the same VM network spanning different hosts.
So if we have a VM network test-net on both dc01-host01 and dc01-host02, and each host has a VM with an IP address on that network, the VMs on the same VM network should communicate directly. But traffic does not reach the other side.

On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler <dholler@redhat.com> wrote:
Can you try again with:
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_test*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
Please note that https-enabled should match the HTTP or HTTPS protocol in the Provider URL of the ovirt-provider-ovn configuration in oVirt Engine. So if the ovirt-provider-ovn entity in Engine is on HTTP, the config file should use https-enabled=false.
On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
This is the updated one:
# This file is automatically generated by engine-setup. Please do not edit manually
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=true
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_text*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
[AUTH]
auth-plugin=auth.plugins.static_token:NoAuthPlugin
However, it still does not connect. It prompts for the certificate, then fails and points to the log, but the ovirt-provider-ovn.log does not list anything.
Yes, we've had oVirt for about a year now, starting from around version 4.1.
This might explain the trouble. Upgrade of ovirt-provider-ovn should work flawlessly starting from oVirt 4.2.
On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
There is a file with the below entries
Impressive, do you know when this config file was created and if it was manually modified? Is this an upgrade from oVirt 4.1?
[root@dc02-ovirt01 log]# cat /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf
# This file is automatically generated by engine-setup. Please do not edit manually
[OVN REMOTE]
ovn-remote=tcp:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_test*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
The only entry missing is [AUTH], and under [SSL] https-enabled is false. Should I edit this file, or is that going to break everything?
Changing the file should improve things, but better create a backup in another directory before modifying it. The only required change is from ovn-remote=tcp:127.0.0.1:6641 to ovn-remote=ssl:127.0.0.1:6641
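For example, something along these lines on the engine machine should do it (just a sketch; the file path is the one from the output above, and the service restart is an assumption so the provider re-reads its config):

  # backup outside conf.d, so the copy is not picked up as extra config
  cp /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf /root/10-setup-ovirt-provider-ovn.conf.bak
  # switch the northbound connection from plain TCP to SSL
  sed -i 's|ovn-remote=tcp:127.0.0.1:6641|ovn-remote=ssl:127.0.0.1:6641|' /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf
  systemctl restart ovirt-provider-ovn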
On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
Hi Dominik
That immediately fixed the geneve tunnels between all hosts.
thanks for the feedback.
However, the ovn provider is now broken.
After fixing the networks we tried to move a VM to DC01-host01, so we powered it down and simply configured it to run on dc01-node01.

While checking the logs on the ovirt engine I noticed the below:
Failed to synchronize networks of Provider ovirt-provider-ovn.

The ovn-provider configuration on the engine is the following:
Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: http:localhost:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: HTTP
Host Name: dc02-ovirt01
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"
In the past this was deleted by an engineer and recreated as per the documentation, and it worked. Do we need to update something due to the SSL on the ovn?
Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? engine-setup should have created one.
If the file is missing, for testing purposes, you can create a file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf :

[PROVIDER]
provider-host=REPLACE_WITH_FQDN
[SSL]
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
https-enabled=true
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[AUTH]
auth-plugin=auth.plugins.static_token:NoAuthPlugin
[NETWORK]
port-security-enabled-default=True
and restart the ovirt-provider-ovn service.
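For example (the log path below is an assumption, adjust if your installation logs elsewhere):

  systemctl restart ovirt-provider-ovn
  systemctl status ovirt-provider-ovn
  # assumed default log location of the provider
  tail -n 50 /var/log/ovirt-provider-ovn/ovirt-provider-ovn.log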
From the ovn-provider logs, the below is generated after a service restart and when the VM start is triggered:
2020-09-15 15:07:33,579 root Starting server
2020-09-15 15:07:33,579 root Version: 1.2.29-1
2020-09-15 15:07:33,579 root Build date: 20191217125241
2020-09-15 15:07:33,579 root Githash: cb5a80d
2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 Request: GET /v2.0/ports
2020-09-15 15:08:26,582 root Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1
Traceback (most recent call last):
  File "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in _handle_request
    method, path_parts, content
  File "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in handle_request
    return self.call_response_handler(handler, content, parameters)
  File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line 35, in call_response_handler
    with NeutronApi() as ovn_north:
  File "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__
    self.ovsidl, self.idl = ovn_connection.connect()
  File "/usr/share/ovirt-provider-ovn/ovn_connection.py", line 46, in connect
    ovnconst.OVN_NORTHBOUND
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 127, in from_server
    helper = idlutils.get_schema_helper(connection_string, schema_name)
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 128, in get_schema_helper
    'err': os.strerror(err)})
Exception: Could not retrieve schema from tcp:127.0.0.1:6641: Unknown error -1
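The exception above shows the provider still dialing tcp:127.0.0.1:6641. A quick cross-check of what the provider config and the northbound database actually say, on the engine, might be (a sketch, using the paths already mentioned in this thread):

  grep -r ovn-remote /etc/ovirt-provider-ovn/conf.d/
  ovn-nbctl get-connection    # what the northbound db is listening on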
When I update the ovn provider from the GUI to use https://localhost:9696/ and HTTPS as the protocol, the test fails.
On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler <dholler@redhat.com> wrote:
> > > On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis < > k.betsis@gmail.com> wrote: > >> Hi Dominik >> >> When these commands are used on the ovirt-engine host the output is >> the one depicted in your email. >> For your reference see also below: >> >> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl >> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >> CA Certificate: /etc/pki/ovirt-engine/ca.pem >> Bootstrap: false >> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection >> ptcp:6641 >> >> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl >> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >> CA Certificate: /etc/pki/ovirt-engine/ca.pem >> Bootstrap: false >> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection >> read-write role="" ptcp:6642 >> >> > ^^^ the line above points to the problem: ovn-central is configured > to use plain TCP without ssl. > engine-setup usually configures ovn-central to use SSL. That the > files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, > that engine-setup was triggered correctly. Looks like the ovn db was > dropped somehow, this should not happen. > This can be fixed manually by executing the following commands on > engine's machine: > ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass > /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem > ovn-nbctl set-connection pssl:6641 > ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass > /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem > ovn-sbctl set-connection pssl:6642 > > The /var/log/openvswitch/ovn-controller.log on the hosts should tell > that br-int.mgmt is connected now. > > > >> [root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-* >> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >> -rw-------. 1 root root 2893 Jun 25 11:08 >> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >> -rw-------. 1 root root 2893 Jun 25 11:08 >> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >> >> When i try the above commands on the node hosts the following >> happens: >> ovn-nbctl get-ssl / get-connection >> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database >> connection failed (No such file or directory) >> The above i believe is expected since no northbound connections >> should be established from the host nodes. >> >> ovn-sbctl get-ssl /get-connection >> The output is stuck till i terminate it. >> >> > Yes, the ovn-* commands works only on engine's machine, which has > the role ovn-central. > On the hosts, there is only the ovn-controller, which connects the > ovn southbound to openvswitch on the host. 
> > >> For the requested logs the below are found in the >> ovsdb-server-sb.log >> >> 2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: >> connection dropped (Protocol error) >> 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: >> connection dropped (Protocol error) >> 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: >> connection dropped (Protocol error) >> 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: >> connection dropped (Protocol error) >> 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log messages >> in last 12 seconds (most recently, 4 seconds ago) due to excessive rate >> 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: >> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >> 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log messages >> in last 12 seconds (most recently, 4 seconds ago) due to excessive rate >> 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: >> received SSL data on JSON-RPC channel >> 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: >> connection dropped (Protocol error) >> 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: >> connection dropped (Protocol error) >> 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: >> connection dropped (Protocol error) >> 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: >> connection dropped (Protocol error) >> 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log messages >> in last 8 seconds (most recently, 1 seconds ago) due to excessive rate >> 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: >> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >> 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log messages >> in last 8 seconds (most recently, 1 seconds ago) due to excessive rate >> 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: >> received SSL data on JSON-RPC channel >> 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: >> connection dropped (Protocol error) >> 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: >> connection dropped (Protocol error) >> 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: >> connection dropped (Protocol error) >> 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: >> connection dropped (Protocol error) >> 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: >> connection dropped (Protocol error) >> 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log messages >> in last 12 seconds (most recently, 4 seconds ago) due to excessive rate >> 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: >> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >> 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log messages >> in last 12 seconds (most recently, 4 seconds ago) due to excessive rate >> 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: >> received SSL data on JSON-RPC channel >> 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: >> connection dropped (Protocol error) >> 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: >> connection dropped (Protocol error) >> >> >> How can we fix these SSL errors? >> > > I addressed this above. 
> > >> I thought vdsm did the certificate provisioning on the host nodes >> as to communicate to the engine host node. >> >> > Yes, this seems to work in your scenario, just the SSL configuration > on the ovn-central was lost. > > >> On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler <dholler@redhat.com> >> wrote: >> >>> Looks still like the ovn-controller on the host has problems >>> communicating with ovn-southbound. >>> >>> Are there any hints in /var/log/openvswitch/*.log, >>> especially in /var/log/openvswitch/ovsdb-server-sb.log ? >>> >>> Can you please check the output of >>> >>> ovn-nbctl get-ssl >>> ovn-nbctl get-connection >>> ovn-sbctl get-ssl >>> ovn-sbctl get-connection >>> ls -l /etc/pki/ovirt-engine/keys/ovn-* >>> >>> it should be similar to >>> >>> [root@ovirt-43 ~]# ovn-nbctl get-ssl >>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>> Bootstrap: false >>> [root@ovirt-43 ~]# ovn-nbctl get-connection >>> pssl:6641:[::] >>> [root@ovirt-43 ~]# ovn-sbctl get-ssl >>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>> Bootstrap: false >>> [root@ovirt-43 ~]# ovn-sbctl get-connection >>> read-write role="" pssl:6642:[::] >>> [root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-* >>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>> -rw-------. 1 root root 2709 Oct 14 2019 >>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>> -rw-------. 1 root root 2709 Oct 14 2019 >>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>> >>> >>> >>> >>> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis < >>> k.betsis@gmail.com> wrote: >>> >>>> I did a restart of the ovn-controller, this is the output of the >>>> ovn-controller.log >>>> >>>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file >>>> /var/log/openvswitch/ovn-controller.log >>>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>> connecting... >>>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>> connected >>>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL reconnected, >>>> force recompute. >>>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>> connecting... >>>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL reconnected, >>>> force recompute. >>>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: >>>> unexpected SSL connection close >>>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>> connection attempt failed (Protocol error) >>>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>> connecting... >>>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: >>>> unexpected SSL connection close >>>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>> connection attempt failed (Protocol error) >>>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>> waiting 2 seconds before reconnect >>>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>> connecting... 

On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Can you create a new external network, with port security disabled, and an IPv4 subnet? If the VMs get an IP address via DHCP, OVN is working, and the VMs should be able to ping each other, too. If not, there should be a helpful entry in the ovn-controller.log of the host the VM is running on.
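For example, on the engine something like this should show whether the new network, its subnet and the DHCP options actually reached the OVN northbound DB (a sketch; the switch names are whatever the provider created):

  ovn-nbctl ls-list              # logical switches known to OVN
  ovn-nbctl list DHCP_Options    # DHCP options created for the subnet
  ovn-nbctl show                 # switches with their ports

and inside one of the test VMs, "ip addr show" should tell whether a DHCP lease was received.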

So a new test-net was created under DC01 and it shows up in the networks tab under both DC01 and DC02. I believe networks are for some reason duplicated across DCs, maybe for future use? Don't know. If one tries to delete the network from the other DC it gets an error, while if it is deleted from the DC it was initially created in, it gets deleted from both.

From DC01-node02 I get the following errors:

2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c

And this repeats forever. The connection to ovn-sbctl is OK and the geneve tunnels show up under ovs-vsctl. VMs are still not able to ping each other.
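A couple of checks that might help narrow down the claiming loop (this is a guess, not something confirmed in the thread: repeated "OVNSB commit failed" plus re-claiming of the same lports can happen when two chassis keep taking the same ports from each other, for example because two hosts ended up with the same OVS system-id):

  # on the engine: look for duplicate or stale chassis entries
  ovn-sbctl list Chassis
  # on each host: the system-id should be unique per host
  ovs-vsctl get Open_vSwitch . external-ids:system-id
  # on the engine: see which chassis each logical port is currently bound to
  ovn-sbctl list Port_Binding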
On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
Fixed the issue.
Thanks.
I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf needed updating as well. The package was upgraded to the latest version.
Once the provider was updated with the following it functioned perfectly:
Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: https:dc02-ovirt01.testdomain.com:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: HTTPS
Host Name: dc02-ovirt01.testdomain.com
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"
For some reason the TLS certificate was in conflict with the OVN provider details; I would bet it was the "host" entry.
So now geneve tunnels are established. OVN provider is working.
But VMs still do not communicate on the same VM network when it spans different hosts.
So if we have a VM network test-net on both dc01-host01 and dc01-host02, and each host has a VM with an IP address on that network, the VMs on the same VM network should communicate directly. But traffic from one VM does not reach the other.
Can you create a new external network, with port security disabled, and an IPv4 subnet? If the VMs get an IP address via DHCP, OVN is working, and the VMs should be able to ping each other, too. If not, there should be a helpful entry in the ovn-controller.log of the host the VM is running on.
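If it helps to double-check that test, here is a rough sketch of what to look at (only a sketch, not an exact procedure): on the engine, whether the subnet created a DHCP_Options row and which ports are bound to which chassis, and on the host, whether tunnel traffic flows at all while the VMs ping each other (assuming the default Geneve UDP port 6081):

# on the engine (ovn-central)
ovn-nbctl show
ovn-nbctl list DHCP_Options
ovn-sbctl list Port_Binding

# on the host running one of the VMs, while pinging from the other VM
tcpdump -nn -i any udp port 6081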
On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler <dholler@redhat.com> wrote:
Can you try again with:
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_test*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
Please note that the https-enabled setting should match the HTTP or HTTPS protocol of the ovirt-provider-ovn configuration in oVirt Engine. So if the ovirt-provider-ovn entity in Engine is on HTTP, the config file should use https-enabled=false
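After changing the file, something like the following should show whether the provider comes up cleanly (just a sketch; the log path is the usual location for ovirt-provider-ovn, adjust if yours differs):

systemctl restart ovirt-provider-ovn
ss -tlnp | grep 9696
tail -n 50 /var/log/ovirt-provider-ovn/ovirt-provider-ovn.log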
On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
This is the updated one:
# This file is automatically generated by engine-setup. Please do not edit manually
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=true
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_text*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
[AUTH]
auth-plugin=auth.plugins.static_token:NoAuthPlugin
However, it still does not connect. The test prompts for the certificate but then fails and says to check the log, yet the ovirt-provider-ovn.log does not list anything.
Yes, we've had oVirt for about a year now, starting from around version 4.1.
This might explain the trouble. Upgrade of ovirt-provider-ovn should work flawlessly starting from oVirt 4.2.
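If it helps to confirm what is actually installed after the upgrades, something like this should do (package names as shipped in the oVirt 4.2/4.3 repositories):

rpm -q ovirt-provider-ovn          # on the engine
rpm -q ovirt-provider-ovn-driver   # on each host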
On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
There is a file with the below entries
Impressive, do you know when this config file was created and if it was manually modified? Is this an upgrade from oVirt 4.1?
[root@dc02-ovirt01 log]# cat /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf
# This file is automatically generated by engine-setup. Please do not edit manually
[OVN REMOTE]
ovn-remote=tcp:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_test*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
The only entry missing is the [AUTH] section, and under [SSL] the https-enabled is false. Should I edit this file, or is that going to break everything?
Changing the file should improve things, but better create a backup in another directory before modification. The only required change is from ovn-remote=tcp:127.0.0.1:6641 to ovn-remote=ssl:127.0.0.1:6641
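For example, something along these lines (just a sketch; adjust the backup location to taste):

cp /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf /root/10-setup-ovirt-provider-ovn.conf.bak
sed -i 's|^ovn-remote=tcp:127.0.0.1:6641$|ovn-remote=ssl:127.0.0.1:6641|' /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf
systemctl restart ovirt-provider-ovn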
On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler <dholler@redhat.com> wrote:
> > > On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis < > k.betsis@gmail.com> wrote: > >> Hi Dominik >> >> That immediately fixed the geneve tunnels between all hosts. >> >> > thanks for the feedback. > > >> However, the ovn provider is not broken. >> After fixing the networks we tried to move a VM to the DC01-host01 >> so we powered it down and simply configured it to run on dc01-node01. >> >> While checking the logs on the ovirt engine i noticed the below: >> Failed to synchronize networks of Provider ovirt-provider-ovn. >> >> The ovn-provider configure on the engine is the below: >> Name: ovirt-provider-ovn >> Description: oVirt network provider for OVN >> Type: External Network Provider >> Network Plugin: oVirt Network Provider for OVN >> Automatic Synchronization: Checked >> Unmanaged: Unchecked >> Provider URL: http:localhost:9696 >> Requires Authentication: Checked >> Username: admin@internal >> Password: "The admin password" >> Protocol: hTTP >> Host Name: dc02-ovirt01 >> API Port: 35357 >> API Version: v2.0 >> Tenant Name: "Empty" >> >> In the past this was deleted by an engineer and recreated as per >> the documentation, and it worked. Do we need to update something due to the >> SSL on the ovn? >> >> > Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? > engine-setup should have created one. > If the file is missing, for testing purposes, you can create a > file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf : > [PROVIDER] > provider-host=REPLACE_WITH_FQDN > [SSL] > ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer > ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass > ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem > https-enabled=true > [OVN REMOTE] > ovn-remote=ssl:127.0.0.1:6641 > [AUTH] > auth-plugin=auth.plugins.static_token:NoAuthPlugin > [NETWORK] > port-security-enabled-default=True > > and restart the ovirt-provider-ovn service. 
> > > > >> From the ovn-provider logs the below is generated after a service >> restart and when the start VM is triggered >> >> 2020-09-15 15:07:33,579 root Starting server >> 2020-09-15 15:07:33,579 root Version: 1.2.29-1 >> 2020-09-15 15:07:33,579 root Build date: 20191217125241 >> 2020-09-15 15:07:33,579 root Githash: cb5a80d >> 2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 Request: >> GET /v2.0/ports >> 2020-09-15 15:08:26,582 root Could not retrieve schema from tcp: >> 127.0.0.1:6641: Unknown error -1 >> Traceback (most recent call last): >> File "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", >> line 138, in _handle_request >> method, path_parts, content >> File >> "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in >> handle_request >> return self.call_response_handler(handler, content, parameters) >> File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line >> 35, in call_response_handler >> with NeutronApi() as ovn_north: >> File "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line >> 95, in __init__ >> self.ovsidl, self.idl = ovn_connection.connect() >> File "/usr/share/ovirt-provider-ovn/ovn_connection.py", line 46, >> in connect >> ovnconst.OVN_NORTHBOUND >> File >> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", >> line 127, in from_server >> helper = idlutils.get_schema_helper(connection_string, >> schema_name) >> File >> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", >> line 128, in get_schema_helper >> 'err': os.strerror(err)}) >> Exception: Could not retrieve schema from tcp:127.0.0.1:6641: >> Unknown error -1 >> >> >> When i update the ovn provider from the GUI to have >> https://localhost:9696/ and HTTPS as the protocol the test fails. >> >> On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler <dholler@redhat.com> >> wrote: >> >>> >>> >>> On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis < >>> k.betsis@gmail.com> wrote: >>> >>>> Hi Dominik >>>> >>>> When these commands are used on the ovirt-engine host the output >>>> is the one depicted in your email. >>>> For your reference see also below: >>>> >>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl >>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>> Bootstrap: false >>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection >>>> ptcp:6641 >>>> >>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl >>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>> Bootstrap: false >>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection >>>> read-write role="" ptcp:6642 >>>> >>>> >>> ^^^ the line above points to the problem: ovn-central is >>> configured to use plain TCP without ssl. >>> engine-setup usually configures ovn-central to use SSL. That the >>> files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, >>> that engine-setup was triggered correctly. Looks like the ovn db >>> was dropped somehow, this should not happen. 
>>> This can be fixed manually by executing the following commands on >>> engine's machine: >>> ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>> /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem >>> ovn-nbctl set-connection pssl:6641 >>> ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>> /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem >>> ovn-sbctl set-connection pssl:6642 >>> >>> The /var/log/openvswitch/ovn-controller.log on the hosts should >>> tell that br-int.mgmt is connected now. >>> >>> >>> >>>> [root@ath01-ovirt01 certs]# ls -l >>>> /etc/pki/ovirt-engine/keys/ovn-* >>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>> >>>> When i try the above commands on the node hosts the following >>>> happens: >>>> ovn-nbctl get-ssl / get-connection >>>> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database >>>> connection failed (No such file or directory) >>>> The above i believe is expected since no northbound connections >>>> should be established from the host nodes. >>>> >>>> ovn-sbctl get-ssl /get-connection >>>> The output is stuck till i terminate it. >>>> >>>> >>> Yes, the ovn-* commands works only on engine's machine, which has >>> the role ovn-central. >>> On the hosts, there is only the ovn-controller, which connects the >>> ovn southbound to openvswitch on the host. >>> >>> >>>> For the requested logs the below are found in the >>>> ovsdb-server-sb.log >>>> >>>> 2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log >>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>> rate >>>> 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: >>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>> 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log >>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>> rate >>>> 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: >>>> received SSL data on JSON-RPC channel >>>> 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log >>>> messages in last 8 seconds (most recently, 1 seconds ago) due to excessive >>>> rate >>>> 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: >>>> error 
parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>> 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log >>>> messages in last 8 seconds (most recently, 1 seconds ago) due to excessive >>>> rate >>>> 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: >>>> received SSL data on JSON-RPC channel >>>> 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log >>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>> rate >>>> 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: >>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>> 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log >>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>> rate >>>> 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: >>>> received SSL data on JSON-RPC channel >>>> 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: >>>> connection dropped (Protocol error) >>>> 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: >>>> connection dropped (Protocol error) >>>> >>>> >>>> How can we fix these SSL errors? >>>> >>> >>> I addressed this above. >>> >>> >>>> I thought vdsm did the certificate provisioning on the host nodes >>>> as to communicate to the engine host node. >>>> >>>> >>> Yes, this seems to work in your scenario, just the SSL >>> configuration on the ovn-central was lost. >>> >>> >>>> On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler < >>>> dholler@redhat.com> wrote: >>>> >>>>> Looks still like the ovn-controller on the host has problems >>>>> communicating with ovn-southbound. >>>>> >>>>> Are there any hints in /var/log/openvswitch/*.log, >>>>> especially in /var/log/openvswitch/ovsdb-server-sb.log ? >>>>> >>>>> Can you please check the output of >>>>> >>>>> ovn-nbctl get-ssl >>>>> ovn-nbctl get-connection >>>>> ovn-sbctl get-ssl >>>>> ovn-sbctl get-connection >>>>> ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>> >>>>> it should be similar to >>>>> >>>>> [root@ovirt-43 ~]# ovn-nbctl get-ssl >>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>> Bootstrap: false >>>>> [root@ovirt-43 ~]# ovn-nbctl get-connection >>>>> pssl:6641:[::] >>>>> [root@ovirt-43 ~]# ovn-sbctl get-ssl >>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>> Bootstrap: false >>>>> [root@ovirt-43 ~]# ovn-sbctl get-connection >>>>> read-write role="" pssl:6642:[::] >>>>> [root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>> -rw-------. 
1 root root 2709 Oct 14 2019 >>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis < >>>>> k.betsis@gmail.com> wrote: >>>>> >>>>>> I did a restart of the ovn-controller, this is the output of >>>>>> the ovn-controller.log >>>>>> >>>>>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file >>>>>> /var/log/openvswitch/ovn-controller.log >>>>>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>> connecting... >>>>>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>> connected >>>>>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL reconnected, >>>>>> force recompute. >>>>>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>> connecting... >>>>>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL reconnected, >>>>>> force recompute. >>>>>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: >>>>>> unexpected SSL connection close >>>>>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>> connection attempt failed (Protocol error) >>>>>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>> connecting... >>>>>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: >>>>>> unexpected SSL connection close >>>>>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>> connection attempt failed (Protocol error) >>>>>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>> waiting 2 seconds before reconnect >>>>>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>> connecting... >>>>>> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: >>>>>> unexpected SSL connection close >>>>>> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>> connection attempt failed (Protocol error) >>>>>> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>> waiting 4 seconds before reconnect >>>>>> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>> connecting... >>>>>> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: >>>>>> unexpected SSL connection close >>>>>> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>> connection attempt failed (Protocol error) >>>>>> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>> continuing to reconnect in the background but suppressing further logging >>>>>> >>>>>> >>>>>> I have also done the vdsm-tool ovn-config OVIRT_ENGINE_IP >>>>>> OVIRTMGMT_NETWORK_DC >>>>>> This is how the OVIRT_ENGINE_IP is provided in the ovn >>>>>> controller, i can redo it if you wan. >>>>>> >>>>>> After the restart of the ovn-controller the OVIRT ENGINE still >>>>>> shows only two geneve connections one with DC01-host02 and DC02-host01. >>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>> hostname: "dc02-host01" >>>>>> Encap geneve >>>>>> ip: "DC02-host01_IP" >>>>>> options: {csum="true"} >>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>> hostname: "DC01-host02" >>>>>> Encap geneve >>>>>> ip: "DC01-host02" >>>>>> options: {csum="true"} >>>>>> >>>>>> I've re-done the vdsm-tool command and nothing changed.... 
>>>>>> again....with the same errors as the systemctl restart ovn-controller >>>>>> >>>>>> On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler < >>>>>> dholler@redhat.com> wrote: >>>>>> >>>>>>> Please include ovirt-users list in your reply, to share >>>>>>> the knowledge and experience with the community! >>>>>>> >>>>>>> On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis < >>>>>>> k.betsis@gmail.com> wrote: >>>>>>> >>>>>>>> Ok below the output per node and DC >>>>>>>> DC01 >>>>>>>> node01 >>>>>>>> >>>>>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>> external-ids:ovn-remote >>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>> external-ids:ovn-encap-type >>>>>>>> geneve >>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>> external-ids:ovn-encap-ip >>>>>>>> >>>>>>>> "*OVIRTMGMT_IP_DC01-NODE01*" >>>>>>>> >>>>>>>> node02 >>>>>>>> >>>>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>> external-ids:ovn-remote >>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>> external-ids:ovn-encap-type >>>>>>>> geneve >>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>> external-ids:ovn-encap-ip >>>>>>>> >>>>>>>> "*OVIRTMGMT_IP_DC01-NODE02*" >>>>>>>> >>>>>>>> DC02 >>>>>>>> node01 >>>>>>>> >>>>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>> external-ids:ovn-remote >>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>> external-ids:ovn-encap-type >>>>>>>> geneve >>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>> external-ids:ovn-encap-ip >>>>>>>> >>>>>>>> "*OVIRTMGMT_IP_DC02-NODE01*" >>>>>>>> >>>>>>>> >>>>>>> Looks good. >>>>>>> >>>>>>> >>>>>>>> DC01 node01 and node02 share the same VM networks and VMs >>>>>>>> deployed on top of them cannot talk to VM on the other hypervisor. >>>>>>>> >>>>>>> >>>>>>> Maybe there is a hint on ovn-controller.log on dc01-node02 ? >>>>>>> Maybe restarting ovn-controller creates more helpful log messages? >>>>>>> >>>>>>> You can also try restart the ovn configuration on all hosts by >>>>>>> executing >>>>>>> vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP >>>>>>> on each host, this would trigger >>>>>>> >>>>>>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... >>>>>>> internally. >>>>>>> >>>>>>> >>>>>>>> So I would expect to see the same output for node01 to have a >>>>>>>> geneve tunnel to node02 and vice versa. >>>>>>>> >>>>>>>> >>>>>>> Me too. >>>>>>> >>>>>>> >>>>>>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler < >>>>>>>> dholler@redhat.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis < >>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Dominik >>>>>>>>>> >>>>>>>>>> OVN is selected as the default network provider on the >>>>>>>>>> clusters and the hosts. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> sounds good. >>>>>>>>> This configuration is required already during the host is >>>>>>>>> added to oVirt Engine, because OVN is configured during this step. >>>>>>>>> >>>>>>>>> >>>>>>>>>> The "ovn-sbctl show" works on the ovirt engine and shows >>>>>>>>>> only two hosts, 1 per DC. 
>>>>>>>>>> >>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>> hostname: "dc01-node02" >>>>>>>>>> Encap geneve >>>>>>>>>> ip: "X.X.X.X" >>>>>>>>>> options: {csum="true"} >>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>> hostname: "dc02-node1" >>>>>>>>>> Encap geneve >>>>>>>>>> ip: "A.A.A.A" >>>>>>>>>> options: {csum="true"} >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The new node is not listed (dc01-node1). >>>>>>>>>> >>>>>>>>>> When executed on the nodes the same command (ovn-sbctl >>>>>>>>>> show) times-out on all nodes..... >>>>>>>>>> >>>>>>>>>> The output of the /var/log/openvswitch/ovn-conntroller.log >>>>>>>>>> lists on all logs >>>>>>>>>> >>>>>>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: >>>>>>>>>> unexpected SSL connection close >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Can you please compare the output of >>>>>>>>> >>>>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-remote >>>>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-type >>>>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip >>>>>>>>> >>>>>>>>> of the working hosts, e.g. dc01-node02, and the failing host >>>>>>>>> dc01-node1? >>>>>>>>> This should point us the relevant difference in the >>>>>>>>> configuration. >>>>>>>>> >>>>>>>>> Please include ovirt-users list in your replay, to share >>>>>>>>> the knowledge and experience with the community. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thank you >>>>>>>>>> Best regards >>>>>>>>>> Konstantinos Betsis >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler < >>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B < >>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi all >>>>>>>>>>>> >>>>>>>>>>>> We have a small installation based on OVIRT 4.3. >>>>>>>>>>>> 1 Cluster is based on Centos 7 and the other on OVIRT NG >>>>>>>>>>>> Node image. >>>>>>>>>>>> >>>>>>>>>>>> The environment was stable till an upgrade took place a >>>>>>>>>>>> couple of months ago. >>>>>>>>>>>> As such we had to re-install one of the Centos 7 node and >>>>>>>>>>>> start from scratch. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> To trigger the automatic configuration of the host, it is >>>>>>>>>>> required to configure ovirt-provider-ovn as the default network provider >>>>>>>>>>> for the cluster before adding the host to oVirt. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Even though the installation completed successfully and >>>>>>>>>>>> VMs are created, the following are not working as expected: >>>>>>>>>>>> 1. ovn geneve tunnels are not established with the other >>>>>>>>>>>> Centos 7 node in the cluster. >>>>>>>>>>>> 2. Centos 7 node is configured by ovirt engine however no >>>>>>>>>>>> geneve tunnel is established when "ovn-sbctl show" is issued on the engine. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Does "ovn-sbctl show" list the hosts? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> 3. no flows are shown on the engine on port 6642 for the >>>>>>>>>>>> ovs db. >>>>>>>>>>>> >>>>>>>>>>>> Does anyone have any experience on how to troubleshoot >>>>>>>>>>>> OVN on ovirt? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> /var/log/openvswitch/ovncontroller.log on the host should >>>>>>>>>>> contain a helpful hint. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Thank you >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Users mailing list -- users@ovirt.org >>>>>>>>>>>> To unsubscribe send an email to users-leave@ovirt.org >>>>>>>>>>>> Privacy Statement: >>>>>>>>>>>> https://www.ovirt.org/privacy-policy.html >>>>>>>>>>>> oVirt Code of Conduct: >>>>>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>>>>>>>> List Archives: >>>>>>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EK... >>>>>>>>>>>> >>>>>>>>>>>

On Tue, Sep 15, 2020 at 6:53 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
So a new test-net was created under DC01 and it showed up in the networks tab under both DC01 and DC02. I believe for some reason networks are duplicated across DCs, maybe for future use? Don't know. If one tries to delete the network from the other DC it returns an error, while if it is deleted from the DC it was initially created in, it gets deleted from both.
In oVirt a logical network is an entity in a data center. If the automatic synchronization is enabled on the ovirt-provider-ovn entity in oVirt Engine, the OVN networks are reflected to all data centers. If you do not like this, you can disable the automatic synchronization of the ovirt-provider-ovn in Admin Portal.
From DC01-node02 I get the following errors:
2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
And this repeats forever.
Looks like the southbound db is confused. Can you try to delete all chassis listed by "sudo ovn-sbctl show" via "sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dev-host0"? If the script remove_chassis.sh is not installed, you can use https://github.com/oVirt/ovirt-provider-ovn/blob/master/provider/scripts/rem... instead. Can you please also share the output of "ovs-vsctl list Interface" on the host which produced the logfile above?
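If the script is not available in your version, a rough equivalent (only a sketch, not the script itself) is to drop the stale chassis records directly on the engine and let each host re-register; the UUIDs below are the ones from the earlier "ovn-sbctl show" output:

# on the engine (ovn-central)
ovn-sbctl show
ovn-sbctl chassis-del c4b23834-aec7-4bf8-8be7-aa94a50a6144
ovn-sbctl chassis-del be3abcc9-7358-4040-a37b-8d8a782f239c

# then on each host, so ovn-controller re-creates its chassis and tunnels
systemctl restart ovn-controller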
The connection to ovn-sbctl is OK and the Geneve tunnels show up in ovs-vsctl. VMs are still not able to ping each other.
On Tue, Sep 15, 2020 at 7:22 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
Fixed the issue.
Thanks.
I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf needed updating as well. The package was upgraded to the latest version.
Once the provider was updated with the following it functioned perfectly:
Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: https:dc02-ovirt01.testdomain.com:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: HTTPS
Host Name: dc02-ovirt01.testdomain.com
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"
For some reason the TLS certificate was in conflict with the OVN provider details; I would bet it was the "host" entry.
So now geneve tunnels are established. OVN provider is working.
But VMs still do not communicate on the same VM network when it spans different hosts.
So if we have a VM network test-net on both dc01-host01 and dc01-host02, and each host has a VM with an IP address on that network, the VMs on the same VM network should communicate directly. But traffic from one VM does not reach the other.
Can you create a new external network, with port security disabled, and an IPv4 subnet? If the VMs get an IP address via DHCP, OVN is working, and the VMs should be able to ping each other, too. If not, there should be a helpful entry in the ovn-controller.log of the host the VM is running on.
On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler <dholler@redhat.com> wrote:
Can you try again with:
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_test*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
Please note that the https-enabled setting should match the HTTP or HTTPS protocol of the ovirt-provider-ovn configuration in oVirt Engine. So if the ovirt-provider-ovn entity in Engine is on HTTP, the config file should use https-enabled=false
On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
This is the updated one:
# This file is automatically generated by engine-setup. Please do not edit manually
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=true
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_text*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
[AUTH]
auth-plugin=auth.plugins.static_token:NoAuthPlugin
However, it still does not connect. The test prompts for the certificate but then fails and says to check the log, yet the ovirt-provider-ovn.log does not list anything.
Yes, we've had oVirt for about a year now, starting from around version 4.1.
This might explain the trouble. Upgrade of ovirt-provider-ovn should work flawlessly starting from oVirt 4.2.
On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
> There is a file with the below entries >
Impressive, do you know when this config file was created and if it was manually modified? Is this an upgrade from oVirt 4.1?
> [root@dc02-ovirt01 log]# cat > /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf > # This file is automatically generated by engine-setup. Please do > not edit manually > [OVN REMOTE] > ovn-remote=tcp:127.0.0.1:6641 > [SSL] > https-enabled=false > ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem > ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer > ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass > [OVIRT] > ovirt-sso-client-secret=*random_test* > ovirt-host=https://dc02-ovirt01.testdomain.com:443 > ovirt-sso-client-id=ovirt-provider-ovn > ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem > [NETWORK] > port-security-enabled-default=True > [PROVIDER] > > provider-host=dc02-ovirt01.testdomain.com > > The only entry missing is the [AUTH] and under [SSL] the > https-enabled is false. Should I edit this in this file or is this going to > break everything? > > Changing the file should improve, but better create a backup into another diretory before modification. The only required change is from ovn-remote=tcp:127.0.0.1:6641 to ovn-remote=ssl:127.0.0.1:6641
> On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler <dholler@redhat.com> > wrote: > >> >> >> On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis < >> k.betsis@gmail.com> wrote: >> >>> Hi Dominik >>> >>> That immediately fixed the geneve tunnels between all hosts. >>> >>> >> thanks for the feedback. >> >> >>> However, the ovn provider is not broken. >>> After fixing the networks we tried to move a VM to the DC01-host01 >>> so we powered it down and simply configured it to run on dc01-node01. >>> >>> While checking the logs on the ovirt engine i noticed the below: >>> Failed to synchronize networks of Provider ovirt-provider-ovn. >>> >>> The ovn-provider configure on the engine is the below: >>> Name: ovirt-provider-ovn >>> Description: oVirt network provider for OVN >>> Type: External Network Provider >>> Network Plugin: oVirt Network Provider for OVN >>> Automatic Synchronization: Checked >>> Unmanaged: Unchecked >>> Provider URL: http:localhost:9696 >>> Requires Authentication: Checked >>> Username: admin@internal >>> Password: "The admin password" >>> Protocol: hTTP >>> Host Name: dc02-ovirt01 >>> API Port: 35357 >>> API Version: v2.0 >>> Tenant Name: "Empty" >>> >>> In the past this was deleted by an engineer and recreated as per >>> the documentation, and it worked. Do we need to update something due to the >>> SSL on the ovn? >>> >>> >> Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? >> engine-setup should have created one. >> If the file is missing, for testing purposes, you can create a >> file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf : >> [PROVIDER] >> provider-host=REPLACE_WITH_FQDN >> [SSL] >> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >> >> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >> https-enabled=true >> [OVN REMOTE] >> ovn-remote=ssl:127.0.0.1:6641 >> [AUTH] >> auth-plugin=auth.plugins.static_token:NoAuthPlugin >> [NETWORK] >> port-security-enabled-default=True >> >> and restart the ovirt-provider-ovn service. 
>> >> >> >> >>> From the ovn-provider logs the below is generated after a service >>> restart and when the start VM is triggered >>> >>> 2020-09-15 15:07:33,579 root Starting server >>> 2020-09-15 15:07:33,579 root Version: 1.2.29-1 >>> 2020-09-15 15:07:33,579 root Build date: 20191217125241 >>> 2020-09-15 15:07:33,579 root Githash: cb5a80d >>> 2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 >>> Request: GET /v2.0/ports >>> 2020-09-15 15:08:26,582 root Could not retrieve schema from tcp: >>> 127.0.0.1:6641: Unknown error -1 >>> Traceback (most recent call last): >>> File "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", >>> line 138, in _handle_request >>> method, path_parts, content >>> File >>> "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in >>> handle_request >>> return self.call_response_handler(handler, content, parameters) >>> File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line >>> 35, in call_response_handler >>> with NeutronApi() as ovn_north: >>> File "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", >>> line 95, in __init__ >>> self.ovsidl, self.idl = ovn_connection.connect() >>> File "/usr/share/ovirt-provider-ovn/ovn_connection.py", line 46, >>> in connect >>> ovnconst.OVN_NORTHBOUND >>> File >>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", >>> line 127, in from_server >>> helper = idlutils.get_schema_helper(connection_string, >>> schema_name) >>> File >>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", >>> line 128, in get_schema_helper >>> 'err': os.strerror(err)}) >>> Exception: Could not retrieve schema from tcp:127.0.0.1:6641: >>> Unknown error -1 >>> >>> >>> When i update the ovn provider from the GUI to have >>> https://localhost:9696/ and HTTPS as the protocol the test fails. >>> >>> On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler <dholler@redhat.com> >>> wrote: >>> >>>> >>>> >>>> On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis < >>>> k.betsis@gmail.com> wrote: >>>> >>>>> Hi Dominik >>>>> >>>>> When these commands are used on the ovirt-engine host the output >>>>> is the one depicted in your email. >>>>> For your reference see also below: >>>>> >>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl >>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>> Bootstrap: false >>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection >>>>> ptcp:6641 >>>>> >>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl >>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>> Bootstrap: false >>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection >>>>> read-write role="" ptcp:6642 >>>>> >>>>> >>>> ^^^ the line above points to the problem: ovn-central is >>>> configured to use plain TCP without ssl. >>>> engine-setup usually configures ovn-central to use SSL. That the >>>> files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, >>>> that engine-setup was triggered correctly. Looks like the ovn db >>>> was dropped somehow, this should not happen. 
>>>> This can be fixed manually by executing the following commands on >>>> engine's machine: >>>> ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>> /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem >>>> ovn-nbctl set-connection pssl:6641 >>>> ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>> /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem >>>> ovn-sbctl set-connection pssl:6642 >>>> >>>> The /var/log/openvswitch/ovn-controller.log on the hosts should >>>> tell that br-int.mgmt is connected now. >>>> >>>> >>>> >>>>> [root@ath01-ovirt01 certs]# ls -l >>>>> /etc/pki/ovirt-engine/keys/ovn-* >>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>> >>>>> When i try the above commands on the node hosts the following >>>>> happens: >>>>> ovn-nbctl get-ssl / get-connection >>>>> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database >>>>> connection failed (No such file or directory) >>>>> The above i believe is expected since no northbound connections >>>>> should be established from the host nodes. >>>>> >>>>> ovn-sbctl get-ssl /get-connection >>>>> The output is stuck till i terminate it. >>>>> >>>>> >>>> Yes, the ovn-* commands works only on engine's machine, which has >>>> the role ovn-central. >>>> On the hosts, there is only the ovn-controller, which connects >>>> the ovn southbound to openvswitch on the host. >>>> >>>> >>>>> For the requested logs the below are found in the >>>>> ovsdb-server-sb.log >>>>> >>>>> 2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log >>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>> rate >>>>> 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>> 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log >>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>> rate >>>>> 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>> received SSL data on JSON-RPC channel >>>>> 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log >>>>> messages in last 8 seconds (most recently, 1 seconds ago) due to excessive >>>>> rate >>>>> 
2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>> 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log >>>>> messages in last 8 seconds (most recently, 1 seconds ago) due to excessive >>>>> rate >>>>> 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>> received SSL data on JSON-RPC channel >>>>> 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log >>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>> rate >>>>> 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>> 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log >>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>> rate >>>>> 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>> received SSL data on JSON-RPC channel >>>>> 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: >>>>> connection dropped (Protocol error) >>>>> 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: >>>>> connection dropped (Protocol error) >>>>> >>>>> >>>>> How can we fix these SSL errors? >>>>> >>>> >>>> I addressed this above. >>>> >>>> >>>>> I thought vdsm did the certificate provisioning on the host >>>>> nodes as to communicate to the engine host node. >>>>> >>>>> >>>> Yes, this seems to work in your scenario, just the SSL >>>> configuration on the ovn-central was lost. >>>> >>>> >>>>> On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler < >>>>> dholler@redhat.com> wrote: >>>>> >>>>>> Looks still like the ovn-controller on the host has problems >>>>>> communicating with ovn-southbound. >>>>>> >>>>>> Are there any hints in /var/log/openvswitch/*.log, >>>>>> especially in /var/log/openvswitch/ovsdb-server-sb.log ? >>>>>> >>>>>> Can you please check the output of >>>>>> >>>>>> ovn-nbctl get-ssl >>>>>> ovn-nbctl get-connection >>>>>> ovn-sbctl get-ssl >>>>>> ovn-sbctl get-connection >>>>>> ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>>> >>>>>> it should be similar to >>>>>> >>>>>> [root@ovirt-43 ~]# ovn-nbctl get-ssl >>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>> Bootstrap: false >>>>>> [root@ovirt-43 ~]# ovn-nbctl get-connection >>>>>> pssl:6641:[::] >>>>>> [root@ovirt-43 ~]# ovn-sbctl get-ssl >>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>> Bootstrap: false >>>>>> [root@ovirt-43 ~]# ovn-sbctl get-connection >>>>>> read-write role="" pssl:6642:[::] >>>>>> [root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>>> -rw-r-----. 
1 root hugetlbfs 1828 Oct 14 2019 >>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis < >>>>>> k.betsis@gmail.com> wrote: >>>>>> >>>>>>> I did a restart of the ovn-controller, this is the output of >>>>>>> the ovn-controller.log >>>>>>> >>>>>>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file >>>>>>> /var/log/openvswitch/ovn-controller.log >>>>>>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>> connecting... >>>>>>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>> connected >>>>>>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL reconnected, >>>>>>> force recompute. >>>>>>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>> connecting... >>>>>>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL >>>>>>> reconnected, force recompute. >>>>>>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: >>>>>>> unexpected SSL connection close >>>>>>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>> connection attempt failed (Protocol error) >>>>>>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>> connecting... >>>>>>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: >>>>>>> unexpected SSL connection close >>>>>>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>> connection attempt failed (Protocol error) >>>>>>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>> waiting 2 seconds before reconnect >>>>>>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>> connecting... >>>>>>> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: >>>>>>> unexpected SSL connection close >>>>>>> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>> connection attempt failed (Protocol error) >>>>>>> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>> waiting 4 seconds before reconnect >>>>>>> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>> connecting... >>>>>>> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: >>>>>>> unexpected SSL connection close >>>>>>> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>> connection attempt failed (Protocol error) >>>>>>> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>> continuing to reconnect in the background but suppressing further logging >>>>>>> >>>>>>> >>>>>>> I have also done the vdsm-tool ovn-config OVIRT_ENGINE_IP >>>>>>> OVIRTMGMT_NETWORK_DC >>>>>>> This is how the OVIRT_ENGINE_IP is provided in the ovn >>>>>>> controller, i can redo it if you wan. >>>>>>> >>>>>>> After the restart of the ovn-controller the OVIRT ENGINE still >>>>>>> shows only two geneve connections one with DC01-host02 and DC02-host01. 

Hi Dominik

Below is the output of the ovs-vsctl list interface:

_uuid : bdaf92c1-4389-4ddf-aab0-93975076ebb2 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:02", iface-id="5d03a7a5-82a1-40f9-b50c-353a26167fa3", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 34 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:02" mtu : 1442 mtu_request : [] name : "vnet6" ofport : 2 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=10828495, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=117713, tx_bytes=20771797, tx_dropped=0, tx_errors=0, tx_packets=106954} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""

_uuid : bad80911-3993-4085-a0b0-962b6c9156cd admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "fe:37:52:c4:cb:03" mtu : [] mtu_request : [] name : "ovn-c4b238-0" ofport : 7 ofport_request : [] options : {csum="true", key=flow, remote_ip="192.168.121.164"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve

_uuid : 8e7705d1-0b9d-4e30-8277-c339e7e1c27a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:0d", iface-id="b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7", iface-status=active, vm-id="8d73f333-bca4-4b32-9b87-2e7ee07eda84"} ifindex : 28 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:0d" mtu : 1442 mtu_request : [] name : "vnet0" ofport : 1 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=20609787, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=104535, tx_bytes=10830007, tx_dropped=0, tx_errors=0, tx_packets=117735} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""

_uuid : 86dcc68a-63e4-4445-9373-81c1f4502c17 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:10", iface-id="4e8d5636-4110-41b2-906d-f9b04c2e62cd", iface-status=active, vm-id="9a002a9b-5f09-4def-a531-d50ff683470b"} ifindex : 40 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:10" mtu : 1442 mtu_request : [] name : "vnet11" ofport : 10 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=3311352, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=51012, tx_bytes=5514116, tx_dropped=0, tx_errors=0, tx_packets=103456} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""

_uuid : e8d5e4a2-b9a0-4146-8d98-34713cb443de admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:15", iface-id="b88de6e4-6d77-4e42-b734-4cc676728910", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 37 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:15" mtu : 1442 mtu_request : [] name : "vnet9" ofport : 5 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4500, tx_dropped=0, tx_errors=0, tx_packets=74} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""

_uuid : 6a2974b3-cd72-4688-a630-0a7e9c779b21 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:17", iface-id="64681036-26e2-41d7-b73f-ab5302610145", iface-status=active, vm-id="bf0dc78c-dad5-41a0-914c-ae0da0f9a388"} ifindex : 41 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:17" mtu : 1442 mtu_request : [] name : "vnet12" ofport : 11 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=5513640, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=103450, tx_bytes=3311868, tx_dropped=0, tx_errors=0, tx_packets=51018} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""

_uuid : 44498e54-f122-41a0-a41a-7a88ba2dba9b admin_state : down bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 7 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : down lldp : {} mac : [] mac_in_use : "32:0a:69:67:07:4f" mtu : 1442 mtu_request : [] name : br-int ofport : 65534 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=326, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0} status : {driver_name=openvswitch} type : internal

_uuid : e2114584-8ceb-43d6-817b-e457738ead8a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:03", iface-id="16162721-c815-4cd8-ab57-f22e6e482c7f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 35 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:03" mtu : 1442 mtu_request : [] name : "vnet7" ofport : 3 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4730, tx_dropped=0, tx_errors=0, tx_packets=77} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""

_uuid : ee16943e-d145-4080-893f-464098a6388f admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "1e:50:3f:a8:42:d1" mtu : [] mtu_request : [] name : "ovn-be3abc-0" ofport : 8 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host02"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve

_uuid : 86a229be-373e-4c43-b2f1-6190523ed73a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:1c", iface-id="12d829c3-64eb-44bc-a0bd-d7219991f35f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 38 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:1c" mtu : 1442 mtu_request : [] name : "vnet10" ofport : 6 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=117912, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2195, tx_bytes=4204, tx_dropped=0, tx_errors=0, tx_packets=66} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""

_uuid : fa4b8d96-bffe-4b56-930e-0e7fcc5f68ac admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "7a:28:24:eb:ec:d2" mtu : [] mtu_request : [] name : "ovn-95ccb0-0" ofport : 9 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host01"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=12840478, tx_packets=224029} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve

_uuid : 5e3df5c7-958c-491d-8d41-0ae83c613f1d admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:06", iface-id="9a6cc189-0934-4468-97ae-09f90fa4598d", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 36 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:06" mtu : 1442 mtu_request : [] name : "vnet8" ofport : 4 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=8829812, tx_dropped=0, tx_errors=0, tx_packets=154540} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""

I've identified which VMs have these MAC addresses, but I do not see any "conflict" with any other VM's MAC address.
I really do not understand why these would create a conflict.

On Wed, Sep 16, 2020 at 12:06 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 6:53 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
So a new test-net was created under DC01, and it showed up in the networks tab under both DC01 and DC02. I believe networks are for some reason duplicated across DCs, maybe for future use? I don't know. If one tries to delete the network from the other DC it fails with an error, while deleting it from the DC it was initially created in removes it from both.
In oVirt a logical network is an entity in a data center. If the automatic synchronization is enabled on the ovirt-provider-ovn entity in oVirt Engine, the OVN networks are reflected to all data centers. If you do not like this, you can disable the automatic synchronization of the ovirt-provider-ovn in Admin Portal.
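As a quick cross-check, the OVN side of these reflected networks can also be listed directly on the Engine machine, which hosts the OVN northbound database. A minimal sketch (the switch name passed to lsp-list is just an example; use whatever ls-list prints):

# on the oVirt Engine machine (ovn-central)
ovn-nbctl ls-list              # one logical switch per OVN network
ovn-nbctl lsp-list test-net    # ports attached to a given switch, by name or UUID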
From DC01-node02 I get the following errors:
2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
And this repeats forever.
Looks like the southbound db is confused.
Can you try to delete all chassis listed by "sudo ovn-sbctl show", e.g. via "sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dev-host0"? If the script remove_chassis.sh is not installed, you can use
https://github.com/oVirt/ovirt-provider-ovn/blob/master/provider/scripts/rem... instead.
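For reference, a manual equivalent of that script would be roughly the following (a sketch; the UUIDs are the chassis printed by ovn-sbctl show, and the commands run on the Engine machine):

sudo ovn-sbctl show
# remove each stale chassis by its UUID; the ovn-controller on the
# corresponding host re-registers itself when it reconnects
sudo ovn-sbctl chassis-del c4b23834-aec7-4bf8-8be7-aa94a50a6144
sudo ovn-sbctl chassis-del be3abcc9-7358-4040-a37b-8d8a782f239c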
Can you please also share the output of ovs-vsctl list Interface on the host which produced the logfile above?
The connection to ovn-sbctl is OK and the geneve tunnels show up in ovs-vsctl as expected. VMs are still not able to ping each other.
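One way to confirm whether Geneve traffic actually leaves one host and arrives on the other is to capture on the tunnel egress interface while the VMs ping each other. A sketch, assuming ovirtmgmt-ams03 is the interface reported as tunnel_egress_iface on these hosts (Geneve uses UDP port 6081):

# run on both hosts while pinging between the two VMs
tcpdump -ni ovirtmgmt-ams03 udp port 6081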
On Tue, Sep 15, 2020 at 7:22 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
Fixed the issue.
Thanks.
I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf needed updating as well. The package is upgraded to the latest version.
Once the provider was updated with the following it functioned perfectly:
Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: https:dc02-ovirt01.testdomain.com:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: HTTPS
Host Name: dc02-ovirt01.testdomain.com
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"
For some reason the TLS certificate was in conflict with the OVN provider details; I would bet on the "host" entry.
So now geneve tunnels are established. OVN provider is working.
But VMs still do not communicate on the same VM network when it spans different hosts.

So if we have a VM network test-net on both dc01-host01 and dc01-host02, and each host has a VM with an IP address on that network, the VMs on the same VM network should be able to communicate directly. But traffic never reaches the other side.
Can you create a new external network, with port security disabled and an IPv4 subnet? If the VMs get an IP address via DHCP, OVN is working, and they should be able to ping each other, too. If not, there should be a helpful entry in the ovn-controller.log of the host the VM is running on.
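For completeness, such a test network can also be sketched directly against the provider's Neutron-style API on port 9696. This is only an illustration; the token handling, names and CIDR below are assumptions and the exact payload may differ per version:

# assumes a valid token in $TOKEN, obtained from the provider's
# Keystone-style endpoint on API port 35357
curl -ks https://dc02-ovirt01.testdomain.com:9696/v2.0/networks \
  -H "X-Auth-Token: $TOKEN" -H 'Content-Type: application/json' \
  -d '{"network": {"name": "ovn-test-net", "port_security_enabled": false}}'
# create an IPv4 subnet with DHCP on the network id returned above
curl -ks https://dc02-ovirt01.testdomain.com:9696/v2.0/subnets \
  -H "X-Auth-Token: $TOKEN" -H 'Content-Type: application/json' \
  -d '{"subnet": {"network_id": "NETWORK_ID", "name": "ovn-test-subnet", "cidr": "10.99.99.0/24", "ip_version": 4, "enable_dhcp": true}}'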
On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler <dholler@redhat.com> wrote:
Can you try again with:
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_test*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
Please note that https-enabled should match the HTTP or HTTPS used in the ovirt-provider-ovn configuration in oVirt Engine. So if the ovirt-provider-ovn entity in Engine is on HTTP, the config file should use https-enabled=false.
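After changing the file, restarting the provider and checking its journal is usually enough to see whether the two sides agree; a small sketch:

# on the Engine machine, after editing the conf.d file
systemctl restart ovirt-provider-ovn
journalctl -u ovirt-provider-ovn -n 50    # should show the server starting without SSL errors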
On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
This is the updated one:
# This file is automatically generated by engine-setup. Please do not edit manually
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=true
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_text*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
[AUTH]
auth-plugin=auth.plugins.static_token:NoAuthPlugin
However, it still does not connect. It prompts for the certificate, then fails and asks to check the log, but the ovirt-provider-ovn.log does not list anything.

Yes, we have been running oVirt for about a year now, starting from about version 4.1.
This might explain the trouble. Upgrade of ovirt-provider-ovn should work flawlessly starting from oVirt 4.2.

Hi Dominik

Just saw the below on host dc01-host02:

ovs-vsctl show
f3b13557-dfb4-45a4-b6af-c995ccf68720
    Bridge br-int
        Port "ovn-95ccb0-0"
            Interface "ovn-95ccb0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host01"}
        Port "vnet10"
            Interface "vnet10"
        Port "vnet11"
            Interface "vnet11"
        Port "vnet0"
            Interface "vnet0"
        Port "vnet9"
            Interface "vnet9"
        Port "vnet8"
            Interface "vnet8"
        Port br-int
            Interface br-int
                type: internal
        Port "vnet12"
            Interface "vnet12"
        Port "ovn-be3abc-0"
            Interface "ovn-be3abc-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host02"}
        Port "vnet7"
            Interface "vnet7"
        Port "ovn-c4b238-0"
            Interface "ovn-c4b238-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc02-host01"}
        Port "vnet6"
            Interface "vnet6"
    ovs_version: "2.11.0"

Why would this node establish a geneve tunnel to itself? Other nodes do not exhibit this behavior.

On Wed, Sep 16, 2020 at 12:21 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
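One plausible explanation for a tunnel whose remote_ip is the host's own address is a stale or duplicate chassis record in the southbound database. A way to check this, sketched with placeholder values:

# on the Engine machine: every chassis known to the southbound DB and its encap IPs
sudo ovn-sbctl --columns=_uuid,name,hostname list Chassis
sudo ovn-sbctl --columns=_uuid,type,ip list Encap
# on dc01-host02: the chassis id this host currently registers with
ovs-vsctl get open . external-ids:system-id
# if a second chassis entry carries this host's IP under an old id, it can be dropped:
# sudo ovn-sbctl chassis-del STALE_CHASSIS_NAME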
Hi Dominik
Below is the output of the ovs-vsctl list interface
_uuid : bdaf92c1-4389-4ddf-aab0-93975076ebb2 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:02", iface-id="5d03a7a5-82a1-40f9-b50c-353a26167fa3", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 34 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:02" mtu : 1442 mtu_request : [] name : "vnet6" ofport : 2 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=10828495, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=117713, tx_bytes=20771797, tx_dropped=0, tx_errors=0, tx_packets=106954} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : bad80911-3993-4085-a0b0-962b6c9156cd admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "fe:37:52:c4:cb:03" mtu : [] mtu_request : [] name : "ovn-c4b238-0" ofport : 7 ofport_request : [] options : {csum="true", key=flow, remote_ip="192.168.121.164"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 8e7705d1-0b9d-4e30-8277-c339e7e1c27a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:0d", iface-id="b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7", iface-status=active, vm-id="8d73f333-bca4-4b32-9b87-2e7ee07eda84"} ifindex : 28 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:0d" mtu : 1442 mtu_request : [] name : "vnet0" ofport : 1 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=20609787, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=104535, tx_bytes=10830007, tx_dropped=0, tx_errors=0, tx_packets=117735} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 86dcc68a-63e4-4445-9373-81c1f4502c17 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:10", iface-id="4e8d5636-4110-41b2-906d-f9b04c2e62cd", iface-status=active, vm-id="9a002a9b-5f09-4def-a531-d50ff683470b"} ifindex : 40 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:10" mtu : 1442 mtu_request : [] name : "vnet11" ofport : 10 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=3311352, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=51012, tx_bytes=5514116, tx_dropped=0, tx_errors=0, tx_packets=103456} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : e8d5e4a2-b9a0-4146-8d98-34713cb443de admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:15", iface-id="b88de6e4-6d77-4e42-b734-4cc676728910", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 37 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:15" mtu : 1442 mtu_request : [] name : "vnet9" ofport : 5 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4500, tx_dropped=0, tx_errors=0, tx_packets=74} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 6a2974b3-cd72-4688-a630-0a7e9c779b21 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:17", iface-id="64681036-26e2-41d7-b73f-ab5302610145", iface-status=active, vm-id="bf0dc78c-dad5-41a0-914c-ae0da0f9a388"} ifindex : 41 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:17" mtu : 1442 mtu_request : [] name : "vnet12" ofport : 11 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=5513640, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=103450, tx_bytes=3311868, tx_dropped=0, tx_errors=0, tx_packets=51018} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 44498e54-f122-41a0-a41a-7a88ba2dba9b admin_state : down bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 7 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : down lldp : {} mac : [] mac_in_use : "32:0a:69:67:07:4f" mtu : 1442 mtu_request : [] name : br-int ofport : 65534 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=326, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0} status : {driver_name=openvswitch} type : internal
_uuid : e2114584-8ceb-43d6-817b-e457738ead8a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:03", iface-id="16162721-c815-4cd8-ab57-f22e6e482c7f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 35 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:03" mtu : 1442 mtu_request : [] name : "vnet7" ofport : 3 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4730, tx_dropped=0, tx_errors=0, tx_packets=77} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : ee16943e-d145-4080-893f-464098a6388f admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "1e:50:3f:a8:42:d1" mtu : [] mtu_request : [] name : "ovn-be3abc-0" ofport : 8 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host02"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 86a229be-373e-4c43-b2f1-6190523ed73a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:1c", iface-id="12d829c3-64eb-44bc-a0bd-d7219991f35f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 38 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:1c" mtu : 1442 mtu_request : [] name : "vnet10" ofport : 6 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=117912, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2195, tx_bytes=4204, tx_dropped=0, tx_errors=0, tx_packets=66} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : fa4b8d96-bffe-4b56-930e-0e7fcc5f68ac admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "7a:28:24:eb:ec:d2" mtu : [] mtu_request : [] name : "ovn-95ccb0-0" ofport : 9 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host01"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=12840478, tx_packets=224029} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 5e3df5c7-958c-491d-8d41-0ae83c613f1d admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:06", iface-id="9a6cc189-0934-4468-97ae-09f90fa4598d", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 36 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:06" mtu : 1442 mtu_request : [] name : "vnet8" ofport : 4 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=8829812, tx_dropped=0, tx_errors=0, tx_packets=154540} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
I've identified which VMs have these MAC addresses, but I do not see any "conflict" with any other VM's MAC address.
I really do not understand why these would create a conflict.
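A quick way to look for duplicate addresses directly in the OVN databases is sketched below. This assumes the commands are run on the engine machine (where ovn-central runs) and that this ovn-nbctl/ovn-sbctl build supports the --format/--columns output options:

# List the addresses of all logical switch ports and print any value that
# appears more than once.
ovn-nbctl --format=csv --columns=addresses list Logical_Switch_Port | sort | uniq -d
# Southbound view: which chassis currently claims each port binding.
ovn-sbctl --format=csv --columns=logical_port,mac,chassis list Port_Binding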
On Wed, Sep 16, 2020 at 12:06 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 6:53 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
So a new test-net was created under DC01 and it appeared in the Networks tab under both DC01 and DC02. I believe networks are duplicated across DCs for some reason, maybe for future use? I don't know. If one tries to delete the network from the other DC, it fails with an error, while deleting it from the DC it was initially created in removes it from both.
In oVirt a logical network is an entity in a data center. If the automatic synchronization is enabled on the ovirt-provider-ovn entity in oVirt Engine, the OVN networks are reflected to all data centers. If you do not like this, you can disable the automatic synchronization of the ovirt-provider-ovn in Admin Portal.
From DC01-node02 I get the following errors:
2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
And this repeats forever.
Looks like the southbound db is confused.
Can you try to delete all chassis listed by "sudo ovn-sbctl show" via "sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dev-host0"? If the script remove_chassis.sh is not installed, you can use
https://github.com/oVirt/ovirt-provider-ovn/blob/master/provider/scripts/rem... instead.
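For reference, a minimal sketch of what the chassis cleanup could look like with plain ovn-sbctl, assuming the chassis-del subcommand is available in this OVN build; CHASSIS_NAME_OR_UUID is a placeholder for the value printed by "ovn-sbctl show", and the remove_chassis.sh script remains the supported way to do this:

# On the host whose chassis entry is stale: stop the controller first.
systemctl stop ovn-controller
# On the engine: list the chassis and delete the stale entry.
ovn-sbctl show
ovn-sbctl chassis-del CHASSIS_NAME_OR_UUID
# Confirm it is gone, then restart the controller on the host.
ovn-sbctl show
systemctl start ovn-controller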
Can you please also share the output of ovs-vsctl list Interface on the host which produced the logfile above?
The connection to ovn-sbctl is OK and the geneve tunnels are shown by ovs-vsctl, but the VMs are still not able to ping each other.
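One way to see whether the encapsulated traffic actually crosses the underlay is to watch for Geneve packets (UDP port 6081) on both hypervisors while pinging between the VMs. A sketch, assuming ovirtmgmt is the interface carrying the tunnel traffic (adjust to the tunnel_egress_iface shown in the interface output):

# Run on both hosts while a VM on test-net pings its peer on the other host.
tcpdump -ni ovirtmgmt udp port 6081
# If packets leave one host but never arrive on the other, the problem is in
# the underlay (routing/firewall), not in OVN itself.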
On Tue, Sep 15, 2020 at 7:22 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
Fixed the issue.
Thanks.
I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf needed update also. The package is upgraded to the latest version.
Once the provider was updated with the following it functioned perfectly:
Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: https:dc02-ovirt01.testdomain.com:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: HTTPS
Host Name: dc02-ovirt01.testdomain.com
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"
For some reason the TLS certificate was in conflict with the OVN provider details; I would bet on the "Host Name" entry.
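One way to confirm such a mismatch is to inspect the certificate actually served on the provider port and compare its subject with the configured Host Name. A sketch, assuming the provider listens with TLS on port 9696 and the engine CA is in its usual location:

echo | openssl s_client -connect dc02-ovirt01.testdomain.com:9696 \
    -CAfile /etc/pki/ovirt-engine/ca.pem 2>/dev/null \
  | openssl x509 -noout -subject -dates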
So now geneve tunnels are established. OVN provider is working.
But VMs still do not communicate on the same VM network when it spans different hosts.
So if we have a VM network test-net on both dc01-host01 and dc01-host02 and each host has a VM with an IP address on that network, the VMs should be able to communicate directly. But traffic from one does not reach the other.
Can you create a new external network, with port security disabled, and an IPv4 subnet? If the VMs get an IP address via DHCP, OVN is working and they should be able to ping each other, too. If not, there should be a helpful entry in the ovn-controller.log of the host the VM is running on.
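A quick way to run that test from inside the two VMs on the new network (a sketch; eth1 and PEER_VM_IP are placeholders for the actual NIC name and the other VM's address):

dhclient -v eth1            # should obtain a lease from OVN's DHCP
ip addr show eth1
ping -c3 PEER_VM_IP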
On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler <dholler@redhat.com> wrote:
Can you try again with:
[OVN REMOTE]
ovn-remote=ssl:127.0.0.1:6641
[SSL]
https-enabled=false
ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
[OVIRT]
ovirt-sso-client-secret=*random_test*
ovirt-host=https://dc02-ovirt01.testdomain.com:443
ovirt-sso-client-id=ovirt-provider-ovn
ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
[NETWORK]
port-security-enabled-default=True
[PROVIDER]
provider-host=dc02-ovirt01.testdomain.com
Please note that the https-enabled setting should match the HTTP or HTTPS protocol of the ovirt-provider-ovn configuration in oVirt Engine. So if the ovirt-provider-ovn entity in Engine is on HTTP, the config file should use https-enabled=false.
On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
> This is the updated one: > > # This file is automatically generated by engine-setup. Please do > not edit manually > [OVN REMOTE] > ovn-remote=ssl:127.0.0.1:6641 > [SSL] > https-enabled=true > ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem > ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer > ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass > [OVIRT] > ovirt-sso-client-secret=*random_text* > ovirt-host=https://dc02-ovirt01.testdomain.com:443 > ovirt-sso-client-id=ovirt-provider-ovn > ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem > [NETWORK] > port-security-enabled-default=True > [PROVIDER] > provider-host=dc02-ovirt01.testdomain.com > [AUTH] > auth-plugin=auth.plugins.static_token:NoAuthPlugin > > > However, it still does not connect. > It prompts for the certificate but then fails and prompts to see the > log but the ovirt-provider-ovn.log does not list anything. > > Yes we've got ovirt for about a year now from about version 4.1 > > This might explain the trouble. Upgrade of ovirt-provider-ovn should work flawlessly starting from oVirt 4.2.
> On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler <dholler@redhat.com> > wrote: > >> >> >> On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis < >> k.betsis@gmail.com> wrote: >> >>> There is a file with the below entries >>> >> >> Impressive, do you know when this config file was created and if it >> was manually modified? >> Is this an upgrade from oVirt 4.1? >> >> >>> [root@dc02-ovirt01 log]# cat >>> /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >>> # This file is automatically generated by engine-setup. Please do >>> not edit manually >>> [OVN REMOTE] >>> ovn-remote=tcp:127.0.0.1:6641 >>> [SSL] >>> https-enabled=false >>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>> >>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>> [OVIRT] >>> ovirt-sso-client-secret=*random_test* >>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>> ovirt-sso-client-id=ovirt-provider-ovn >>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>> [NETWORK] >>> port-security-enabled-default=True >>> [PROVIDER] >>> >>> provider-host=dc02-ovirt01.testdomain.com >>> >>> The only entry missing is the [AUTH] and under [SSL] the >>> https-enabled is false. Should I edit this in this file or is this going to >>> break everything? >>> >>> >> Changing the file should improve, but better create a backup into >> another diretory before modification. >> The only required change is >> from >> ovn-remote=tcp:127.0.0.1:6641 >> to >> ovn-remote=ssl:127.0.0.1:6641 >> >> >> >> >>> On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler <dholler@redhat.com> >>> wrote: >>> >>>> >>>> >>>> On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis < >>>> k.betsis@gmail.com> wrote: >>>> >>>>> Hi Dominik >>>>> >>>>> That immediately fixed the geneve tunnels between all hosts. >>>>> >>>>> >>>> thanks for the feedback. >>>> >>>> >>>>> However, the ovn provider is not broken. >>>>> After fixing the networks we tried to move a VM to the >>>>> DC01-host01 so we powered it down and simply configured it to run on >>>>> dc01-node01. >>>>> >>>>> While checking the logs on the ovirt engine i noticed the below: >>>>> Failed to synchronize networks of Provider ovirt-provider-ovn. >>>>> >>>>> The ovn-provider configure on the engine is the below: >>>>> Name: ovirt-provider-ovn >>>>> Description: oVirt network provider for OVN >>>>> Type: External Network Provider >>>>> Network Plugin: oVirt Network Provider for OVN >>>>> Automatic Synchronization: Checked >>>>> Unmanaged: Unchecked >>>>> Provider URL: http:localhost:9696 >>>>> Requires Authentication: Checked >>>>> Username: admin@internal >>>>> Password: "The admin password" >>>>> Protocol: hTTP >>>>> Host Name: dc02-ovirt01 >>>>> API Port: 35357 >>>>> API Version: v2.0 >>>>> Tenant Name: "Empty" >>>>> >>>>> In the past this was deleted by an engineer and recreated as per >>>>> the documentation, and it worked. Do we need to update something due to the >>>>> SSL on the ovn? >>>>> >>>>> >>>> Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? >>>> engine-setup should have created one. 
>>>> If the file is missing, for testing purposes, you can create a >>>> file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf : >>>> [PROVIDER] >>>> provider-host=REPLACE_WITH_FQDN >>>> [SSL] >>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>> >>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>> https-enabled=true >>>> [OVN REMOTE] >>>> ovn-remote=ssl:127.0.0.1:6641 >>>> [AUTH] >>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>> [NETWORK] >>>> port-security-enabled-default=True >>>> >>>> and restart the ovirt-provider-ovn service. >>>> >>>> >>>> >>>> >>>>> From the ovn-provider logs the below is generated after a >>>>> service restart and when the start VM is triggered >>>>> >>>>> 2020-09-15 15:07:33,579 root Starting server >>>>> 2020-09-15 15:07:33,579 root Version: 1.2.29-1 >>>>> 2020-09-15 15:07:33,579 root Build date: 20191217125241 >>>>> 2020-09-15 15:07:33,579 root Githash: cb5a80d >>>>> 2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 >>>>> Request: GET /v2.0/ports >>>>> 2020-09-15 15:08:26,582 root Could not retrieve schema from tcp: >>>>> 127.0.0.1:6641: Unknown error -1 >>>>> Traceback (most recent call last): >>>>> File "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", >>>>> line 138, in _handle_request >>>>> method, path_parts, content >>>>> File >>>>> "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in >>>>> handle_request >>>>> return self.call_response_handler(handler, content, >>>>> parameters) >>>>> File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line >>>>> 35, in call_response_handler >>>>> with NeutronApi() as ovn_north: >>>>> File "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", >>>>> line 95, in __init__ >>>>> self.ovsidl, self.idl = ovn_connection.connect() >>>>> File "/usr/share/ovirt-provider-ovn/ovn_connection.py", line >>>>> 46, in connect >>>>> ovnconst.OVN_NORTHBOUND >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", >>>>> line 127, in from_server >>>>> helper = idlutils.get_schema_helper(connection_string, >>>>> schema_name) >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", >>>>> line 128, in get_schema_helper >>>>> 'err': os.strerror(err)}) >>>>> Exception: Could not retrieve schema from tcp:127.0.0.1:6641: >>>>> Unknown error -1 >>>>> >>>>> >>>>> When i update the ovn provider from the GUI to have >>>>> https://localhost:9696/ and HTTPS as the protocol the test >>>>> fails. >>>>> >>>>> On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler < >>>>> dholler@redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis < >>>>>> k.betsis@gmail.com> wrote: >>>>>> >>>>>>> Hi Dominik >>>>>>> >>>>>>> When these commands are used on the ovirt-engine host the >>>>>>> output is the one depicted in your email. 
>>>>>>> For your reference see also below: >>>>>>> >>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl >>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>> Bootstrap: false >>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection >>>>>>> ptcp:6641 >>>>>>> >>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl >>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>> Bootstrap: false >>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection >>>>>>> read-write role="" ptcp:6642 >>>>>>> >>>>>>> >>>>>> ^^^ the line above points to the problem: ovn-central is >>>>>> configured to use plain TCP without ssl. >>>>>> engine-setup usually configures ovn-central to use SSL. That >>>>>> the files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, >>>>>> that engine-setup was triggered correctly. Looks like the ovn >>>>>> db was dropped somehow, this should not happen. >>>>>> This can be fixed manually by executing the following commands >>>>>> on engine's machine: >>>>>> ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>> /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem >>>>>> ovn-nbctl set-connection pssl:6641 >>>>>> ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>> /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem >>>>>> ovn-sbctl set-connection pssl:6642 >>>>>> >>>>>> The /var/log/openvswitch/ovn-controller.log on the hosts should >>>>>> tell that br-int.mgmt is connected now. >>>>>> >>>>>> >>>>>> >>>>>>> [root@ath01-ovirt01 certs]# ls -l >>>>>>> /etc/pki/ovirt-engine/keys/ovn-* >>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>> >>>>>>> When i try the above commands on the node hosts the following >>>>>>> happens: >>>>>>> ovn-nbctl get-ssl / get-connection >>>>>>> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database >>>>>>> connection failed (No such file or directory) >>>>>>> The above i believe is expected since no northbound >>>>>>> connections should be established from the host nodes. >>>>>>> >>>>>>> ovn-sbctl get-ssl /get-connection >>>>>>> The output is stuck till i terminate it. >>>>>>> >>>>>>> >>>>>> Yes, the ovn-* commands works only on engine's machine, which >>>>>> has the role ovn-central. >>>>>> On the hosts, there is only the ovn-controller, which connects >>>>>> the ovn southbound to openvswitch on the host. 
>>>>>> >>>>>> >>>>>>> For the requested logs the below are found in the >>>>>>> ovsdb-server-sb.log >>>>>>> >>>>>>> 2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log >>>>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>>>> rate >>>>>>> 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>> 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log >>>>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>>>> rate >>>>>>> 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>> received SSL data on JSON-RPC channel >>>>>>> 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log >>>>>>> messages in last 8 seconds (most recently, 1 seconds ago) due to excessive >>>>>>> rate >>>>>>> 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>> 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log >>>>>>> messages in last 8 seconds (most recently, 1 seconds ago) due to excessive >>>>>>> rate >>>>>>> 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>> received SSL data on JSON-RPC channel >>>>>>> 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log >>>>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>>>> rate >>>>>>> 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>> 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log >>>>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>>>> rate >>>>>>> 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>> received SSL data on JSON-RPC channel >>>>>>> 
2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: >>>>>>> connection dropped (Protocol error) >>>>>>> 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: >>>>>>> connection dropped (Protocol error) >>>>>>> >>>>>>> >>>>>>> How can we fix these SSL errors? >>>>>>> >>>>>> >>>>>> I addressed this above. >>>>>> >>>>>> >>>>>>> I thought vdsm did the certificate provisioning on the host >>>>>>> nodes as to communicate to the engine host node. >>>>>>> >>>>>>> >>>>>> Yes, this seems to work in your scenario, just the SSL >>>>>> configuration on the ovn-central was lost. >>>>>> >>>>>> >>>>>>> On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler < >>>>>>> dholler@redhat.com> wrote: >>>>>>> >>>>>>>> Looks still like the ovn-controller on the host has problems >>>>>>>> communicating with ovn-southbound. >>>>>>>> >>>>>>>> Are there any hints in /var/log/openvswitch/*.log, >>>>>>>> especially in /var/log/openvswitch/ovsdb-server-sb.log ? >>>>>>>> >>>>>>>> Can you please check the output of >>>>>>>> >>>>>>>> ovn-nbctl get-ssl >>>>>>>> ovn-nbctl get-connection >>>>>>>> ovn-sbctl get-ssl >>>>>>>> ovn-sbctl get-connection >>>>>>>> ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>> >>>>>>>> it should be similar to >>>>>>>> >>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-ssl >>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>> Bootstrap: false >>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-connection >>>>>>>> pssl:6641:[::] >>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-ssl >>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>> Bootstrap: false >>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-connection >>>>>>>> read-write role="" pssl:6642:[::] >>>>>>>> [root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis < >>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>> >>>>>>>>> I did a restart of the ovn-controller, this is the output of >>>>>>>>> the ovn-controller.log >>>>>>>>> >>>>>>>>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file >>>>>>>>> /var/log/openvswitch/ovn-controller.log >>>>>>>>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>> connecting... >>>>>>>>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>> connected >>>>>>>>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL >>>>>>>>> reconnected, force recompute. >>>>>>>>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>> connecting... >>>>>>>>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL >>>>>>>>> reconnected, force recompute. 
>>>>>>>>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: >>>>>>>>> unexpected SSL connection close >>>>>>>>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>> connecting... >>>>>>>>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: >>>>>>>>> unexpected SSL connection close >>>>>>>>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>> waiting 2 seconds before reconnect >>>>>>>>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>> connecting... >>>>>>>>> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: >>>>>>>>> unexpected SSL connection close >>>>>>>>> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>> waiting 4 seconds before reconnect >>>>>>>>> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>> connecting... >>>>>>>>> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: >>>>>>>>> unexpected SSL connection close >>>>>>>>> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>> continuing to reconnect in the background but suppressing further logging >>>>>>>>> >>>>>>>>> >>>>>>>>> I have also done the vdsm-tool ovn-config OVIRT_ENGINE_IP >>>>>>>>> OVIRTMGMT_NETWORK_DC >>>>>>>>> This is how the OVIRT_ENGINE_IP is provided in the ovn >>>>>>>>> controller, i can redo it if you wan. >>>>>>>>> >>>>>>>>> After the restart of the ovn-controller the OVIRT ENGINE >>>>>>>>> still shows only two geneve connections one with DC01-host02 and >>>>>>>>> DC02-host01. >>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>> hostname: "dc02-host01" >>>>>>>>> Encap geneve >>>>>>>>> ip: "DC02-host01_IP" >>>>>>>>> options: {csum="true"} >>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>> hostname: "DC01-host02" >>>>>>>>> Encap geneve >>>>>>>>> ip: "DC01-host02" >>>>>>>>> options: {csum="true"} >>>>>>>>> >>>>>>>>> I've re-done the vdsm-tool command and nothing changed.... >>>>>>>>> again....with the same errors as the systemctl restart ovn-controller >>>>>>>>> >>>>>>>>> On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler < >>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>> >>>>>>>>>> Please include ovirt-users list in your reply, to share >>>>>>>>>> the knowledge and experience with the community! >>>>>>>>>> >>>>>>>>>> On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis < >>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Ok below the output per node and DC >>>>>>>>>>> DC01 >>>>>>>>>>> node01 >>>>>>>>>>> >>>>>>>>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>> geneve >>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . 
>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>> >>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE01*" >>>>>>>>>>> >>>>>>>>>>> node02 >>>>>>>>>>> >>>>>>>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>> geneve >>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>> >>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE02*" >>>>>>>>>>> >>>>>>>>>>> DC02 >>>>>>>>>>> node01 >>>>>>>>>>> >>>>>>>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>> geneve >>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>> >>>>>>>>>>> "*OVIRTMGMT_IP_DC02-NODE01*" >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Looks good. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> DC01 node01 and node02 share the same VM networks and VMs >>>>>>>>>>> deployed on top of them cannot talk to VM on the other hypervisor. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Maybe there is a hint on ovn-controller.log on dc01-node02 >>>>>>>>>> ? Maybe restarting ovn-controller creates more helpful log messages? >>>>>>>>>> >>>>>>>>>> You can also try restart the ovn configuration on all hosts >>>>>>>>>> by executing >>>>>>>>>> vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP >>>>>>>>>> on each host, this would trigger >>>>>>>>>> >>>>>>>>>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... >>>>>>>>>> internally. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> So I would expect to see the same output for node01 to >>>>>>>>>>> have a geneve tunnel to node02 and vice versa. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Me too. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler < >>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis < >>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Dominik >>>>>>>>>>>>> >>>>>>>>>>>>> OVN is selected as the default network provider on the >>>>>>>>>>>>> clusters and the hosts. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> sounds good. >>>>>>>>>>>> This configuration is required already during the host is >>>>>>>>>>>> added to oVirt Engine, because OVN is configured during this step. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> The "ovn-sbctl show" works on the ovirt engine and shows >>>>>>>>>>>>> only two hosts, 1 per DC. >>>>>>>>>>>>> >>>>>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>>>>> hostname: "dc01-node02" >>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>> ip: "X.X.X.X" >>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>>>>> hostname: "dc02-node1" >>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>> ip: "A.A.A.A" >>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> The new node is not listed (dc01-node1). >>>>>>>>>>>>> >>>>>>>>>>>>> When executed on the nodes the same command (ovn-sbctl >>>>>>>>>>>>> show) times-out on all nodes..... 
>>>>>>>>>>>>> >>>>>>>>>>>>> The output of the >>>>>>>>>>>>> /var/log/openvswitch/ovn-conntroller.log lists on all logs >>>>>>>>>>>>> >>>>>>>>>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> Can you please compare the output of >>>>>>>>>>>> >>>>>>>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-remote >>>>>>>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-type >>>>>>>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip >>>>>>>>>>>> >>>>>>>>>>>> of the working hosts, e.g. dc01-node02, and the failing >>>>>>>>>>>> host dc01-node1? >>>>>>>>>>>> This should point us the relevant difference in the >>>>>>>>>>>> configuration. >>>>>>>>>>>> >>>>>>>>>>>> Please include ovirt-users list in your replay, to share >>>>>>>>>>>> the knowledge and experience with the community. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Thank you >>>>>>>>>>>>> Best regards >>>>>>>>>>>>> Konstantinos Betsis >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler < >>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B < >>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi all >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We have a small installation based on OVIRT 4.3. >>>>>>>>>>>>>>> 1 Cluster is based on Centos 7 and the other on OVIRT >>>>>>>>>>>>>>> NG Node image. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The environment was stable till an upgrade took place >>>>>>>>>>>>>>> a couple of months ago. >>>>>>>>>>>>>>> As such we had to re-install one of the Centos 7 node >>>>>>>>>>>>>>> and start from scratch. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> To trigger the automatic configuration of the host, it >>>>>>>>>>>>>> is required to configure ovirt-provider-ovn as the default network provider >>>>>>>>>>>>>> for the cluster before adding the host to oVirt. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Even though the installation completed successfully >>>>>>>>>>>>>>> and VMs are created, the following are not working as expected: >>>>>>>>>>>>>>> 1. ovn geneve tunnels are not established with the >>>>>>>>>>>>>>> other Centos 7 node in the cluster. >>>>>>>>>>>>>>> 2. Centos 7 node is configured by ovirt engine however >>>>>>>>>>>>>>> no geneve tunnel is established when "ovn-sbctl show" is issued on the >>>>>>>>>>>>>>> engine. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Does "ovn-sbctl show" list the hosts? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> 3. no flows are shown on the engine on port 6642 for >>>>>>>>>>>>>>> the ovs db. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Does anyone have any experience on how to troubleshoot >>>>>>>>>>>>>>> OVN on ovirt? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> /var/log/openvswitch/ovncontroller.log on the host >>>>>>>>>>>>>> should contain a helpful hint. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> Users mailing list -- users@ovirt.org >>>>>>>>>>>>>>> To unsubscribe send an email to users-leave@ovirt.org >>>>>>>>>>>>>>> Privacy Statement: >>>>>>>>>>>>>>> https://www.ovirt.org/privacy-policy.html >>>>>>>>>>>>>>> oVirt Code of Conduct: >>>>>>>>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>>>>>>>>>>> List Archives: >>>>>>>>>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EK... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>

Maybe because of a duplicated entry in the OVN southbound DB? Can you please stop the ovn-controller on this host, remove the host from the OVN southbound DB, ensure it is gone, and restart the ovn-controller on the host?
On Wed, Sep 16, 2020 at 11:55 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
Just saw the below on host dc01-host02
ovs-vsctl show
f3b13557-dfb4-45a4-b6af-c995ccf68720
    Bridge br-int
        Port "ovn-95ccb0-0"
            Interface "ovn-95ccb0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host01"}
        Port "vnet10"
            Interface "vnet10"
        Port "vnet11"
            Interface "vnet11"
        Port "vnet0"
            Interface "vnet0"
        Port "vnet9"
            Interface "vnet9"
        Port "vnet8"
            Interface "vnet8"
        Port br-int
            Interface br-int
                type: internal
        Port "vnet12"
            Interface "vnet12"
        Port "ovn-be3abc-0"
            Interface "ovn-be3abc-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host02"}
        Port "vnet7"
            Interface "vnet7"
        Port "ovn-c4b238-0"
            Interface "ovn-c4b238-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc02-host01"}
        Port "vnet6"
            Interface "vnet6"
    ovs_version: "2.11.0"
Why would this node establish a geneve tunnel to itself? The other nodes do not exhibit this behavior.
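A way to narrow this down is to compare the IP this chassis announces for encapsulation with the remote_ip of the locally created tunnel ports, and to look in the southbound DB for a stale chassis entry carrying this host's own address. A sketch, using only commands already shown in this thread plus a standard ovs-vsctl find:

# On dc01-host02: the IP this chassis announces for Geneve encapsulation.
ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
# The tunnel ports ovn-controller created locally, with their remote_ip.
ovs-vsctl --columns=name,options find Interface type=geneve
# On the engine: a duplicated/stale chassis record with this host's own IP
# would make ovn-controller build a tunnel back to itself.
ovn-sbctl show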
On Wed, Sep 16, 2020 at 12:21 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
Below is the output of the ovs-vsctl list interface
_uuid : bdaf92c1-4389-4ddf-aab0-93975076ebb2 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:02", iface-id="5d03a7a5-82a1-40f9-b50c-353a26167fa3", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 34 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:02" mtu : 1442 mtu_request : [] name : "vnet6" ofport : 2 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=10828495, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=117713, tx_bytes=20771797, tx_dropped=0, tx_errors=0, tx_packets=106954} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : bad80911-3993-4085-a0b0-962b6c9156cd admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "fe:37:52:c4:cb:03" mtu : [] mtu_request : [] name : "ovn-c4b238-0" ofport : 7 ofport_request : [] options : {csum="true", key=flow, remote_ip="192.168.121.164"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 8e7705d1-0b9d-4e30-8277-c339e7e1c27a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:0d", iface-id="b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7", iface-status=active, vm-id="8d73f333-bca4-4b32-9b87-2e7ee07eda84"} ifindex : 28 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:0d" mtu : 1442 mtu_request : [] name : "vnet0" ofport : 1 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=20609787, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=104535, tx_bytes=10830007, tx_dropped=0, tx_errors=0, tx_packets=117735} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 86dcc68a-63e4-4445-9373-81c1f4502c17 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:10", iface-id="4e8d5636-4110-41b2-906d-f9b04c2e62cd", iface-status=active, vm-id="9a002a9b-5f09-4def-a531-d50ff683470b"} ifindex : 40 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:10" mtu : 1442 mtu_request : [] name : "vnet11" ofport : 10 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=3311352, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=51012, tx_bytes=5514116, tx_dropped=0, tx_errors=0, tx_packets=103456} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : e8d5e4a2-b9a0-4146-8d98-34713cb443de admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:15", iface-id="b88de6e4-6d77-4e42-b734-4cc676728910", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 37 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:15" mtu : 1442 mtu_request : [] name : "vnet9" ofport : 5 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4500, tx_dropped=0, tx_errors=0, tx_packets=74} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 6a2974b3-cd72-4688-a630-0a7e9c779b21 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:17", iface-id="64681036-26e2-41d7-b73f-ab5302610145", iface-status=active, vm-id="bf0dc78c-dad5-41a0-914c-ae0da0f9a388"} ifindex : 41 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:17" mtu : 1442 mtu_request : [] name : "vnet12" ofport : 11 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=5513640, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=103450, tx_bytes=3311868, tx_dropped=0, tx_errors=0, tx_packets=51018} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 44498e54-f122-41a0-a41a-7a88ba2dba9b admin_state : down bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 7 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : down lldp : {} mac : [] mac_in_use : "32:0a:69:67:07:4f" mtu : 1442 mtu_request : [] name : br-int ofport : 65534 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=326, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0} status : {driver_name=openvswitch} type : internal
_uuid : e2114584-8ceb-43d6-817b-e457738ead8a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:03", iface-id="16162721-c815-4cd8-ab57-f22e6e482c7f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 35 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:03" mtu : 1442 mtu_request : [] name : "vnet7" ofport : 3 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4730, tx_dropped=0, tx_errors=0, tx_packets=77} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : ee16943e-d145-4080-893f-464098a6388f admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "1e:50:3f:a8:42:d1" mtu : [] mtu_request : [] name : "ovn-be3abc-0" ofport : 8 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host02"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 86a229be-373e-4c43-b2f1-6190523ed73a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:1c", iface-id="12d829c3-64eb-44bc-a0bd-d7219991f35f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 38 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:1c" mtu : 1442 mtu_request : [] name : "vnet10" ofport : 6 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=117912, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2195, tx_bytes=4204, tx_dropped=0, tx_errors=0, tx_packets=66} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : fa4b8d96-bffe-4b56-930e-0e7fcc5f68ac admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "7a:28:24:eb:ec:d2" mtu : [] mtu_request : [] name : "ovn-95ccb0-0" ofport : 9 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host01"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=12840478, tx_packets=224029} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 5e3df5c7-958c-491d-8d41-0ae83c613f1d admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:06", iface-id="9a6cc189-0934-4468-97ae-09f90fa4598d", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 36 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:06" mtu : 1442 mtu_request : [] name : "vnet8" ofport : 4 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=8829812, tx_dropped=0, tx_errors=0, tx_packets=154540} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
I've identified which VMs have these MAC addresses, but I do not see any "conflict" with any other VM's MAC address.
I really do not understand why these would create a conflict.
On Wed, Sep 16, 2020 at 12:06 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 6:53 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
So a new test-net was created under DC01 and it appeared in the Networks tab under both DC01 and DC02. I believe networks are duplicated across DCs for some reason, maybe for future use? I don't know. If one tries to delete the network from the other DC, it fails with an error, while deleting it from the DC it was initially created in removes it from both.
In oVirt a logical network is an entity in a data center. If the automatic synchronization is enabled on the ovirt-provider-ovn entity in oVirt Engine, the OVN networks are reflected to all data centers. If you do not like this, you can disable the automatic synchronization of the ovirt-provider-ovn in Admin Portal.
From DC01-node02 I get the following errors:
2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
And this repeats forever.
Looks like the southbound db is confused.
Can you try to delete all chassis listed by "sudo ovn-sbctl show" via "sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dev-host0"? If the script remove_chassis.sh is not installed, you can use
https://github.com/oVirt/ovirt-provider-ovn/blob/master/provider/scripts/rem... instead.
Can you please also share the output of ovs-vsctl list Interface on the host which produced the logfile above?
The connection to ovn-sbctl is OK and the geneve tunnels are shown by ovs-vsctl, but the VMs are still not able to ping each other.
On Tue, Sep 15, 2020 at 7:22 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
Hi Dominik
Fixed the issue.
Thanks.
I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf needed update also. The package is upgraded to the latest version.
Once the provider was updated with the following it functioned perfectly:
Name: ovirt-provider-ovn
Description: oVirt network provider for OVN
Type: External Network Provider
Network Plugin: oVirt Network Provider for OVN
Automatic Synchronization: Checked
Unmanaged: Unchecked
Provider URL: https:dc02-ovirt01.testdomain.com:9696
Requires Authentication: Checked
Username: admin@internal
Password: "The admin password"
Protocol: HTTPS
Host Name: dc02-ovirt01.testdomain.com
API Port: 35357
API Version: v2.0
Tenant Name: "Empty"
For some reason the TLS certificate was in conflict with the OVN provider details; I would bet on the "Host Name" entry.
So now geneve tunnels are established. OVN provider is working.
But VMs still do not communicate on the same VM network when it spans different hosts.
So if we have a VM network test-net on both dc01-host01 and dc01-host02 and each host has a VM with an IP address on that network, the VMs should be able to communicate directly. But traffic from one does not reach the other.
Can you create a new external network, with port security disabled, and an IPv4 subnet? If the VMs get an IP address via DHCP, OVN is working and they should be able to ping each other, too. If not, there should be a helpful entry in the ovn-controller.log of the host the VM is running on.
On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler <dholler@redhat.com> wrote:
> Can you try again with: > > [OVN REMOTE] > ovn-remote=ssl:127.0.0.1:6641 > [SSL] > https-enabled=false > ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem > ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer > ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass > [OVIRT] > ovirt-sso-client-secret=*random_test* > ovirt-host=https://dc02-ovirt01.testdomain.com:443 > <https://dc02-ovirt01.testdomain.com/> > ovirt-sso-client-id=ovirt-provider-ovn > ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem > [NETWORK] > port-security-enabled-default=True > [PROVIDER] > > provider-host=dc02-ovirt01.testdomain.com > > > > Please note that the should match the HTTP or HTTPS in the of the > ovirt-prover-ovn configuration in oVirt Engine. > So if the ovirt-provider-ovn entity in Engine is on HTTP, the config > file should use > https-enabled=false > > > On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis < > k.betsis@gmail.com> wrote: > >> This is the updated one: >> >> # This file is automatically generated by engine-setup. Please do >> not edit manually >> [OVN REMOTE] >> ovn-remote=ssl:127.0.0.1:6641 >> [SSL] >> https-enabled=true >> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >> >> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >> [OVIRT] >> ovirt-sso-client-secret=*random_text* >> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >> ovirt-sso-client-id=ovirt-provider-ovn >> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >> [NETWORK] >> port-security-enabled-default=True >> [PROVIDER] >> provider-host=dc02-ovirt01.testdomain.com >> [AUTH] >> auth-plugin=auth.plugins.static_token:NoAuthPlugin >> >> >> However, it still does not connect. >> It prompts for the certificate but then fails and prompts to see >> the log but the ovirt-provider-ovn.log does not list anything. >> >> Yes we've got ovirt for about a year now from about version 4.1 >> >> > This might explain the trouble. Upgrade of ovirt-provider-ovn should > work flawlessly starting from oVirt 4.2. > > >> On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler <dholler@redhat.com> >> wrote: >> >>> >>> >>> On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis < >>> k.betsis@gmail.com> wrote: >>> >>>> There is a file with the below entries >>>> >>> >>> Impressive, do you know when this config file was created and if >>> it was manually modified? >>> Is this an upgrade from oVirt 4.1? >>> >>> >>>> [root@dc02-ovirt01 log]# cat >>>> /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >>>> # This file is automatically generated by engine-setup. Please do >>>> not edit manually >>>> [OVN REMOTE] >>>> ovn-remote=tcp:127.0.0.1:6641 >>>> [SSL] >>>> https-enabled=false >>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>> >>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>> [OVIRT] >>>> ovirt-sso-client-secret=*random_test* >>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>> ovirt-sso-client-id=ovirt-provider-ovn >>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>> [NETWORK] >>>> port-security-enabled-default=True >>>> [PROVIDER] >>>> >>>> provider-host=dc02-ovirt01.testdomain.com >>>> >>>> The only entry missing is the [AUTH] and under [SSL] the >>>> https-enabled is false. Should I edit this in this file or is this going to >>>> break everything? 
>>>> >>>> >>> Changing the file should improve, but better create a backup into >>> another diretory before modification. >>> The only required change is >>> from >>> ovn-remote=tcp:127.0.0.1:6641 >>> to >>> ovn-remote=ssl:127.0.0.1:6641 >>> >>> >>> >>> >>>> On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler < >>>> dholler@redhat.com> wrote: >>>> >>>>> >>>>> >>>>> On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis < >>>>> k.betsis@gmail.com> wrote: >>>>> >>>>>> Hi Dominik >>>>>> >>>>>> That immediately fixed the geneve tunnels between all hosts. >>>>>> >>>>>> >>>>> thanks for the feedback. >>>>> >>>>> >>>>>> However, the ovn provider is not broken. >>>>>> After fixing the networks we tried to move a VM to the >>>>>> DC01-host01 so we powered it down and simply configured it to run on >>>>>> dc01-node01. >>>>>> >>>>>> While checking the logs on the ovirt engine i noticed the below: >>>>>> Failed to synchronize networks of Provider ovirt-provider-ovn. >>>>>> >>>>>> The ovn-provider configure on the engine is the below: >>>>>> Name: ovirt-provider-ovn >>>>>> Description: oVirt network provider for OVN >>>>>> Type: External Network Provider >>>>>> Network Plugin: oVirt Network Provider for OVN >>>>>> Automatic Synchronization: Checked >>>>>> Unmanaged: Unchecked >>>>>> Provider URL: http:localhost:9696 >>>>>> Requires Authentication: Checked >>>>>> Username: admin@internal >>>>>> Password: "The admin password" >>>>>> Protocol: hTTP >>>>>> Host Name: dc02-ovirt01 >>>>>> API Port: 35357 >>>>>> API Version: v2.0 >>>>>> Tenant Name: "Empty" >>>>>> >>>>>> In the past this was deleted by an engineer and recreated as >>>>>> per the documentation, and it worked. Do we need to update something due to >>>>>> the SSL on the ovn? >>>>>> >>>>>> >>>>> Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? >>>>> engine-setup should have created one. >>>>> If the file is missing, for testing purposes, you can create a >>>>> file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf : >>>>> [PROVIDER] >>>>> provider-host=REPLACE_WITH_FQDN >>>>> [SSL] >>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>> >>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>> https-enabled=true >>>>> [OVN REMOTE] >>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>> [AUTH] >>>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>>> [NETWORK] >>>>> port-security-enabled-default=True >>>>> >>>>> and restart the ovirt-provider-ovn service. 
>>>>> >>>>> >>>>> >>>>> >>>>>> From the ovn-provider logs the below is generated after a >>>>>> service restart and when the start VM is triggered >>>>>> >>>>>> 2020-09-15 15:07:33,579 root Starting server >>>>>> 2020-09-15 15:07:33,579 root Version: 1.2.29-1 >>>>>> 2020-09-15 15:07:33,579 root Build date: 20191217125241 >>>>>> 2020-09-15 15:07:33,579 root Githash: cb5a80d >>>>>> 2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 >>>>>> Request: GET /v2.0/ports >>>>>> 2020-09-15 15:08:26,582 root Could not retrieve schema from tcp: >>>>>> 127.0.0.1:6641: Unknown error -1 >>>>>> Traceback (most recent call last): >>>>>> File >>>>>> "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in >>>>>> _handle_request >>>>>> method, path_parts, content >>>>>> File >>>>>> "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in >>>>>> handle_request >>>>>> return self.call_response_handler(handler, content, >>>>>> parameters) >>>>>> File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", >>>>>> line 35, in call_response_handler >>>>>> with NeutronApi() as ovn_north: >>>>>> File "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", >>>>>> line 95, in __init__ >>>>>> self.ovsidl, self.idl = ovn_connection.connect() >>>>>> File "/usr/share/ovirt-provider-ovn/ovn_connection.py", line >>>>>> 46, in connect >>>>>> ovnconst.OVN_NORTHBOUND >>>>>> File >>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", >>>>>> line 127, in from_server >>>>>> helper = idlutils.get_schema_helper(connection_string, >>>>>> schema_name) >>>>>> File >>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", >>>>>> line 128, in get_schema_helper >>>>>> 'err': os.strerror(err)}) >>>>>> Exception: Could not retrieve schema from tcp:127.0.0.1:6641: >>>>>> Unknown error -1 >>>>>> >>>>>> >>>>>> When i update the ovn provider from the GUI to have >>>>>> https://localhost:9696/ and HTTPS as the protocol the test >>>>>> fails. >>>>>> >>>>>> On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler < >>>>>> dholler@redhat.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis < >>>>>>> k.betsis@gmail.com> wrote: >>>>>>> >>>>>>>> Hi Dominik >>>>>>>> >>>>>>>> When these commands are used on the ovirt-engine host the >>>>>>>> output is the one depicted in your email. >>>>>>>> For your reference see also below: >>>>>>>> >>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl >>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>> Bootstrap: false >>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection >>>>>>>> ptcp:6641 >>>>>>>> >>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl >>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>> Bootstrap: false >>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection >>>>>>>> read-write role="" ptcp:6642 >>>>>>>> >>>>>>>> >>>>>>> ^^^ the line above points to the problem: ovn-central is >>>>>>> configured to use plain TCP without ssl. >>>>>>> engine-setup usually configures ovn-central to use SSL. That >>>>>>> the files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, >>>>>>> that engine-setup was triggered correctly. Looks like the ovn >>>>>>> db was dropped somehow, this should not happen. 
>>>>>>> This can be fixed manually by executing the following commands >>>>>>> on engine's machine: >>>>>>> ovn-nbctl set-ssl >>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>> /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>> ovn-nbctl set-connection pssl:6641 >>>>>>> ovn-sbctl set-ssl >>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>> /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>> ovn-sbctl set-connection pssl:6642 >>>>>>> >>>>>>> The /var/log/openvswitch/ovn-controller.log on the hosts >>>>>>> should tell that br-int.mgmt is connected now. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> [root@ath01-ovirt01 certs]# ls -l >>>>>>>> /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>> >>>>>>>> When i try the above commands on the node hosts the following >>>>>>>> happens: >>>>>>>> ovn-nbctl get-ssl / get-connection >>>>>>>> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database >>>>>>>> connection failed (No such file or directory) >>>>>>>> The above i believe is expected since no northbound >>>>>>>> connections should be established from the host nodes. >>>>>>>> >>>>>>>> ovn-sbctl get-ssl /get-connection >>>>>>>> The output is stuck till i terminate it. >>>>>>>> >>>>>>>> >>>>>>> Yes, the ovn-* commands works only on engine's machine, which >>>>>>> has the role ovn-central. >>>>>>> On the hosts, there is only the ovn-controller, which connects >>>>>>> the ovn southbound to openvswitch on the host. 
>>>>>>> >>>>>>> >>>>>>>> For the requested logs the below are found in the >>>>>>>> ovsdb-server-sb.log >>>>>>>> >>>>>>>> 2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log >>>>>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>>>>> rate >>>>>>>> 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>> 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log >>>>>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>>>>> rate >>>>>>>> 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>>> received SSL data on JSON-RPC channel >>>>>>>> 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log >>>>>>>> messages in last 8 seconds (most recently, 1 seconds ago) due to excessive >>>>>>>> rate >>>>>>>> 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>> 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log >>>>>>>> messages in last 8 seconds (most recently, 1 seconds ago) due to excessive >>>>>>>> rate >>>>>>>> 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>>> received SSL data on JSON-RPC channel >>>>>>>> 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log >>>>>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>>>>> rate >>>>>>>> 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>> 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log >>>>>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>>>>> rate >>>>>>>> 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>>> received SSL data 
on JSON-RPC channel >>>>>>>> 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: >>>>>>>> connection dropped (Protocol error) >>>>>>>> 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: >>>>>>>> connection dropped (Protocol error) >>>>>>>> >>>>>>>> >>>>>>>> How can we fix these SSL errors? >>>>>>>> >>>>>>> >>>>>>> I addressed this above. >>>>>>> >>>>>>> >>>>>>>> I thought vdsm did the certificate provisioning on the host >>>>>>>> nodes as to communicate to the engine host node. >>>>>>>> >>>>>>>> >>>>>>> Yes, this seems to work in your scenario, just the SSL >>>>>>> configuration on the ovn-central was lost. >>>>>>> >>>>>>> >>>>>>>> On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler < >>>>>>>> dholler@redhat.com> wrote: >>>>>>>> >>>>>>>>> Looks still like the ovn-controller on the host has problems >>>>>>>>> communicating with ovn-southbound. >>>>>>>>> >>>>>>>>> Are there any hints in /var/log/openvswitch/*.log, >>>>>>>>> especially in /var/log/openvswitch/ovsdb-server-sb.log ? >>>>>>>>> >>>>>>>>> Can you please check the output of >>>>>>>>> >>>>>>>>> ovn-nbctl get-ssl >>>>>>>>> ovn-nbctl get-connection >>>>>>>>> ovn-sbctl get-ssl >>>>>>>>> ovn-sbctl get-connection >>>>>>>>> ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>> >>>>>>>>> it should be similar to >>>>>>>>> >>>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-ssl >>>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>> Bootstrap: false >>>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-connection >>>>>>>>> pssl:6641:[::] >>>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-ssl >>>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>> Bootstrap: false >>>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-connection >>>>>>>>> read-write role="" pssl:6642:[::] >>>>>>>>> [root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis < >>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> I did a restart of the ovn-controller, this is the output >>>>>>>>>> of the ovn-controller.log >>>>>>>>>> >>>>>>>>>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file >>>>>>>>>> /var/log/openvswitch/ovn-controller.log >>>>>>>>>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>>> connecting... >>>>>>>>>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>>> connected >>>>>>>>>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL >>>>>>>>>> reconnected, force recompute. >>>>>>>>>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>> connecting... >>>>>>>>>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL >>>>>>>>>> reconnected, force recompute. 
>>>>>>>>>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: >>>>>>>>>> unexpected SSL connection close >>>>>>>>>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>> connecting... >>>>>>>>>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: >>>>>>>>>> unexpected SSL connection close >>>>>>>>>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>> waiting 2 seconds before reconnect >>>>>>>>>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>> connecting... >>>>>>>>>> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: >>>>>>>>>> unexpected SSL connection close >>>>>>>>>> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>> waiting 4 seconds before reconnect >>>>>>>>>> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>> connecting... >>>>>>>>>> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: >>>>>>>>>> unexpected SSL connection close >>>>>>>>>> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>> continuing to reconnect in the background but suppressing further logging >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I have also done the vdsm-tool ovn-config OVIRT_ENGINE_IP >>>>>>>>>> OVIRTMGMT_NETWORK_DC >>>>>>>>>> This is how the OVIRT_ENGINE_IP is provided in the ovn >>>>>>>>>> controller, i can redo it if you wan. >>>>>>>>>> >>>>>>>>>> After the restart of the ovn-controller the OVIRT ENGINE >>>>>>>>>> still shows only two geneve connections one with DC01-host02 and >>>>>>>>>> DC02-host01. >>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>> hostname: "dc02-host01" >>>>>>>>>> Encap geneve >>>>>>>>>> ip: "DC02-host01_IP" >>>>>>>>>> options: {csum="true"} >>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>> hostname: "DC01-host02" >>>>>>>>>> Encap geneve >>>>>>>>>> ip: "DC01-host02" >>>>>>>>>> options: {csum="true"} >>>>>>>>>> >>>>>>>>>> I've re-done the vdsm-tool command and nothing changed.... >>>>>>>>>> again....with the same errors as the systemctl restart ovn-controller >>>>>>>>>> >>>>>>>>>> On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler < >>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> Please include ovirt-users list in your reply, to share >>>>>>>>>>> the knowledge and experience with the community! >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis < >>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Ok below the output per node and DC >>>>>>>>>>>> DC01 >>>>>>>>>>>> node01 >>>>>>>>>>>> >>>>>>>>>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>> geneve >>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . 
>>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>> >>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE01*" >>>>>>>>>>>> >>>>>>>>>>>> node02 >>>>>>>>>>>> >>>>>>>>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>> geneve >>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>> >>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE02*" >>>>>>>>>>>> >>>>>>>>>>>> DC02 >>>>>>>>>>>> node01 >>>>>>>>>>>> >>>>>>>>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>> geneve >>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>> >>>>>>>>>>>> "*OVIRTMGMT_IP_DC02-NODE01*" >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Looks good. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> DC01 node01 and node02 share the same VM networks and VMs >>>>>>>>>>>> deployed on top of them cannot talk to VM on the other hypervisor. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Maybe there is a hint on ovn-controller.log on dc01-node02 >>>>>>>>>>> ? Maybe restarting ovn-controller creates more helpful log messages? >>>>>>>>>>> >>>>>>>>>>> You can also try restart the ovn configuration on all >>>>>>>>>>> hosts by executing >>>>>>>>>>> vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP >>>>>>>>>>> on each host, this would trigger >>>>>>>>>>> >>>>>>>>>>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... >>>>>>>>>>> internally. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> So I would expect to see the same output for node01 to >>>>>>>>>>>> have a geneve tunnel to node02 and vice versa. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Me too. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler < >>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis < >>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Dominik >>>>>>>>>>>>>> >>>>>>>>>>>>>> OVN is selected as the default network provider on the >>>>>>>>>>>>>> clusters and the hosts. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> sounds good. >>>>>>>>>>>>> This configuration is required already during the host >>>>>>>>>>>>> is added to oVirt Engine, because OVN is configured during this step. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> The "ovn-sbctl show" works on the ovirt engine and >>>>>>>>>>>>>> shows only two hosts, 1 per DC. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>>>>>> hostname: "dc01-node02" >>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>> ip: "X.X.X.X" >>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>>>>>> hostname: "dc02-node1" >>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>> ip: "A.A.A.A" >>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> The new node is not listed (dc01-node1). >>>>>>>>>>>>>> >>>>>>>>>>>>>> When executed on the nodes the same command (ovn-sbctl >>>>>>>>>>>>>> show) times-out on all nodes..... 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> The output of the >>>>>>>>>>>>>> /var/log/openvswitch/ovn-conntroller.log lists on all logs >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> Can you please compare the output of >>>>>>>>>>>>> >>>>>>>>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-remote >>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip >>>>>>>>>>>>> >>>>>>>>>>>>> of the working hosts, e.g. dc01-node02, and the failing >>>>>>>>>>>>> host dc01-node1? >>>>>>>>>>>>> This should point us the relevant difference in the >>>>>>>>>>>>> configuration. >>>>>>>>>>>>> >>>>>>>>>>>>> Please include ovirt-users list in your replay, to share >>>>>>>>>>>>> the knowledge and experience with the community. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>> Konstantinos Betsis >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler < >>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B < >>>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi all >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We have a small installation based on OVIRT 4.3. >>>>>>>>>>>>>>>> 1 Cluster is based on Centos 7 and the other on OVIRT >>>>>>>>>>>>>>>> NG Node image. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The environment was stable till an upgrade took place >>>>>>>>>>>>>>>> a couple of months ago. >>>>>>>>>>>>>>>> As such we had to re-install one of the Centos 7 node >>>>>>>>>>>>>>>> and start from scratch. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> To trigger the automatic configuration of the host, it >>>>>>>>>>>>>>> is required to configure ovirt-provider-ovn as the default network provider >>>>>>>>>>>>>>> for the cluster before adding the host to oVirt. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Even though the installation completed successfully >>>>>>>>>>>>>>>> and VMs are created, the following are not working as expected: >>>>>>>>>>>>>>>> 1. ovn geneve tunnels are not established with the >>>>>>>>>>>>>>>> other Centos 7 node in the cluster. >>>>>>>>>>>>>>>> 2. Centos 7 node is configured by ovirt engine >>>>>>>>>>>>>>>> however no geneve tunnel is established when "ovn-sbctl show" is issued on >>>>>>>>>>>>>>>> the engine. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Does "ovn-sbctl show" list the hosts? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 3. no flows are shown on the engine on port 6642 for >>>>>>>>>>>>>>>> the ovs db. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Does anyone have any experience on how to >>>>>>>>>>>>>>>> troubleshoot OVN on ovirt? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /var/log/openvswitch/ovncontroller.log on the host >>>>>>>>>>>>>>> should contain a helpful hint. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Users mailing list -- users@ovirt.org >>>>>>>>>>>>>>>> To unsubscribe send an email to users-leave@ovirt.org >>>>>>>>>>>>>>>> Privacy Statement: >>>>>>>>>>>>>>>> https://www.ovirt.org/privacy-policy.html >>>>>>>>>>>>>>>> oVirt Code of Conduct: >>>>>>>>>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>>>>>>>>>>>> List Archives: >>>>>>>>>>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EK... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>

I have a better solution. I am currently migrating all VMs over to dc01-node01 and then I'll format it so as to fix the partitioning as well. In theory the OVN SB DB will be fixed once the host is re-installed.... If not, we can then check if there is a stale entry on the oVirt host where the SB DB is managed. Do you agree with this? On Wed, Sep 16, 2020 at 1:00 PM Dominik Holler <dholler@redhat.com> wrote:
Maybe because of a duplicated entry in the OVN SB DB? Can you please stop the ovn-controller on this host, remove the host from the OVN SB DB, ensure it is gone, and restart the ovn-controller on the host?
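For reference, a possible sequence for this (only a sketch; CHASSIS_NAME_FROM_SHOW_OUTPUT stands for the quoted UUID-like name that ovn-sbctl show prints for the stale chassis):

  # on the affected host
  systemctl stop ovn-controller

  # on the engine (ovn-central): find and delete the stale chassis entry
  ovn-sbctl show
  ovn-sbctl chassis-del CHASSIS_NAME_FROM_SHOW_OUTPUT
  ovn-sbctl show        # verify the chassis is gone

  # back on the affected host
  systemctl start ovn-controller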
On Wed, Sep 16, 2020 at 11:55 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
Just saw the below on host dc01-host02
ovs-vsctl show
f3b13557-dfb4-45a4-b6af-c995ccf68720
    Bridge br-int
        Port "ovn-95ccb0-0"
            Interface "ovn-95ccb0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host01"}
        Port "vnet10"
            Interface "vnet10"
        Port "vnet11"
            Interface "vnet11"
        Port "vnet0"
            Interface "vnet0"
        Port "vnet9"
            Interface "vnet9"
        Port "vnet8"
            Interface "vnet8"
        Port br-int
            Interface br-int
                type: internal
        Port "vnet12"
            Interface "vnet12"
        Port "ovn-be3abc-0"
            Interface "ovn-be3abc-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host02"}
        Port "vnet7"
            Interface "vnet7"
        Port "ovn-c4b238-0"
            Interface "ovn-c4b238-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc02-host01"}
        Port "vnet6"
            Interface "vnet6"
    ovs_version: "2.11.0"
Why would this node establish a geneve tunnel to itself? Other nodes do not exhibit this behavior.
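One way to check where that tunnel comes from (a sketch, run on dc01-host02 itself; the ovs-vsctl syntax follows the same convention used earlier in this thread):

  # the host's own identity and tunnel endpoint address
  ovs-vsctl get open . external-ids:system-id
  ovs-vsctl get open . external-ids:ovn-encap-ip

  # every geneve tunnel port ovn-controller created; a remote_ip equal to the
  # host's own ovn-encap-ip would point to a stale or duplicated chassis entry in the SB DB
  ovs-vsctl --columns=name,options find Interface type=geneve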
On Wed, Sep 16, 2020 at 12:21 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
Below is the output of ovs-vsctl list Interface
_uuid : bdaf92c1-4389-4ddf-aab0-93975076ebb2 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:02", iface-id="5d03a7a5-82a1-40f9-b50c-353a26167fa3", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 34 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:02" mtu : 1442 mtu_request : [] name : "vnet6" ofport : 2 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=10828495, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=117713, tx_bytes=20771797, tx_dropped=0, tx_errors=0, tx_packets=106954} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : bad80911-3993-4085-a0b0-962b6c9156cd admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "fe:37:52:c4:cb:03" mtu : [] mtu_request : [] name : "ovn-c4b238-0" ofport : 7 ofport_request : [] options : {csum="true", key=flow, remote_ip="192.168.121.164"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 8e7705d1-0b9d-4e30-8277-c339e7e1c27a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:0d", iface-id="b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7", iface-status=active, vm-id="8d73f333-bca4-4b32-9b87-2e7ee07eda84"} ifindex : 28 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:0d" mtu : 1442 mtu_request : [] name : "vnet0" ofport : 1 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=20609787, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=104535, tx_bytes=10830007, tx_dropped=0, tx_errors=0, tx_packets=117735} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 86dcc68a-63e4-4445-9373-81c1f4502c17 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:10", iface-id="4e8d5636-4110-41b2-906d-f9b04c2e62cd", iface-status=active, vm-id="9a002a9b-5f09-4def-a531-d50ff683470b"} ifindex : 40 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:10" mtu : 1442 mtu_request : [] name : "vnet11" ofport : 10 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=3311352, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=51012, tx_bytes=5514116, tx_dropped=0, tx_errors=0, tx_packets=103456} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : e8d5e4a2-b9a0-4146-8d98-34713cb443de admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:15", iface-id="b88de6e4-6d77-4e42-b734-4cc676728910", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 37 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:15" mtu : 1442 mtu_request : [] name : "vnet9" ofport : 5 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4500, tx_dropped=0, tx_errors=0, tx_packets=74} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 6a2974b3-cd72-4688-a630-0a7e9c779b21 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:17", iface-id="64681036-26e2-41d7-b73f-ab5302610145", iface-status=active, vm-id="bf0dc78c-dad5-41a0-914c-ae0da0f9a388"} ifindex : 41 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:17" mtu : 1442 mtu_request : [] name : "vnet12" ofport : 11 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=5513640, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=103450, tx_bytes=3311868, tx_dropped=0, tx_errors=0, tx_packets=51018} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 44498e54-f122-41a0-a41a-7a88ba2dba9b admin_state : down bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 7 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : down lldp : {} mac : [] mac_in_use : "32:0a:69:67:07:4f" mtu : 1442 mtu_request : [] name : br-int ofport : 65534 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=326, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0} status : {driver_name=openvswitch} type : internal
_uuid : e2114584-8ceb-43d6-817b-e457738ead8a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:03", iface-id="16162721-c815-4cd8-ab57-f22e6e482c7f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 35 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:03" mtu : 1442 mtu_request : [] name : "vnet7" ofport : 3 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4730, tx_dropped=0, tx_errors=0, tx_packets=77} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : ee16943e-d145-4080-893f-464098a6388f admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "1e:50:3f:a8:42:d1" mtu : [] mtu_request : [] name : "ovn-be3abc-0" ofport : 8 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host02"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 86a229be-373e-4c43-b2f1-6190523ed73a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:1c", iface-id="12d829c3-64eb-44bc-a0bd-d7219991f35f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 38 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:1c" mtu : 1442 mtu_request : [] name : "vnet10" ofport : 6 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=117912, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2195, tx_bytes=4204, tx_dropped=0, tx_errors=0, tx_packets=66} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : fa4b8d96-bffe-4b56-930e-0e7fcc5f68ac admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "7a:28:24:eb:ec:d2" mtu : [] mtu_request : [] name : "ovn-95ccb0-0" ofport : 9 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host01"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=12840478, tx_packets=224029} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 5e3df5c7-958c-491d-8d41-0ae83c613f1d admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:06", iface-id="9a6cc189-0934-4468-97ae-09f90fa4598d", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 36 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:06" mtu : 1442 mtu_request : [] name : "vnet8" ofport : 4 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=8829812, tx_dropped=0, tx_errors=0, tx_packets=154540} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
I've identified which VMs have these MAC addresses but I do not see any "conflict" with any other VM's MAC address.
I really do not understand why these would create a conflict.
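A possible cross-check on the engine (just a sketch; it lists every OVN logical switch port with its addresses so duplicates stand out):

  # all logical switch ports and their MAC/IP addresses from the OVN NB DB
  ovn-nbctl --format=csv --columns=name,addresses list Logical_Switch_Port

  # only the address sets that occur more than once
  ovn-nbctl --format=csv --columns=addresses list Logical_Switch_Port | sort | uniq -d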
On Wed, Sep 16, 2020 at 12:06 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 6:53 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
So a new test-net was created under DC01 and was depicted in the networks tab under both DC01 and DC02. I believe for some reason networks are duplicated across DCs, maybe for future use??? Don't know. If one tries to delete the network from the other DC it gets an error, while if it is deleted from the DC in which it was initially created it gets deleted from both.
In oVirt a logical network is an entity in a data center. If the automatic synchronization is enabled on the ovirt-provider-ovn entity in oVirt Engine, the OVN networks are reflected to all data centers. If you do not like this, you can disable the automatic synchronization of the ovirt-provider-ovn in Admin Portal.
From DC01-node02 I get the following errors:
2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
And this repeats forever.
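To get more detail on why the OVNSB commit keeps failing, one option (a sketch, assuming ovs-appctl can reach the ovn-controller control socket on that host) is to raise the log level temporarily:

  # on the affected host: switch ovn-controller to debug logging
  ovs-appctl -t ovn-controller vlog/set dbg
  tail -f /var/log/openvswitch/ovn-controller.log

  # restore the default level afterwards
  ovs-appctl -t ovn-controller vlog/set info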
Looks like the southbound db is confused.
Can you try to delete all chassis listed by sudo ovn-sbctl show via sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dev-host0? If the script remove_chassis.sh is not installed, you can use
https://github.com/oVirt/ovirt-provider-ovn/blob/master/provider/scripts/rem... instead.
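For example (a sketch; the hostnames below are the ones mentioned in this thread, replace them with whatever sudo ovn-sbctl show actually lists):

  # on the engine, once per chassis reported by ovn-sbctl show
  sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dc01-host01
  sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dc01-host02
  sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dc02-host01

  # the list should be empty until the ovn-controllers reconnect
  sudo ovn-sbctl show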
Can you please also share the output of ovs-vsctl list Interface on the host which produced the logfile above?
The ovn-sbctl connection is OK and the geneve tunnels are shown by ovs-vsctl. VMs are still not able to ping each other.
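One quick way to see whether traffic actually leaves through the tunnels (a sketch; ovirtmgmt-ams03 and ovn-95ccb0-0 are taken from the tunnel_egress_iface and tunnel port shown in the output above, adjust to the host being tested) is to watch the geneve UDP port while pinging from one of the VMs:

  # geneve encapsulation uses UDP port 6081
  tcpdump -ni ovirtmgmt-ams03 udp port 6081

  # tunnel counters from OVS; tx/rx should increase while the ping runs
  ovs-vsctl --columns=name,statistics list Interface ovn-95ccb0-0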
On Tue, Sep 15, 2020 at 7:22 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
> Hi Dominik > > Fixed the issue. >
Thanks.
> I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf > needed update also. > The package is upgraded to the latest version. > > Once the provider was updated with the following it functioned > perfectly: > > Name: ovirt-provider-ovn > Description: oVirt network provider for OVN > Type: External Network Provider > Network Plugin: oVirt Network Provider for OVN > Automatic Synchronization: Checked > Unmanaged: Unchecked > Provider URL: https:dc02-ovirt01.testdomain.com:9696 > Requires Authentication: Checked > Username: admin@internal > Password: "The admin password" > Protocol: HTTPS > Host Name: dc02-ovirt01.testdomain.com > API Port: 35357 > API Version: v2.0 > Tenant Name: "Empty" > > For some reason the TLS certificate was in conflict with the ovn > provider details, i would bet the "host" entry. > > So now geneve tunnels are established. > OVN provider is working. > > But VMs still do not communicated on the same VM network spanning > different hosts. > > So if we have a VM network test-net on both dc01-host01 and > dc01-host02 and each host has a VM with IP addresses on the same network, > VMs on the same VM network should communicate directly. > But traffic does not reach each other. > > Can you create a new external network, with port security disabled, and an IPv4 subnet? If the VMs get an IP address via DHCP, ovn is working, and should be able to ping each other, too. If not, there should be a helpful entry in the ovn-controller.log of the host the VM is running.
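If the VMs do get an address, a minimal check from inside them (a sketch; eth0 is only a placeholder for the vNIC attached to the new network and 10.0.0.3 for whatever address the second VM received) could be:

  # inside the first test VM: request a lease from the OVN DHCP server
  sudo dhclient -v eth0
  ip addr show dev eth0

  # then ping the second test VM
  ping -c 3 10.0.0.3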
>>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>> geneve >>>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>>> >>>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE01*" >>>>>>>>>>>>> >>>>>>>>>>>>> node02 >>>>>>>>>>>>> >>>>>>>>>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>> geneve >>>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>>> >>>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE02*" >>>>>>>>>>>>> >>>>>>>>>>>>> DC02 >>>>>>>>>>>>> node01 >>>>>>>>>>>>> >>>>>>>>>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>> geneve >>>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>>> >>>>>>>>>>>>> "*OVIRTMGMT_IP_DC02-NODE01*" >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> Looks good. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> DC01 node01 and node02 share the same VM networks and >>>>>>>>>>>>> VMs deployed on top of them cannot talk to VM on the other hypervisor. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Maybe there is a hint on ovn-controller.log on >>>>>>>>>>>> dc01-node02 ? Maybe restarting ovn-controller creates more helpful log >>>>>>>>>>>> messages? >>>>>>>>>>>> >>>>>>>>>>>> You can also try restart the ovn configuration on all >>>>>>>>>>>> hosts by executing >>>>>>>>>>>> vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP >>>>>>>>>>>> on each host, this would trigger >>>>>>>>>>>> >>>>>>>>>>>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... >>>>>>>>>>>> internally. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> So I would expect to see the same output for node01 to >>>>>>>>>>>>> have a geneve tunnel to node02 and vice versa. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> Me too. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler < >>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis < >>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Dominik >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> OVN is selected as the default network provider on the >>>>>>>>>>>>>>> clusters and the hosts. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> sounds good. >>>>>>>>>>>>>> This configuration is required already during the host >>>>>>>>>>>>>> is added to oVirt Engine, because OVN is configured during this step. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The "ovn-sbctl show" works on the ovirt engine and >>>>>>>>>>>>>>> shows only two hosts, 1 per DC. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>>>>>>> hostname: "dc01-node02" >>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>> ip: "X.X.X.X" >>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>>>>>>> hostname: "dc02-node1" >>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>> ip: "A.A.A.A" >>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The new node is not listed (dc01-node1). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> When executed on the nodes the same command (ovn-sbctl >>>>>>>>>>>>>>> show) times-out on all nodes..... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The output of the >>>>>>>>>>>>>>> /var/log/openvswitch/ovn-conntroller.log lists on all logs >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Can you please compare the output of >>>>>>>>>>>>>> >>>>>>>>>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-remote >>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip >>>>>>>>>>>>>> >>>>>>>>>>>>>> of the working hosts, e.g. dc01-node02, and the failing >>>>>>>>>>>>>> host dc01-node1? >>>>>>>>>>>>>> This should point us the relevant difference in the >>>>>>>>>>>>>> configuration. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please include ovirt-users list in your replay, to >>>>>>>>>>>>>> share the knowledge and experience with the community. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>> Konstantinos Betsis >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler < >>>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B < >>>>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi all >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We have a small installation based on OVIRT 4.3. >>>>>>>>>>>>>>>>> 1 Cluster is based on Centos 7 and the other on >>>>>>>>>>>>>>>>> OVIRT NG Node image. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The environment was stable till an upgrade took >>>>>>>>>>>>>>>>> place a couple of months ago. >>>>>>>>>>>>>>>>> As such we had to re-install one of the Centos 7 >>>>>>>>>>>>>>>>> node and start from scratch. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> To trigger the automatic configuration of the host, >>>>>>>>>>>>>>>> it is required to configure ovirt-provider-ovn as the default network >>>>>>>>>>>>>>>> provider for the cluster before adding the host to oVirt. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Even though the installation completed successfully >>>>>>>>>>>>>>>>> and VMs are created, the following are not working as expected: >>>>>>>>>>>>>>>>> 1. ovn geneve tunnels are not established with the >>>>>>>>>>>>>>>>> other Centos 7 node in the cluster. >>>>>>>>>>>>>>>>> 2. Centos 7 node is configured by ovirt engine >>>>>>>>>>>>>>>>> however no geneve tunnel is established when "ovn-sbctl show" is issued on >>>>>>>>>>>>>>>>> the engine. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Does "ovn-sbctl show" list the hosts? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 3. no flows are shown on the engine on port 6642 for >>>>>>>>>>>>>>>>> the ovs db. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Does anyone have any experience on how to >>>>>>>>>>>>>>>>> troubleshoot OVN on ovirt? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> /var/log/openvswitch/ovncontroller.log on the host >>>>>>>>>>>>>>>> should contain a helpful hint. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>> Users mailing list -- users@ovirt.org >>>>>>>>>>>>>>>>> To unsubscribe send an email to >>>>>>>>>>>>>>>>> users-leave@ovirt.org >>>>>>>>>>>>>>>>> Privacy Statement: >>>>>>>>>>>>>>>>> https://www.ovirt.org/privacy-policy.html >>>>>>>>>>>>>>>>> oVirt Code of Conduct: >>>>>>>>>>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>>>>>>>>>>>>> List Archives: >>>>>>>>>>>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EK... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>

On Wed, Sep 16, 2020 at 12:15 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
I have a better solution. I am currently migrating all VMs over to dc01-node01, and then I'll reformat it to fix the partitioning as well.
In theory the OVN southbound DB will be fixed once the host is re-installed....
If not, we can then check whether there is a stale entry on the oVirt host where the southbound DB is managed.
Maybe you could ensure that there is no entry anymore while dc01-host02 is being reinstalled, before the host is added to oVirt again?
Do you agree with this?
Sounds good, but OVN should not be the reason to reinstall.
On Wed, Sep 16, 2020 at 1:00 PM Dominik Holler <dholler@redhat.com> wrote:
Maybe because of a duplicated entry in the OVN southbound DB? Can you please stop the ovn-controller on this host, remove the host from the OVN southbound DB, ensure it is gone, and restart the ovn-controller on the host?
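A minimal sketch of those steps, assuming the southbound DB runs on the engine and using the chassis UUID reported by "ovn-sbctl show" (CHASSIS_UUID below is a placeholder):

# on the affected host
systemctl stop ovn-controller

# on the engine (ovn-central): find and delete the stale chassis record
ovn-sbctl show
ovn-sbctl chassis-del CHASSIS_UUID
ovn-sbctl show          # confirm the chassis entry is gone

# back on the host
systemctl start ovn-controller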
On Wed, Sep 16, 2020 at 11:55 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
Just saw the below on host dc01-host02
ovs-vsctl show
f3b13557-dfb4-45a4-b6af-c995ccf68720
    Bridge br-int
        Port "ovn-95ccb0-0"
            Interface "ovn-95ccb0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host01"}
        Port "vnet10"
            Interface "vnet10"
        Port "vnet11"
            Interface "vnet11"
        Port "vnet0"
            Interface "vnet0"
        Port "vnet9"
            Interface "vnet9"
        Port "vnet8"
            Interface "vnet8"
        Port br-int
            Interface br-int
                type: internal
        Port "vnet12"
            Interface "vnet12"
        Port "ovn-be3abc-0"
            Interface "ovn-be3abc-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host02"}
        Port "vnet7"
            Interface "vnet7"
        Port "ovn-c4b238-0"
            Interface "ovn-c4b238-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc02-host01"}
        Port "vnet6"
            Interface "vnet6"
    ovs_version: "2.11.0"
Why would this node establish a geneve tunnel to itself? Other nodes do not exhibit this behavior.
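For what it is worth, the tunnel port names encode the first characters of the peer chassis UUID (ovn-be3abc-0 matches chassis be3abcc9-..., ovn-c4b238-0 matches c4b23834-...), so a tunnel "to itself" usually means a stale or duplicate chassis record in the southbound DB still carries this host's IP. A sketch of how to check, on the engine:

ovn-sbctl show                                    # each host should appear exactly once
ovn-sbctl --columns=name,hostname list Chassis    # look for duplicate or stale hostnames
ovn-sbctl list Encap                              # check which encap IPs are registered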
On Wed, Sep 16, 2020 at 12:21 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
Below is the output of ovs-vsctl list Interface
_uuid : bdaf92c1-4389-4ddf-aab0-93975076ebb2 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:02", iface-id="5d03a7a5-82a1-40f9-b50c-353a26167fa3", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 34 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:02" mtu : 1442 mtu_request : [] name : "vnet6" ofport : 2 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=10828495, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=117713, tx_bytes=20771797, tx_dropped=0, tx_errors=0, tx_packets=106954} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid               : bad80911-3993-4085-a0b0-962b6c9156cd
admin_state         : up
bfd                 : {}
bfd_status          : {}
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : []
error               : []
external_ids        : {}
ifindex             : 39
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : []
link_state          : up
lldp                : {}
mac                 : []
mac_in_use          : "fe:37:52:c4:cb:03"
mtu                 : []
mtu_request         : []
name                : "ovn-c4b238-0"
ofport              : 7
ofport_request      : []
options             : {csum="true", key=flow, remote_ip="192.168.121.164"}
other_config        : {}
statistics          : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0}
status              : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up}
type                : geneve
_uuid : 8e7705d1-0b9d-4e30-8277-c339e7e1c27a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:0d", iface-id="b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7", iface-status=active, vm-id="8d73f333-bca4-4b32-9b87-2e7ee07eda84"} ifindex : 28 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:0d" mtu : 1442 mtu_request : [] name : "vnet0" ofport : 1 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=20609787, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=104535, tx_bytes=10830007, tx_dropped=0, tx_errors=0, tx_packets=117735} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 86dcc68a-63e4-4445-9373-81c1f4502c17 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:10", iface-id="4e8d5636-4110-41b2-906d-f9b04c2e62cd", iface-status=active, vm-id="9a002a9b-5f09-4def-a531-d50ff683470b"} ifindex : 40 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:10" mtu : 1442 mtu_request : [] name : "vnet11" ofport : 10 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=3311352, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=51012, tx_bytes=5514116, tx_dropped=0, tx_errors=0, tx_packets=103456} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : e8d5e4a2-b9a0-4146-8d98-34713cb443de admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:15", iface-id="b88de6e4-6d77-4e42-b734-4cc676728910", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 37 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:15" mtu : 1442 mtu_request : [] name : "vnet9" ofport : 5 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4500, tx_dropped=0, tx_errors=0, tx_packets=74} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 6a2974b3-cd72-4688-a630-0a7e9c779b21 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:17", iface-id="64681036-26e2-41d7-b73f-ab5302610145", iface-status=active, vm-id="bf0dc78c-dad5-41a0-914c-ae0da0f9a388"} ifindex : 41 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:17" mtu : 1442 mtu_request : [] name : "vnet12" ofport : 11 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=5513640, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=103450, tx_bytes=3311868, tx_dropped=0, tx_errors=0, tx_packets=51018} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 44498e54-f122-41a0-a41a-7a88ba2dba9b admin_state : down bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 7 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : down lldp : {} mac : [] mac_in_use : "32:0a:69:67:07:4f" mtu : 1442 mtu_request : [] name : br-int ofport : 65534 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=326, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0} status : {driver_name=openvswitch} type : internal
_uuid : e2114584-8ceb-43d6-817b-e457738ead8a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:03", iface-id="16162721-c815-4cd8-ab57-f22e6e482c7f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 35 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:03" mtu : 1442 mtu_request : [] name : "vnet7" ofport : 3 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4730, tx_dropped=0, tx_errors=0, tx_packets=77} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid               : ee16943e-d145-4080-893f-464098a6388f
admin_state         : up
bfd                 : {}
bfd_status          : {}
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : []
error               : []
external_ids        : {}
ifindex             : 39
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : []
link_state          : up
lldp                : {}
mac                 : []
mac_in_use          : "1e:50:3f:a8:42:d1"
mtu                 : []
mtu_request         : []
name                : "ovn-be3abc-0"
ofport              : 8
ofport_request      : []
options             : {csum="true", key=flow, remote_ip="DC01-host02"}
other_config        : {}
statistics          : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0}
status              : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up}
type                : geneve
_uuid : 86a229be-373e-4c43-b2f1-6190523ed73a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:1c", iface-id="12d829c3-64eb-44bc-a0bd-d7219991f35f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 38 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:1c" mtu : 1442 mtu_request : [] name : "vnet10" ofport : 6 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=117912, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2195, tx_bytes=4204, tx_dropped=0, tx_errors=0, tx_packets=66} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid               : fa4b8d96-bffe-4b56-930e-0e7fcc5f68ac
admin_state         : up
bfd                 : {}
bfd_status          : {}
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : []
error               : []
external_ids        : {}
ifindex             : 39
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : []
link_state          : up
lldp                : {}
mac                 : []
mac_in_use          : "7a:28:24:eb:ec:d2"
mtu                 : []
mtu_request         : []
name                : "ovn-95ccb0-0"
ofport              : 9
ofport_request      : []
options             : {csum="true", key=flow, remote_ip="DC01-host01"}
other_config        : {}
statistics          : {rx_bytes=0, rx_packets=0, tx_bytes=12840478, tx_packets=224029}
status              : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up}
type                : geneve
_uuid : 5e3df5c7-958c-491d-8d41-0ae83c613f1d admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:06", iface-id="9a6cc189-0934-4468-97ae-09f90fa4598d", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 36 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:06" mtu : 1442 mtu_request : [] name : "vnet8" ofport : 4 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=8829812, tx_dropped=0, tx_errors=0, tx_packets=154540} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
I've identified which VMs have these MAC addresses, but I do not see any "conflict" with any other VM's MAC address.
I really do not understand why these would create a conflict.
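If it helps, one way to check whether a MAC is claimed by more than one logical port is to grep the port bindings in the southbound DB on the engine (a sketch, using one of the MACs from the log above; adjust the grep context to your OVN version's column layout):

ovn-sbctl list Port_Binding | grep -B 8 '56:6f:77:61:00:06'
ovn-nbctl list Logical_Switch_Port | grep -B 8 '56:6f:77:61:00:06'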
On Wed, Sep 16, 2020 at 12:06 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 6:53 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
So a new test-net was created under DC01 and it showed up in the Networks tab under both DC01 and DC02. I believe for some reason networks are duplicated across DCs, maybe for future use? I don't know. If one tries to delete the network from the other DC it gets an error, while if it is deleted from the DC it was initially created in, it gets deleted from both.
In oVirt a logical network is an entity in a data center. If the automatic synchronization is enabled on the ovirt-provider-ovn entity in oVirt Engine, the OVN networks are reflected to all data centers. If you do not like this, you can disable the automatic synchronization of the ovirt-provider-ovn in Admin Portal.
From DC01-node02 I get the following errors:
2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis.
2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: Claiming 56:6f:77:61:00:06
2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis.
2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: Claiming 56:6f:77:61:00:03
2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis.
2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: Claiming 56:6f:77:61:00:15
2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis.
2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: Claiming 56:6f:77:61:00:0d
2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis.
2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: Claiming 56:6f:77:61:00:02
2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis.
2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: Claiming 56:6f:77:61:00:1c
And this repeats forever.
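A repeating claim/commit-failed loop like this can mean more than one chassis record keeps taking over the same logical ports. A sketch of how to see which chassis the southbound DB currently assigns them to, on the engine:

ovn-sbctl --columns=logical_port,chassis list Port_Binding
ovn-sbctl show      # compare the chassis UUIDs against the registered chassis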
Looks like the southbound db is confused.
Can you try to delete all chassis listed by "sudo ovn-sbctl show" via "sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dev-host0"? If the script remove_chassis.sh is not installed, you can use
https://github.com/oVirt/ovirt-provider-ovn/blob/master/provider/scripts/rem... instead.
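As a quick sanity check after the chassis entries have been removed and the ovn-controllers have reconnected (a sketch):

ovn-sbctl show      # on the engine: each host should be listed exactly once
ovs-vsctl show      # on each host: one ovn-* geneve port per remote host, none with the host's own IP as remote_ip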
Can you please also share the output of ovs-vsctl list Interface on the host which produced the logfile above?
The connections to ovn-sbctl are OK and the geneve tunnels are shown by ovs-vsctl, but the VMs are still not able to ping each other.
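If the tunnels look fine but VM-to-VM traffic still does not arrive, one quick check is to capture geneve traffic (UDP 6081) on both hosts while pinging between the VMs; note that in the ovs-vsctl list Interface output earlier in this thread all geneve interfaces show rx_bytes=0, i.e. nothing is ever received back over the tunnels. A sketch (replace the interface with the tunnel_egress_iface reported by ovs-vsctl, e.g. ovirtmgmt-ams03):

tcpdump -nn -i any udp port 6081

If packets leave one host but never show up on the other, something on the path (firewall, MTU, or NAT between the tunnel endpoints) is dropping the geneve traffic rather than OVN itself.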
On Tue, Sep 15, 2020 at 7:22 PM Dominik Holler <dholler@redhat.com> wrote:
> > > On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis < > k.betsis@gmail.com> wrote: > >> Hi Dominik >> >> Fixed the issue. >> > > Thanks. > > >> I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >> needed update also. >> The package is upgraded to the latest version. >> >> Once the provider was updated with the following it functioned >> perfectly: >> >> Name: ovirt-provider-ovn >> Description: oVirt network provider for OVN >> Type: External Network Provider >> Network Plugin: oVirt Network Provider for OVN >> Automatic Synchronization: Checked >> Unmanaged: Unchecked >> Provider URL: https:dc02-ovirt01.testdomain.com:9696 >> Requires Authentication: Checked >> Username: admin@internal >> Password: "The admin password" >> Protocol: HTTPS >> Host Name: dc02-ovirt01.testdomain.com >> API Port: 35357 >> API Version: v2.0 >> Tenant Name: "Empty" >> >> For some reason the TLS certificate was in conflict with the ovn >> provider details, i would bet the "host" entry. >> >> So now geneve tunnels are established. >> OVN provider is working. >> >> But VMs still do not communicated on the same VM network spanning >> different hosts. >> >> So if we have a VM network test-net on both dc01-host01 and >> dc01-host02 and each host has a VM with IP addresses on the same network, >> VMs on the same VM network should communicate directly. >> But traffic does not reach each other. >> >> > Can you create a new external network, with port security disabled, > and an IPv4 subnet? > If the VMs get an IP address via DHCP, ovn is working, and should be > able to ping each other, too. > If not, there should be a helpful entry in the ovn-controller.log of > the host the VM is running. > > >> On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler <dholler@redhat.com> >> wrote: >> >>> Can you try again with: >>> >>> [OVN REMOTE] >>> ovn-remote=ssl:127.0.0.1:6641 >>> [SSL] >>> https-enabled=false >>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>> >>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>> [OVIRT] >>> ovirt-sso-client-secret=*random_test* >>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>> <https://dc02-ovirt01.testdomain.com/> >>> ovirt-sso-client-id=ovirt-provider-ovn >>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>> [NETWORK] >>> port-security-enabled-default=True >>> [PROVIDER] >>> >>> provider-host=dc02-ovirt01.testdomain.com >>> >>> >>> >>> Please note that the should match the HTTP or HTTPS in the of the >>> ovirt-prover-ovn configuration in oVirt Engine. >>> So if the ovirt-provider-ovn entity in Engine is on HTTP, the >>> config file should use >>> https-enabled=false >>> >>> >>> On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis < >>> k.betsis@gmail.com> wrote: >>> >>>> This is the updated one: >>>> >>>> # This file is automatically generated by engine-setup. 
Please do >>>> not edit manually >>>> [OVN REMOTE] >>>> ovn-remote=ssl:127.0.0.1:6641 >>>> [SSL] >>>> https-enabled=true >>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>> >>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>> [OVIRT] >>>> ovirt-sso-client-secret=*random_text* >>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>> ovirt-sso-client-id=ovirt-provider-ovn >>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>> [NETWORK] >>>> port-security-enabled-default=True >>>> [PROVIDER] >>>> provider-host=dc02-ovirt01.testdomain.com >>>> [AUTH] >>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>> >>>> >>>> However, it still does not connect. >>>> It prompts for the certificate but then fails and prompts to see >>>> the log but the ovirt-provider-ovn.log does not list anything. >>>> >>>> Yes we've got ovirt for about a year now from about version 4.1 >>>> >>>> >>> This might explain the trouble. Upgrade of ovirt-provider-ovn >>> should work flawlessly starting from oVirt 4.2. >>> >>> >>>> On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler < >>>> dholler@redhat.com> wrote: >>>> >>>>> >>>>> >>>>> On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis < >>>>> k.betsis@gmail.com> wrote: >>>>> >>>>>> There is a file with the below entries >>>>>> >>>>> >>>>> Impressive, do you know when this config file was created and if >>>>> it was manually modified? >>>>> Is this an upgrade from oVirt 4.1? >>>>> >>>>> >>>>>> [root@dc02-ovirt01 log]# cat >>>>>> /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >>>>>> # This file is automatically generated by engine-setup. Please >>>>>> do not edit manually >>>>>> [OVN REMOTE] >>>>>> ovn-remote=tcp:127.0.0.1:6641 >>>>>> [SSL] >>>>>> https-enabled=false >>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>> >>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>> [OVIRT] >>>>>> ovirt-sso-client-secret=*random_test* >>>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>>> [NETWORK] >>>>>> port-security-enabled-default=True >>>>>> [PROVIDER] >>>>>> >>>>>> provider-host=dc02-ovirt01.testdomain.com >>>>>> >>>>>> The only entry missing is the [AUTH] and under [SSL] the >>>>>> https-enabled is false. Should I edit this in this file or is this going to >>>>>> break everything? >>>>>> >>>>>> >>>>> Changing the file should improve, but better create a backup >>>>> into another diretory before modification. >>>>> The only required change is >>>>> from >>>>> ovn-remote=tcp:127.0.0.1:6641 >>>>> to >>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>> >>>>> >>>>> >>>>> >>>>>> On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler < >>>>>> dholler@redhat.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis < >>>>>>> k.betsis@gmail.com> wrote: >>>>>>> >>>>>>>> Hi Dominik >>>>>>>> >>>>>>>> That immediately fixed the geneve tunnels between all hosts. >>>>>>>> >>>>>>>> >>>>>>> thanks for the feedback. >>>>>>> >>>>>>> >>>>>>>> However, the ovn provider is not broken. >>>>>>>> After fixing the networks we tried to move a VM to the >>>>>>>> DC01-host01 so we powered it down and simply configured it to run on >>>>>>>> dc01-node01. 
>>>>>>>> >>>>>>>> While checking the logs on the ovirt engine i noticed the >>>>>>>> below: >>>>>>>> Failed to synchronize networks of Provider ovirt-provider-ovn. >>>>>>>> >>>>>>>> The ovn-provider configure on the engine is the below: >>>>>>>> Name: ovirt-provider-ovn >>>>>>>> Description: oVirt network provider for OVN >>>>>>>> Type: External Network Provider >>>>>>>> Network Plugin: oVirt Network Provider for OVN >>>>>>>> Automatic Synchronization: Checked >>>>>>>> Unmanaged: Unchecked >>>>>>>> Provider URL: http:localhost:9696 >>>>>>>> Requires Authentication: Checked >>>>>>>> Username: admin@internal >>>>>>>> Password: "The admin password" >>>>>>>> Protocol: hTTP >>>>>>>> Host Name: dc02-ovirt01 >>>>>>>> API Port: 35357 >>>>>>>> API Version: v2.0 >>>>>>>> Tenant Name: "Empty" >>>>>>>> >>>>>>>> In the past this was deleted by an engineer and recreated as >>>>>>>> per the documentation, and it worked. Do we need to update something due to >>>>>>>> the SSL on the ovn? >>>>>>>> >>>>>>>> >>>>>>> Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? >>>>>>> engine-setup should have created one. >>>>>>> If the file is missing, for testing purposes, you can create a >>>>>>> file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf : >>>>>>> [PROVIDER] >>>>>>> provider-host=REPLACE_WITH_FQDN >>>>>>> [SSL] >>>>>>> >>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>> >>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>> https-enabled=true >>>>>>> [OVN REMOTE] >>>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>>> [AUTH] >>>>>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>>>>> [NETWORK] >>>>>>> port-security-enabled-default=True >>>>>>> >>>>>>> and restart the ovirt-provider-ovn service. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> From the ovn-provider logs the below is generated after a >>>>>>>> service restart and when the start VM is triggered >>>>>>>> >>>>>>>> 2020-09-15 15:07:33,579 root Starting server >>>>>>>> 2020-09-15 15:07:33,579 root Version: 1.2.29-1 >>>>>>>> 2020-09-15 15:07:33,579 root Build date: 20191217125241 >>>>>>>> 2020-09-15 15:07:33,579 root Githash: cb5a80d >>>>>>>> 2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 >>>>>>>> Request: GET /v2.0/ports >>>>>>>> 2020-09-15 15:08:26,582 root Could not retrieve schema from >>>>>>>> tcp:127.0.0.1:6641: Unknown error -1 >>>>>>>> Traceback (most recent call last): >>>>>>>> File >>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in >>>>>>>> _handle_request >>>>>>>> method, path_parts, content >>>>>>>> File >>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in >>>>>>>> handle_request >>>>>>>> return self.call_response_handler(handler, content, >>>>>>>> parameters) >>>>>>>> File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", >>>>>>>> line 35, in call_response_handler >>>>>>>> with NeutronApi() as ovn_north: >>>>>>>> File >>>>>>>> "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__ >>>>>>>> self.ovsidl, self.idl = ovn_connection.connect() >>>>>>>> File "/usr/share/ovirt-provider-ovn/ovn_connection.py", >>>>>>>> line 46, in connect >>>>>>>> ovnconst.OVN_NORTHBOUND >>>>>>>> File >>>>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", >>>>>>>> line 127, in from_server >>>>>>>> helper = idlutils.get_schema_helper(connection_string, >>>>>>>> schema_name) >>>>>>>> File >>>>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", >>>>>>>> line 128, in get_schema_helper >>>>>>>> 'err': os.strerror(err)}) >>>>>>>> Exception: Could not retrieve schema from tcp:127.0.0.1:6641: >>>>>>>> Unknown error -1 >>>>>>>> >>>>>>>> >>>>>>>> When i update the ovn provider from the GUI to have >>>>>>>> https://localhost:9696/ and HTTPS as the protocol the test >>>>>>>> fails. >>>>>>>> >>>>>>>> On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler < >>>>>>>> dholler@redhat.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis < >>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Dominik >>>>>>>>>> >>>>>>>>>> When these commands are used on the ovirt-engine host the >>>>>>>>>> output is the one depicted in your email. >>>>>>>>>> For your reference see also below: >>>>>>>>>> >>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl >>>>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>> Bootstrap: false >>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection >>>>>>>>>> ptcp:6641 >>>>>>>>>> >>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl >>>>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>> Bootstrap: false >>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection >>>>>>>>>> read-write role="" ptcp:6642 >>>>>>>>>> >>>>>>>>>> >>>>>>>>> ^^^ the line above points to the problem: ovn-central is >>>>>>>>> configured to use plain TCP without ssl. >>>>>>>>> engine-setup usually configures ovn-central to use SSL. 
That >>>>>>>>> the files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, >>>>>>>>> that engine-setup was triggered correctly. Looks like the >>>>>>>>> ovn db was dropped somehow, this should not happen. >>>>>>>>> This can be fixed manually by executing the following >>>>>>>>> commands on engine's machine: >>>>>>>>> ovn-nbctl set-ssl >>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>> /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>>>> ovn-nbctl set-connection pssl:6641 >>>>>>>>> ovn-sbctl set-ssl >>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>> /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>>>> ovn-sbctl set-connection pssl:6642 >>>>>>>>> >>>>>>>>> The /var/log/openvswitch/ovn-controller.log on the hosts >>>>>>>>> should tell that br-int.mgmt is connected now. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> [root@ath01-ovirt01 certs]# ls -l >>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>>>> >>>>>>>>>> When i try the above commands on the node hosts the >>>>>>>>>> following happens: >>>>>>>>>> ovn-nbctl get-ssl / get-connection >>>>>>>>>> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: >>>>>>>>>> database connection failed (No such file or directory) >>>>>>>>>> The above i believe is expected since no northbound >>>>>>>>>> connections should be established from the host nodes. >>>>>>>>>> >>>>>>>>>> ovn-sbctl get-ssl /get-connection >>>>>>>>>> The output is stuck till i terminate it. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Yes, the ovn-* commands works only on engine's machine, >>>>>>>>> which has the role ovn-central. >>>>>>>>> On the hosts, there is only the ovn-controller, which >>>>>>>>> connects the ovn southbound to openvswitch on the host. 

Hi Dominik

DC01-node02 was formatted, reinstalled and then attached to the oVirt environment again. Unfortunately we see the same issue: the new DC01-node02 tries to establish a geneve tunnel to its own IP.

[root@dc01-node02 ~]# ovs-vsctl show
eff2663e-cb10-41b0-93ba-605bb5c7bd78
    Bridge br-int
        fail_mode: secure
        Port "ovn-95ccb0-0"
            Interface "ovn-95ccb0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-node01_IP"}
        Port "ovn-be3abc-0"
            Interface "ovn-be3abc-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-node02_IP"}
        Port "ovn-c4b238-0"
            Interface "ovn-c4b238-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc02-node01_IP"}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.11.0"

Is there a way to fix this on the oVirt engine, since this is where the information resides? Something is broken there.

Thank you
Best regards
Konstantinos Betsis

On Wed, Sep 16, 2020, 13:25 Dominik Holler <dholler@redhat.com> wrote:
On Wed, Sep 16, 2020 at 12:15 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
I have a better solution: I am currently migrating all VMs over to dc01-node01, and then I'll format it so as to fix the partitioning as well.
In theory the OVN SB DB will be fixed once it is re-installed....
If not, we can then check whether there is a stale entry on the oVirt host where the SB DB is managed.
Maybe you could ensure that there is no entry left while dc01-host02 is reinstalling, before the host is added to oVirt again?
Do you agree with this?
Sounds good, but OVN should not be the reason to reinstall.
On Wed, Sep 16, 2020 at 1:00 PM Dominik Holler <dholler@redhat.com> wrote:
Maybe because of a duplicated entry in the ovn sb db? Can you please stop the ovn-controller on this host, remove the host from the ovn sb db, ensure it is gone, and restart the ovn-controller on the host?
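For reference, a rough sketch of those steps (the chassis UUID below is only an example taken from the "ovn-sbctl show" output earlier in this thread; the real one has to be looked up on the engine first):

# on the host dc01-host02: stop the controller so it does not re-register
systemctl stop ovn-controller

# on the engine (ovn-central): list the chassis and delete the stale entry
ovn-sbctl show
ovn-sbctl chassis-del be3abcc9-7358-4040-a37b-8d8a782f239c
ovn-sbctl show    # verify the entry is gone

# back on the host: start the controller so it registers freshly
systemctl start ovn-controller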
On Wed, Sep 16, 2020 at 11:55 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
Just saw the below on host dc01-host02
ovs-vsctl show
f3b13557-dfb4-45a4-b6af-c995ccf68720
    Bridge br-int
        Port "ovn-95ccb0-0"
            Interface "ovn-95ccb0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host01"}
        Port "vnet10"
            Interface "vnet10"
        Port "vnet11"
            Interface "vnet11"
        Port "vnet0"
            Interface "vnet0"
        Port "vnet9"
            Interface "vnet9"
        Port "vnet8"
            Interface "vnet8"
        Port br-int
            Interface br-int
                type: internal
        Port "vnet12"
            Interface "vnet12"
        Port "ovn-be3abc-0"
            Interface "ovn-be3abc-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host02"}
        Port "vnet7"
            Interface "vnet7"
        Port "ovn-c4b238-0"
            Interface "ovn-c4b238-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc02-host01"}
        Port "vnet6"
            Interface "vnet6"
    ovs_version: "2.11.0"
Why would this node establish a geneve tunnel to itself? Other nodes do not exhibit this behavior.
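If it helps to trace it: as far as I know the tunnel port name is derived from the first characters of the chassis name in the southbound db, so "ovn-be3abc-0" should correspond to the chassis "be3abcc9-..." shown earlier, which carries this host's own IP as its encap. A rough way to cross-check, assuming that naming convention:

# on the host: show the suspicious tunnel port and its remote_ip
ovs-vsctl --columns=name,options list Interface ovn-be3abc-0

# on the engine: find the chassis whose name starts with the same prefix
ovn-sbctl --columns=name,hostname list Chassis | grep -B1 -A1 be3abc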
On Wed, Sep 16, 2020 at 12:21 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
Hi Dominik
Below is the output of the ovs-vsctl list interface
_uuid : bdaf92c1-4389-4ddf-aab0-93975076ebb2 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:02", iface-id="5d03a7a5-82a1-40f9-b50c-353a26167fa3", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 34 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:02" mtu : 1442 mtu_request : [] name : "vnet6" ofport : 2 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=10828495, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=117713, tx_bytes=20771797, tx_dropped=0, tx_errors=0, tx_packets=106954} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : bad80911-3993-4085-a0b0-962b6c9156cd admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "fe:37:52:c4:cb:03" mtu : [] mtu_request : [] name : "ovn-c4b238-0" ofport : 7 ofport_request : [] options : {csum="true", key=flow, remote_ip="192.168.121.164"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 8e7705d1-0b9d-4e30-8277-c339e7e1c27a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:0d", iface-id="b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7", iface-status=active, vm-id="8d73f333-bca4-4b32-9b87-2e7ee07eda84"} ifindex : 28 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:0d" mtu : 1442 mtu_request : [] name : "vnet0" ofport : 1 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=20609787, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=104535, tx_bytes=10830007, tx_dropped=0, tx_errors=0, tx_packets=117735} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 86dcc68a-63e4-4445-9373-81c1f4502c17 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:10", iface-id="4e8d5636-4110-41b2-906d-f9b04c2e62cd", iface-status=active, vm-id="9a002a9b-5f09-4def-a531-d50ff683470b"} ifindex : 40 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:10" mtu : 1442 mtu_request : [] name : "vnet11" ofport : 10 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=3311352, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=51012, tx_bytes=5514116, tx_dropped=0, tx_errors=0, tx_packets=103456} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : e8d5e4a2-b9a0-4146-8d98-34713cb443de admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:15", iface-id="b88de6e4-6d77-4e42-b734-4cc676728910", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 37 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:15" mtu : 1442 mtu_request : [] name : "vnet9" ofport : 5 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4500, tx_dropped=0, tx_errors=0, tx_packets=74} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 6a2974b3-cd72-4688-a630-0a7e9c779b21 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:17", iface-id="64681036-26e2-41d7-b73f-ab5302610145", iface-status=active, vm-id="bf0dc78c-dad5-41a0-914c-ae0da0f9a388"} ifindex : 41 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:17" mtu : 1442 mtu_request : [] name : "vnet12" ofport : 11 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=5513640, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=103450, tx_bytes=3311868, tx_dropped=0, tx_errors=0, tx_packets=51018} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 44498e54-f122-41a0-a41a-7a88ba2dba9b admin_state : down bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 7 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : down lldp : {} mac : [] mac_in_use : "32:0a:69:67:07:4f" mtu : 1442 mtu_request : [] name : br-int ofport : 65534 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=326, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0} status : {driver_name=openvswitch} type : internal
_uuid : e2114584-8ceb-43d6-817b-e457738ead8a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:03", iface-id="16162721-c815-4cd8-ab57-f22e6e482c7f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 35 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:03" mtu : 1442 mtu_request : [] name : "vnet7" ofport : 3 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4730, tx_dropped=0, tx_errors=0, tx_packets=77} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : ee16943e-d145-4080-893f-464098a6388f admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "1e:50:3f:a8:42:d1" mtu : [] mtu_request : [] name : "ovn-be3abc-0" ofport : 8 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host02"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 86a229be-373e-4c43-b2f1-6190523ed73a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:1c", iface-id="12d829c3-64eb-44bc-a0bd-d7219991f35f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 38 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:1c" mtu : 1442 mtu_request : [] name : "vnet10" ofport : 6 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=117912, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2195, tx_bytes=4204, tx_dropped=0, tx_errors=0, tx_packets=66} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : fa4b8d96-bffe-4b56-930e-0e7fcc5f68ac admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "7a:28:24:eb:ec:d2" mtu : [] mtu_request : [] name : "ovn-95ccb0-0" ofport : 9 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host01"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=12840478, tx_packets=224029} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 5e3df5c7-958c-491d-8d41-0ae83c613f1d admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:06", iface-id="9a6cc189-0934-4468-97ae-09f90fa4598d", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 36 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:06" mtu : 1442 mtu_request : [] name : "vnet8" ofport : 4 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=8829812, tx_dropped=0, tx_errors=0, tx_packets=154540} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
I've identified which VMs have these MAC addresses, but I do not see any "conflict" with any other VM's MAC address.
I really do not understand why these would create a conflict.
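For reference, one rough way to double-check for a duplicated MAC on the OVN side as well (run on the engine against the northbound db):

# list every logical switch port together with its addresses
ovn-nbctl --columns=name,addresses list Logical_Switch_Port

# or only print MAC addresses that appear more than once
ovn-nbctl --columns=addresses list Logical_Switch_Port | grep -o '[0-9a-f]\{2\}\(:[0-9a-f]\{2\}\)\{5\}' | sort | uniq -d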
On Wed, Sep 16, 2020 at 12:06 PM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Sep 15, 2020 at 6:53 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
> So a new test-net was created under DC01 and was depicted in the > networks tab under both DC01 and DC02. > I believe for some reason networks are duplicated in DCs, maybe for > future use??? Don't know. > If one tries to delete the network from the other DC it gets an > error, while if deleted from the once initially created it gets deleted > from both. > > In oVirt a logical network is an entity in a data center. If the automatic synchronization is enabled on the ovirt-provider-ovn entity in oVirt Engine, the OVN networks are reflected to all data centers. If you do not like this, you can disable the automatic synchronization of the ovirt-provider-ovn in Admin Portal.
> From the DC01-node02 i get the following errors: > > 2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, force > recompute next time. > 2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport > 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis. > 2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: > Claiming 56:6f:77:61:00:06 > 2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport > 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis. > 2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: > Claiming 56:6f:77:61:00:03 > 2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport > b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis. > 2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: > Claiming 56:6f:77:61:00:15 > 2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport > b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis. > 2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: > Claiming 56:6f:77:61:00:0d > 2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport > 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis. > 2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: > Claiming 56:6f:77:61:00:02 > 2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport > 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis. > 2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: > Claiming 56:6f:77:61:00:1c > 2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, force > recompute next time. > 2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport > 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis. > 2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: > Claiming 56:6f:77:61:00:06 > 2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport > 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis. > 2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: > Claiming 56:6f:77:61:00:03 > 2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport > b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis. > 2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: > Claiming 56:6f:77:61:00:15 > 2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport > b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis. > 2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: > Claiming 56:6f:77:61:00:0d > 2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport > 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis. > 2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: > Claiming 56:6f:77:61:00:02 > 2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport > 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis. > 2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: > Claiming 56:6f:77:61:00:1c > > > And this repeats forever. > > Looks like the southbound db is confused.
Can you try to delete all chassis listed by
sudo ovn-sbctl show
via
sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dev-host0
? If the script remove_chassis.sh is not installed, you can use
https://github.com/oVirt/ovirt-provider-ovn/blob/master/provider/scripts/rem... instead.
Can you please also share the output of ovs-vsctl list Interface on the host which produced the logfile above?
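In case it helps with the chassis cleanup above, a rough example of the whole pass, assuming the script takes the chassis hostname as its only argument (like the dev-host0 example):

# list the chassis currently known to the southbound db
sudo ovn-sbctl show

# remove every listed chassis, one by one
sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dc01-host01
sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dc01-host02
sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh dc02-host01

# the ovn-controllers on the hosts should then re-register on their next reconnect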
> The connections to ovn-sbctl is ok and the geneve tunnels are > depicted under ovs-vsctl ok. > VMs still not able to ping each other. > > On Tue, Sep 15, 2020 at 7:22 PM Dominik Holler <dholler@redhat.com> > wrote: > >> >> >> On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis < >> k.betsis@gmail.com> wrote: >> >>> Hi Dominik >>> >>> Fixed the issue. >>> >> >> Thanks. >> >> >>> I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >>> needed update also. >>> The package is upgraded to the latest version. >>> >>> Once the provider was updated with the following it functioned >>> perfectly: >>> >>> Name: ovirt-provider-ovn >>> Description: oVirt network provider for OVN >>> Type: External Network Provider >>> Network Plugin: oVirt Network Provider for OVN >>> Automatic Synchronization: Checked >>> Unmanaged: Unchecked >>> Provider URL: https:dc02-ovirt01.testdomain.com:9696 >>> Requires Authentication: Checked >>> Username: admin@internal >>> Password: "The admin password" >>> Protocol: HTTPS >>> Host Name: dc02-ovirt01.testdomain.com >>> API Port: 35357 >>> API Version: v2.0 >>> Tenant Name: "Empty" >>> >>> For some reason the TLS certificate was in conflict with the ovn >>> provider details, i would bet the "host" entry. >>> >>> So now geneve tunnels are established. >>> OVN provider is working. >>> >>> But VMs still do not communicated on the same VM network spanning >>> different hosts. >>> >>> So if we have a VM network test-net on both dc01-host01 and >>> dc01-host02 and each host has a VM with IP addresses on the same network, >>> VMs on the same VM network should communicate directly. >>> But traffic does not reach each other. >>> >>> >> Can you create a new external network, with port security disabled, >> and an IPv4 subnet? >> If the VMs get an IP address via DHCP, ovn is working, and should >> be able to ping each other, too. >> If not, there should be a helpful entry in the ovn-controller.log >> of the host the VM is running. >> >> >>> On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler <dholler@redhat.com> >>> wrote: >>> >>>> Can you try again with: >>>> >>>> [OVN REMOTE] >>>> ovn-remote=ssl:127.0.0.1:6641 >>>> [SSL] >>>> https-enabled=false >>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>> >>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>> [OVIRT] >>>> ovirt-sso-client-secret=*random_test* >>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>> <https://dc02-ovirt01.testdomain.com/> >>>> ovirt-sso-client-id=ovirt-provider-ovn >>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>> [NETWORK] >>>> port-security-enabled-default=True >>>> [PROVIDER] >>>> >>>> provider-host=dc02-ovirt01.testdomain.com >>>> >>>> >>>> >>>> Please note that the should match the HTTP or HTTPS in the of the >>>> ovirt-prover-ovn configuration in oVirt Engine. >>>> So if the ovirt-provider-ovn entity in Engine is on HTTP, the >>>> config file should use >>>> https-enabled=false >>>> >>>> >>>> On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis < >>>> k.betsis@gmail.com> wrote: >>>> >>>>> This is the updated one: >>>>> >>>>> # This file is automatically generated by engine-setup. 
Please >>>>> do not edit manually >>>>> [OVN REMOTE] >>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>> [SSL] >>>>> https-enabled=true >>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>> >>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>> [OVIRT] >>>>> ovirt-sso-client-secret=*random_text* >>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>> [NETWORK] >>>>> port-security-enabled-default=True >>>>> [PROVIDER] >>>>> provider-host=dc02-ovirt01.testdomain.com >>>>> [AUTH] >>>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>>> >>>>> >>>>> However, it still does not connect. >>>>> It prompts for the certificate but then fails and prompts to see >>>>> the log but the ovirt-provider-ovn.log does not list anything. >>>>> >>>>> Yes we've got ovirt for about a year now from about version 4.1 >>>>> >>>>> >>>> This might explain the trouble. Upgrade of ovirt-provider-ovn >>>> should work flawlessly starting from oVirt 4.2. >>>> >>>> >>>>> On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler < >>>>> dholler@redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis < >>>>>> k.betsis@gmail.com> wrote: >>>>>> >>>>>>> There is a file with the below entries >>>>>>> >>>>>> >>>>>> Impressive, do you know when this config file was created and >>>>>> if it was manually modified? >>>>>> Is this an upgrade from oVirt 4.1? >>>>>> >>>>>> >>>>>>> [root@dc02-ovirt01 log]# cat >>>>>>> /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >>>>>>> # This file is automatically generated by engine-setup. Please >>>>>>> do not edit manually >>>>>>> [OVN REMOTE] >>>>>>> ovn-remote=tcp:127.0.0.1:6641 >>>>>>> [SSL] >>>>>>> https-enabled=false >>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>> >>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>> >>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>> [OVIRT] >>>>>>> ovirt-sso-client-secret=*random_test* >>>>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>>>> [NETWORK] >>>>>>> port-security-enabled-default=True >>>>>>> [PROVIDER] >>>>>>> >>>>>>> provider-host=dc02-ovirt01.testdomain.com >>>>>>> >>>>>>> The only entry missing is the [AUTH] and under [SSL] the >>>>>>> https-enabled is false. Should I edit this in this file or is this going to >>>>>>> break everything? >>>>>>> >>>>>>> >>>>>> Changing the file should improve, but better create a backup >>>>>> into another diretory before modification. >>>>>> The only required change is >>>>>> from >>>>>> ovn-remote=tcp:127.0.0.1:6641 >>>>>> to >>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler < >>>>>>> dholler@redhat.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis < >>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Dominik >>>>>>>>> >>>>>>>>> That immediately fixed the geneve tunnels between all hosts. >>>>>>>>> >>>>>>>>> >>>>>>>> thanks for the feedback. >>>>>>>> >>>>>>>> >>>>>>>>> However, the ovn provider is not broken. 
>>>>>>>>> After fixing the networks we tried to move a VM to the >>>>>>>>> DC01-host01 so we powered it down and simply configured it to run on >>>>>>>>> dc01-node01. >>>>>>>>> >>>>>>>>> While checking the logs on the ovirt engine i noticed the >>>>>>>>> below: >>>>>>>>> Failed to synchronize networks of Provider >>>>>>>>> ovirt-provider-ovn. >>>>>>>>> >>>>>>>>> The ovn-provider configure on the engine is the below: >>>>>>>>> Name: ovirt-provider-ovn >>>>>>>>> Description: oVirt network provider for OVN >>>>>>>>> Type: External Network Provider >>>>>>>>> Network Plugin: oVirt Network Provider for OVN >>>>>>>>> Automatic Synchronization: Checked >>>>>>>>> Unmanaged: Unchecked >>>>>>>>> Provider URL: http:localhost:9696 >>>>>>>>> Requires Authentication: Checked >>>>>>>>> Username: admin@internal >>>>>>>>> Password: "The admin password" >>>>>>>>> Protocol: hTTP >>>>>>>>> Host Name: dc02-ovirt01 >>>>>>>>> API Port: 35357 >>>>>>>>> API Version: v2.0 >>>>>>>>> Tenant Name: "Empty" >>>>>>>>> >>>>>>>>> In the past this was deleted by an engineer and recreated as >>>>>>>>> per the documentation, and it worked. Do we need to update something due to >>>>>>>>> the SSL on the ovn? >>>>>>>>> >>>>>>>>> >>>>>>>> Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? >>>>>>>> engine-setup should have created one. >>>>>>>> If the file is missing, for testing purposes, you can create >>>>>>>> a file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf >>>>>>>> : >>>>>>>> [PROVIDER] >>>>>>>> provider-host=REPLACE_WITH_FQDN >>>>>>>> [SSL] >>>>>>>> >>>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>>> >>>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>>> https-enabled=true >>>>>>>> [OVN REMOTE] >>>>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>>>> [AUTH] >>>>>>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>>>>>> [NETWORK] >>>>>>>> port-security-enabled-default=True >>>>>>>> >>>>>>>> and restart the ovirt-provider-ovn service. 
>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> From the ovn-provider logs the below is generated after a >>>>>>>>> service restart and when the start VM is triggered >>>>>>>>> >>>>>>>>> 2020-09-15 15:07:33,579 root Starting server >>>>>>>>> 2020-09-15 15:07:33,579 root Version: 1.2.29-1 >>>>>>>>> 2020-09-15 15:07:33,579 root Build date: 20191217125241 >>>>>>>>> 2020-09-15 15:07:33,579 root Githash: cb5a80d >>>>>>>>> 2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 >>>>>>>>> Request: GET /v2.0/ports >>>>>>>>> 2020-09-15 15:08:26,582 root Could not retrieve schema from >>>>>>>>> tcp:127.0.0.1:6641: Unknown error -1 >>>>>>>>> Traceback (most recent call last): >>>>>>>>> File >>>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in >>>>>>>>> _handle_request >>>>>>>>> method, path_parts, content >>>>>>>>> File >>>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in >>>>>>>>> handle_request >>>>>>>>> return self.call_response_handler(handler, content, >>>>>>>>> parameters) >>>>>>>>> File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", >>>>>>>>> line 35, in call_response_handler >>>>>>>>> with NeutronApi() as ovn_north: >>>>>>>>> File >>>>>>>>> "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__ >>>>>>>>> self.ovsidl, self.idl = ovn_connection.connect() >>>>>>>>> File "/usr/share/ovirt-provider-ovn/ovn_connection.py", >>>>>>>>> line 46, in connect >>>>>>>>> ovnconst.OVN_NORTHBOUND >>>>>>>>> File >>>>>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", >>>>>>>>> line 127, in from_server >>>>>>>>> helper = idlutils.get_schema_helper(connection_string, >>>>>>>>> schema_name) >>>>>>>>> File >>>>>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", >>>>>>>>> line 128, in get_schema_helper >>>>>>>>> 'err': os.strerror(err)}) >>>>>>>>> Exception: Could not retrieve schema from tcp:127.0.0.1:6641: >>>>>>>>> Unknown error -1 >>>>>>>>> >>>>>>>>> >>>>>>>>> When i update the ovn provider from the GUI to have >>>>>>>>> https://localhost:9696/ and HTTPS as the protocol the test >>>>>>>>> fails. >>>>>>>>> >>>>>>>>> On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler < >>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis < >>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Dominik >>>>>>>>>>> >>>>>>>>>>> When these commands are used on the ovirt-engine host the >>>>>>>>>>> output is the one depicted in your email. >>>>>>>>>>> For your reference see also below: >>>>>>>>>>> >>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl >>>>>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>> Bootstrap: false >>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection >>>>>>>>>>> ptcp:6641 >>>>>>>>>>> >>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl >>>>>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>> Bootstrap: false >>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection >>>>>>>>>>> read-write role="" ptcp:6642 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> ^^^ the line above points to the problem: ovn-central is >>>>>>>>>> configured to use plain TCP without ssl. 
>>>>>>>>>> engine-setup usually configures ovn-central to use SSL. >>>>>>>>>> That the files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, >>>>>>>>>> that engine-setup was triggered correctly. Looks like the >>>>>>>>>> ovn db was dropped somehow, this should not happen. >>>>>>>>>> This can be fixed manually by executing the following >>>>>>>>>> commands on engine's machine: >>>>>>>>>> ovn-nbctl set-ssl >>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>> /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>>>>> ovn-nbctl set-connection pssl:6641 >>>>>>>>>> ovn-sbctl set-ssl >>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>> /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>>>>> ovn-sbctl set-connection pssl:6642 >>>>>>>>>> >>>>>>>>>> The /var/log/openvswitch/ovn-controller.log on the hosts >>>>>>>>>> should tell that br-int.mgmt is connected now. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> [root@ath01-ovirt01 certs]# ls -l >>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>>>>> >>>>>>>>>>> When i try the above commands on the node hosts the >>>>>>>>>>> following happens: >>>>>>>>>>> ovn-nbctl get-ssl / get-connection >>>>>>>>>>> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: >>>>>>>>>>> database connection failed (No such file or directory) >>>>>>>>>>> The above i believe is expected since no northbound >>>>>>>>>>> connections should be established from the host nodes. >>>>>>>>>>> >>>>>>>>>>> ovn-sbctl get-ssl /get-connection >>>>>>>>>>> The output is stuck till i terminate it. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Yes, the ovn-* commands works only on engine's machine, >>>>>>>>>> which has the role ovn-central. >>>>>>>>>> On the hosts, there is only the ovn-controller, which >>>>>>>>>> connects the ovn southbound to openvswitch on the host. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> For the requested logs the below are found in the >>>>>>>>>>> ovsdb-server-sb.log >>>>>>>>>>> >>>>>>>>>>> 2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log >>>>>>>>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>>>>>>>> rate >>>>>>>>>>> 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>>>>> 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log >>>>>>>>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>>>>>>>> rate >>>>>>>>>>> 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>> 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log >>>>>>>>>>> messages in last 8 seconds (most recently, 1 seconds ago) due to excessive >>>>>>>>>>> rate >>>>>>>>>>> 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>>>>> 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log >>>>>>>>>>> messages in last 8 seconds (most recently, 1 seconds ago) due to excessive >>>>>>>>>>> rate >>>>>>>>>>> 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>> 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log >>>>>>>>>>> messages in last 12 seconds (most recently, 4 seconds ago) due to excessive >>>>>>>>>>> rate >>>>>>>>>>> 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>>>>> 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log >>>>>>>>>>> messages in last 12 
seconds (most recently, 4 seconds ago) due to excessive >>>>>>>>>>> rate >>>>>>>>>>> 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>> 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: >>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> How can we fix these SSL errors? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I addressed this above. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> I thought vdsm did the certificate provisioning on the >>>>>>>>>>> host nodes as to communicate to the engine host node. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Yes, this seems to work in your scenario, just the SSL >>>>>>>>>> configuration on the ovn-central was lost. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler < >>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Looks still like the ovn-controller on the host >>>>>>>>>>>> has problems communicating with ovn-southbound. >>>>>>>>>>>> >>>>>>>>>>>> Are there any hints in /var/log/openvswitch/*.log, >>>>>>>>>>>> especially in /var/log/openvswitch/ovsdb-server-sb.log ? >>>>>>>>>>>> >>>>>>>>>>>> Can you please check the output of >>>>>>>>>>>> >>>>>>>>>>>> ovn-nbctl get-ssl >>>>>>>>>>>> ovn-nbctl get-connection >>>>>>>>>>>> ovn-sbctl get-ssl >>>>>>>>>>>> ovn-sbctl get-connection >>>>>>>>>>>> ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>>> >>>>>>>>>>>> it should be similar to >>>>>>>>>>>> >>>>>>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-ssl >>>>>>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-connection >>>>>>>>>>>> pssl:6641:[::] >>>>>>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-ssl >>>>>>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-connection >>>>>>>>>>>> read-write role="" pssl:6642:[::] >>>>>>>>>>>> [root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis < >>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I did a restart of the ovn-controller, this is the >>>>>>>>>>>>> output of the ovn-controller.log >>>>>>>>>>>>> >>>>>>>>>>>>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file >>>>>>>>>>>>> /var/log/openvswitch/ovn-controller.log >>>>>>>>>>>>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>>>>>> connecting... 
>>>>>>>>>>>>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>>>>>> connected >>>>>>>>>>>>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL >>>>>>>>>>>>> reconnected, force recompute. >>>>>>>>>>>>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>> connecting... >>>>>>>>>>>>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL >>>>>>>>>>>>> reconnected, force recompute. >>>>>>>>>>>>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>> connecting... >>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>> waiting 2 seconds before reconnect >>>>>>>>>>>>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>> connecting... >>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>> waiting 4 seconds before reconnect >>>>>>>>>>>>> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>> connecting... >>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>> continuing to reconnect in the background but suppressing further logging >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I have also done the vdsm-tool ovn-config >>>>>>>>>>>>> OVIRT_ENGINE_IP OVIRTMGMT_NETWORK_DC >>>>>>>>>>>>> This is how the OVIRT_ENGINE_IP is provided in the ovn >>>>>>>>>>>>> controller, i can redo it if you wan. >>>>>>>>>>>>> >>>>>>>>>>>>> After the restart of the ovn-controller the OVIRT ENGINE >>>>>>>>>>>>> still shows only two geneve connections one with DC01-host02 and >>>>>>>>>>>>> DC02-host01. >>>>>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>>>>> hostname: "dc02-host01" >>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>> ip: "DC02-host01_IP" >>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>>>>> hostname: "DC01-host02" >>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>> ip: "DC01-host02" >>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>> >>>>>>>>>>>>> I've re-done the vdsm-tool command and nothing >>>>>>>>>>>>> changed.... 
again....with the same errors as the systemctl restart >>>>>>>>>>>>> ovn-controller >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler < >>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Please include ovirt-users list in your reply, to share >>>>>>>>>>>>>> the knowledge and experience with the community! >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis < >>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ok below the output per node and DC >>>>>>>>>>>>>>> DC01 >>>>>>>>>>>>>>> node01 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE01*" >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> node02 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE02*" >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> DC02 >>>>>>>>>>>>>>> node01 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC02-NODE01*" >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Looks good. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> DC01 node01 and node02 share the same VM networks and >>>>>>>>>>>>>>> VMs deployed on top of them cannot talk to VM on the other hypervisor. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Maybe there is a hint on ovn-controller.log on >>>>>>>>>>>>>> dc01-node02 ? Maybe restarting ovn-controller creates more helpful log >>>>>>>>>>>>>> messages? >>>>>>>>>>>>>> >>>>>>>>>>>>>> You can also try restart the ovn configuration on all >>>>>>>>>>>>>> hosts by executing >>>>>>>>>>>>>> vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP >>>>>>>>>>>>>> on each host, this would trigger >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... >>>>>>>>>>>>>> internally. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> So I would expect to see the same output for node01 to >>>>>>>>>>>>>>> have a geneve tunnel to node02 and vice versa. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Me too. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler < >>>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis < >>>>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Dominik >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> OVN is selected as the default network provider on >>>>>>>>>>>>>>>>> the clusters and the hosts. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> sounds good. >>>>>>>>>>>>>>>> This configuration is required already during the >>>>>>>>>>>>>>>> host is added to oVirt Engine, because OVN is configured during this step. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The "ovn-sbctl show" works on the ovirt engine and >>>>>>>>>>>>>>>>> shows only two hosts, 1 per DC. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>>>>>>>>> hostname: "dc01-node02" >>>>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>>>> ip: "X.X.X.X" >>>>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>>>>>>>>> hostname: "dc02-node1" >>>>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>>>> ip: "A.A.A.A" >>>>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The new node is not listed (dc01-node1). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> When executed on the nodes the same command >>>>>>>>>>>>>>>>> (ovn-sbctl show) times-out on all nodes..... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The output of the >>>>>>>>>>>>>>>>> /var/log/openvswitch/ovn-conntroller.log lists on all logs >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Can you please compare the output of >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-remote >>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> of the working hosts, e.g. dc01-node02, and the >>>>>>>>>>>>>>>> failing host dc01-node1? >>>>>>>>>>>>>>>> This should point us the relevant difference in the >>>>>>>>>>>>>>>> configuration. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Please include ovirt-users list in your replay, to >>>>>>>>>>>>>>>> share the knowledge and experience with the community. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>>>> Konstantinos Betsis >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler < >>>>>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B < >>>>>>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi all >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> We have a small installation based on OVIRT 4.3. >>>>>>>>>>>>>>>>>>> 1 Cluster is based on Centos 7 and the other on >>>>>>>>>>>>>>>>>>> OVIRT NG Node image. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The environment was stable till an upgrade took >>>>>>>>>>>>>>>>>>> place a couple of months ago. >>>>>>>>>>>>>>>>>>> As such we had to re-install one of the Centos 7 >>>>>>>>>>>>>>>>>>> node and start from scratch. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> To trigger the automatic configuration of the host, >>>>>>>>>>>>>>>>>> it is required to configure ovirt-provider-ovn as the default network >>>>>>>>>>>>>>>>>> provider for the cluster before adding the host to oVirt. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Even though the installation completed >>>>>>>>>>>>>>>>>>> successfully and VMs are created, the following are not working as expected: >>>>>>>>>>>>>>>>>>> 1. ovn geneve tunnels are not established with the >>>>>>>>>>>>>>>>>>> other Centos 7 node in the cluster. >>>>>>>>>>>>>>>>>>> 2. Centos 7 node is configured by ovirt engine >>>>>>>>>>>>>>>>>>> however no geneve tunnel is established when "ovn-sbctl show" is issued on >>>>>>>>>>>>>>>>>>> the engine. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Does "ovn-sbctl show" list the hosts? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 3. no flows are shown on the engine on port 6642 >>>>>>>>>>>>>>>>>>> for the ovs db. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Does anyone have any experience on how to >>>>>>>>>>>>>>>>>>> troubleshoot OVN on ovirt? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /var/log/openvswitch/ovncontroller.log on the host >>>>>>>>>>>>>>>>>> should contain a helpful hint. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>> Users mailing list -- users@ovirt.org >>>>>>>>>>>>>>>>>>> To unsubscribe send an email to >>>>>>>>>>>>>>>>>>> users-leave@ovirt.org >>>>>>>>>>>>>>>>>>> Privacy Statement: >>>>>>>>>>>>>>>>>>> https://www.ovirt.org/privacy-policy.html >>>>>>>>>>>>>>>>>>> oVirt Code of Conduct: >>>>>>>>>>>>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>>>>>>>>>>>>>>> List Archives: >>>>>>>>>>>>>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EK... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>

On Wed, Sep 30, 2020 at 1:16 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
The DC01-node02 was formatted and reinstalled and then attached to ovirt environment. Unfortunately we exhibit the same issue. The new DC01-node02 tries to establish geneve tunnels to his own IP.
[root@dc01-node02 ~]# ovs-vsctl show
eff2663e-cb10-41b0-93ba-605bb5c7bd78
    Bridge br-int
        fail_mode: secure
        Port "ovn-95ccb0-0"
            Interface "ovn-95ccb0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-node01_IP"}
        Port "ovn-be3abc-0"
            Interface "ovn-be3abc-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-node02_IP"}
        Port "ovn-c4b238-0"
            Interface "ovn-c4b238-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc02-node01_IP"}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.11.0"
Is there a way to fix this on the oVirt engine, since that is where the information resides? Something seems to be broken there.
I suspect that there is an inconsistency in the OVN SB DB. Is there a way to share your /var/lib/openvswitch/ovnsb_db.db with us?
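For what it's worth, a quick way to capture what the SB DB currently holds, and to take a copy of the database file for sharing, could be something like the following (run on the engine machine; the paths are the defaults already mentioned in this thread and may differ on other installations):

  # chassis and encap records as the SB DB sees them
  ovn-sbctl show
  ovn-sbctl list Chassis
  ovn-sbctl list Encap

  # copy the on-disk database file so it can be attached
  cp /var/lib/openvswitch/ovnsb_db.db /tmp/ovnsb_db.db.copy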
Thank you
Best regards
Konstantinos Betsis
On Wed, Sep 16, 2020, 13:25 Dominik Holler <dholler@redhat.com> wrote:
On Wed, Sep 16, 2020 at 12:15 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
I have a better solution. I am currently migrating all VMs over to dc01-node01, and then I'll format it so as to fix the partitioning as well.
In theory the OVN SB DB will be fixed once the host is re-installed...
If not, we can then check whether there is a stale entry on the oVirt host where the SB DB is managed.
Maybe you could ensure that there is no entry left while dc01-host02 is reinstalling, before the host is added to oVirt again?
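For example, before adding the host back, a check like this on the engine should print nothing if the stale record is really gone (hostname taken from the listings earlier in this thread):

  ovn-sbctl show | grep -i dc01-host02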
Do you agree with this?
Sounds good, but OVN should not be the reason to reinstall.
On Wed, Sep 16, 2020 at 1:00 PM Dominik Holler <dholler@redhat.com> wrote:
Maybe because of a duplicated entry in the OVN SB DB? Can you please stop the ovn-controller on this host, remove the host from the OVN SB DB, ensure it is gone, and restart the ovn-controller on the host?
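A rough sketch of that sequence, assuming the SB DB lives on the engine as in this setup (CHASSIS_NAME is a placeholder for the chassis name that "ovn-sbctl show" prints for the affected host):

  # on the affected host: stop the controller so it does not re-register while you clean up
  systemctl stop ovn-controller

  # on the engine: identify and delete the stale chassis record
  ovn-sbctl show
  ovn-sbctl chassis-del CHASSIS_NAME
  ovn-sbctl show    # the chassis should no longer be listed

  # back on the host: start the controller so it registers a fresh record
  systemctl start ovn-controller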
On Wed, Sep 16, 2020 at 11:55 AM Konstantinos Betsis < k.betsis@gmail.com> wrote:
Hi Dominik
Just saw the below on host dc01-host02
ovs-vsctl show
f3b13557-dfb4-45a4-b6af-c995ccf68720
    Bridge br-int
        Port "ovn-95ccb0-0"
            Interface "ovn-95ccb0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host01"}
        Port "vnet10"
            Interface "vnet10"
        Port "vnet11"
            Interface "vnet11"
        Port "vnet0"
            Interface "vnet0"
        Port "vnet9"
            Interface "vnet9"
        Port "vnet8"
            Interface "vnet8"
        Port br-int
            Interface br-int
                type: internal
        Port "vnet12"
            Interface "vnet12"
        Port "ovn-be3abc-0"
            Interface "ovn-be3abc-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-host02"}
        Port "vnet7"
            Interface "vnet7"
        Port "ovn-c4b238-0"
            Interface "ovn-c4b238-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc02-host01"}
        Port "vnet6"
            Interface "vnet6"
    ovs_version: "2.11.0"
Why would this node establish a Geneve tunnel to itself? Other nodes do not exhibit this behavior.
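This fits the duplicated-entry theory above: ovn-controller creates a tunnel port for every chassis record in the SB DB other than its own system-id, so a leftover record that still carries this host's IP but an old system-id would appear as a "remote" chassis and produce a tunnel to the host's own address. A way to confirm, assuming the default paths used elsewhere in this thread:

  # on dc01-host02: the identity ovn-controller registers with
  ovs-vsctl get open . external-ids:system-id

  # on the engine: the chassis names and encap IPs stored in the SB DB
  ovn-sbctl show

If a chassis entry shows this host's IP under a system-id that differs from the one above, it is a stale leftover.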
On Wed, Sep 16, 2020 at 12:21 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
Hi Dominik
Below is the output of ovs-vsctl list Interface:
_uuid : bdaf92c1-4389-4ddf-aab0-93975076ebb2 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:02", iface-id="5d03a7a5-82a1-40f9-b50c-353a26167fa3", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 34 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:02" mtu : 1442 mtu_request : [] name : "vnet6" ofport : 2 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=10828495, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=117713, tx_bytes=20771797, tx_dropped=0, tx_errors=0, tx_packets=106954} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : bad80911-3993-4085-a0b0-962b6c9156cd admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "fe:37:52:c4:cb:03" mtu : [] mtu_request : [] name : "ovn-c4b238-0" ofport : 7 ofport_request : [] options : {csum="true", key=flow, remote_ip="192.168.121.164"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 8e7705d1-0b9d-4e30-8277-c339e7e1c27a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:0d", iface-id="b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7", iface-status=active, vm-id="8d73f333-bca4-4b32-9b87-2e7ee07eda84"} ifindex : 28 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:0d" mtu : 1442 mtu_request : [] name : "vnet0" ofport : 1 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=20609787, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=104535, tx_bytes=10830007, tx_dropped=0, tx_errors=0, tx_packets=117735} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 86dcc68a-63e4-4445-9373-81c1f4502c17 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:10", iface-id="4e8d5636-4110-41b2-906d-f9b04c2e62cd", iface-status=active, vm-id="9a002a9b-5f09-4def-a531-d50ff683470b"} ifindex : 40 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:10" mtu : 1442 mtu_request : [] name : "vnet11" ofport : 10 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=3311352, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=51012, tx_bytes=5514116, tx_dropped=0, tx_errors=0, tx_packets=103456} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : e8d5e4a2-b9a0-4146-8d98-34713cb443de admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:15", iface-id="b88de6e4-6d77-4e42-b734-4cc676728910", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 37 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:15" mtu : 1442 mtu_request : [] name : "vnet9" ofport : 5 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4500, tx_dropped=0, tx_errors=0, tx_packets=74} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 6a2974b3-cd72-4688-a630-0a7e9c779b21 admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:17", iface-id="64681036-26e2-41d7-b73f-ab5302610145", iface-status=active, vm-id="bf0dc78c-dad5-41a0-914c-ae0da0f9a388"} ifindex : 41 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:17" mtu : 1442 mtu_request : [] name : "vnet12" ofport : 11 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=5513640, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=103450, tx_bytes=3311868, tx_dropped=0, tx_errors=0, tx_packets=51018} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : 44498e54-f122-41a0-a41a-7a88ba2dba9b admin_state : down bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 7 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : down lldp : {} mac : [] mac_in_use : "32:0a:69:67:07:4f" mtu : 1442 mtu_request : [] name : br-int ofport : 65534 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=326, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0} status : {driver_name=openvswitch} type : internal
_uuid : e2114584-8ceb-43d6-817b-e457738ead8a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:03", iface-id="16162721-c815-4cd8-ab57-f22e6e482c7f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 35 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:03" mtu : 1442 mtu_request : [] name : "vnet7" ofport : 3 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=4730, tx_dropped=0, tx_errors=0, tx_packets=77} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : ee16943e-d145-4080-893f-464098a6388f admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "1e:50:3f:a8:42:d1" mtu : [] mtu_request : [] name : "ovn-be3abc-0" ofport : 8 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host02"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 86a229be-373e-4c43-b2f1-6190523ed73a admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:1c", iface-id="12d829c3-64eb-44bc-a0bd-d7219991f35f", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 38 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:1c" mtu : 1442 mtu_request : [] name : "vnet10" ofport : 6 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=117912, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2195, tx_bytes=4204, tx_dropped=0, tx_errors=0, tx_packets=66} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
_uuid : fa4b8d96-bffe-4b56-930e-0e7fcc5f68ac admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 39 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "7a:28:24:eb:ec:d2" mtu : [] mtu_request : [] name : "ovn-95ccb0-0" ofport : 9 ofport_request : [] options : {csum="true", key=flow, remote_ip="DC01-host01"} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=12840478, tx_packets=224029} status : {tunnel_egress_iface="ovirtmgmt-ams03", tunnel_egress_iface_carrier=up} type : geneve
_uuid : 5e3df5c7-958c-491d-8d41-0ae83c613f1d admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : full error : [] external_ids : {attached-mac="56:6f:77:61:00:06", iface-id="9a6cc189-0934-4468-97ae-09f90fa4598d", iface-status=active, vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} ifindex : 36 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : 10000000 link_state : up lldp : {} mac : [] mac_in_use : "fe:6f:77:61:00:06" mtu : 1442 mtu_request : [] name : "vnet8" ofport : 4 ofport_request : [] options : {} other_config : {} statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, tx_bytes=8829812, tx_dropped=0, tx_errors=0, tx_packets=154540} status : {driver_name=tun, driver_version="1.6", firmware_version=""} type : ""
I've identified which VMs have these MAC addresses, but I do not see any "conflict" with any other VM's MAC address.
I really do not understand why these would create a conflict.
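If needed, the binding side can be checked from the engine too, to see which chassis the SB DB currently assigns these logical ports to (same ovn-sbctl access as above):

  ovn-sbctl list Port_Binding
  ovn-sbctl list Chassis

A port whose chassis reference keeps flipping between two chassis records would explain the repeated "Claiming lport" messages in the ovn-controller log quoted below.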
On Wed, Sep 16, 2020 at 12:06 PM Dominik Holler <dholler@redhat.com> wrote:
> > > On Tue, Sep 15, 2020 at 6:53 PM Konstantinos Betsis < > k.betsis@gmail.com> wrote: > >> So a new test-net was created under DC01 and was depicted in the >> networks tab under both DC01 and DC02. >> I believe for some reason networks are duplicated in DCs, maybe for >> future use??? Don't know. >> If one tries to delete the network from the other DC it gets an >> error, while if deleted from the once initially created it gets deleted >> from both. >> >> > In oVirt a logical network is an entity in a data center. If the > automatic synchronization is enabled on the ovirt-provider-ovn entity in > oVirt Engine, the OVN networks are reflected to all data centers. If you do > not like this, you can disable the automatic synchronization of the > ovirt-provider-ovn in Admin Portal. > > >> From the DC01-node02 i get the following errors: >> >> 2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, force >> recompute next time. >> 2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport >> 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis. >> 2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: >> Claiming 56:6f:77:61:00:06 >> 2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport >> 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis. >> 2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: >> Claiming 56:6f:77:61:00:03 >> 2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport >> b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis. >> 2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: >> Claiming 56:6f:77:61:00:15 >> 2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport >> b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis. >> 2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: >> Claiming 56:6f:77:61:00:0d >> 2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport >> 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis. >> 2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: >> Claiming 56:6f:77:61:00:02 >> 2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport >> 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis. >> 2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: >> Claiming 56:6f:77:61:00:1c >> 2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, force >> recompute next time. >> 2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport >> 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis. >> 2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: >> Claiming 56:6f:77:61:00:06 >> 2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport >> 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis. >> 2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: >> Claiming 56:6f:77:61:00:03 >> 2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport >> b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis. >> 2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: >> Claiming 56:6f:77:61:00:15 >> 2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport >> b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis. >> 2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: >> Claiming 56:6f:77:61:00:0d >> 2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport >> 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis. 
>> 2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: >> Claiming 56:6f:77:61:00:02 >> 2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport >> 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis. >> 2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: >> Claiming 56:6f:77:61:00:1c >> >> >> And this repeats forever. >> >> > Looks like the southbound db is confused. > > Can you try to delete all chassis listed by > sudo ovn-sbctl show > via > sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh > dev-host0 > ? > if the script remove_chassis.sh is not installed, you can use > > https://github.com/oVirt/ovirt-provider-ovn/blob/master/provider/scripts/rem... > instead. > > Can you please also share the output of > ovs-vsctl list Interface > on the host which produced the logfile above? > > > > >> The connections to ovn-sbctl is ok and the geneve tunnels are >> depicted under ovs-vsctl ok. >> VMs still not able to ping each other. >> >> On Tue, Sep 15, 2020 at 7:22 PM Dominik Holler <dholler@redhat.com> >> wrote: >> >>> >>> >>> On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis < >>> k.betsis@gmail.com> wrote: >>> >>>> Hi Dominik >>>> >>>> Fixed the issue. >>>> >>> >>> Thanks. >>> >>> >>>> I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >>>> needed update also. >>>> The package is upgraded to the latest version. >>>> >>>> Once the provider was updated with the following it functioned >>>> perfectly: >>>> >>>> Name: ovirt-provider-ovn >>>> Description: oVirt network provider for OVN >>>> Type: External Network Provider >>>> Network Plugin: oVirt Network Provider for OVN >>>> Automatic Synchronization: Checked >>>> Unmanaged: Unchecked >>>> Provider URL: https:dc02-ovirt01.testdomain.com:9696 >>>> Requires Authentication: Checked >>>> Username: admin@internal >>>> Password: "The admin password" >>>> Protocol: HTTPS >>>> Host Name: dc02-ovirt01.testdomain.com >>>> API Port: 35357 >>>> API Version: v2.0 >>>> Tenant Name: "Empty" >>>> >>>> For some reason the TLS certificate was in conflict with the ovn >>>> provider details, i would bet the "host" entry. >>>> >>>> So now geneve tunnels are established. >>>> OVN provider is working. >>>> >>>> But VMs still do not communicated on the same VM network spanning >>>> different hosts. >>>> >>>> So if we have a VM network test-net on both dc01-host01 and >>>> dc01-host02 and each host has a VM with IP addresses on the same network, >>>> VMs on the same VM network should communicate directly. >>>> But traffic does not reach each other. >>>> >>>> >>> Can you create a new external network, with port security >>> disabled, and an IPv4 subnet? >>> If the VMs get an IP address via DHCP, ovn is working, and should >>> be able to ping each other, too. >>> If not, there should be a helpful entry in the ovn-controller.log >>> of the host the VM is running. 
>>> >>> >>>> On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler < >>>> dholler@redhat.com> wrote: >>>> >>>>> Can you try again with: >>>>> >>>>> [OVN REMOTE] >>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>> [SSL] >>>>> https-enabled=false >>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>> >>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>> [OVIRT] >>>>> ovirt-sso-client-secret=*random_test* >>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>> <https://dc02-ovirt01.testdomain.com/> >>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>> [NETWORK] >>>>> port-security-enabled-default=True >>>>> [PROVIDER] >>>>> >>>>> provider-host=dc02-ovirt01.testdomain.com >>>>> >>>>> >>>>> >>>>> Please note that the should match the HTTP or HTTPS in the of >>>>> the ovirt-prover-ovn configuration in oVirt Engine. >>>>> So if the ovirt-provider-ovn entity in Engine is on HTTP, the >>>>> config file should use >>>>> https-enabled=false >>>>> >>>>> >>>>> On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis < >>>>> k.betsis@gmail.com> wrote: >>>>> >>>>>> This is the updated one: >>>>>> >>>>>> # This file is automatically generated by engine-setup. Please >>>>>> do not edit manually >>>>>> [OVN REMOTE] >>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>> [SSL] >>>>>> https-enabled=true >>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>> >>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>> [OVIRT] >>>>>> ovirt-sso-client-secret=*random_text* >>>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>>> [NETWORK] >>>>>> port-security-enabled-default=True >>>>>> [PROVIDER] >>>>>> provider-host=dc02-ovirt01.testdomain.com >>>>>> [AUTH] >>>>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>>>> >>>>>> >>>>>> However, it still does not connect. >>>>>> It prompts for the certificate but then fails and prompts to >>>>>> see the log but the ovirt-provider-ovn.log does not list anything. >>>>>> >>>>>> Yes we've got ovirt for about a year now from about version 4.1 >>>>>> >>>>>> >>>>> This might explain the trouble. Upgrade of ovirt-provider-ovn >>>>> should work flawlessly starting from oVirt 4.2. >>>>> >>>>> >>>>>> On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler < >>>>>> dholler@redhat.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis < >>>>>>> k.betsis@gmail.com> wrote: >>>>>>> >>>>>>>> There is a file with the below entries >>>>>>>> >>>>>>> >>>>>>> Impressive, do you know when this config file was created and >>>>>>> if it was manually modified? >>>>>>> Is this an upgrade from oVirt 4.1? >>>>>>> >>>>>>> >>>>>>>> [root@dc02-ovirt01 log]# cat >>>>>>>> /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >>>>>>>> # This file is automatically generated by engine-setup. 
>>>>>>>> Please do not edit manually >>>>>>>> [OVN REMOTE] >>>>>>>> ovn-remote=tcp:127.0.0.1:6641 >>>>>>>> [SSL] >>>>>>>> https-enabled=false >>>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>>> >>>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>>> >>>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>>> [OVIRT] >>>>>>>> ovirt-sso-client-secret=*random_test* >>>>>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>>>>> [NETWORK] >>>>>>>> port-security-enabled-default=True >>>>>>>> [PROVIDER] >>>>>>>> >>>>>>>> provider-host=dc02-ovirt01.testdomain.com >>>>>>>> >>>>>>>> The only entry missing is the [AUTH] and under [SSL] the >>>>>>>> https-enabled is false. Should I edit this in this file or is this going to >>>>>>>> break everything? >>>>>>>> >>>>>>>> >>>>>>> Changing the file should improve, but better create a backup >>>>>>> into another diretory before modification. >>>>>>> The only required change is >>>>>>> from >>>>>>> ovn-remote=tcp:127.0.0.1:6641 >>>>>>> to >>>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler < >>>>>>>> dholler@redhat.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis < >>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Dominik >>>>>>>>>> >>>>>>>>>> That immediately fixed the geneve tunnels between all hosts. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> thanks for the feedback. >>>>>>>>> >>>>>>>>> >>>>>>>>>> However, the ovn provider is not broken. >>>>>>>>>> After fixing the networks we tried to move a VM to the >>>>>>>>>> DC01-host01 so we powered it down and simply configured it to run on >>>>>>>>>> dc01-node01. >>>>>>>>>> >>>>>>>>>> While checking the logs on the ovirt engine i noticed the >>>>>>>>>> below: >>>>>>>>>> Failed to synchronize networks of Provider >>>>>>>>>> ovirt-provider-ovn. >>>>>>>>>> >>>>>>>>>> The ovn-provider configure on the engine is the below: >>>>>>>>>> Name: ovirt-provider-ovn >>>>>>>>>> Description: oVirt network provider for OVN >>>>>>>>>> Type: External Network Provider >>>>>>>>>> Network Plugin: oVirt Network Provider for OVN >>>>>>>>>> Automatic Synchronization: Checked >>>>>>>>>> Unmanaged: Unchecked >>>>>>>>>> Provider URL: http:localhost:9696 >>>>>>>>>> Requires Authentication: Checked >>>>>>>>>> Username: admin@internal >>>>>>>>>> Password: "The admin password" >>>>>>>>>> Protocol: hTTP >>>>>>>>>> Host Name: dc02-ovirt01 >>>>>>>>>> API Port: 35357 >>>>>>>>>> API Version: v2.0 >>>>>>>>>> Tenant Name: "Empty" >>>>>>>>>> >>>>>>>>>> In the past this was deleted by an engineer and recreated >>>>>>>>>> as per the documentation, and it worked. Do we need to update something due >>>>>>>>>> to the SSL on the ovn? >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? >>>>>>>>> engine-setup should have created one. 
>>>>>>>>> If the file is missing, for testing purposes, you can create >>>>>>>>> a file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf >>>>>>>>> : >>>>>>>>> [PROVIDER] >>>>>>>>> provider-host=REPLACE_WITH_FQDN >>>>>>>>> [SSL] >>>>>>>>> >>>>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>>>> >>>>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>>>> https-enabled=true >>>>>>>>> [OVN REMOTE] >>>>>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>>>>> [AUTH] >>>>>>>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>>>>>>> [NETWORK] >>>>>>>>> port-security-enabled-default=True >>>>>>>>> >>>>>>>>> and restart the ovirt-provider-ovn service. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> From the ovn-provider logs the below is generated after a >>>>>>>>>> service restart and when the start VM is triggered >>>>>>>>>> >>>>>>>>>> 2020-09-15 15:07:33,579 root Starting server >>>>>>>>>> 2020-09-15 15:07:33,579 root Version: 1.2.29-1 >>>>>>>>>> 2020-09-15 15:07:33,579 root Build date: 20191217125241 >>>>>>>>>> 2020-09-15 15:07:33,579 root Githash: cb5a80d >>>>>>>>>> 2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 >>>>>>>>>> Request: GET /v2.0/ports >>>>>>>>>> 2020-09-15 15:08:26,582 root Could not retrieve schema from >>>>>>>>>> tcp:127.0.0.1:6641: Unknown error -1 >>>>>>>>>> Traceback (most recent call last): >>>>>>>>>> File >>>>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in >>>>>>>>>> _handle_request >>>>>>>>>> method, path_parts, content >>>>>>>>>> File >>>>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in >>>>>>>>>> handle_request >>>>>>>>>> return self.call_response_handler(handler, content, >>>>>>>>>> parameters) >>>>>>>>>> File "/usr/share/ovirt-provider-ovn/handlers/neutron.py", >>>>>>>>>> line 35, in call_response_handler >>>>>>>>>> with NeutronApi() as ovn_north: >>>>>>>>>> File >>>>>>>>>> "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__ >>>>>>>>>> self.ovsidl, self.idl = ovn_connection.connect() >>>>>>>>>> File "/usr/share/ovirt-provider-ovn/ovn_connection.py", >>>>>>>>>> line 46, in connect >>>>>>>>>> ovnconst.OVN_NORTHBOUND >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", >>>>>>>>>> line 127, in from_server >>>>>>>>>> helper = idlutils.get_schema_helper(connection_string, >>>>>>>>>> schema_name) >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", >>>>>>>>>> line 128, in get_schema_helper >>>>>>>>>> 'err': os.strerror(err)}) >>>>>>>>>> Exception: Could not retrieve schema from tcp: >>>>>>>>>> 127.0.0.1:6641: Unknown error -1 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> When i update the ovn provider from the GUI to have >>>>>>>>>> https://localhost:9696/ and HTTPS as the protocol the test >>>>>>>>>> fails. >>>>>>>>>> >>>>>>>>>> On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler < >>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis < >>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Dominik >>>>>>>>>>>> >>>>>>>>>>>> When these commands are used on the ovirt-engine host the >>>>>>>>>>>> output is the one depicted in your email. 
>>>>>>>>>>>> For your reference see also below: >>>>>>>>>>>> >>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl >>>>>>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection >>>>>>>>>>>> ptcp:6641 >>>>>>>>>>>> >>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl >>>>>>>>>>>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection >>>>>>>>>>>> read-write role="" ptcp:6642 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> ^^^ the line above points to the problem: ovn-central is >>>>>>>>>>> configured to use plain TCP without ssl. >>>>>>>>>>> engine-setup usually configures ovn-central to use SSL. >>>>>>>>>>> That the files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, >>>>>>>>>>> that engine-setup was triggered correctly. Looks like the >>>>>>>>>>> ovn db was dropped somehow, this should not happen. >>>>>>>>>>> This can be fixed manually by executing the following >>>>>>>>>>> commands on engine's machine: >>>>>>>>>>> ovn-nbctl set-ssl >>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>> /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>> ovn-nbctl set-connection pssl:6641 >>>>>>>>>>> ovn-sbctl set-ssl >>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>> /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>> ovn-sbctl set-connection pssl:6642 >>>>>>>>>>> >>>>>>>>>>> The /var/log/openvswitch/ovn-controller.log on the hosts >>>>>>>>>>> should tell that br-int.mgmt is connected now. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> [root@ath01-ovirt01 certs]# ls -l >>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>>>>>> >>>>>>>>>>>> When i try the above commands on the node hosts the >>>>>>>>>>>> following happens: >>>>>>>>>>>> ovn-nbctl get-ssl / get-connection >>>>>>>>>>>> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: >>>>>>>>>>>> database connection failed (No such file or directory) >>>>>>>>>>>> The above i believe is expected since no northbound >>>>>>>>>>>> connections should be established from the host nodes. >>>>>>>>>>>> >>>>>>>>>>>> ovn-sbctl get-ssl /get-connection >>>>>>>>>>>> The output is stuck till i terminate it. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Yes, the ovn-* commands works only on engine's machine, >>>>>>>>>>> which has the role ovn-central. >>>>>>>>>>> On the hosts, there is only the ovn-controller, which >>>>>>>>>>> connects the ovn southbound to openvswitch on the host. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> For the requested logs the below are found in the >>>>>>>>>>>> ovsdb-server-sb.log >>>>>>>>>>>> >>>>>>>>>>>> 2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>> excessive rate >>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>> excessive rate >>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>>> 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 >>>>>>>>>>>> log messages in last 8 seconds (most recently, 1 seconds ago) due to >>>>>>>>>>>> excessive rate >>>>>>>>>>>> 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>>>>>> 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 >>>>>>>>>>>> log messages in last 8 seconds (most recently, 1 seconds ago) due to >>>>>>>>>>>> excessive rate >>>>>>>>>>>> 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>>> 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>> excessive rate >>>>>>>>>>>> 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>>>>>> 
2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>> excessive rate >>>>>>>>>>>> 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>>> 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: >>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> How can we fix these SSL errors? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I addressed this above. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> I thought vdsm did the certificate provisioning on the >>>>>>>>>>>> host nodes as to communicate to the engine host node. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Yes, this seems to work in your scenario, just the SSL >>>>>>>>>>> configuration on the ovn-central was lost. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler < >>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Looks still like the ovn-controller on the host >>>>>>>>>>>>> has problems communicating with ovn-southbound. >>>>>>>>>>>>> >>>>>>>>>>>>> Are there any hints in /var/log/openvswitch/*.log, >>>>>>>>>>>>> especially in /var/log/openvswitch/ovsdb-server-sb.log ? >>>>>>>>>>>>> >>>>>>>>>>>>> Can you please check the output of >>>>>>>>>>>>> >>>>>>>>>>>>> ovn-nbctl get-ssl >>>>>>>>>>>>> ovn-nbctl get-connection >>>>>>>>>>>>> ovn-sbctl get-ssl >>>>>>>>>>>>> ovn-sbctl get-connection >>>>>>>>>>>>> ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>>>> >>>>>>>>>>>>> it should be similar to >>>>>>>>>>>>> >>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-ssl >>>>>>>>>>>>> Private key: >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-connection >>>>>>>>>>>>> pssl:6641:[::] >>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-ssl >>>>>>>>>>>>> Private key: >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-connection >>>>>>>>>>>>> read-write role="" pssl:6642:[::] >>>>>>>>>>>>> [root@ovirt-43 ~]# ls -l >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>>> -rw-------. 
1 root root 2709 Oct 14 2019 >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis < >>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I did a restart of the ovn-controller, this is the >>>>>>>>>>>>>> output of the ovn-controller.log >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log >>>>>>>>>>>>>> file /var/log/openvswitch/ovn-controller.log >>>>>>>>>>>>>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>>>>>>> connected >>>>>>>>>>>>>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL >>>>>>>>>>>>>> reconnected, force recompute. >>>>>>>>>>>>>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL >>>>>>>>>>>>>> reconnected, force recompute. >>>>>>>>>>>>>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>> waiting 2 seconds before reconnect >>>>>>>>>>>>>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>> waiting 4 seconds before reconnect >>>>>>>>>>>>>> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>> continuing to reconnect in the background but suppressing further logging >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have also done the vdsm-tool ovn-config >>>>>>>>>>>>>> OVIRT_ENGINE_IP OVIRTMGMT_NETWORK_DC >>>>>>>>>>>>>> This is how the OVIRT_ENGINE_IP is provided in the ovn >>>>>>>>>>>>>> controller, i can redo it if you wan. >>>>>>>>>>>>>> >>>>>>>>>>>>>> After the restart of the ovn-controller the OVIRT >>>>>>>>>>>>>> ENGINE still shows only two geneve connections one with DC01-host02 and >>>>>>>>>>>>>> DC02-host01. 
>>>>>>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>>>>>> hostname: "dc02-host01" >>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>> ip: "DC02-host01_IP" >>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>>>>>> hostname: "DC01-host02" >>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>> ip: "DC01-host02" >>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>> >>>>>>>>>>>>>> I've re-done the vdsm-tool command and nothing >>>>>>>>>>>>>> changed.... again....with the same errors as the systemctl restart >>>>>>>>>>>>>> ovn-controller >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler < >>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please include ovirt-users list in your reply, to >>>>>>>>>>>>>>> share the knowledge and experience with the community! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis < >>>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Ok below the output per node and DC >>>>>>>>>>>>>>>> DC01 >>>>>>>>>>>>>>>> node01 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>> . external-ids:ovn-encap-type >>>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>> . external-ids:ovn-encap-ip >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE01*" >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> node02 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>> . external-ids:ovn-encap-type >>>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>> . external-ids:ovn-encap-ip >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE02*" >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> DC02 >>>>>>>>>>>>>>>> node01 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>> . external-ids:ovn-encap-type >>>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>> . external-ids:ovn-encap-ip >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC02-NODE01*" >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Looks good. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> DC01 node01 and node02 share the same VM networks and >>>>>>>>>>>>>>>> VMs deployed on top of them cannot talk to VM on the other hypervisor. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Maybe there is a hint on ovn-controller.log on >>>>>>>>>>>>>>> dc01-node02 ? Maybe restarting ovn-controller creates more helpful log >>>>>>>>>>>>>>> messages? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> You can also try restart the ovn configuration on all >>>>>>>>>>>>>>> hosts by executing >>>>>>>>>>>>>>> vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP >>>>>>>>>>>>>>> on each host, this would trigger >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... >>>>>>>>>>>>>>> internally. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So I would expect to see the same output for node01 >>>>>>>>>>>>>>>> to have a geneve tunnel to node02 and vice versa. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Me too. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler < >>>>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis >>>>>>>>>>>>>>>>> <k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Dominik >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> OVN is selected as the default network provider on >>>>>>>>>>>>>>>>>> the clusters and the hosts. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> sounds good. >>>>>>>>>>>>>>>>> This configuration is required already during the >>>>>>>>>>>>>>>>> host is added to oVirt Engine, because OVN is configured during this step. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The "ovn-sbctl show" works on the ovirt engine and >>>>>>>>>>>>>>>>>> shows only two hosts, 1 per DC. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>>>>>>>>>> hostname: "dc01-node02" >>>>>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>>>>> ip: "X.X.X.X" >>>>>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>>>>>>>>>> hostname: "dc02-node1" >>>>>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>>>>> ip: "A.A.A.A" >>>>>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The new node is not listed (dc01-node1). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> When executed on the nodes the same command >>>>>>>>>>>>>>>>>> (ovn-sbctl show) times-out on all nodes..... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The output of the >>>>>>>>>>>>>>>>>> /var/log/openvswitch/ovn-conntroller.log lists on all logs >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Can you please compare the output of >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> of the working hosts, e.g. dc01-node02, and the >>>>>>>>>>>>>>>>> failing host dc01-node1? >>>>>>>>>>>>>>>>> This should point us the relevant difference in the >>>>>>>>>>>>>>>>> configuration. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Please include ovirt-users list in your replay, to >>>>>>>>>>>>>>>>> share the knowledge and experience with the community. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>>>>> Konstantinos Betsis >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler < >>>>>>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B < >>>>>>>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi all >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> We have a small installation based on OVIRT 4.3. 
>>>>>>>>>>>>>>>>>>>> 1 Cluster is based on Centos 7 and the other on >>>>>>>>>>>>>>>>>>>> OVIRT NG Node image. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The environment was stable till an upgrade took >>>>>>>>>>>>>>>>>>>> place a couple of months ago. >>>>>>>>>>>>>>>>>>>> As such we had to re-install one of the Centos 7 >>>>>>>>>>>>>>>>>>>> node and start from scratch. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> To trigger the automatic configuration of the >>>>>>>>>>>>>>>>>>> host, it is required to configure ovirt-provider-ovn as the default network >>>>>>>>>>>>>>>>>>> provider for the cluster before adding the host to oVirt. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Even though the installation completed >>>>>>>>>>>>>>>>>>>> successfully and VMs are created, the following are not working as expected: >>>>>>>>>>>>>>>>>>>> 1. ovn geneve tunnels are not established with >>>>>>>>>>>>>>>>>>>> the other Centos 7 node in the cluster. >>>>>>>>>>>>>>>>>>>> 2. Centos 7 node is configured by ovirt engine >>>>>>>>>>>>>>>>>>>> however no geneve tunnel is established when "ovn-sbctl show" is issued on >>>>>>>>>>>>>>>>>>>> the engine. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Does "ovn-sbctl show" list the hosts? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 3. no flows are shown on the engine on port 6642 >>>>>>>>>>>>>>>>>>>> for the ovs db. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Does anyone have any experience on how to >>>>>>>>>>>>>>>>>>>> troubleshoot OVN on ovirt? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> /var/log/openvswitch/ovncontroller.log on the host >>>>>>>>>>>>>>>>>>> should contain a helpful hint. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>> Users mailing list -- users@ovirt.org >>>>>>>>>>>>>>>>>>>> To unsubscribe send an email to >>>>>>>>>>>>>>>>>>>> users-leave@ovirt.org >>>>>>>>>>>>>>>>>>>> Privacy Statement: >>>>>>>>>>>>>>>>>>>> https://www.ovirt.org/privacy-policy.html >>>>>>>>>>>>>>>>>>>> oVirt Code of Conduct: >>>>>>>>>>>>>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>>>>>>>>>>>>>>>> List Archives: >>>>>>>>>>>>>>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EK... >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>

Sure, I've attached it for easier reference.

On Wed, Sep 30, 2020 at 2:21 PM Dominik Holler <dholler@redhat.com> wrote:
> I suspect that there is an inconsistency in the OVN SB DB. Is there a way to share your /var/lib/openvswitch/ovnsb_db.db with us?
> _uuid : e2114584-8ceb-43d6-817b-e457738ead8a > admin_state : up > bfd : {} > bfd_status : {} > cfm_fault : [] > cfm_fault_status : [] > cfm_flap_count : [] > cfm_health : [] > cfm_mpid : [] > cfm_remote_mpids : [] > cfm_remote_opstate : [] > duplex : full > error : [] > external_ids : {attached-mac="56:6f:77:61:00:03", > iface-id="16162721-c815-4cd8-ab57-f22e6e482c7f", iface-status=active, > vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} > ifindex : 35 > ingress_policing_burst: 0 > ingress_policing_rate: 0 > lacp_current : [] > link_resets : 1 > link_speed : 10000000 > link_state : up > lldp : {} > mac : [] > mac_in_use : "fe:6f:77:61:00:03" > mtu : 1442 > mtu_request : [] > name : "vnet7" > ofport : 3 > ofport_request : [] > options : {} > other_config : {} > statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, > rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, > tx_bytes=4730, tx_dropped=0, tx_errors=0, tx_packets=77} > status : {driver_name=tun, driver_version="1.6", > firmware_version=""} > type : "" > > _uuid : ee16943e-d145-4080-893f-464098a6388f > admin_state : up > bfd : {} > bfd_status : {} > cfm_fault : [] > cfm_fault_status : [] > cfm_flap_count : [] > cfm_health : [] > cfm_mpid : [] > cfm_remote_mpids : [] > cfm_remote_opstate : [] > duplex : [] > error : [] > external_ids : {} > ifindex : 39 > ingress_policing_burst: 0 > ingress_policing_rate: 0 > lacp_current : [] > link_resets : 0 > link_speed : [] > link_state : up > lldp : {} > mac : [] > mac_in_use : "1e:50:3f:a8:42:d1" > mtu : [] > mtu_request : [] > name : "ovn-be3abc-0" > ofport : 8 > ofport_request : [] > options : {csum="true", key=flow, > remote_ip="DC01-host02"} > other_config : {} > statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, > tx_packets=0} > status : {tunnel_egress_iface="ovirtmgmt-ams03", > tunnel_egress_iface_carrier=up} > type : geneve > > _uuid : 86a229be-373e-4c43-b2f1-6190523ed73a > admin_state : up > bfd : {} > bfd_status : {} > cfm_fault : [] > cfm_fault_status : [] > cfm_flap_count : [] > cfm_health : [] > cfm_mpid : [] > cfm_remote_mpids : [] > cfm_remote_opstate : [] > duplex : full > error : [] > external_ids : {attached-mac="56:6f:77:61:00:1c", > iface-id="12d829c3-64eb-44bc-a0bd-d7219991f35f", iface-status=active, > vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} > ifindex : 38 > ingress_policing_burst: 0 > ingress_policing_rate: 0 > lacp_current : [] > link_resets : 1 > link_speed : 10000000 > link_state : up > lldp : {} > mac : [] > mac_in_use : "fe:6f:77:61:00:1c" > mtu : 1442 > mtu_request : [] > name : "vnet10" > ofport : 6 > ofport_request : [] > options : {} > other_config : {} > statistics : {collisions=0, rx_bytes=117912, rx_crc_err=0, > rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2195, > tx_bytes=4204, tx_dropped=0, tx_errors=0, tx_packets=66} > status : {driver_name=tun, driver_version="1.6", > firmware_version=""} > type : "" > > _uuid : fa4b8d96-bffe-4b56-930e-0e7fcc5f68ac > admin_state : up > bfd : {} > bfd_status : {} > cfm_fault : [] > cfm_fault_status : [] > cfm_flap_count : [] > cfm_health : [] > cfm_mpid : [] > cfm_remote_mpids : [] > cfm_remote_opstate : [] > duplex : [] > error : [] > external_ids : {} > ifindex : 39 > ingress_policing_burst: 0 > ingress_policing_rate: 0 > lacp_current : [] > link_resets : 0 > link_speed : [] > link_state : up > lldp : {} > mac : [] > mac_in_use : "7a:28:24:eb:ec:d2" > mtu : [] > mtu_request : [] > name : "ovn-95ccb0-0" > ofport : 9 > ofport_request : [] > options : {csum="true", 
key=flow, > remote_ip="DC01-host01"} > other_config : {} > statistics : {rx_bytes=0, rx_packets=0, tx_bytes=12840478, > tx_packets=224029} > status : {tunnel_egress_iface="ovirtmgmt-ams03", > tunnel_egress_iface_carrier=up} > type : geneve > > _uuid : 5e3df5c7-958c-491d-8d41-0ae83c613f1d > admin_state : up > bfd : {} > bfd_status : {} > cfm_fault : [] > cfm_fault_status : [] > cfm_flap_count : [] > cfm_health : [] > cfm_mpid : [] > cfm_remote_mpids : [] > cfm_remote_opstate : [] > duplex : full > error : [] > external_ids : {attached-mac="56:6f:77:61:00:06", > iface-id="9a6cc189-0934-4468-97ae-09f90fa4598d", iface-status=active, > vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} > ifindex : 36 > ingress_policing_burst: 0 > ingress_policing_rate: 0 > lacp_current : [] > link_resets : 1 > link_speed : 10000000 > link_state : up > lldp : {} > mac : [] > mac_in_use : "fe:6f:77:61:00:06" > mtu : 1442 > mtu_request : [] > name : "vnet8" > ofport : 4 > ofport_request : [] > options : {} > other_config : {} > statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, > rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, > tx_bytes=8829812, tx_dropped=0, tx_errors=0, tx_packets=154540} > status : {driver_name=tun, driver_version="1.6", > firmware_version=""} > type : "" > > > I've identified which VMs have these MAC addresses but i do not see > any "conflict" with any other VM's MAC address. > > I really do not understand why these will create a conflict. > > On Wed, Sep 16, 2020 at 12:06 PM Dominik Holler <dholler@redhat.com> > wrote: > >> >> >> On Tue, Sep 15, 2020 at 6:53 PM Konstantinos Betsis < >> k.betsis@gmail.com> wrote: >> >>> So a new test-net was created under DC01 and was depicted in the >>> networks tab under both DC01 and DC02. >>> I believe for some reason networks are duplicated in DCs, maybe >>> for future use??? Don't know. >>> If one tries to delete the network from the other DC it gets an >>> error, while if deleted from the once initially created it gets deleted >>> from both. >>> >>> >> In oVirt a logical network is an entity in a data center. If the >> automatic synchronization is enabled on the ovirt-provider-ovn entity in >> oVirt Engine, the OVN networks are reflected to all data centers. If you do >> not like this, you can disable the automatic synchronization of the >> ovirt-provider-ovn in Admin Portal. >> >> >>> From the DC01-node02 i get the following errors: >>> >>> 2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, >>> force recompute next time. >>> 2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport >>> 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis. >>> 2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: >>> Claiming 56:6f:77:61:00:06 >>> 2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport >>> 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis. >>> 2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: >>> Claiming 56:6f:77:61:00:03 >>> 2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport >>> b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis. >>> 2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: >>> Claiming 56:6f:77:61:00:15 >>> 2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport >>> b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis. 
>>> 2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: >>> Claiming 56:6f:77:61:00:0d >>> 2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport >>> 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis. >>> 2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: >>> Claiming 56:6f:77:61:00:02 >>> 2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport >>> 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis. >>> 2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: >>> Claiming 56:6f:77:61:00:1c >>> 2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, >>> force recompute next time. >>> 2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport >>> 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis. >>> 2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: >>> Claiming 56:6f:77:61:00:06 >>> 2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport >>> 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis. >>> 2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: >>> Claiming 56:6f:77:61:00:03 >>> 2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport >>> b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis. >>> 2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: >>> Claiming 56:6f:77:61:00:15 >>> 2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport >>> b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis. >>> 2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: >>> Claiming 56:6f:77:61:00:0d >>> 2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport >>> 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis. >>> 2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: >>> Claiming 56:6f:77:61:00:02 >>> 2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport >>> 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis. >>> 2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: >>> Claiming 56:6f:77:61:00:1c >>> >>> >>> And this repeats forever. >>> >>> >> Looks like the southbound db is confused. >> >> Can you try to delete all chassis listed by >> sudo ovn-sbctl show >> via >> sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh >> dev-host0 >> ? >> if the script remove_chassis.sh is not installed, you can use >> >> https://github.com/oVirt/ovirt-provider-ovn/blob/master/provider/scripts/rem... >> instead. >> >> Can you please also share the output of >> ovs-vsctl list Interface >> on the host which produced the logfile above? >> >> >> >> >>> The connections to ovn-sbctl is ok and the geneve tunnels are >>> depicted under ovs-vsctl ok. >>> VMs still not able to ping each other. >>> >>> On Tue, Sep 15, 2020 at 7:22 PM Dominik Holler <dholler@redhat.com> >>> wrote: >>> >>>> >>>> >>>> On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis < >>>> k.betsis@gmail.com> wrote: >>>> >>>>> Hi Dominik >>>>> >>>>> Fixed the issue. >>>>> >>>> >>>> Thanks. >>>> >>>> >>>>> I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >>>>> needed update also. >>>>> The package is upgraded to the latest version. 
>>>>> >>>>> Once the provider was updated with the following it functioned >>>>> perfectly: >>>>> >>>>> Name: ovirt-provider-ovn >>>>> Description: oVirt network provider for OVN >>>>> Type: External Network Provider >>>>> Network Plugin: oVirt Network Provider for OVN >>>>> Automatic Synchronization: Checked >>>>> Unmanaged: Unchecked >>>>> Provider URL: https:dc02-ovirt01.testdomain.com:9696 >>>>> Requires Authentication: Checked >>>>> Username: admin@internal >>>>> Password: "The admin password" >>>>> Protocol: HTTPS >>>>> Host Name: dc02-ovirt01.testdomain.com >>>>> API Port: 35357 >>>>> API Version: v2.0 >>>>> Tenant Name: "Empty" >>>>> >>>>> For some reason the TLS certificate was in conflict with the ovn >>>>> provider details, i would bet the "host" entry. >>>>> >>>>> So now geneve tunnels are established. >>>>> OVN provider is working. >>>>> >>>>> But VMs still do not communicated on the same VM network >>>>> spanning different hosts. >>>>> >>>>> So if we have a VM network test-net on both dc01-host01 and >>>>> dc01-host02 and each host has a VM with IP addresses on the same network, >>>>> VMs on the same VM network should communicate directly. >>>>> But traffic does not reach each other. >>>>> >>>>> >>>> Can you create a new external network, with port security >>>> disabled, and an IPv4 subnet? >>>> If the VMs get an IP address via DHCP, ovn is working, and should >>>> be able to ping each other, too. >>>> If not, there should be a helpful entry in the ovn-controller.log >>>> of the host the VM is running. >>>> >>>> >>>>> On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler < >>>>> dholler@redhat.com> wrote: >>>>> >>>>>> Can you try again with: >>>>>> >>>>>> [OVN REMOTE] >>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>> [SSL] >>>>>> https-enabled=false >>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>> >>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>> [OVIRT] >>>>>> ovirt-sso-client-secret=*random_test* >>>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>>> <https://dc02-ovirt01.testdomain.com/> >>>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>>> [NETWORK] >>>>>> port-security-enabled-default=True >>>>>> [PROVIDER] >>>>>> >>>>>> provider-host=dc02-ovirt01.testdomain.com >>>>>> >>>>>> >>>>>> >>>>>> Please note that the should match the HTTP or HTTPS in the of >>>>>> the ovirt-prover-ovn configuration in oVirt Engine. >>>>>> So if the ovirt-provider-ovn entity in Engine is on HTTP, the >>>>>> config file should use >>>>>> https-enabled=false >>>>>> >>>>>> >>>>>> On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis < >>>>>> k.betsis@gmail.com> wrote: >>>>>> >>>>>>> This is the updated one: >>>>>>> >>>>>>> # This file is automatically generated by engine-setup. 
Please >>>>>>> do not edit manually >>>>>>> [OVN REMOTE] >>>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>>> [SSL] >>>>>>> https-enabled=true >>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>> >>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>> >>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>> [OVIRT] >>>>>>> ovirt-sso-client-secret=*random_text* >>>>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>>>> [NETWORK] >>>>>>> port-security-enabled-default=True >>>>>>> [PROVIDER] >>>>>>> provider-host=dc02-ovirt01.testdomain.com >>>>>>> [AUTH] >>>>>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>>>>> >>>>>>> >>>>>>> However, it still does not connect. >>>>>>> It prompts for the certificate but then fails and prompts to >>>>>>> see the log but the ovirt-provider-ovn.log does not list anything. >>>>>>> >>>>>>> Yes we've got ovirt for about a year now from about version 4.1 >>>>>>> >>>>>>> >>>>>> This might explain the trouble. Upgrade of ovirt-provider-ovn >>>>>> should work flawlessly starting from oVirt 4.2. >>>>>> >>>>>> >>>>>>> On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler < >>>>>>> dholler@redhat.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis < >>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>> >>>>>>>>> There is a file with the below entries >>>>>>>>> >>>>>>>> >>>>>>>> Impressive, do you know when this config file was created and >>>>>>>> if it was manually modified? >>>>>>>> Is this an upgrade from oVirt 4.1? >>>>>>>> >>>>>>>> >>>>>>>>> [root@dc02-ovirt01 log]# cat >>>>>>>>> /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >>>>>>>>> # This file is automatically generated by engine-setup. >>>>>>>>> Please do not edit manually >>>>>>>>> [OVN REMOTE] >>>>>>>>> ovn-remote=tcp:127.0.0.1:6641 >>>>>>>>> [SSL] >>>>>>>>> https-enabled=false >>>>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>>>> >>>>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>>>> >>>>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>>>> [OVIRT] >>>>>>>>> ovirt-sso-client-secret=*random_test* >>>>>>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>>>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>>>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>>>>>> [NETWORK] >>>>>>>>> port-security-enabled-default=True >>>>>>>>> [PROVIDER] >>>>>>>>> >>>>>>>>> provider-host=dc02-ovirt01.testdomain.com >>>>>>>>> >>>>>>>>> The only entry missing is the [AUTH] and under [SSL] the >>>>>>>>> https-enabled is false. Should I edit this in this file or is this going to >>>>>>>>> break everything? >>>>>>>>> >>>>>>>>> >>>>>>>> Changing the file should improve, but better create a backup >>>>>>>> into another diretory before modification. >>>>>>>> The only required change is >>>>>>>> from >>>>>>>> ovn-remote=tcp:127.0.0.1:6641 >>>>>>>> to >>>>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler < >>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis < >>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Dominik >>>>>>>>>>> >>>>>>>>>>> That immediately fixed the geneve tunnels between all >>>>>>>>>>> hosts. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> thanks for the feedback. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> However, the ovn provider is not broken. >>>>>>>>>>> After fixing the networks we tried to move a VM to the >>>>>>>>>>> DC01-host01 so we powered it down and simply configured it to run on >>>>>>>>>>> dc01-node01. >>>>>>>>>>> >>>>>>>>>>> While checking the logs on the ovirt engine i noticed the >>>>>>>>>>> below: >>>>>>>>>>> Failed to synchronize networks of Provider >>>>>>>>>>> ovirt-provider-ovn. >>>>>>>>>>> >>>>>>>>>>> The ovn-provider configure on the engine is the below: >>>>>>>>>>> Name: ovirt-provider-ovn >>>>>>>>>>> Description: oVirt network provider for OVN >>>>>>>>>>> Type: External Network Provider >>>>>>>>>>> Network Plugin: oVirt Network Provider for OVN >>>>>>>>>>> Automatic Synchronization: Checked >>>>>>>>>>> Unmanaged: Unchecked >>>>>>>>>>> Provider URL: http:localhost:9696 >>>>>>>>>>> Requires Authentication: Checked >>>>>>>>>>> Username: admin@internal >>>>>>>>>>> Password: "The admin password" >>>>>>>>>>> Protocol: hTTP >>>>>>>>>>> Host Name: dc02-ovirt01 >>>>>>>>>>> API Port: 35357 >>>>>>>>>>> API Version: v2.0 >>>>>>>>>>> Tenant Name: "Empty" >>>>>>>>>>> >>>>>>>>>>> In the past this was deleted by an engineer and recreated >>>>>>>>>>> as per the documentation, and it worked. Do we need to update something due >>>>>>>>>>> to the SSL on the ovn? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? >>>>>>>>>> engine-setup should have created one. >>>>>>>>>> If the file is missing, for testing purposes, you can >>>>>>>>>> create a >>>>>>>>>> file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf : >>>>>>>>>> [PROVIDER] >>>>>>>>>> provider-host=REPLACE_WITH_FQDN >>>>>>>>>> [SSL] >>>>>>>>>> >>>>>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>>>>> >>>>>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>>>>> https-enabled=true >>>>>>>>>> [OVN REMOTE] >>>>>>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>>>>>> [AUTH] >>>>>>>>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>>>>>>>> [NETWORK] >>>>>>>>>> port-security-enabled-default=True >>>>>>>>>> >>>>>>>>>> and restart the ovirt-provider-ovn service. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> From the ovn-provider logs the below is generated after a >>>>>>>>>>> service restart and when the start VM is triggered >>>>>>>>>>> >>>>>>>>>>> 2020-09-15 15:07:33,579 root Starting server >>>>>>>>>>> 2020-09-15 15:07:33,579 root Version: 1.2.29-1 >>>>>>>>>>> 2020-09-15 15:07:33,579 root Build date: 20191217125241 >>>>>>>>>>> 2020-09-15 15:07:33,579 root Githash: cb5a80d >>>>>>>>>>> 2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 >>>>>>>>>>> Request: GET /v2.0/ports >>>>>>>>>>> 2020-09-15 15:08:26,582 root Could not retrieve schema >>>>>>>>>>> from tcp:127.0.0.1:6641: Unknown error -1 >>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>> File >>>>>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in >>>>>>>>>>> _handle_request >>>>>>>>>>> method, path_parts, content >>>>>>>>>>> File >>>>>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in >>>>>>>>>>> handle_request >>>>>>>>>>> return self.call_response_handler(handler, content, >>>>>>>>>>> parameters) >>>>>>>>>>> File >>>>>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line 35, in >>>>>>>>>>> call_response_handler >>>>>>>>>>> with NeutronApi() as ovn_north: >>>>>>>>>>> File >>>>>>>>>>> "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__ >>>>>>>>>>> self.ovsidl, self.idl = ovn_connection.connect() >>>>>>>>>>> File "/usr/share/ovirt-provider-ovn/ovn_connection.py", >>>>>>>>>>> line 46, in connect >>>>>>>>>>> ovnconst.OVN_NORTHBOUND >>>>>>>>>>> File >>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", >>>>>>>>>>> line 127, in from_server >>>>>>>>>>> helper = idlutils.get_schema_helper(connection_string, >>>>>>>>>>> schema_name) >>>>>>>>>>> File >>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", >>>>>>>>>>> line 128, in get_schema_helper >>>>>>>>>>> 'err': os.strerror(err)}) >>>>>>>>>>> Exception: Could not retrieve schema from tcp: >>>>>>>>>>> 127.0.0.1:6641: Unknown error -1 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> When i update the ovn provider from the GUI to have >>>>>>>>>>> https://localhost:9696/ and HTTPS as the protocol the >>>>>>>>>>> test fails. >>>>>>>>>>> >>>>>>>>>>> On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler < >>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis < >>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Dominik >>>>>>>>>>>>> >>>>>>>>>>>>> When these commands are used on the ovirt-engine host >>>>>>>>>>>>> the output is the one depicted in your email. 
>>>>>>>>>>>>> For your reference see also below: >>>>>>>>>>>>> >>>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl >>>>>>>>>>>>> Private key: >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection >>>>>>>>>>>>> ptcp:6641 >>>>>>>>>>>>> >>>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl >>>>>>>>>>>>> Private key: >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection >>>>>>>>>>>>> read-write role="" ptcp:6642 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> ^^^ the line above points to the problem: ovn-central is >>>>>>>>>>>> configured to use plain TCP without ssl. >>>>>>>>>>>> engine-setup usually configures ovn-central to use SSL. >>>>>>>>>>>> That the files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, >>>>>>>>>>>> that engine-setup was triggered correctly. Looks like the >>>>>>>>>>>> ovn db was dropped somehow, this should not happen. >>>>>>>>>>>> This can be fixed manually by executing the following >>>>>>>>>>>> commands on engine's machine: >>>>>>>>>>>> ovn-nbctl set-ssl >>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>> /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>> ovn-nbctl set-connection pssl:6641 >>>>>>>>>>>> ovn-sbctl set-ssl >>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>> /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>> ovn-sbctl set-connection pssl:6642 >>>>>>>>>>>> >>>>>>>>>>>> The /var/log/openvswitch/ovn-controller.log on the hosts >>>>>>>>>>>> should tell that br-int.mgmt is connected now. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> [root@ath01-ovirt01 certs]# ls -l >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>>>>>>> >>>>>>>>>>>>> When i try the above commands on the node hosts the >>>>>>>>>>>>> following happens: >>>>>>>>>>>>> ovn-nbctl get-ssl / get-connection >>>>>>>>>>>>> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: >>>>>>>>>>>>> database connection failed (No such file or directory) >>>>>>>>>>>>> The above i believe is expected since no northbound >>>>>>>>>>>>> connections should be established from the host nodes. >>>>>>>>>>>>> >>>>>>>>>>>>> ovn-sbctl get-ssl /get-connection >>>>>>>>>>>>> The output is stuck till i terminate it. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> Yes, the ovn-* commands works only on engine's machine, >>>>>>>>>>>> which has the role ovn-central. >>>>>>>>>>>> On the hosts, there is only the ovn-controller, which >>>>>>>>>>>> connects the ovn southbound to openvswitch on the host. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> For the requested logs the below are found in the >>>>>>>>>>>>> ovsdb-server-sb.log >>>>>>>>>>>>> >>>>>>>>>>>>> 2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>>> excessive rate >>>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>>> excessive rate >>>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>>>> 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 >>>>>>>>>>>>> log messages in last 8 seconds (most recently, 1 seconds ago) due to >>>>>>>>>>>>> excessive rate >>>>>>>>>>>>> 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>>>>>>> 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 >>>>>>>>>>>>> log messages in last 8 seconds (most recently, 1 seconds ago) due to >>>>>>>>>>>>> excessive rate >>>>>>>>>>>>> 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>>>> 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>>> excessive rate >>>>>>>>>>>>> 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid 
character U+0016 >>>>>>>>>>>>> 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>>> excessive rate >>>>>>>>>>>>> 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>>>> 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: >>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> How can we fix these SSL errors? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I addressed this above. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> I thought vdsm did the certificate provisioning on the >>>>>>>>>>>>> host nodes as to communicate to the engine host node. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> Yes, this seems to work in your scenario, just the SSL >>>>>>>>>>>> configuration on the ovn-central was lost. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler < >>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Looks still like the ovn-controller on the host >>>>>>>>>>>>>> has problems communicating with ovn-southbound. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Are there any hints in /var/log/openvswitch/*.log, >>>>>>>>>>>>>> especially in /var/log/openvswitch/ovsdb-server-sb.log ? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Can you please check the output of >>>>>>>>>>>>>> >>>>>>>>>>>>>> ovn-nbctl get-ssl >>>>>>>>>>>>>> ovn-nbctl get-connection >>>>>>>>>>>>>> ovn-sbctl get-ssl >>>>>>>>>>>>>> ovn-sbctl get-connection >>>>>>>>>>>>>> ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>>>>> >>>>>>>>>>>>>> it should be similar to >>>>>>>>>>>>>> >>>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-ssl >>>>>>>>>>>>>> Private key: >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-connection >>>>>>>>>>>>>> pssl:6641:[::] >>>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-ssl >>>>>>>>>>>>>> Private key: >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-connection >>>>>>>>>>>>>> read-write role="" pssl:6642:[::] >>>>>>>>>>>>>> [root@ovirt-43 ~]# ls -l >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>>>> -rw-------. 
1 root root 2709 Oct 14 2019 >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis < >>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I did a restart of the ovn-controller, this is the >>>>>>>>>>>>>>> output of the ovn-controller.log >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log >>>>>>>>>>>>>>> file /var/log/openvswitch/ovn-controller.log >>>>>>>>>>>>>>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>>>>>>>> connected >>>>>>>>>>>>>>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL >>>>>>>>>>>>>>> reconnected, force recompute. >>>>>>>>>>>>>>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL >>>>>>>>>>>>>>> reconnected, force recompute. >>>>>>>>>>>>>>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>> waiting 2 seconds before reconnect >>>>>>>>>>>>>>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>> waiting 4 seconds before reconnect >>>>>>>>>>>>>>> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>> continuing to reconnect in the background but suppressing further logging >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have also done the vdsm-tool ovn-config >>>>>>>>>>>>>>> OVIRT_ENGINE_IP OVIRTMGMT_NETWORK_DC >>>>>>>>>>>>>>> This is how the OVIRT_ENGINE_IP is provided in the ovn >>>>>>>>>>>>>>> controller, i can redo it if you wan. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> After the restart of the ovn-controller the OVIRT >>>>>>>>>>>>>>> ENGINE still shows only two geneve connections one with DC01-host02 and >>>>>>>>>>>>>>> DC02-host01. 
>>>>>>>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>>>>>>> hostname: "dc02-host01" >>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>> ip: "DC02-host01_IP" >>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>>>>>>> hostname: "DC01-host02" >>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>> ip: "DC01-host02" >>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I've re-done the vdsm-tool command and nothing >>>>>>>>>>>>>>> changed.... again....with the same errors as the systemctl restart >>>>>>>>>>>>>>> ovn-controller >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler < >>>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Please include ovirt-users list in your reply, to >>>>>>>>>>>>>>>> share the knowledge and experience with the community! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis < >>>>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Ok below the output per node and DC >>>>>>>>>>>>>>>>> DC01 >>>>>>>>>>>>>>>>> node01 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>> . external-ids:ovn-remote >>>>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>> . external-ids:ovn-encap-type >>>>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>> . external-ids:ovn-encap-ip >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE01*" >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> node02 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>> . external-ids:ovn-remote >>>>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>> . external-ids:ovn-encap-type >>>>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>> . external-ids:ovn-encap-ip >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE02*" >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> DC02 >>>>>>>>>>>>>>>>> node01 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>> . external-ids:ovn-remote >>>>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>> . external-ids:ovn-encap-type >>>>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>> . external-ids:ovn-encap-ip >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC02-NODE01*" >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Looks good. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> DC01 node01 and node02 share the same VM networks >>>>>>>>>>>>>>>>> and VMs deployed on top of them cannot talk to VM on the other hypervisor. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Maybe there is a hint on ovn-controller.log on >>>>>>>>>>>>>>>> dc01-node02 ? Maybe restarting ovn-controller creates more helpful log >>>>>>>>>>>>>>>> messages? 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You can also try restart the ovn configuration on all >>>>>>>>>>>>>>>> hosts by executing >>>>>>>>>>>>>>>> vdsm-tool ovn-config OVIRT_ENGINE_IP >>>>>>>>>>>>>>>> LOCAL_OVIRTMGMT_IP >>>>>>>>>>>>>>>> on each host, this would trigger >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... >>>>>>>>>>>>>>>> internally. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So I would expect to see the same output for node01 >>>>>>>>>>>>>>>>> to have a geneve tunnel to node02 and vice versa. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Me too. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler < >>>>>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos >>>>>>>>>>>>>>>>>> Betsis <k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Dominik >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> OVN is selected as the default network provider on >>>>>>>>>>>>>>>>>>> the clusters and the hosts. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> sounds good. >>>>>>>>>>>>>>>>>> This configuration is required already during the >>>>>>>>>>>>>>>>>> host is added to oVirt Engine, because OVN is configured during this step. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The "ovn-sbctl show" works on the ovirt engine and >>>>>>>>>>>>>>>>>>> shows only two hosts, 1 per DC. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>>>>>>>>>>> hostname: "dc01-node02" >>>>>>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>>>>>> ip: "X.X.X.X" >>>>>>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>>>>>>>>>>> hostname: "dc02-node1" >>>>>>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>>>>>> ip: "A.A.A.A" >>>>>>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The new node is not listed (dc01-node1). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> When executed on the nodes the same command >>>>>>>>>>>>>>>>>>> (ovn-sbctl show) times-out on all nodes..... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The output of the >>>>>>>>>>>>>>>>>>> /var/log/openvswitch/ovn-conntroller.log lists on all logs >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Can you please compare the output of >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> of the working hosts, e.g. dc01-node02, and the >>>>>>>>>>>>>>>>>> failing host dc01-node1? >>>>>>>>>>>>>>>>>> This should point us the relevant difference in the >>>>>>>>>>>>>>>>>> configuration. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Please include ovirt-users list in your replay, to >>>>>>>>>>>>>>>>>> share the knowledge and experience with the community. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>>>>>> Konstantinos Betsis >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler < >>>>>>>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B < >>>>>>>>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi all >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> We have a small installation based on OVIRT 4.3. >>>>>>>>>>>>>>>>>>>>> 1 Cluster is based on Centos 7 and the other on >>>>>>>>>>>>>>>>>>>>> OVIRT NG Node image. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The environment was stable till an upgrade took >>>>>>>>>>>>>>>>>>>>> place a couple of months ago. >>>>>>>>>>>>>>>>>>>>> As such we had to re-install one of the Centos 7 >>>>>>>>>>>>>>>>>>>>> node and start from scratch. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> To trigger the automatic configuration of the >>>>>>>>>>>>>>>>>>>> host, it is required to configure ovirt-provider-ovn as the default network >>>>>>>>>>>>>>>>>>>> provider for the cluster before adding the host to oVirt. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Even though the installation completed >>>>>>>>>>>>>>>>>>>>> successfully and VMs are created, the following are not working as expected: >>>>>>>>>>>>>>>>>>>>> 1. ovn geneve tunnels are not established with >>>>>>>>>>>>>>>>>>>>> the other Centos 7 node in the cluster. >>>>>>>>>>>>>>>>>>>>> 2. Centos 7 node is configured by ovirt engine >>>>>>>>>>>>>>>>>>>>> however no geneve tunnel is established when "ovn-sbctl show" is issued on >>>>>>>>>>>>>>>>>>>>> the engine. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Does "ovn-sbctl show" list the hosts? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 3. no flows are shown on the engine on port 6642 >>>>>>>>>>>>>>>>>>>>> for the ovs db. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Does anyone have any experience on how to >>>>>>>>>>>>>>>>>>>>> troubleshoot OVN on ovirt? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> /var/log/openvswitch/ovncontroller.log on the >>>>>>>>>>>>>>>>>>>> host should contain a helpful hint. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>> Users mailing list -- users@ovirt.org >>>>>>>>>>>>>>>>>>>>> To unsubscribe send an email to >>>>>>>>>>>>>>>>>>>>> users-leave@ovirt.org >>>>>>>>>>>>>>>>>>>>> Privacy Statement: >>>>>>>>>>>>>>>>>>>>> https://www.ovirt.org/privacy-policy.html >>>>>>>>>>>>>>>>>>>>> oVirt Code of Conduct: >>>>>>>>>>>>>>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>>>>>>>>>>>>>>>>> List Archives: >>>>>>>>>>>>>>>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EK... >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>

From the configuration I can see only three nodes.....

"Encap":{
#dc01-node02
"da8fb1dc-f832-4d62-a01d-2e5aef018c8d":{"ip":"10.137.156.56","chassis_name":"be3abcc9-7358-4040-a37b-8d8a782f239c","options":["map",[["csum","true"]]],"type":"geneve"},
#dc01-node01
"4808bd8f-7e46-4f29-9a96-046bb580f0c5":{"ip":"10.137.156.55","chassis_name":"95ccb04a-3a08-4a62-8bc0-b8a7a42956f8","options":["map",[["csum","true"]]],"type":"geneve"},
#dc02-node01
"f20b33ae-5a6b-456c-b9cb-2e4d8b54d8be":{"ip":"192.168.121.164","chassis_name":"c4b23834-aec7-4bf8-8be7-aa94a50a6144","options":["map",[["csum","true"]]],"type":"geneve"}}

So I don't understand why dc01-node02 tries to establish a tunnel with itself.....
Is there a way to make OVN refresh itself from the oVirt network database, without affecting the VM networks?

On Wed, Sep 30, 2020 at 2:33 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Sure
I've attached it for easier reference.
On Wed, Sep 30, 2020 at 2:21 PM Dominik Holler <dholler@redhat.com> wrote:
On Wed, Sep 30, 2020 at 1:16 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
DC01-node02 was formatted, reinstalled, and then attached to the oVirt environment again. Unfortunately we see the same issue: the new DC01-node02 tries to establish a geneve tunnel to its own IP.
[root@dc01-node02 ~]# ovs-vsctl show
eff2663e-cb10-41b0-93ba-605bb5c7bd78
    Bridge br-int
        fail_mode: secure
        Port "ovn-95ccb0-0"
            Interface "ovn-95ccb0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-node01_IP"}
        Port "ovn-be3abc-0"
            Interface "ovn-be3abc-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-node02_IP"}
        Port "ovn-c4b238-0"
            Interface "ovn-c4b238-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc02-node01_IP"}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.11.0"
Is there a way to fix this on the oVirt engine, since that is where the information resides? Something is broken there.
I suspect that there is an inconsistency in the OVN SB DB. Is there a way to share your /var/lib/openvswitch/ovnsb_db.db with us?
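If sharing the whole file is not possible, a dump of the Chassis and Encap tables taken on the engine host might already be enough to spot the inconsistency. A rough sketch, assuming the default ovn-central setup on the engine:

  ovn-sbctl show
  ovn-sbctl list Chassis
  ovn-sbctl list Encap

and on dc01-node02, for comparison:

  ovs-vsctl --no-wait get open . external-ids:system-id

If there is a Chassis row with dc01-node02's IP but a name different from the system-id its ovn-controller is currently using, the host would treat that row as a remote chassis and build a tunnel to its own IP, which matches what ovs-vsctl show reports.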
Thank you
Best Regards
Konstantinos Betsis
On Wed, Sep 16, 2020, 13:25 Dominik Holler <dholler@redhat.com> wrote:
On Wed, Sep 16, 2020 at 12:15 PM Konstantinos Betsis < k.betsis@gmail.com> wrote:
I have a better solution. I am currently migrating all VMs over to dc01-node01, and then I'll format it to fix the partitioning as well.
In theory the OVN SB DB will be fixed once the host is re-installed....
If not, we can then check whether there is a stale entry on the oVirt host where the SB DB is managed.
Maybe you could ensure that there is no entry anymore while dc01-host02 is being reinstalled, before the host is added to oVirt again?
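One way to verify that the old registration is really gone before re-adding the host, assuming the SB DB is the one on the engine, could be:

  ovn-sbctl show
  ovn-sbctl find Chassis hostname=dc01-host02

(use the hostname exactly as it appears in ovn-sbctl show). Neither command should list a chassis for dc01-host02 while the host is still being reinstalled.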
Do you agree with this?
Sounds good, but OVN should not be the reason to reinstall.
On Wed, Sep 16, 2020 at 1:00 PM Dominik Holler <dholler@redhat.com> wrote:
Maybe because of a duplicated entry in the ovn sb db? Can you please stop the ovn-controller on this host, remove the host from the ovn sb db, ensure it is gone, and restart the ovn-controller on the host?
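A possible sequence, just as a sketch, where CHASSIS_NAME stands for the quoted UUID shown next to the stale Chassis in ovn-sbctl show:

  # on dc01-host02
  systemctl stop ovn-controller

  # on the engine
  ovn-sbctl show
  ovn-sbctl chassis-del CHASSIS_NAME
  ovn-sbctl show        # the stale chassis should no longer be listed

  # on dc01-host02
  systemctl start ovn-controller

The remove_chassis.sh script from ovirt-provider-ovn mentioned earlier in the thread should achieve the same as the chassis-del step.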
On Wed, Sep 16, 2020 at 11:55 AM Konstantinos Betsis < k.betsis@gmail.com> wrote:
> Hi Dominik > > Just saw the below on host dc01-host02 > > ovs-vsctl show > f3b13557-dfb4-45a4-b6af-c995ccf68720 > Bridge br-int > Port "ovn-95ccb0-0" > Interface "ovn-95ccb0-0" > type: geneve > options: {csum="true", key=flow, > remote_ip="dc01-host01"} > Port "vnet10" > Interface "vnet10" > Port "vnet11" > Interface "vnet11" > Port "vnet0" > Interface "vnet0" > Port "vnet9" > Interface "vnet9" > Port "vnet8" > Interface "vnet8" > Port br-int > Interface br-int > type: internal > Port "vnet12" > Interface "vnet12" > Port "ovn-be3abc-0" > Interface "ovn-be3abc-0" > type: geneve > options: {csum="true", key=flow, > remote_ip="dc01-host02"} > Port "vnet7" > Interface "vnet7" > Port "ovn-c4b238-0" > Interface "ovn-c4b238-0" > type: geneve > options: {csum="true", key=flow, > remote_ip="dc02-host01"} > Port "vnet6" > Interface "vnet6" > ovs_version: "2.11.0" > > > Why would this node establish a geneve tunnel to himself? > Other nodes do not exhibit this behavior. > > On Wed, Sep 16, 2020 at 12:21 PM Konstantinos Betsis < > k.betsis@gmail.com> wrote: > >> Hi Dominik >> >> Below is the output of the ovs-vsctl list interface >> >> _uuid : bdaf92c1-4389-4ddf-aab0-93975076ebb2 >> admin_state : up >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> cfm_remote_opstate : [] >> duplex : full >> error : [] >> external_ids : {attached-mac="56:6f:77:61:00:02", >> iface-id="5d03a7a5-82a1-40f9-b50c-353a26167fa3", iface-status=active, >> vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} >> ifindex : 34 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 1 >> link_speed : 10000000 >> link_state : up >> lldp : {} >> mac : [] >> mac_in_use : "fe:6f:77:61:00:02" >> mtu : 1442 >> mtu_request : [] >> name : "vnet6" >> ofport : 2 >> ofport_request : [] >> options : {} >> other_config : {} >> statistics : {collisions=0, rx_bytes=10828495, >> rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, >> rx_packets=117713, tx_bytes=20771797, tx_dropped=0, tx_errors=0, >> tx_packets=106954} >> status : {driver_name=tun, driver_version="1.6", >> firmware_version=""} >> type : "" >> >> _uuid : bad80911-3993-4085-a0b0-962b6c9156cd >> admin_state : up >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> cfm_remote_opstate : [] >> duplex : [] >> error : [] >> external_ids : {} >> ifindex : 39 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 0 >> link_speed : [] >> link_state : up >> lldp : {} >> mac : [] >> mac_in_use : "fe:37:52:c4:cb:03" >> mtu : [] >> mtu_request : [] >> name : "ovn-c4b238-0" >> ofport : 7 >> ofport_request : [] >> options : {csum="true", key=flow, >> remote_ip="192.168.121.164"} >> other_config : {} >> statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, >> tx_packets=0} >> status : {tunnel_egress_iface="ovirtmgmt-ams03", >> tunnel_egress_iface_carrier=up} >> type : geneve >> >> _uuid : 8e7705d1-0b9d-4e30-8277-c339e7e1c27a >> admin_state : up >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> cfm_remote_opstate : [] >> duplex : full >> error : [] >> external_ids : {attached-mac="56:6f:77:61:00:0d", >> iface-id="b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7", iface-status=active, >> 
vm-id="8d73f333-bca4-4b32-9b87-2e7ee07eda84"} >> ifindex : 28 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 1 >> link_speed : 10000000 >> link_state : up >> lldp : {} >> mac : [] >> mac_in_use : "fe:6f:77:61:00:0d" >> mtu : 1442 >> mtu_request : [] >> name : "vnet0" >> ofport : 1 >> ofport_request : [] >> options : {} >> other_config : {} >> statistics : {collisions=0, rx_bytes=20609787, >> rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, >> rx_packets=104535, tx_bytes=10830007, tx_dropped=0, tx_errors=0, >> tx_packets=117735} >> status : {driver_name=tun, driver_version="1.6", >> firmware_version=""} >> type : "" >> >> _uuid : 86dcc68a-63e4-4445-9373-81c1f4502c17 >> admin_state : up >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> cfm_remote_opstate : [] >> duplex : full >> error : [] >> external_ids : {attached-mac="56:6f:77:61:00:10", >> iface-id="4e8d5636-4110-41b2-906d-f9b04c2e62cd", iface-status=active, >> vm-id="9a002a9b-5f09-4def-a531-d50ff683470b"} >> ifindex : 40 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 1 >> link_speed : 10000000 >> link_state : up >> lldp : {} >> mac : [] >> mac_in_use : "fe:6f:77:61:00:10" >> mtu : 1442 >> mtu_request : [] >> name : "vnet11" >> ofport : 10 >> ofport_request : [] >> options : {} >> other_config : {} >> statistics : {collisions=0, rx_bytes=3311352, >> rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, >> rx_packets=51012, tx_bytes=5514116, tx_dropped=0, tx_errors=0, >> tx_packets=103456} >> status : {driver_name=tun, driver_version="1.6", >> firmware_version=""} >> type : "" >> >> _uuid : e8d5e4a2-b9a0-4146-8d98-34713cb443de >> admin_state : up >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> cfm_remote_opstate : [] >> duplex : full >> error : [] >> external_ids : {attached-mac="56:6f:77:61:00:15", >> iface-id="b88de6e4-6d77-4e42-b734-4cc676728910", iface-status=active, >> vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} >> ifindex : 37 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 1 >> link_speed : 10000000 >> link_state : up >> lldp : {} >> mac : [] >> mac_in_use : "fe:6f:77:61:00:15" >> mtu : 1442 >> mtu_request : [] >> name : "vnet9" >> ofport : 5 >> ofport_request : [] >> options : {} >> other_config : {} >> statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, >> rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, >> tx_bytes=4500, tx_dropped=0, tx_errors=0, tx_packets=74} >> status : {driver_name=tun, driver_version="1.6", >> firmware_version=""} >> type : "" >> >> _uuid : 6a2974b3-cd72-4688-a630-0a7e9c779b21 >> admin_state : up >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> cfm_remote_opstate : [] >> duplex : full >> error : [] >> external_ids : {attached-mac="56:6f:77:61:00:17", >> iface-id="64681036-26e2-41d7-b73f-ab5302610145", iface-status=active, >> vm-id="bf0dc78c-dad5-41a0-914c-ae0da0f9a388"} >> ifindex : 41 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 1 >> link_speed : 10000000 >> link_state : up >> lldp : {} >> mac : [] >> mac_in_use : 
"fe:6f:77:61:00:17" >> mtu : 1442 >> mtu_request : [] >> name : "vnet12" >> ofport : 11 >> ofport_request : [] >> options : {} >> other_config : {} >> statistics : {collisions=0, rx_bytes=5513640, >> rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, >> rx_packets=103450, tx_bytes=3311868, tx_dropped=0, tx_errors=0, >> tx_packets=51018} >> status : {driver_name=tun, driver_version="1.6", >> firmware_version=""} >> type : "" >> >> _uuid : 44498e54-f122-41a0-a41a-7a88ba2dba9b >> admin_state : down >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> cfm_remote_opstate : [] >> duplex : [] >> error : [] >> external_ids : {} >> ifindex : 7 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 0 >> link_speed : [] >> link_state : down >> lldp : {} >> mac : [] >> mac_in_use : "32:0a:69:67:07:4f" >> mtu : 1442 >> mtu_request : [] >> name : br-int >> ofport : 65534 >> ofport_request : [] >> options : {} >> other_config : {} >> statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, >> rx_dropped=326, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, >> tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0} >> status : {driver_name=openvswitch} >> type : internal >> >> _uuid : e2114584-8ceb-43d6-817b-e457738ead8a >> admin_state : up >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> cfm_remote_opstate : [] >> duplex : full >> error : [] >> external_ids : {attached-mac="56:6f:77:61:00:03", >> iface-id="16162721-c815-4cd8-ab57-f22e6e482c7f", iface-status=active, >> vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} >> ifindex : 35 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 1 >> link_speed : 10000000 >> link_state : up >> lldp : {} >> mac : [] >> mac_in_use : "fe:6f:77:61:00:03" >> mtu : 1442 >> mtu_request : [] >> name : "vnet7" >> ofport : 3 >> ofport_request : [] >> options : {} >> other_config : {} >> statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, >> rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, >> tx_bytes=4730, tx_dropped=0, tx_errors=0, tx_packets=77} >> status : {driver_name=tun, driver_version="1.6", >> firmware_version=""} >> type : "" >> >> _uuid : ee16943e-d145-4080-893f-464098a6388f >> admin_state : up >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> cfm_remote_opstate : [] >> duplex : [] >> error : [] >> external_ids : {} >> ifindex : 39 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 0 >> link_speed : [] >> link_state : up >> lldp : {} >> mac : [] >> mac_in_use : "1e:50:3f:a8:42:d1" >> mtu : [] >> mtu_request : [] >> name : "ovn-be3abc-0" >> ofport : 8 >> ofport_request : [] >> options : {csum="true", key=flow, >> remote_ip="DC01-host02"} >> other_config : {} >> statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, >> tx_packets=0} >> status : {tunnel_egress_iface="ovirtmgmt-ams03", >> tunnel_egress_iface_carrier=up} >> type : geneve >> >> _uuid : 86a229be-373e-4c43-b2f1-6190523ed73a >> admin_state : up >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> 
cfm_remote_opstate : [] >> duplex : full >> error : [] >> external_ids : {attached-mac="56:6f:77:61:00:1c", >> iface-id="12d829c3-64eb-44bc-a0bd-d7219991f35f", iface-status=active, >> vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} >> ifindex : 38 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 1 >> link_speed : 10000000 >> link_state : up >> lldp : {} >> mac : [] >> mac_in_use : "fe:6f:77:61:00:1c" >> mtu : 1442 >> mtu_request : [] >> name : "vnet10" >> ofport : 6 >> ofport_request : [] >> options : {} >> other_config : {} >> statistics : {collisions=0, rx_bytes=117912, rx_crc_err=0, >> rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2195, >> tx_bytes=4204, tx_dropped=0, tx_errors=0, tx_packets=66} >> status : {driver_name=tun, driver_version="1.6", >> firmware_version=""} >> type : "" >> >> _uuid : fa4b8d96-bffe-4b56-930e-0e7fcc5f68ac >> admin_state : up >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> cfm_remote_opstate : [] >> duplex : [] >> error : [] >> external_ids : {} >> ifindex : 39 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 0 >> link_speed : [] >> link_state : up >> lldp : {} >> mac : [] >> mac_in_use : "7a:28:24:eb:ec:d2" >> mtu : [] >> mtu_request : [] >> name : "ovn-95ccb0-0" >> ofport : 9 >> ofport_request : [] >> options : {csum="true", key=flow, >> remote_ip="DC01-host01"} >> other_config : {} >> statistics : {rx_bytes=0, rx_packets=0, tx_bytes=12840478, >> tx_packets=224029} >> status : {tunnel_egress_iface="ovirtmgmt-ams03", >> tunnel_egress_iface_carrier=up} >> type : geneve >> >> _uuid : 5e3df5c7-958c-491d-8d41-0ae83c613f1d >> admin_state : up >> bfd : {} >> bfd_status : {} >> cfm_fault : [] >> cfm_fault_status : [] >> cfm_flap_count : [] >> cfm_health : [] >> cfm_mpid : [] >> cfm_remote_mpids : [] >> cfm_remote_opstate : [] >> duplex : full >> error : [] >> external_ids : {attached-mac="56:6f:77:61:00:06", >> iface-id="9a6cc189-0934-4468-97ae-09f90fa4598d", iface-status=active, >> vm-id="e45b7b34-24f2-41ed-b95e-cd1ad532e8d3"} >> ifindex : 36 >> ingress_policing_burst: 0 >> ingress_policing_rate: 0 >> lacp_current : [] >> link_resets : 1 >> link_speed : 10000000 >> link_state : up >> lldp : {} >> mac : [] >> mac_in_use : "fe:6f:77:61:00:06" >> mtu : 1442 >> mtu_request : [] >> name : "vnet8" >> ofport : 4 >> ofport_request : [] >> options : {} >> other_config : {} >> statistics : {collisions=0, rx_bytes=180, rx_crc_err=0, >> rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=2, >> tx_bytes=8829812, tx_dropped=0, tx_errors=0, tx_packets=154540} >> status : {driver_name=tun, driver_version="1.6", >> firmware_version=""} >> type : "" >> >> >> I've identified which VMs have these MAC addresses but i do not see >> any "conflict" with any other VM's MAC address. >> >> I really do not understand why these will create a conflict. >> >> On Wed, Sep 16, 2020 at 12:06 PM Dominik Holler <dholler@redhat.com> >> wrote: >> >>> >>> >>> On Tue, Sep 15, 2020 at 6:53 PM Konstantinos Betsis < >>> k.betsis@gmail.com> wrote: >>> >>>> So a new test-net was created under DC01 and was depicted in the >>>> networks tab under both DC01 and DC02. >>>> I believe for some reason networks are duplicated in DCs, maybe >>>> for future use??? Don't know. 
>>>> If one tries to delete the network from the other DC it gets an >>>> error, while if deleted from the once initially created it gets deleted >>>> from both. >>>> >>>> >>> In oVirt a logical network is an entity in a data center. If the >>> automatic synchronization is enabled on the ovirt-provider-ovn entity in >>> oVirt Engine, the OVN networks are reflected to all data centers. If you do >>> not like this, you can disable the automatic synchronization of the >>> ovirt-provider-ovn in Admin Portal. >>> >>> >>>> From the DC01-node02 i get the following errors: >>>> >>>> 2020-09-15T16:48:49.904Z|22748|main|INFO|OVNSB commit failed, >>>> force recompute next time. >>>> 2020-09-15T16:48:49.905Z|22749|binding|INFO|Claiming lport >>>> 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis. >>>> 2020-09-15T16:48:49.905Z|22750|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: >>>> Claiming 56:6f:77:61:00:06 >>>> 2020-09-15T16:48:49.905Z|22751|binding|INFO|Claiming lport >>>> 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis. >>>> 2020-09-15T16:48:49.905Z|22752|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: >>>> Claiming 56:6f:77:61:00:03 >>>> 2020-09-15T16:48:49.905Z|22753|binding|INFO|Claiming lport >>>> b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis. >>>> 2020-09-15T16:48:49.905Z|22754|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: >>>> Claiming 56:6f:77:61:00:15 >>>> 2020-09-15T16:48:49.905Z|22755|binding|INFO|Claiming lport >>>> b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis. >>>> 2020-09-15T16:48:49.905Z|22756|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: >>>> Claiming 56:6f:77:61:00:0d >>>> 2020-09-15T16:48:49.905Z|22757|binding|INFO|Claiming lport >>>> 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis. >>>> 2020-09-15T16:48:49.905Z|22758|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: >>>> Claiming 56:6f:77:61:00:02 >>>> 2020-09-15T16:48:49.905Z|22759|binding|INFO|Claiming lport >>>> 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis. >>>> 2020-09-15T16:48:49.905Z|22760|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: >>>> Claiming 56:6f:77:61:00:1c >>>> 2020-09-15T16:48:49.959Z|22761|main|INFO|OVNSB commit failed, >>>> force recompute next time. >>>> 2020-09-15T16:48:49.960Z|22762|binding|INFO|Claiming lport >>>> 9a6cc189-0934-4468-97ae-09f90fa4598d for this chassis. >>>> 2020-09-15T16:48:49.960Z|22763|binding|INFO|9a6cc189-0934-4468-97ae-09f90fa4598d: >>>> Claiming 56:6f:77:61:00:06 >>>> 2020-09-15T16:48:49.960Z|22764|binding|INFO|Claiming lport >>>> 16162721-c815-4cd8-ab57-f22e6e482c7f for this chassis. >>>> 2020-09-15T16:48:49.960Z|22765|binding|INFO|16162721-c815-4cd8-ab57-f22e6e482c7f: >>>> Claiming 56:6f:77:61:00:03 >>>> 2020-09-15T16:48:49.960Z|22766|binding|INFO|Claiming lport >>>> b88de6e4-6d77-4e42-b734-4cc676728910 for this chassis. >>>> 2020-09-15T16:48:49.960Z|22767|binding|INFO|b88de6e4-6d77-4e42-b734-4cc676728910: >>>> Claiming 56:6f:77:61:00:15 >>>> 2020-09-15T16:48:49.960Z|22768|binding|INFO|Claiming lport >>>> b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7 for this chassis. >>>> 2020-09-15T16:48:49.960Z|22769|binding|INFO|b7ff5f2b-4bb4-4250-8ad8-8a7e19d2b4c7: >>>> Claiming 56:6f:77:61:00:0d >>>> 2020-09-15T16:48:49.960Z|22770|binding|INFO|Claiming lport >>>> 5d03a7a5-82a1-40f9-b50c-353a26167fa3 for this chassis. 
>>>> 2020-09-15T16:48:49.960Z|22771|binding|INFO|5d03a7a5-82a1-40f9-b50c-353a26167fa3: >>>> Claiming 56:6f:77:61:00:02 >>>> 2020-09-15T16:48:49.960Z|22772|binding|INFO|Claiming lport >>>> 12d829c3-64eb-44bc-a0bd-d7219991f35f for this chassis. >>>> 2020-09-15T16:48:49.960Z|22773|binding|INFO|12d829c3-64eb-44bc-a0bd-d7219991f35f: >>>> Claiming 56:6f:77:61:00:1c >>>> >>>> >>>> And this repeats forever. >>>> >>>> >>> Looks like the southbound db is confused. >>> >>> Can you try to delete all chassis listed by >>> sudo ovn-sbctl show >>> via >>> sudo /usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh >>> dev-host0 >>> ? >>> if the script remove_chassis.sh is not installed, you can use >>> >>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/provider/scripts/rem... >>> instead. >>> >>> Can you please also share the output of >>> ovs-vsctl list Interface >>> on the host which produced the logfile above? >>> >>> >>> >>> >>>> The connections to ovn-sbctl is ok and the geneve tunnels are >>>> depicted under ovs-vsctl ok. >>>> VMs still not able to ping each other. >>>> >>>> On Tue, Sep 15, 2020 at 7:22 PM Dominik Holler < >>>> dholler@redhat.com> wrote: >>>> >>>>> >>>>> >>>>> On Tue, Sep 15, 2020 at 6:18 PM Konstantinos Betsis < >>>>> k.betsis@gmail.com> wrote: >>>>> >>>>>> Hi Dominik >>>>>> >>>>>> Fixed the issue. >>>>>> >>>>> >>>>> Thanks. >>>>> >>>>> >>>>>> I believe the /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >>>>>> needed update also. >>>>>> The package is upgraded to the latest version. >>>>>> >>>>>> Once the provider was updated with the following it functioned >>>>>> perfectly: >>>>>> >>>>>> Name: ovirt-provider-ovn >>>>>> Description: oVirt network provider for OVN >>>>>> Type: External Network Provider >>>>>> Network Plugin: oVirt Network Provider for OVN >>>>>> Automatic Synchronization: Checked >>>>>> Unmanaged: Unchecked >>>>>> Provider URL: https:dc02-ovirt01.testdomain.com:9696 >>>>>> Requires Authentication: Checked >>>>>> Username: admin@internal >>>>>> Password: "The admin password" >>>>>> Protocol: HTTPS >>>>>> Host Name: dc02-ovirt01.testdomain.com >>>>>> API Port: 35357 >>>>>> API Version: v2.0 >>>>>> Tenant Name: "Empty" >>>>>> >>>>>> For some reason the TLS certificate was in conflict with the >>>>>> ovn provider details, i would bet the "host" entry. >>>>>> >>>>>> So now geneve tunnels are established. >>>>>> OVN provider is working. >>>>>> >>>>>> But VMs still do not communicated on the same VM network >>>>>> spanning different hosts. >>>>>> >>>>>> So if we have a VM network test-net on both dc01-host01 and >>>>>> dc01-host02 and each host has a VM with IP addresses on the same network, >>>>>> VMs on the same VM network should communicate directly. >>>>>> But traffic does not reach each other. >>>>>> >>>>>> >>>>> Can you create a new external network, with port security >>>>> disabled, and an IPv4 subnet? >>>>> If the VMs get an IP address via DHCP, ovn is working, and >>>>> should be able to ping each other, too. >>>>> If not, there should be a helpful entry in the >>>>> ovn-controller.log of the host the VM is running. 
>>>>> >>>>> >>>>>> On Tue, Sep 15, 2020 at 7:07 PM Dominik Holler < >>>>>> dholler@redhat.com> wrote: >>>>>> >>>>>>> Can you try again with: >>>>>>> >>>>>>> [OVN REMOTE] >>>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>>> [SSL] >>>>>>> https-enabled=false >>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>> >>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>> >>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>> [OVIRT] >>>>>>> ovirt-sso-client-secret=*random_test* >>>>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>>>> <https://dc02-ovirt01.testdomain.com/> >>>>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>>>> [NETWORK] >>>>>>> port-security-enabled-default=True >>>>>>> [PROVIDER] >>>>>>> >>>>>>> provider-host=dc02-ovirt01.testdomain.com >>>>>>> >>>>>>> >>>>>>> >>>>>>> Please note that the should match the HTTP or HTTPS in the of >>>>>>> the ovirt-prover-ovn configuration in oVirt Engine. >>>>>>> So if the ovirt-provider-ovn entity in Engine is on HTTP, the >>>>>>> config file should use >>>>>>> https-enabled=false >>>>>>> >>>>>>> >>>>>>> On Tue, Sep 15, 2020 at 5:56 PM Konstantinos Betsis < >>>>>>> k.betsis@gmail.com> wrote: >>>>>>> >>>>>>>> This is the updated one: >>>>>>>> >>>>>>>> # This file is automatically generated by engine-setup. >>>>>>>> Please do not edit manually >>>>>>>> [OVN REMOTE] >>>>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>>>> [SSL] >>>>>>>> https-enabled=true >>>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>>> >>>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>>> >>>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>>> [OVIRT] >>>>>>>> ovirt-sso-client-secret=*random_text* >>>>>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>>>>> [NETWORK] >>>>>>>> port-security-enabled-default=True >>>>>>>> [PROVIDER] >>>>>>>> provider-host=dc02-ovirt01.testdomain.com >>>>>>>> [AUTH] >>>>>>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>>>>>> >>>>>>>> >>>>>>>> However, it still does not connect. >>>>>>>> It prompts for the certificate but then fails and prompts to >>>>>>>> see the log but the ovirt-provider-ovn.log does not list anything. >>>>>>>> >>>>>>>> Yes we've got ovirt for about a year now from about version >>>>>>>> 4.1 >>>>>>>> >>>>>>>> >>>>>>> This might explain the trouble. Upgrade of ovirt-provider-ovn >>>>>>> should work flawlessly starting from oVirt 4.2. >>>>>>> >>>>>>> >>>>>>>> On Tue, Sep 15, 2020 at 6:44 PM Dominik Holler < >>>>>>>> dholler@redhat.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Sep 15, 2020 at 5:34 PM Konstantinos Betsis < >>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> There is a file with the below entries >>>>>>>>>> >>>>>>>>> >>>>>>>>> Impressive, do you know when this config file was created >>>>>>>>> and if it was manually modified? >>>>>>>>> Is this an upgrade from oVirt 4.1? >>>>>>>>> >>>>>>>>> >>>>>>>>>> [root@dc02-ovirt01 log]# cat >>>>>>>>>> /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf >>>>>>>>>> # This file is automatically generated by engine-setup. 
>>>>>>>>>> Please do not edit manually >>>>>>>>>> [OVN REMOTE] >>>>>>>>>> ovn-remote=tcp:127.0.0.1:6641 >>>>>>>>>> [SSL] >>>>>>>>>> https-enabled=false >>>>>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>>>>> >>>>>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>>>>> >>>>>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>>>>> [OVIRT] >>>>>>>>>> ovirt-sso-client-secret=*random_test* >>>>>>>>>> ovirt-host=https://dc02-ovirt01.testdomain.com:443 >>>>>>>>>> ovirt-sso-client-id=ovirt-provider-ovn >>>>>>>>>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem >>>>>>>>>> [NETWORK] >>>>>>>>>> port-security-enabled-default=True >>>>>>>>>> [PROVIDER] >>>>>>>>>> >>>>>>>>>> provider-host=dc02-ovirt01.testdomain.com >>>>>>>>>> >>>>>>>>>> The only entry missing is the [AUTH] and under [SSL] the >>>>>>>>>> https-enabled is false. Should I edit this in this file or is this going to >>>>>>>>>> break everything? >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Changing the file should improve, but better create a backup >>>>>>>>> into another diretory before modification. >>>>>>>>> The only required change is >>>>>>>>> from >>>>>>>>> ovn-remote=tcp:127.0.0.1:6641 >>>>>>>>> to >>>>>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Tue, Sep 15, 2020 at 6:27 PM Dominik Holler < >>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Sep 15, 2020 at 5:11 PM Konstantinos Betsis < >>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Dominik >>>>>>>>>>>> >>>>>>>>>>>> That immediately fixed the geneve tunnels between all >>>>>>>>>>>> hosts. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> thanks for the feedback. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> However, the ovn provider is not broken. >>>>>>>>>>>> After fixing the networks we tried to move a VM to the >>>>>>>>>>>> DC01-host01 so we powered it down and simply configured it to run on >>>>>>>>>>>> dc01-node01. >>>>>>>>>>>> >>>>>>>>>>>> While checking the logs on the ovirt engine i noticed the >>>>>>>>>>>> below: >>>>>>>>>>>> Failed to synchronize networks of Provider >>>>>>>>>>>> ovirt-provider-ovn. >>>>>>>>>>>> >>>>>>>>>>>> The ovn-provider configure on the engine is the below: >>>>>>>>>>>> Name: ovirt-provider-ovn >>>>>>>>>>>> Description: oVirt network provider for OVN >>>>>>>>>>>> Type: External Network Provider >>>>>>>>>>>> Network Plugin: oVirt Network Provider for OVN >>>>>>>>>>>> Automatic Synchronization: Checked >>>>>>>>>>>> Unmanaged: Unchecked >>>>>>>>>>>> Provider URL: http:localhost:9696 >>>>>>>>>>>> Requires Authentication: Checked >>>>>>>>>>>> Username: admin@internal >>>>>>>>>>>> Password: "The admin password" >>>>>>>>>>>> Protocol: hTTP >>>>>>>>>>>> Host Name: dc02-ovirt01 >>>>>>>>>>>> API Port: 35357 >>>>>>>>>>>> API Version: v2.0 >>>>>>>>>>>> Tenant Name: "Empty" >>>>>>>>>>>> >>>>>>>>>>>> In the past this was deleted by an engineer and recreated >>>>>>>>>>>> as per the documentation, and it worked. Do we need to update something due >>>>>>>>>>>> to the SSL on the ovn? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Is there a file in /etc/ovirt-provider-ovn/conf.d/ ? >>>>>>>>>>> engine-setup should have created one. 
>>>>>>>>>>> If the file is missing, for testing purposes, you can >>>>>>>>>>> create a >>>>>>>>>>> file /etc/ovirt-provider-ovn/conf.d/00-setup-ovirt-provider-ovn-test.conf : >>>>>>>>>>> [PROVIDER] >>>>>>>>>>> provider-host=REPLACE_WITH_FQDN >>>>>>>>>>> [SSL] >>>>>>>>>>> >>>>>>>>>>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer >>>>>>>>>>> >>>>>>>>>>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass >>>>>>>>>>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem >>>>>>>>>>> https-enabled=true >>>>>>>>>>> [OVN REMOTE] >>>>>>>>>>> ovn-remote=ssl:127.0.0.1:6641 >>>>>>>>>>> [AUTH] >>>>>>>>>>> auth-plugin=auth.plugins.static_token:NoAuthPlugin >>>>>>>>>>> [NETWORK] >>>>>>>>>>> port-security-enabled-default=True >>>>>>>>>>> >>>>>>>>>>> and restart the ovirt-provider-ovn service. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> From the ovn-provider logs the below is generated after a >>>>>>>>>>>> service restart and when the start VM is triggered >>>>>>>>>>>> >>>>>>>>>>>> 2020-09-15 15:07:33,579 root Starting server >>>>>>>>>>>> 2020-09-15 15:07:33,579 root Version: 1.2.29-1 >>>>>>>>>>>> 2020-09-15 15:07:33,579 root Build date: 20191217125241 >>>>>>>>>>>> 2020-09-15 15:07:33,579 root Githash: cb5a80d >>>>>>>>>>>> 2020-09-15 15:08:26,582 root From: ::ffff:127.0.0.1:59980 >>>>>>>>>>>> Request: GET /v2.0/ports >>>>>>>>>>>> 2020-09-15 15:08:26,582 root Could not retrieve schema >>>>>>>>>>>> from tcp:127.0.0.1:6641: Unknown error -1 >>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/base_handler.py", line 138, in >>>>>>>>>>>> _handle_request >>>>>>>>>>>> method, path_parts, content >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py", line 175, in >>>>>>>>>>>> handle_request >>>>>>>>>>>> return self.call_response_handler(handler, content, >>>>>>>>>>>> parameters) >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/share/ovirt-provider-ovn/handlers/neutron.py", line 35, in >>>>>>>>>>>> call_response_handler >>>>>>>>>>>> with NeutronApi() as ovn_north: >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/share/ovirt-provider-ovn/neutron/neutron_api.py", line 95, in __init__ >>>>>>>>>>>> self.ovsidl, self.idl = ovn_connection.connect() >>>>>>>>>>>> File "/usr/share/ovirt-provider-ovn/ovn_connection.py", >>>>>>>>>>>> line 46, in connect >>>>>>>>>>>> ovnconst.OVN_NORTHBOUND >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", >>>>>>>>>>>> line 127, in from_server >>>>>>>>>>>> helper = >>>>>>>>>>>> idlutils.get_schema_helper(connection_string, schema_name) >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", >>>>>>>>>>>> line 128, in get_schema_helper >>>>>>>>>>>> 'err': os.strerror(err)}) >>>>>>>>>>>> Exception: Could not retrieve schema from tcp: >>>>>>>>>>>> 127.0.0.1:6641: Unknown error -1 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> When i update the ovn provider from the GUI to have >>>>>>>>>>>> https://localhost:9696/ and HTTPS as the protocol the >>>>>>>>>>>> test fails. 
>>>>>>>>>>>> >>>>>>>>>>>> On Tue, Sep 15, 2020 at 5:35 PM Dominik Holler < >>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis < >>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Dominik >>>>>>>>>>>>>> >>>>>>>>>>>>>> When these commands are used on the ovirt-engine host >>>>>>>>>>>>>> the output is the one depicted in your email. >>>>>>>>>>>>>> For your reference see also below: >>>>>>>>>>>>>> >>>>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl >>>>>>>>>>>>>> Private key: >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection >>>>>>>>>>>>>> ptcp:6641 >>>>>>>>>>>>>> >>>>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl >>>>>>>>>>>>>> Private key: >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>>>> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection >>>>>>>>>>>>>> read-write role="" ptcp:6642 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> ^^^ the line above points to the problem: ovn-central is >>>>>>>>>>>>> configured to use plain TCP without ssl. >>>>>>>>>>>>> engine-setup usually configures ovn-central to use SSL. >>>>>>>>>>>>> That the files /etc/pki/ovirt-engine/keys/ovn-* exist, shows, >>>>>>>>>>>>> that engine-setup was triggered correctly. Looks like >>>>>>>>>>>>> the ovn db was dropped somehow, this should not happen. >>>>>>>>>>>>> This can be fixed manually by executing the following >>>>>>>>>>>>> commands on engine's machine: >>>>>>>>>>>>> ovn-nbctl set-ssl >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>>> /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>> ovn-nbctl set-connection pssl:6641 >>>>>>>>>>>>> ovn-sbctl set-ssl >>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>>> /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>> ovn-sbctl set-connection pssl:6642 >>>>>>>>>>>>> >>>>>>>>>>>>> The /var/log/openvswitch/ovn-controller.log on the hosts >>>>>>>>>>>>> should tell that br-int.mgmt is connected now. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> [root@ath01-ovirt01 certs]# ls -l >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>>>> -rw-------. 1 root root 2893 Jun 25 11:08 >>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>>>>>>>> >>>>>>>>>>>>>> When i try the above commands on the node hosts the >>>>>>>>>>>>>> following happens: >>>>>>>>>>>>>> ovn-nbctl get-ssl / get-connection >>>>>>>>>>>>>> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: >>>>>>>>>>>>>> database connection failed (No such file or directory) >>>>>>>>>>>>>> The above i believe is expected since no northbound >>>>>>>>>>>>>> connections should be established from the host nodes. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> ovn-sbctl get-ssl /get-connection >>>>>>>>>>>>>> The output is stuck till i terminate it. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> Yes, the ovn-* commands works only on engine's machine, >>>>>>>>>>>>> which has the role ovn-central. >>>>>>>>>>>>> On the hosts, there is only the ovn-controller, which >>>>>>>>>>>>> connects the ovn southbound to openvswitch on the host. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> For the requested logs the below are found in the >>>>>>>>>>>>>> ovsdb-server-sb.log >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>>>> excessive rate >>>>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>>>> excessive rate >>>>>>>>>>>>>> 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: >>>>>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>>>>> 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 >>>>>>>>>>>>>> log messages in last 8 seconds (most recently, 1 seconds ago) due to >>>>>>>>>>>>>> excessive rate >>>>>>>>>>>>>> 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>>>>>>>> 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 >>>>>>>>>>>>>> log messages in last 8 seconds (most recently, 1 seconds ago) due to >>>>>>>>>>>>>> excessive rate >>>>>>>>>>>>>> 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: >>>>>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>>>>> 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 
2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>>>> excessive rate >>>>>>>>>>>>>> 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>>>>>>>>> error parsing stream: line 0, column 0, byte 0: invalid character U+0016 >>>>>>>>>>>>>> 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 >>>>>>>>>>>>>> log messages in last 12 seconds (most recently, 4 seconds ago) due to >>>>>>>>>>>>>> excessive rate >>>>>>>>>>>>>> 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: >>>>>>>>>>>>>> received SSL data on JSON-RPC channel >>>>>>>>>>>>>> 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: >>>>>>>>>>>>>> connection dropped (Protocol error) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> How can we fix these SSL errors? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I addressed this above. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> I thought vdsm did the certificate provisioning on the >>>>>>>>>>>>>> host nodes as to communicate to the engine host node. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> Yes, this seems to work in your scenario, just the SSL >>>>>>>>>>>>> configuration on the ovn-central was lost. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler < >>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Looks still like the ovn-controller on the host >>>>>>>>>>>>>>> has problems communicating with ovn-southbound. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Are there any hints in /var/log/openvswitch/*.log, >>>>>>>>>>>>>>> especially in /var/log/openvswitch/ovsdb-server-sb.log ? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Can you please check the output of >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ovn-nbctl get-ssl >>>>>>>>>>>>>>> ovn-nbctl get-connection >>>>>>>>>>>>>>> ovn-sbctl get-ssl >>>>>>>>>>>>>>> ovn-sbctl get-connection >>>>>>>>>>>>>>> ls -l /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> it should be similar to >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-ssl >>>>>>>>>>>>>>> Private key: >>>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer >>>>>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-nbctl get-connection >>>>>>>>>>>>>>> pssl:6641:[::] >>>>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-ssl >>>>>>>>>>>>>>> Private key: >>>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>>>>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer >>>>>>>>>>>>>>> CA Certificate: /etc/pki/ovirt-engine/ca.pem >>>>>>>>>>>>>>> Bootstrap: false >>>>>>>>>>>>>>> [root@ovirt-43 ~]# ovn-sbctl get-connection >>>>>>>>>>>>>>> read-write role="" pssl:6642:[::] >>>>>>>>>>>>>>> [root@ovirt-43 ~]# ls -l >>>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-* >>>>>>>>>>>>>>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass >>>>>>>>>>>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12 >>>>>>>>>>>>>>> -rw-r-----. 
1 root hugetlbfs 1828 Oct 14 2019 >>>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass >>>>>>>>>>>>>>> -rw-------. 1 root root 2709 Oct 14 2019 >>>>>>>>>>>>>>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis < >>>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I did a restart of the ovn-controller, this is the >>>>>>>>>>>>>>>> output of the ovn-controller.log >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log >>>>>>>>>>>>>>>> file /var/log/openvswitch/ovn-controller.log >>>>>>>>>>>>>>>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>>>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: >>>>>>>>>>>>>>>> connected >>>>>>>>>>>>>>>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL >>>>>>>>>>>>>>>> reconnected, force recompute. >>>>>>>>>>>>>>>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>>>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL >>>>>>>>>>>>>>>> reconnected, force recompute. >>>>>>>>>>>>>>>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>>>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>>>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>>> waiting 2 seconds before reconnect >>>>>>>>>>>>>>>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>>>> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>>> waiting 4 seconds before reconnect >>>>>>>>>>>>>>>> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>>> connecting... >>>>>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>>> connection attempt failed (Protocol error) >>>>>>>>>>>>>>>> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: >>>>>>>>>>>>>>>> continuing to reconnect in the background but suppressing further logging >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have also done the vdsm-tool ovn-config >>>>>>>>>>>>>>>> OVIRT_ENGINE_IP OVIRTMGMT_NETWORK_DC >>>>>>>>>>>>>>>> This is how the OVIRT_ENGINE_IP is provided in the >>>>>>>>>>>>>>>> ovn controller, i can redo it if you wan. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> After the restart of the ovn-controller the OVIRT >>>>>>>>>>>>>>>> ENGINE still shows only two geneve connections one with DC01-host02 and >>>>>>>>>>>>>>>> DC02-host01. >>>>>>>>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>>>>>>>> hostname: "dc02-host01" >>>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>>> ip: "DC02-host01_IP" >>>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>>>>>>>> hostname: "DC01-host02" >>>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>>> ip: "DC01-host02" >>>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I've re-done the vdsm-tool command and nothing >>>>>>>>>>>>>>>> changed.... again....with the same errors as the systemctl restart >>>>>>>>>>>>>>>> ovn-controller >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler < >>>>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Please include ovirt-users list in your reply, to >>>>>>>>>>>>>>>>> share the knowledge and experience with the community! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis >>>>>>>>>>>>>>>>> <k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Ok below the output per node and DC >>>>>>>>>>>>>>>>>> DC01 >>>>>>>>>>>>>>>>>> node01 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>>> . external-ids:ovn-remote >>>>>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get >>>>>>>>>>>>>>>>>> open . external-ids:ovn-encap-type >>>>>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get >>>>>>>>>>>>>>>>>> open . external-ids:ovn-encap-ip >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE01*" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> node02 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>>> . external-ids:ovn-remote >>>>>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get >>>>>>>>>>>>>>>>>> open . external-ids:ovn-encap-type >>>>>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get >>>>>>>>>>>>>>>>>> open . external-ids:ovn-encap-ip >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC01-NODE02*" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> DC02 >>>>>>>>>>>>>>>>>> node01 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open >>>>>>>>>>>>>>>>>> . external-ids:ovn-remote >>>>>>>>>>>>>>>>>> "ssl:*OVIRT_ENGINE_IP*:6642" >>>>>>>>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get >>>>>>>>>>>>>>>>>> open . external-ids:ovn-encap-type >>>>>>>>>>>>>>>>>> geneve >>>>>>>>>>>>>>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get >>>>>>>>>>>>>>>>>> open . external-ids:ovn-encap-ip >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "*OVIRTMGMT_IP_DC02-NODE01*" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Looks good. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> DC01 node01 and node02 share the same VM networks >>>>>>>>>>>>>>>>>> and VMs deployed on top of them cannot talk to VM on the other hypervisor. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Maybe there is a hint on ovn-controller.log on >>>>>>>>>>>>>>>>> dc01-node02 ? Maybe restarting ovn-controller creates more helpful log >>>>>>>>>>>>>>>>> messages? 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> You can also try restart the ovn configuration on >>>>>>>>>>>>>>>>> all hosts by executing >>>>>>>>>>>>>>>>> vdsm-tool ovn-config OVIRT_ENGINE_IP >>>>>>>>>>>>>>>>> LOCAL_OVIRTMGMT_IP >>>>>>>>>>>>>>>>> on each host, this would trigger >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup... >>>>>>>>>>>>>>>>> internally. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So I would expect to see the same output for node01 >>>>>>>>>>>>>>>>>> to have a geneve tunnel to node02 and vice versa. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Me too. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler < >>>>>>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos >>>>>>>>>>>>>>>>>>> Betsis <k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Dominik >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> OVN is selected as the default network provider >>>>>>>>>>>>>>>>>>>> on the clusters and the hosts. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> sounds good. >>>>>>>>>>>>>>>>>>> This configuration is required already during the >>>>>>>>>>>>>>>>>>> host is added to oVirt Engine, because OVN is configured during this step. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The "ovn-sbctl show" works on the ovirt engine >>>>>>>>>>>>>>>>>>>> and shows only two hosts, 1 per DC. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144" >>>>>>>>>>>>>>>>>>>> hostname: "dc01-node02" >>>>>>>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>>>>>>> ip: "X.X.X.X" >>>>>>>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c" >>>>>>>>>>>>>>>>>>>> hostname: "dc02-node1" >>>>>>>>>>>>>>>>>>>> Encap geneve >>>>>>>>>>>>>>>>>>>> ip: "A.A.A.A" >>>>>>>>>>>>>>>>>>>> options: {csum="true"} >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The new node is not listed (dc01-node1). >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> When executed on the nodes the same command >>>>>>>>>>>>>>>>>>>> (ovn-sbctl show) times-out on all nodes..... >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The output of the >>>>>>>>>>>>>>>>>>>> /var/log/openvswitch/ovn-conntroller.log lists on all logs >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: >>>>>>>>>>>>>>>>>>>> unexpected SSL connection close >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Can you please compare the output of >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>>>>> external-ids:ovn-remote >>>>>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>>>>> external-ids:ovn-encap-type >>>>>>>>>>>>>>>>>>> ovs-vsctl --no-wait get open . >>>>>>>>>>>>>>>>>>> external-ids:ovn-encap-ip >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> of the working hosts, e.g. dc01-node02, and the >>>>>>>>>>>>>>>>>>> failing host dc01-node1? >>>>>>>>>>>>>>>>>>> This should point us the relevant difference in >>>>>>>>>>>>>>>>>>> the configuration. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Please include ovirt-users list in your replay, to >>>>>>>>>>>>>>>>>>> share the knowledge and experience with the community. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>>>>>>> Konstantinos Betsis >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler < >>>>>>>>>>>>>>>>>>>> dholler@redhat.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B < >>>>>>>>>>>>>>>>>>>>> k.betsis@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi all >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> We have a small installation based on OVIRT 4.3. >>>>>>>>>>>>>>>>>>>>>> 1 Cluster is based on Centos 7 and the other on >>>>>>>>>>>>>>>>>>>>>> OVIRT NG Node image. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The environment was stable till an upgrade took >>>>>>>>>>>>>>>>>>>>>> place a couple of months ago. >>>>>>>>>>>>>>>>>>>>>> As such we had to re-install one of the Centos >>>>>>>>>>>>>>>>>>>>>> 7 node and start from scratch. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> To trigger the automatic configuration of the >>>>>>>>>>>>>>>>>>>>> host, it is required to configure ovirt-provider-ovn as the default network >>>>>>>>>>>>>>>>>>>>> provider for the cluster before adding the host to oVirt. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Even though the installation completed >>>>>>>>>>>>>>>>>>>>>> successfully and VMs are created, the following are not working as expected: >>>>>>>>>>>>>>>>>>>>>> 1. ovn geneve tunnels are not established with >>>>>>>>>>>>>>>>>>>>>> the other Centos 7 node in the cluster. >>>>>>>>>>>>>>>>>>>>>> 2. Centos 7 node is configured by ovirt engine >>>>>>>>>>>>>>>>>>>>>> however no geneve tunnel is established when "ovn-sbctl show" is issued on >>>>>>>>>>>>>>>>>>>>>> the engine. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Does "ovn-sbctl show" list the hosts? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 3. no flows are shown on the engine on port >>>>>>>>>>>>>>>>>>>>>> 6642 for the ovs db. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Does anyone have any experience on how to >>>>>>>>>>>>>>>>>>>>>> troubleshoot OVN on ovirt? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> /var/log/openvswitch/ovncontroller.log on the >>>>>>>>>>>>>>>>>>>>> host should contain a helpful hint. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>> Users mailing list -- users@ovirt.org >>>>>>>>>>>>>>>>>>>>>> To unsubscribe send an email to >>>>>>>>>>>>>>>>>>>>>> users-leave@ovirt.org >>>>>>>>>>>>>>>>>>>>>> Privacy Statement: >>>>>>>>>>>>>>>>>>>>>> https://www.ovirt.org/privacy-policy.html >>>>>>>>>>>>>>>>>>>>>> oVirt Code of Conduct: >>>>>>>>>>>>>>>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>>>>>>>>>>>>>>>>>> List Archives: >>>>>>>>>>>>>>>>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF3EK... >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
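
For quick reference, the repair Dominik describes in the quoted thread boils down to re-enabling SSL on the OVN central databases (on the engine machine) and re-running the OVN configuration on each host. This is only a sketch of the commands already quoted above, assuming the standard oVirt PKI paths; OVIRT_ENGINE_IP and LOCAL_OVIRTMGMT_IP are placeholders:

# On the oVirt Engine machine (ovn-central):
ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass \
    /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass \
    /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem
ovn-sbctl set-connection pssl:6642

# On each host, point ovn-controller back at the engine and restart it:
vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP
systemctl restart ovn-controller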

On 9/30/20 3:41 PM, Konstantinos Betsis wrote:
From the configuration I can see only three nodes.....

"Encap":{
#dc01-node02
"da8fb1dc-f832-4d62-a01d-2e5aef018c8d":{"ip":"10.137.156.56","chassis_name":"be3abcc9-7358-4040-a37b-8d8a782f239c","options":["map",[["csum","true"]]],"type":"geneve"},
#dc01-node01
"4808bd8f-7e46-4f29-9a96-046bb580f0c5":{"ip":"10.137.156.55","chassis_name":"95ccb04a-3a08-4a62-8bc0-b8a7a42956f8","options":["map",[["csum","true"]]],"type":"geneve"},
#dc02-node01
"f20b33ae-5a6b-456c-b9cb-2e4d8b54d8be":{"ip":"192.168.121.164","chassis_name":"c4b23834-aec7-4bf8-8be7-aa94a50a6144","options":["map",[["csum","true"]]],"type":"geneve"}}
So I don't understand why the dc01-node02 tries to establish a tunnel with itself.....
Is there a way for OVN to refresh from the oVirt network database, so as not to affect the VM networks?
On Wed, Sep 30, 2020 at 2:33 PM Konstantinos Betsis <k.betsis@gmail.com <mailto:k.betsis@gmail.com>> wrote:
Sure
I've attached it for easier reference.
On Wed, Sep 30, 2020 at 2:21 PM Dominik Holler <dholler@redhat.com <mailto:dholler@redhat.com>> wrote:
On Wed, Sep 30, 2020 at 1:16 PM Konstantinos Betsis <k.betsis@gmail.com <mailto:k.betsis@gmail.com>> wrote:
Hi Dominik
The DC01-node02 was formatted, reinstalled, and then attached to the oVirt environment. Unfortunately we exhibit the same issue: the new DC01-node02 tries to establish geneve tunnels to its own IP.
[root@dc01-node02 ~]# ovs-vsctl show
eff2663e-cb10-41b0-93ba-605bb5c7bd78
    Bridge br-int
        fail_mode: secure
        Port "ovn-95ccb0-0"
            Interface "ovn-95ccb0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-node01_IP"}
        Port "ovn-be3abc-0"
            Interface "ovn-be3abc-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc01-node02_IP"}
        Port "ovn-c4b238-0"
            Interface "ovn-c4b238-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="dc02-node01_IP"}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.11.0"
Is there a way to fix this on the oVirt engine, since this is where the information resides? Something is broken there.
I suspect that there is an inconsistency in the OVN SB DB. Is there a way to share your /var/lib/openvswitch/ovnsb_db.db with us?
Hi Konstantinos,

One of the things I noticed in the SB DB you attached is that two of the chassis records have the same hostname:

$ ovn-sbctl list chassis | grep ams03-hypersec02
hostname : ams03-hypersec02
hostname : ams03-hypersec02

This shouldn't be a major issue but shows a potential misconfiguration on the nodes. Could you please double check the hostname configuration of the nodes?

Would it also be possible to attach the openvswitch conf.db from the three nodes? It should be in /var/lib/openvswitch/conf.db

Thanks,
Dumitru
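
A note on why this matters for the tunnel-to-itself symptom: ovn-controller derives each tunnel port name from the first six characters of the remote chassis name (this naming convention is an assumption about ovn-controller behaviour, not something stated in the thread). On dc01-node02 the port "ovn-be3abc-0" would therefore point at chassis "be3abcc9-...", whose encap IP is dc01-node02's own address, so a second, stale chassis record for the same host would explain the self-tunnel. A minimal sketch to confirm:

# On the engine machine (ovn-central): look for two chassis records with the same hostname
ovn-sbctl --columns=name,hostname,encaps list chassis

# On each node: check what hostname the node itself reports
hostname -f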

On 10/1/20 8:59 AM, Dumitru Ceara wrote:
[Quoted text from earlier in the thread trimmed; see above.]
Also, it might help pinpoint the issue if we have the ovn-controller logs from the OVN nodes. They should be in /var/log/openvswitch/ovn-controller.log

Thanks again,
Dumitru

Hi Dumitru

I've seen that as well. I've deleted the dc01-node2 (ams03-hypersec02) from oVirt. I've also issued ovs-vsctl emer-reset.

But ovn-sbctl list chassis still shows the node twice, and ovs-vsctl show still shows 3 geneve tunnels from dc01-node2....

How can we fix this?

On Thu, Oct 1, 2020 at 9:59 AM Dumitru Ceara <dceara@redhat.com> wrote:
[Quoted text from earlier in the thread trimmed; see above.]
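
If the duplicate chassis record is indeed stale, two ways to remove it were already mentioned earlier in the thread; this is only a sketch, with STALE_CHASSIS (the name/UUID of the leftover record), HOSTNAME, OVIRT_ENGINE_IP and LOCAL_OVIRTMGMT_IP as placeholders:

# On the engine machine: delete the stale chassis record from the southbound DB
ovn-sbctl chassis-del STALE_CHASSIS

# Or use the helper script shipped with ovirt-provider-ovn
/usr/share/ovirt-provider-ovn/scripts/remove_chassis.sh HOSTNAME

# Then re-register the host so a single fresh chassis record is created
vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP
systemctl restart ovn-controller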

Regarding the ovn-controller logs....

2020-10-01T15:51:03.156Z|14143|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.220Z|14144|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.284Z|14145|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.347Z|14146|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.411Z|14147|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.474Z|14148|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.538Z|14149|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.601Z|14150|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.664Z|14151|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.727Z|14152|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.792Z|14153|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.855Z|14154|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.919Z|14155|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.982Z|14156|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.046Z|14157|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.109Z|14158|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.173Z|14159|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.236Z|14160|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.299Z|14161|main|INFO|OVNSB commit failed, force recompute next time.

I don't think we can see anything more from these.

On Thu, Oct 1, 2020 at 6:12 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dumitru
I've seen that as well... I've deleted dc01-node2 (ams03-hypersec02) from oVirt. I've also issued ovs-vsctl emer-reset.
But ovn-sbctl list chassis still depicts the node twice, and ovs-vsctl show still depicts 3 geneve tunnels from dc01-node2...
How can we fix this?
On Thu, Oct 1, 2020 at 9:59 AM Dumitru Ceara <dceara@redhat.com> wrote:
On 9/30/20 3:41 PM, Konstantinos Betsis wrote:
From the configuration I can see only three nodes:
"Encap":{
#dc01-node02
"da8fb1dc-f832-4d62-a01d-2e5aef018c8d":{"ip":"10.137.156.56","chassis_name":"be3abcc9-7358-4040-a37b-8d8a782f239c","options":["map",[["csum","true"]]],"type":"geneve"},
#dc01-node01
"4808bd8f-7e46-4f29-9a96-046bb580f0c5":{"ip":"10.137.156.55","chassis_name":"95ccb04a-3a08-4a62-8bc0-b8a7a42956f8","options":["map",[["csum","true"]]],"type":"geneve"},
#dc02-node01
"f20b33ae-5a6b-456c-b9cb-2e4d8b54d8be":{"ip":"192.168.121.164","chassis_name":"c4b23834-aec7-4bf8-8be7-aa94a50a6144","options":["map",[["csum","true"]]],"type":"geneve"}}
So I don't understand why the dc01-node02 tries to establish a tunnel with itself.....
Is there a way for OVN to refresh from the oVirt network database so as not to affect the VM networks?
On Wed, Sep 30, 2020 at 2:33 PM Konstantinos Betsis <k.betsis@gmail.com <mailto:k.betsis@gmail.com>> wrote:
Sure
I've attached it for easier reference.

Hi guys
Sorry to disturb you but I am pretty much stuck at this point with the OVN southbound interface.
Is there a way I can flush it and have it reconfigured from oVirt?
Thank you
Best Regards
Konstantinos Betsis

On Tue, Oct 6, 2020 at 10:31 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi guys
Sorry to disturb you but i am pretty much stuck at this point with the ovn southbound interface.
Is there a way i can flush it and have it reconfigured from ovirt?
Can you please delete the chassis via
ovn-sbctl chassis-del 32cd0eb4-d763-4036-bbc9-a4d3a4013ee6
where 32cd0eb4-d763-4036-bbc9-a4d3a4013ee6 should be replaced with the id of the suspicious chassis shown by ovn-sbctl show.
The ovn-controller will add the chassis again in a few seconds, but I hope that this would remove the inconsistency in the db.
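For future readers, the recovery sequence described in this thread can be sketched roughly as follows; the UUID is just the placeholder from the message above, and the service name assumes the standard openvswitch/OVN packages on the hosts:

# on the affected node: stop the controller so it does not keep re-registering the stale record
systemctl stop ovn-controller
# on the OVN central (the engine host): find the duplicate/suspicious chassis and delete it
ovn-sbctl show
ovn-sbctl chassis-del 32cd0eb4-d763-4036-bbc9-a4d3a4013ee6
# back on the node: start the controller; it re-registers a fresh chassis within a few seconds
systemctl start ovn-controller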

On Tue, Oct 6, 2020 at 12:25 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominik
That fixed it.
Thanks for letting us know and your patience.
VMs have full connectivity and I don't see any errors from the ovn-controller on the nodes.
Thanks for the help and quick responses, I really appreciate it.
In summary for future reference:
Thanks for this nice summary, I am sure this will help others in the community.
If certificate errors are encountered, we need to review:
ovs-vsctl --no-wait get open . external-ids:ovn-remote
ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
The ovn-remote will state if the OVN connection is using TCP or TLS.
We then do:
ovn-nbctl get-ssl
ovn-nbctl get-connection
ovn-sbctl get-ssl
ovn-sbctl get-connection
ls -l /etc/pki/ovirt-engine/keys/ovn-*
This checks the OVN northbound and southbound configuration, the listening ports, and whether TCP or TLS is used.
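A sanity check that does not depend on the OVN tooling at all is to confirm the engine really presents a certificate on the southbound port; a rough sketch, with OVIRT_ENGINE_IP used as the same placeholder as elsewhere in this thread:

openssl s_client -connect OVIRT_ENGINE_IP:6642 </dev/null 2>/dev/null | openssl x509 -noout -subject -dates
# repeat with port 6641 for the northbound DB; an empty result usually means the port
# is either closed or still configured for plain TCP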
If TLS is used, we must configure the OVN central databases (on the engine host) with:
ovn-nbctl set-ssl "ovn northbound interface certificate key" "ovn northbound interface certificate file"
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl "ovn southbound interface certificate key" "ovn southbound interface certificate file"
ovn-sbctl set-connection pssl:6642
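One detail worth adding: ovn-nbctl/ovn-sbctl set-ssl also accepts the CA certificate as a third argument (set-ssl PRIVATE-KEY CERTIFICATE CA-CERT), which is usually wanted when the nodes connect with client certificates. A sketch using the oVirt engine PKI layout; the exact file names below are assumptions, so check what the ls command above actually shows on your engine:

ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem
ovn-sbctl set-connection pssl:6642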
The certificates must already be present on the nodes; they are provisioned there through the VDSM client.
Finally, we check that all tunnels are established and working ok.
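A minimal verification pass, assuming the three-node layout from this thread: on the engine, ovn-sbctl show should list exactly one chassis per host with a single geneve encap each, and on every node, ovs-vsctl show should list geneve ports only towards the other nodes' IPs, never towards the node's own encap IP.

# on the engine
ovn-sbctl show
# on each node
ovs-vsctl show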
If we end up with a stuck chassis, we simply stop the ovn-controller service on the node and delete the chassis from the southbound database with:
ovn-sbctl chassis-del "chassis_ID"
Thank you Best Regards Konstantinos Betsis
participants (3)
- Dominik Holler
- Dumitru Ceara
- Konstantinos Betsis