
Lance,

Well, I installed the new kernel module and it cleared up a lot of the errors I was seeing in the log, but I notice that I still can't ping instances between hosts. I'm starting to wonder whether I'm missing something fundamental here. I don't see anything in ovs-vswitchd.log showing a tunnel. On reload of the module, the kernel log does show:

[1056295.308707] openvswitch: module verification failed: signature and/or required key missing - tainting kernel
[1056295.311034] openvswitch: Open vSwitch switching datapath 2.6.90
[1056295.311145] openvswitch: LISP tunneling driver
[1056295.311147] openvswitch: GRE over IPv4 tunneling driver
[1056295.311153] openvswitch: Geneve tunneling driver
[1056295.311164] openvswitch: VxLAN tunneling driver
[1056295.311166] openvswitch: STT tunneling driver

[node2]
[root@ovirt-node2 openvswitch]# cat ovs-vswitchd.log
2016-12-06T04:22:23.192Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2016-12-06T04:22:23.194Z|00002|ovs_numa|INFO|Discovered 16 CPU cores on NUMA node 0
2016-12-06T04:22:23.194Z|00003|ovs_numa|INFO|Discovered 16 CPU cores on NUMA node 1
2016-12-06T04:22:23.194Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 32 CPU cores
2016-12-06T04:22:23.194Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-06T04:22:23.195Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2016-12-06T04:22:23.197Z|00007|ofproto_dpif|INFO|system@ovs-system: Datapath supports recirculation
2016-12-06T04:22:23.197Z|00008|ofproto_dpif|INFO|system@ovs-system: MPLS label stack length probed as 1
2016-12-06T04:22:23.197Z|00009|ofproto_dpif|INFO|system@ovs-system: Datapath supports truncate action
2016-12-06T04:22:23.197Z|00010|ofproto_dpif|INFO|system@ovs-system: Datapath supports unique flow ids
2016-12-06T04:22:23.197Z|00011|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state
2016-12-06T04:22:23.197Z|00012|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zone
2016-12-06T04:22:23.197Z|00013|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_mark
2016-12-06T04:22:23.197Z|00014|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2016-12-06T04:22:23.197Z|00015|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2016-12-06T04:22:23.339Z|00001|ofproto_dpif_upcall(handler1)|INFO|received packet on unassociated datapath port 0
2016-12-06T04:22:23.339Z|00016|bridge|INFO|bridge br-int: added interface vnet0 on port 5
2016-12-06T04:22:23.339Z|00017|bridge|INFO|bridge br-int: added interface br-int on port 65534
2016-12-06T04:22:23.339Z|00018|bridge|INFO|bridge br-int: using datapath ID 000016d6e0b66442
2016-12-06T04:22:23.339Z|00019|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2016-12-06T04:22:23.340Z|00020|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.6.90
2016-12-06T04:22:32.437Z|00021|bridge|INFO|bridge br-int: added interface ovn-c0dc09-0 on port 6
2016-12-06T04:22:32.437Z|00022|bridge|INFO|bridge br-int: added interface ovn-252778-0 on port 7
2016-12-06T04:22:33.342Z|00023|memory|INFO|281400 kB peak resident set size after 10.2 seconds
2016-12-06T04:22:33.342Z|00024|memory|INFO|handlers:23 ofconns:2 ports:4 revalidators:9 rules:79
2016-12-06T04:22:42.440Z|00025|connmgr|INFO|br-int<->unix: 76 flow_mods 10 s ago (75 adds, 1 deletes)

[root@ovirt-node2 openvswitch]# cat ovn-controller.log
2016-12-06T04:22:32.398Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log
2016-12-06T04:22:32.400Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-06T04:22:32.400Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2016-12-06T04:22:32.402Z|00004|reconnect|INFO|tcp:172.20.192.77:6642: connecting...
2016-12-06T04:22:32.403Z|00005|reconnect|INFO|tcp:172.20.192.77:6642: connected
2016-12-06T04:22:32.406Z|00006|binding|INFO|Claiming lport 56432d2b-a96d-4ac7-b0e9-3450a006e1d4 for this chassis.
2016-12-06T04:22:32.406Z|00007|binding|INFO|Claiming 00:1a:4a:16:01:64 dynamic
2016-12-06T04:22:32.407Z|00008|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2016-12-06T04:22:32.407Z|00009|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2016-12-06T04:22:32.407Z|00010|pinctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2016-12-06T04:22:32.407Z|00011|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2016-12-06T04:22:32.408Z|00012|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2016-12-06T04:22:32.408Z|00013|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2016-12-06T04:22:32.440Z|00014|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)
2016-12-06T04:22:32.441Z|00015|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)
2016-12-06T04:22:32.441Z|00016|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)
2016-12-06T04:22:37.408Z|00017|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)
2016-12-06T04:22:42.408Z|00018|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)
2016-12-06T04:22:47.409Z|00019|ofctrl|INFO|Dropped 1 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
2016-12-06T04:22:47.409Z|00020|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)
2016-12-06T04:22:57.411Z|00021|ofctrl|INFO|Dropped 3 log messages in last 10 seconds (most recently, 5 seconds ago) due to excessive rate
2016-12-06T04:22:57.411Z|00022|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)
2016-12-06T04:23:12.413Z|00023|ofctrl|INFO|Dropped 4 log messages in last 10 seconds (most recently, 5 seconds ago) due to excessive rate
2016-12-06T04:23:12.413Z|00024|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)
2016-12-06T04:23:22.415Z|00025|ofctrl|INFO|Dropped 3 log messages in last 10 seconds (most recently, 5 seconds ago) due to excessive rate
2016-12-06T04:23:22.415Z|00026|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)
2016-12-06T04:23:37.417Z|00027|ofctrl|INFO|Dropped 5 log messages in last 10 seconds (most recently, 5 seconds ago) due to excessive rate
2016-12-06T04:23:37.417Z|00028|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)
2016-12-06T04:23:47.419Z|00029|ofctrl|INFO|Dropped 3 log messages in last 10 seconds (most recently, 5 seconds ago) due to excessive rate
2016-12-06T04:23:47.419Z|00030|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)
2016-12-06T04:23:57.421Z|00031|ofctrl|INFO|Dropped 3 log messages in last 10 seconds (most recently, 5 seconds ago) due to excessive rate
2016-12-06T04:23:57.421Z|00032|ofctrl|INFO|dropping duplicate flow: table_id=32, priority=150, reg10=0x2/0x2, actions=resubmit(,33)

[root@ovirt-node2 openvswitch]# brctl show
bridge name	bridge id		STP enabled	interfaces
;vdsmdummy;	8000.000000000000	no
DEV-NOC		8000.0cc47a1ef306	no		bond0
DEV-VM-NET	8000.0cc47a1ef306	no		bond0.700
ovirtmgmt	8000.0cc47a08b3c2	no		enp7s0f0

--
Devin Acosta
Red Hat Certified Architect, LinuxStack
devin@linuxguru.co

On Mon, Dec 5, 2016 at 2:34 PM, Lance Richardson <lrichard@redhat.com> wrote:
From: "Devin Acosta" <devin@pabstatencio.com>
To: "Lance Richardson" <lrichard@redhat.com>
Cc: "Marcin Mirecki" <mmirecki@redhat.com>, "users" <Users@ovirt.org>
Sent: Monday, December 5, 2016 4:17:35 PM
Subject: Re: [ovirt-users] oVIRT 4 / OVN / Communication issues of instances between nodes.
Lance,
I found some interesting logs. We have three oVirt nodes.
We are running: CentOS Linux release 7.2.1511 (Core)
Linux hostname 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
<snip>
2016-12-05T20:47:56.774Z|00021|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.3) (xid=0x17): OFPBAC_BAD_TYPE
This (generally unintelligible) message usually indicates that the kernel openvswitch module doesn't support conntrack.
<snip>
2016-12-05T20:35:04.345Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2016-12-05T20:35:04.347Z|00002|ovs_numa|INFO|Discovered 16 CPU cores on NUMA node 0
2016-12-05T20:35:04.347Z|00003|ovs_numa|INFO|Discovered 16 CPU cores on NUMA node 1
2016-12-05T20:35:04.347Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 32 CPU cores
2016-12-05T20:35:04.348Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-05T20:35:04.348Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2016-12-05T20:35:04.350Z|00007|ofproto_dpif|INFO|system@ovs-system: Datapath supports recirculation
2016-12-05T20:35:04.350Z|00008|ofproto_dpif|INFO|system@ovs-system: MPLS label stack length probed as 1
2016-12-05T20:35:04.350Z|00009|ofproto_dpif|INFO|system@ovs-system: Datapath does not support truncate action
2016-12-05T20:35:04.350Z|00010|ofproto_dpif|INFO|system@ovs-system: Datapath supports unique flow ids
2016-12-05T20:35:04.350Z|00011|ofproto_dpif|INFO|system@ovs-system: Datapath does not support ct_state
2016-12-05T20:35:04.350Z|00012|ofproto_dpif|INFO|system@ovs-system: Datapath does not support ct_zone
2016-12-05T20:35:04.350Z|00013|ofproto_dpif|INFO|system@ovs-system: Datapath does not support ct_mark
2016-12-05T20:35:04.350Z|00014|ofproto_dpif|INFO|system@ovs-system: Datapath does not support ct_label
2016-12-05T20:35:04.350Z|00015|ofproto_dpif|INFO|system@ovs-system: Datapath does not support ct_state_nat
OK, "Datapath does not support ct_*" confirms that the kernel openvswitch module doesn't support the conntrack features needed by OVN.
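A quick way to make that check mechanical on any host is to grep the probe lines ovs-vswitchd prints at startup. A minimal sketch (the helper name is made up; the log path shown in the usage line is the CentOS default):

```shell
#!/bin/sh
# Sketch: decide whether the loaded openvswitch datapath supports the
# conntrack features OVN needs, based on the "Datapath does not support
# ct_*" probe lines that ovs-vswitchd logs at startup.
check_ct_support() {
    # $1 = path to ovs-vswitchd.log
    if grep -q 'Datapath does not support ct_' "$1"; then
        echo "conntrack NOT supported"
    else
        echo "conntrack supported"
    fi
}
```

Usage would be something like `check_ct_support /var/log/openvswitch/ovs-vswitchd.log`.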
Most likely the loaded module is the stock CentOS one; you can build the out-of-tree kernel module RPM from the same source tree where you built the other OVS/OVN RPMs via:
make rpm-fedora-kmod
This should leave an RPM named something like:
openvswitch-kmod-2.6.90-1.el7.centos.x86_64.rpm
Install that and reboot, and things should work better.
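One way to confirm after the reboot that the out-of-tree module is what actually got loaded is to pull the datapath version out of the kernel log. A rough sketch (helper name is illustrative; 2.6.90 is just this build's version string):

```shell
#!/bin/sh
# Sketch: extract the "Open vSwitch switching datapath X.Y.Z" version
# from kernel log text on stdin. The stock CentOS 7 module reports an
# older version than an out-of-tree 2.6.90 build would.
loaded_ovs_datapath_version() {
    sed -n 's/.*Open vSwitch switching datapath \([0-9.]*\).*/\1/p'
}
```

Usage would be along the lines of `dmesg | loaded_ovs_datapath_version`.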
Regards,
Lance
Your help is greatly appreciated!
Devin
On Mon, Dec 5, 2016 at 12:31 PM, Lance Richardson <lrichard@redhat.com> wrote:
From: "Devin Acosta" <devin@pabstatencio.com>
To: "Marcin Mirecki" <mmirecki@redhat.com>
Cc: "users" <Users@ovirt.org>
Sent: Monday, December 5, 2016 12:11:46 PM
Subject: Re: [ovirt-users] oVIRT 4 / OVN / Communication issues of instances between nodes.
Marcin,
Also I noticed in your original post it mentions:
ip link - the result should include a link called genev_sys_ ...
I noticed that on my hosts I don't see any links with the name:
genev_sys_ ??
Could this be a problem?
lo:
enp4s0f0:
enp4s0f1:
enp7s0f0:
enp7s0f1:
bond0:
DEV-NOC:
ovirtmgmt:
bond0.700@bond0:
DEV-VM-NET:
bond0.705@bond0:
;vdsmdummy;:
vnet0:
vnet1:
vnet2:
vnet3:
vnet4:
ovs-system:
br-int:
vnet5:
vnet6:
Hi Devin,
What distribution and kernel version are you using?
One thing you could check is whether the vport_geneve kernel module is being loaded, e.g. you should see something like:
$ lsmod | grep vport
vport_geneve           12560  1
openvswitch           246755  5 vport_geneve
If vport_geneve is not loaded, you could "sudo modprobe vport_geneve" to make sure it's available and can be loaded.
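For scripting that check, something like this would work (the helper name is made up; it reads `lsmod` output on stdin):

```shell
#!/bin/sh
# Sketch: report whether vport_geneve appears in `lsmod` output fed on
# stdin -- prints "yes" if the module is loaded, "no" otherwise.
vport_geneve_loaded() {
    if grep -q '^vport_geneve' ; then
        echo yes
    else
        echo no
    fi
}
```

Usage would be `lsmod | vport_geneve_loaded`.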
The first 100 lines or so of ovs-vswitchd.log might have some useful information about where things are going wrong.
It does sound as though there is some issue with geneve tunnels, which would certainly explain issues with inter-node traffic.
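To confirm whether geneve tunnels ever came up, checking `ip link` for the genev_sys_ device can be scripted the same way. A sketch (helper name is just illustrative):

```shell
#!/bin/sh
# Sketch: scan `ip link` output on stdin for a genev_sys_ device, which
# OVS creates when the first geneve tunnel port is instantiated.
geneve_link_state() {
    if grep -q 'genev_sys_' ; then
        echo present
    else
        echo absent
    fi
}
```

Usage would be `ip link | geneve_link_state`; "absent" on a host that should have geneve tunnels points at the tunnel ports never being instantiated.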
Regards,
Lance
--
Devin Acosta Red Hat Certified Architect, LinuxStack 602-354-1220 || devin@linuxguru.co