Hi Dominik
When these commands are run on the ovirt-engine host, the output matches the
one shown in your email.
For your reference, see also below:
[root@ath01-ovirt01 certs]# ovn-nbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-nbctl get-connection
ptcp:6641
[root@ath01-ovirt01 certs]# ovn-sbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-sbctl get-connection
read-write role="" ptcp:6642
[root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-ndb.p12
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-sdb.p12
When I try the above commands on the node hosts, the following happens:
ovn-nbctl get-ssl / get-connection
ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database connection
failed (No such file or directory)
I believe the above is expected, since no northbound connections should be
established from the host nodes.
ovn-sbctl get-ssl / get-connection
The command hangs until I terminate it.
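(As an aside: on a host, ovn-sbctl talks to the local unix socket by default,
so to query the engine's southbound DB from a node it has to be pointed at the
remote explicitly. A sketch only; the certificate paths below are placeholders,
not the paths your installation actually uses:)

```shell
# Query the engine's southbound DB from a host node. The certificate
# paths are placeholders (assumptions); substitute whatever client
# key/cert/CA the host actually has provisioned.
ovn-sbctl --db=ssl:OVIRT_ENGINE_IP:6642 \
    -p /path/to/client-key.pem \
    -c /path/to/client-cert.pem \
    -C /path/to/ca-cert.pem \
    show
```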
Regarding the requested logs, the following entries are found in
ovsdb-server-sb.log:
2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146:
connection dropped (Protocol error)
2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188:
connection dropped (Protocol error)
2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044:
connection dropped (Protocol error)
2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148:
connection dropped (Protocol error)
2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log messages in last
12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: error
parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log messages in last
12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190:
received SSL data on JSON-RPC channel
2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190:
connection dropped (Protocol error)
2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046:
connection dropped (Protocol error)
2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150:
connection dropped (Protocol error)
2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192:
connection dropped (Protocol error)
2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log messages in last
8 seconds (most recently, 1 seconds ago) due to excessive rate
2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: error
parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log messages in last
8 seconds (most recently, 1 seconds ago) due to excessive rate
2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048:
received SSL data on JSON-RPC channel
2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048:
connection dropped (Protocol error)
2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152:
connection dropped (Protocol error)
2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194:
connection dropped (Protocol error)
2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050:
connection dropped (Protocol error)
2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154:
connection dropped (Protocol error)
2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log messages in last
12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: error
parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log messages in last
12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196:
received SSL data on JSON-RPC channel
2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196:
connection dropped (Protocol error)
2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052:
connection dropped (Protocol error)
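These two symptoms fit together: U+0016 is byte 0x16, the TLS "Handshake"
record content type, i.e. the first byte of a TLS ClientHello. The server is
reading raw TLS bytes on a plaintext JSON-RPC channel, which is exactly what
"received SSL data on JSON-RPC channel" says. A quick sketch:

```shell
# 0x16 (U+0016) is the TLS "Handshake" record content type -- the first
# byte a TLS client sends. A plaintext JSON-RPC parser chokes on it.
# (\026 is octal for 0x16.)
printf '\026\003\001' | od -An -tx1   # first bytes of a TLS ClientHello
```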
How can we fix these SSL errors?
I thought vdsm provisioned the certificates on the host nodes so that they
could communicate with the engine host.
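Comparing the outputs above with Dominik's reference: on this engine,
get-connection returns ptcp:6641 and ptcp:6642 (plaintext listeners), while
the reference output shows pssl:6641 and pssl:6642, and the hosts connect with
ssl: remotes. That mismatch would produce exactly these errors. One possible
fix, sketched and unverified, is to switch the DB listeners back to SSL
(oVirt's own tooling normally configures this, so double-check the engine-side
OVN setup before applying by hand):

```shell
# On the engine host: replace the plaintext listeners with SSL ones so
# they match the hosts' ssl:...:6641/6642 remotes. The get-ssl output
# above shows keys/certs are already configured, so pssl can use them.
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-connection pssl:6642
```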
On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler <dholler(a)redhat.com> wrote:
It still looks like the ovn-controller on the host has problems
communicating with the OVN southbound database.
Are there any hints in /var/log/openvswitch/*.log,
especially in /var/log/openvswitch/ovsdb-server-sb.log ?
Can you please check the output of
ovn-nbctl get-ssl
ovn-nbctl get-connection
ovn-sbctl get-ssl
ovn-sbctl get-connection
ls -l /etc/pki/ovirt-engine/keys/ovn-*
it should be similar to
[root@ovirt-43 ~]# ovn-nbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ovirt-43 ~]# ovn-nbctl get-connection
pssl:6641:[::]
[root@ovirt-43 ~]# ovn-sbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ovirt-43 ~]# ovn-sbctl get-connection
read-write role="" pssl:6642:[::]
[root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
-rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019
/etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
-rw-------. 1 root root 2709 Oct 14 2019
/etc/pki/ovirt-engine/keys/ovn-ndb.p12
-rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019
/etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
-rw-------. 1 root root 2709 Oct 14 2019
/etc/pki/ovirt-engine/keys/ovn-sdb.p12
On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis <k.betsis(a)gmail.com>
wrote:
> I restarted the ovn-controller; this is the output of
> ovn-controller.log:
>
> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file
> /var/log/openvswitch/ovn-controller.log
> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock:
> connecting...
> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock:
> connected
> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL reconnected, force
> recompute.
> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
> connecting...
> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL reconnected, force
> recompute.
> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: unexpected
> SSL connection close
> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
> connection attempt failed (Protocol error)
> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
> connecting...
> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: unexpected
> SSL connection close
> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
> connection attempt failed (Protocol error)
> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
> waiting 2 seconds before reconnect
> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
> connecting...
> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: unexpected
> SSL connection close
> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
> connection attempt failed (Protocol error)
> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
> waiting 4 seconds before reconnect
> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
> connecting...
> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: unexpected
> SSL connection close
> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
> connection attempt failed (Protocol error)
> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
> continuing to reconnect in the background but suppressing further logging
>
>
> I have also run vdsm-tool ovn-config OVIRT_ENGINE_IP
> OVIRTMGMT_NETWORK_DC
> This is how the OVIRT_ENGINE_IP is provided to the ovn-controller; I can
> redo it if you want.
>
> After the restart of the ovn-controller, the oVirt Engine still shows only
> two geneve connections, one with DC01-host02 and one with DC02-host01.
> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144"
> hostname: "dc02-host01"
> Encap geneve
> ip: "DC02-host01_IP"
> options: {csum="true"}
> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c"
> hostname: "DC01-host02"
> Encap geneve
> ip: "DC01-host02"
> options: {csum="true"}
>
> I've re-run the vdsm-tool command and nothing changed, again, with
> the same errors as after systemctl restart ovn-controller.
>
> On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler <dholler(a)redhat.com>
> wrote:
>
>> Please include ovirt-users list in your reply, to share the knowledge
>> and experience with the community!
>>
>> On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis <k.betsis(a)gmail.com>
>> wrote:
>>
>>> Ok below the output per node and DC
>>> DC01
>>> node01
>>>
>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open .
>>> external-ids:ovn-remote
>>> "ssl:*OVIRT_ENGINE_IP*:6642"
>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open .
>>> external-ids:ovn-encap-type
>>> geneve
>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open .
>>> external-ids:ovn-encap-ip
>>>
>>> "*OVIRTMGMT_IP_DC01-NODE01*"
>>>
>>> node02
>>>
>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open .
>>> external-ids:ovn-remote
>>> "ssl:*OVIRT_ENGINE_IP*:6642"
>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open .
>>> external-ids:ovn-encap-type
>>> geneve
>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open .
>>> external-ids:ovn-encap-ip
>>>
>>> "*OVIRTMGMT_IP_DC01-NODE02*"
>>>
>>> DC02
>>> node01
>>>
>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open .
>>> external-ids:ovn-remote
>>> "ssl:*OVIRT_ENGINE_IP*:6642"
>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open .
>>> external-ids:ovn-encap-type
>>> geneve
>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open .
>>> external-ids:ovn-encap-ip
>>>
>>> "*OVIRTMGMT_IP_DC02-NODE01*"
>>>
>>>
>> Looks good.
>>
>>
>>> DC01 node01 and node02 share the same VM networks, and VMs deployed on
>>> top of them cannot talk to VMs on the other hypervisor.
>>>
>>
>> Maybe there is a hint in ovn-controller.log on dc01-node02? Maybe
>> restarting ovn-controller creates more helpful log messages?
>>
>> You can also try to redo the OVN configuration on all hosts by executing
>> vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP
>> on each host; this would trigger
>>
>>
https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/se...
>> internally.
>>
>>
>>> So I would expect node01 to have a geneve tunnel to node02, and vice
>>> versa.
>>>
>>>
>> Me too.
>>
>>
>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler <dholler(a)redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis <
>>>> k.betsis(a)gmail.com> wrote:
>>>>
>>>>> Hi Dominik
>>>>>
>>>>> OVN is selected as the default network provider on the clusters and
>>>>> the hosts.
>>>>>
>>>>>
>>>> sounds good.
>>>> This configuration is required already when the host is added to
>>>> oVirt Engine, because OVN is configured during that step.
>>>>
>>>>
>>>>> The "ovn-sbctl show" works on the ovirt engine and shows
only two
>>>>> hosts, 1 per DC.
>>>>>
>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144"
>>>>> hostname: "dc01-node02"
>>>>> Encap geneve
>>>>> ip: "X.X.X.X"
>>>>> options: {csum="true"}
>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c"
>>>>> hostname: "dc02-node1"
>>>>> Encap geneve
>>>>> ip: "A.A.A.A"
>>>>> options: {csum="true"}
>>>>>
>>>>>
>>>>> The new node is not listed (dc01-node1).
>>>>>
>>>>> When executed on the nodes, the same command (ovn-sbctl show)
>>>>> times out on all of them.
>>>>>
>>>>> On all nodes, /var/log/openvswitch/ovn-controller.log repeatedly
>>>>> shows
>>>>>
>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect:
>>>>> unexpected SSL connection close
>>>>>
>>>>>
>>>>>
>>>> Can you please compare the output of
>>>>
>>>> ovs-vsctl --no-wait get open . external-ids:ovn-remote
>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
>>>>
>>>> of the working hosts, e.g. dc01-node02, and the failing host
>>>> dc01-node1?
>>>> This should point us the relevant difference in the configuration.
>>>>
>>>> Please include the ovirt-users list in your reply, to share the knowledge
>>>> and experience with the community.
>>>>
>>>>
>>>>
>>>>> Thank you
>>>>> Best regards
>>>>> Konstantinos Betsis
>>>>>
>>>>>
>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler <dholler(a)redhat.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B <k.betsis(a)gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all
>>>>>>>
>>>>>>> We have a small installation based on oVirt 4.3.
>>>>>>> One cluster is based on CentOS 7 and the other on the oVirt NG Node
>>>>>>> image.
>>>>>>>
>>>>>>> The environment was stable until an upgrade took place a couple of
>>>>>>> months ago.
>>>>>>> As such we had to re-install one of the CentOS 7 nodes and start
>>>>>>> from scratch.
>>>>>>>
>>>>>>
>>>>>> To trigger the automatic configuration of the host, it is required
>>>>>> to configure ovirt-provider-ovn as the default network provider for
>>>>>> the cluster before adding the host to oVirt.
>>>>>>
>>>>>>
>>>>>>> Even though the installation completed successfully and VMs are
>>>>>>> created, the following are not working as expected:
>>>>>>> 1. OVN geneve tunnels are not established with the other CentOS 7
>>>>>>> node in the cluster.
>>>>>>> 2. The CentOS 7 node is configured by ovirt-engine, but no geneve
>>>>>>> tunnel is established when "ovn-sbctl show" is issued on the engine.
>>>>>>>
>>>>>>
>>>>>> Does "ovn-sbctl show" list the hosts?
>>>>>>
>>>>>>
>>>>>>> 3. No flows are shown on the engine on port 6642 for the OVS DB.
>>>>>>>
>>>>>>> Does anyone have any experience with troubleshooting OVN on oVirt?
>>>>>>>
>>>>>>>
>>>>>> /var/log/openvswitch/ovn-controller.log on the host should contain
>>>>>> a helpful hint.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thank you
>>>>>>> _______________________________________________
>>>>>>> Users mailing list -- users(a)ovirt.org
>>>>>>> To unsubscribe send an email to users-leave(a)ovirt.org
>>>>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>>>>> oVirt Code of Conduct:
>>>>>>> https://www.ovirt.org/community/about/community-guidelines/
>>>>>>> List Archives:
>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF...
>>>>>>>
>>>>>>