On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis <k.betsis(a)gmail.com>
wrote:
Hi Dominik
When these commands are used on the ovirt-engine host the output is the
one depicted in your email.
For your reference see also below:
[root@ath01-ovirt01 certs]# ovn-nbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-nbctl get-connection
ptcp:6641
[root@ath01-ovirt01 certs]# ovn-sbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-sbctl get-connection
read-write role="" ptcp:6642
^^^ The line above points to the problem: ovn-central is configured to use
plain TCP without SSL.
engine-setup usually configures ovn-central to use SSL. The fact that the
files /etc/pki/ovirt-engine/keys/ovn-* exist shows that engine-setup was
triggered correctly. It looks like the OVN db was dropped somehow; this
should not happen.
This can be fixed manually by executing the following commands on the
engine's machine:
ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass \
    /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass \
    /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem
ovn-sbctl set-connection pssl:6642
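To double-check the listeners afterwards, a small sketch like the one below
could be used on the engine's machine. It only classifies the text printed by
"get-connection" (ptcp means plain TCP, pssl means SSL). The helper name
listener_mode is made up for this example and is not part of OVN.

```shell
#!/bin/sh
# Classify one line of `ovn-nbctl get-connection` / `ovn-sbctl get-connection`
# output. listener_mode is a hypothetical helper, not an OVN command.
listener_mode() {
    case "$1" in
        *pssl:*) echo ssl ;;      # passive SSL listener, e.g. pssl:6642:[::]
        *ptcp:*) echo plain ;;    # passive plain-TCP listener, e.g. ptcp:6642
        *)       echo unknown ;;
    esac
}

# Query the databases only when the OVN tools are installed (engine machine).
if command -v ovn-nbctl >/dev/null 2>&1 && command -v ovn-sbctl >/dev/null 2>&1; then
    echo "northbound: $(listener_mode "$(ovn-nbctl get-connection)")"
    echo "southbound: $(listener_mode "$(ovn-sbctl get-connection)")"
fi
```

Both lines should report "ssl" once the set-connection commands have been
applied; "plain" corresponds to the ptcp output shown earlier in this mail.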
The /var/log/openvswitch/ovn-controller.log on the hosts should now show
that br-int.mgmt is connected.
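A rough way to look for this on a host is sketched below. It assumes that
successful connections are logged as "<endpoint>: connected", by analogy with
the "unix:/var/run/openvswitch/db.sock: connected" lines in the
ovn-controller.log quoted further down; the exact wording may differ between
versions. connected_lines is a made-up helper name.

```shell
#!/bin/sh
# Print log lines that report an established connection, either to the
# southbound db (ssl:<engine>:6642) or to br-int.mgmt.
# connected_lines is a hypothetical helper; the line format is an assumption.
connected_lines() {
    grep -E '(ssl:[^|]*:6642|br-int\.mgmt): connected$' "$1"
}

log=/var/log/openvswitch/ovn-controller.log
if [ -r "$log" ]; then
    # Show only the most recent matches.
    connected_lines "$log" | tail -n 5
fi
```

If nothing is printed, the "connection attempt failed (Protocol error)"
messages are probably still being logged.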
[root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-ndb.p12
-rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
-rw-------. 1 root root 2893 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-sdb.p12
When I try the above commands on the node hosts, the following happens:
ovn-nbctl get-ssl / get-connection
ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database connection
failed (No such file or directory)
I believe the above is expected, since no northbound connections should be
established from the host nodes.
ovn-sbctl get-ssl / get-connection
The command hangs until I terminate it.
Yes, the ovn-* commands work only on the engine's machine, which has the
role ovn-central.
On the hosts, there is only the ovn-controller, which connects the
OVN southbound db to openvswitch on the host.
As for the requested logs, the following is found in
ovsdb-server-sb.log:
2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146:
connection dropped (Protocol error)
2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188:
connection dropped (Protocol error)
2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044:
connection dropped (Protocol error)
2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148:
connection dropped (Protocol error)
2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log messages in
last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: error
parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log messages in
last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190:
received SSL data on JSON-RPC channel
2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190:
connection dropped (Protocol error)
2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046:
connection dropped (Protocol error)
2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150:
connection dropped (Protocol error)
2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192:
connection dropped (Protocol error)
2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log messages in
last 8 seconds (most recently, 1 seconds ago) due to excessive rate
2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: error
parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log messages in
last 8 seconds (most recently, 1 seconds ago) due to excessive rate
2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048:
received SSL data on JSON-RPC channel
2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048:
connection dropped (Protocol error)
2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152:
connection dropped (Protocol error)
2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194:
connection dropped (Protocol error)
2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050:
connection dropped (Protocol error)
2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154:
connection dropped (Protocol error)
2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log messages in
last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: error
parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log messages in
last 12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196:
received SSL data on JSON-RPC channel
2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196:
connection dropped (Protocol error)
2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052:
connection dropped (Protocol error)
How can we fix these SSL errors?
I addressed this above.
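For context on the log messages above: U+0016 is the first byte of a TLS
handshake record (content type 22), so ovsdb-server, which is listening on
plain TCP, receives the start of an SSL handshake where it expects JSON.
The sketch below illustrates this with synthetic data only; first_byte_hex
and looks_like_tls are made-up helper names.

```shell
#!/bin/sh
# Print the first byte of a file as two hex digits (hypothetical helper).
first_byte_hex() {
    head -c 1 "$1" | od -An -tx1 | tr -d ' \n'
}

# Succeeds if the data starts with 0x16, the TLS handshake record type.
looks_like_tls() {
    [ "$(first_byte_hex "$1")" = "16" ]
}

# Synthetic samples: the first three bytes of a TLS record header
# (0x16 0x03 0x01) versus the start of a JSON-RPC message.
sample=$(mktemp)
printf '\026\003\001' > "$sample"
looks_like_tls "$sample" && echo "starts like a TLS handshake"
printf '{"method":"echo"}' > "$sample"
looks_like_tls "$sample" || echo "not TLS (plausibly JSON-RPC)"
rm -f "$sample"
```

This matches the "received SSL data on JSON-RPC channel" warnings: the hosts
use ssl:...:6642 while the southbound db listens on ptcp:6642.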
I thought vdsm did the certificate provisioning on the host nodes so that
they can communicate with the engine host node.
Yes, this seems to work in your scenario; only the SSL configuration on
ovn-central was lost.
On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler
<dholler(a)redhat.com> wrote:
> It still looks like the ovn-controller on the host has problems
> communicating with ovn-southbound.
>
> Are there any hints in /var/log/openvswitch/*.log,
> especially in /var/log/openvswitch/ovsdb-server-sb.log ?
>
> Can you please check the output of
>
> ovn-nbctl get-ssl
> ovn-nbctl get-connection
> ovn-sbctl get-ssl
> ovn-sbctl get-connection
> ls -l /etc/pki/ovirt-engine/keys/ovn-*
>
> it should be similar to
>
> [root@ovirt-43 ~]# ovn-nbctl get-ssl
> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
> CA Certificate: /etc/pki/ovirt-engine/ca.pem
> Bootstrap: false
> [root@ovirt-43 ~]# ovn-nbctl get-connection
> pssl:6641:[::]
> [root@ovirt-43 ~]# ovn-sbctl get-ssl
> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
> CA Certificate: /etc/pki/ovirt-engine/ca.pem
> Bootstrap: false
> [root@ovirt-43 ~]# ovn-sbctl get-connection
> read-write role="" pssl:6642:[::]
> [root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019
> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
> -rw-------. 1 root root 2709 Oct 14 2019
> /etc/pki/ovirt-engine/keys/ovn-ndb.p12
> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019
> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
> -rw-------. 1 root root 2709 Oct 14 2019
> /etc/pki/ovirt-engine/keys/ovn-sdb.p12
>
>
>
>
> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis <k.betsis(a)gmail.com>
> wrote:
>
>> I did a restart of the ovn-controller, this is the output of the
>> ovn-controller.log
>>
>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file
>> /var/log/openvswitch/ovn-controller.log
>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock:
>> connecting...
>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock:
>> connected
>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL reconnected, force
>> recompute.
>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>> connecting...
>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL reconnected, force
>> recompute.
>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: unexpected
>> SSL connection close
>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>> connection attempt failed (Protocol error)
>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>> connecting...
>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: unexpected
>> SSL connection close
>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>> connection attempt failed (Protocol error)
>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>> waiting 2 seconds before reconnect
>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>> connecting...
>> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: unexpected
>> SSL connection close
>> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>> connection attempt failed (Protocol error)
>> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>> waiting 4 seconds before reconnect
>> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>> connecting...
>> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: unexpected
>> SSL connection close
>> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>> connection attempt failed (Protocol error)
>> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>> continuing to reconnect in the background but suppressing further logging
>>
>>
>> I have also done the vdsm-tool ovn-config OVIRT_ENGINE_IP
>> OVIRTMGMT_NETWORK_DC
>> This is how the OVIRT_ENGINE_IP is provided to the ovn-controller; I can
>> redo it if you want.
>>
>> After the restart of the ovn-controller, the oVirt engine still shows
>> only two geneve connections, one with DC01-host02 and one with DC02-host01.
>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144"
>> hostname: "dc02-host01"
>> Encap geneve
>> ip: "DC02-host01_IP"
>> options: {csum="true"}
>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c"
>> hostname: "DC01-host02"
>> Encap geneve
>> ip: "DC01-host02"
>> options: {csum="true"}
>>
>> I've re-run the vdsm-tool command and nothing changed; I get the same
>> errors as after the systemctl restart ovn-controller.
>>
>> On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler <dholler(a)redhat.com>
>> wrote:
>>
>>> Please include ovirt-users list in your reply, to share the knowledge
>>> and experience with the community!
>>>
>>> On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis <
>>> k.betsis(a)gmail.com> wrote:
>>>
>>>> OK, below is the output per node and DC:
>>>> DC01
>>>> node01
>>>>
>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
>>>> "ssl:*OVIRT_ENGINE_IP*:6642"
>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
>>>> geneve
>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
>>>> "*OVIRTMGMT_IP_DC01-NODE01*"
>>>>
>>>> node02
>>>>
>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
>>>> "ssl:*OVIRT_ENGINE_IP*:6642"
>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
>>>> geneve
>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
>>>> "*OVIRTMGMT_IP_DC01-NODE02*"
>>>>
>>>> DC02
>>>> node01
>>>>
>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
>>>> "ssl:*OVIRT_ENGINE_IP*:6642"
>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
>>>> geneve
>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
>>>> "*OVIRTMGMT_IP_DC02-NODE01*"
>>>>
>>>>
>>> Looks good.
>>>
>>>
>>>> DC01 node01 and node02 share the same VM networks, but VMs deployed on
>>>> one of them cannot talk to VMs on the other hypervisor.
>>>>
>>>
>>> Maybe there is a hint in ovn-controller.log on dc01-node02? Maybe
>>> restarting ovn-controller creates more helpful log messages?
>>>
>>> You can also try to reset the OVN configuration on all hosts by executing
>>> vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP
>>> on each host; this would trigger
>>>
>>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/se...
>>> internally.
>>>
>>>
>>>> So I would expect to see the same output for node01 to have a geneve
>>>> tunnel to node02 and vice versa.
>>>>
>>>>
>>> Me too.
>>>
>>>
>>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler <dholler(a)redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis <
>>>>> k.betsis(a)gmail.com> wrote:
>>>>>
>>>>>> Hi Dominik
>>>>>>
>>>>>> OVN is selected as the default network provider on the clusters and
>>>>>> the hosts.
>>>>>>
>>>>>>
>>>>> Sounds good.
>>>>> This configuration is already required when the host is added to
>>>>> oVirt Engine, because OVN is configured during this step.
>>>>>
>>>>>
>>>>>> The "ovn-sbctl show" works on the ovirt engine and shows only two
>>>>>> hosts, 1 per DC.
>>>>>>
>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144"
>>>>>> hostname: "dc01-node02"
>>>>>> Encap geneve
>>>>>> ip: "X.X.X.X"
>>>>>> options: {csum="true"}
>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c"
>>>>>> hostname: "dc02-node1"
>>>>>> Encap geneve
>>>>>> ip: "A.A.A.A"
>>>>>> options: {csum="true"}
>>>>>>
>>>>>>
>>>>>> The new node is not listed (dc01-node1).
>>>>>>
>>>>>> When executed on the nodes, the same command (ovn-sbctl show)
>>>>>> times out on all nodes.
>>>>>>
>>>>>> The /var/log/openvswitch/ovn-controller.log on all nodes shows:
>>>>>>
>>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect:
>>>>>> unexpected SSL connection close
>>>>>>
>>>>>>
>>>>>>
>>>>> Can you please compare the output of
>>>>>
>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-remote
>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
>>>>>
>>>>> of the working hosts, e.g. dc01-node02, and the failing host
>>>>> dc01-node1?
>>>>> This should point us to the relevant difference in the configuration.
>>>>>
>>>>> Please include ovirt-users list in your reply, to share
>>>>> the knowledge and experience with the community.
>>>>>
>>>>>
>>>>>
>>>>>> Thank you
>>>>>> Best regards
>>>>>> Konstantinos Betsis
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler <dholler(a)redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B <k.betsis(a)gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all
>>>>>>>>
>>>>>>>> We have a small installation based on OVIRT 4.3.
>>>>>>>> 1 Cluster is based on Centos 7 and the other on OVIRT NG Node
>>>>>>>> image.
>>>>>>>>
>>>>>>>> The environment was stable till an upgrade took place a couple of
>>>>>>>> months ago.
>>>>>>>> As such we had to re-install one of the Centos 7 nodes and start
>>>>>>>> from scratch.
>>>>>>>>
>>>>>>>
>>>>>>> To trigger the automatic configuration of the host, it is required
>>>>>>> to configure ovirt-provider-ovn as the default network provider for
>>>>>>> the cluster before adding the host to oVirt.
>>>>>>>
>>>>>>>
>>>>>>>> Even though the installation completed successfully and VMs are
>>>>>>>> created, the following are not working as expected:
>>>>>>>> 1. ovn geneve tunnels are not established with the other Centos 7
>>>>>>>> node in the cluster.
>>>>>>>> 2. Centos 7 node is configured by ovirt engine however no geneve
>>>>>>>> tunnel is established when "ovn-sbctl show" is issued on the engine.
>>>>>>>>
>>>>>>>
>>>>>>> Does "ovn-sbctl show" list the hosts?
>>>>>>>
>>>>>>>
>>>>>>>> 3. no flows are shown on the engine on port 6642 for the ovs db.
>>>>>>>>
>>>>>>>> Does anyone have any experience on how to troubleshoot OVN on
>>>>>>>> ovirt?
>>>>>>>>
>>>>>>>>
>>>>>>> /var/log/openvswitch/ovn-controller.log on the host should contain a
>>>>>>> helpful hint.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thank you
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list -- users(a)ovirt.org
>>>>>>>> To unsubscribe send an email to users-leave(a)ovirt.org
>>>>>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>>>>>> oVirt Code of Conduct:
>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/
>>>>>>>> List Archives:
>>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBVGLQJBWJF...
>>>>>>>>
>>>>>>>