Hi Dominik

That fixed it.
The VMs have full connectivity and I don't see any errors from ovn-controller on the nodes.

Thanks for the help and quick responses, I really appreciate it.

In summary for future reference:
If certificate errors occur, we need to review the following on the node:
ovs-vsctl --no-wait get open . external-ids:ovn-remote
ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
The ovn-remote will state if the OVN connection is using TCP or TLS.
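
A minimal sketch of that check, assuming the ovn-remote value uses the usual "ssl:" or "tcp:" prefix. The function name ovn_remote_transport is only illustrative, and it looks at the first remote only:

```shell
# Hypothetical helper: classify an ovn-remote value as TLS or TCP.
# $1: the value printed by
#     ovs-vsctl --no-wait get open . external-ids:ovn-remote
#     (surrounding double quotes are stripped if present)
ovn_remote_transport() {
    remote=$(printf '%s' "$1" | tr -d '"')
    case "$remote" in
        ssl:*) echo TLS ;;      # TLS-protected connection
        tcp:*) echo TCP ;;      # plain TCP connection
        *)     echo unknown ;;  # anything else, e.g. a unix socket
    esac
}
```

On a node it could be called as:
ovn_remote_transport "$(ovs-vsctl --no-wait get open . external-ids:ovn-remote)"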

We then run:
ovn-nbctl get-ssl 
ovn-nbctl get-connection 
ovn-sbctl get-ssl
ovn-sbctl get-connection
ls -l /etc/pki/ovirt-engine/keys/ovn-*

These commands show the OVN northbound and southbound configuration, the listening ports, and whether TCP or TLS is used.

If TLS is used, we must update the nodes with:
ovn-nbctl set-ssl "ovn northbound interface certificate key" "ovn northbound interface certificate file"
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl "ovn southbound interface certificate key" "ovn southbound interface certificate file"
ovn-sbctl set-connection pssl:6642

The certificates must be present on the nodes, where they are installed through the VDSM client.

Finally, we check that all tunnels are established and working ok.
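
One rough way to inspect the tunnels is to read the remote_ip values from the "ovs-vsctl show" output on each node and flag a tunnel that points back at the node's own IP, which was the symptom in this thread. This is only a sketch: the function names are illustrative, and it inspects the configured tunnel ports, not whether traffic actually flows.

```shell
# Hypothetical helpers; both read the output of `ovs-vsctl show` on stdin.

# Print the remote_ip of every tunnel port, one per line.
tunnel_remote_ips() {
    sed -n 's/.*remote_ip="\([^"]*\)".*/\1/p'
}

# $1: this node's own encap IP.
# Prints the IP and succeeds if a tunnel points at the node itself.
self_tunnels() {
    tunnel_remote_ips | grep -Fx -- "$1"
}
```

For example, on a node:
ovs-vsctl show | self_tunnels "$(ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip | tr -d '"')"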

If a chassis gets stuck, we simply stop the OVN service on the node and delete the chassis from the southbound database with:
ovn-sbctl  chassis-del "chassis_ID"

Thank you
Best Regards
Konstantinos Betsis


On Tue, Oct 6, 2020 at 11:37 AM Dominik Holler <dholler@redhat.com> wrote:


On Tue, Oct 6, 2020 at 10:31 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi guys

Sorry to disturb you, but I am pretty much stuck at this point with the OVN southbound interface.

Is there a way I can flush it and have it reconfigured from oVirt?


Can you please delete the chassis via

ovn-sbctl  chassis-del 32cd0eb4-d763-4036-bbc9-a4d3a4013ee6

where 32cd0eb4-d763-4036-bbc9-a4d3a4013ee6 should be replaced with the ID of the suspicious chassis shown by
ovn-sbctl  show

The ovn-controller will add the chassis again in a few seconds, but I hope that this would remove the inconsistency in the db.
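
To find the ID to pass to chassis-del, the chassis names registered for a given hostname can also be pulled out of "ovn-sbctl list chassis". The following is only a sketch: the function name is illustrative, and it assumes the usual "column : value" layout with records separated by blank lines.

```shell
# Hypothetical helper: print the chassis names registered for a hostname.
# $1: hostname to look up
# stdin: output of `ovn-sbctl list chassis`
chassis_ids_for_host() {
    awk -v want="$1" '
        BEGIN { RS = ""; FS = "\n" }   # one record per blank-line-separated block
        {
            host = ""; name = ""
            for (i = 1; i <= NF; i++) {
                key = $i; sub(/[ \t]*:.*/, "", key)       # text before the first colon
                val = $i; sub(/^[^:]*:[ \t]*/, "", val)   # text after the first colon
                gsub(/"/, "", val)                        # drop surrounding quotes
                if (key == "hostname") host = val
                if (key == "name") name = val
            }
            if (host == want) print name
        }'
}
```

Usage: ovn-sbctl list chassis | chassis_ids_for_host ams03-hypersec02
More than one line of output means the host is registered more than once.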

 
Thank you
Best Regards
Konstantinos Betsis

On Thu, Oct 1, 2020 at 6:52 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Regarding the ovn-controller logs....
2020-10-01T15:51:03.156Z|14143|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.220Z|14144|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.284Z|14145|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.347Z|14146|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.411Z|14147|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.474Z|14148|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.538Z|14149|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.601Z|14150|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.664Z|14151|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.727Z|14152|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.792Z|14153|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.855Z|14154|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.919Z|14155|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.982Z|14156|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.046Z|14157|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.109Z|14158|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.173Z|14159|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.236Z|14160|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.299Z|14161|main|INFO|OVNSB commit failed, force recompute next time.

I don't think we can see anything more from these.



On Thu, Oct 1, 2020 at 6:12 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dumitru

I've seen that as well.....
I've deleted the dc01-node2 (ams03-hypersec02) from ovirt.
I've also issued ovs-vsctl emer-reset.

But ovn-sbctl list chassis still depicts the node twice.
The ovs-sbctl show still depicts 3 geneve tunnels from dc01-node2....

How can we fix this?

On Thu, Oct 1, 2020 at 9:59 AM Dumitru Ceara <dceara@redhat.com> wrote:
On 9/30/20 3:41 PM, Konstantinos Betsis wrote:
> From the configuration I can see only three nodes.....
> "Encap":{
> #dc01-node02
> "da8fb1dc-f832-4d62-a01d-2e5aef018c8d":{"ip":"10.137.156.56","chassis_name":"be3abcc9-7358-4040-a37b-8d8a782f239c","options":["map",[["csum","true"]]],"type":"geneve"},
> #dc01-node01
> "4808bd8f-7e46-4f29-9a96-046bb580f0c5":{"ip":"10.137.156.55","chassis_name":"95ccb04a-3a08-4a62-8bc0-b8a7a42956f8","options":["map",[["csum","true"]]],"type":"geneve"},
> #dc02-node01
> "f20b33ae-5a6b-456c-b9cb-2e4d8b54d8be":{"ip":"192.168.121.164","chassis_name":"c4b23834-aec7-4bf8-8be7-aa94a50a6144","options":["map",[["csum","true"]]],"type":"geneve"}}
>
> So I don't understand why the dc01-node02 tries to establish a tunnel
> with itself.....
>
> Is there a way for ovn to refresh according to Ovirt network database as
> to not affect VM networks?
>
> On Wed, Sep 30, 2020 at 2:33 PM Konstantinos Betsis <k.betsis@gmail.com
> <mailto:k.betsis@gmail.com>> wrote:
>
>     Sure
>
>     I've attached it for easier reference.
>
>     On Wed, Sep 30, 2020 at 2:21 PM Dominik Holler <dholler@redhat.com
>     <mailto:dholler@redhat.com>> wrote:
>
>
>
>         On Wed, Sep 30, 2020 at 1:16 PM Konstantinos Betsis
>         <k.betsis@gmail.com <mailto:k.betsis@gmail.com>> wrote:
>
>             Hi Dominik
>
>             The DC01-node02 was formatted and reinstalled and then
>             attached to ovirt environment.
>             Unfortunately we exhibit the same issue.
>             The new DC01-node02 tries to establish geneve tunnels to its
>             own IP.
>
>                 [root@dc01-node02 ~]# ovs-vsctl show
>                 eff2663e-cb10-41b0-93ba-605bb5c7bd78
>                     Bridge br-int
>                         fail_mode: secure
>                         Port "ovn-95ccb0-0"
>                             Interface "ovn-95ccb0-0"
>                                 type: geneve
>                                 options: {csum="true", key=flow, remote_ip="dc01-node01_IP"}
>                         Port "ovn-be3abc-0"
>                             Interface "ovn-be3abc-0"
>                                 type: geneve
>                                 options: {csum="true", key=flow, remote_ip="dc01-node02_IP"}
>                         Port "ovn-c4b238-0"
>                             Interface "ovn-c4b238-0"
>                                 type: geneve
>                                 options: {csum="true", key=flow, remote_ip="dc02-node01_IP"}
>                         Port br-int
>                             Interface br-int
>                                 type: internal
>                     ovs_version: "2.11.0"
>
>
>             Is there a way to fix this on the Ovirt engine since this is
>             where the information resides?
>             Something is broken there.
>
>
>         I suspect that there is an inconsistency in the OVN SB DB.
>         Is there a way to share your /var/lib/openvswitch/ovnsb_db.db
>         with us?
>          
>

Hi Konstantinos,

One of the things I noticed in the SB DB you attached is that two of the
chassis records have the same hostname:

$ ovn-sbctl list chassis | grep ams03-hypersec02
hostname            : ams03-hypersec02
hostname            : ams03-hypersec02

This shouldn't be a major issue but shows a potential misconfiguration
on the nodes. Could you please double check the hostname configuration
of the nodes?
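
The same duplicate check can be generalized to every hostname at once. A minimal sketch, assuming the "hostname : value" layout shown in the grep output above; the function name is only illustrative:

```shell
# Hypothetical helper: print each hostname that appears in more than one
# chassis record. stdin: output of `ovn-sbctl list chassis`.
duplicate_hostnames() {
    awk -F: '$1 ~ /^hostname[ \t]*$/ {
                 v = $2
                 gsub(/[" \t]/, "", v)   # strip quotes and padding
                 print v
             }' | sort | uniq -d
}
```

Usage: ovn-sbctl list chassis | duplicate_hostnames
No output means every chassis record has a distinct hostname.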

Would it also be possible to attach the openvswitch conf.db from the
three nodes? It should be in /var/lib/openvswitch/conf.db

Thanks,
Dumitru