On Tue, Oct 6, 2020 at 10:31 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi guys

Sorry to disturb you, but I am pretty much stuck at this point with the OVN southbound interface.

Is there a way I can flush it and have it reconfigured from oVirt?


Can you please delete the chassis via

ovn-sbctl chassis-del 32cd0eb4-d763-4036-bbc9-a4d3a4013ee6

where 32cd0eb4-d763-4036-bbc9-a4d3a4013ee6 should be replaced with the ID of the suspicious chassis shown by
ovn-sbctl show

The ovn-controller will add the chassis again within a few seconds, but I hope that this will remove the inconsistency in the DB.
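
To make it easier to pick the right ID, here is a minimal sketch that prints the chassis-del commands without running them. It assumes the usual "column : value" layout of "ovn-sbctl list chassis" with a blank line between records. The function name chassis_del_cmds is made up for this example, and the hostname in the usage line is the one reported later in this thread.

```shell
# chassis_del_cmds HOSTNAME   (hypothetical helper, not part of OVN)
# Reads `ovn-sbctl list chassis` output on stdin and prints, without running
# them, one `ovn-sbctl chassis-del` command per chassis with that hostname.
chassis_del_cmds() {
    awk -F' *: *' -v want="$1" '
        # Print the command for the record just read, then reset the state.
        function emit() {
            if (host == want && name != "") print "ovn-sbctl chassis-del " name
            host = ""; name = ""
        }
        $1 == "hostname" { gsub(/"/, "", $2); host = $2 }
        $1 == "name"     { gsub(/"/, "", $2); name = $2 }
        /^[[:space:]]*$/ { emit() }   # a blank line ends a record
        END              { emit() }'
}

# Usage on a host that can reach the southbound DB:
#   ovn-sbctl list chassis | chassis_del_cmds ams03-hypersec02
```

Review the printed commands and run only the one for the stale chassis, since the chassis that is still in use reports the same hostname.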

 
Thank you
Best Regards
Konstantinos Betsis

On Thu, Oct 1, 2020 at 6:52 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Regarding the ovn-controller logs....
2020-10-01T15:51:03.156Z|14143|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.220Z|14144|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.284Z|14145|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.347Z|14146|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.411Z|14147|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.474Z|14148|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.538Z|14149|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.601Z|14150|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.664Z|14151|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.727Z|14152|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.792Z|14153|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.855Z|14154|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.919Z|14155|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.982Z|14156|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.046Z|14157|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.109Z|14158|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.173Z|14159|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.236Z|14160|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.299Z|14161|main|INFO|OVNSB commit failed, force recompute next time.

I don't think we can see anything more from these.



On Thu, Oct 1, 2020 at 6:12 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dumitru

I've seen that as well.....
I've deleted dc01-node2 (ams03-hypersec02) from oVirt.
I've also issued ovs-vsctl emer-reset.

But ovn-sbctl list chassis still depicts the node twice.
The ovs-sbctl show still depicts 3 geneve tunnels from dc01-node2....

How can we fix this?

On Thu, Oct 1, 2020 at 9:59 AM Dumitru Ceara <dceara@redhat.com> wrote:
On 9/30/20 3:41 PM, Konstantinos Betsis wrote:
> From the configuration I can see only three nodes.....
> "Encap":{
> #dc01-node02
> "da8fb1dc-f832-4d62-a01d-2e5aef018c8d":{"ip":"10.137.156.56","chassis_name":"be3abcc9-7358-4040-a37b-8d8a782f239c","options":["map",[["csum","true"]]],"type":"geneve"},
> #dc01-node01
> "4808bd8f-7e46-4f29-9a96-046bb580f0c5":{"ip":"10.137.156.55","chassis_name":"95ccb04a-3a08-4a62-8bc0-b8a7a42956f8","options":["map",[["csum","true"]]],"type":"geneve"},
> #dc02-node01
> "f20b33ae-5a6b-456c-b9cb-2e4d8b54d8be":{"ip":"192.168.121.164","chassis_name":"c4b23834-aec7-4bf8-8be7-aa94a50a6144","options":["map",[["csum","true"]]],"type":"geneve"}}
>
> So I don't understand why dc01-node02 tries to establish a tunnel
> with itself.
>
> Is there a way for OVN to resync from the oVirt network database
> without affecting the VM networks?
>
> On Wed, Sep 30, 2020 at 2:33 PM Konstantinos Betsis <k.betsis@gmail.com
> <mailto:k.betsis@gmail.com>> wrote:
>
>     Sure
>
>     I've attached it for easier reference.
>
>     On Wed, Sep 30, 2020 at 2:21 PM Dominik Holler <dholler@redhat.com
>     <mailto:dholler@redhat.com>> wrote:
>
>
>
>         On Wed, Sep 30, 2020 at 1:16 PM Konstantinos Betsis
>         <k.betsis@gmail.com <mailto:k.betsis@gmail.com>> wrote:
>
>             Hi Dominik
>
>             DC01-node02 was formatted, reinstalled, and then
>             attached to the oVirt environment.
>             Unfortunately, we see the same issue.
>             The new DC01-node02 tries to establish geneve tunnels to its
>             own IP.
>
>                 [root@dc01-node02 ~]# ovs-vsctl show
>                 eff2663e-cb10-41b0-93ba-605bb5c7bd78
>                     Bridge br-int
>                         fail_mode: secure
>                         Port "ovn-95ccb0-0"
>                             Interface "ovn-95ccb0-0"
>                                 type: geneve
>                                 options: {csum="true", key=flow,
>                 remote_ip="dc01-node01_IP"}
>                         Port "ovn-be3abc-0"
>                             Interface "ovn-be3abc-0"
>                                 type: geneve
>                                 options: {csum="true", key=flow,
>                 remote_ip="dc01-node02_IP"}
>                         Port "ovn-c4b238-0"
>                             Interface "ovn-c4b238-0"
>                                 type: geneve
>                                 options: {csum="true", key=flow,
>                 remote_ip="dc02-node01_IP"}
>                         Port br-int
>                             Interface br-int
>                                 type: internal
>                     ovs_version: "2.11.0"
>
>
>             Is there a way to fix this on the oVirt engine, since this is
>             where the information resides?
>             Something is broken there.
>
>
>         I suspect that there is an inconsistency in the OVN SB DB.
>         Is there a way to share your /var/lib/openvswitch/ovnsb_db.db
>         with us?
>          
>
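
The self-tunnel symptom in the quoted "ovs-vsctl show" output can be checked mechanically. The following is a rough sketch, assuming each tunnel's remote_ip="..." option appears on one line in the real (unwrapped) output. The function name self_tunnels is made up for this example.

```shell
# self_tunnels LOCAL_IP   (hypothetical helper, not part of Open vSwitch)
# Reads `ovs-vsctl show` output on stdin and prints each remote_ip that
# equals LOCAL_IP, i.e. a tunnel that points back at the node itself.
self_tunnels() {
    sed -n 's/.*remote_ip="\([^"]*\)".*/\1/p' | grep -Fx -- "$1"
}

# Usage on a hypervisor node; the local tunnel endpoint is read from the
# ovn-encap-ip key that ovn-controller uses:
#   local_ip=$(ovs-vsctl get Open_vSwitch . external_ids:ovn-encap-ip | tr -d '"')
#   ovs-vsctl show | self_tunnels "$local_ip"
```

No output means that no tunnel targets the node's own encapsulation IP.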

Hi Konstantinos,

One of the things I noticed in the SB DB you attached is that two of the
chassis records have the same hostname:

$ ovn-sbctl list chassis | grep ams03-hypersec02
hostname            : ams03-hypersec02
hostname            : ams03-hypersec02

This shouldn't be a major issue but shows a potential misconfiguration
on the nodes. Could you please double check the hostname configuration
of the nodes?
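
If it helps with that check, here is a small sketch that lists every hostname used by more than one chassis record. It assumes the same "column : value" layout as the output above. The function name dup_hostnames is made up for this example.

```shell
# dup_hostnames   (hypothetical helper, not part of OVN)
# Reads `ovn-sbctl list chassis` output on stdin and prints every hostname
# that appears on more than one Chassis record.
dup_hostnames() {
    awk -F' *: *' '
        $1 == "hostname" { gsub(/"/, "", $2); count[$2]++ }
        END { for (h in count) if (count[h] > 1) print h }' | sort
}

# Usage on a host that can reach the southbound DB:
#   ovn-sbctl list chassis | dup_hostnames
```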

Would it also be possible to attach the openvswitch conf.db from the
three nodes? It should be in /var/lib/openvswitch/conf.db

Thanks,
Dumitru