
On Tue, Oct 6, 2020 at 12:25 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dominic
That fixed it.
Thanks for letting us know and your patience.
VMs have full connectivity and I don't see any errors on the nodes' ovn-controller.
Thanks for the help and quick responses, I really appreciate it.
In summary for future reference:
Thanks for this nice summary, I am sure this will help others in the community.
If certificate errors are encountered, we need to review:
ovs-vsctl --no-wait get open . external-ids:ovn-remote
ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
The ovn-remote will state if the OVN connection is using TCP or TLS.
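As a rough illustration of reading that value, the sketch below classifies a connection string by its scheme prefix (ssl:, tcp: or unix:). The helper name classify_remote and the sample address are made up for this example; only the first remote is inspected if several are listed.

```sh
#!/bin/sh
# classify_remote is a hypothetical helper, not part of OVS or OVN.
# It reports the transport implied by the scheme prefix of a connection string.
classify_remote() {
    # ovs-vsctl may print the value wrapped in double quotes; strip them.
    remote=$(printf '%s' "$1" | tr -d '"')
    case "$remote" in
        ssl:*)  echo "tls" ;;
        tcp:*)  echo "tcp" ;;
        unix:*) echo "unix" ;;
        *)      echo "unknown" ;;
    esac
}

# Demo with placeholder values (192.0.2.10 is a documentation address).
classify_remote '"ssl:192.0.2.10:6642"'
classify_remote 'tcp:192.0.2.10:6642'

# On a node, the real value could be passed instead:
#   classify_remote "$(ovs-vsctl --no-wait get open . external-ids:ovn-remote)"
```

A result of "tls" means the node needs valid certificates to reach the southbound database.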
We then do:
ovn-nbctl get-ssl
ovn-nbctl get-connection
ovn-sbctl get-ssl
ovn-sbctl get-connection
ls -l /etc/pki/ovirt-engine/keys/ovn-*
This checks the OVN northbound and southbound configuration, the listening ports, and whether TCP or TLS is used.
If TLS is used, we must update the nodes with:
ovn-nbctl set-ssl "ovn northbound interface certificate key" "ovn northbound interface certificate file"
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl "ovn southbound interface certificate key" "ovn southbound interface certificate file"
ovn-sbctl set-connection pssl:6642
The certificates must be present on the nodes, deployed through the VDSM client.
Finally, we check that all tunnels are established and working ok.
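One possible way to spot the symptom seen earlier in this thread (a node holding a geneve tunnel to its own IP) is sketched below. The helper name self_tunnels is invented for this example, and the sample text is an illustrative excerpt in the format printed by ovs-vsctl show, with the IPs taken from the Encap records quoted in the thread.

```sh
#!/bin/sh
# self_tunnels LOCAL_IP (hypothetical helper): reads `ovs-vsctl show` output
# on stdin and prints every remote_ip equal to LOCAL_IP.
# Exit status is 0 if at least one such tunnel exists, non-zero otherwise.
self_tunnels() {
    local_ip=$1
    # Extract the quoted value after remote_ip= and match it as a whole line.
    sed -n 's/.*remote_ip="\([^"]*\)".*/\1/p' | grep -Fx -- "$local_ip"
}

# Illustrative sample, not real command output.
sample='        Port "ovn-95ccb0-0"
            Interface "ovn-95ccb0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.137.156.55"}
        Port "ovn-be3abc-0"
            Interface "ovn-be3abc-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.137.156.56"}'

if printf '%s\n' "$sample" | self_tunnels 10.137.156.56 >/dev/null; then
    echo "tunnel to own IP found"
else
    echo "no self-referencing tunnel"
fi

# On a node, something along these lines could be used instead:
#   ip=$(ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip | tr -d '"')
#   ovs-vsctl show | self_tunnels "$ip"
```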
If we end up with a stuck chassis, we simply stop the OVN service on the node and delete the chassis from the southbound database with:
ovn-sbctl chassis-del "chassis_ID"
Thank you
Best Regards
Konstantinos Betsis
On Tue, Oct 6, 2020 at 11:37 AM Dominik Holler <dholler@redhat.com> wrote:
On Tue, Oct 6, 2020 at 10:31 AM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi guys
Sorry to disturb you, but I am pretty much stuck at this point with the OVN southbound interface.
Is there a way I can flush it and have it reconfigured from oVirt?
Can you please delete the chassis via
ovn-sbctl chassis-del 32cd0eb4-d763-4036-bbc9-a4d3a4013ee6
where 32cd0eb4-d763-4036-bbc9-a4d3a4013ee6 should be replaced with the ID of the suspicious chassis shown by ovn-sbctl show.
The ovn-controller will add the chassis again in a few seconds, but I hope that this would remove the inconsistency in the db.
Thank you
Best Regards
Konstantinos Betsis
On Thu, Oct 1, 2020 at 6:52 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Regarding the ovn-controller logs....
2020-10-01T15:51:03.156Z|14143|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.220Z|14144|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.284Z|14145|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.347Z|14146|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.411Z|14147|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.474Z|14148|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.538Z|14149|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.601Z|14150|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.664Z|14151|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:03.727Z|14152|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.792Z|14153|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.855Z|14154|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.919Z|14155|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:08.982Z|14156|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.046Z|14157|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.109Z|14158|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.173Z|14159|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.236Z|14160|main|INFO|OVNSB commit failed, force recompute next time.
2020-10-01T15:51:09.299Z|14161|main|INFO|OVNSB commit failed, force recompute next time.
I don't think we can see anything more from these.
On Thu, Oct 1, 2020 at 6:12 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
Hi Dumitru
I've seen that as well..... I've deleted the dc01-node2 (ams03-hypersec02) from ovirt. I've also issued ovs-vsctl emer-reset.
But ovn-sbctl list chassis still depicts the node twice. The ovs-sbctl show still depicts 3 geneve tunnels from dc01-node2....
How can we fix this?
On Thu, Oct 1, 2020 at 9:59 AM Dumitru Ceara <dceara@redhat.com> wrote:
On 9/30/20 3:41 PM, Konstantinos Betsis wrote:
> From the configuration I can see only three nodes.....
> "Encap":{
> #dc01-node02
> "da8fb1dc-f832-4d62-a01d-2e5aef018c8d":{"ip":"10.137.156.56","chassis_name":"be3abcc9-7358-4040-a37b-8d8a782f239c","options":["map",[["csum","true"]]],"type":"geneve"},
> #dc01-node01
> "4808bd8f-7e46-4f29-9a96-046bb580f0c5":{"ip":"10.137.156.55","chassis_name":"95ccb04a-3a08-4a62-8bc0-b8a7a42956f8","options":["map",[["csum","true"]]],"type":"geneve"},
> #dc02-node01
> "f20b33ae-5a6b-456c-b9cb-2e4d8b54d8be":{"ip":"192.168.121.164","chassis_name":"c4b23834-aec7-4bf8-8be7-aa94a50a6144","options":["map",[["csum","true"]]],"type":"geneve"}}
>
> So I don't understand why the dc01-node02 tries to establish a tunnel
> with itself.....
>
> Is there a way for ovn to refresh according to Ovirt network database as
> to not affect VM networks?
>
> On Wed, Sep 30, 2020 at 2:33 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
>
> > Sure
> >
> > I've attached it for easier reference.
> >
> > On Wed, Sep 30, 2020 at 2:21 PM Dominik Holler <dholler@redhat.com> wrote:
> >
> > > On Wed, Sep 30, 2020 at 1:16 PM Konstantinos Betsis <k.betsis@gmail.com> wrote:
> > >
> > > > Hi Dominik
> > > >
> > > > The DC01-node02 was formatted and reinstalled and then
> > > > attached to ovirt environment.
> > > > Unfortunately we exhibit the same issue.
> > > > The new DC01-node02 tries to establish geneve tunnels to his
> > > > own IP.
> > > >
> > > > [root@dc01-node02 ~]# ovs-vsctl show
> > > > eff2663e-cb10-41b0-93ba-605bb5c7bd78
> > > >     Bridge br-int
> > > >         fail_mode: secure
> > > >         Port "ovn-95ccb0-0"
> > > >             Interface "ovn-95ccb0-0"
> > > >                 type: geneve
> > > >                 options: {csum="true", key=flow, remote_ip="dc01-node01_IP"}
> > > >         Port "ovn-be3abc-0"
> > > >             Interface "ovn-be3abc-0"
> > > >                 type: geneve
> > > >                 options: {csum="true", key=flow, remote_ip="dc01-node02_IP"}
> > > >         Port "ovn-c4b238-0"
> > > >             Interface "ovn-c4b238-0"
> > > >                 type: geneve
> > > >                 options: {csum="true", key=flow, remote_ip="dc02-node01_IP"}
> > > >         Port br-int
> > > >             Interface br-int
> > > >                 type: internal
> > > >     ovs_version: "2.11.0"
> > > >
> > > > Is there a way to fix this on the Ovirt engine since this is
> > > > where the information resides?
> > > > Something is broken there.
> > >
> > > I suspect that there is an inconsistency in the OVN SB DB.
> > > Is there a way to share your /var/lib/openvswitch/ovnsb_db.db
> > > with us?
Hi Konstantinos,
One of the things I noticed in the SB DB you attached is that two of the chassis records have the same hostname:
$ ovn-sbctl list chassis | grep ams03-hypersec02
hostname : ams03-hypersec02
hostname : ams03-hypersec02
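To look for this kind of duplicate without knowing the hostname in advance, a small filter along the following lines might help. The function name duplicate_hostnames is made up for this sketch, and the sample input only imitates the "column : value" layout of ovn-sbctl list chassis; example-host is a placeholder.

```sh
#!/bin/sh
# duplicate_hostnames (hypothetical helper): reads `ovn-sbctl list chassis`
# output on stdin and prints each hostname that occurs on more than one record.
duplicate_hostnames() {
    # Keep only the value of "hostname" rows, drop optional quotes,
    # then let uniq -d report values that appear at least twice.
    awk -F': *' '$1 ~ /^hostname[ \t]*$/ { print $2 }' | tr -d '"' | sort | uniq -d
}

# Illustrative sample, not real command output.
sample='hostname            : ams03-hypersec02
name                : "be3abcc9-7358-4040-a37b-8d8a782f239c"
hostname            : example-host
hostname            : ams03-hypersec02'

printf '%s\n' "$sample" | duplicate_hostnames

# Against a live southbound database:
#   ovn-sbctl list chassis | duplicate_hostnames
```

An empty result means every chassis record has a distinct hostname.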
This shouldn't be a major issue but shows a potential misconfiguration on the nodes. Could you please double check the hostname configuration of the nodes?
Would it also be possible to attach the openvswitch conf.db from the three nodes? It should be in /var/lib/openvswitch/conf.db
Thanks, Dumitru