On Tue, Apr 23, 2024 at 6:57 AM Levi Wilbert <stop.play.rwd@gmail.com> wrote:
I had this same issue on oVirt Node 4.5.5. However, I did not see the same code in /usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-provider-ovn-driver/tasks/configure.yml on the hosted engine.

On my version 4.5.5, I have two blocks: one installs ovs and ensures Open vSwitch is started, the second block installs the ovirt-provider-ovn-driver and configures OVN (as well as some other steps).

Hi,

I would like to clarify what is happening.
 

For the first block, my when statement shows as:
  when:
    - cluster_switch == "ovs" or (ovn_central is defined)

For the second block, it shows:
  when:
    - ovn_central is defined

In Ansible, inside a when: statement, multiple lines beginning with "-" are equivalent to AND conditions. For example:
when:
  - this == true
  - that == true

This would be equivalent to when: (this == true) and (that == true).
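As a rough illustration of that rule (a Python analogy only, not Ansible's actual implementation; when_list is a made-up helper name), a list under when: behaves like all() over its items:

```python
# Sketch: model a `when:` list as a logical AND over its entries.
# `when_list` is a hypothetical helper, not part of Ansible.
def when_list(conditions):
    # Every entry must be true for the task to run.
    return all(conditions)


this, that = True, False

# The list form and the explicit "and" form agree.
list_form = when_list([this == True, that == True])
and_form = (this == True) and (that == True)
print(list_form, and_form)
```

Note that a single entry such as cluster_switch == "ovs" or (ovn_central is defined) is still one condition; the "or" is evaluated inside that entry.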

This condition is actually the problem. If you look at the previous one, the key part is "ovn_central | ipaddr", which expects a valid IP address; otherwise the condition is false. When the condition is only "ovn_central is defined", however, it is also true for an empty string.
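To make the difference concrete, here is a small Python sketch. It only approximates the two checks: is_defined and looks_like_ip are hypothetical helpers standing in for Ansible's "is defined" test and the ipaddr filter, and host_vars is an invented variable set with ovn_central as an empty string.

```python
import ipaddress


def is_defined(variables, name):
    # Rough stand-in for Ansible's "is defined": the variable merely exists.
    return name in variables


def looks_like_ip(value):
    # Rough stand-in for "ovn_central | ipaddr": truthy only for a valid IP.
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


# Hypothetical host variables where ovn_central arrived as an empty string.
host_vars = {"ovn_central": ""}

print(is_defined(host_vars, "ovn_central"))       # defined, so the block runs
print(looks_like_ip(host_vars["ovn_central"]))    # but it is not an IP address
```

With the stricter check the block would be skipped for an empty value; with "is defined" alone it still runs.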
 

I didn't want to toy with the control logic, but I realized that this was a non-issue. The error in this occurs in the Configuring OVN step, which in my configure.yml is near the end of the second block. The when statements are working fine, otherwise it wouldn't be executing those steps.

I dug in further, and the issue comes about when the installer attempts to run:
vdsm-tool ovn-config <IP-Central> <FQDN>

I tried this on my own system:
[root@b-drone11 ~]# vdsm-tool ovn-config 10.99.8.31 b-drone11.arcc.uwyo.edu
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/vdsm/tool/ovn_config.py", line 117, in get_network
    return networks[net_name]
KeyError: 'b-drone11.arcc.uwyo.edu'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 195, in main
    return tool_command[cmd]["command"](*args)
  File "/usr/lib/python3.9/site-packages/vdsm/tool/ovn_config.py", line 63, in ovn_config
    ip_address = get_ip_addr(get_network(network_caps(), net_name))
  File "/usr/lib/python3.9/site-packages/vdsm/tool/ovn_config.py", line 119, in get_network
    raise NetworkNotFoundError(net_name)
vdsm.tool.ovn_config.NetworkNotFoundError: b-drone11.arcc.uwyo.edu


It's the same error as in the host-deploy logs. If you dig in a bit more, you'll find that in the ovn_config.py script referred to by the above output, there's a function get_network() that is raising the error:
def get_network(net_caps, net_name):
    networks = net_caps['networks']
    try:
        return networks[net_name]
    except KeyError:
        raise NetworkNotFoundError(net_name)

Digging in EVEN further, if you look at where the function is called and how the "net_name" variable comes in, you'll find that it's only run when an FQDN is given as an argument to vdsm-tool ovn-config instead of an IP:

    if is_ipaddress(args[2]):
        ip_address = args[2]
    else:
        net_name = args[2]
        ip_address = get_ip_addr(get_network(network_caps(), net_name))
        if not ip_address:
            raise IpAddressNotFoundError(net_name)


If you look above this block in the script, you can see the docstring quoted below. It states that the second argument is an IP or a network name, and that the FQDN comes only after that. So this is tied to the Ansible condition: we are getting the second parameter as an empty string.

    """
    ovn-config IP-central [tunneling-IP|tunneling-network] host-fqdn
    Configures the ovn-controller on the host.

    Parameters:
    IP-central - the IP of the engine (the host where OVN central is located)
    tunneling-IP - the local IP which is to be used for OVN tunneling
    tunneling-network - the vdsm network meant to be used for OVN tunneling
    host-fqdn - FQDN that will be set as system-id for OvS (optional)
    """



Now, this is as far as I got. As for WHY the get_network() function isn't working, I haven't looked further into the oVirt code and can't say. But it appears that this function somehow fails when attempting to resolve FQDNs. Which brings me to the WORKAROUND!


So get_network() isn't really buggy in this sense: it expects a network name, not an FQDN.
 

Since the error lies in translating an FQDN to an IP, if you instead provide an IP address in the first place, it completely bypasses the buggy get_network() function and lets you add a host.
 
This is actually not a workaround; it is how the command is supposed to be called.


So, when you run the host deploy, if you add the host using its IP address instead of its FQDN, it goes through fine. I've tested this on my cluster and it worked beautifully.

The only caveat is you can't add with the FQDN, but for now, our cluster is up and working.
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/MXRNYITWWLR4RUWXRS7YIHSDCXG5PMK4/

With that being said, the problem is somewhere in the engine, in how it propagates "ovn_central" and why that ends up being an empty string.

Hopefully this helps.
Best regards,
Ales
 
--

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA

amusil@redhat.com