[ovirt-devel] node zero Networking

Yedidyah Bar David didi at redhat.com
Mon Dec 25 14:14:32 UTC 2017


Hi all,

I spent quite some time trying to deploy node zero while looking into
https://bugzilla.redhat.com/show_bug.cgi?id=1528253 , and it always
fails near the end with:

ERROR fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true,
"cmd": "ip rule list | grep ovirtmgmt | sed s/\\\\[.*\\\\]\\ //g | awk
'{ print $9 }'", "delta": "0:00:00.008292", "end": "2017-12-25
11:51:39.146800", "rc": 0, "start": "2017-12-25 11:51:39.138508",
"stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
ERROR Failed to execute stage 'Closing up': Failed executing ansible-playbook
ERROR Hosted Engine deployment failed: this system is not reliable,
please check the issue,fix and redeploy

I use the following setup:

I have a libvirt vm on my laptop, with a single virtual nic eth0.

This nic is connected to a bridge called intbr on my laptop. This
bridge has no access to the outside, and VMs on it have no default
route. A local dhcp+dns server serves this bridge, using the address
range 192.168.3.0/24.
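
For reference, the laptop-side setup amounts to roughly the following
(illustrative only; the laptop's 192.168.3.1 address, the MAC and the
exact dnsmasq options are guesses, not my literal configuration):

  # create an isolated bridge with no uplink, addressed on the laptop
  ip link add intbr type bridge
  ip addr add 192.168.3.1/24 dev intbr
  ip link set intbr up
  # serve dhcp+dns on that bridge only, reserving 192.168.3.42 for the vm
  dnsmasq --interface=intbr --bind-interfaces \
          --dhcp-range=192.168.3.10,192.168.3.200 \
          --dhcp-host=52:54:00:aa:bb:cc,192.168.3.42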

The vm serves as a nested-kvm hosted-engine host.

eth0 gets a fixed IP address, 192.168.3.42, from dhcpd (a static lease).

There is also a bridge there called virbr0 (I didn't check what
exactly creates it; I think it's libvirt's default network). virbr0
has the IP address 192.168.122.1/24.
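
If it is indeed libvirt's default network, I guess something like this
would confirm it ('default' being the usual network name, an assumption
on my side); the latter should show the virbr0 bridge name and the
192.168.122.0/24 range:

# virsh net-list --all
# virsh net-dumpxml default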

When I deploy HE, the engine machine also gets a single virtual nic,
which is connected to virbr0 and gets an IP address in that range
(currently 192.168.122.85).
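
For completeness, this is how I'd check which bridge and address the
local engine vm got (the 'HostedEngineLocal' domain name is a guess on
my side):

# virsh list --all
# virsh domiflist HostedEngineLocal
# virsh net-dhcp-leases default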

The deploy fails when running this task:

  - name: Get ovirtmgmt route table id
    shell: ip rule list | grep ovirtmgmt | sed s/\\[.*\\]\ //g | awk '{ print $9 }'
    register: ovirtmgmt_table_id
    until: ovirtmgmt_table_id.stdout_lines|length >= 1
    retries: 50
    delay: 10
    changed_when: True

The output of 'ip rule list' is:

0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

So it does not include 'ovirtmgmt'.
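
For reference, I assume the task is waiting for vdsm's source-routing
rules to show up, i.e. for something roughly like the following
(hypothetical output, not from my host; the exact fields may differ),
from which $9 would be the table id:

# ip rule list | grep ovirtmgmt
32764:  from all to 192.168.3.0/24 iif ovirtmgmt lookup 3232236290
# ip rule list | grep ovirtmgmt | sed s/\\[.*\\]\ //g | awk '{ print $9 }'
3232236290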

I do have:

# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;             8000.000000000000       no
ovirtmgmt               8000.06d1bd012412       no              eth0
virbr0          8000.525400012499       yes             virbr0-nic
                                                        vnet0

And:

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
master ovirtmgmt state UP qlen 1000
    link/ether 06:d1:bd:01:24:12 brd ff:ff:ff:ff:ff:ff
18: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UP qlen 1000
    link/ether 52:54:00:01:24:99 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
19: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master
virbr0 state DOWN qlen 1000
    link/ether 52:54:00:01:24:99 brd ff:ff:ff:ff:ff:ff
20: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
master virbr0 state UNKNOWN qlen 1000
    link/ether fe:d1:bd:01:24:04 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcd1:bdff:fe01:2404/64 scope link
       valid_lft forever preferred_lft forever
21: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 1e:1b:84:c2:51:ff brd ff:ff:ff:ff:ff:ff
22: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
noqueue state UP qlen 1000
    link/ether 06:d1:bd:01:24:12 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.42/24 brd 192.168.3.255 scope global dynamic ovirtmgmt
       valid_lft 70927sec preferred_lft 70927sec
    inet6 fe80::4d1:bdff:fe01:2412/64 scope link
       valid_lft forever preferred_lft forever

(And of course I told the deploy script that I want to use eth0.)

Questions:

1. Did this already work for anyone at all? If so, can you please
share details? Specifically, how was networking configured?

2. It might be that my problems are due to not having a (default)
route for the ovirtmgmt bridge/network. If so, I consider this a bug,
but I don't mind configuring one for now.
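
If that's indeed the issue, I'd probably just do something like the
following for now (assuming the laptop gets an address such as
192.168.3.1 on intbr to act as a gateway; the address is illustrative):

# ip route add default via 192.168.3.1 dev ovirtmgmt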

3. The entire relevant section of the playbook is preceded by this comment:

  # all of the next is a workaroud for the network issue, vdsm installation breaks the routing and it needs to be fixed
  # once we'll fix the host installation it could be removed

Do we have specific details/bug/whatever about the problem we are
working around? Perhaps it's already solved and I can try to remove
this part?

4. Both now (with (3.) being worked around) and eventually (once
whatever it is gets fixed), how should this work? Should the local
engine vm indeed start connected to virbr0 and then move to ovirtmgmt?
Or should only the new engine vm (residing on the shared storage) be
on ovirtmgmt?

5. In particular, what should I supply for the "engine fqdn", and what
should it resolve to, both in the beginning and eventually?
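
For illustration only (made-up name and addresses): should it resolve
like this at first, while the local engine vm is still on virbr0:

# getent hosts engine.example.com
192.168.122.85  engine.example.com

and only eventually like this, on the intbr/ovirtmgmt range:

# getent hosts engine.example.com
192.168.3.50    engine.example.com

Or should it point at the final address from the start?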

Thanks,
-- 
Didi

