node zero Networking


On Mon, Dec 25, 2017 at 4:14 PM, Yedidyah Bar David <didi@redhat.com> wrote:
Hi all,
I spent quite some time trying to deploy node zero while looking at https://bugzilla.redhat.com/show_bug.cgi?id=1528253 , and it always fails near the end with:
ERROR fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "ip rule list | grep ovirtmgmt | sed s/\\\\[.*\\\\]\\ //g | awk '{ print $9 }'", "delta": "0:00:00.008292", "end": "2017-12-25 11:51:39.146800", "rc": 0, "start": "2017-12-25 11:51:39.138508", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
ERROR Failed to execute stage 'Closing up': Failed executing ansible-playbook
ERROR Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy
I use the following setup:
I have a libvirt vm on my laptop, with a single virtual nic eth0.
This nic is connected to a bridge called intbr on my laptop. The bridge has no access to the outside, and VMs on it have no default route. A local dhcp+dns server serves this bridge, using the address range 192.168.3.0/24.
The vm serves as a nested-kvm hosted-engine host.
eth0 gets a fixed IP address, 192.168.3.42, via a static lease from dhcpd.
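For completeness, such a static lease would typically look something like the following host block in dhcpd.conf on the laptop (the host name 'he-host' is made up; the MAC is eth0's, as shown in 'ip a' below):

    host he-host {
        # MAC of the VM's eth0; fixed-address pins its lease to 192.168.3.42
        hardware ethernet 06:d1:bd:01:24:12;
        fixed-address 192.168.3.42;
    }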
There is also a bridge there called virbr0 (I didn't check what exactly creates it; I think it's libvirt's default network). virbr0 has the IP address 192.168.122.1/24.
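If it is indeed libvirt's default network, 'virsh net-dumpxml default' should show roughly this (trimmed; uuid/mac omitted):

    # virsh net-dumpxml default
    <network>
      <name>default</name>
      <forward mode='nat'/>
      <bridge name='virbr0' stp='on' delay='0'/>
      <ip address='192.168.122.1' netmask='255.255.255.0'>
        <dhcp>
          <range start='192.168.122.2' end='192.168.122.254'/>
        </dhcp>
      </ip>
    </network>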
When I deploy HE, the engine machine also gets a single virtual nic, which is connected to virbr0 and gets an IP address in that range (currently 192.168.122.85).
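One can confirm that with virsh, e.g. (the local engine VM's name is whatever the deploy created; 'HostedEngineLocal' here is my guess):

    # virsh list --all                      # find the local engine VM's name
    # virsh domiflist HostedEngineLocal     # its nic should show bridge virbr0
    # virsh net-dhcp-leases default         # the 192.168.122.x lease it got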
The deploy fails when running the task:
- name: Get ovirtmgmt route table id
  shell: ip rule list | grep ovirtmgmt | sed s/\\[.*\\]\ //g | awk '{ print $9 }'
  register: ovirtmgmt_table_id
  until: ovirtmgmt_table_id.stdout_lines|length >= 1
  retries: 50
  delay: 10
  changed_when: True
The output of 'ip rule list' is:
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
So it does not include 'ovirtmgmt'.
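My understanding is that the task is waiting for the source-routing rules vdsm adds for ovirtmgmt, i.e. it expects 'ip rule list' to eventually contain something along these lines (table number invented for illustration), from which awk '{ print $9 }' would pick the table id after 'lookup':

    32764:  from all to 192.168.3.0/24 iif ovirtmgmt lookup 3232236330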
I do have:
# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
ovirtmgmt       8000.06d1bd012412       no              eth0
virbr0          8000.525400012499       yes             virbr0-nic
                                                        vnet0

And:
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP qlen 1000
    link/ether 06:d1:bd:01:24:12 brd ff:ff:ff:ff:ff:ff
18: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 52:54:00:01:24:99 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
19: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 1000
    link/ether 52:54:00:01:24:99 brd ff:ff:ff:ff:ff:ff
20: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN qlen 1000
    link/ether fe:d1:bd:01:24:04 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcd1:bdff:fe01:2404/64 scope link
       valid_lft forever preferred_lft forever
21: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 1e:1b:84:c2:51:ff brd ff:ff:ff:ff:ff:ff
22: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 06:d1:bd:01:24:12 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.42/24 brd 192.168.3.255 scope global dynamic ovirtmgmt
       valid_lft 70927sec preferred_lft 70927sec
    inet6 fe80::4d1:bdff:fe01:2412/64 scope link
       valid_lft forever preferred_lft forever
(And of course I told deploy that I want to use eth0.)
Questions:
1. Did this already work for anyone at all? If so, can you please share details? Specifically, how was networking configured?
2. It might be that my problems are due to not having a (default) route for the ovirtmgmt bridge/network. If so, I consider this a bug, but I don't mind configuring one for now.
For the record, adding a default route indeed solved this issue.
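For anyone else hitting this, the manual fix is something like the following on the host, assuming the laptop's intbr address (192.168.3.1 here, made up) can act as a gateway:

    # add a default route via the laptop's bridge address, through ovirtmgmt
    ip route add default via 192.168.3.1 dev ovirtmgmt

To make it survive redeploys one could instead hand out the gateway from dhcpd ('option routers 192.168.3.1;'), so ovirtmgmt picks it up together with its DHCP lease.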
3. The relevant section of the playbook has a comment preceding it:
# all of the next is a workaroud for the network issue, vdsm installation breaks the routing and it needs to be fixed
# once we'll fix the host installation it could be removed
Do we have specific details/bug/whatever about the problem we are working around? Perhaps it's already solved and I can try to remove this part?
4. Both now (with (3.) being worked around) and eventually (when it's (what?) fixed), how should this work? Should the engine local vm indeed start connected to virbr0, and then move to ovirtmgmt? Or should only the new engine vm (residing on the shared storage) be on ovirtmgmt?
5. In particular, what should I supply for the "engine fqdn", and what should it resolve to, both in the beginning and eventually?
Thanks, -- Didi
-- Didi