Hi Simone,
Here is the value of local_vm_ip during the bootstrap VM phase:
TASK [Create local VM]
TASK [Get local VM IP]:
2018-10-14 11:50:40,831+0100 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 local_vm_ip: {'stderr_lines': [], u'changed': True, u'end': u'2018-10-14 11:50:39.860036', u'stdout': u'192.168.124.16', u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:31:d3:9e | awk '{ print $5 }' | cut -f1 -d'/'", 'failed': False, 'attempts': 2, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.058237', 'stdout_lines': [u'192.168.124.16'], u'start': u'2018-10-14 11:50:39.801799'}
And this is its value at the moment the setup crashes (at the usual place – “Clean /etc/hosts on the host”):
2018-10-14 12:58:23,373+0100 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 local_vm_ip: {'stderr_lines': [], u'changed': True, u'end': u'2018-10-14 12:58:22.193727', u'stdout': u'', u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:31:d3:9e | awk '{ print $5 }' | cut -f1 -d'/'", 'failed': False, u'delta': u'0:00:00.069294', u'stderr': u'', u'rc': 0, 'stdout_lines': [], u'start': u'2018-10-14 12:58:22.124433'}
stdout and stdout_lines are now empty, which would explain the "list object has no element 0" error.
After the error, I ran this command:
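To illustrate the difference between the two results, here is a minimal Python sketch. The two dictionaries are trimmed copies of the logged values above, and first_stdout_line is a hypothetical helper name, not part of ovirt-hosted-engine-setup. It only shows why indexing element 0 works in the first case and cannot work in the second.

```python
# Trimmed copies of the two registered results from the log above.
bootstrap_result = {"rc": 0, "stdout": "192.168.124.16",
                    "stdout_lines": ["192.168.124.16"]}
later_result = {"rc": 0, "stdout": "", "stdout_lines": []}


def first_stdout_line(result):
    """Return the first stdout line, or None when the command printed nothing.

    Hypothetical helper: indexing result["stdout_lines"][0] directly would
    raise an error on the empty list, much like the failing task does.
    """
    lines = result.get("stdout_lines", [])
    return lines[0] if lines else None


for label, result in (("bootstrap", bootstrap_result), ("later", later_result)):
    print(label, first_stdout_line(result))
```

Note that rc is 0 in both cases, so the command itself did not fail; it simply matched no lease the second time.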
[root@host04 ansible]# virsh -r net-dhcp-leases default
Expiry Time MAC address Protocol IP address Hostname Client ID or DUID
-------------------------------------------------------------------------------------------------------------------
[root@host04 ansible]#
Is the problem that I have assigned a static IP (using cloudinitVMStaticCIDR), yet the command Ansible is running to obtain the IP address (virsh -r net-dhcp-leases) only returns IPs obtained by DHCP, not static ones? Perhaps the bootstrap VM always obtains a DHCP IP on the natted network, so this command works at that earlier stage. Your documentation doesn’t mention a net-xxx command to obtain a list of static IPs.
Not sure if it's helpful, but here’s the output of a net-dumpxml command:
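As a rough way to reason about that pipeline, the Python sketch below mimics grep -i, awk '{ print $5 }' and cut -f1 -d'/' on the lease table text. The function name lease_ip_for_mac is hypothetical, and the sample lease row is an assumption made for illustration: only the MAC and IP come from the log above, while the expiry time, the /24 prefix and the trailing columns are invented.

```python
HEADER = (
    " Expiry Time MAC address Protocol IP address Hostname Client ID or DUID\n"
    "-----------------------------------------------------------------------\n"
)

# Hypothetical lease row (expiry time, prefix and last columns are assumed).
LEASE_ROW = " 2018-10-14 12:50:39  00:16:3e:31:d3:9e  ipv4  192.168.124.16/24  -  -\n"


def lease_ip_for_mac(leases_output, mac):
    """Return the leased IP for `mac`, or None when no lease row matches."""
    for line in leases_output.splitlines():
        if mac.lower() not in line.lower():   # grep -i MAC
            continue
        fields = line.split()
        if len(fields) >= 5:                  # awk '{ print $5 }'
            return fields[4].split("/")[0]    # cut -f1 -d'/'
    return None


mac = "00:16:3e:31:d3:9e"
print(lease_ip_for_mac(HEADER + LEASE_ROW, mac))  # table with a lease
print(lease_ip_for_mac(HEADER, mac))              # empty table, as seen after the error
```

With an empty lease table the lookup returns nothing, which matches the empty stdout in the second log entry.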
[root@host04 ansible]# virsh -r net-dumpxml default
<network>
  <name>default</name>
  <uuid>91fe2eee-c36e-4d08-9928-bff2d036aca5</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:22:1b:4b'/>
  <ip address='192.168.124.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.124.2' end='192.168.124.254'/>
    </dhcp>
  </ip>
</network>
[root@host04 ansible]#
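For what it's worth, a small Python sketch can confirm which addresses belong to this natted network. It parses the XML shown above and checks the bootstrap address and the static address from answers.conf against the subnet. The function name nat_subnet is hypothetical; the sketch only uses the standard library and makes no claim about how the setup scripts decide anything.

```python
import ipaddress
import xml.etree.ElementTree as ET

# The network definition as printed by net-dumpxml above.
NETWORK_XML = """<network>
  <name>default</name>
  <uuid>91fe2eee-c36e-4d08-9928-bff2d036aca5</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:22:1b:4b'/>
  <ip address='192.168.124.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.124.2' end='192.168.124.254'/>
    </dhcp>
  </ip>
</network>"""


def nat_subnet(xml_text):
    """Build the subnet from the <ip address=... netmask=...> element."""
    ip = ET.fromstring(xml_text).find("ip")
    return ipaddress.ip_network(
        f"{ip.get('address')}/{ip.get('netmask')}", strict=False
    )


subnet = nat_subnet(NETWORK_XML)
for addr in ("192.168.124.16", "10.0.0.109"):
    print(addr, "in", subnet, "->", ipaddress.ip_address(addr) in subnet)
```

The static address 10.0.0.109 lies outside this subnet, so it would never show up as a lease on the default network.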
Many thanks,
Brendan
From: Brendan Holmes <me@brendanh.com>
Sent: 13 October 2018 23:09
To: 'Simone Tiraboschi' <stirabos@redhat.com>
Cc: 'users' <users@ovirt.org>
Subject: RE: [ovirt-users] Re: Diary of hosted engine install woes
Hi Simone,
“restart from scratch deploying with a static IP”. Okay, I have reinstalled the host using oVirt Node from scratch. I am assigning a static IP using the attached answers.conf which contains:
OVEHOSTED_VM/cloudinitVMStaticCIDR=str:10.0.0.109/24
create_target_vm.yml and all other Red Hat code is as shipped. I’m getting the same error:
[ INFO ] TASK [Copy /etc/hosts back to the Hosted Engine VM]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Copy local VM disk to shared storage]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Clean /etc/hosts on the host]
[ ERROR ] fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to have been in '/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.yml': line 396, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n changed_when: True\n - name: Clean /etc/hosts on the host\n ^ here\n"}
Any ideas? How can I debug this failure to assign an IP (undefined variable)?
Many thanks,
Brendan
From: Simone Tiraboschi <stirabos@redhat.com>
Sent: 10 October 2018 12:06
To: B Holmes <me@brendanh.com>
Cc: users <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Diary of hosted engine install woes
On Tue, Oct 9, 2018 at 11:50 PM Brendan Holmes <me@brendanh.com> wrote:
Hi Simone,
Yes the MAC address in answers.conf: OVEHOSTED_VM/vmMACAddr=
is added as a reservation to the DHCP server, so in theory 10.0.0.109 should be assigned.
However perhaps DHCP is not working. I have just changed to a static IP instead:
OVEHOSTED_VM/cloudinitVMStaticCIDR=str:10.0.0.109/24
(let me know if this isn’t the correct way)
My host fails to get an IP automatically from this DHCP server, so it is quite possible the engine’s DHCP has been failing too. Each time the host boots, I must type dhclient in order to receive an IP address. Anyway, after changing this and re-running hosted-engine --deploy, it failed with:
[ INFO ] TASK [Copy local VM disk to shared storage]
[ INFO ] changed: [localhost]
[ INFO ] TASK [show local_vm_ip.std_out_lines[0] that will be written to etc hosts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [show FQDN]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Clean /etc/hosts on the host]
[ ERROR ] fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to have been in '/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.yml': line 400, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n debug: var=FQDN\n - name: Clean /etc/hosts on the host\n ^ here\n"}
I have just tried deploying using the web UI; same error. I suspect the “undefined variable” is local_vm_ip.stdout_lines[0]. My new debug task that tries to output this is:
- name: show local_vm_ip.std_out_lines[0] that will be written to etc hosts
  debug: var=local_vm_ip.stdout_lines[0]
You can see the output of this above. I think I was mistaken to suggest the value of this is localhost. Localhost is just the machine this task ran on. I don’t think the list local_vm_ip.stdout_lines is defined. Any more ideas?
The issue is on a task that isn't part of the code we are shipping.
I can only suggest that you simply reinstall the rpm to get rid of any modifications and restart from scratch deploying with a static IP if your DHCP server is not working properly.
Many thanks
From: Simone Tiraboschi <stirabos@redhat.com>
Sent: 09 October 2018 16:51
To: B Holmes <me@brendanh.com>
Cc: users <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Diary of hosted engine install woes
On Tue, Oct 9, 2018 at 4:54 PM <me@brendanh.com> wrote:
I've added a record to the DNS server here:
ovirt-engine.example.com 10.0.0.109
OK, and how will the engine VM get that address?
Are you using DHCP? Do you have a DHCP reservation for the MAC address you are using on the engine VM?
Are you configuring it with a static IP?
This IP address is on the physical network that the host is on (host is on 10.0.0.171). I trust this is correct and I should not resolve to a natted IP instead. I notice that regardless of this record, the name ovirt-engine.example.com resolves to a natted IP: 192.168.124.51 because the ansible script adds an entry to /etc/hosts:
192.168.124.51 ovirt-engine.example.com
While the script is running, I can successfully ping ovirt-engine.example.com; it responds on 192.168.124.51. So as you say: "host can correctly resolve the name of the engine VM", but it's not the DNS record's IP. If I remove the DNS record and run hosted-engine --deploy, I get this error:
[ ERROR ] Host name is not valid: ovirt-engine.example.com did not resolve into an IP address
Anyway, I added back the DNS record and ran the hosted-engine --deploy command; it failed at:
[ INFO ] TASK [Clean /etc/hosts on the host]
[ ERROR ] fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to have been in '/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.yml': line 396, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n changed_when: True\n - name: Clean /etc/hosts on the host\n ^ here\n"}
To debug, I added tasks to create_target_vm.yml that output the values of local_vm_ip.stdout_lines[0] and FQDN, which are used in this task, then ran the usual deploy command again. They are both localhost:
[ INFO ] TASK [show local_vm_ip.std_out_lines[0] that will be written to etc hosts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [show FQDN]
[ INFO ] ok: [localhost]
This time, it gets past [Clean /etc/hosts on the host], but hangs at [ INFO ] TASK [Check engine VM health] same as before.
This is fine: the bootstrap local VM runs over a natted network; then, once ready, it will be shut down and moved to the shared storage. At that point it will be restarted on your management network.
I catted /etc/hosts while it was hanging and it contains:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
The ovirt-engine.example.com has been deleted! I pinged ovirt-engine.example.com and it now resolves to its IP on the physical network: 10.0.0.109. So I added back this /etc/hosts entry:
192.168.124.51 ovirt-engine.example.com
Please avoid this.
It subsequently errored:
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.167559", "end": "2018-10-09 15:43:41.947274", "rc": 0, "start": "2018-10-09 15:43:41.779715", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=6810 (Tue Oct 9 15:43:36 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=6810 (Tue Oct 9 15:43:37 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"host\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c5d76f8b\", \"local_conf_timestamp\": 6810, \"host-ts\": 6810}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=6810 (Tue Oct 9 15:43:36 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=6810 (Tue Oct 9 15:43:37 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"host\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c5d76f8b\", \"local_conf_timestamp\": 6810, \"host-ts\": 6810}, \"global_maintenance\": false}"]}
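The JSON in that error is hard to read inline, so here is a minimal Python sketch that pulls out the engine status per host. SAMPLE is a trimmed copy of the stdout above, and engine_health is a hypothetical helper name; in practice the input would presumably be the output of hosted-engine --vm-status --json.

```python
import json

# Trimmed copy of the "stdout" value from the error above.
SAMPLE = (
    '{"1": {"hostname": "host", "host-id": 1, '
    '"engine-status": {"reason": "failed liveliness check", '
    '"health": "bad", "vm": "up", "detail": "Up"}, "score": 3400}, '
    '"global_maintenance": false}'
)


def engine_health(vm_status_json):
    """Map each hostname to its (vm, health, reason) engine status."""
    report = {}
    for key, host in json.loads(vm_status_json).items():
        if not isinstance(host, dict):
            continue  # skip top-level flags such as global_maintenance
        engine = host.get("engine-status", {})
        report[host.get("hostname", key)] = (
            engine.get("vm"),
            engine.get("health"),
            engine.get("reason"),
        )
    return report


print(engine_health(SAMPLE))
```

Here the VM is reported as up while its health is bad with reason "failed liveliness check", i.e. the VM is running but the engine inside it is not answering.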
How can I check the hosted-engine's IP address to ensure name resolution is correct?
You can connect to that VM with VNC and check the IP there.
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/SVBXIBLS5TSP7SZROSSE6JD5ICBZLV3E/