On Tue, Oct 9, 2018 at 4:54 PM <me@brendanh.com> wrote:
I've added a record to the DNS server here:
ovirt-engine.example.com  10.0.0.109

OK, and how will the engine VM get that address?
Are you using DHCP? Do you have a DHCP reservation for the MAC address you are using on the engine VM?
Are you configuring it with a static IP?
 

This IP address is on the physical network that the host is on (the host is on 10.0.0.171).  I trust this is correct and that the name should not resolve to a natted IP instead.  I notice that, regardless of this record, the name ovirt-engine.example.com resolves to a natted IP, 192.168.124.51, because the ansible script adds an entry to /etc/hosts:
192.168.124.51  ovirt-engine.example.com
While the script is running, I can successfully ping ovirt-engine.example.com; it responds on 192.168.124.51.  So as you say: "host can correctly resolve the name of the engine VM", but it's not the DNS record's IP.  If I remove the DNS record and run hosted-engine --deploy, I get this error:
[ ERROR ] Host name is not valid: ovirt-engine.example.com did not resolve into an IP address
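To see which source is supplying the address, a small Python sketch like the one below can help. It lists what an /etc/hosts-style file maps to a name, separately from what the system resolver returns. The function names hosts_entries and resolver_addresses are made up for illustration, and the sample text only mirrors the entry quoted above; this is a rough diagnostic, not part of hosted-engine.

```python
import socket


def hosts_entries(hosts_text, name):
    """Return the IPs that an /etc/hosts-style text maps to `name`."""
    ips = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if name in names:
            ips.append(ip)
    return ips


def resolver_addresses(name):
    """Addresses from the system resolver (typically /etc/hosts first, then DNS)."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []  # the name did not resolve at all
    return sorted({info[4][0] for info in infos})


# Sample text mirroring the entry the ansible script adds (assumed layout).
sample = (
    "127.0.0.1   localhost localhost.localdomain\n"
    "192.168.124.51  ovirt-engine.example.com\n"
)
print(hosts_entries(sample, "ovirt-engine.example.com"))
```

On the host itself, one could pass open("/etc/hosts").read() as hosts_text and compare the result with resolver_addresses("ovirt-engine.example.com"). If the two differ, the /etc/hosts entry is absent and the address is coming from DNS.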

Anyway, I added back the DNS record and ran the hosted-engine --deploy command; it failed at:
[ INFO  ] TASK [Clean /etc/hosts on the host]
[ ERROR ] fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to have been in '/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.yml': line 396, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n    changed_when: True\n  - name: Clean /etc/hosts on the host\n    ^ here\n"}

To debug, I added tasks to create_target_vm.yml that output the values of local_vm_ip.std_out_lines[0] and FQDN that are used in this task, then ran the usual deploy command again.  They are both localhost:
[ INFO  ] TASK [show local_vm_ip.std_out_lines[0] that will be written to etc hosts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [show FQDN]
[ INFO  ] ok: [localhost]

This time, it gets past [Clean /etc/hosts on the host], but hangs at [ INFO  ] TASK [Check engine VM health], same as before.

This is fine: the bootstrap local VM runs over a natted network; then, once ready, it will be shut down and moved to the shared storage. At that point it will be restarted on your management network.
 
  I catted /etc/hosts while it was hanging and it contains:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

The ovirt-engine.example.com entry has been deleted!  I pinged ovirt-engine.example.com and it now resolves to its IP on the physical network: 10.0.0.109.  So I added back this /etc/hosts entry:
192.168.124.51  ovirt-engine.example.com

Please avoid this.
 

It subsequently errored:
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.167559", "end": "2018-10-09 15:43:41.947274", "rc": 0, "start": "2018-10-09 15:43:41.779715", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=6810 (Tue Oct  9 15:43:36 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=6810 (Tue Oct  9 15:43:37 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"host\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c5d76f8b\", \"local_conf_timestamp\": 6810, \"host-ts\": 6810}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=6810 (Tue Oct  9 15:43:36 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=6810 (Tue Oct  9 15:43:37 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"host\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c5d76f8b\", \"local_conf_timestamp\": 6810, \"host-ts\": 6810}, \"global_maintenance\": false}"]}
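The error above is hard to read as a single line. A short Python sketch along these lines could pull the relevant fields out of the `hosted-engine --vm-status --json` output. The function name engine_health is hypothetical, and the sample is a trimmed copy of the JSON quoted above. It assumes that "engine-status" is a nested object and that "extra" holds newline-separated key=value pairs, as they appear in this output.

```python
import json


def engine_health(vm_status_json):
    """Map host key -> (state, health, reason) from the vm-status JSON text."""
    status = json.loads(vm_status_json)
    result = {}
    for key, host in status.items():
        if not isinstance(host, dict):
            continue  # skip top-level flags such as "global_maintenance"
        engine = host.get("engine-status", {})
        # "extra" is newline-separated key=value text; extract "state" from it.
        extra = dict(
            line.split("=", 1)
            for line in host.get("extra", "").splitlines()
            if "=" in line
        )
        result[key] = (extra.get("state"), engine.get("health"), engine.get("reason"))
    return result


# Trimmed from the output quoted above; most fields omitted for brevity.
sample_status = (
    '{"1": {"extra": "state=EngineStarting\\nstopped=False\\n", '
    '"engine-status": {"reason": "failed liveliness check", '
    '"health": "bad", "vm": "up", "detail": "Up"}}, '
    '"global_maintenance": false}'
)
print(engine_health(sample_status))
```

For the output quoted here, this reports host 1 in state EngineStarting with health "bad" and reason "failed liveliness check". So the VM itself is up, but the engine inside it is not answering the health check.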

How can I check the hosted-engine's IP address to ensure name resolution is correct?

You can connect to that VM with VNC and check the IP there.
 
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/SVBXIBLS5TSP7SZROSSE6JD5ICBZLV3E/