I've added a record to the DNS server here:
ovirt-engine.example.com 10.0.0.109
This IP address is on the same physical network as the host (the host is 10.0.0.171). I
trust this is correct and that the name should not resolve to a NATted IP instead. I notice
that, regardless of this record, the name ovirt-engine.example.com resolves to a NATted IP,
192.168.124.51, because the Ansible script adds an entry to /etc/hosts:
192.168.124.51 ovirt-engine.example.com
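To make the /etc/hosts override easier to spot, here is a minimal sketch of a lookup over hosts-file text. The function name lookup_in_hosts is just something I made up for illustration, and the sample text is the entry quoted above plus a localhost line; on a real host you would pass in the contents of /etc/hosts instead.

```python
def lookup_in_hosts(hosts_text, name):
    """Return the first IP that the hosts-file text maps to `name`, or None."""
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        # First field is the address, the rest are names/aliases.
        ip, *names = entry.split()
        if name in names:
            return ip
    return None


# Sample text mirroring the entry the deploy script adds (illustrative only).
sample_hosts = (
    "127.0.0.1 localhost localhost.localdomain\n"
    "192.168.124.51 ovirt-engine.example.com\n"
)
print(lookup_in_hosts(sample_hosts, "ovirt-engine.example.com"))
```

If this returns an address, the name is being pinned by /etc/hosts, and on a typical setup that entry is consulted before DNS.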
While the script is running, I can successfully ping ovirt-engine.example.com, and it
responds on 192.168.124.51. So, as you say, "host can correctly resolve the name of
the engine VM", but not to the DNS record's IP. If I remove the DNS record
and run hosted-engine --deploy, I get this error:
[ ERROR ] Host name is not valid: ovirt-engine.example.com did not resolve into an IP address
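As a cross-check of what the host's resolver actually returns (as opposed to what the DNS server holds), something like the sketch below could be run on the host. It uses Python's standard socket.getaddrinfo, which goes through the system resolver and so normally honours /etc/hosts as well as DNS. The helper name resolve_all and the injectable resolver argument are my own additions for illustration, not anything from the oVirt tooling.

```python
import socket


def resolve_all(name, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated addresses the resolver gives for `name`.

    An empty list means the name did not resolve at all.
    """
    try:
        infos = resolver(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hypothetical usage on the host; output depends on /etc/hosts and DNS.
    print(resolve_all("ovirt-engine.example.com"))
```

An empty list would correspond to the "did not resolve into an IP address" case, while 192.168.124.51 versus 10.0.0.109 would show whether the /etc/hosts entry or the DNS record is winning.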
Anyway, I added the DNS record back and ran hosted-engine --deploy again. It failed at:
[ INFO ] TASK [Clean /etc/hosts on the host]
[ ERROR ] fatal: [localhost]: FAILED! => {"msg": "The task includes an
option with an undefined variable. The error was: list object has no element 0\n\nThe
error appears to have been in
'/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.yml': line 396,
column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe
offending line appears to be:\n\n changed_when: True\n - name: Clean /etc/hosts on the
host\n ^ here\n"}
To debug, I added tasks to create_target_vm.yml that output the values of
local_vm_ip.std_out_lines[0] and FQDN that are used in this task, then ran the usual
deploy command again. They are both localhost:
[ INFO ] TASK [show local_vm_ip.std_out_lines[0] that will be written to etc hosts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [show FQDN]
[ INFO ] ok: [localhost]
This time it got past [Clean /etc/hosts on the host], but hung at [ INFO ] TASK [Check
engine VM health], same as before. I catted /etc/hosts while it was hanging and it
contained:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
The ovirt-engine.example.com entry has been deleted! I pinged ovirt-engine.example.com and it
now resolves to its IP on the physical network: 10.0.0.109. So I added back this
/etc/hosts entry:
192.168.124.51 ovirt-engine.example.com
It subsequently errored:
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120,
"changed": true, "cmd": ["hosted-engine",
"--vm-status", "--json"], "delta":
"0:00:00.167559", "end": "2018-10-09 15:43:41.947274",
"rc": 0, "start": "2018-10-09 15:43:41.779715",
"stderr": "", "stderr_lines": [], "stdout":
"{\"1\": {\"conf_on_shared_storage\": true,
\"live-data\": true, \"extra\":
\"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=6810 (Tue Oct 9
15:43:36 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=6810 (Tue Oct 9 15:43:37
2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\",
\"hostname\": \"host\", \"host-id\": 1,
\"engine-status\": {\"reason\": \"failed liveliness check\",
\"health\": \"bad\", \"vm\": \"up\",
\"detail\": \"Up\"}, \"score\": 3400, \"stopped\":
false, \"maintenance\": false, \"crc32\": \"c5d76f8b\",
\"local_conf_timestamp\": 6810, \"host-ts\": 6810},
\"global_maintenance\": false}", "stdout_lines":
["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\":
\"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=6810 (Tue Oct 9
15:43:36 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=6810 (Tue Oct 9 15:43:37
2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\",
\"hostname\": \"host\", \"host-id\": 1,
\"engine-status\": {\"reason\": \"failed liveliness check\",
\"health\": \"bad\", \"vm\": \"up\",
\"detail\": \"Up\"}, \"score\": 3400, \"stopped\":
false, \"maintenance\": false, \"crc32\": \"c5d76f8b\",
\"local_conf_timestamp\": 6810, \"host-ts\": 6810},
\"global_maintenance\": false}"]}
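The escaped JSON above is hard to read, so here is a small sketch for pulling out just the engine-status part of the hosted-engine --vm-status --json output. The function name engine_status is my own, and the sample string is a trimmed copy of the stdout shown above (only a few of the fields are kept).

```python
import json


def engine_status(vm_status_json):
    """Map host id -> "engine-status" dict from `hosted-engine --vm-status --json` output."""
    data = json.loads(vm_status_json)
    return {
        host_id: info["engine-status"]
        for host_id, info in data.items()
        # Skip top-level flags such as "global_maintenance", which are not per-host dicts.
        if isinstance(info, dict) and "engine-status" in info
    }


# Trimmed copy of the stdout from the failed task above.
sample_status = (
    '{"1": {"hostname": "host", "host-id": 1, '
    '"engine-status": {"reason": "failed liveliness check", '
    '"health": "bad", "vm": "up", "detail": "Up"}, '
    '"score": 3400}, "global_maintenance": false}'
)

for host_id, status in engine_status(sample_status).items():
    print(host_id, status["vm"], status["health"], status.get("reason"))
```

Read this way, the output says the VM itself is up but the liveliness check against the engine is failing.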
How can I check the hosted-engine's IP address to ensure name resolution is correct?