Two items here.
oVirt version 3.6.4, fresh install (not an upgrade).
First, I noticed this issue when I installed on a test machine, but I
didn't have the data to present. Because of that, and some other posts
dealing with the network issue, I kept notes when I installed on my
production system. I'm doing a hosted-engine setup.
As part of the preparation I did the following before installing and
deploying.
* Removed NetworkManager with yum remove NetworkManager
* The NIC that will be used for oVirt management is connected to a
switch port expecting VLAN 50, so I set up a VLAN 50 ifcfg file.
* The IP address of the server, prefix, gateway, and TWO DNS servers
were set up in the ifcfg file, and name resolution worked. I could ping
the host by name, and the oVirt Engine VM's name, which was in DNS, also
resolved, although obviously nothing replied yet. Other servers and
workstations could resolve the host and engine names.
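For reference, a VLAN ifcfg file with these settings might look like the output of the sketch below. It is a minimal illustration, not my actual file: the parent NIC name eth0 and all addresses are hypothetical placeholders (from the 192.0.2.0/24 documentation range), and the function name render_vlan_ifcfg is made up for this example. The keys themselves are standard CentOS 7 initscripts ifcfg keys.

```python
from __future__ import annotations


def render_vlan_ifcfg(parent: str, vlan_id: int, ipaddr: str, prefix: int,
                      gateway: str, dns: list[str]) -> str:
    """Render an initscripts-style ifcfg file for a tagged VLAN interface."""
    device = f"{parent}.{vlan_id}"  # e.g. eth0.50 for VLAN 50 on eth0
    lines = [
        f"DEVICE={device}",
        "VLAN=yes",
        "BOOTPROTO=none",
        "ONBOOT=yes",
        "NM_CONTROLLED=no",  # NetworkManager was removed beforehand
        f"IPADDR={ipaddr}",
        f"PREFIX={prefix}",
        f"GATEWAY={gateway}",
    ]
    # DNS servers are numbered DNS1, DNS2, ... in ifcfg files.
    lines += [f"DNS{i}={server}" for i, server in enumerate(dns, start=1)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; substitute the real NIC, addresses and servers.
    print(render_vlan_ifcfg("eth0", 50, "192.0.2.10", 24, "192.0.2.1",
                            ["192.0.2.53", "192.0.2.54"]), end="")
```

The point to notice is the DNS1/DNS2 pair: these are the settings that later went missing from the bridge file.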
1. On the host I ran hosted-engine --deploy and installed the OS (CentOS 7
1511) on the Engine VM. I rebooted the Engine VM and told the deployment
that the Engine VM was running; it then continued and told me to install
the engine on the Engine VM.
2. I updated the Engine VM via yum update, installed the oVirt
repositories, and ran engine-setup, which completed successfully.
3. I then went back to the host and told it the Engine was set up, and
at this point things went bad. The deployment started whining about not
being able to resolve the host myenginevm.mydomain.com, ran cleanup,
pre-termination and termination, and said the deployment had failed and
the system was unreliable, fix it, whine, whine, whine.
4. I tried to ping myenginevm.mydomain.com and it failed.
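Since a ping can fail for reasons other than name resolution, it can help to test the resolver step on its own. A minimal sketch using Python's standard socket module (the function name resolves is hypothetical); it only checks what the host's own resolver configuration returns and is not the check hosted-engine itself performs.

```python
import socket
import sys


def resolves(fqdn: str) -> bool:
    """Return True if the system resolver can map fqdn to an address."""
    try:
        socket.getaddrinfo(fqdn, None)
    except socket.gaierror:
        # Name lookup failed (e.g. no usable nameservers configured).
        return False
    return True


if __name__ == "__main__":
    # Example: python3 check_dns.py myenginevm.mydomain.com
    for name in sys.argv[1:]:
        status = "resolves" if resolves(name) else "does NOT resolve"
        print(f"{name}: {status}")
```

If the name does not resolve here, the problem is DNS configuration rather than the Engine VM being unreachable.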
What I found was that when the bridge was created (ifcfg-ovirtmgmt), the
DNS servers were left out! They were in the original NIC ifcfg file, but
the deployment apparently didn't carry them over to the bridge ifcfg. I
find this very puzzling: the deployment insists on FQDNs, so it should be
smart enough to carry over the DNS server settings rather than drop them.
My /etc/resolv.conf file also had no DNS servers in it.
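A sketch of how the missing settings could be spotted: it compares the DNSn keys of the original NIC ifcfg file with the bridge file and lists the nameserver lines in /etc/resolv.conf. The VLAN file name ifcfg-eth0.50 and the helper names are hypothetical; only the ifcfg-ovirtmgmt and /etc/resolv.conf paths are the standard CentOS 7 locations.

```python
from __future__ import annotations

import re


def parse_ifcfg(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines of an ifcfg file, skipping blanks and comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop optional quotes
    return values


def missing_dns(nic_text: str, bridge_text: str) -> dict[str, str]:
    """Return DNSn entries set in the NIC file but absent from the bridge file."""
    nic, bridge = parse_ifcfg(nic_text), parse_ifcfg(bridge_text)
    return {k: v for k, v in nic.items()
            if re.fullmatch(r"DNS\d+", k) and k not in bridge}


def resolv_nameservers(text: str) -> list[str]:
    """Return the addresses on 'nameserver' lines of resolv.conf contents."""
    servers = []
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


if __name__ == "__main__":
    scripts = "/etc/sysconfig/network-scripts/"
    # ifcfg-eth0.50 is a hypothetical name for the original VLAN file.
    with open(scripts + "ifcfg-eth0.50") as f:
        nic_text = f.read()
    with open(scripts + "ifcfg-ovirtmgmt") as f:
        bridge_text = f.read()
    with open("/etc/resolv.conf") as f:
        resolv_text = f.read()
    print("DNS keys missing from bridge:", missing_dns(nic_text, bridge_text))
    print("nameservers in resolv.conf:", resolv_nameservers(resolv_text))
```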
I added the DNS servers to the bridge ifcfg file, ran systemctl restart
network, and all is well again. The host can ping the VM!
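The manual fix above could be scripted roughly as follows. This is a hedged sketch, assuming the standard path /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt; it replaces any existing DNSn lines with the given servers, keeps a .bak copy, and leaves the network restart to you. The function name add_dns and the script name add_dns.py are hypothetical.

```python
from __future__ import annotations

import re
import shutil
import sys

BRIDGE = "/etc/sysconfig/network-scripts/ifcfg-ovirtmgmt"


def add_dns(bridge_text: str, servers: list[str]) -> str:
    """Return the ifcfg text with existing DNSn lines replaced by DNS1..DNSn."""
    kept = [line for line in bridge_text.splitlines()
            if not re.match(r"\s*DNS\d+=", line)]
    kept += [f"DNS{i}={s}" for i, s in enumerate(servers, start=1)]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    servers = sys.argv[1:]
    if not servers:
        sys.exit("usage: add_dns.py DNS_SERVER [DNS_SERVER ...]")
    shutil.copy2(BRIDGE, BRIDGE + ".bak")  # keep a backup of the original
    with open(BRIDGE) as f:
        text = f.read()
    with open(BRIDGE, "w") as f:
        f.write(add_dns(text, servers))
    print(f"Updated {BRIDGE}; now run: systemctl restart network")
```

Usage (as root): python3 add_dns.py 192.0.2.53 192.0.2.54, then systemctl restart network. The addresses are placeholders.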
However, the deployment thinks it failed, and I cannot restart the
Engine VM. I rebooted and made sure the oVirt daemons were running, but
whenever I try to do anything, such as hosted-engine --vm-start, I get
"Unable to read vm.conf, please check ovirt-ha-agent logs".
Second, I think that having the deployment fail outright simply because
it cannot contact the Engine VM is a serious bug; it's silly. The
deployment went well and the VM exists and is running, but because the
deployment dropped the DNS servers, the host just can't find it. The
deployment should, first, handle the name server setup correctly and,
second, fail gracefully.
I rebooted the server but still get the error about not being able to
read vm.conf. Unless there is a way to work around this, I now have to
run through the entire deployment again just because one phase messed
up. In the work I've done with oVirt, I've noticed the deployment is not
very robust: when it encounters errors, it fails even where it should be
able to recover. I suggest that consideration be given to making the
deployment smarter and more robust.