On Sat, Apr 2, 2016 at 8:14 PM, Brett I. Holcomb <biholcomb(a)l1049h.com> wrote:
On Fri, 2016-04-01 at 21:39 -0400, Brett I. Holcomb wrote:
Two items here.
oVirt version 3.6.4. Fresh install, not an upgrade.
First, I noticed this issue when I did an install on a test machine, but I
didn't have the data to present. Because of that, and some other posts
dealing with the network issue, I kept notes when I installed on my
production system. I'm doing a hosted-engine setup.
As part of the preparation I did the following before installing and
deploying.
* Removed NetworkManager with yum remove NetworkManager
* The NIC that will be used for the oVirt management network is connected to a
switch port expecting VLAN 50, so I set up a VLAN 50 ifcfg file.
* The IP address of the server, prefix, gateway, and TWO DNS servers were
set up in the ifcfg file (see the example sketched after this list), and name
resolution worked. I could ping the host by name, and the oVirt Engine VM's
name was in DNS so it resolved, though obviously nothing would reply yet.
Other servers and workstations could resolve the host and engine names.
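For reference, the VLAN ifcfg looked roughly like this (device name and
addresses are examples, not my real values):

  # /etc/sysconfig/network-scripts/ifcfg-eth0.50
  DEVICE=eth0.50
  VLAN=yes
  ONBOOT=yes
  BOOTPROTO=none
  IPADDR=192.168.50.10
  PREFIX=24
  GATEWAY=192.168.50.1
  # Two DNS servers so /etc/resolv.conf gets populated when the network starts
  DNS1=192.168.50.2
  DNS2=192.168.50.3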
1. On the host I ran hosted-engine --deploy and installed the OS (CentOS 7
(1511)) on the Engine VM. I rebooted the Engine VM and told the deployment
that the Engine VM was running; it then continued and told me to install the
engine on the Engine VM.
2. I updated the Engine VM via yum update, installed the oVirt repositories,
and ran engine-setup, which completed successfully (roughly the commands
sketched after this list).
3. I then went back to the host and told it the Engine was set up, and at this
point things went bad. The deployment started whining about not being able
to resolve the myenginevm.mydomain.com host, did cleanup, pre-termination,
and termination, and said the deployment failed and the system was
unreliable and I should fix it - whine, whine, whine.
4. I tried a ping on myenginevm.mydomain.com and it failed.
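For step 2, the commands were roughly the following (assuming the oVirt 3.6
release RPM; adjust the URL for your version):

  yum -y update
  yum -y install http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
  yum -y install ovirt-engine
  engine-setup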
What I found was that when the bridge was created (ifcfg-ovirtmgmt) the DNS
servers were left out! They were in the original NIC ifcfg file but it
appears the deployment didn't bother to bring them over to the bridge ifcfg.
I find this very puzzling: since the deployment insists on FQDNs, it should
be smart enough to bring over the DNS server settings and not leave them
out. My /etc/resolv.conf file also had no DNS servers in it.
I added the DNS servers to the bridge ifcfg file, did a systemctl restart
network, and all is well again. The host can ping the VM!
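For anyone hitting the same thing, the fix amounts to something like this
(addresses are examples; the important part is the DNS1/DNS2 lines the
deployment dropped):

  # /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
  DEVICE=ovirtmgmt
  TYPE=Bridge
  ONBOOT=yes
  BOOTPROTO=none
  IPADDR=192.168.50.10
  PREFIX=24
  GATEWAY=192.168.50.1
  # These were missing after deployment; adding them repopulates
  # /etc/resolv.conf the next time the network service restarts
  DNS1=192.168.50.2
  DNS2=192.168.50.3

Then:

  systemctl restart network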
However, the deployment thinks it failed and I cannot restart the Engine
VM. I tried a reboot and made sure the ovirt daemons were running, but if I
try to do anything such as hosted-engine --vm-start I get "Unable to read
vm.conf, please check ovirt-ha-agent logs".
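For reference, this is roughly where I'm looking (service names and log path
assume a default hosted-engine install):

  systemctl status ovirt-ha-agent ovirt-ha-broker
  hosted-engine --vm-status
  less /var/log/ovirt-hosted-engine-ha/agent.log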
Second, I think that having the deployment fail simply because it cannot
contact the Engine VM is a very huge error/bug/whatever - it's silly. The
deployment went well, and the VM exists and is running, but because the
deployment messed up the DNS servers it just can't find it. The deployment
should, first, handle the name server setup correctly and, second, fail
gracefully.
I rebooted the server but still get the error about not being able to read
vm.conf. At this point I have to run through the entire deployment again
just because one phase messed up, unless there is a way to work around
this. However, in the work that I've done with oVirt I've noticed the
deployment is not very robust and doesn't recover from errors it should be
able to recover from. I suggest that consideration be given to making the
deployment smarter and more robust.
More info.
The DNS settings get broken during the first phase of hosted-engine --deploy
(before the OS is installed on the Engine VM), which makes sense because I
assume that's when the bridge is created.
I added another logical network with a VLAN tag and this broke name
resolution again. I had to do a systemctl restart network again, and then
name resolution was back.
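So for now, every time a bridge/network change wipes the name servers, the
workaround is simply:

  cat /etc/resolv.conf       # nameserver lines are gone after the change
  systemctl restart network  # re-reads DNS1/DNS2 from the ifcfg files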
I'm attempting to use the web portal but it's very, very slow. When I select
the admin portal it can take 5+ minutes before it displays the login page,
if it ever does and doesn't time out. Once I get the Admin login it goes
pretty quickly. I'm using Firefox 45.0.1 on Fedora 23. Any reason for
this? From what I see, the message about the browser not being supported is
bogus. My host has 64 GB of memory and an E5-2620 v3 processor.
Looks similar to [1]. Adding Simone.
[1]