[ovirt-users] Network issues on install

Yedidyah Bar David didi at redhat.com
Mon Apr 4 09:11:15 UTC 2016


On Sat, Apr 2, 2016 at 8:14 PM, Brett I. Holcomb <biholcomb at l1049h.com> wrote:
>
> On Fri, 2016-04-01 at 21:39 -0400, Brett I. Holcomb wrote:
>
> Two items here.
>
> oVirt version 3.6.4.  Fresh install, not an upgrade.
>
>
> First, I noticed this issue when I did an install on a test machine but I
> didn't have the data to present.  Because of that and some other posts
> dealing with the network issue I kept notes when I installed on my
> production system.   I'm doing a hosted-engine setup.
>
> As part of the preparation I did the following before installing and
> deploying.
>
> * Removed NetworkManager with yum remove NetworkManager
> * The NIC that will be used for the oVirt management NIC is connected to a
> switch port expecting VLAN 50 so I set up a VLAN50 ifcfg file.
> * The IP address of the server, prefix, gateway, and TWO DNS servers were
> set up in the ifcfg file, and name resolution worked.  I could ping the host
> by name, and the oVirt Engine VM was in DNS so its name resolved,
> but obviously nothing would reply.  Other servers and workstations could
> resolve the host and engine names.
>
> 1.  On the host I ran hosted-engine --deploy and installed the OS (CentOS 7
> (1511)) on the Engine VM.  I rebooted the Engine VM, told the deployment that
> the Engine VM was running, and it then continued and told me to
> install the engine on the Engine VM.
> 2.  I updated the Engine VM via yum update, installed the oVirt
> repositories, and ran engine-setup, which completed successfully.
> 3. I then went back to the host and told it the Engine was set up, and at this
> point things went bad.  The deployment started whining about not being able
> to resolve myenginevm.mydomain.com host, did cleanup, pre-termination,
> termination, and said the deployment failed and the system was unreliable,
> fix it, whine, whine, whine.
> 4.  I tried a ping on myenginevm.mydomain.com and it failed.
>
> What I found was that when the bridge was created (ifcfg-ovirtmgmt) the DNS
> servers were left out!  They were in the original NIC ifcfg file but it
> appears the deployment didn't bother to bring them over to the bridge ifcfg.
> I find this very puzzling since the deployment insists on FQDNs so it should
> be smart enough to bring over the DNS server settings and not leave them
> out.  My /etc/resolv.conf file also had no DNS servers in it.
>
> I added the DNS server to the bridge ifcfg file, did a systemctl restart
> network and all is well again.  The host can ping the VM!
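
The check described above (DNS entries present in the NIC's ifcfg file but missing from ifcfg-ovirtmgmt) can be sketched in Python as below. This is only an illustration, not part of hosted-engine or VDSM: the function names, the device name em1.50, and the 192.0.2.x addresses are hypothetical, and it assumes the DNS settings use the usual DNS1/DNS2/DOMAIN keys.

```python
import re

# Keys treated as DNS-related (assumption: the usual ifcfg DNS1, DNS2, ..., DOMAIN).
DNS_KEY = re.compile(r"^(DNS\d+|DOMAIN)$")


def parse_ifcfg(text):
    """Parse KEY=value lines from ifcfg-style text into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def missing_dns_settings(nic_text, bridge_text):
    """Return DNS settings found in the NIC file but absent from the bridge file."""
    nic = parse_ifcfg(nic_text)
    bridge = parse_ifcfg(bridge_text)
    return {k: v for k, v in nic.items() if DNS_KEY.match(k) and k not in bridge}


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    nic_example = 'DEVICE=em1.50\nVLAN=yes\nDNS1=192.0.2.10\nDNS2=192.0.2.11\n'
    bridge_example = 'DEVICE=ovirtmgmt\nTYPE=Bridge\nONBOOT=yes\n'
    for key, value in sorted(missing_dns_settings(nic_example, bridge_example).items()):
        print(f"missing from bridge: {key}={value}")
```

Any keys it reports would still have to be added to the bridge file by hand, followed by systemctl restart network, as in the message above.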
>
> However, the deployment thinks it failed and I cannot restart the Engine
> VM.  I tried a reboot and made sure the ovirt daemons were running, but if I
> try to do anything such as hosted-engine --vm-start I get "Unable to read
> vm.conf, please check ovirt-ha-agent logs".
>
> Second, I think that having the deployment fail simply because it cannot
> contact the Engine VM is a very huge error/bug/whatever - it's silly.  The
> deployment went well, the VM exists and is running, but because the deployment
> messed up the DNS servers it just can't find it.  The deployment should
> first handle the name server setup correctly and second fail gracefully.
>
> I rebooted the server but still get the error about not being able to read
> vm.conf.  At this point I have to run through the entire deployment
> again just because one phase messed up, unless there is a way to work around
> this.  In the work that I've done with oVirt I've noticed the
> deployment is not very robust when it encounters errors that it should
> be able to recover from.  I suggest that consideration be given to making the
> deployment smarter and more robust.
>
>
>
>
>
> More info.
>
> This gets broken during the hosted-engine --deploy first phase (before the
> OS is installed on the Engine VM) which makes sense because I assume that's
> when the bridge is created.
>
> I added another logical network with a VLAN tag and this broke name
> resolution again.  I had to do systemctl restart network again and then name
> resolution was back.
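
A quick way to notice this state after adding a network is to check whether /etc/resolv.conf still lists any name servers. The sketch below is a hypothetical helper, not an oVirt tool; it only assumes the standard "nameserver <address>" line format, and the addresses in the usage note are placeholders.

```python
import os


def nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # resolv.conf comments begin with '#' or ';'.
        if not stripped or stripped.startswith(("#", ";")):
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    path = "/etc/resolv.conf"
    if os.path.exists(path):
        with open(path) as handle:
            found = nameservers(handle.read())
        print("name servers:", ", ".join(found) if found else "none configured")
```

For example, text containing "nameserver 192.0.2.10" yields ["192.0.2.10"], while a file with only a "search" line yields an empty list, which matches the broken state described above.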
>
> I'm attempting to use the web portal but it's very, very slow.  When I select
> the admin portal it can take 5+ minutes before it displays the login page, if
> it displays at all rather than timing out.  Once I get the Admin login it goes
> pretty quickly.  I'm using Firefox 45.0.1 on Fedora 23.  Any reason for
> this?  From what I see the message about not supporting the browser is
> bogus.  My host has 64 GB of memory and an E2620-v3 processor.

Looks similar to [1]. Adding Simone.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1160423
-- 
Didi


