[ovirt-users] Network issues on install
Brett I. Holcomb
biholcomb at l1049h.com
Mon Apr 4 17:08:44 UTC 2016
On Mon, 2016-04-04 at 11:29 +0200, Simone Tiraboschi wrote:
> On Mon, Apr 4, 2016 at 11:11 AM, Yedidyah Bar David <didi at redhat.com>
> wrote:
> > On Sat, Apr 2, 2016 at 8:14 PM, Brett I. Holcomb wrote:
> > >
> > > On Fri, 2016-04-01 at 21:39 -0400, Brett I. Holcomb wrote:
> > >
> > > Two items here.
> > >
> > > oVirt version 3.6.4 Fresh install, not an upgrade.
> > >
> > >
> > > First, I noticed this issue when I did an install on a test
> > > machine but I
> > > didn't have the data to present. Because of that and some other
> > > posts
> > > dealing with the network issue I kept notes when I installed on
> > > my
> > > production system. I'm doing a hosted-engine setup.
> > >
> > > As part of the preparation I did the following before installing
> > > and
> > > deploying.
> > >
> > > * Removed NetworkManager with yum remove NetworkManager
> > > * The NIC that will be used for the oVirt management NIC is
> > > connected to a
> > > switch port expecting VLAN 50 so I set up a VLAN50 ifcfg file.
> > > * The IP address of the server, prefix, gateway, and TWO DNS servers
> > > were set up in the ifcfg file, and name resolution worked. I could
> > > ping the host by name as well as the oVirt Engine VM, which was in
> > > DNS, so the name resolved but obviously nothing would reply. Other
> > > servers and workstations could resolve the host and engine names.
>
> The issue is just here:
> explicit static DNS configuration under
> /etc/sysconfig/network-scripts/ifcfg-* is currently not allowed by
> VDSM (see https://bugzilla.redhat.com/show_bug.cgi?id=1160667 ).
> So, if you are going to use static network configuration, please
> remove the DNS1, DNS2... lines from
> /etc/sysconfig/network-scripts/ifcfg-* and configure DNS with
> nameserver lines in /etc/resolv.conf instead, then run
> systemctl restart network.
> Try deploying hosted-engine again: VDSM network configuration will
> not touch /etc/resolv.conf and it will work.
>
>
> > > 1. On the host I ran hosted-engine --deploy and installed the OS
> > > (CentOS 7 (1511)) on the Engine VM. I rebooted the Engine VM, told the
> > > deployment that the Engine VM was running, and it then continued and
> > > told me to install the engine on the Engine VM.
> > > 2. I updated the Engine VM via yum update, installed the oVirt
> > > repositories, and ran the engine-setup which completed
> > > successfully.
> > > 3. I then went back to the host and told it the Engine was set up, and
> > > at this point things went bad. The deployment started whining about not
> > > being able to resolve the myenginevm.mydomain.com host, did cleanup,
> > > pre-termination, termination, and said the deployment failed and the
> > > system was unreliable, fix it, whine, whine, whine.
> > > 4. I tried a ping on myenginevm.mydomain.com and it failed.
> > >
> > > What I found was that when the bridge was created (ifcfg-ovirtmgmt)
> > > the DNS servers were left out! They were in the original NIC ifcfg file
> > > but it appears the deployment didn't bother to bring them over to the
> > > bridge ifcfg. I find this very puzzling since the deployment insists on
> > > FQDNs, so it should be smart enough to bring over the DNS server
> > > settings and not leave them out. My /etc/resolv.conf file also had no
> > > DNS servers in it.
> > >
> > > I added the DNS server to the bridge ifcfg file, did a systemctl
> > > restart
> > > network and all is well again. The host can ping the VM!
> > >
> > > However, the deployment thinks it failed and I cannot restart the
> > > Engine VM. I tried a reboot and made sure the oVirt daemons were
> > > running, but if I try to do anything such as hosted-engine --vm-start I
> > > get "Unable to read vm.conf, please check ovirt-ha-agent logs".
> > >
> > > Second, I think that having the deployment fail simply because it
> > > cannot contact the Engine VM is a very big error/bug/whatever; it's
> > > silly. The deployment went well, the VM exists and is running, but
> > > because the deployment messed up the DNS servers it just can't find it.
> > > The deployment should first handle the name server setup correctly and
> > > second fail gracefully.
> > >
> > > I rebooted the server but still get the error about not being able to
> > > read vm.conf. At this point I have to run through the entire
> > > deployment again just because one phase messed up, unless there is a
> > > way to work around this. In the work that I've done with oVirt I've
> > > noticed that the deployment is not very robust and fails when it
> > > encounters errors that it should be able to recover from. I suggest
> > > that consideration be given to making the deployment smarter and more
> > > robust.
> > >
> > >
> > > More info.
> > >
> > > This gets broken during the hosted-engine --deploy first phase
> > > (before the
> > > OS is installed on the Engine VM) which makes sense because I
> > > assume that's
> > > when the bridge is created.
> > >
> > > I added another logical network with a VLAN tag and this broke
> > > name
> > > resolution again. I had to do systemctl restart network again
> > > and then name
> > > resolution was back.
> > >
> > > I'm attempting to use the web portal but it's very, very slow. When I
> > > select the admin portal it can take 5+ minutes before it displays the
> > > login page, if it displays at all instead of timing out. Once I get
> > > the Admin login it goes pretty quickly. I'm using Firefox 45.0.1 on
> > > Fedora 23. Any reason for this? From what I see the message about not
> > > supporting the browser is bogus. My host has 64 GB of memory and an
> > > E2620-v3 processor.
> >
> > Looks similar to [1]. Adding Simone.
> >
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1160423
> > --
> > Didi
Yes, it is similar. I've skimmed through and will probably add
comments to the bug later but here's my first response.
1. I'm on 3.6.4 and still have the issue.
2. If DNSx= is in the ifcfg file then it needs to be copied over to the bridge ifcfg file.
3. /etc/resolv.conf does NOT keep the original DNS servers. When the bridge is recreated, what's in /etc/resolv.conf is removed and you have an empty file with no DNS servers. I checked that.
4. This should be a blocker/very severe/whatever since it prevents a successful completion of the install, and since the deployment process is so touchy you have to wipe everything and start over. Yes, the workaround is to fix the ifcfg- file and restart the network, but that's not an acceptable solution in the long run.
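For reference, the workaround Simone describes (drop the DNSn lines from the ifcfg file and put nameserver lines in /etc/resolv.conf) could be sketched roughly as below. This is only an illustration that works on in-memory text. The function names, the sample device name eth0.50, and the 192.0.2.x addresses are made up for the example, and nothing here is part of VDSM or hosted-engine.

```python
import re

# Matches ifcfg lines such as DNS1=192.0.2.53 or DNS2="192.0.2.54".
DNS_LINE = re.compile(r'^\s*DNS(\d+)\s*=\s*["\']?([^"\'\s#]+)["\']?\s*$')


def split_dns(ifcfg_text):
    """Return (ifcfg text without DNSn lines, DNS servers in DNSn order)."""
    kept, servers = [], []
    for line in ifcfg_text.splitlines():
        match = DNS_LINE.match(line)
        if match:
            # Remember the index so DNS1 ends up before DNS2.
            servers.append((int(match.group(1)), match.group(2)))
        else:
            kept.append(line)
    ordered = [addr for _, addr in sorted(servers)]
    return "\n".join(kept) + "\n", ordered


def resolv_conf_lines(servers):
    """Render nameserver lines in the format /etc/resolv.conf uses."""
    return "".join(f"nameserver {addr}\n" for addr in servers)


if __name__ == "__main__":
    # Hypothetical VLAN 50 ifcfg contents, not taken from a real host.
    sample = (
        "DEVICE=eth0.50\n"
        "VLAN=yes\n"
        "BOOTPROTO=none\n"
        "IPADDR=192.0.2.10\n"
        "PREFIX=24\n"
        "GATEWAY=192.0.2.1\n"
        "DNS1=192.0.2.53\n"
        "DNS2=192.0.2.54\n"
    )
    cleaned, servers = split_dns(sample)
    print(cleaned, end="")                      # ifcfg text to keep
    print(resolv_conf_lines(servers), end="")   # lines for /etc/resolv.conf
```

The sketch deliberately writes nothing to /etc. The cleaned ifcfg text and the nameserver lines would still have to be reviewed and put in place by hand, followed by systemctl restart network.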
I'll add my comments to the bug later.
Thanks.