Design issue when using optional networks for administrative usages
Dan Kenigsberg
danken at redhat.com
Tue Sep 10 08:02:15 UTC 2013
On Tue, Sep 10, 2013 at 01:28:52AM -0400, Mike Kolesnik wrote:
>
> ----- Original Message -----
> > On Sun, Sep 08, 2013 at 05:30:20AM -0400, Mike Kolesnik wrote:
> > > Hi,
> > >
> > > I would like to hear opinions about what I consider a design issue in
> > > oVirt.
> > >
> > > First of all, a short description of the current situation in oVirt 3.3:
> > > Network is a data-center level entity, representing a L2 broadcast domain.
> > > Each network can be attached to one or more clusters, where the attachment
> > > can have several properties:
> > > - Required/Optional - Does the network have to be on all hosts or not?
> > > - Usages (administrative):
> > >   - Display network - used for the display traffic
> > >   - Migration network - used for the migration traffic
> > >
> > > Now, what bothers me is the affinity between these two properties - if a
> > > network is defined "optional", can it be used for an "administrative"
> > > usage?
> > >
> > > Currently I can have the following situation:
> > > 0. Fresh install with some hosts and a shared storage, and no networks
> > > other than default.
> > > 1. Create a network X.
> > > 2. Attach to a cluster as "migration", "display", "optional".
> > > 3. Create a VM in the same cluster.
> > >
> > > Now all is well and everything is green across the board, BUT:
> > > 1. The VM can't be run on any host in that cluster if the host doesn't have
> > > the display network.
> > > 2. VM will migrate over the default network if the network is not present
> > > on the source host.
> > > 3. Migration will not work if the network is not present on the destination
> > > host.
> > >
> > > I find this situation very troublesome!
> > > We give the admin the impression that everything is fine and dandy, but
> > > underneath the surface everything is NOT.
> > >
> > > If we look at the previous points we can see that:
> > > 1. No VM can run in that cluster, but hosts and network seem A-OK - this is
> > > intrinsically awful as we don't reflect the real problem anywhere in the
> > > network nor the host statuses but rather postpone it until someone makes
> > > an attempt to actually use the VM.
> > > 2. Migration network is NOT being used, which was obviously not the intent
> > > of the admin who set it up.
> > > 3. There is still an open bug for it ( https://bugzilla.redhat.com/983515 )
> > > and it's unclear as to what should happen, but it would be either what
> > > happens in case #1 or in case #2.
> > >
> > > What I suggest is to have any network with usage be "required".
> > > This will utilize the existing logic for required networks:
> > > - Either the network should not be used until it's available on all hosts
> > > (reflected in the network status being Non-Operational)
> > > - Or the host should be Non-Operational as it's incapable of
> > > running/migrating VMs
> > >
> > > Therefore reflecting the problem to the admin and giving him a chance to
> > > fix it properly, and not hiding the failure until it occurs or doing some
> > > unexpected behavior.
> > >
> > > I would love to hear your thoughts on the subject.
> >
> > Some history first. Once upon a time, we wanted an Up host to mean
> > "this host is ready to run any of its cluster's VMs". This meant that if
> > a host lost connectivity to one of the cluster networks, it had to be
> > taken down.
> >
> > Customers did not like our overprotection, so we've introduced
> > non-required networks. When an admin uses this option he says "I know
> > what I'm doing, let me do stuff on this host even if the network is
> > down."
>
> So what you're saying is non-required networks should not protect the user at all?
>
> In this case I say we shouldn't impose any limitations whatsoever in this situation,
> and if the VM fails to start/migrate then let it fail.
Alona and others thought about it in the context of migration network
and decided that for that case, we'd like a fallback to the management
network. I believe that the main motivation was not to introduce
migration blockage on upgrade from ovirt-3.2 to ovirt-3.3. On ovirt-3.2,
migration was possible as long as ovirtmgmt was up, so we wanted to keep
that, even at the price of ignoring the user's request to use a
designated migration network.
That's the old protection-vs-comfort equilibrium - we chose comfort,
though choosing protection would not have been preposterous.
(I'm delving into this issue only because we have to deal with bug
975786, "VM migration fails when required network, configured with
migration usages is turned down").
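
As a rough illustration of the fallback described above, here is a minimal
sketch. It is not actual engine or vdsm code: the function name, the
constant and the set-of-network-names representation are all made up for
this example, and it only covers the case where the designated network is
missing on the host doing the lookup.

```python
from typing import Optional, Set

# Name of the management network, as mentioned in this thread.
MANAGEMENT_NETWORK = "ovirtmgmt"


def choose_migration_network(designated: Optional[str],
                             host_networks: Set[str]) -> str:
    """Hypothetical helper: pick the network to migrate over.

    Use the designated migration network when the host has it;
    otherwise fall back to the management network (comfort over
    protection, as discussed above).
    """
    if designated is not None and designated in host_networks:
        return designated
    return MANAGEMENT_NETWORK


# Host that has the designated network "X": migration uses it.
print(choose_migration_network("X", {"ovirtmgmt", "X"}))
# Host that lacks "X": migration silently falls back.
print(choose_migration_network("X", {"ovirtmgmt"}))
```

The silent fallback in the second call is exactly the behaviour being
debated: it keeps migration working, but ignores the admin's choice.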
>
> >
> > I think that this request is a valid one, even when a network serves
> > other purposes than connecting VMs. When designing migration network,
> > we've decided that if it is missing, migration would be attempted over
> > the management network, as a fallback. I can imagine an admin who says:
> > I don't care much about migrations, most of my VMs are pinned-to-host
> > anyway, so if the migration network is gone, don't make a fuss out of
> > it.
> >
> > The use case for letting a host be Up even if its display network is
> > down is less obvious. But then again, I can think of an admin who uses
> > a vdsm hook to set the display IP of each VM. He does not care if the
> > display network is up or not.
>
> If the admin uses hooks for his networking needs then I don't see why he
> even needs this support in oVirt, so your point is not clear to me.
The user does not need ovirt's support; he needs ovirt to not get in his way.
Assume the user wants to transport the display of each VM over a
different IP address. And assume that he has the logic to choose this
address tucked in a vdsm hook. He then does not care whether the
ovirt-designated display network is up or down. Monitoring it is a
liability for him.
>
> >
> > In my opinion, the meaning and the danger of non-req networks should be
> > properly documented and clear to customers, but some of them are
> > expected to find it useful.
>
> I agree, if this is our approach then it should be very very well documented.
It has been our approach since the introduction of non-req networks.