Design issue when using optional networks for administrative usages

Tue Sep 10 11:57:37 UTC 2013

On Tue, Sep 10, 2013 at 12:21:06PM +0300, Livnat Peer wrote:
> On 09/10/2013 11:02 AM, Dan Kenigsberg wrote:
> > On Tue, Sep 10, 2013 at 01:28:52AM -0400, Mike Kolesnik wrote:
> >>
> >> ----- Original Message -----
> >>> On Sun, Sep 08, 2013 at 05:30:20AM -0400, Mike Kolesnik wrote:
> >>>> Hi,
> >>>>
> >>>> I would like to hear opinions about what I consider a design issue in
> >>>> oVirt.
> >>>>
> >>>> First of all, a short description of the current situation in oVirt 3.3:
> >>>> Network is a data-center level entity, representing a L2 broadcast domain.
> >>>> Each network can be attached to one or more clusters, where the attachment
> >>>> can have several properties:
> >>>> - Required/Optional - Does the network have to be on all hosts or not?
> >>>> - Usages (administrative):
> >>>>   - Display network - used for the display traffic
> >>>>   - Migration network - used for the migration traffic
> >>>>
> >>>> Now, what bothers me is the affinity between these two properties - if a
> >>>> network is defined "optional", can it be used for an "administrative"
> >>>> usage?
> >>>>
> >>>> Currently I can have the following situation:
> >>>> 0. Fresh install with some hosts and a shared storage, and no networks
> >>>> other than default.
> >>>> 1. Create a network X.
> >>>> 2. Attach to a cluster as "migration", "display", "optional".
> >>>> 3. Create a VM in the same cluster.
> >>>>
> >>>> Now all is well and everything is green across the board, BUT:
> >>>> 1. The VM can't be run on any host in that cluster if the host doesn't have
> >>>> the display network.
> >>>> 2. VM will migrate over the default network if the network is not present
> >>>> on the source host.
> >>>> 3. Migration will not work if the network is not present on the destination
> >>>> host.
> >>>>
> >>>> I find this situation very troublesome!
> >>>> We give the admin the impression that everything is fine and dandy, but
> >>>> underneath the surface everything is NOT.
> >>>>
> >>>> If we look at the previous points we can see that:
> >>>> 1. No VM can run in that cluster, but hosts and network seem A-OK - this is
> >>>> intrinsically awful as we don't reflect the real problem anywhere in the
> >>>> network nor the host statuses but rather postpone it until someone makes
> >>>> an attempt to actually use the VM.
> >>>> 2. Migration network is NOT being used, which was obviously not the intent
> >>>> of the admin who set it up.
> >>>> 3. There is still an open bug for it ( https://bugzilla.redhat.com/983515 )
> >>>> and it's unclear as to what should happen, but it would be either what
> >>>> happens in case #1 or in case #2.
> >>>>
> >>>> What I suggest is to have any network with usage be "required".
> >>>> This will utilize the existing logic for required networks:
> >>>> - Either the network should not be used until it's available on all hosts
> >>>> (reflected in the network status being Non-Operational)
> >>>> - Or the host should be Non-Operational as it's incapable of
> >>>> running/migrating VMs
> >>>>
> >>>> This would reflect the problem to the admin and give him a chance to
> >>>> fix it properly, instead of hiding the failure until it occurs or
> >>>> behaving unexpectedly.
> >>>>
> >>>> I would love to hear your thoughts on the subject.
> >>>
> >>> Some history first. Once upon a time, we wanted an Up host to mean
> >>> "this host is ready to run any of its cluster's VMs". This meant that if
> >>> a host lost connectivity to one of the cluster networks, it had to be
> >>> taken down.
> >>>
> >>> Customers did not like our overprotection, so we've introduced
> >>> non-required networks. When an admin uses this option he says "I know
> >>> what I'm doing, let me do stuff on this host even if the network is
> >>> down."
> >>
> >> So what you're saying is non-required networks should not protect the user at all?
> >>
> >> In this case I say we shouldn't impose any limitations whatsoever in this situation,
> >> and if the VM fails to start/migrate then let it fail.
> > 
> > Alona and others thought about it in the context of migration network
> > and decided that for that case, we'd like a fallback to the management
> > network. I believe that the main motivation was not to introduce
> > migration blockage on upgrade from ovirt-3.2 to ovirt-3.3. On ovirt-3.2,
> > migration was possible as long as ovirtmgmt was up, so we wanted to keep
> > that, even at the price of ignoring the user's request to use a
> > designated migration network.
> > That's the old protection-vs-comfort equilibrium - we chose comfort,
> > though choosing protection would not have been preposterous.
> > (I'm delving into this issue only because we have to deal with bug
> > 975786, "VM migration fails when required network, configured with
> > migration usages is turned down").
> > 
> 
> I think that VM network and administrative network are substantially
> different.
> In the case of an administrative network, making it optional makes the
> system behavior unpredictable and cumbersome. Since our users never asked
> for this, I don't see the need to complicate the common use case.

Maybe Simon Grinberg would chime in. He once requested to have a
migration network that is VM specific. I have a vague memory of a
customer asking for a VM-specific display network. And once we have
storage networks? Are they considered "administrative" and must-have on
all hosts, or would you allow an admin to be more flexible?

Not all "administrative networks" are alike. Some are important, some
are less critical.

> 
> >>
> >>>
> >>> I think that this request is a valid one, even when a network serves
> >>> other purposes than connecting VMs. When designing migration network,
> >>> we've decided that if it is missing, migration would be attempted over
> >>> the management network, as a fallback. I can imagine an admin who says:
> >>> I don't care much about migrations, most of my VMs are pinned-to-host
> >>> anyway, so if the migration network is gone, don't make a fuss out of
> >>> it.
> >>>
> >>> The use case for letting a host be Up even if its display network is
> >>> down is less obvious. But then again, I can think of an admin who uses a vdsm
> >>> hook to set the display IP of each VM. He does not care if the display
> >>> network is up or not.
> >>
> >> If the admin uses hooks for his networking needs then I don't see why he
> >> even needs this support in oVirt, so your point is not clear to me..
> > 
> > The user does not need ovirt's support, he needs ovirt to not get in his way.
> > Assume the user wants to transport the display of each VM over a
> > different IP address. And assume that he has the logic to choose this
> > address tucked in a vdsm hook. He then does not care whether the
> > ovirt-designated display network is up or down. Monitoring it is a
> > liability for him.
> > 
> 
> As I explained above, in that case the user can use the management
> network as the display network, which is the default, and it would not
> block him while he rewrites the display network in the hook.

It's easy to think of an admin who wants to have some of his
non-critical VMs use a non-ovirtmgmt network for display, and do an
uber-complex hook for several critical VMs. He likes to use a display
network, but would not want to take a host down if the net is gone.
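
To make the behaviour under discussion concrete, here is a minimal Python
sketch of the three cases Mike listed for an optional network X that carries
the display and migration usages. It is only a toy model of what is described
in this thread, not engine or vdsm code: the Host class and both function
names are hypothetical, and it assumes every host has ovirtmgmt.

```python
from dataclasses import dataclass, field

MGMT = "ovirtmgmt"  # management network, assumed present on every host


@dataclass
class Host:
    """Hypothetical stand-in for a host and the networks set up on it."""
    name: str
    networks: set = field(default_factory=set)


def can_run_vm(host, display_net):
    # Case 1: a VM cannot start on a host that lacks the display network.
    return display_net in host.networks


def pick_migration_network(src, dst, migration_net):
    """Return the network a migration would use, or None if it cannot work."""
    if migration_net not in src.networks:
        # Case 2: missing on the source, fall back to the management network.
        return MGMT
    if migration_net not in dst.networks:
        # Case 3: missing on the destination, the migration does not work.
        return None
    return migration_net


with_x = Host("host-a", {MGMT, "X"})
without_x = Host("host-b", {MGMT})

print(can_run_vm(without_x, "X"))
print(pick_migration_network(without_x, with_x, "X"))
print(pick_migration_network(with_x, without_x, "X"))
```

In none of these cases does the host or the network change status, which is
the mismatch Mike is pointing at.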