Design issue when using optional networks for administrative usages

Mike Kolesnik mkolesni at redhat.com
Thu Sep 12 04:56:37 UTC 2013


----- Original Message -----
> 
> On Sep 10, 2013, at 2:57 PM, Dan Kenigsberg <danken at redhat.com> wrote:
> 
> > On Tue, Sep 10, 2013 at 12:21:06PM +0300, Livnat Peer wrote:
> >> On 09/10/2013 11:02 AM, Dan Kenigsberg wrote:
> >>> On Tue, Sep 10, 2013 at 01:28:52AM -0400, Mike Kolesnik wrote:
> >>>> 
> >>>> ----- Original Message -----
> >>>>> On Sun, Sep 08, 2013 at 05:30:20AM -0400, Mike Kolesnik wrote:
> >>>>>> Hi,
> >>>>>> 
> >>>>>> I would like to hear opinions about what I consider a design issue in
> >>>>>> oVirt.
> >>>>>> 
> >>>>>> First of all, a short description of the current situation in oVirt
> >>>>>> 3.3:
> >>>>>> Network is a data-center level entity, representing a L2 broadcast
> >>>>>> domain.
> >>>>>> Each network can be attached to one or more clusters, where the
> >>>>>> attachment
> >>>>>> can have several properties:
> >>>>>> - Required/Optional - Does the network have to be on all hosts or not?
> >>>>>> - Usages (administrative):
> >>>>>> - Display network - used for the display traffic
> >>>>>> - Migration network - used for the migration traffic
> >>>>>> 
> >>>>>> Now, what bothers me is the affinity between these two properties - if
> >>>>>> a
> >>>>>> network is defined "optional", can it be used for an "administrative"
> >>>>>> usage?
> >>>>>> 
> >>>>>> Currently I can have the following situation:
> >>>>>> 0. Fresh install with some hosts and a shared storage, and no networks
> >>>>>> other than default.
> >>>>>> 1. Create a network X.
> >>>>>> 2. Attach to a cluster as "migration", "display", "optional".
> >>>>>> 3. Create a VM in the same cluster.
> >>>>>> 
> >>>>>> Now all is well and everything is green across the board, BUT:
> >>>>>> 1. The VM can't be run on any host in that cluster if the host doesn't
> >>>>>> have
> >>>>>> the display network.
> >>>>>> 2. VM will migrate over the default network if the network is not
> >>>>>> present
> >>>>>> on the source host.
> >>>>>> 3. Migration will not work if the network is not present on the
> >>>>>> destination
> >>>>>> host.
> >>>>>> 
> >>>>>> I find this situation very troublesome!
> >>>>>> We give the admin the impression that everything is fine and dandy,
> >>>>>> but
> >>>>>> underneath the surface everything is NOT.
> >>>>>> 
> >>>>>> If we look at the previous points we can see that:
> >>>>>> 1. No VM can run in that cluster, but hosts and network seem A-OK -
> >>>>>> this is
> >>>>>> intrinsically awful as we don't reflect the real problem anywhere in
> >>>>>> the
> >>>>>> network nor the host statuses but rather postpone it until someone
> >>>>>> makes
> >>>>>> an attempt to actually use the VM.
> >>>>>> 2. Migration network is NOT being used, which was obviously not the
> >>>>>> intent
> >>>>>> of the admin who set it up.
> >>>>>> 3. There is still an open bug for it (
> >>>>>> https://bugzilla.redhat.com/983515 )
> >>>>>> and it's unclear as to what should happen, but it would be either what
> >>>>>> happens in case #1 or in case #2.
> >>>>>> 
> >>>>>> What I suggest is to have any network with usage be "required".
> >>>>>> This will utilize the existing logic for required networks:
> >>>>>> - Either the network should not be used until it's available on all
> >>>>>> hosts
> >>>>>> (reflected in the network status being Non-Operational)
> >>>>>> - Or the host should be Non-Operational as it's incapable of
> >>>>>> running/migrating VMs
> >>>>>> 
> >>>>>> This reflects the problem to the admin and gives him a chance to
> >>>>>> fix it properly, instead of hiding the failure until it occurs or
> >>>>>> behaving in some unexpected way.
> >>>>>> 
> >>>>>> I would love to hear your thoughts on the subject.
> >>>>> 
> >>>>> Some history first. Once upon a time, we wanted an Up host to mean
> >>>>> "this host is ready to run any of its cluster's VMs". This meant that
> >>>>> if
> >>>>> a host lost connectivity to one of the cluster networks, it had to be
> >>>>> taken down.
> >>>>> 
> >>>>> Customers did not like our over protection, so we've introduced
> >>>>> non-required networks. When an admin uses this option he says "I know
> >>>>> what I'm doing, let me do stuff on this host even if the network is
> >>>>> down."
> >>>> 
> >>>> So what you're saying is non-required networks should not protect the
> >>>> user at all?
> >>>> 
> >>>> In this case I say we shouldn't impose any limitations whatsoever in
> >>>> this situation,
> >>>> and if the VM fails to start/migrate then let it fail.
> >>> 
> >>> Alona and others thought about it in the context of migration network
> >>> and decided that for that case, we'd like a fallback to the management
> >>> network. I believe that the main motivation was not to introduce
> >>> migration blockage on upgrade from ovirt-3.2 to ovirt-3.3. On ovirt-3.2,
> >>> migration was possible as long as ovirtmgmt was up, so we wanted to keep
> >>> that, even at the price of ignoring the user's request to use a
> >>> designated migration network.
> >>> That's the old protection-vs-comfort equilibrium - we chose comfort,
> >>> where the choice of protection is not preposterous.
> >>> (I'm delving into this issue only because we have to deal with bug
> >>> 975786 VM migration fails when required network, configured with
> >>> migration usages is turned down).
> >>> 
> >> 
> >> I think that VM network and administrative network are substantially
> >> different.
> >> In the case of an administrative network it makes the system behavior
> >> unpredictable and cumbersome. Since this was never requested by our
> >> users, I don't see the need to complicate the common use case.
> > 
> > Maybe Simon Grinberg would chime in. He once requested to have a
> > migration network that is VM specific. I have a vague memory of a
> > customer asking for VM-specific display network. And once we have
> > storage networks? Are they considered "administrative" and must-have on
> > all hosts, or would you allow an admin to be more flexible?
> > 
> > Not all "administrative networks" are alike. Some are important, some
> > are less critical.
> > 
> >> 
> >>>> 
> >>>>> 
> >>>>> I think that this request is a valid one, even when a network serves
> >>>>> other purposes than connecting VMs. When designing migration network,
> >>>>> we've decided that if it is missing, migration would be attempted over
> >>>>> the management network, as a fallback. I can imagine an admin who says:
> >>>>> I don't care much about migrations, most of my VMs are pinned-to-host
> >>>>> anyway, so if the migration network is gone, don't make a fuss out of
> >>>>> it.
> >>>>> 
> >>>>> The use case for letting a host be Up even if its display network is
> >>>>> down is less obvious. But then again, I can think of an admin who uses
> >>>>> a vdsm
> >>>>> hook to set the display IP of each VM. He does not care if the display
> >>>>> network is up or not.
> >>>> 
> >>>> If the admin uses hooks for his networking needs then I don't see why he
> >>>> even needs this support in oVirt, so your point is not clear to me..
> >>> 
> >>> The user does not need ovirt's support, he needs ovirt to not get in his
> >>> way.
> >>> Assume the user wants to transport the display of each VM over a
> >>> different IP address. And assume that he has the logic to choose this
> >>> address tucked in a vdsm hook. He then does not care whether the
> >>> ovirt-designated displaynetwork is up or down. Monitoring it is a
> >>> liability for him.
> >>> 
> >> 
> >> As I explained above, in this case the user can use the management
> >> network as the display network which is the default and would not block
> >> him while rewriting the display network in the hook.
> > 
> > It's easy to think of an admin who wants to have some of his
> > non-critical VMs use a non-ovirtmgmt network for display, and do an
> > uber-complex hook for several critical VMs. He likes to use a display
> > network, but would not want to take a host down if the net is gone.
> > 
> 
> 
> Indeed,
> I think that my opinion is known that I didn't like the notion of the
> not-required network in the first place. I think it's a poor substitute for
> dynamic networks.

I agree, but currently there are no dynamic networks that can be used for
"VM services" networks that you mention.

> A network is either dynamic or static where:
> Static = expect to find the network there
> Dynamic = expect this to be created on a per-need basis
> Both should be part of the host configuration, while dynamic is a placeholder
> that says this host is capable of accommodating this dynamic network
> 
> Another missing notion is redundancy group.
> Networks may be part of redundancy groups, no redundancy group settings means
> redundancy group of one.
> 
> As I see it there are a few types of networks:
> 1. Facilities (Storage, Management, other) - Static by nature
> 2. VM services (Migration, Display)
> 3. VM connectivity networks
> The first group should affect host operational status - meaning they are
> required (as on failure the host just can't run ANY VM) but will not affect
> host operational state unless all the networks in the redundancy group are
> down.
> The last are not required by nature and could be dynamic and set up on
> demand or static, based on the underlying technology (like Mellanox vs
> Linux bridge), and should only affect scheduling decisions.
> 
> Now let's discuss the confusing group #2 - VM Services. I claim that they are
> basically not required since, like the VM networks, they affect a certain
> aspect of the VM.

They affect an aspect of the VM, that's correct, but they're not explicitly
used by it which makes them different from "VM" networks.

> - Display: I've raised in the past that this should be a per VM property that
> takes its default from the cluster's display default, but can be any
> network in the cluster.
> -- None should be an allowed value (KVM supports a VM with no display device)
> -- If set, only allow the VM to run on hosts that support the selected
> network (either as dynamic or static)
> 
> This also answers an RFE to support multiple display networks for
> multi-tenants, where multiple tenants may even be two departments using
> different VLANs in the organization.

Do you know if such an RFE is filed?

> 
> - Migration: Again a per VM property that takes its default from the
> cluster's migration network default, but can be any network in the cluster.
> -- None should be allowed, it's like never migrate.
> -- If set then if the network is down the worst case is that the VM can't be
> migrated via this network
> --- on manual migration ask to select a network for migration
> --- on host maintenance use the cluster default - if it's not available then
> maintenance will fail
>  
> I hope this makes sense (and that arc@ won't reject this mail)

I think currently we can ease the life of the admin by being more strict. It's not
nannying him IMHO, but giving him the right status for his data center.
In the future, when and if we decide to implement these ideas (which currently are
not implemented), we can revisit this and probably find a better way to accomplish it.
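
To make the stricter rule concrete, here is a minimal sketch in Python. It is
only an illustration of the idea and not engine code: the names Attachment,
Usage, normalize and host_status, and the status strings, are made up for this
example. It assumes the simpler of the two options for required networks,
namely that a host missing a required network is marked Non-Operational.

```python
from dataclasses import dataclass
from enum import Enum


class Usage(Enum):
    """Administrative usages a cluster network can have (illustrative)."""
    DISPLAY = "display"
    MIGRATION = "migration"


@dataclass(frozen=True)
class Attachment:
    """A network attached to a cluster. Hypothetical model, not engine code."""
    network: str
    required: bool
    usages: frozenset = frozenset()


def normalize(att: Attachment) -> Attachment:
    """Suggested rule: a network with any administrative usage is required."""
    if att.usages and not att.required:
        return Attachment(att.network, True, att.usages)
    return att


def host_status(host_networks: set, attachments: list) -> str:
    """Required-network check: a host missing one is Non-Operational."""
    missing = [a.network for a in attachments
               if a.required and a.network not in host_networks]
    return "Non-Operational" if missing else "Up"


if __name__ == "__main__":
    # Network X attached as "migration", "display", "optional".
    x = Attachment("X", required=False,
                   usages=frozenset({Usage.DISPLAY, Usage.MIGRATION}))
    host = {"ovirtmgmt"}  # host has only the default network
    print(host_status(host, [x]))             # today: looks fine
    print(host_status(host, [normalize(x)]))  # with the rule: problem shown
```

With the optional attachment the host reports Up even though it lacks the
display network; once the attachment is treated as required, the same host is
reported Non-Operational and the admin sees the problem up front.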

> 
> Regards,
> Simon.
> 
> 


