[node-devel] oVirt Node designs for stateless operation and 3rd party plugins

Tue Dec 6 13:52:52 UTC 2011

Comments Inline

On Tue, 2011-12-06 at 21:18 +1100, Geoff O'Callaghan wrote:
> On Thu, 2011-12-01 at 09:32 -0500, Perry Myers wrote: 
> > the Node development team has been trying to write up rough requirements
> > around the stateless and plugins concepts.  And also some working high
> > level design.
> > 
> > They can be reviewed on these two wiki pages:
> > 
> > http://ovirt.org/wiki/Node_plugins
> > http://ovirt.org/wiki/Node_Stateless
> > 
> > Since the plugin model and the stateless model affect more than just the
> > oVirt Node itself, we definitely would like to get input from other
> > teams on the oVirt project.
> > 
> > Please add comments here or directly to the wiki.
> > 
> 
> Hi There
> 
> I work for a *large* organisation, and I have issues with the goal of a
> stateless design.

Thanks for the feedback overall.  I'll try to address all your points
below.

> 
> * Being able to install without a local disk
> 
> I don't see this as a compelling reason for doing anything.   In fact,
> in many cases for other nameless hypervisors we use local disk as a
> source for logging / dumps etc.

That may be the case in your environment, but when we presented this at
the oVirt Workshop, the idea of a diskless deployment was very well
received.  I suppose that what we're calling stateless is really more of
a diskless feature rather than truly stateless since we're keeping the
stateful information in a configuration server.

> 
> I think the goal for stateless should instead be configuration
> neutral, i.e. if the node is destroyed the configuration can be
> re-deployed without issue.

Redeployed on the same machine?  Or redeployed on a different machine?
We already provide autoinstallation options that will do redeployments
easily and one of the goals or ideas along with the proposed stateless
model is that the machine gets re-provisioned and downloads its config
bundle.  This would successfully recover the node if someone were to
power it off or destroy it somehow.  If you're looking to move the
config to a new machine, then that's not quite as simple.  The easiest
would be to simply install it again from scratch.

> 
> The other issue is that the node should continue to be re-bootable even
> if the configuration server is unavailable, which is a reason for having
> the configuration on a local disk or a san attached LUN.   This should
> apply to the entire operational environment - if the engine is
> unavailable during a restart I should continue working the way I was
> configured to do so - that implies state is retained.  It needs to be
> easily refreshable :-)

I will admit that the thought of the config server being unavailable
hadn't come up previously.  If this is something that you're
legitimately concerned about, then it sounds like you'd want to continue
doing local installations and not stateless installs.
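
To make the "keep working when the config server is down" behaviour
described in the quoted text concrete, here is a minimal sketch in
Python.  It is not part of the current design: the fetch callable, the
JSON bundle format and the cache path are all assumptions made for
illustration.

```python
import json
import os
import tempfile


def load_config_bundle(fetch, cache_path):
    """Return (bundle, source); prefer the server, fall back to the cache.

    `fetch` is a hypothetical callable that contacts the config server and
    returns the bundle as a dict, raising OSError if it cannot be reached.
    """
    try:
        bundle = fetch()
    except OSError:
        # Server unreachable: reuse the last known-good bundle.  If no
        # cache exists yet, the resulting FileNotFoundError propagates.
        with open(cache_path) as fh:
            return json.load(fh), "cache"
    # Refresh the cache atomically so a crash never leaves a partial file.
    cache_dir = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w") as fh:
        json.dump(bundle, fh)
    os.replace(tmp_path, cache_path)
    return bundle, "server"
```

The cache could live on local disk, a SAN LUN or a USB key; a truly
diskless node has nowhere to keep it, which is why that case needs a
reachable config server.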

Currently, node images will install to local disk and they will boot
fine without a management server or config server.  But they won't be
truly functional unless there is a management server available to tell
them what to do.  This is the case for all hypervisors, whether they're
ovirt-node images or Fedora 16 images with VDSM installed or any of the
other distributions.  It's a limitation that VDSM and the Engine need to
solve outside the scope of ovirt-node.

> 
> The configuration bundle should be refreshable from a configuration
> server (part of the engine) and that could either be just configuration
> or agents or even s/w images - all would be preferred and it's pretty
> simple conceptually to have an active/backup image on local disk concept
> to allow easy rollbacks etc.  Yes, all this, except for the logging /
> swap, could be on a USB key.

We do provide a RootBackup partition that we automatically activate if
something goes wrong with an upgrade.  It would make sense that we
should keep a backup configuration bundle on the management server as
well.  The actual image itself is a livecd, so updating that would be a
matter of changing the usb stick/cd-rom/pxe image to the old/new version.

> 
> The bundle should all be pushed via a SSL encrypted RESTful api using
> known non-priv credentials, preferably with rotating passwords or some
> cert based approach.   The server should also know who previously
> managed it to reduce hostile attempts to change ownership of the node.

Yes, the security issues are something that we're definitely aware of
and not taking lightly.  The actual process for how we do this is
something that still would need to be worked out.  The initial design
was something along the lines of a free posting to the config server
that the admin has to approve.  The thought was that we would have
different levels of security that could be configured depending on your
deployment and the strictness of the rules in your environment.

> 
> * DHCP and PXE booting
> 
> Many corporate security policies prohibit the use of DHCP or PXE booting
> servers for production environments.   I don't see it as a big issue to
> boot an install image and be a good woodpecker and hit enter a few times
> and configure a management IP address.   It should be possible to script
> the complete configuration / addition of the node after that step.   I
> see the initial install as a trivial part of the complete node
> lifecycle.

So a couple thoughts here:

1.  If only pxe is restricted, then you could have a usb stick or cd-rom
with the image in each machine and still do stateless as defined
otherwise.
2.  If just DHCP, then you could have a pxe profile per machine that
sets up the static networking options needed
3.  If both are restricted, then you would have to go with a stateful
installation.  It's not going away, just another mode that we will
provide.
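
The three cases above can be summarised as a small decision function.
This is only a restatement of the list in Python; the function name and
the returned descriptions are made up for illustration.

```python
def deployment_mode(pxe_allowed, dhcp_allowed):
    """Map site restrictions on PXE and DHCP to a deployment mode."""
    if not pxe_allowed and not dhcp_allowed:
        # Case 3: both are restricted.
        return "stateful install to local disk"
    if not pxe_allowed:
        # Case 1: only PXE is restricted.
        return "stateless, image on a USB stick or CD-ROM in each machine"
    if not dhcp_allowed:
        # Case 2: only DHCP is restricted.
        return "stateless, per-machine PXE profile with static networking"
    return "stateless, PXE boot with DHCP"
```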

Actual installation and configuration can be completed automatically
using kernel command line options.  That is independent of whether
you're using a stateful or stateless installation.
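
As a rough illustration of how such options can be consumed, the sketch
below splits a kernel command line into a dictionary.  The option names
in the example (ip, netmask, config_server) are placeholders and not
necessarily the names ovirt-node uses.

```python
import shlex


def parse_kernel_cmdline(cmdline):
    """Split a kernel command line into a dict of options."""
    options = {}
    for token in shlex.split(cmdline):
        # Split on the first "=" only, so values may contain "=" themselves.
        key, sep, value = token.partition("=")
        # Bare flags such as "quiet" carry no value; record them as True.
        options[key] = value if sep else True
    return options


# Example line with hypothetical option names.
example = "ro quiet ip=192.0.2.10 netmask=255.255.255.0 config_server=cfg.example.com:8443"
opts = parse_kernel_cmdline(example)
```

On a running Linux system the string would come from /proc/cmdline.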

>   
> * DNS SRV records
> 
> Sorry,  I hate the idea.  Large corporates have so many different teams
> doing little things that adding this in as a requirement simply adds
> delays to the deployments and opportunities for misconfiguration.

Sure, that's a valid possibility.  Perhaps we could add another
command-line option that allows someone to specify the config server
manually.
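
A sketch of that lookup order, assuming a manual option takes precedence
over DNS SRV discovery.  The option name config_server, the SRV service
name and the default port 443 are all assumptions; srv_lookup stands in
for whatever resolver would really be used and is expected to return
(priority, host, port) tuples.

```python
def resolve_config_server(options, srv_lookup):
    """Return (host, port) for the config server, or None if none is found."""
    manual = options.get("config_server")  # hypothetical option name
    if manual:
        # Accept "host:port" or a bare "host" (IPv6 literals not handled).
        host, _, port = manual.rpartition(":")
        if host and port.isdigit():
            return host, int(port)
        return manual, 443  # assumed default port
    # No manual setting: fall back to DNS SRV discovery.
    records = srv_lookup("_ovirt-config._tcp")  # hypothetical service name
    if not records:
        return None
    # Prefer the lowest priority value; SRV weights are ignored here.
    priority, host, port = min(records)
    return host, port
```

With this ordering, sites that cannot get SRV records added never depend
on DNS for discovery.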

> 
> Having the node image and config on local disk (or usb) avoids this
> requirement as the node knows who manages it.   A complete rebuild could
> occur and the configuration reloaded once added back into the engine.

Yes, this is a valid use case.  And if that's the way you want to deploy
your environment, then use the install to disk option and not stateless.
We will provide both.

> 
> * Previously configured state
> 
> Yes,  the node should remember the previous operational state if it
> can't talk to the engine.   This is not a bad thing.   
> 
> *  Configuration server
> 
> This should be part of the engine.   It should know the complete
> configuration of a node, right down to hypervisor 'firmware' image.  The
> process should be 2-way.  An admin should be able to 'pull' the
> image/config from an operational and accessible node and new
> configurations/images should be pushable to it.
> 
> I really don't think this needs to be a separate server to the engine.

I agree, it should be part of the engine, and probably will be.
Depending on time frames and availability, it might be developed
separately initially, but long term we probably want to integrate it
with the management server.

> 
> *  New bundle deployments / Upgrades
> 
> The engine should keep track of what images are on a node.   If a new
> config / image is to be deployed then for example, the node would be
> tagged with the new image.  If the node was online, an alternate image
> would be pushed, vm's migrated to an alternate node and the node
> restarted implementing the new image when requested.

This is mostly already done, I think.  I know the functionality is there
in RHEV-M, but not sure if it's all in the webadmin UI yet.  I know the
backend pieces are all there though.

A running node has its version info that vdsm reads initially and
reports back to the engine.  An admin logs into the engine, and can see
the details of the node including the version that it's currently
running.  There is an option to push out a new image to the node and
have it upgrade itself.  The node does have to be in maintenance mode to
start the process which causes all VMs to be migrated away.  

> 
> If the node was offline at the time the new image was configured in the
> engine or if the node was built say with an old image then when it
> connects to the engine the image would be refreshed and the node
> recycled.

Automatic upgrades like this aren't done at the moment.  There probably
needs to be some policy engine that can control it so all machines don't
suddenly try to upgrade themselves.  
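
One very simple form such a policy could take is a cap on how many nodes
upgrade at once.  The sketch below is hypothetical: the node records,
their state names and the max_concurrent setting are invented for
illustration and do not reflect the engine's data model.

```python
def pick_upgrade_batch(nodes, target_version, max_concurrent):
    """Return the names of nodes that may start upgrading now."""
    # Nodes already upgrading occupy slots in the concurrency budget.
    in_progress = sum(1 for n in nodes if n["state"] == "upgrading")
    slots = max(0, max_concurrent - in_progress)
    # Only running nodes on a different version are candidates.
    candidates = [
        n["name"]
        for n in nodes
        if n["state"] == "up" and n["version"] != target_version
    ]
    return candidates[:slots]
```

Each selected node would still need to go into maintenance mode first so
that its VMs are migrated away.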

This whole section really applies to stateful installations though.  In
stateless mode, you just need to refresh the image in the PXE
server/cd-rom/usb stick and reboot the machine (after putting it in
maintenance mode).

> 
> * Swap
> 
> Local disk swap is likely to be required.  Overcommit is common and SSD
> local disk is something that is quite useful :-)

I agree, in general.  I did talk to one person at the workshop who had
a machine with 300+GB RAM and had 0 interest in doing overcommit.  So
there is certainly a use case for being able to support both.

> 
> So in summary,  I prefer to think that the target should be
> configuration neutrality or even just plain old distributed
> configuration from a central source rather than completely stateless.
> The goal should be toleration of complete destruction of a node image
> and configuration and a simple process to re-add it and automatically
> re-apply the configuration/sw image.

I like the thought of storing the configuration in a central location
even when having the image installed locally.  I definitely think there
will be people that can't or won't go with stateless for various
reasons, many of which you state above.  But I also think there are some
that will want it as well.

The simplest use case for wanting a stateless model like we designed is
someone that has a rack of blades without local disks.  They set up PXE
and DHCP, and just turn on the blades.

Mike
> 
> Just some thoughts for discussion / abuse ;-)
> 
> Tks
> Geoff
> 
> > Cheers,
> > 
> > Perry
> > _______________________________________________
> > Arch mailing list
> > Arch at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/arch