[node-devel] oVirt Node designs for stateless operation and 3rd party plugins

Tue Dec 6 10:18:56 UTC 2011

On Thu, 2011-12-01 at 09:32 -0500, Perry Myers wrote: 
> the Node development team has been trying to write up rough requirements
> around the stateless and plugins concepts.  And also some working high
> level design.
> 
> They can be reviewed on these two wiki pages:
> 
> http://ovirt.org/wiki/Node_plugins
> http://ovirt.org/wiki/Node_Stateless
> 
> Since the plugin model and the stateless model affect more than just the
> oVirt Node itself, we definitely would like to get input from other
> teams on the oVirt project.
> 
> Please add comments here or directly to the wiki.
> 

Hi There

I work for a *large* organisation, and I have issues with the goal of a
stateless design.

* Being able to install without a local disk

I don't see this as a compelling reason for doing anything.  In fact,
in many cases for other nameless hypervisors we use local disk as a
target for logging / dumps etc.

I think the goal for stateless should instead be configuration
neutrality, i.e. if the node is destroyed the configuration can be
re-deployed without issue.

The other issue is that the node should continue to be re-bootable even
if the configuration server is unavailable, which is a reason for having
the configuration on a local disk or a SAN-attached LUN.  This should
apply to the entire operational environment: if the engine is
unavailable during a restart, the node should continue working the way
it was configured to, which implies state is retained.  That state just
needs to be easily refreshable :-)
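
A minimal sketch of that fallback behaviour, under stated assumptions:
`fetch_remote` is a hypothetical callable standing in for whatever the
engine's configuration service ends up being, and the cache file stands
in for the copy kept on local disk, a LUN or a USB key.  None of these
names come from the wiki pages.

```python
import json
import tempfile
from pathlib import Path


def load_config(fetch_remote, cache_path):
    """Return (config, source), preferring the engine but tolerating its absence.

    fetch_remote: hypothetical callable returning a dict, raising OSError on failure.
    cache_path:   local file holding the last known-good configuration.
    """
    cache = Path(cache_path)
    try:
        config = fetch_remote()
    except OSError:
        # Engine unreachable: keep working the way we were last configured.
        if not cache.exists():
            raise RuntimeError("no engine and no cached configuration")
        return json.loads(cache.read_text()), "cache"
    # Engine reachable: refresh the local copy for the next offline boot.
    cache.write_text(json.dumps(config))
    return config, "engine"


def engine_down():
    """Stand-in for an unreachable engine (ConnectionError is an OSError)."""
    raise ConnectionError("engine unavailable")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache_file = Path(tmp) / "node-config.json"
        # First boot with the engine up, second boot with it down.
        print(load_config(lambda: {"mgmt_ip": "192.0.2.10"}, cache_file))
        print(load_config(engine_down, cache_file))
```

The point of the sketch is only the ordering: the engine is asked first,
and the retained copy is what makes a reboot possible when it is down.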

The configuration bundle should be refreshable from a configuration
server (part of the engine), and it could contain just configuration,
or agents, or even s/w images.  All of these would be preferred, and
it's pretty simple conceptually to keep an active/backup image pair on
local disk to allow easy rollbacks etc.  Yes, all of this, except for
the logging / swap, could be on a USB key.
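
The active/backup idea can be sketched in a few lines.  This is an
illustration only: the class name and the image names used with it are
made up, and a real node would be switching boot entries rather than
swapping strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageSlots:
    """Two image slots on local disk: one active, one kept as backup."""
    active: str
    backup: Optional[str] = None

    def install(self, new_image):
        # The running image becomes the backup; the new one becomes active.
        self.backup, self.active = self.active, new_image

    def rollback(self):
        # Swap back to the previous image, if there is one.
        if self.backup is None:
            raise RuntimeError("no backup image to roll back to")
        self.active, self.backup = self.backup, self.active
```

Installing a new image never overwrites the one that is currently
running, which is what makes the rollback cheap.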

The bundle should all be pushed via an SSL-encrypted RESTful API using
known non-privileged credentials, preferably with rotating passwords or
some cert-based approach.  The node should also know who previously
managed it, to reduce hostile attempts to change ownership of the node.
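
One possible shape for the "know who previously managed it" check,
sketched as trust-on-first-use of the engine's certificate fingerprint.
This is an assumption about how it could work, not a description of any
existing oVirt mechanism; the `Node` class and `accept_bundle` method
are hypothetical, and the transport (SSL, REST) is left out entirely.

```python
import hashlib
import hmac


def fingerprint(cert_bytes):
    """SHA-256 fingerprint of the engine certificate, as a hex string."""
    return hashlib.sha256(cert_bytes).hexdigest()


class Node:
    def __init__(self):
        self.owner_fp = None  # fingerprint of the engine that manages this node
        self.bundle = None

    def accept_bundle(self, engine_cert, bundle):
        """Apply a pushed bundle only if it comes from the known owner."""
        fp = fingerprint(engine_cert)
        if self.owner_fp is None:
            # First contact: remember which engine owns this node.
            self.owner_fp = fp
        elif not hmac.compare_digest(self.owner_fp, fp):
            # A different engine is trying to take over: refuse.
            return False
        self.bundle = bundle
        return True
```

Because the remembered fingerprint lives with the node's retained
state, a push from an unknown engine is rejected even when the real
engine cannot be reached to confirm it.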

* DHCP and PXE booting

Many corporate security policies prohibit the use of DHCP or PXE booting
servers for production environments.   I don't see it as a big issue to
boot an install image and be a good woodpecker and hit enter a few times
and configure a management IP address.   It should be possible to script
the complete configuration / addition of the node after that step.   I
see the initial install as a trivial part of the complete node
lifecycle.

* DNS SRV records

Sorry,  I hate the idea.  Large corporates have so many different teams
doing little things that adding this in as a requirement simply adds
delays to the deployments and opportunities for misconfiguration.

Having the node image and config on local disk (or usb) avoids this
requirement as the node knows who manages it.   A complete rebuild could
occur and the configuration reloaded once added back into the engine.

* Previously configured state

Yes,  the node should remember the previous operational state if it
can't talk to the engine.   This is not a bad thing.   

*  Configuration server

This should be part of the engine.   It should know the complete
configuration of a node, right down to hypervisor 'firmware' image.  The
process should be 2-way.  An admin should be able to 'pull' the
image/config from an operational and accessible node and new
configurations/images should be pushable to it.

I really don't think this needs to be a separate server to the engine.

*  New bundle deployments / Upgrades

The engine should keep track of what images are on a node.  If a new
config / image is to be deployed then, for example, the node would be
tagged with the new image.  If the node was online, an alternate image
would be pushed, VMs migrated to an alternate node and the node
restarted implementing the new image when requested.

If the node was offline at the time the new image was configured in the
engine, or if the node was built, say, with an old image, then when it
connects to the engine the image would be refreshed and the node
recycled.
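
The upgrade flow above, sketched as the list of steps the engine would
work through for one node it can reach.  The function name, arguments
and step wording are all hypothetical; the only thing taken from the
text is the order: push, migrate, restart.

```python
def upgrade_steps(node_image, tagged_image, running_vms):
    """Steps to bring a reachable node from node_image to tagged_image."""
    if node_image == tagged_image:
        # Node already runs the image it is tagged with: nothing to do.
        return []
    steps = [f"push {tagged_image} to alternate slot"]
    # Move the workload away before the node goes down.
    steps += [f"migrate {vm} to another node" for vm in running_vms]
    steps.append(f"restart into {tagged_image}")
    return steps
```

A node that was offline, or was built from an old image, goes through
the same comparison when it reconnects, just with no VMs to migrate.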

* Swap

Local disk swap is likely to be required.  Overcommit is common and SSD
local disk is something that is quite useful :-)

So in summary,  I prefer to think that the target should be
configuration neutrality or even just plain old distributed
configuration from a central source rather than completely stateless.
The goal should be toleration of complete destruction of a node image
and configuration and a simple process to re-add it and automatically
re-apply the configuration/sw image.

Just some thoughts for discussion / abuse ;-)

Tks
Geoff

> Cheers,
> 
> Perry