[Engine-devel] [node-devel] Support for stateless nodes

Wed Feb 22 16:06:40 UTC 2012

> * Just stating the obvious: users need to remove and
> re-add the host on every reboot. This will not make
> this feature a lovable one from the user's point of view.

I think the point mburns is trying to make in his initial email is that
we're going to need to do some joint work between node and vdsm teams to
change the registration process so that this is no longer necessary.

It will require some redesign of the registration process.

> * During initial boot, vdsm-reg configures the networking
> and creates a management network bridge. This is a very
> delicate process which may fail due to networking issues
> such as resolution, routing, etc. So re-doing this on
> every boot increases the chances of losing a node due
> to network problems.

Well, if the network is busted which leads to the bridge rename failing,
wouldn't the fact that the network is broken cause other problems anyhow?

So I don't see this as a problem.  If your network doesn't work
properly, don't expect hosts in the network to subsequently work properly.

As an aside, why is reverse DNS lookup a requirement?  If we remove that
it makes things a lot easier, no?

> * CA pollution; generating a certificate on each reboot
> for each node will create a huge number of certificates
> in the engine side, which eventually may damage the CA.
> (Unsure if there's a limitation to certificates number,
> but having hundreds of junk certs can't be good).

I don't think we should generate a new certificate on each boot.  I
think we need a way for 'an already registered host to retrieve its
certificate from the oVirt Engine server'.

An embedded encryption key (if you trust your management network or are
booting from embedded flash), or, for the paranoid, a key stored in the
TPM, can be used to let vdsm safely retrieve this certificate from the
oVirt Engine server on each boot, so that it does not need to
regenerate the certificate or re-register.
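To make the embedded-key idea concrete, here is a minimal Python sketch
of a challenge/response exchange, assuming a per-host key is embedded in
the node image and recorded on the engine at first registration.
CertificateStore, node_response and the in-memory dicts are hypothetical
illustrations, not existing vdsm or Engine APIs; a TPM-backed variant
would keep the key inside the TPM rather than in a Python bytes object.
The node proves it holds the key without ever sending it, and the engine
hands back the certificate it already issued instead of minting a new one.

```python
import hashlib
import hmac
import secrets


class CertificateStore:
    """Hypothetical engine-side store: one certificate per registered host."""

    def __init__(self):
        self._certs = {}     # host_id -> PEM text issued at first registration
        self._keys = {}      # host_id -> pre-shared key embedded in the node image
        self._issued = set() # outstanding challenges (each usable once)

    def register(self, host_id, shared_key, cert_pem):
        # Done once; the certificate is then reused across node reboots.
        self._keys[host_id] = shared_key
        self._certs[host_id] = cert_pem

    def new_challenge(self):
        challenge = secrets.token_bytes(32)
        self._issued.add(challenge)
        return challenge

    def retrieve(self, host_id, challenge, response):
        """Return the existing certificate if the node proves it holds the key."""
        if challenge not in self._issued:
            return None  # unknown or already-used challenge
        self._issued.discard(challenge)  # single use: a replayed response fails
        key = self._keys.get(host_id)
        if key is None:
            return None  # never registered: fall back to full registration
        expected = hmac.new(key, challenge, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, response):
            return None  # wrong key: refuse to hand out the certificate
        return self._certs[host_id]


def node_response(shared_key, challenge):
    """Node side (hypothetical): HMAC the engine's challenge with the embedded key."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()
```

A real implementation would also expire outstanding challenges and run
the exchange over TLS; the sketch only shows why no new certificate is
needed after a reboot.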

> * Today there's a supported flow that for nodes with
> password, the user is allowed to use the "add host"
> scenario. For stateless, it means re-configuring a password
> on every boot...

This flow would still be applicable.  We are going to allow setting
the admin password embedded in the core ISO via an offline process.
Once vdsm is fixed to use a non-root account for the installation flow,
this is no longer a problem.

Also, if we (as described above) make registrations persistent across
reboots by changing the registration flow a bit, then the install user
password only needs to be set for the initial boot anyhow.

Therefore I think that, for stateless oVirt Node, removing root account
usage from registration/installation is a prerequisite.

> - Other issues
> 
> * Local storage; so far we were able to define a local
> storage in ovirt node. Stateless will block this ability.

It shouldn't.  The Node should be able to automatically scan locally
attached disks for a well-defined VG or partition label and, based on
that, automatically activate/mount them.

Stateless doesn't imply diskless.  It is a requirement even for
stateless node usage to be able to leverage locally attached disks both
for VM storage and also for Swap.
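A minimal sketch of the partition-label case of that scan, assuming the
node parses the key="value" pair output of `lsblk -P -o NAME,LABEL`.
The OVIRT_DATA and OVIRT_SWAP labels and the /data mount point are
hypothetical placeholders for whatever well-defined names the node and
vdsm teams agree on; the VG variant would look up volume groups instead.
The sketch only builds a plan; a boot script would then run mount or
swapon for each entry.

```python
import shlex

# Hypothetical well-known labels; the real names are still to be agreed on.
LABEL_ACTIONS = {
    "OVIRT_DATA": ("mount", "/data"),  # VM storage
    "OVIRT_SWAP": ("swapon", None),    # local swap
}


def parse_lsblk_pairs(output):
    """Parse `lsblk -P -o NAME,LABEL` output into a list of dicts."""
    devices = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # shlex strips the quotes: 'LABEL="x"' -> 'LABEL=x', 'LABEL=""' -> 'LABEL='
        fields = dict(token.split("=", 1) for token in shlex.split(line))
        devices.append(fields)
    return devices


def plan_local_storage(output):
    """Return (action, device, target) tuples for disks carrying a known label."""
    plan = []
    for dev in parse_lsblk_pairs(output):
        action = LABEL_ACTIONS.get(dev.get("LABEL", ""))
        if action is None:
            continue  # unlabeled or foreign disk: leave it untouched
        verb, target = action
        plan.append((verb, "/dev/" + dev["NAME"], target))
    return plan
```

Disks without a recognized label are ignored, so a stateless node never
touches storage it was not explicitly told to use.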

> * Node upgrade; currently it's possible to upgrade a node
> from the engine. In stateless it will error, since there is
> nowhere to download the ISO file to.

In-place upgrades are no longer needed with stateless.  To upgrade a
stateless node, all you need to do is reboot from a newer image, i.e.
all upgrades would be done by replacing the image on the PXE server.
So the 'upload ISO to running oVirt Node' flow is no longer even
necessary.

> * Collecting information; core dumps and logging may not
> be available due to lack of space? Or will it cause kernel
> panic if all space is consumed?

We already provide the ability to send kdumps to a remote ssh/NFS
location, and to use both collectd and rsyslog to pipe logs/stats to
remote server(s).  Local logs can be rotated with logrotate at a
reasonable size so that the local RAM FS always contains recent log
information for quick triage, while long-term historical logging is
maintained on the rsyslog server.
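logrotate is configured in its own syntax, so as a Python analogue
(vdsm is written in Python) here is a sketch of a process bounding its
own local log footprint with the standard library's RotatingFileHandler
and optionally forwarding to a remote rsyslog server with SysLogHandler.
The logger name, the sizes and UDP port 514 are illustrative
assumptions, not current node or vdsm settings.

```python
import logging
import logging.handlers


def configure_node_logging(path, max_bytes=1_000_000, backups=3, remote=None):
    """Bound local log usage on a RAM-backed filesystem; optionally forward.

    Worst-case local footprint is max_bytes * (backups + 1), provided no
    single record is larger than max_bytes.
    Note: calling this twice adds duplicate handlers to the same logger.
    """
    logger = logging.getLogger("node")  # illustrative logger name
    logger.setLevel(logging.INFO)

    # Local copy: rotated by size so recent logs stay available for triage.
    local = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    local.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(local)

    if remote is not None:
        # Long-term history: UDP syslog to the rsyslog server (host is site-specific).
        logger.addHandler(logging.handlers.SysLogHandler(address=(remote, 514)))
    return logger
```

With the defaults (1 MB per file, 3 backups) this log never takes more
than about 4 MB of the RAM FS, however long the node stays up.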

Perry