Re: [Engine-devel] [node-devel] Support for stateless nodes

22 Feb 2012


      On 22/02/12 17:58, Mike Burns wrote:
...
On Wed, 2012-02-22 at 17:33 +0200, Doron Fediuck wrote:
...
On 22/02/12 16:57, Mike Burns wrote:
...
There has been a lot of interest in being able to run stateless Nodes
with ovirt-engine.  ovirt-node has designed a way [1] to achieve this on
the node side, but we need input from the engine and vdsm teams to see
if we're missing some requirement or if there needs to be changes on the
engine/vdsm side to achieve this.
As it currently stands, every time you reboot an ovirt-node that is
stateless, it would require manually removing the host in engine, then
re-registering/approving it again in engine.
Any thoughts, concerns, input on how to solve this?
Thanks
Mike
[1] http://ovirt.org/wiki/Node_Stateless
Some points need to be considered:
- Installation issues
* Just stating the obvious: users need to remove and
re-add the host on every reboot. This will not make the
feature a lovable one from the user's point of view.
Yes, this would make the feature a non-starter.  We'd
need to change something in the engine/vdsm to make it smoother.
Perhaps a flag in engine on the host saying that it's stateless.  Then
if a host comes up with the same information but no certs, etc., the
engine would validate some other embedded key (TPM, key embedded in the
node itself) and auto-approve it to the same state as the previous boot.
This will require some thinking.
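The flag-plus-embedded-key idea above could be sketched as the decision
logic below. This is only an illustration under stated assumptions:
`HostRecord`, `registration_action`, the returned state names and the
comparison of a stored key are all hypothetical and are not part of engine
or vdsm. A real TPM-based design would more likely use a challenge/response
than a stored secret.

```python
import hmac
from dataclasses import dataclass


@dataclass
class HostRecord:
    """What the engine might remember about an approved host (hypothetical)."""
    host_id: str         # identity the node reports, e.g. a hardware UUID
    stateless: bool      # the proposed per-host "stateless" flag
    embedded_key: bytes  # key material recorded when the host was first approved


def registration_action(known_hosts, host_id, has_certs, presented_key):
    """Decide what to do with a host that has just contacted the engine."""
    record = known_hosts.get(host_id)
    if record is None:
        # Never seen before: an administrator has to approve it.
        return "pending-approval"
    if has_certs:
        # Stateful host that kept its certificates: nothing special to do.
        return "normal-flow"
    # Known host, no certs: only auto-approve if it is flagged stateless
    # and the embedded key matches (constant-time comparison).
    if record.stateless and hmac.compare_digest(record.embedded_key, presented_key):
        return "auto-approve"
    return "pending-approval"
```

With this shape, a stateless host that reboots and presents the same key
goes straight back to its previous state, while anything unexpected falls
back to manual approval.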
...
...
* During initial boot, vdsm-reg configures the networking
and creates a management network bridge. This is a very
delicate process which may fail due to networking issues
such as resolution, routing, etc. So re-doing this on
every boot increases the chances of losing a node due
to network problems.
vdsm-reg runs on *every* boot anyway and renames the bridge.  This is
something that was debated previously, but it was decided to re-run it
every boot.
Close, but not exactly; vdsm-reg will run on every boot, but
if the relevant bridge is found, then networking is unchanged.
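The "bridge found, networking unchanged" behaviour described above can be
sketched roughly as follows. This is not vdsm-reg's actual code: the
function names and return values are hypothetical, and the only fact relied
on is that Linux exposes a `bridge` directory under
`/sys/class/net/<device>/` for bridge devices.

```python
import os


def bridge_exists(name, sysfs_net="/sys/class/net"):
    """Return True if `name` is an existing Linux bridge device."""
    return os.path.isdir(os.path.join(sysfs_net, name, "bridge"))


def plan_network_setup(bridge_name, sysfs_net="/sys/class/net"):
    """Decide whether the management bridge must be (re)created on this boot."""
    if bridge_exists(bridge_name, sysfs_net):
        # Bridge already configured: leave networking untouched.
        return "keep-existing"
    # No bridge yet: the delicate create-and-configure step has to run.
    return "create-bridge"
```

On a stateless node nothing persists across reboots, so presumably the
"create-bridge" path would be taken on every boot, which is the risk raised
above.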
...
...
* CA pollution: generating a certificate on each reboot
for each node will create a huge number of certificates
on the engine side, which may eventually damage the CA.
(Unsure if there's a limit on the number of certificates,
but having hundreds of junk certs can't be good).
We could have vdsm/engine store the certs on the engine side, and on
boot, after validating the host (however that is done), it will load the
certs onto the node machine.
This is a security issue, since the key pair should be
generated on the node. This leads us back to your TPM
suggestion, which (although I like it) would make us
TPM-dependent, not to mention the non-trivial implementation.
...
...
* Today there's a supported flow in which, for nodes with
a password, the user can use the "add host"
scenario. For stateless, it means re-configuring a password
on every boot...
Stateless is really targeted for a PXE environment.  There is a
supported kernel param that can be used to set this password.
Also, if we follow the design mentioned ^^, then it's not an issue since
the host will auto-approve itself when it connects.
...
- Other issues
* Local storage: so far we have been able to define local
storage on an ovirt node. Stateless will block this ability.
Yes, this would be unavailable if you're running stateless.  I think
that's a fine tradeoff since people want the host to be diskless.
...
* Node upgrade: currently it's possible to upgrade a node
from the engine. In stateless mode this will fail, since there
is nowhere to download the ISO file to.
Upgrade is handled easily by rebooting the host after updating the PXE
server.
...
* Collecting information: core dumps and logging may not
be available due to lack of space? Or will it cause a kernel
panic if all space is consumed?
A valid concern, but a stateless environment would likely have
collectd/rsyslog/netconsole servers running elsewhere that will collect
the logs.  kdumps can be configured to dump remotely as well.
This will also need some work on the vdsm side.
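As a rough idea of what shipping logs off a diskless node could look like
from a Python daemon, here is a minimal sketch using the standard library's
`logging.handlers.SysLogHandler`. It is not vdsm's logging setup: the
function name, logger name and the remote host passed in are hypothetical,
and it assumes a syslog server (for example rsyslog) is listening on UDP at
the given address.

```python
import logging
import logging.handlers


def make_remote_logger(name, host, port=514):
    """Return a logger that sends records to a remote syslog server over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler uses UDP by default, so nothing is written to local disk.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

UDP delivery is fire-and-forget, so messages can be lost if the network is
down, which matters on a node whose only copy of the log is the remote one.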
...
...
Another concern raised is swap and overcommit.  First version would
likely disable swap completely.  This would disable overcommit as well.
Future versions could enable a local disk to be used completely for
swap, but that is another tradeoff that people would need to evaluate
when choosing between stateless and stateful installs.
Indeed so; I completely forgot about swap...
...
Mike
-- 

/d

“Funny,” he intoned funereally, “how just when you think life can't possibly get any worse it suddenly does.” --Douglas Adams, The Hitchhiker's Guide to the Galaxy