[Engine-devel] [node-devel] Support for stateless nodes

Wed Feb 22 16:10:50 UTC 2012

On 22/02/12 17:58, Mike Burns wrote:
> On Wed, 2012-02-22 at 17:33 +0200, Doron Fediuck wrote:
>> On 22/02/12 16:57, Mike Burns wrote:
>>> There has been a lot of interest in being able to run stateless Nodes
>>> with ovirt-engine.  ovirt-node has designed a way [1] to achieve this on
>>> the node side, but we need input from the engine and vdsm teams to see
>>> if we're missing some requirement or if there needs to be changes on the
>>> engine/vdsm side to achieve this.
>>>
>>> As it currently stands, every time you reboot an ovirt-node that is
>>> stateless, it would require manually removing the host in engine, then
>>> re-registering/approving it again in engine.  
>>>
>>> Any thoughts, concerns, input on how to solve this?
>>>
>>> Thanks
>>>
>>> Mike
>>>
>>> [1] http://ovirt.org/wiki/Node_Stateless
>>>
>>
>> Some points need to be considered:
>>
>> - Installation issues
>>
>> * Stating the obvious: users would need to remove and
>> re-add the host on every reboot. That will not make this
>> feature a lovable one from the user's point of view.
> 
> Yes, this is something that will cause this to be a non-starter.  We'd
> need to change something in the engine/vdsm to make it smoother.
> Perhaps, a flag in engine on the host saying that it's stateless.  Then
> if a host comes up with the same information, but no certs, etc, it
> would validate some other embedded key (TPM, key embedded in the node
> itself), and auto-approve it into the same state as the previous boot.
> 
This will require some thinking.
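
As a starting point for that thinking, here is a minimal sketch of the "stateless flag plus embedded key" idea quoted above. Everything in it is an assumption made for illustration: `HostRecord`, `node_response` and `try_auto_approve` are hypothetical names, not ovirt-engine or vdsm APIs, and a shared-secret HMAC challenge stands in for whatever TPM-backed or image-embedded key validation would really be used.

```python
import hashlib
import hmac


class HostRecord:
    """Hypothetical engine-side record for a registered host."""

    def __init__(self, host_id, stateless, embedded_key):
        self.host_id = host_id
        self.stateless = stateless        # the proposed per-host "stateless" flag
        self.embedded_key = embedded_key  # assumed secret known to engine and node
        self.status = "pending_approval"


def node_response(embedded_key, challenge):
    """What the node would send back: an HMAC of the engine's challenge."""
    return hmac.new(embedded_key, challenge, hashlib.sha256).hexdigest()


def try_auto_approve(record, challenge, response):
    """Approve a rebooted host only if it is flagged stateless and proves
    knowledge of the embedded key; otherwise leave it pending."""
    if not record.stateless:
        return False  # stateful hosts keep the existing manual approval flow
    expected = hmac.new(record.embedded_key, challenge, hashlib.sha256).hexdigest()
    if hmac.compare_digest(expected, response):
        record.status = "approved"
        return True
    return False
```

The engine would generate a fresh random challenge per boot so that an old response cannot be replayed. A shared secret is the simplest thing to sketch; it does not address the concern that key material should be generated on the node itself.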

>>
>> * During initial boot, vdsm-reg configures the networking
>> and creates a management network bridge. This is a very
>> delicate process which may fail due to networking issues
>> such as resolution, routing, etc. So re-doing this on
>> every boot increases the chances of losing a node due
>> to network problems.
> 
> vdsm-reg runs on *every* boot anyway and renames the bridge.  This is
> something that was debated previously, but it was decided to re-run it
> every boot.
> 
Close, but not exactly; vdsm-reg will run on every boot, but
if the relevant bridge is found, then networking is unchanged.
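
The "only touch networking when the bridge is missing" behaviour can be sketched roughly as follows. This is not the vdsm-reg code: the function names, the bridge name passed in and the `create_bridge` callback are hypothetical. The only real-world fact it relies on is that a Linux bridge device exposes a `bridge` directory under its `/sys/class/net/<name>` entry.

```python
import os


def bridge_exists(name, sysfs_root="/sys/class/net"):
    """Return True if a Linux bridge device called `name` is present."""
    return os.path.isdir(os.path.join(sysfs_root, name, "bridge"))


def ensure_management_bridge(name, create_bridge, sysfs_root="/sys/class/net"):
    """Create the management bridge only when it is missing.

    Returns True if create_bridge was called, False if the existing
    bridge was found and networking was left unchanged.
    """
    if bridge_exists(name, sysfs_root):
        return False
    create_bridge(name)
    return True
```

On a stateless node the bridge would never be found after a reboot, so the creation path would run on every boot, which is exactly the risk raised above.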

>>
>> * CA pollution; generating a certificate on each reboot
>> for each node will create a huge number of certificates
>> on the engine side, which eventually may damage the CA.
>> (Unsure if there's a limit on the number of certificates,
>> but having hundreds of junk certs can't be good).
> 
> We could have vdsm/engine store the certs on the engine side, and on
> boot, after validating the host (however that is done), it will load the
> certs onto the node machine.  
> 
This is a security issue, since the key pair should be
generated on the node. This will lead us back to your TPM
suggestion, but (although I like it) it will make us
TPM-dependent, not to mention the non-trivial implementation.

>>
>> * Today there's a supported flow in which, for nodes with a
>> password, the user is allowed to use the "add host"
>> scenario. For stateless, it means re-configuring a password
>> on every boot...
> 
> Stateless is really targeted for a PXE environment.  There is a
> supported kernel parameter that sets this password.
> Also, if we follow the design mentioned ^^, then it's not an issue since
> the host will auto-approve itself when it connects.
> 
>>
>> - Other issues
>>
>> * Local storage; so far we were able to define a local
>> storage in ovirt node. Stateless will block this ability.
> 
> Yes, this would be unavailable if you're running stateless.  I think
> that's a fine tradeoff since people want the host to be diskless.
>>
>> * Node upgrade; currently it's possible to upgrade a node
>> from the engine. In stateless mode it will fail, since there
>> is nowhere to download the ISO file to.
> 
> Upgrade is handled easily by rebooting the host after updating the PXE
> server.
> 
>>
>> * Collecting information; core dumps and logging may not
>> be available due to lack of space? Or will it cause kernel
>> panic if all space is consumed?
> 
> A valid concern, but a stateless environment would likely have
> collectd/rsyslog/netconsole servers running elsewhere that will collect
> the logs.  kdumps can be configured to dump remotely as well.  
This will also need some work on the vdsm side.
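
To give a rough idea of the kind of change involved, here is a minimal sketch of a Python process forwarding its log records to a remote syslog collector using the standard library's `logging.handlers.SysLogHandler`. The logger name, host and port are placeholders, and this says nothing about how vdsm's logging is actually configured.

```python
import logging
import logging.handlers


def make_remote_logger(name, host, port=514):
    """Return a logger that forwards records to a remote syslog server (UDP)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # A (host, port) tuple selects network delivery; UDP is the default transport.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With UDP nothing is kept on the node, but records are silently lost if the collector is unreachable, so a stateless deployment would have to decide whether that is acceptable.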

>>
> 
> Another concern raised is swap and overcommit.  First version would
> likely disable swap completely.  This would disable overcommit as well.
> Future versions could enable a local disk to be used completely for
> swap, but that is another tradeoff that people would need to evaluate
> when choosing between stateless and stateful installs.
Indeed so; I completely forgot about swap...

> 
> Mike
> 


-- 

/d

“Funny,” he intoned funereally, “how just when you think life can't possibly get any worse it suddenly does.” --Douglas Adams, The Hitchhiker's Guide to the Galaxy