On 22/02/12 17:58, Mike Burns wrote:
On Wed, 2012-02-22 at 17:33 +0200, Doron Fediuck wrote:
> On 22/02/12 16:57, Mike Burns wrote:
>> There has been a lot of interest in being able to run stateless Nodes
>> with ovirt-engine. ovirt-node has designed a way [1] to achieve this on
>> the node side, but we need input from the engine and vdsm teams to see
>> if we're missing some requirement or if there needs to be changes on the
>> engine/vdsm side to achieve this.
>>
>> As it currently stands, every time you reboot an ovirt-node that is
>> stateless, it would require manually removing the host in engine, then
>> re-registering/approving it again in engine.
>>
>> Any thoughts, concerns, input on how to solve this?
>>
>> Thanks
>>
>> Mike
>>
>> [1] http://ovirt.org/wiki/Node_Stateless
>>
>
> Some points need to be considered:
>
> - Installation issues
>
> * Stating the obvious: users would need to remove and
> re-add the host on every reboot. That will not make this
> feature a lovable one from the user's point of view.
Yes, this is something that will cause this to be a non-starter. We'd
need to change something in the engine/vdsm to make it smoother.
Perhaps a flag in engine on the host saying that it's stateless. Then,
if a host comes up with the same information but no certs, etc., engine
would validate some other embedded key (TPM, key embedded in the node
itself) and auto-approve it into the same state as the previous boot.
This will require some thinking.
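To make the auto-approve idea above a bit more concrete, here is a minimal Python sketch. It is not existing engine or vdsm code: HostRecord, sign_challenge, decide_registration, and the returned state strings are all hypothetical names. The "embedded key" check is modelled as an HMAC-SHA256 challenge/response, which is only an assumption standing in for whatever TPM or node-embedded key mechanism would actually be chosen.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HostRecord:
    """Hypothetical engine-side record of a previously approved host."""
    host_id: str
    stateless: bool          # the proposed "this host is stateless" flag
    embedded_key: bytes      # secret tied to the node (assumption, e.g. TPM-backed)
    state: str = "approved"  # state the host had before it rebooted


def sign_challenge(embedded_key: bytes, challenge: bytes) -> str:
    """What a node would compute to prove it holds the embedded key."""
    return hmac.new(embedded_key, challenge, hashlib.sha256).hexdigest()


def decide_registration(
    known_hosts: Dict[str, HostRecord],
    host_id: str,
    has_certs: bool,
    challenge: bytes,
    response: Optional[str],
) -> str:
    """Decide what to do with a host that has just contacted engine."""
    record = known_hosts.get(host_id)
    if record is None:
        # Never seen before: fall back to manual approval.
        return "pending_approval"
    if has_certs:
        # Host kept its certificates, so nothing special is needed.
        return "existing_certs"
    if not record.stateless:
        # A stateful host that lost its certs is not auto-approved.
        return "pending_approval"
    expected = sign_challenge(record.embedded_key, challenge)
    if response is not None and hmac.compare_digest(expected, response):
        # Same host, no certs, valid key: restore the previous state.
        return record.state
    return "rejected"
```

The point of the sketch is only the decision order: unknown hosts and non-stateless hosts still go through manual approval, and only a flagged host that proves possession of the key gets its previous state back.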
>
> * During initial boot, vdsm-reg configures the networking
> and creates a management network bridge. This is a very
> delicate process which may fail due to networking issues
> such as resolution, routing, etc. So re-doing this on
> every boot increases the chances of losing a node due
> to network problems.
vdsm-reg runs on *every* boot anyway and renames the bridge. This is
something that was debated previously, but it was decided to re-run it
every boot.
Close, but not exactly; vdsm-reg will run on every boot, but
if the relevant bridge is found, then networking is unchanged.
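As a rough illustration of the "bridge found, networking unchanged" check, here is a small Python sketch. It is not the actual vdsm-reg code; the function names and the bridge name "ovirtmgmt" are assumptions. It relies only on the fact that Linux exposes a "bridge" subdirectory under /sys/class/net/<name>/ for bridge devices.

```python
import os

MGMT_BRIDGE = "ovirtmgmt"  # assumed bridge name; adjust to the deployment


def bridge_exists(name: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True if `name` is a Linux bridge device.

    A bridge interface has a `bridge` subdirectory in its sysfs entry;
    plain NICs and missing interfaces do not.
    """
    return os.path.isdir(os.path.join(sysfs_root, name, "bridge"))


def needs_network_setup(
    name: str = MGMT_BRIDGE, sysfs_root: str = "/sys/class/net"
) -> bool:
    """Network setup is only needed when the management bridge is absent."""
    return not bridge_exists(name, sysfs_root)
```

On a stateless node nothing persists across reboots, so presumably the bridge would be missing on every boot and the setup path would run each time, which is the risk raised above.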
>
> * CA pollution: generating a certificate on each reboot
> for each node will create a huge number of certificates
> on the engine side, which eventually may damage the CA.
> (Unsure if there's a limit on the number of certificates,
> but having hundreds of junk certs can't be good).
We could have vdsm/engine store the certs on the engine side, and on
boot, after validating the host (however that is done), it will load the
certs onto the node machine.
This is a security issue, since the key pair should be
generated on the node. That leads us back to your TPM
suggestion, which (although I like it) would make us
TPM-dependent, not to mention that the implementation is non-trivial.
>
> * Today there's a supported flow in which, for nodes with
> a password, the user can use the "add host" scenario.
> For stateless, that means re-configuring a password
> on every boot...
Stateless is really targeted at a PXE environment. There is a
supported kernel parameter that sets this password.
Also, if we follow the design mentioned ^^, then it's not an issue, since
the host will auto-approve itself when it connects.
>
> - Other issues
>
> * Local storage; so far we were able to define a local
> storage in ovirt node. Stateless will block this ability.
Yes, this would be unavailable if you're running stateless. I think
that's a fine tradeoff since people want the host to be diskless.
>
> * Node upgrade: currently it's possible to upgrade a node
> from the engine. In stateless mode it will fail, since there
> is nowhere to download the ISO file to.
Upgrade is handled easily by rebooting the host after updating the PXE
server.
>
> * Collecting information; core dumps and logging may not
> be available due to lack of space? Or will it cause kernel
> panic if all space is consumed?
A valid concern, but a stateless environment would likely have
collectd/rsyslog/netconsole servers running elsewhere that will collect
the logs. kdumps can be configured to dump remotely as well.
This will also need some work on the vdsm side.
>
Another concern raised is swap and overcommit. First version would
likely disable swap completely. This would disable overcommit as well.
Future versions could enable a local disk to be used completely for
swap, but that is another tradeoff that people would need to evaluate
when choosing between stateless and stateful installs.
Indeed so; I completely forgot about swap...
Mike
--
/d
“Funny,” he intoned funereally, “how just when you think life can't possibly get any
worse it suddenly does.” --Douglas Adams, The Hitchhiker's Guide to the Galaxy