
On Wed, 2012-02-22 at 17:33 +0200, Doron Fediuck wrote:
On 22/02/12 16:57, Mike Burns wrote:
There has been a lot of interest in being able to run stateless Nodes with ovirt-engine. ovirt-node has designed a way [1] to achieve this on the node side, but we need input from the engine and vdsm teams to see if we're missing some requirement or if there needs to be changes on the engine/vdsm side to achieve this.
As it currently stands, every time you reboot an ovirt-node that is stateless, it would require manually removing the host in engine, then re-registering/approving it again in engine.
Any thoughts, concerns, input on how to solve this?
Thanks
Mike
Some points need to be considered:
- Installation issues
* Just stating the obvious, which is that users need to remove and re-add the host on every reboot. This will not make this feature a lovable one from the user's point of view.
Yes, this is something that will cause this to be a non-starter. We'd need to change something in the engine/vdsm to make it smoother. Perhaps a flag in engine on the host saying that it's stateless. Then if a host comes up with the same information but no certs, etc., the engine would validate some other embedded key (TPM, key embedded in the node itself) and auto-approve it to be in the same state as the previous boot.
* During initial boot, vdsm-reg configures the networking and creates a management network bridge. This is a very delicate process which may fail due to networking issues such as resolution, routing, etc. So re-doing this on every boot increases the chances of losing a node due to network problems.
vdsm-reg runs on *every* boot anyway and renames the bridge. This is something that was debated previously, but it was decided to re-run it every boot.
* CA pollution; generating a certificate on each reboot for each node will create a huge number of certificates on the engine side, which eventually may damage the CA. (Unsure if there's a limit on the number of certificates, but having hundreds of junk certs can't be good.)
We could have vdsm/engine store the certs on the engine side, and on boot, after validating the host (however that is done), it will load the certs onto the node machine.
* Today there's a supported flow in which, for nodes with a password, the user is allowed to use the "add host" scenario. For stateless, it means re-configuring a password on every boot...
Stateless is really targeted for a PXE environment. There is a supported kernel param that can be set to configure this password. Also, if we follow the design mentioned ^^, then it's not an issue since the host will auto-approve itself when it connects.
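To make the two proposals above concrete (a per-host "stateless" flag with auto-approval, and certs kept on the engine side), here is a rough Python sketch of how the engine might handle a returning stateless node. Everything in it is hypothetical: the `Engine`, `HostRecord`, and `node_response` names are made up for illustration and are not engine or vdsm APIs, the cert is a placeholder string, and an HMAC over a shared secret stands in for whatever TPM-backed or image-embedded key check would actually be used.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class HostRecord:
    """Hypothetical engine-side record for an approved host."""
    host_id: str
    stateless: bool       # the proposed per-host "stateless" flag
    embedded_key: bytes   # stand-in for a TPM or image-embedded key
    cert: str             # cert issued at first approval, kept on the engine


class Engine:
    """Toy model of the engine's registration logic (not real engine code)."""

    def __init__(self):
        self.hosts = {}
        self.certs_issued = 0

    def approve_new_host(self, host_id, embedded_key, stateless):
        """First-time (manual) approval: the only place a cert is issued."""
        self.certs_issued += 1
        cert = f"CERT-{host_id}-{self.certs_issued}"  # placeholder, not a real cert
        self.hosts[host_id] = HostRecord(host_id, stateless, embedded_key, cert)
        return cert

    def new_challenge(self):
        """Random nonce the node must sign to prove it holds the key."""
        return secrets.token_bytes(32)

    def reregister(self, host_id, challenge, response):
        """Auto-approve a known stateless host and hand back its stored cert.

        Returns None when the host is unknown, not flagged stateless, or
        fails the key check, so the caller can fall back to manual approval.
        """
        record = self.hosts.get(host_id)
        if record is None or not record.stateless:
            return None
        expected = hmac.new(record.embedded_key, challenge, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, response):
            return None
        return record.cert


def node_response(embedded_key, challenge):
    """Node side: answer the engine's challenge using the embedded key."""
    return hmac.new(embedded_key, challenge, hashlib.sha256).digest()
```

In this sketch a node that reboots any number of times gets the same stored cert back each time, so the CA only ever sees one cert per host, and a host that cannot prove possession of the key is simply not auto-approved.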
- Other issues
* Local storage; so far we were able to define a local storage in ovirt node. Stateless will block this ability.
Yes, this would be unavailable if you're running stateless. I think that's a fine tradeoff since people want the host to be diskless.
* Node upgrade; currently it's possible to upgrade a node from the engine. In stateless it will error, since there is nowhere to download the ISO file to.
Upgrade is handled easily by rebooting the host after updating the PXE server.
* Collecting information; core dumps and logging may not be available due to lack of space? Or will it cause kernel panic if all space is consumed?
A valid concern, but a stateless environment would likely have collectd/rsyslog/netconsole servers running elsewhere that will collect the logs. kdumps can be configured to dump remotely as well.
Another concern raised is swap and overcommit. First version would likely disable swap completely. This would disable overcommit as well. Future versions could enable a local disk to be used completely for swap, but that is another tradeoff that people would need to evaluate when choosing between stateless and stateful installs.
Mike