[node-devel] Support for stateless nodes

Wed Feb 22 17:10:37 UTC 2012

On 22/02/12 18:06, Perry Myers wrote:
>> * Just stating the obvious, which is users need
>> to remove-add the host on every reboot. This will
>> not make this feature a lovable one from user's point of view.
> 
> I think the point mburns is trying to make in his initial email is that
> we're going to need to do some joint work between node and vdsm teams to
> change the registration process so that this is no longer necessary.
> 
> It will require some redesigning of the registration process
> 
I'm aware of it, and that's why I'm raising my concerns, so we'll
have a (partial) task list ;)

>> * During initial boot, vdsm-reg configures the networking
>> and creates a management network bridge. This is a very
>> delicate process which may fail due to networking issues
>> such as resolution, routing, etc. So re-doing this on
>> every boot increases the chances of losing a node due
>> to network problems.
> 
> Well, if the network is busted which leads to the bridge rename failing,
> wouldn't the fact that the network is broken cause other problems anyhow?
> 
Perry, my point is that we're increasing the chances of falling
into these holes. The network is not busted most of the time, but occasionally
there's a glitch and we'd like to stay away from it. I'm sure
you know what I'm talking about.

> So I don't see this as a problem.  If your network doesn't work
> properly, don't expect hosts in the network to subsequently work properly.

See previous answer.

> As an aside, why is reverse DNS lookup a requirement?  If we remove that
> it makes things a lot easier, no?
> 
Not sure I'm the right guy to defend it, but in order to drop reverse DNS,
you need to consider dropping SSL, LDAP and some other important shortcuts...

>> * CA pollution; generating a certificate on each reboot
>> for each node will create a huge number of certificates
>> on the engine side, which eventually may damage the CA.
>> (Unsure if there's a limit on the number of certificates,
>> but having hundreds of junk certs can't be good).
> 
> I don't think we should regenerate a new certificate on each boot.  I
> think we need a way for 'an already registered host to retrieve its
> certificate from the oVirt Engine server'
> 
> Using an embedded encryption key (if you trust your mgmt network or are
> booting from embedded flash), or for the paranoid a key stored in TPM
> can be used to have vdsm safely retrieve this from the oVirt Engine
> server on each boot so that it's not required to regenerate/reregister
> on each boot
> 
Thoughtful redesign needed here...

>> * Today there's a supported flow that for nodes with
>> password, the user is allowed to use the "add host"
>> scenario. For stateless, it means re-configuring a password
>> on every boot...
> 
> This flow would still be applicable.  We are going to allow setting of
> the admin password embedded in the core ISO via an offline process.
> Once vdsm is fixed to use a non-root account for installation flow, this
> is no longer a problem
This is not exactly vdsm. More like vdsm-bootstrap.

> 
> Also, if we (as described above) make registrations persistent across
> reboots by changing the registration flow a bit, then the install user
> password only need be set for the initial boot anyhow.
> 
> Therefore I think as a requirement for stateless oVirt Node, we must
> have as a prerequisite removing root account usage for
> registration/installation
> 
This is both for vdsm and engine, and I'm not sure it's that trivial.

>> - Other issues
>>
>> * Local storage; so far we were able to define a local
>> storage in ovirt node. Stateless will block this ability.
> 
> It shouldn't.  The Node should be able to automatically scan locally
> attached disks to look for a well defined VG or partition label and
> based on that automatically activate/mount
> 
> Stateless doesn't imply diskless.  It is a requirement even for
> stateless node usage to be able to leverage locally attached disks both
> for VM storage and also for Swap.
> 
Still, in a pure diskless setup you will not have local storage.
See also Mike's answer.
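
A minimal sketch of the "scan for a well defined label" idea quoted above, not an actual node implementation. It assumes the node can obtain `lsblk -P -o NAME,LABEL` output and that the agreed label is `OVIRT_LOCAL`; that label and the function name are hypothetical, since the thread does not name either.

```python
import shlex

# Hypothetical label; the thread only says "a well defined VG or partition label".
LOCAL_STORAGE_LABEL = "OVIRT_LOCAL"


def find_labelled_devices(lsblk_pairs_output, label=LOCAL_STORAGE_LABEL):
    """Return /dev paths whose LABEL matches, given `lsblk -P -o NAME,LABEL` text."""
    matches = []
    for line in lsblk_pairs_output.splitlines():
        if not line.strip():
            continue
        # Each line is expected to look like: NAME="sda1" LABEL="OVIRT_LOCAL"
        fields = dict(token.split("=", 1) for token in shlex.split(line))
        if fields.get("LABEL") == label:
            matches.append("/dev/" + fields["NAME"])
    return matches


# Canned sample standing in for real lsblk output on a node with local disks.
sample = '''NAME="sda" LABEL=""
NAME="sda1" LABEL="OVIRT_LOCAL"
NAME="sdb1" LABEL="swap0"
'''
print(find_labelled_devices(sample))
```

On a pure diskless node the returned list would simply be empty, which is the case raised above.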

>> * Node upgrade; currently it's possible to upgrade a node
>> from the engine. In stateless mode this will fail, since there is
>> nowhere to download the ISO file to.
> 
> Upgrades are no longer needed with stateless.  To upgrade a stateless
> node all you need to do is 'reboot from a newer image'.  i.e. all
> upgrades would be done via PXE server image replacement.  So the flow of
> 'upload ISO to running oVirt Node' is no longer even necessary
> 
This assumes a PXE-only use case. I'm not sure it's the only one.

>> * Collecting information; core dumps and logging may not
>> be available due to lack of space? Or will it cause kernel
>> panic if all space is consumed?
> 
> We already provide ability to send kdumps to remote ssh/NFS location and
> already provide the ability to use both collectd and rsyslog to pipe
> logs/stats to remote server(s).  Local logs can be set to logrotate to a
> reasonable size so that local RAM FS always contains recent log
> information for quick triage, but long term historical logging would be
> maintained on the rsyslog server
> 
This needs to be co-ordinated with log-collection, as well as the bootstrapping
code.
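
To illustrate the "keep local logs bounded on the RAM FS" point, here is a rough sketch. It swaps in Python's standard `RotatingFileHandler` instead of logrotate, purely to show the size cap; the logger name, path and limits are assumptions, and remote forwarding to the rsyslog server is left out.

```python
import logging
import logging.handlers
import os
import tempfile


def make_node_logger(log_path, max_bytes=1024, backups=2):
    """Logger whose local footprint stays below (backups + 1) * max_bytes."""
    logger = logging.getLogger("stateless-node-sketch")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backups
    )
    logger.addHandler(handler)
    return logger


def local_log_bytes(log_dir):
    """Total size of all files in log_dir, i.e. the space used on the RAM FS."""
    return sum(
        os.path.getsize(os.path.join(log_dir, name)) for name in os.listdir(log_dir)
    )


# Temporary directory standing in for the node's in-memory log location.
log_dir = tempfile.mkdtemp()
logger = make_node_logger(os.path.join(log_dir, "node.log"))
for i in range(500):
    logger.info("event %d", i)

print(sorted(os.listdir(log_dir)), local_log_bytes(log_dir))
```

Only the most recent entries survive locally; anything older would have to come from the remote log server.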

> Perry

-- 

/d

"Willyoupleasehelpmefixmykeyboard?Thespacebarisbroken!"