Hello All.
This is a follow-up to the meeting minutes, just to record my thoughts for
consideration during the actual design.
To put it plainly: I think we should aim to create 3 (yes, three)
independent and essentially identical setups with as few shared parts as
possible.
If we choose to provide reliability at the service level, then we do need 3,
because:
1. If we mess up one environment (e.g. its storage dies completely), we
will still have 2 working, which still gives us reliability because one of
them can fail. That moves us out of crunch mode and into regular work
mode.
2. Consensus-based algorithms generally require at least 2N+1 instances to
tolerate N failures, unless they use some special mode. The smallest case
is N=1, which means 3 instances, and it would make sense to distribute them
across different environments.
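To make the 2N+1 arithmetic in point 2 concrete, here is a minimal sketch.
It assumes a plain majority quorum (no special modes), and the function
names are mine, purely for illustration:

```python
def instances_needed(failures: int) -> int:
    """Instances required to tolerate `failures` failed members (2N+1)."""
    return 2 * failures + 1


def quorum_size(instances: int) -> int:
    """Smallest strict majority of a group with `instances` members."""
    return instances // 2 + 1


def max_failures_tolerated(instances: int) -> int:
    """Members that can fail while a majority is still reachable."""
    return instances - quorum_size(instances)


# Print the fault tolerance for small group sizes.
for size in range(1, 6):
    print(size, "instances tolerate", max_failures_tolerated(size), "failure(s)")
```

The point is that 2 instances tolerate no more failures than 1 does, so 3
is the first size where losing a whole environment is survivable.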
I know the concern about having even 2 envs was that we would spend more
effort maintaining them. But I think the opposite is true. Having 3
actually takes less effort to maintain if we make them similar, because:
1. We can do gradual canary updates. It is the same as with a failure: you
can test an update on 1 instance while the other 2 keep running and still
provide reliability. So an upgrade is no longer time-constrained, and it is
safe.
2. If the environments are similar, then once we establish the correct
playbook for one we can just apply it to the second and later to the third.
So the overhead is not in fact tripled, and if it is automated then it is
no additional effort at all.
3. We are freer to test and play with one of them. We can even destroy it,
recreate it from scratch, etc. Indirectly this will reduce our effort.
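The canary idea from point 1, combined with reusing one playbook from
point 2, could be sketched roughly like this. It is only an illustration:
`apply_update` and `is_healthy` are hypothetical callbacks standing in for
whatever playbook and health check we end up with, and the environment
names are made up.

```python
from typing import Callable, List, Sequence


def canary_rollout(
    environments: Sequence[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    min_healthy_others: int = 2,
) -> List[str]:
    """Update environments one at a time and stop at the first problem.

    Returns the environments that were updated and are still healthy.
    """
    updated: List[str] = []
    for env in environments:
        # Only touch an environment while enough of the others are healthy.
        healthy_others = sum(
            1 for other in environments if other != env and is_healthy(other)
        )
        if healthy_others < min_healthy_others:
            break
        apply_update(env)  # the same playbook is reused for every environment
        if not is_healthy(env):
            break  # canary failed: the remaining environments stay untouched
        updated.append(env)
    return updated
```

With three environments and the default `min_healthy_others=2`, a failed
update on one of them stops the rollout and leaves the other two untouched
to carry the service.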
I think the only real problem is the initial step, when we have to design
an ideal hardware and network layout for this. But once that is done, it
will be easier to go with 3 environments. It may also be possible to
design the plan so that we start with just one and later convert it into
three.
Anton.
--
Anton Marchukov
Senior Software Engineer - RHEV CI - Red Hat