On Tue, Nov 12, 2019 at 3:27 AM <thomas@hoberg.net> wrote:
I guess this is a little late now... but:
I wanted to do the same, especially because the additional servers (beyond
3) would lower the relative storage cost overhead when using erasure
coding, but it's 'off the trodden path' and I cannot recommend it, after
seeing just how fragile the whole house of cards is already.
Right, we only support replica 3 or replica 2 + arbiter, not erasure-coded
gluster volumes or replica 4.
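Just to illustrate the difference at the gluster level (the hostnames and
brick paths below are made up, and the HCI wizard normally generates all of
this for you), the two supported layouts would be created roughly like this:

    # either: replica 3 - three full data copies, one per host
    gluster volume create data replica 3 \
        host1:/gluster_bricks/data/data \
        host2:/gluster_bricks/data/data \
        host3:/gluster_bricks/data/data

    # or: replica 2 + arbiter - two data copies plus a metadata-only arbiter brick
    gluster volume create data replica 3 arbiter 1 \
        host1:/gluster_bricks/data/data \
        host2:/gluster_bricks/data/data \
        host3:/gluster_bricks/data/data

In both cases each replica set tolerates exactly one failed brick; the
arbiter brick only holds metadata, which is what saves disk on the third node.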
The ability to extend an HCI cluster with additional storage nodes is very
much in the genes of Gluster, but oVirt is a little more than just Gluster,
and while this capability is on the roadmap according to the RHEV roadmap
video from the RH Summit, it's not there today.
But here is what you can already do:
* Save your pennies until you have enough for an extra two servers: HCI is
supposed to do great with 6 or 9 boxes (that just slipped out... sorry!)
* You can always add any additional host as a compute host. I guess you
shouldn't add it as a host for the hosted-engine VM (the extra tick mark),
not because it's impossible, but because it doesn't help make things more
resilient. The extra compute nodes are well managed with all capabilities
of oVirt (HA, live migration etc.), because that's actually very close to
the original usage model. Of course, you won't benefit from their local
storage capacity. There is a facility for adding local storage to hosts,
but I didn't find any documentation on its usage or recommendations, and it
would represent an obstacle to things like live migration.
* You can turn the fourth host into a 'single-node HCI' for disaster
recovery testing (that's what I did, except that I didn't get very far
yet). Single-node HCI obviously has zero redundancy, and it also loses
benefits like GUI-based upgrades (a chicken-and-egg issue). But the ability
to play with an additional DC and replication may be well worth allocating
the fourth machine to that purpose. Once you have lost a three-way HCI to
operator error or because of a software issue (as I have), that backup
seems mightily attractive, even if there doesn't seem to be any easy way to
extend it into a triple (wouldn't it be nice if you could do that without
even an interruption?). But a manual rebuild of the primary triple and
back-migration from the surviving single-node HCI could save your job and
your work.
* You can always add nodes to the existing Gluster, but it's perhaps best
not to extend the pre-configured Gluster volumes across the extra nodes.
While that sounds easy and attractive, I seriously doubt it is supported by
the current code base, which would leave the management engine and the VDSM
agent essentially blind to the extra nodes, or incapable of properly
handling their faults and failures. But extra volumes to be used as
application-level container storage or similar should be fine, even if they
won't be properly managed by oVirt.
You can expand the existing gluster volumes across the extra nodes. The
configuration of the volume stays replica 3 or replica 2 + arbiter,
depending on how you initially set it up - which means it can still only
tolerate one node failure per replica-set subvolume.
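If you do decide to grow an existing volume onto three additional nodes,
the expansion itself is a plain add-brick in multiples of the replica
count. A rough sketch, again with made-up host and brick names (the extra
hosts would normally be added through the engine first, which should also
take care of the peer probing):

    # make the new nodes part of the trusted pool
    gluster peer probe host4
    gluster peer probe host5
    gluster peer probe host6

    # add one more set of three bricks, turning the volume into a
    # 2 x 3 distributed-replicate; bricks must be added in multiples
    # of the replica count
    gluster volume add-brick data \
        host4:/gluster_bricks/data/data \
        host5:/gluster_bricks/data/data \
        host6:/gluster_bricks/data/data

    # spread existing files across the new replica set
    gluster volume rebalance data start

Each replica set still only tolerates one failed brick, as noted above.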
In my lab installations I moved the disks around on the four hosts so that
the two primary nodes and the fourth (DR) node had a full complement, while
the arbiter node had to make do with a subset of the boot SSD being used to
host the arbitration brick. In theory you should be able to manage better
in a three-node HCI by spreading the three "normal" volumes "clockwise"
around the nodes. I used that setup for my initial Gluster-based tests (not
using the HCI wizard), but that has oVirt treat Gluster like a SAN.
The HCI wizard doesn't support the clockwise or round-robin allocation, but
if you feel adventurous you should be able to turn a standard installation
into that, purely by operating with brick additions/changes/removals. But
again, given the fragility of the stack and the (limited) extent of the
documentation, I'd recommend against it.
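For the record, such a conversion would boil down to a series of
replace-brick operations, one brick at a time, waiting for self-heal to
finish in between - a purely hypothetical sketch with made-up volume and
brick names:

    # move the arbiter brick of the 'vmstore' volume from host3 to host1
    gluster volume replace-brick vmstore \
        host3:/gluster_bricks/vmstore/arbiter \
        host1:/gluster_bricks/vmstore/arbiter \
        commit force

    # wait for the heal to catch up before touching the next volume
    gluster volume heal vmstore info

Whether the management engine properly tracks bricks moved behind its back
is another open question, which again speaks against doing this on a
production setup.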
You can now also edit your ansible playbook to specify the placement of the
bricks across nodes during deployment - see
https://github.com/gluster/gluster-ansible-features/pull/34 - but this does
require some tweaking.
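Either way, once the deployment is done you can verify where the bricks
(and in particular the arbiter) actually ended up with plain gluster
commands, e.g. for the engine volume the wizard creates by default:

    # lists the bricks in replica-set order; newer gluster versions
    # tag the arbiter brick with "(arbiter)"
    gluster volume info engine

    # per-brick process status, including which host serves which brick
    gluster volume status engine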
A full three-way replication (instead of 2+1 arbitration) can be beneficial
if your cluster runs on standard-sized "Lego" left-overs and you can get an
extra box easily. But it will add write amplification: a bit of CPU
overhead but, perhaps more importantly, overhead at the network level,
since the client has to write every block to three data bricks instead of
two. If you have 10Gbit or better and slower storage, that may not be an
issue; if you have SSDs and Gbit, you can easily lose about 33% of your
write bandwidth (roughly a third of the NIC per write instead of half),
while read bandwidth should be unaffected.