I guess this is a little late now... but:
I wanted to do the same, especially because the additional servers (beyond 3) would lower
the relative storage cost overhead when using erasure coding, but it's off the
beaten path and I cannot recommend it, after seeing just how fragile the whole house
of cards already is.
The ability to extend an HCI cluster with additional storage nodes is very much in the
genes of Gluster, but oVirt is a little more than just Gluster, and while this capability
is on the roadmap according to the RHEV roadmap video from the RH Summit, it's not there
today.
But here is what you can already do:
* Save your pennies until you have enough for an extra two servers: HCI is supposed to do
great with 6 or 9 boxes (that just slipped out... sorry!)
* You can always add any additional host as a compute host. I guess you shouldn't add
it as a host for the hosted-engine VM (extra tick mark), not because it's impossible,
but because it doesn't help make things more resilient. The extra compute nodes are
well managed with all the capabilities of oVirt (HA, live-migration etc.), because that's
actually very close to the original usage model. Of course, you won't benefit from the
local storage capacity. There is a facility for adding local storage to hosts, but I
didn't find any documentation on usage or recommendations, and it would represent an
obstacle to things like live-migration.
* You can turn the fourth host into a 'single-node HCI' for disaster recovery
testing (that's what I did, except that I didn't get very far yet). Single-node
HCI obviously has zero redundancy, but it also loses benefits like GUI-based upgrades
(a chicken-and-egg issue). But the ability to play with an additional DC and replication
may well be worth allocating the fourth machine to that purpose. Once you have lost a
three-way HCI to operator error or a software issue (as I have), that backup
seems mightily attractive, even if there doesn't seem to be any easy way to extend
it into a triple (wouldn't it be nice if you could do that without even an
interruption?). But a manual rebuild of the primary triple and a back-migration from the
surviving single-node HCI could save your job and your work.
* You can always add nodes to the existing Gluster, but it's perhaps best not to extend the
pre-configured Gluster volumes across the extra nodes. While that sounds easy and
attractive, I seriously doubt it is supported by the current code base, leaving the
management engine and the VDSM agent essentially blind to the extra nodes or incapable of
properly handling faults and failures. But extra volumes to be used as application-level
container storage or similar should be fine, even if they won't be properly managed by
oVirt (see the sketch right after this list).
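Just to make that last point concrete, something along these lines should work from any of
the existing HCI nodes. Hostname, volume name and brick paths below are made up, so treat
this as a sketch rather than a recipe:

  # add the extra node to the trusted pool (hostname is hypothetical)
  gluster peer probe node4.example.com
  # create a separate single-brick volume on the new node only,
  # leaving the wizard-created HCI volumes untouched
  gluster volume create appdata node4.example.com:/gluster_bricks/appdata/brick
  gluster volume start appdata

oVirt won't really manage a volume created this way, so consider it something you care for
from the command line.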
In my lab installations I moved the disks around on the four hosts so that the two primary
nodes and the fourth (DR) node had a full complement, while the arbiter node had to make do
with a slice of the boot SSD hosting the arbitration brick. In theory you
should be able to manage better in a three-node HCI by spreading the three
"normal" volumes "clockwise" around the nodes, as sketched below. I used that setup for
my initial Gluster-based tests (not using the HCI wizard), but that has oVirt treat the
Gluster like a SAN.
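To make the "clockwise" idea a little more concrete, this is roughly what creating the three
usual HCI volumes by hand would look like. The volume names are the wizard defaults, the
hostnames and brick paths are made up, and the last brick of each volume is the arbiter:

  gluster volume create engine replica 3 arbiter 1 \
      node1:/gluster_bricks/engine/brick node2:/gluster_bricks/engine/brick node3:/gluster_bricks/engine/arbiter
  gluster volume create data replica 3 arbiter 1 \
      node2:/gluster_bricks/data/brick node3:/gluster_bricks/data/brick node1:/gluster_bricks/data/arbiter
  gluster volume create vmstore replica 3 arbiter 1 \
      node3:/gluster_bricks/vmstore/brick node1:/gluster_bricks/vmstore/brick node2:/gluster_bricks/vmstore/arbiter

That way every node carries two full data bricks and one arbiter brick, instead of one node
being the dedicated lightweight arbiter.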
The HCI wizard doesn't support that clockwise or round-robin allocation, but if you
feel adventurous you should be able to turn a standard installation into it, purely
operating with brick additions/changes/removals (along the lines of the sketch below). But
again, given the fragility of the stack and the thin documentation, I'd recommend against it.
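For the record, the operation involved is Gluster's replace-brick. Names and paths are again
made up and I haven't validated this on a wizard-built setup, so take it as a sketch only:

  # move the arbiter brick of the (hypothetical) 'data' volume from node3 to node1
  gluster volume replace-brick data \
      node3:/gluster_bricks/data/arbiter node1:/gluster_bricks/data/arbiter commit force

Self-heal then has to repopulate the new brick, so check 'gluster volume heal data info'
before touching the next volume.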
A full three-way replication (instead of 2+1 arbitration) can be beneficial if your
cluster runs on standard-sized "Lego" left-overs and you can get an extra box
easily. But it will add write amplification, a little of it as CPU overhead, but
perhaps more importantly at the network level. If you have 10Gbit or better and slower
storage, that may not be an issue; if you have SSDs and Gbit, you can easily lose 33% of your
write bandwidth, while read bandwidth should be unaffected.
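Back-of-the-envelope, assuming a single Gbit link per node, all replica traffic crossing the
wire and no other overhead:

  1 Gbit/s                                      ~ 120 MB/s on the wire
  replica 2 + arbiter: 2 full copies per write  -> 120/2 = ~60 MB/s peak write rate
  replica 3:           3 full copies per write  -> 120/3 = ~40 MB/s peak write rate

40 vs. 60 MB/s is that one third; with SSDs behind it, the Gbit link is the bottleneck either
way, while reads are served from a single replica and therefore don't suffer.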