(I built it about 10 months ago)
I used their recipe for automated gluster node creation. Originally I
thought I had 3 replicas, but then I noticed that node 3's disk usage was
essentially nothing compared to nodes 1 and 2, and eventually discovered on
this list that I had an arbiter. Currently I am running on a 1 Gbps
backbone, but I can dedicate a gig port (or even do bonded gig -- my
servers have four 1 Gbps interfaces, and my switch is only used for this
cluster, so it has the ports to hook them all up). I am planning on a
10 Gbps upgrade once I bring in some more cash to pay for it.
Last night, nodes 2 and 3 were up, and I rebooted node 1 for updates. As
soon as it shut down, my cluster halted (including the hosted engine), and
everything went sideways. When the node came back up, I still had to
recover the hosted engine via the command line before I could go in and
start unpausing my VMs. I'm glad it happened at 8 pm... that would have
been very ugly if it had happened during the day. I had thought I had
enough redundancy in the cluster that I could take down any one node
without an issue... that is definitely not what happened.
--Jim
On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler <ckozleriii(a)gmail.com>
wrote:
These can get a little confusing, but this explains it best:
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes
Basically, in the first paragraph they explain why you can't have HA with
quorum on 2 nodes. Here is another overview doc that explains some more:
http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
From my understanding, the arbiter is good for resolving split brain.
Quorum and arbiter are two different things, though: quorum is a mechanism
to help you **avoid** split brain, and the arbiter helps gluster resolve
split brain by voting and other internal mechanics (as outlined in link 1).
How did you create the volume exactly - what command? It looks to me like
you created it with 'gluster volume create replica 2 arbiter 1 {....}' per
your earlier mention of "replica 2 arbiter 1". That being said, if you did
that and then set up quorum in the volume configuration, it would cause
gluster to halt once quorum was lost (as you saw, until you recovered
node 1).
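For illustration only (hypothetical volume name, hostnames and brick paths;
double-check the exact syntax for your gluster version), the two layouts
would be created roughly like this:

  # arbiter layout: the third brick holds only metadata
  gluster volume create data replica 3 arbiter 1 \
      node1:/bricks/data node2:/bricks/data node3:/bricks/data

  # full replica 3: all three bricks hold real data
  gluster volume create data replica 3 \
      node1:/bricks/data node2:/bricks/data node3:/bricks/data

(I believe the 'replica 2 arbiter 1' form mentioned earlier describes the
same arbiter layout in some versions/docs.)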
As you can see from the docs, there is still a corner case for getting
into split brain with replica 3, which, again, is where the arbiter would
help gluster resolve it.
I need to amend my previous statement: I was told that the arbiter brick
does not store data, only metadata. I cannot find anything in the docs
backing this up, but it would make sense. That being said, in my setup I
would not include my arbiter / third node in the oVirt VM (compute)
cluster; I would keep it completely separate.
On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir <jim(a)palousetech.com> wrote:
> I'm now also confused as to what the point of an arbiter is / what it
> does / why one would use it.
>
> On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir <jim(a)palousetech.com> wrote:
>
>> Thanks for the help!
>>
>> Here's my gluster volume info for the data export/brick (I have 3: data,
>> engine, and iso, but they're all configured the same):
>>
>> Volume Name: data
>> Type: Replicate
>> Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
>> Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
>> Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
>> Options Reconfigured:
>> performance.strict-o-direct: on
>> nfs.disable: on
>> user.cifs: off
>> network.ping-timeout: 30
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> cluster.locking-scheme: granular
>> cluster.data-self-heal-algorithm: full
>> performance.low-prio-threads: 32
>> features.shard-block-size: 512MB
>> features.shard: on
>> storage.owner-gid: 36
>> storage.owner-uid: 36
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.stat-prefetch: off
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.readdir-ahead: on
>> server.allow-insecure: on
>> [root@ovirt1 ~]#
>>
>>
>> All 3 of my brick nodes ARE also members of the virtualization cluster
>> (including ovirt3). How can I convert it into a full replica instead of
>> just an arbiter?
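>>
>> (I'm guessing the conversion would be something like the following --
>> remove the arbiter brick and re-add a fresh brick as a full replica; the
>> new brick path is made up, and please correct me if this is wrong:
>>
>> gluster volume remove-brick data replica 2 \
>>     ovirt3.nwfiber.com:/gluster/brick2/data force
>> gluster volume add-brick data replica 3 \
>>     ovirt3.nwfiber.com:/gluster/brick2/data-full
>> gluster volume heal data full
>>
>> ...and then the same for engine and iso?)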
>>
>> Thanks!
>> --Jim
>>
>> On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler <ckozleriii(a)gmail.com>
>> wrote:
>>
>>> @Kasturi - Looks good now. The cluster showed down for a moment but VMs
>>> stayed up in their appropriate places. Thanks!
>>>
>>> < Anyone on this list, please feel free to correct my response to Jim if
>>> it's wrong >
>>>
>>> @ Jim - If you can share your gluster volume info / status I can
>>> confirm (to the best of my knowledge). From my understanding, if you set
>>> up the volume with something like 'gluster volume set <vol> group virt',
>>> that will configure some quorum options as well. Ex:
>>> http://i.imgur.com/Mya4N5o.png
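>>>
>>> For example (hypothetical volume name, and the exact options applied
>>> depend on the virt group file shipped with your gluster packages):
>>>
>>> gluster volume set data group virt
>>> gluster volume get data cluster.quorum-type
>>> gluster volume get data cluster.server-quorum-type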
>>>
>>> While, yes, you are configured with an arbiter node, you're still losing
>>> quorum by dropping from 2 writable bricks to 1. You would need 4 nodes
>>> with 1 being an arbiter to keep quorum through a failure, which is in
>>> effect 3 writable nodes and 1 arbiter: if one gluster node drops, you
>>> still have 2 up. Although in that case you probably wouldn't need an
>>> arbiter at all.
>>>
>>> If you are configured that way, you can drop the quorum settings and
>>> just let the arbiter run, since (I believe) you're not using the arbiter
>>> node in the VM part of your cluster, just the storage part. When using
>>> quorum, you need > 50% of the cluster up at any one time. Since you have
>>> 3 nodes with 1 arbiter, losing a data node leaves you with 1 of 2
>>> writable bricks, which == 50%, which == degraded / hindered gluster.
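>>>
>>> A rough sketch of the kind of thing I mean (hypothetical volume name;
>>> gluster may refuse to relax client quorum on an arbiter volume, and
>>> dropping quorum trades split-brain protection for availability, so think
>>> it through first):
>>>
>>> gluster volume reset data cluster.server-quorum-type
>>> gluster volume get data cluster.quorum-type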
>>>
>>> Again, this is to the best of my knowledge based on other quorum-backed
>>> software... and this is what I understand from testing with gluster and
>>> ovirt thus far.
>>>
>>> On Fri, Sep 1, 2017 at 11:53 AM, Jim Kusznir <jim(a)palousetech.com>
>>> wrote:
>>>
>>>> Huh... OK, how do I convert the arbiter to a full replica, then? I was
>>>> misinformed when I created this setup. I thought the arbiter held
>>>> enough metadata that it could validate or repudiate any one replica
>>>> (kind of like the parity drive in a RAID-4 array). I was also under the
>>>> impression that one replica + arbiter was enough to keep the array
>>>> online and functional.
>>>>
>>>> --Jim
>>>>
>>>> On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler <ckozleriii(a)gmail.com>
>>>> wrote:
>>>>
>>>>> @ Jim - you have only two data bricks and lost quorum. The arbiter
>>>>> only stores metadata, no actual files. So yes, you were running in
>>>>> degraded mode and some operations were hindered.
>>>>>
>>>>> @ Sahina - Yes, this actually worked fine for me once I did that.
>>>>> However, the issue I am still facing is when I go to create a new
>>>>> gluster storage domain (replica 3, hyperconverged), it asks for a
>>>>> "Host to use" and I select that host. If I fail that host, all VMs
>>>>> halt. I do not recall this in 3.6 or early 4.0. To me it seems like
>>>>> this "pins" a node to a volume and vice versa, like you could, for
>>>>> instance, on a single hyperconverged box export a local disk via NFS
>>>>> and then mount it as an ovirt domain. But of course, that has its
>>>>> caveats. To that end, I am using gluster replica 3; when configuring
>>>>> it I set "host to use:" node1, then in the connection details I give
>>>>> it node1:/data. I fail node1, and all VMs halt. Did I miss something?
>>>>>
>>>>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose <sabose(a)redhat.com>
>>>>> wrote:
>>>>>
>>>>>> To the OP question, when you set up a gluster storage domain, you
>>>>>> need to specify backup-volfile-servers=<server2>:<server3>, where
>>>>>> server2 and server3 also have bricks running. When server1 is down
>>>>>> and the volume is mounted again, server2 or server3 is queried to get
>>>>>> the gluster volfiles.
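>>>>>>
>>>>>> For example, for a regular gluster storage domain this goes in the
>>>>>> domain's mount options field (hypothetical hostnames):
>>>>>>
>>>>>> backup-volfile-servers=server2.example.com:server3.example.com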
>>>>>>
>>>>>> @Jim, if this does not work, are you using the 4.1.5 build with
>>>>>> libgfapi access? If not, please provide the vdsm and gluster mount
>>>>>> logs to analyse.
>>>>>>
>>>>>> If VMs go into a paused state, this could mean the storage is not
>>>>>> available. You can check "gluster volume status <volname>" to see if
>>>>>> at least 2 bricks are running.
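>>>>>>
>>>>>> e.g. (assuming the volume is named 'data'):
>>>>>>
>>>>>> gluster volume status data
>>>>>> gluster volume heal data info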
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson <johan(a)kafit.se>
>>>>>> wrote:
>>>>>>
>>>>>>> If gluster drops in quorum so that it has fewer votes than it
>>>>>>> should, it will stop file operations until quorum is back to normal.
>>>>>>> If I remember right, you need two writable bricks for quorum to be
>>>>>>> met, and the arbiter is only a vote to avoid split brain.
>>>>>>>
>>>>>>>
>>>>>>> Basically, what you have is like a RAID 5 array without a spare.
>>>>>>> When one disk dies it runs in degraded mode, and some RAID systems
>>>>>>> will stop the array until you have removed the disk or forced it to
>>>>>>> run anyway.
>>>>>>>
>>>>>>> You can read up on it here:
>>>>>>> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>>>>>>>
>>>>>>> /Johan
>>>>>>>
>>>>>>> On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
>>>>>>>
>>>>>>> Hi all:
>>>>>>>
>>>>>>> Sorry to hijack the thread, but I was about to start essentially
>>>>>>> the same thread.
>>>>>>>
>>>>>>> I have a 3 node cluster; all three are hosts and gluster nodes
>>>>>>> (replica 2 + arbiter). I DO have the
>>>>>>> mnt_options=backup-volfile-servers= option set:
>>>>>>>
>>>>>>> storage=192.168.8.11:/engine
>>>>>>> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
>>>>>>>
>>>>>>> I had an issue today where 192.168.8.11 went down. ALL VMs
>>>>>>> immediately paused, including the engine (all VMs were running on
>>>>>>> host2:192.168.8.12). I couldn't get any gluster stuff working until
>>>>>>> host1 (192.168.8.11) was restored.
>>>>>>>
>>>>>>> What's wrong / what did I miss?
>>>>>>>
>>>>>>> (this was set up "manually" through the article on setting up a
>>>>>>> self-hosted gluster cluster back when 4.0 was new... I've upgraded
>>>>>>> it to 4.1 since).
>>>>>>>
>>>>>>> Thanks!
>>>>>>> --Jim
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler <
>>>>>>> ckozleriii(a)gmail.com> wrote:
>>>>>>>
>>>>>>> Typo..."Set it up and then failed that **HOST**"
>>>>>>>
>>>>>>> And upon that host going down, the storage domain went down. I only
>>>>>>> have the hosted storage domain and this new one - is this why the DC
>>>>>>> went down and no SPM could be elected?
>>>>>>>
>>>>>>> I dont recall this working this way in early 4.0 or 3.6
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler <
>>>>>>> ckozleriii(a)gmail.com> wrote:
>>>>>>>
>>>>>>> So I've tested this today and I failed a node. Specifically, I set
>>>>>>> up a glusterfs domain and selected "host to use: node1". Set it up
>>>>>>> and then failed that VM.
>>>>>>>
>>>>>>> However, this did not work and the datacenter went down. My engine
>>>>>>> stayed up; however, it seems configuring a domain pinned to a "host
>>>>>>> to use" will obviously cause it to fail when that host goes down.
>>>>>>>
>>>>>>> This seems counter-intuitive to the point of glusterfs or any
>>>>>>> redundant storage. If a single host has to be tied to its function,
>>>>>>> this introduces a single point of failure.
>>>>>>>
>>>>>>> Am I missing something obvious?
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra <knarra(a)redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Yes, right. What you can do is edit the hosted-engine.conf file;
>>>>>>> there is a parameter as shown below [1], and you replace h2 and h3
>>>>>>> with your second and third storage servers. Then you will need to
>>>>>>> restart the ovirt-ha-agent and ovirt-ha-broker services on all the
>>>>>>> nodes.
>>>>>>>
>>>>>>> [1] 'mnt_options=backup-volfile-servers=<h2>:<h3>'
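>>>>>>>
>>>>>>> For example, something like this (assuming the file is
>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on your hosts, and using
>>>>>>> placeholder hostnames):
>>>>>>>
>>>>>>> # in /etc/ovirt-hosted-engine/hosted-engine.conf
>>>>>>> mnt_options=backup-volfile-servers=h2.example.com:h3.example.com
>>>>>>>
>>>>>>> # then, on every host:
>>>>>>> systemctl restart ovirt-ha-broker ovirt-ha-agent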
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler <
>>>>>>> ckozleriii(a)gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Kasturi -
>>>>>>>
>>>>>>> Thanks for feedback
>>>>>>>
>>>>>>> > If the cockpit+gdeploy plugin had been used, then it would have
>>>>>>> > automatically detected the glusterfs replica 3 volume created
>>>>>>> > during Hosted Engine deployment and this question would not have
>>>>>>> > been asked
>>>>>>>
>>>>>>> Actually, hosted-engine --deploy also auto-detects glusterfs. I
>>>>>>> know the glusterfs fuse client has the ability to fail over between
>>>>>>> all nodes in the cluster, but I am still curious given that I see
>>>>>>> node1:/engine in the ovirt config (node1 being what I set it to in
>>>>>>> hosted-engine --deploy). So my concern was to find out exactly how
>>>>>>> the engine behaves when one node goes away and the fuse client moves
>>>>>>> over to the other node in the gluster cluster.
>>>>>>>
>>>>>>> But you did somewhat answer my question; the answer seems to be no
>>>>>>> (by default), and I will have to use hosted-engine.conf and change
>>>>>>> the parameter as you list.
>>>>>>>
>>>>>>> So I need to do something manual to create HA for engine on
>>>>>>> gluster? Yes?
>>>>>>>
>>>>>>> Thanks so much!
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra <knarra(a)redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> During Hosted Engine setup, the question about the glusterfs volume
>>>>>>> is asked because you set up the volumes yourself. If the
>>>>>>> cockpit+gdeploy plugin had been used, then it would have
>>>>>>> automatically detected the glusterfs replica 3 volume created during
>>>>>>> Hosted Engine deployment and this question would not have been
>>>>>>> asked.
>>>>>>>
>>>>>>> During new storage domain creation, when glusterfs is selected
>>>>>>> there is a feature called 'use managed gluster volumes'; upon
>>>>>>> checking this, all managed glusterfs volumes will be listed and you
>>>>>>> can choose the volume of your choice from the dropdown list.
>>>>>>>
>>>>>>> There is a conf file, /etc/ovirt-hosted-engine/hosted-engine.conf,
>>>>>>> with a parameter called backup-volfile-servers="h1:h2"; if one of
>>>>>>> the gluster nodes goes down, the engine uses this parameter to
>>>>>>> provide HA / failover.
>>>>>>>
>>>>>>> Hope this helps !!
>>>>>>>
>>>>>>> Thanks
>>>>>>> kasturi
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler <
>>>>>>> ckozleriii(a)gmail.com> wrote:
>>>>>>>
>>>>>>> Hello -
>>>>>>>
>>>>>>> I have successfully created a hyperconverged hosted engine setup
>>>>>>> consisting of 3 nodes - 2 for VMs and the third purely for storage.
>>>>>>> I manually configured it all; I did not use oVirt Node or anything,
>>>>>>> and built the gluster volumes myself.
>>>>>>>
>>>>>>> However, I noticed that when setting up the hosted engine, and even
>>>>>>> when adding a new storage domain of the glusterfs type, it still
>>>>>>> asks for hostname:/volumename.
>>>>>>>
>>>>>>> This leads me to believe that if that one node goes down (ex:
>>>>>>> node1:/data), then the ovirt engine won't be able to communicate
>>>>>>> with that volume because it's trying to reach it on node 1, and the
>>>>>>> volume will thus go down.
>>>>>>>
>>>>>>> I know the glusterfs fuse client can connect to all nodes to
>>>>>>> provide failover/HA, but how does the engine handle this?
>>>>>>>