@ Jim - here is my setup, which I will test shortly on a brand-new cluster; I'll
report back what I find in my tests:
- 3x servers direct connected via 10Gb
- 2 of those 3 set up in ovirt as hosts
- Hosted engine
- Gluster replica 3 (no arbiter) for all volumes
- 1x engine volume, gluster replica 3, manually configured (not using ovirt managed gluster)
- 1x datatest volume (20GB), replica 3, manually configured (not using ovirt managed gluster)
- 1x nfstest domain served from some other server in my infrastructure which, at the time of my original testing, was the master domain
I tested this earlier: the ovirt cluster reported the DC/cluster as down, but
all VMs stayed up.
While I am typing this: can you confirm you set up your gluster storage domain
with the backup-volfile-servers mount option? Also, can you confirm you updated
hosted-engine.conf with that mount option as well?
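For reference, what I mean is something along these lines (hostnames here are
just placeholders for your own gluster servers):

# "Mount Options" field of the gluster storage domain in the ovirt UI:
backup-volfile-servers=gluster2.example.com:gluster3.example.com

# and in hosted-engine.conf on each host:
storage=gluster1.example.com:/engine
mnt_options=backup-volfile-servers=gluster2.example.com:gluster3.example.com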
On Fri, Sep 1, 2017 at 4:22 PM, Jim Kusznir <jim(a)palousetech.com> wrote:
So, after reading the first document twice and the second link thoroughly
once, I believe that the arbiter volume should be sufficient and should count
for replica / split-brain protection. E.g., if any one full replica is down, and the
arbiter and the other replica are up, then it should have quorum and all
should be good.
I think my underlying problem has to do more with config than the replica
state. That said, I did size the drive on my 3rd node planning to have an
identical copy of all data on it, so I'm still not opposed to making it a
full replica.
Did I miss something here?
Thanks!
On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler <ckozleriii(a)gmail.com>
wrote:
> These can get a little confusing but this explains it best:
>
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes
>
> Basically, in the first paragraph they explain why you can't have HA
> with quorum for 2 nodes. Here is another overview doc that explains some
> more:
>
> http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
>
> From my understanding, the arbiter is good for resolving split brain. Quorum
> and arbiter are two different things, though: quorum is a mechanism to help
> you **avoid** split brain, and the arbiter helps gluster resolve split
> brain by voting and other internal mechanics (as outlined in link 1). How
> did you create the volume exactly - what command? It looks to me like you
> created it with 'gluster volume create replica 2 arbiter 1 {....}' per your
> earlier mention of "replica 2 arbiter 1". That being said, if you did that
> and then set up quorum in the volume configuration, this would cause your
> gluster to halt file operations once quorum was lost (as you saw, until you
> recovered node 1)
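> For comparison, the two create commands would look roughly like this (brick
> paths are placeholders, and the arbiter syntax differs slightly between
> gluster versions):
>
> # full replica 3 - three complete copies of the data
> gluster volume create data replica 3 \
>   node1:/gluster/brick/data node2:/gluster/brick/data node3:/gluster/brick/data
>
> # arbiter volume - two complete copies plus a metadata-only arbiter brick
> gluster volume create data replica 3 arbiter 1 \
>   node1:/gluster/brick/data node2:/gluster/brick/data node3:/gluster/brick/data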
>
> As you can see from the docs, there is still a corner case for getting into
> split brain with replica 3, which, again, is where the arbiter would help
> gluster resolve it
>
> I need to amend my previous statement: I was told that the arbiter volume
> does not store data, only metadata. I cannot find anything in the docs
> backing this up; however, it would make sense. That being said,
> in my setup I would not include my arbiter / third node in my ovirt VM
> cluster component. I would keep it completely separate
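> If you do decide to convert the arbiter to a full replica, my understanding is
> that you swap the arbiter brick out for a full-sized brick, roughly like this
> (untested sketch - the new brick path is a placeholder, check the gluster docs
> and have backups first):
>
> # drop the arbiter brick, leaving a plain replica 2 for the moment
> gluster volume remove-brick data replica 2 ovirt3.nwfiber.com:/gluster/brick2/data force
> # add a full-sized brick back as the third copy
> gluster volume add-brick data replica 3 ovirt3.nwfiber.com:/gluster/brick2/data-new
> # trigger a full self-heal so the new brick gets a complete copy
> gluster volume heal data full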
>
>
> On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir <jim(a)palousetech.com> wrote:
>
>> I'm now also confused as to what the point of an arbiter is / what it
>> does / why one would use it.
>>
>> On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir <jim(a)palousetech.com>
>> wrote:
>>
>>> Thanks for the help!
>>>
>>> Here's my gluster volume info for the data export/brick (I have 3:
>>> data, engine, and iso, but they're all configured the same):
>>>
>>> Volume Name: data
>>> Type: Replicate
>>> Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x (2 + 1) = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
>>> Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
>>> Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
>>> Options Reconfigured:
>>> performance.strict-o-direct: on
>>> nfs.disable: on
>>> user.cifs: off
>>> network.ping-timeout: 30
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 10000
>>> cluster.locking-scheme: granular
>>> cluster.data-self-heal-algorithm: full
>>> performance.low-prio-threads: 32
>>> features.shard-block-size: 512MB
>>> features.shard: on
>>> storage.owner-gid: 36
>>> storage.owner-uid: 36
>>> cluster.server-quorum-type: server
>>> cluster.quorum-type: auto
>>> network.remote-dio: enable
>>> cluster.eager-lock: enable
>>> performance.stat-prefetch: off
>>> performance.io-cache: off
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> performance.readdir-ahead: on
>>> server.allow-insecure: on
>>> [root@ovirt1 ~]#
>>>
>>>
>>> all 3 of my brick nodes ARE also members of the virtualization cluster
>>> (including ovirt3). How can I convert it into a full replica instead of
>>> just an arbiter?
>>>
>>> Thanks!
>>> --Jim
>>>
>>> On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler <ckozleriii(a)gmail.com>
>>> wrote:
>>>
>>>> @Kasturi - Looks good now. The cluster showed down for a moment but VMs
>>>> stayed up in their appropriate places. Thanks!
>>>>
>>>> < Anyone on this list please feel free to correct my response to Jim
>>>> if it's wrong >
>>>>
>>>> @ Jim - If you can share your gluster volume info / status, I can
>>>> confirm (to the best of my knowledge). From my understanding, if you set up
>>>> the volume with something like 'gluster volume set <vol> group virt' this
>>>> will configure some quorum options as well. Ex:
>>>> http://i.imgur.com/Mya4N5o.png
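>>>> For example, something like this (from memory - the exact set of options the
>>>> virt group applies may differ between versions):
>>>>
>>>> gluster volume set data group virt
>>>> gluster volume get data cluster.quorum-type          # expect: auto
>>>> gluster volume get data cluster.server-quorum-type   # expect: server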
>>>>
>>>> While, yes, you are configured for an arbiter node, you're still losing
>>>> quorum by dropping from 2 -> 1. You would need 4 nodes with 1 being an arbiter
>>>> to configure quorum, which is in effect 3 writable nodes and 1 arbiter. If
>>>> one gluster node drops, you still have 2 up. Although in this case, you
>>>> probably wouldn't need the arbiter at all.
>>>>
>>>> If you are configured that way, you can drop the quorum settings and just let
>>>> the arbiter run, since you're not using the arbiter node in your VM cluster part (I
>>>> believe), just the storage cluster part. When using quorum, you need > 50% of
>>>> the cluster being up at one time. Since you have 3 nodes with 1 arbiter,
>>>> you're actually losing 1/2, which == 50%, which == a degraded / hindered gluster
>>>>
>>>> Again, this is to the best of my knowledge based on other quorum-backed
>>>> software... and this is what I understand from testing with gluster
>>>> and ovirt thus far
>>>>
>>>> On Fri, Sep 1, 2017 at 11:53 AM, Jim Kusznir <jim(a)palousetech.com>
>>>> wrote:
>>>>
>>>>> Huh... OK, how do I convert the arbiter to a full replica, then? I
>>>>> was misinformed when I created this setup. I thought the arbiter held
>>>>> enough metadata that it could validate or repudiate any one replica (kinda
>>>>> like the parity drive for a RAID-4 array). I was also under the impression
>>>>> that one replica + arbiter is enough to keep the array online and
>>>>> functional.
>>>>>
>>>>> --Jim
>>>>>
>>>>> On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler <ckozleriii(a)gmail.com>
>>>>> wrote:
>>>>>
>>>>>> @ Jim - you have only two data volumes and lost quorum. The arbiter
>>>>>> only stores metadata, no actual files. So yes, you were running in degraded
>>>>>> mode, so some operations were hindered.
>>>>>>
>>>>>> @ Sahina - Yes, this actually worked fine for me once I did that.
>>>>>> However, the issue I am still facing is when I go to create a new gluster
>>>>>> storage domain (replica 3, hyperconverged) and I tell it "Host to use" and
>>>>>> I select that host. If I fail that host, all VMs halt. I do not recall this
>>>>>> in 3.6 or early 4.0. This makes it seem like it is "pinning" a node to a
>>>>>> volume and vice versa, like you could, for instance, on a single
>>>>>> hyperconverged host export a local disk via NFS and then mount it via an
>>>>>> ovirt domain. But of course, this has its caveats. To that end, I am using
>>>>>> gluster replica 3; when configuring it I say "host to use:" node 1, then
>>>>>> in the connection details I give it node1:/data. I fail node1, all VMs
>>>>>> halt. Did I miss something?
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose <sabose(a)redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> To the OP's question: when you set up a gluster storage domain, you
>>>>>>> need to specify backup-volfile-servers=<server2>:<server3>, where
>>>>>>> server2 and server3 also have bricks running. When server1 is down and the
>>>>>>> volume is mounted again, server2 or server3 are queried to get the gluster
>>>>>>> volfiles.
>>>>>>>
>>>>>>> @Jim, if this does not work, are you using the 4.1.5 build with
>>>>>>> libgfapi access? If not, please provide the vdsm and gluster mount logs to
>>>>>>> analyse.
>>>>>>>
>>>>>>> If VMs go to a paused state - this could mean the storage is not
>>>>>>> available. You can check "gluster volume status <volname>" to see if
>>>>>>> at least 2 bricks are running.
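>>>>>>> For example (using the engine volume name as a stand-in):
>>>>>>>
>>>>>>> gluster volume status engine
>>>>>>> gluster volume heal engine info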
>>>>>>>
>>>>>>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson <johan(a)kafit.se>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If gluster drops in quorum so that it has fewer votes than it
>>>>>>>> should, it will stop file operations until quorum is back to normal. If I
>>>>>>>> remember it right, you need two bricks to write for quorum to be met, and
>>>>>>>> the arbiter is only a vote to avoid split brain.
>>>>>>>>
>>>>>>>>
>>>>>>>> Basically what you have is a RAID 5 solution without a spare. And
>>>>>>>> when one disk dies it will run in degraded mode. And some RAID systems will
>>>>>>>> stop the raid until you have removed the disk or forced it to run anyway.
>>>>>>>>
>>>>>>>> You can read up on it here:
>>>>>>>> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>>>>>>>>
>>>>>>>> /Johan
>>>>>>>>
>>>>>>>> On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
>>>>>>>>
>>>>>>>> Hi all:
>>>>>>>>
>>>>>>>> Sorry to hijack the thread, but I was about to start essentially
>>>>>>>> the same thread.
>>>>>>>>
>>>>>>>> I have a 3 node cluster; all three are hosts and gluster nodes
>>>>>>>> (replica 2 + arbiter). I DO have the mnt_options=backup-volfile-servers=
>>>>>>>> set:
>>>>>>>>
>>>>>>>> storage=192.168.8.11:/engine
>>>>>>>> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
>>>>>>>>
>>>>>>>> I had an issue today where 192.168.8.11 went down. ALL VMs
>>>>>>>> immediately paused, including the engine (all VMs were running on
>>>>>>>> host2:192.168.8.12). I couldn't get any gluster stuff working until host1
>>>>>>>> (192.168.8.11) was restored.
>>>>>>>>
>>>>>>>> What's wrong / what did I miss?
>>>>>>>>
>>>>>>>> (This was set up "manually" through the article on setting up a
>>>>>>>> self-hosted gluster cluster back when 4.0 was new... I've upgraded it to 4.1
>>>>>>>> since.)
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> --Jim
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler <ckozleriii(a)gmail.com> wrote:
>>>>>>>>
>>>>>>>> Typo... "Set it up and then failed that **HOST**"
>>>>>>>>
>>>>>>>> And upon that host going down, the storage domain went down. I
>>>>>>>> only have the hosted storage domain and this new one - is this why the DC went
>>>>>>>> down and no SPM could be elected?
>>>>>>>>
>>>>>>>> I don't recall it working this way in early 4.0 or 3.6.
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler <ckozleriii(a)gmail.com> wrote:
>>>>>>>>
>>>>>>>> So I've tested this today and I failed a node. Specifically, I
>>>>>>>> set up a glusterfs domain and selected "host to use: node1". Set it up and
>>>>>>>> then failed that VM.
>>>>>>>>
>>>>>>>> However, this did not work and the datacenter went down. My engine
>>>>>>>> stayed up; however, it seems configuring a domain to pin to a "host to use"
>>>>>>>> will obviously cause it to fail.
>>>>>>>>
>>>>>>>> This seems counter-intuitive to the point of glusterfs or any
>>>>>>>> redundant storage. If a single host has to be tied to its function, this
>>>>>>>> introduces a single point of failure.
>>>>>>>>
>>>>>>>> Am I missing something obvious?
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra <knarra(a)redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Yes, right. What you can do is edit the hosted-engine.conf file;
>>>>>>>> there is a parameter as shown below [1]. Replace h2 and h3 with your
>>>>>>>> second and third storage servers. Then you will need to restart the
>>>>>>>> ovirt-ha-agent and ovirt-ha-broker services on all the nodes.
>>>>>>>>
>>>>>>>> [1] 'mnt_options=backup-volfile-servers=<h2>:<h3>'
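>>>>>>>> Roughly like this (h2/h3 here are placeholders for your second and third
>>>>>>>> storage servers):
>>>>>>>>
>>>>>>>> # in hosted-engine.conf on each node:
>>>>>>>> mnt_options=backup-volfile-servers=h2.example.com:h3.example.com
>>>>>>>>
>>>>>>>> # then on each node:
>>>>>>>> systemctl restart ovirt-ha-broker ovirt-ha-agent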
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler <ckozleriii(a)gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Kasturi -
>>>>>>>>
>>>>>>>> Thanks for feedback
>>>>>>>>
>>>>>>>> > If the cockpit+gdeploy plugin had been used, then it would have
>>>>>>>> > automatically detected the glusterfs replica 3 volume created during
>>>>>>>> > Hosted Engine deployment and this question would not have been asked
>>>>>>>>
>>>>>>>> Actually, doing hosted-engine --deploy also auto-detects
>>>>>>>> glusterfs. I know the glusterfs fuse client has the ability to fail over
>>>>>>>> between all nodes in the cluster, but I am still curious given the fact that I
>>>>>>>> see node1:/engine in the ovirt config (node1 being what I set it to in
>>>>>>>> hosted-engine --deploy). So my concern was to find out exactly how the engine
>>>>>>>> works when one node goes away and the fuse client moves over to the other
>>>>>>>> node in the gluster cluster.
>>>>>>>>
>>>>>>>> But you did somewhat answer my question: the answer seems to be no
>>>>>>>> (by default), and I will have to use hosted-engine.conf and change the
>>>>>>>> parameter as you list.
>>>>>>>>
>>>>>>>> So I need to do something manual to create HA for the engine on
>>>>>>>> gluster? Yes?
>>>>>>>>
>>>>>>>> Thanks so much!
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra <knarra(a)redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> During Hosted Engine setup, the question about the glusterfs volume is
>>>>>>>> asked because you have set up the volumes yourself. If the cockpit+gdeploy
>>>>>>>> plugin had been used, then it would have automatically detected the
>>>>>>>> glusterfs replica 3 volume created during Hosted Engine deployment and this
>>>>>>>> question would not have been asked.
>>>>>>>>
>>>>>>>> During new storage domain creation, when glusterfs is selected
>>>>>>>> there is a feature called 'use managed gluster volumes'; upon checking
>>>>>>>> this, all managed glusterfs volumes will be listed and you can choose the
>>>>>>>> volume of your choice from the dropdown list.
>>>>>>>>
>>>>>>>> There is a conf file called /etc/hosted-engine/hosted-engine.conf
>>>>>>>> where there is a parameter called backup-volfile-servers="h1:h2", and if one
>>>>>>>> of the gluster nodes goes down the engine uses this parameter to provide HA /
>>>>>>>> failover.
>>>>>>>>
>>>>>>>> Hope this helps !!
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> kasturi
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler <ckozleriii(a)gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hello -
>>>>>>>>
>>>>>>>> I have successfully created a hyperconverged hosted engine setup
>>>>>>>> consisting of 3 nodes - 2 for VMs and the third purely for storage. I
>>>>>>>> manually configured it all, did not use ovirt node or anything, and built the
>>>>>>>> gluster volumes myself.
>>>>>>>>
>>>>>>>> However, I noticed that when setting up the hosted engine, and even
>>>>>>>> when adding a new storage domain with glusterfs type, it still asks for
>>>>>>>> hostname:/volumename.
>>>>>>>>
>>>>>>>> This leads me to believe that if that one node goes down (ex:
>>>>>>>> node1:/data), then the ovirt engine won't be able to communicate with that
>>>>>>>> volume because it's trying to reach it on node 1, and will thus go down.
>>>>>>>>
>>>>>>>> I know the glusterfs fuse client can connect to all nodes to provide
>>>>>>>> failover/HA, but how does the engine handle this?
>>>>>>>>