Jim -
here is my test:
- All VMs on node2: hosted engine and 1 test VM
- Test VM on gluster storage domain (with mount options set)
- Hosted engine is on gluster as well, with the backupvol setting persisted to hosted-engine.conf

All VMs stayed up. Nothing in dmesg of the test VM indicating a pause or any other issue.

However, what I did notice during this is that my datatest volume doesn't have quorum set. So I will set that now (commands below, after the volume info) and report back what happens.
# gluster volume info datatest
Volume Name: datatest
Type: Replicate
Volume ID: 229c25f9-405e-4fe7-b008-1d3aea065069
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/gluster/data/datatest/brick1
Brick2: node2:/gluster/data/datatest/brick1
Brick3: node3:/gluster/data/datatest/brick1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
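
For reference, the quorum settings I plan to apply are roughly the standard client/server quorum options (a sketch only; double-check the option names against the gluster docs for your version):

# gluster volume set datatest cluster.quorum-type auto
# gluster volume set datatest cluster.server-quorum-type server
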
Perhaps quorum may be more trouble than it's worth when you have 3 nodes and/or 2 nodes + arbiter?
Since I am keeping my 3rd node out of oVirt, I am content to keep it as a warm spare that I could swap into the oVirt cluster if I **had** to, while it keeps my storage at 100% quorum.
On Fri, Sep 1, 2017 at 5:18 PM, Jim Kusznir <jim(a)palousetech.com> wrote:
I can confirm that I did set it up manually, and I did specify backupvol; in the "Manage Domain" storage settings I do have, under mount options, backup-volfile-servers=192.168.8.12:192.168.8.13 (and this was done at initial install time).

The "use managed gluster" checkbox is NOT checked, and if I check it and save the settings, the next time I go in it is not checked.
--Jim
On Fri, Sep 1, 2017 at 2:08 PM, Charles Kozler <ckozleriii(a)gmail.com>
wrote:
> @ Jim - here is my setup, which I will test in a few (brand new cluster) and report back what I find in my tests
>
> - 3x servers direct connected via 10Gb
> - 2 of those 3 setup in ovirt as hosts
> - Hosted engine
> - Gluster replica 3 (no arbiter) for all volumes
> - 1x engine volume gluster replica 3 manually configured (not using ovirt
> managed gluster)
> - 1x datatest volume (20gb) replica 3 manually configured (not using
> ovirt managed gluster)
> - 1x nfstest domain served from some other server in my infrastructure
> which, at the time of my original testing, was master domain
>
> I tested this earlier and all VMs stayed online. However, the oVirt cluster reported the DC/cluster as down even though all VMs stayed up.
>
> As I am now typing this, can you confirm you set up your gluster storage domain with backupvol? Also, can you confirm you updated hosted-engine.conf with the backupvol mount option as well?
>
> On Fri, Sep 1, 2017 at 4:22 PM, Jim Kusznir <jim(a)palousetech.com> wrote:
>
>> So, after reading the first document twice and the 2nd link thoroughly once, I believe that the arbiter volume should be sufficient and count for replica / split brain. E.g., if any one full replica is down, and the arbiter and the other replica are up, then it should have quorum and all should be good.
>>
>> I think my underlying problem has to do more with config than the
>> replica state. That said, I did size the drive on my 3rd node planning to
>> have an identical copy of all data on it, so I'm still not opposed to
>> making it a full replica.
>>
>> Did I miss something here?
>>
>> Thanks!
>>
>> On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler <ckozleriii(a)gmail.com>
>> wrote:
>>
>>> These can get a little confusing but this explains it best:
>>>
>>> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes
>>>
>>> Basically, in the first paragraph they explain why you can't have HA with quorum for 2 nodes. Here is another overview doc that explains some more:
>>>
>>>
>>> http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
>>>
>>> From my understanding, the arbiter is good for resolving split brain. Quorum and arbiter are two different things, though: quorum is a mechanism to help you **avoid** split brain, and the arbiter helps gluster resolve split brain by voting and other internal mechanics (as outlined in link 1). How did you create the volume exactly - what command? It looks to me like you created it with 'gluster volume create replica 2 arbiter 1 {....}' per your earlier mention of "replica 2 arbiter 1". That being said, if you did that and then set up quorum in the volume configuration, it would cause gluster to halt when quorum was lost (as you saw, until you recovered node 1).
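>>>
>>> (For comparison, the arbiter create syntax in the gluster docs looks something like the following - the volume name and brick paths here are just placeholders, not your actual ones:
>>> # gluster volume create myvol replica 3 arbiter 1 host1:/bricks/myvol host2:/bricks/myvol host3:/bricks/myvol
>>> where the last brick listed in each set of three becomes the arbiter.)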
>>>
>>> As you can see from the docs, there is still a corner case for getting into split brain with replica 3, which, again, is where the arbiter would help gluster resolve it.
>>>
>>> I need to amend my previous statement: I was told that the arbiter volume does not store data, only metadata. I cannot find anything in the docs backing this up; however, it would make sense for that to be the case. That being said, in my setup I would not include my arbiter or my third node in my oVirt VM cluster component; I would keep it completely separate.
>>>
>>>
>>> On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir <jim(a)palousetech.com>
>>> wrote:
>>>
>>>> I'm now also confused as to what the point of an arbiter is / what it does / why one would use it.
>>>>
>>>> On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir <jim(a)palousetech.com>
>>>> wrote:
>>>>
>>>>> Thanks for the help!
>>>>>
>>>>> Here's my gluster volume info for the data export/brick (I have 3: data, engine, and iso, but they're all configured the same):
>>>>>
>>>>> Volume Name: data
>>>>> Type: Replicate
>>>>> Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
>>>>> Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
>>>>> Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
>>>>> Options Reconfigured:
>>>>> performance.strict-o-direct: on
>>>>> nfs.disable: on
>>>>> user.cifs: off
>>>>> network.ping-timeout: 30
>>>>> cluster.shd-max-threads: 8
>>>>> cluster.shd-wait-qlength: 10000
>>>>> cluster.locking-scheme: granular
>>>>> cluster.data-self-heal-algorithm: full
>>>>> performance.low-prio-threads: 32
>>>>> features.shard-block-size: 512MB
>>>>> features.shard: on
>>>>> storage.owner-gid: 36
>>>>> storage.owner-uid: 36
>>>>> cluster.server-quorum-type: server
>>>>> cluster.quorum-type: auto
>>>>> network.remote-dio: enable
>>>>> cluster.eager-lock: enable
>>>>> performance.stat-prefetch: off
>>>>> performance.io-cache: off
>>>>> performance.read-ahead: off
>>>>> performance.quick-read: off
>>>>> performance.readdir-ahead: on
>>>>> server.allow-insecure: on
>>>>> [root@ovirt1 ~]#
>>>>>
>>>>>
>>>>> All 3 of my brick nodes ARE also members of the virtualization cluster (including ovirt3). How can I convert it into a full replica instead of just an arbiter?
>>>>>
>>>>> Thanks!
>>>>> --Jim
>>>>>
>>>>> On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler <ckozleriii(a)gmail.com> wrote:
>>>>>
>>>>>> @Kasturi - Looks good now. The cluster showed as down for a moment, but VMs stayed up in their appropriate places. Thanks!
>>>>>>
>>>>>> <Anyone on this list, please feel free to correct my response to Jim if it's wrong>
>>>>>>
>>>>>> @ Jim - If you can share your gluster volume info / status, I can confirm (to the best of my knowledge). From my understanding, if you set up the volume with something like 'gluster volume set <vol> group virt', this will configure some quorum options as well. Ex:
>>>>>>
>>>>>> http://i.imgur.com/Mya4N5o.png
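>>>>>>
>>>>>> As a purely illustrative example with a volume named "data", that would be:
>>>>>> # gluster volume set data group virt
>>>>>> which, as I understand it, applies the whole virt profile in one shot (cluster.quorum-type, cluster.server-quorum-type, network.remote-dio and friends).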
>>>>>>
>>>>>> While, yes, you are configured for an arbiter node, you're still losing quorum by dropping from 2 -> 1. You would need 4 nodes with 1 being the arbiter to configure quorum, which is in effect 3 writable nodes and 1 arbiter. If one gluster node drops, you still have 2 up, although in that case you probably wouldn't need an arbiter at all.
>>>>>>
>>>>>> If you have quorum configured, you can drop the quorum settings and just let the arbiter run, since you're not using the arbiter node in your VM cluster part (I believe), just the storage cluster part. When using quorum, you need > 50% of the cluster up at one time. Since you have 3 nodes with 1 arbiter, you're actually losing 1/2, which == 50%, which == degraded / hindered gluster.
>>>>>>
>>>>>> Again, this is to the best of my knowledge, based on other quorum-backed software... and this is what I understand from testing with gluster and oVirt thus far.
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 11:53 AM, Jim Kusznir <jim(a)palousetech.com> wrote:
>>>>>>
>>>>>>> Huh... OK, how do I convert the arbiter to a full replica, then? I was misinformed when I created this setup. I thought the arbiter held enough metadata that it could validate or repudiate any one replica (kinda like the parity drive for a RAID-4 array). I was also under the impression that one replica + arbiter is enough to keep the array online and functional.
>>>>>>>
>>>>>>> --Jim
>>>>>>>
>>>>>>> On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler <
>>>>>>> ckozleriii(a)gmail.com> wrote:
>>>>>>>
>>>>>>>> @ Jim - you have only two data bricks and lost quorum. The arbiter only stores metadata, no actual files. So yes, you were running in degraded mode, so some operations were hindered.
>>>>>>>>
>>>>>>>> @ Sahina - Yes, this actually worked fine for me once I did that. However, the issue I am still facing is that when I go to create a new gluster storage domain (replica 3, hyperconverged), it asks for a "Host to use" and I select that host. If I fail that host, all VMs halt. I do not recall this in 3.6 or early 4.0. This makes it seem like it is "pinning" a node to a volume and vice versa, like you could, for instance, on a singular hyperconverged setup export a local disk via NFS and then mount it as an oVirt domain. But of course, that has its caveats. To that end, I am using gluster replica 3; when configuring it I say "host to use:" node1, then in the connection details I give it node1:/data. I fail node1, all VMs halt. Did I miss something?
>>>>>>>>
>>>>>>>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose <sabose(a)redhat.com> wrote:
>>>>>>>>
>>>>>>>>> To the OP's question: when you set up a gluster storage domain, you need to specify backup-volfile-servers=<server2>:<server3>, where server2 and server3 also have bricks running. When server1 is down and the volume is mounted again, server2 or server3 are queried to get the gluster volfiles.
>>>>>>>>>
>>>>>>>>> @Jim, if this does not work, are you using the 4.1.5 build with libgfapi access? If not, please provide the vdsm and gluster mount logs to analyse.
>>>>>>>>>
>>>>>>>>> If VMs go to a paused state, this could mean the storage is not available. You can check "gluster volume status <volname>" to see if at least 2 bricks are running.
>>>>>>>>>
>>>>>>>>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson <johan(a)kafit.se> wrote:
>>>>>>>>>
>>>>>>>>>> If gluster drops in quorum so that it has fewer votes than it should, it will stop file operations until quorum is back to normal. If I remember it right, you need two bricks writable for quorum to be met, and the arbiter is only a vote to avoid split brain.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Basically, what you have is a RAID 5 solution without a spare. When one disk dies it will run in degraded mode, and some RAID systems will stop the array until you have removed the disk or forced it to run anyway.
>>>>>>>>>>
>>>>>>>>>> You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>>>>>>>>>>
>>>>>>>>>> /Johan
>>>>>>>>>>
>>>>>>>>>> On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all:
>>>>>>>>>>
>>>>>>>>>> Sorry to hijack the thread, but I was about to start essentially the same thread.
>>>>>>>>>>
>>>>>>>>>> I have a 3-node cluster; all three are hosts and gluster nodes (replica 2 + arbiter). I DO have the mnt_options=backup-volfile-servers= set:
>>>>>>>>>>
>>>>>>>>>> storage=192.168.8.11:/engine
>>>>>>>>>> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
>>>>>>>>>>
>>>>>>>>>> I had an issue today where 192.168.8.11 went down. ALL VMs immediately paused, including the engine (all VMs were running on host2: 192.168.8.12). I couldn't get any gluster stuff working until host1 (192.168.8.11) was restored.
>>>>>>>>>>
>>>>>>>>>> What's wrong / what did I miss?
>>>>>>>>>>
>>>>>>>>>> (This was set up "manually" following the article on setting up a self-hosted gluster cluster back when 4.0 was new... I've upgraded it to 4.1 since.)
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> --Jim
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler <ckozleriii(a)gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Typo... "Set it up and then failed that **HOST**"
>>>>>>>>>>
>>>>>>>>>> And upon that host going down, the storage domain went down. I only have the hosted-engine storage domain and this new one - is this why the DC went down and no SPM could be elected?
>>>>>>>>>>
>>>>>>>>>> I don't recall it working this way in early 4.0 or 3.6.
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler <ckozleriii(a)gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> So I've tested this today and I failed a node. Specifically, I set up a glusterfs domain and selected "host to use: node1". Set it up and then failed that VM.
>>>>>>>>>>
>>>>>>>>>> However, this did not work and the datacenter went down. My engine stayed up; however, it seems configuring a domain pinned to a "host to use" will obviously cause it to fail.
>>>>>>>>>>
>>>>>>>>>> This seems counter-intuitive to the point of glusterfs or any redundant storage. If a single host has to be tied to its function, this introduces a single point of failure.
>>>>>>>>>>
>>>>>>>>>> Am I missing something obvious?
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra <knarra(a)redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Yes, right. What you can do is edit the hosted-engine.conf file; there is a parameter as shown below [1], and you replace h2 and h3 with your second and third storage servers. Then you will need to restart the ovirt-ha-agent and ovirt-ha-broker services on all the nodes.
>>>>>>>>>>
>>>>>>>>>> [1] 'mnt_options=backup-volfile-servers=<h2>:<h3>'
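>>>>>>>>>>
>>>>>>>>>> As a sketch of what that ends up looking like (the two hostnames below are placeholders for your own second and third storage servers):
>>>>>>>>>> mnt_options=backup-volfile-servers=host2.example.com:host3.example.com
>>>>>>>>>> followed on each node by:
>>>>>>>>>> # systemctl restart ovirt-ha-agent ovirt-ha-broker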
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler <ckozleriii(a)gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Kasturi -
>>>>>>>>>>
>>>>>>>>>> Thanks for feedback
>>>>>>>>>>
>>>>>>>>>> > If the cockpit+gdeploy plugin had been used, then that would have automatically detected the glusterfs replica 3 volume created during Hosted Engine deployment and this question would not have been asked
>>>>>>>>>>
>>>>>>>>>> Actually, hosted-engine --deploy also auto-detects glusterfs. I know the glusterfs fuse client has the ability to fail over between all nodes in the cluster, but I am still curious, given the fact that I see node1:/engine in the oVirt config (node1 being what I set it to in hosted-engine --deploy). So my concern was to ensure and find out exactly how the engine behaves when one node goes away and the fuse client moves over to the other node in the gluster cluster.
>>>>>>>>>>
>>>>>>>>>> But you did somewhat answer my question: the answer seems to be no (by default), and I will have to use hosted-engine.conf and change the parameter as you list.
>>>>>>>>>>
>>>>>>>>>> So I need to do something manual to create HA for the engine on gluster? Yes?
>>>>>>>>>>
>>>>>>>>>> Thanks so much!
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra <knarra(a)redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> During Hosted Engine setup, the question about the glusterfs volume is being asked because you set up the volumes yourself. If the cockpit+gdeploy plugin had been used, then it would have automatically detected the glusterfs replica 3 volume created during Hosted Engine deployment and this question would not have been asked.
>>>>>>>>>>
>>>>>>>>>> During new storage domain creation, when glusterfs is selected there is a feature called 'use managed gluster volumes'; upon checking this, all managed glusterfs volumes will be listed and you can choose the volume of your choice from the dropdown list.
>>>>>>>>>>
>>>>>>>>>> There is a conf file called /etc/hosted-engine/hosted-engine.conf where there is a parameter called backup-volfile-servers="h1:h2", and if one of the gluster nodes goes down the engine uses this parameter to provide HA / failover.
>>>>>>>>>>
>>>>>>>>>> Hope this helps !!
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> kasturi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler <ckozleriii(a)gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello -
>>>>>>>>>>
>>>>>>>>>> I have successfully created a hyperconverged hosted engine setup consisting of 3 nodes - 2 for VMs and the third purely for storage. I manually configured it all, did not use oVirt Node or anything, and built the gluster volumes myself.
>>>>>>>>>>
>>>>>>>>>> However, I noticed that when setting up the hosted engine, and even when adding a new storage domain with glusterfs type, it still asks for hostname:/volumename.
>>>>>>>>>>
>>>>>>>>>> This leads me to believe that if that one node goes down (ex: node1:/data), then the oVirt engine won't be able to communicate with that volume because it's trying to reach it on node1, and the volume will thus go down.
>>>>>>>>>>
>>>>>>>>>> I know the glusterfs fuse client can connect to all nodes to provide failover/HA, but how does the engine handle this?
>>>>>>>>>>