[ovirt-users] hyperconverged question

Charles Kozler ckozleriii at gmail.com
Fri Sep 1 23:20:48 UTC 2017


Jim -

here is my test:

- All VMs on node2: hosted engine and 1 test VM
- Test VM on gluster storage domain (with mount options set)
- Hosted engine is on gluster as well, with the backupvol setting persisted
to hosted-engine.conf
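
For reference, the backupvol setting takes the form quoted further down in this thread (mnt_options=backup-volfile-servers=<h2>:<h3>). A minimal Python sketch that assembles that line from a list of servers; the function name is made up for illustration, the addresses are the ones Jim posted, and nothing here reads or writes hosted-engine.conf itself:

```python
def backup_volfile_option(servers):
    """Build the mnt_options line (hypothetical helper, not part of ovirt)."""
    if not servers:
        raise ValueError("need at least one backup volfile server")
    # Servers are joined with ':' as in the examples quoted in this thread.
    return "mnt_options=backup-volfile-servers=" + ":".join(servers)


print(backup_volfile_option(["192.168.8.12", "192.168.8.13"]))
```

The servers listed should be the other gluster nodes that also carry bricks, i.e. not the one already named in the storage= line.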

All VMs stayed up. Nothing in the test VM's dmesg indicated a pause or any
other issue.

However, what I did notice during this is that my datatest volume doesn't
have quorum set. So I will set that now and report back what happens.

# gluster volume info datatest

Volume Name: datatest
Type: Replicate
Volume ID: 229c25f9-405e-4fe7-b008-1d3aea065069
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/gluster/data/datatest/brick1
Brick2: node2:/gluster/data/datatest/brick1
Brick3: node3:/gluster/data/datatest/brick1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
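
A rough way to spot this from the output above is to look for the two quorum options that show up in Jim's volume info (cluster.quorum-type and cluster.server-quorum-type) under "Options Reconfigured". The Python sketch below does only that text check; the function names are made up for illustration, and an option missing from this section is simply at its default, which may or may not mean quorum is off depending on the gluster version:

```python
QUORUM_KEYS = ("cluster.quorum-type", "cluster.server-quorum-type")


def reconfigured_options(volume_info_text):
    """Return the 'Options Reconfigured' section of `gluster volume info` as a dict."""
    options = {}
    in_options = False
    for line in volume_info_text.splitlines():
        line = line.strip()
        if line == "Options Reconfigured:":
            in_options = True
            continue
        if in_options and ": " in line:
            key, value = line.split(": ", 1)
            options[key] = value
    return options


def missing_quorum_options(volume_info_text):
    """List the quorum options that were not explicitly set on the volume."""
    opts = reconfigured_options(volume_info_text)
    return [key for key in QUORUM_KEYS if key not in opts]


SAMPLE = """\
Volume Name: datatest
Type: Replicate
Status: Started
Number of Bricks: 1 x 3 = 3
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
"""

print(missing_quorum_options(SAMPLE))
```

For the datatest output this reports both options as not explicitly set, which matches what I noticed by eye.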

Perhaps quorum is more trouble than it's worth when you have 3 nodes
and/or 2 nodes + arbiter?
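
To make the arithmetic concrete, here is a small Python sketch of how I understand client quorum with cluster.quorum-type set to auto: writes are allowed when more than half of the bricks in the replica set are up, and with an even count where exactly half are up, the first brick has to be one of them. This is my reading of the gluster docs, not gluster code, and it assumes the arbiter brick counts as a vote like any other brick:

```python
def client_quorum_met(brick_up):
    """brick_up: one boolean per brick in the replica set, in brick order."""
    if not brick_up:
        raise ValueError("replica set has no bricks")
    total = len(brick_up)
    up = sum(brick_up)
    if 2 * up > total:
        return True
    # Even brick count with exactly half up: the first brick breaks the tie.
    return 2 * up == total and brick_up[0]


# 3 bricks (replica 3, or 2 data bricks + arbiter), node1 down:
print(client_quorum_met([False, True, True]))
# 3 bricks, only one left:
print(client_quorum_met([False, False, True]))
```

Under these assumptions, losing any one of three bricks still leaves quorum, and losing two does not.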

Since I am keeping my 3rd node out of ovirt, I am content to keep
it as a warm spare in case I **had** to swap it into the ovirt cluster, while
it still keeps my storage at 100% quorum.

On Fri, Sep 1, 2017 at 5:18 PM, Jim Kusznir <jim at palousetech.com> wrote:

> I can confirm that I did set it up manually, and I did specify backupvol,
> and in the "manage domain" storage settings, I do have under mount
> options, backup-volfile-servers=192.168.8.12:192.168.8.13  (and this was
> done at initial install time).
>
> The "use managed gluster" checkbox is NOT checked, and if I check it and
> save settings, it is unchecked again the next time I go in.
>
> --Jim
>
> On Fri, Sep 1, 2017 at 2:08 PM, Charles Kozler <ckozleriii at gmail.com>
> wrote:
>
>> @ Jim - here is my setup which I will test in a few (brand new cluster)
>> and report back what I found in my tests
>>
>> - 3x servers direct connected via 10Gb
>> - 2 of those 3 set up in ovirt as hosts
>> - Hosted engine
>> - Gluster replica 3 (no arbiter) for all volumes
>> - 1x engine volume gluster replica 3 manually configured (not using ovirt
>> managed gluster)
>> - 1x datatest volume (20gb) replica 3 manually configured (not using
>> ovirt managed gluster)
>> - 1x nfstest domain served from some other server in my infrastructure
>> which, at the time of my original testing, was master domain
>>
>> I tested this earlier and all VMs stayed online. However, the ovirt cluster
>> reported the DC/cluster as down even though all VMs stayed up.
>>
>> As I am now typing this, can you confirm you set up your gluster storage
>> domain with backupvol? Also, can you confirm you updated hosted-engine.conf
>> with the backupvol mount option as well?
>>
>> On Fri, Sep 1, 2017 at 4:22 PM, Jim Kusznir <jim at palousetech.com> wrote:
>>
>>> So, after reading the first document twice and the 2nd link thoroughly
>>> once, I believe that the arbiter volume should be sufficient and count
>>> for replica / split brain.  E.g., if any one full replica is down, and the
>>> arbiter and the other replica are up, then it should have quorum and all
>>> should be good.
>>>
>>> I think my underlying problem has to do more with config than the
>>> replica state.  That said, I did size the drive on my 3rd node planning to
>>> have an identical copy of all data on it, so I'm still not opposed to
>>> making it a full replica.
>>>
>>> Did I miss something here?
>>>
>>> Thanks!
>>>
>>> On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler <ckozleriii at gmail.com>
>>> wrote:
>>>
>>>> These can get a little confusing but this explains it best:
>>>> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes
>>>>
>>>> Basically, in the first paragraph they are explaining why you can't have
>>>> HA with quorum for 2 nodes. Here is another overview doc that explains some
>>>> more:
>>>>
>>>> http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
>>>>
>>>> From my understanding arbiter is good for resolving split brains.
>>>> Quorum and arbiter are two different things, though: quorum is a mechanism
>>>> to help you **avoid** split brain, and the arbiter is there to help gluster
>>>> resolve split brain by voting and other internal mechanics (as outlined in
>>>> link 1). How did you create the volume exactly - what command? It looks to
>>>> me like you created it with 'gluster volume create replica 2 arbiter 1
>>>> {....}' per your earlier mention of "replica 2 arbiter 1". That being said,
>>>> if you did that and then set up quorum in the volume configuration, this
>>>> would cause your gluster to halt since quorum was lost (as you saw until
>>>> you recovered node 1).
>>>>
>>>> As you can see from the docs, there is still a corner case for getting
>>>> into split brain with replica 3, which again is where arbiter would help
>>>> gluster resolve it.
>>>>
>>>> I need to amend my previous statement: I was told that an arbiter volume
>>>> does not store data, only metadata. I cannot find anything in the docs
>>>> backing this up; however, it would make sense for that to be the case. That
>>>> being said, in my setup, I would not include my arbiter or my third node in
>>>> my ovirt VM cluster component. I would keep it completely separate.
>>>>
>>>>
>>>> On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir <jim at palousetech.com>
>>>> wrote:
>>>>
>>>>> I'm now also confused as to what the point of an arbiter is / what it
>>>>> does / why one would use it.
>>>>>
>>>>> On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir <jim at palousetech.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the help!
>>>>>>
>>>>>> Here's my gluster volume info for the data export/brick (I have 3:
>>>>>> data, engine, and iso, but they're all configured the same):
>>>>>>
>>>>>> Volume Name: data
>>>>>> Type: Replicate
>>>>>> Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
>>>>>> Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
>>>>>> Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
>>>>>> Options Reconfigured:
>>>>>> performance.strict-o-direct: on
>>>>>> nfs.disable: on
>>>>>> user.cifs: off
>>>>>> network.ping-timeout: 30
>>>>>> cluster.shd-max-threads: 8
>>>>>> cluster.shd-wait-qlength: 10000
>>>>>> cluster.locking-scheme: granular
>>>>>> cluster.data-self-heal-algorithm: full
>>>>>> performance.low-prio-threads: 32
>>>>>> features.shard-block-size: 512MB
>>>>>> features.shard: on
>>>>>> storage.owner-gid: 36
>>>>>> storage.owner-uid: 36
>>>>>> cluster.server-quorum-type: server
>>>>>> cluster.quorum-type: auto
>>>>>> network.remote-dio: enable
>>>>>> cluster.eager-lock: enable
>>>>>> performance.stat-prefetch: off
>>>>>> performance.io-cache: off
>>>>>> performance.read-ahead: off
>>>>>> performance.quick-read: off
>>>>>> performance.readdir-ahead: on
>>>>>> server.allow-insecure: on
>>>>>> [root at ovirt1 ~]#
>>>>>>
>>>>>>
>>>>>> all 3 of my brick nodes ARE also members of the virtualization
>>>>>> cluster (including ovirt3).  How can I convert it into a full replica
>>>>>> instead of just an arbiter?
>>>>>>
>>>>>> Thanks!
>>>>>> --Jim
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler <ckozleriii at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> @Kasturi - Looks good now. Cluster showed down for a moment but VM's
>>>>>>> stayed up in their appropriate places. Thanks!
>>>>>>>
>>>>>>> < Anyone on this list please feel free to correct my response to Jim
>>>>>>> if it's wrong>
>>>>>>>
>>>>>>> @ Jim - If you can share your gluster volume info / status I can
>>>>>>> confirm (to the best of my knowledge). From my understanding, if you set up
>>>>>>> the volume with something like 'gluster volume set <vol> group virt' this
>>>>>>> will configure some quorum options as well, e.g.:
>>>>>>> http://i.imgur.com/Mya4N5o.png
>>>>>>>
>>>>>>> While, yes, you are configured for an arbiter node, you're still losing
>>>>>>> quorum by dropping from 2 -> 1. You would need 4 nodes with 1 being arbiter
>>>>>>> to configure quorum, which is in effect 3 writable nodes and 1 arbiter. If
>>>>>>> one gluster node drops, you still have 2 up. Although in this case, you
>>>>>>> probably wouldn't need an arbiter at all.
>>>>>>>
>>>>>>> If you are configured that way, you can drop the quorum settings and just
>>>>>>> let the arbiter run, since you're not using the arbiter node in your VM
>>>>>>> cluster part (I believe), just the storage cluster part. When using quorum,
>>>>>>> you need > 50% of the cluster being up at one time. Since you have 3 nodes
>>>>>>> with 1 arbiter, you're actually losing 1/2 which == 50% which == degraded /
>>>>>>> hindered gluster.
>>>>>>>
>>>>>>> Again, this is to the best of my knowledge based on other quorum
>>>>>>> backed software....and this is what I understand from testing with gluster
>>>>>>> and ovirt thus far
>>>>>>>
>>>>>>> On Fri, Sep 1, 2017 at 11:53 AM, Jim Kusznir <jim at palousetech.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Huh... OK, how do I convert the arbiter to a full replica, then?  I
>>>>>>>> was misinformed when I created this setup.  I thought the arbiter held
>>>>>>>> enough metadata that it could validate or repudiate any one replica (kinda
>>>>>>>> like the parity drive for a RAID-4 array).  I was also under the impression
>>>>>>>> that one replica + arbiter is enough to keep the array online and
>>>>>>>> functional.
>>>>>>>>
>>>>>>>> --Jim
>>>>>>>>
>>>>>>>> On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler <
>>>>>>>> ckozleriii at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> @ Jim - you have only two data volumes and lost quorum. The arbiter
>>>>>>>>> only stores metadata, no actual files. So yes, you were running in degraded
>>>>>>>>> mode so some operations were hindered.
>>>>>>>>>
>>>>>>>>> @ Sahina - Yes, this actually worked fine for me once I did that.
>>>>>>>>> However, the issue I am still facing is when I go to create a new gluster
>>>>>>>>> storage domain (replica 3, hyperconverged), tell it "Host to use" and
>>>>>>>>> select that host. If I fail that host, all VMs halt. I do not recall this
>>>>>>>>> in 3.6 or early 4.0. To me this makes it seem like this is "pinning" a node
>>>>>>>>> to a volume and vice versa, like you could, for instance, on a single
>>>>>>>>> hyperconverged host, export a local disk via NFS and then mount it as an
>>>>>>>>> ovirt domain. But of course, this has its caveats. To that end, I am using
>>>>>>>>> gluster replica 3; when configuring it I say "host to use: " node 1, then
>>>>>>>>> in the connection details I give it node1:/data. I fail node1, all VMs
>>>>>>>>> halt. Did I miss something?
>>>>>>>>>
>>>>>>>>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose <sabose at redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> To the OP question, when you set up a gluster storage domain, you
>>>>>>>>>> need to specify backup-volfile-servers=<server2>:<server3> where
>>>>>>>>>> server2 and server3 also have bricks running. When server1 is down, and the
>>>>>>>>>> volume is mounted again - server2 or server3 are queried to get the gluster
>>>>>>>>>> volfiles.
>>>>>>>>>>
>>>>>>>>>> @Jim, if this does not work, are you using 4.1.5 build with
>>>>>>>>>> libgfapi access? If not, please provide the vdsm and gluster mount logs to
>>>>>>>>>> analyse
>>>>>>>>>>
>>>>>>>>>> If VMs go to paused state - this could mean the storage is not
>>>>>>>>>> available. You can check "gluster volume status <volname>" to see if
>>>>>>>>>> at least 2 bricks are running.
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson <
>>>>>>>>>> johan at kafit.se> wrote:
>>>>>>>>>>
>>>>>>>>>>> If gluster drops in quorum so that it has fewer votes than it
>>>>>>>>>>> should, it will stop file operations until quorum is back to normal. If I
>>>>>>>>>>> remember it right, you need two bricks to write for quorum to be met, and
>>>>>>>>>>> the arbiter is only a vote to avoid split brain.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Basically what you have is a raid5 solution without a spare. And
>>>>>>>>>>> when one disk dies it will run in degraded mode. And some raid systems will
>>>>>>>>>>> stop the raid until you have removed the disk or forced it to run anyway.
>>>>>>>>>>>
>>>>>>>>>>> You can read up on it here:
>>>>>>>>>>> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>>>>>>>>>>>
>>>>>>>>>>> /Johan
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi all:
>>>>>>>>>>>
>>>>>>>>>>> Sorry to hijack the thread, but I was about to start essentially
>>>>>>>>>>> the same thread.
>>>>>>>>>>>
>>>>>>>>>>> I have a 3 node cluster, all three are hosts and gluster nodes
>>>>>>>>>>> (replica 2 + arbiter).  I DO have the mnt_options=backup-volfile-servers=
>>>>>>>>>>> set:
>>>>>>>>>>>
>>>>>>>>>>> storage=192.168.8.11:/engine
>>>>>>>>>>> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
>>>>>>>>>>>
>>>>>>>>>>> I had an issue today where 192.168.8.11 went down.  ALL VMs
>>>>>>>>>>> immediately paused, including the engine (all VMs were running on
>>>>>>>>>>> host2:192.168.8.12).  I couldn't get any gluster stuff working until host1
>>>>>>>>>>> (192.168.8.11) was restored.
>>>>>>>>>>>
>>>>>>>>>>> What's wrong / what did I miss?
>>>>>>>>>>>
>>>>>>>>>>> (this was set up "manually" through the article on setting up
>>>>>>>>>>> self-hosted gluster cluster back when 4.0 was new..I've upgraded it to 4.1
>>>>>>>>>>> since).
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> --Jim
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler <
>>>>>>>>>>> ckozleriii at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Typo..."Set it up and then failed that **HOST**"
>>>>>>>>>>>
>>>>>>>>>>> And upon that host going down, the storage domain went down. I
>>>>>>>>>>> only have hosted storage domain and this new one - is this why the DC went
>>>>>>>>>>> down and no SPM could be elected?
>>>>>>>>>>>
>>>>>>>>>>> I don't recall this working this way in early 4.0 or 3.6
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler <
>>>>>>>>>>> ckozleriii at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> So I've tested this today and I failed a node. Specifically, I
>>>>>>>>>>> setup a glusterfs domain and selected "host to use: node1". Set it up and
>>>>>>>>>>> then failed that VM
>>>>>>>>>>>
>>>>>>>>>>> However, this did not work and the datacenter went down. My
>>>>>>>>>>> engine stayed up, however, it seems configuring a domain to pin to a host
>>>>>>>>>>> to use will obviously cause it to fail
>>>>>>>>>>>
>>>>>>>>>>> This seems counter-intuitive to the point of glusterfs or any
>>>>>>>>>>> redundant storage. If a single host has to be tied to its function, this
>>>>>>>>>>> introduces a single point of failure
>>>>>>>>>>>
>>>>>>>>>>> Am I missing something obvious?
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra <
>>>>>>>>>>> knarra at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> yes, right.  What you can do is edit the hosted-engine.conf file
>>>>>>>>>>> and there is a parameter as shown below [1] and replace h2 and h3 with your
>>>>>>>>>>> second and third storage servers. Then you will need to restart
>>>>>>>>>>> ovirt-ha-agent and ovirt-ha-broker services in all the nodes .
>>>>>>>>>>>
>>>>>>>>>>> [1] 'mnt_options=backup-volfile-servers=<h2>:<h3>'
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler <
>>>>>>>>>>> ckozleriii at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Kasturi -
>>>>>>>>>>>
>>>>>>>>>>> Thanks for feedback
>>>>>>>>>>>
>>>>>>>>>>> > If the cockpit+gdeploy plugin had been used, it
>>>>>>>>>>> would have automatically detected the glusterfs replica 3 volume created during
>>>>>>>>>>> Hosted Engine deployment and this question would not have been asked
>>>>>>>>>>>
>>>>>>>>>>> Actually, hosted-engine --deploy also auto-detects
>>>>>>>>>>> glusterfs.  I know the glusterfs fuse client has the ability to fail over
>>>>>>>>>>> between all nodes in the cluster, but I am still curious given the fact that
>>>>>>>>>>> I see node1:/engine in the ovirt config (node1 being what I set it to in
>>>>>>>>>>> hosted-engine --deploy). So my concern was to find out exactly how the engine
>>>>>>>>>>> works when one node goes away and the fuse client moves over to the other
>>>>>>>>>>> node in the gluster cluster.
>>>>>>>>>>>
>>>>>>>>>>> But you did somewhat answer my question, the answer seems to be
>>>>>>>>>>> no (as default) and I will have to use hosted-engine.conf and change the
>>>>>>>>>>> parameter as you list
>>>>>>>>>>>
>>>>>>>>>>> So I need to do something manual to create HA for engine on
>>>>>>>>>>> gluster? Yes?
>>>>>>>>>>>
>>>>>>>>>>> Thanks so much!
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra <
>>>>>>>>>>> knarra at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>    During Hosted Engine setup, the question about the glusterfs volume is
>>>>>>>>>>> being asked because you have set up the volumes yourself. If the cockpit+gdeploy
>>>>>>>>>>> plugin had been used, it would have automatically detected the
>>>>>>>>>>> glusterfs replica 3 volume created during Hosted Engine deployment and this
>>>>>>>>>>> question would not have been asked.
>>>>>>>>>>>
>>>>>>>>>>>    During new storage domain creation, when glusterfs is selected
>>>>>>>>>>> there is a feature called 'use managed gluster volumes'. Upon checking
>>>>>>>>>>> this, all managed glusterfs volumes will be listed and you can choose the
>>>>>>>>>>> volume of your choice from the dropdown list.
>>>>>>>>>>>
>>>>>>>>>>>     There is a conf file called /etc/hosted-engine/hosted-engine.conf
>>>>>>>>>>> where there is a parameter called backup-volfile-servers="h1:h2". If one
>>>>>>>>>>> of the gluster nodes goes down, the engine uses this parameter to provide ha /
>>>>>>>>>>> failover.
>>>>>>>>>>>
>>>>>>>>>>>  Hope this helps !!
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> kasturi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler <
>>>>>>>>>>> ckozleriii at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello -
>>>>>>>>>>>
>>>>>>>>>>> I have successfully created a hyperconverged hosted engine setup
>>>>>>>>>>> consisting of 3 nodes - 2 for VM's and the third purely for storage. I
>>>>>>>>>>> manually configured it all, did not use ovirt node or anything. Built the
>>>>>>>>>>> gluster volumes myself
>>>>>>>>>>>
>>>>>>>>>>> However, I noticed that when setting up the hosted engine and
>>>>>>>>>>> even when adding a new storage domain with glusterfs type, it still asks
>>>>>>>>>>> for hostname:/volumename
>>>>>>>>>>>
>>>>>>>>>>> This leads me to believe that if that one node goes down (ex:
>>>>>>>>>>> node1:/data), then ovirt engine won't be able to communicate with that
>>>>>>>>>>> volume because it's trying to reach it on node 1, and will thus go down
>>>>>>>>>>>
>>>>>>>>>>> I know glusterfs fuse client can connect to all nodes to provide
>>>>>>>>>>> failover/ha but how does the engine handle this?
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Users mailing list
>>>>>>>>>>> Users at ovirt.org
>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users