[ovirt-users] hyperconverged question

Jim Kusznir jim at palousetech.com
Fri Sep 1 15:53:22 UTC 2017


Huh... OK, how do I convert the arbiter to a full replica, then?  I was
misinformed when I created this setup.  I thought the arbiter held
enough metadata that it could validate or refute any one replica (kinda
like the parity drive for a RAID-4 array).  I was also under the impression
that one replica + arbiter is enough to keep the array online and
functional.

--Jim

On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler <ckozleriii at gmail.com> wrote:

> @ Jim - you have only two data bricks and lost quorum. The arbiter only
> stores metadata, no actual file data. So yes, you were running in degraded
> mode and some operations were hindered.
>
> @ Sahina - Yes, this actually worked fine for me once I did that. However,
> the issue I am still facing is this: when I go to create a new gluster
> storage domain (replica 3, hyperconverged), it asks "Host to use" and I
> select a host. If I fail that host, all VMs halt. I do not recall this in
> 3.6 or early 4.0. This makes it seem like it is "pinning" a node to a
> volume and vice versa - like you could, for instance, on a single
> hyperconverged host export a local disk via NFS and then mount it as an
> oVirt domain. But of course, this has its caveats. To that end, I am using
> gluster replica 3; when configuring it I say "host to use:" node1, then
> in the connection details I give it node1:/data. I fail node1, and all VMs
> halt. Did I miss something?
>
> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose <sabose at redhat.com> wrote:
>
>> To the OP question, when you set up a gluster storage domain, you need to
>> specify backup-volfile-servers=<server2>:<server3> where server2 and
>> server3 also have bricks running. When server1 is down, and the volume is
>> mounted again - server2 or server3 are queried to get the gluster volfiles.
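>>
>> As a minimal sketch (hostnames are placeholders), the storage domain
>> connection in the engine would look something like:
>>
>> Path: server1:/data
>> Mount Options: backup-volfile-servers=server2:server3
>>
>> With that option set, the fuse client can fetch the volfiles from
>> server2 or server3 when server1 is unreachable at mount time.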
>>
>> @Jim, if this does not work, are you using the 4.1.5 build with libgfapi
>> access? If not, please provide the vdsm and gluster mount logs to analyse.
>>
>> If VMs go to a paused state, this could mean the storage is not available.
>> You can check "gluster volume status <volname>" to see if at least 2 bricks
>> are running.
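>>
>> For example, with a volume named "data" (the name is a placeholder):
>>
>> # gluster volume status data
>>
>> The output lists each brick and whether its process is online; for a
>> replica 2 + arbiter volume, at least two of the three bricks (the
>> arbiter counts as one) must be up for writes to be allowed.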
>>
>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson <johan at kafit.se>
>> wrote:
>>
>>> If gluster drops in quorum so that it has fewer votes than it should, it
>>> will stop file operations until quorum is back to normal. If I remember
>>> right, you need two bricks writable for quorum to be met, and the
>>> arbiter only counts as a vote to avoid split brain.
>>>
>>>
>>> Basically what you have is a RAID-5 solution without a spare. When one
>>> disk dies it runs in degraded mode, and some RAID systems will stop the
>>> array until you have replaced the disk or forced it to run anyway.
>>>
>>> You can read up on it here:
>>> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>>>
>>> /Johan
>>>
>>> On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
>>>
>>> Hi all:
>>>
>>> Sorry to hijack the thread, but I was about to start essentially the
>>> same thread.
>>>
>>> I have a 3 node cluster; all three are hosts and gluster nodes (replica
>>> 2 + arbiter).  I DO have the mnt_options=backup-volfile-servers= set:
>>>
>>> storage=192.168.8.11:/engine
>>> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
>>>
>>> I had an issue today where 192.168.8.11 went down.  ALL VMs immediately
>>> paused, including the engine (all VMs were running on host2:192.168.8.12).
>>> I couldn't get any gluster stuff working until host1 (192.168.8.11) was
>>> restored.
>>>
>>> What's wrong / what did I miss?
>>>
>>> (this was set up "manually" through the article on setting up a
>>> self-hosted gluster cluster back when 4.0 was new; I've upgraded it to 4.1
>>> since).
>>>
>>> Thanks!
>>> --Jim
>>>
>>>
>>> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler <ckozleriii at gmail.com>
>>> wrote:
>>>
>>> Typo..."Set it up and then failed that **HOST**"
>>>
>>> And upon that host going down, the storage domain went down. I only have
>>> the hosted storage domain and this new one - is this why the DC went down
>>> and no SPM could be elected?
>>>
>>> I don't recall it working this way in early 4.0 or 3.6.
>>>
>>> On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler <ckozleriii at gmail.com>
>>> wrote:
>>>
>>> So I've tested this today and I failed a node. Specifically, I set up a
>>> glusterfs domain and selected "host to use: node1". Set it up and then
>>> failed that VM
>>>
>>> However, this did not work and the datacenter went down. My engine
>>> stayed up; however, it seems that configuring a domain to pin to a
>>> single "host to use" will cause it to fail
>>>
>>> This seems counter-intuitive to the point of glusterfs or any redundant
>>> storage. If a single host has to be tied to its function, this introduces a
>>> single point of failure
>>>
>>> Am I missing something obvious?
>>>
>>> On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra <knarra at redhat.com>
>>> wrote:
>>>
>>> Yes, right.  What you can do is edit the hosted-engine.conf file, where
>>> there is a parameter as shown below [1]; replace h2 and h3 with your
>>> second and third storage servers. Then you will need to restart the
>>> ovirt-ha-agent and ovirt-ha-broker services on all the nodes.
>>>
>>> [1] 'mnt_options=backup-volfile-servers=<h2>:<h3>'
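>>>
>>> As a concrete sketch (hostnames are placeholders; on recent releases the
>>> file lives at /etc/ovirt-hosted-engine/hosted-engine.conf):
>>>
>>> mnt_options=backup-volfile-servers=host2.example.com:host3.example.com
>>>
>>> # systemctl restart ovirt-ha-agent ovirt-ha-broker
>>>
>>> The restart is needed on every node so the agents pick up the new mount
>>> options.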
>>>
>>> On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler <ckozleriii at gmail.com>
>>> wrote:
>>>
>>> Hi Kasturi -
>>>
>>> Thanks for feedback
>>>
>>> > If the cockpit+gdeploy plugin had been used, then it would have
>>> automatically detected the glusterfs replica 3 volume created during
>>> Hosted Engine deployment and this question would not have been asked
>>>
>>> Actually, hosted-engine --deploy also auto-detects glusterfs.  I know
>>> the glusterfs fuse client has the ability to fail over between all nodes
>>> in the cluster, but I am still curious given that I see node1:/engine in
>>> the ovirt config (node1 being what I set it to in hosted-engine
>>> --deploy). So my concern was to find out exactly how the engine behaves
>>> when one node goes away and the fuse client moves over to another node
>>> in the gluster cluster.
>>>
>>> But you did somewhat answer my question; the answer seems to be no (by
>>> default), and I will have to use hosted-engine.conf and change the
>>> parameter as you list.
>>>
>>> So I need to do something manual to create HA for engine on gluster? Yes?
>>>
>>> Thanks so much!
>>>
>>> On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra <knarra at redhat.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>>    During Hosted Engine setup, the question about the glusterfs volume is
>>> asked because you set up the volumes yourself. If the cockpit+gdeploy
>>> plugin had been used, it would have automatically detected the glusterfs
>>> replica 3 volume created during Hosted Engine deployment and this
>>> question would not have been asked.
>>>
>>>    During new storage domain creation, when glusterfs is selected there
>>> is a feature called 'use managed gluster volumes'; upon checking this,
>>> all managed glusterfs volumes will be listed and you can choose the
>>> volume of your choice from the dropdown list.
>>>
>>>     There is a conf file called /etc/hosted-engine/hosted-engine.conf
>>> with a parameter called backup-volfile-servers="h1:h2"; if one of the
>>> gluster nodes goes down, the engine uses this parameter to provide HA /
>>> failover.
>>>
>>>  Hope this helps!
>>>
>>> Thanks
>>> kasturi
>>>
>>>
>>>
>>> On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler <ckozleriii at gmail.com>
>>> wrote:
>>>
>>> Hello -
>>>
>>> I have successfully created a hyperconverged hosted engine setup
>>> consisting of 3 nodes - 2 for VMs and the third purely for storage. I
>>> configured it all manually, did not use oVirt Node or anything, and built
>>> the gluster volumes myself.
>>>
>>> However, I noticed that when setting up the hosted engine, and even when
>>> adding a new storage domain of the glusterfs type, it still asks for
>>> hostname:/volumename.
>>>
>>> This leads me to believe that if that one node goes down (ex:
>>> node1:/data), then the ovirt engine won't be able to communicate with
>>> that volume, because it's trying to reach it on node1, and thus the
>>> volume goes down.
>>>
>>> I know the glusterfs fuse client can connect to all nodes to provide
>>> failover/HA, but how does the engine handle this?
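>>>
>>> For reference, this is the failover the fuse client offers at mount time
>>> (hostnames are placeholders):
>>>
>>> # mount -t glusterfs -o backup-volfile-servers=node2:node3 node1:/data /mnt/data
>>>
>>> node1 is only contacted to fetch the volume layout; after that, I/O goes
>>> to whichever bricks are up.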
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>
>
>
>

