
Hello -

I have successfully created a hyperconverged hosted engine setup consisting of 3 nodes - 2 for VMs and a third purely for storage. I configured it all manually, did not use oVirt Node or anything, and built the gluster volumes myself.

However, I noticed that when setting up the hosted engine, and even when adding a new storage domain of type glusterfs, it still asks for hostname:/volumename.

This leads me to believe that if that one node goes down (e.g. node1:/data), then the oVirt engine won't be able to communicate with that volume, because it is trying to reach it on node 1, and will thus go down.

I know the glusterfs FUSE client can connect to all nodes to provide failover/HA, but how does the engine handle this?

Hi,

During Hosted Engine setup the question about the glusterfs volume is asked because you set up the volumes yourself. If the cockpit+gdeploy plugin had been used, it would have automatically detected the glusterfs replica 3 volume created during Hosted Engine deployment and this question would not have been asked.

During new storage domain creation, when glusterfs is selected there is a feature called 'use managed gluster volumes'; upon checking this, all managed glusterfs volumes are listed and you can choose the volume of your choice from the dropdown list.

There is a conf file called /etc/ovirt-hosted-engine/hosted-engine.conf with a parameter called backup-volfile-servers="h1:h2"; if one of the gluster nodes goes down, the engine uses this parameter to provide HA/failover.

Hope this helps!

Thanks,
kasturi

On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler <ckozleriii@gmail.com> wrote:
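A minimal sketch of the relevant lines, with node2 and node3 as placeholder storage hosts (the exact file path and existing values should be checked against your own deployment):

    # /etc/ovirt-hosted-engine/hosted-engine.conf (excerpt)
    # primary server used to mount the hosted-engine storage domain
    storage=node1:/engine
    # additional servers the gluster fuse client can use if node1 is unreachable
    mnt_options=backup-volfile-servers=node2:node3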

Hi Kasturi - Thanks for the feedback.

> If the cockpit+gdeploy plugin had been used, it would have automatically detected the glusterfs replica 3 volume created during Hosted Engine deployment and this question would not have been asked.

Actually, hosted-engine --deploy also auto-detects glusterfs. I know the glusterfs FUSE client has the ability to fail over between all nodes in the cluster, but I am still curious, given that I see node1:/engine in the oVirt config (node1 being what I set it to in hosted-engine --deploy). So my concern was to find out exactly how the engine behaves when one node goes away and the FUSE client moves over to the other node in the gluster cluster.

But you did somewhat answer my question: the answer seems to be no (by default), and I will have to use hosted-engine.conf and change the parameter as you list.

So I need to do something manual to create HA for the engine on gluster, yes?

Thanks so much!

On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra <knarra@redhat.com> wrote:

Yes, right. What you can do is edit the hosted-engine.conf file, where there is a parameter as shown below [1], and replace h2 and h3 with your second and third storage servers. Then you will need to restart the ovirt-ha-agent and ovirt-ha-broker services on all the nodes.

[1] 'mnt_options=backup-volfile-servers=<h2>:<h3>'

On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler <ckozleriii@gmail.com> wrote:
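As a rough sketch of those steps, assuming the config path mentioned earlier in the thread and placeholder hostnames h2 and h3:

    # on each hosted-engine host, add the backup servers to the mount options in
    # /etc/ovirt-hosted-engine/hosted-engine.conf:
    #   mnt_options=backup-volfile-servers=h2:h3

    # then restart the HA services so they pick up the new mount options
    systemctl restart ovirt-ha-agent ovirt-ha-broker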

So I've tested this today and I failed a node. Specifically, I set up a glusterfs domain and selected "host to use: node1". Set it up and then failed that VM.

However, this did not work and the datacenter went down. My engine stayed up; however, it seems that configuring a domain pinned to a "host to use" will obviously cause it to fail when that host goes down.

This seems counter-intuitive to the point of glusterfs or any redundant storage. If a single host has to be tied to its function, this introduces a single point of failure.

Am I missing something obvious?

On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra <knarra@redhat.com> wrote:

Typo..."Set it up and then failed that **HOST**" And upon that host going down, the storage domain went down. I only have hosted storage domain and this new one - is this why the DC went down and no SPM could be elected? I dont recall this working this way in early 4.0 or 3.6 On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler <ckozleriii@gmail.com> wrote:

Hi all:

Sorry to hijack the thread, but I was about to start essentially the same thread.

I have a 3 node cluster, all three are hosts and gluster nodes (replica 2 + arbiter). I DO have the mnt_options=backup-volfile-servers= set:

storage=192.168.8.11:/engine
mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13

I had an issue today where 192.168.8.11 went down. ALL VMs immediately paused, including the engine (all VMs were running on host2: 192.168.8.12). I couldn't get any gluster stuff working until host1 (192.168.8.11) was restored.

What's wrong / what did I miss?

(This was set up "manually" through the article on setting up a self-hosted gluster cluster back when 4.0 was new; I've upgraded it to 4.1 since.)

Thanks!
--Jim

On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler <ckozleriii@gmail.com> wrote:
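Two quick checks that can help narrow this kind of failure down (the config path is an assumption based on a standard hosted-engine install):

    # confirm the backup servers actually made it into the engine mount options
    grep -E '^(storage|mnt_options)' /etc/ovirt-hosted-engine/hosted-engine.conf

    # confirm the remaining gluster peers were still connected when host1 went down
    gluster peer status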
Typo..."Set it up and then failed that **HOST**"
And upon that host going down, the storage domain went down. I only have hosted storage domain and this new one - is this why the DC went down and no SPM could be elected?
I dont recall this working this way in early 4.0 or 3.6
On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler <ckozleriii@gmail.com> wrote:
So I've tested this today and I failed a node. Specifically, I setup a glusterfs domain and selected "host to use: node1". Set it up and then failed that VM
However, this did not work and the datacenter went down. My engine stayed up, however, it seems configuring a domain to pin to a host to use will obviously cause it to fail
This seems counter-intuitive to the point of glusterfs or any redundant storage. If a single host has to be tied to its function, this introduces a single point of failure
Am I missing something obvious?
On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra <knarra@redhat.com> wrote:
yes, right. What you can do is edit the hosted-engine.conf file and there is a parameter as shown below [1] and replace h2 and h3 with your second and third storage servers. Then you will need to restart ovirt-ha-agent and ovirt-ha-broker services in all the nodes .
[1] 'mnt_options=backup-volfile-servers=<h2>:<h3>'
On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler <ckozleriii@gmail.com> wrote:
Hi Kasturi -
Thanks for feedback
If cockpit+gdeploy plugin would be have been used then that would have automatically detected glusterfs replica 3 volume created during Hosted Engine deployment and this question would not have been asked
Actually, doing hosted-engine --deploy it too also auto detects glusterfs. I know glusterfs fuse client has the ability to failover between all nodes in cluster, but I am still curious given the fact that I see in ovirt config node1:/engine (being node1 I set it to in hosted-engine --deploy). So my concern was to ensure and find out exactly how engine works when one node goes away and the fuse client moves over to the other node in the gluster cluster
But you did somewhat answer my question, the answer seems to be no (as default) and I will have to use hosted-engine.conf and change the parameter as you list
So I need to do something manual to create HA for engine on gluster? Yes?
Thanks so much!
On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra <knarra@redhat.com> wrote:
Hi,
During Hosted Engine setup question about glusterfs volume is being asked because you have setup the volumes yourself. If cockpit+gdeploy plugin would be have been used then that would have automatically detected glusterfs replica 3 volume created during Hosted Engine deployment and this question would not have been asked.
During new storage domain creation when glusterfs is selected there is a feature called 'use managed gluster volumes' and upon checking this all glusterfs volumes managed will be listed and you could choose the volume of your choice from the dropdown list.
There is a conf file called /etc/hosted-engine/hosted-engine.conf where there is a parameter called backup-volfile-servers="h1:h2" and if one of the gluster node goes down engine uses this parameter to provide ha / failover.
Hope this helps !!
Thanks kasturi
On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler <ckozleriii@gmail.com> wrote:
Hello -
I have successfully created a hyperconverged hosted engine setup consisting of 3 nodes - 2 for VM's and the third purely for storage. I manually configured it all, did not use ovirt node or anything. Built the gluster volumes myself
However, I noticed that when setting up the hosted engine and even when adding a new storage domain with glusterfs type, it still asks for hostname:/volumename
This leads me to believe that if that one node goes down (ex: node1:/data), then ovirt engine wont be able to communicate with that volume because its trying to reach it on node 1 and thus, go down
I know glusterfs fuse client can connect to all nodes to provide failover/ha but how does the engine handle this?
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

If gluster drops below quorum, so that it has fewer votes than it should, it will stop file operations until quorum is back to normal. If I remember right, you need two bricks available for writes for quorum to be met, and the arbiter is only a vote to avoid split brain.

Basically what you have is a RAID 5 solution without a spare. When one disk dies it will run in degraded mode, and some RAID systems will stop the array until you have removed the disk or forced it to run anyway.

You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/

/Johan

On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
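A minimal sketch of how to inspect the quorum settings described above, using 'engine' as a placeholder volume name:

    # client-side quorum: with 'auto' on replica 2 + arbiter, writes need 2 of the 3 bricks up
    gluster volume get engine cluster.quorum-type

    # server-side quorum: if enabled, bricks are stopped when too few peers are reachable
    gluster volume get engine cluster.server-quorum-type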

To the OP question: when you set up a gluster storage domain, you need to specify backup-volfile-servers=<server2>:<server3>, where server2 and server3 also have bricks running. When server1 is down and the volume is mounted again, server2 or server3 are queried to get the gluster volfiles.

@Jim, if this does not work, are you using the 4.1.5 build with libgfapi access? If not, please provide the vdsm and gluster mount logs to analyse.

If VMs go to a paused state, this could mean the storage is not available. You can check "gluster volume status <volname>" to see if at least 2 bricks are running.

On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson <johan@kafit.se> wrote:
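For reference, a sketch of the brick check and the usual log locations (the gluster mount log file name is an assumption - it is derived from the mount path and will differ per setup):

    # at least 2 of the 3 bricks should be listed as Online
    gluster volume status engine

    # vdsm log on the affected host
    less /var/log/vdsm/vdsm.log

    # gluster fuse mount log for the storage domain (name mirrors the mount point)
    less /var/log/glusterfs/rhev-data-center-mnt-glusterSD-192.168.8.11:_engine.log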

@ Jim - you have only two data bricks and lost quorum. The arbiter only stores metadata, no actual files. So yes, you were running in degraded mode, which is why some operations were hindered.

@ Sahina - yes, this actually worked fine for me once I did that. However, the issue I am still facing is that when I go to create a new gluster storage domain (replica 3, hyperconverged), I tell it "Host to use" and select that host; if I then fail that host, all VMs halt. I do not recall this in 3.6 or early 4.0. This makes it seem like it is "pinning" a node to a volume and vice versa, like you could, for instance, with a single hyperconverged node - e.g. exporting a local disk via NFS and then mounting it via an oVirt storage domain - but of course, that has its caveats. To that end, I am using gluster replica 3; when configuring it I say "host to use:" node1, then in the connection details I give it node1:/data. I fail node1, all VMs halt. Did I miss something?

On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose <sabose@redhat.com> wrote:

Hi Charles,

One question: while configuring the storage domain you say "host to use:" node1, and then in the connection details you give node1:/data. What about the backup-volfile-servers option in the UI while configuring the storage domain? Are you specifying that too?

Thanks,
kasturi

On Fri, Sep 1, 2017 at 5:52 PM, Charles Kozler <ckozleriii@gmail.com> wrote:

Are you referring to "Mount Options" - > http://i.imgur.com/bYfbyzz.png Then no, but that would explain why it wasnt working :-). I guess I had a silly assumption that oVirt would have detected it and automatically taken up the redundancy that was configured inside the replica set / brick detection. I will test and let you know Thanks! On Fri, Sep 1, 2017 at 8:52 AM, Kasturi Narra <knarra@redhat.com> wrote:
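For completeness, a sketch of what would go in that Mount Options field for this setup, with node2 and node3 standing in for the other two gluster servers:

    backup-volfile-servers=node2:node3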

Yes, that is the same option I was asking about. Apologies that I had mentioned a different name. So, oVirt will automatically detect it if you select the option 'use managed gluster volume'. While adding a storage domain, after specifying the host, you can just select the checkbox and that will list all the volumes managed from the oVirt UI and fill in the mount options for you. On Fri, Sep 1, 2017 at 6:40 PM, Charles Kozler <ckozleriii@gmail.com> wrote:
Are you referring to "Mount Options"? -> http://i.imgur.com/bYfbyzz.png
Then no, but that would explain why it wasn't working :-). I guess I had the silly assumption that oVirt would detect it and automatically pick up the redundancy that was configured inside the replica set / brick detection.
I will test and let you know
Thanks!
On Fri, Sep 1, 2017 at 8:52 AM, Kasturi Narra <knarra@redhat.com> wrote:
Hi Charles,
One question, while configuring a storage domain you are saying "host to use: " node1, then in the connection details you say node1:/data. What about the backup-volfile-servers option in the UI while configuring storage domain? Are you specifying that too?
Thanks kasturi
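As a rough sketch of what that would look like in the new-domain dialog - node1/node2/node3 here are placeholders for your own gluster servers:

  Storage Type:  GlusterFS
  Path:          node1:/data
  Mount Options: backup-volfile-servers=node2:node3

The same option can also be exercised outside oVirt with a plain FUSE mount, e.g.:

  # /mnt/test is just a scratch mount point for testing failover
  mount -t glusterfs -o backup-volfile-servers=node2:node3 node1:/data /mnt/test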
On Fri, Sep 1, 2017 at 5:52 PM, Charles Kozler <ckozleriii@gmail.com> wrote:
@ Jim - you have only two data bricks and lost quorum. The arbiter only stores metadata, no actual files. So yes, you were running in degraded mode, and some operations were hindered.
@ Sahina - Yes, this actually worked fine for me once I did that. However, the issue I am still facing is when I go to create a new gluster storage domain (replica 3, hyperconverged): I tell it "Host to use" and select that host, and if I fail that host, all VMs halt. I do not recall this in 3.6 or early 4.0. To me it seems like this is "pinning" a node to a volume and vice versa, much like a single hyperconverged host that exports a local disk via NFS and then mounts it as an oVirt domain - which of course has its caveats. To that end, I am using gluster replica 3; when configuring it I say "host to use:" node 1, then in the connection details I give it node1:/data. I fail node1, and all VMs halt. Did I miss something?
On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose <sabose@redhat.com> wrote:
To the OP question, when you set up a gluster storage domain, you need to specify backup-volfile-servers=<server2>:<server3> where server2 and server3 also have bricks running. When server1 is down, and the volume is mounted again - server2 or server3 are queried to get the gluster volfiles.
@Jim, if this does not work, are you using 4.1.5 build with libgfapi access? If not, please provide the vdsm and gluster mount logs to analyse
If VMs go to a paused state - this could mean the storage is not available. You can check "gluster volume status <volname>" to see if at least 2 bricks are running.
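A quick way to run that check, assuming the volume is named data as elsewhere in this thread:

  gluster volume status data
  gluster volume heal data info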
On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson <johan@kafit.se> wrote:
If gluster drops in quorum so that it has fewer votes than it should, it will stop file operations until quorum is back to normal. If I remember right, you need two bricks to write to for quorum to be met, and the arbiter only counts as a vote to avoid split brain.
Basically what you have is a raid5 solution without a spare. And when one disk dies it will run in degraded mode. And some raid systems will stop the raid until you have removed the disk or forced it to run anyway.
You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
/Johan
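To see which quorum settings are actually in effect on a volume (volume name data assumed here):

  gluster volume get data cluster.quorum-type
  gluster volume get data cluster.server-quorum-type
  gluster volume get data cluster.quorum-count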
On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
Hi all:
Sorry to hijack the thread, but I was about to start essentially the same thread.
I have a 3 node cluster, all three are hosts and gluster nodes (replica 2 + arbitrar). I DO have the mnt_options=backup-volfile-servers= set:
storage=192.168.8.11:/engine mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13

Huh... OK, how do I convert the arbiter to a full replica, then? I was misinformed when I created this setup. I thought the arbiter held enough metadata that it could validate or repudiate any one replica (kind of like the parity drive in a RAID 4 array). I was also under the impression that one replica + arbiter was enough to keep the array online and functional. --Jim

Speaking of the "use managed gluster", I created this gluster setup under ovirt 4.0 when that wasn't there. I've gone into my settings and checked the box and saved it at least twice, but when I go back into the storage settings, its not checked again. The "about" box in the gui reports that I'm using this version: oVirt Engine Version: 4.1.1.8-1.el7.centos I thought I was staying up to date, but I'm not sure if I'm doing everything right on the upgrade...The documentation says to click for hosted engine upgrade instructions, which takes me to a page not found error...For several versions now, and I haven't found those instructions, so I've been "winging it". --Jim On Fri, Sep 1, 2017 at 8:53 AM, Jim Kusznir <jim@palousetech.com> wrote:

@Kasturi - Looks good now. The cluster showed down for a moment but VMs stayed up in their appropriate places. Thanks!
< Anyone on this list please feel free to correct my response to Jim if it's wrong >
@ Jim - If you can share your gluster volume info / status I can confirm (to the best of my knowledge). From my understanding, if you set up the volume with something like 'gluster volume set <vol> group virt', this will configure some quorum options as well, e.g.: http://i.imgur.com/Mya4N5o.png
While, yes, you are configured for an arbiter node, you're still losing quorum by dropping from 2 -> 1. You would need 4 nodes with 1 being an arbiter to configure quorum, which is in effect 3 writable nodes and 1 arbiter; if one gluster node drops, you still have 2 up. Although in that case, you probably wouldn't need an arbiter at all.
If you are configured that way, you can drop the quorum settings and just let the arbiter run, since you're not using the arbiter node in your VM cluster part (I believe), just the storage cluster part. When using quorum, you need > 50% of the cluster up at one time. Since you have 3 nodes with 1 arbiter, you're actually losing 1/2 of the data bricks, which == 50%, which == a degraded / hindered gluster.
Again, this is to the best of my knowledge based on other quorum-backed software, and this is what I understand from testing with gluster and oVirt thus far.
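If you do want to experiment with relaxing client quorum as described above, these are the knobs involved; note that this trades away split-brain protection, so treat it as something to test with rather than a recommendation (volume name data assumed):

  gluster volume set data cluster.quorum-type fixed
  gluster volume set data cluster.quorum-count 1

  # default behaviour for replica volumes
  gluster volume set data cluster.quorum-type auto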

Thanks for the help!
Here's my gluster volume info for the data export/brick (I have 3: data, engine, and iso, but they're all configured the same):
Volume Name: data
Type: Replicate
Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
Options Reconfigured:
performance.strict-o-direct: on
nfs.disable: on
user.cifs: off
network.ping-timeout: 30
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
server.allow-insecure: on
[root@ovirt1 ~]#
All 3 of my brick nodes ARE also members of the virtualization cluster (including ovirt3). How can I convert it into a full replica instead of just an arbiter?
Thanks! --Jim
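To Jim's question, one way an arbiter brick is usually converted into a full data brick is to drop the volume to replica 2 and then add the third brick back as a full replica. This is only a sketch based on the layout above - the brick directory on ovirt3 would need to be wiped (or a fresh path used) before re-adding, and heals should be clean before you start:

  gluster volume remove-brick data replica 2 ovirt3.nwfiber.com:/gluster/brick2/data force
  # wipe or recreate the brick directory on ovirt3, then add it back as a full replica
  gluster volume add-brick data replica 3 ovirt3.nwfiber.com:/gluster/brick2/data
  gluster volume heal data full
  gluster volume heal data info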

I'm now also confused as to what the point of an arbiter is / what it does / why one would use it.

These can get a little confusing but this explains it best: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes

Basically in the first paragraph they are explaining why you can't have HA with quorum for 2 nodes. Here is another overview doc that explains some more: http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
From my understanding arbiter is good for resolving split brains. Quorum and arbiter are two different things, though: quorum is a mechanism to help you **avoid** split brain, and the arbiter is there to help gluster resolve split brain by voting and other internal mechanics (as outlined in link 1). How did you create the volume exactly - what command? It looks to me like you created it with 'gluster volume create replica 2 arbiter 1 {....}' per your earlier mention of "replica 2 arbiter 1". That being said, if you did that and then set up quorum in the volume configuration, this would cause your gluster to halt since quorum was lost (as you saw, until you recovered node 1)
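(For reference, the create syntax shown in the arbiter doc linked above looks roughly like the following; the volume name and brick paths here are made up:)

  gluster volume create testvol replica 3 arbiter 1 \
      host1:/bricks/brick1 host2:/bricks/brick1 host3:/bricks/brick1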
As you can see from the docs, there is still a corner case for getting into split brain with replica 3, which, again, is where arbiter would help gluster resolve it.

I need to amend my previous statement: I was told that the arbiter volume does not store data, only metadata. I cannot find anything in the docs backing this up, however it would make sense for it to be so. That being said, in my setup I would not include my arbiter or my third node in my ovirt VM cluster component. I would keep it completely separate.

On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir <jim@palousetech.com> wrote:
I'm now also confused as to what the point of an arbiter is / what it does / why one would use it.
On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir <jim@palousetech.com> wrote:
Thanks for the help!
Here's my gluster volume info for the data export/brick (I have 3: data, engine, and iso, but they're all configured the same):
Volume Name: data
Type: Replicate
Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
Options Reconfigured:
performance.strict-o-direct: on
nfs.disable: on
user.cifs: off
network.ping-timeout: 30
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
server.allow-insecure: on
[root@ovirt1 ~]#
all 3 of my brick nodes ARE also members of the virtualization cluster (including ovirt3). How can I convert it into a full replica instead of just an arbiter?
Thanks! --Jim
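(One way to do the conversion Jim asks about above, as an untested sketch: drop the arbiter brick, then add a full data brick on ovirt3. The new brick path is made up, and the heal can take a while on a large volume.)

  # reduce the volume to a plain 2-way replica by removing the arbiter brick
  gluster volume remove-brick data replica 2 ovirt3.nwfiber.com:/gluster/brick2/data force

  # add a full data brick on ovirt3 (fresh path), making it replica 3
  gluster volume add-brick data replica 3 ovirt3.nwfiber.com:/gluster/brick2/data-full

  # trigger a full self-heal so the new brick gets a complete copy
  gluster volume heal data full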
On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler <ckozleriii@gmail.com> wrote:
@Kasturi - Looks good now. Cluster showed down for a moment but VM's stayed up in their appropriate places. Thanks!
< Anyone on this list please feel free to correct my response to Jim if it's wrong >
@ Jim - If you can share your gluster volume info / status I can confirm (to the best of my knowledge). From my understanding, if you set up the volume with something like 'gluster volume set <vol> group virt' this will configure some quorum options as well, Ex: http://i.imgur.com/Mya4N5o.png
While, yes, you are configured for an arbiter node, you're still losing quorum by dropping from 2 -> 1. You would need 4 nodes with 1 being the arbiter to configure quorum that is in effect 3 writable nodes and 1 arbiter; if one gluster node drops, you still have 2 up. Although in this case, you probably wouldn't need an arbiter at all.

If you are configured that way, you can drop the quorum settings and just let the arbiter run, since you're not using the arbiter node in your VM cluster part (I believe), just the storage cluster part. When using quorum, you need > 50% of the cluster being up at one time. Since you have 3 nodes with 1 arbiter, you're actually losing 1/2, which == 50%, which == degraded / hindered gluster.

Again, this is to the best of my knowledge based on other quorum-backed software... and this is what I understand from testing with gluster and ovirt thus far.
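(A quick way to apply the 'group virt' profile mentioned above and then check the quorum options it configures; 'data' is the volume name from Jim's info output:)

  # apply the virt tuning profile (sets quorum options among others)
  gluster volume set data group virt

  # inspect the quorum-related options on the volume
  gluster volume get data cluster.quorum-type
  gluster volume get data cluster.server-quorum-type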
On Fri, Sep 1, 2017 at 11:53 AM, Jim Kusznir <jim@palousetech.com> wrote:
Huh... OK, how do I convert the arbiter to a full replica, then? I was misinformed when I created this setup. I thought the arbiter held enough metadata that it could validate or repudiate any one replica (kinda like the parity drive for a RAID-4 array). I was also under the impression that one replica + arbiter is enough to keep the array online and functional.
--Jim
On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler <ckozleriii@gmail.com> wrote:
@ Jim - you have only two data bricks and lost quorum. The arbiter only stores metadata, no actual files. So yes, you were running in degraded mode and some operations were hindered.

@ Sahina - Yes, this actually worked fine for me once I did that. However, the issue I am still facing is that when I go to create a new gluster storage domain (replica 3, hyperconverged), I tell it a "Host to use" and select that host; if I fail that host, all VMs halt. I do not recall this in 3.6 or early 4.0. This makes it seem like it is "pinning" a node to a volume and vice versa, like you could, for instance, on a single hyperconverged box export a local disk via NFS and then mount it via an ovirt domain. But of course, this has its caveats. To that end, I am using gluster replica 3; when configuring it I set "host to use" to node 1, then in the connection details I give it node1:/data. I fail node1, all VMs halt. Did I miss something?
On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose <sabose@redhat.com> wrote:
To the OP question: when you set up a gluster storage domain, you need to specify backup-volfile-servers=<server2>:<server3>, where server2 and server3 also have bricks running. When server1 is down and the volume is mounted again, server2 or server3 is queried to get the gluster volfiles.

@Jim, if this does not work, are you using the 4.1.5 build with libgfapi access? If not, please provide the vdsm and gluster mount logs to analyse.

If VMs go to a paused state, this could mean the storage is not available. You can check "gluster volume status <volname>" to see if at least 2 bricks are running.
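(As a concrete example of the two things mentioned here; server2/server3 are placeholders and 'data' is just an example volume name:)

  # goes in the storage domain's Mount Options field (New Domain / Manage Domain)
  backup-volfile-servers=server2:server3

  # run on any gluster node to confirm at least 2 bricks are online
  gluster volume status data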

Thank you!

I created my cluster following these instructions: https://www.ovirt.org/blog/2016/08/up-and-running-with-ovirt-4-0-and-gluster... (I built it about 10 months ago). I used their recipe for automated gluster node creation. Originally I thought I had 3 replicas, then I started realizing that node 3's disk usage was essentially nothing compared to nodes 1 and 2, and eventually on this list discovered that I had an arbiter.

Currently I am running on a 1Gbps backbone, but I can dedicate a gig port (or even do bonded gig -- my servers have 4 1Gbps interfaces, and my switch is only used for this cluster, so it has the ports to hook them all up). I am planning on a 10Gbps upgrade once I bring in some more cash to pay for it.

Last night, nodes 2 and 3 were up, and I rebooted node 1 for updates. As soon as it shut down, my cluster halted (including the hosted engine), and everything went messy. When the node came back up, I still had to recover the hosted engine via the command line, then could go in and start unpausing my VMs. I'm glad it happened at 8pm at night... that would have been very ugly if it happened during the day.

I had thought I had enough redundancy in the cluster that I could take down any one node and not have an issue... that definitely is not what happened.

--Jim

So, after reading the first document twice and the 2nd link thoroughly once, I believe that the arbiter volume should be sufficient and count for replica / split brain. E.g., if any one full replica is down, and the arbiter and the other replica are up, then it should have quorum and all should be good.

I think my underlying problem has to do more with config than the replica state. That said, I did size the drive on my 3rd node planning to have an identical copy of all data on it, so I'm still not opposed to making it a full replica.

Did I miss something here?

Thanks!

@ Jim - here is my setup which I will test in a few (brand new cluster) and report back what I find in my tests:

- 3x servers direct connected via 10Gb
- 2 of those 3 set up in ovirt as hosts
- Hosted engine
- Gluster replica 3 (no arbiter) for all volumes
- 1x engine volume, gluster replica 3, manually configured (not using ovirt managed gluster)
- 1x datatest volume (20gb), replica 3, manually configured (not using ovirt managed gluster)
- 1x nfstest domain served from some other server in my infrastructure which, at the time of my original testing, was the master domain

I tested this earlier and all VMs stayed online. However, the ovirt cluster reported DC/cluster down; all VM's stayed up.

As I am now typing this, can you confirm you set up your gluster storage domain with backupvol? Also, confirm you updated hosted-engine.conf with the backupvol mount option as well?
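(A quick way to double-check both of those on a host; the conf path is the one referenced earlier in the thread, and the storage domain value lives in its Mount Options field in the UI:)

  # hosted engine storage: the backup servers should show up here
  grep mnt_options /etc/hosted-engine/hosted-engine.conf

  # data domains: the same backup-volfile-servers=... string should appear
  # under Manage Domain -> Mount Options for the gluster storage domain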
So, after reading the first document twice and the 2nd link thoroughly once, I believe that the arbitrator volume should be sufficient and count for replica / split brain. EG, if any one full replica is down, and the arbitrator and the other replica is up, then it should have quorum and all should be good.
I think my underlying problem has to do more with config than the replica state. That said, I did size the drive on my 3rd node planning to have an identical copy of all data on it, so I'm still not opposed to making it a full replica.
Did I miss something here?
Thanks!
On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler <ckozleriii@gmail.com> wrote:

I can confirm that I did set it up manually, and I did specify the backup servers: in the "Manage Domain" storage settings I do have, under Mount Options, backup-volfile-servers=192.168.8.12:192.168.8.13 (and this was done at initial install time).

The "Use managed gluster volumes" checkbox is NOT checked, and if I check it and save the settings, it is unchecked again the next time I go in.

--Jim
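For reference, a rough sketch of the two places this thread says the backup servers need to live in a manually built setup like this: the hosted-engine config file (path as shown later in Charles's grep) and the storage domain's Mount Options field that Jim describes above. The values are the ones from Jim's environment; treat the exact UI path as approximate.

/etc/ovirt-hosted-engine/hosted-engine.conf (read when the hosted-engine storage is mounted):
storage=192.168.8.11:/engine
mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13

oVirt UI, Storage -> select the domain -> Manage Domain -> Mount Options:
backup-volfile-servers=192.168.8.12:192.168.8.13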

Jim -

Here is my test:

- All VMs on node2: the hosted engine and 1 test VM
- Test VM on the gluster storage domain (with the mount options set)
- Hosted engine is on gluster as well, with the backup-volfile settings persisted to hosted-engine.conf

All VMs stayed up. Nothing in dmesg of the test VM indicated a pause or any other issue.

However, what I did notice during this is that my /datatest volume doesn't have quorum set. So I will set that now and report back what happens.

# gluster volume info datatest

Volume Name: datatest
Type: Replicate
Volume ID: 229c25f9-405e-4fe7-b008-1d3aea065069
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/gluster/data/datatest/brick1
Brick2: node2:/gluster/data/datatest/brick1
Brick3: node3:/gluster/data/datatest/brick1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

Perhaps quorum is more trouble than it's worth when you have 3 nodes and/or 2 nodes + an arbiter?

Since I am keeping my 3rd node out of oVirt, I am content to keep it as a warm spare that I could swap into the oVirt cluster if I **had** to, while it keeps my storage at 100% quorum.
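For concreteness, a minimal sketch of what setting quorum on that volume would look like, assuming the same client- and server-quorum options that Jim's data volume shows earlier in the thread; this illustrates the intended change, not a transcript of the commands actually run:

# gluster volume set datatest cluster.quorum-type auto
# gluster volume set datatest cluster.server-quorum-type server

With a 1 x 3 replica, quorum-type 'auto' means writes are only allowed while at least two of the three bricks are reachable.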

Jim -

One thing I noticed is that, by accident, I used 'backupvolfile-server=node2:node3', which is apparently still accepted. Reading the man page of mount.glusterfs, that syntax is slightly different from 'backup-volfile-servers', and I'm not sure whether the different spelling has a different effect.

hosted-engine.conf:

# cat /etc/ovirt-hosted-engine/hosted-engine.conf | grep -i option
mnt_options=backup-volfile-servers=node2:node3

And for my datatest gluster domain I have:

backupvolfile-server=node2:node3

I am now curious what happens when I move everything to node1 and drop node2. To that end, I will follow up with that test.
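For comparison, a hedged sketch of the two spellings used as plain mount options; my reading of mount.glusterfs is that 'backupvolfile-server' is the older form taking a single server, while 'backup-volfile-servers' takes a colon-separated list, but verify that against the installed man page. The mount point below is only illustrative:

# mount -t glusterfs -o backupvolfile-server=node2 node1:/datatest /mnt/datatest
# mount -t glusterfs -o backup-volfile-servers=node2:node3 node1:/datatest /mnt/datatest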

Jim -

Result of this test: the engine crashed, but all VMs on the gluster domain (backed by the same physical nodes/hardware/gluster processes/etc.) stayed up fine.

I guess there is some functional difference between 'backupvolfile-server' and 'backup-volfile-servers'? Perhaps try the latter and see what happens. My next test is going to be to configure hosted-engine.conf with backupvolfile-server=node2:node3 and see if the engine VM still shuts down. It seems odd that the engine VM would shut itself down (or that vdsm would shut it down) but not the other VMs; perhaps that is built-in HA functionality of sorts.
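A minimal sketch of that next test as described: change the mnt_options line in the hosted-engine config (same file as the earlier grep) and restart the HA services Kasturi mentioned earlier in the thread so the engine storage is remounted with the new options. The option spelling shown is the one being tested, not a recommendation, and the service names are assumed to be the standard ovirt-hosted-engine-ha systemd units:

/etc/ovirt-hosted-engine/hosted-engine.conf:
mnt_options=backupvolfile-server=node2:node3

Then, on each hosted-engine host:
# systemctl restart ovirt-ha-broker ovirt-ha-agent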
Jim -
One thing I noticed is that, by accident, I used 'backupvolfile-server=node2:node3' which is apparently a supported setting. It would appear, by reading the man page of mount.glusterfs, the syntax is slightly different. not sure if my setting being different has different impacts
hosted-engine.conf:
# cat /etc/ovirt-hosted-engine/hosted-engine.conf | grep -i option mnt_options=backup-volfile-servers=node2:node3
And for my datatest gluster domain I have:
backupvolfile-server=node2:node3
I am now curious what happens when I move everything to node1 and drop node2
To that end, will follow up with that test
On Fri, Sep 1, 2017 at 7:20 PM, Charles Kozler <ckozleriii@gmail.com> wrote:
Jim -
here is my test:
- All VM's on node2: hosted engine and 1 test VM - Test VM on gluster storage domain (with mount options set) - hosted engine is on gluster as well, with settings persisted to hosted-engine.conf for backupvol
All VM's stayed up. Nothing in dmesg of the test vm indicating a pause or an issue or anything
However, what I did notice during this, is my /datatest volume doesnt have quorum set. So I will set that now and report back what happens
# gluster volume info datatest
Volume Name: datatest Type: Replicate Volume ID: 229c25f9-405e-4fe7-b008-1d3aea065069 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: node1:/gluster/data/datatest/brick1 Brick2: node2:/gluster/data/datatest/brick1 Brick3: node3:/gluster/data/datatest/brick1 Options Reconfigured: transport.address-family: inet nfs.disable: on
Perhaps quorum may be more trouble than its worth when you have 3 nodes and/or 2 nodes + arbiter?
Since I am keeping my 3rd node out of ovirt, I am more content on keeping it as a warm spare if I **had** to swap it in to ovirt cluster, but keeps my storage 100% quorum
On Fri, Sep 1, 2017 at 5:18 PM, Jim Kusznir <jim@palousetech.com> wrote:
I can confirm that I did set it up manually, and I did specify backupvol, and in the "manage domain" storage settings, I do have under mount options, backup-volfile-servers=192.168.8.12:192.168.8.13 (and this was done at initial install time).
The "used managed gluster" checkbox is NOT checked, and if I check it and save settings, next time I go in it is not checked.
--Jim
On Fri, Sep 1, 2017 at 2:08 PM, Charles Kozler <ckozleriii@gmail.com> wrote:
@ Jim - here is my setup which I will test in a few (brand new cluster) and report back what I found in my tests
- 3x servers direct connected via 10Gb
- 2 of those 3 setup in ovirt as hosts
- Hosted engine
- Gluster replica 3 (no arbiter) for all volumes
- 1x engine volume gluster replica 3 manually configured (not using ovirt managed gluster)
- 1x datatest volume (20gb) replica 3 manually configured (not using ovirt managed gluster)
- 1x nfstest domain served from some other server in my infrastructure which, at the time of my original testing, was master domain
I tested this earlier and all VMs stayed online. However, the oVirt cluster reported the DC/cluster as down while all VMs stayed up.
As I am typing this: can you confirm you set up your gluster storage domain with the backupvol option? Also, can you confirm you updated hosted-engine.conf with the backupvol mount option as well?
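A rough sketch of how I would double-check both places on a host - the second check assumes mount.glusterfs translates the option into extra --volfile-server arguments for the glusterfs client process, which is how the versions I have seen behave:

# grep -i mnt_options /etc/ovirt-hosted-engine/hosted-engine.conf
# ps ax | grep -i volfile-server

The first should show backup-volfile-servers in mnt_options; in the second, each listed server should appear as a --volfile-server argument once the domain is mounted.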
On Fri, Sep 1, 2017 at 4:22 PM, Jim Kusznir <jim@palousetech.com> wrote:
So, after reading the first document twice and the 2nd link thoroughly once, I believe that the arbiter volume should be sufficient and count for replica / split-brain purposes. E.g., if any one full replica is down and the arbiter and the other replica are up, then it should have quorum and all should be good.
I think my underlying problem has to do more with config than the replica state. That said, I did size the drive on my 3rd node planning to have an identical copy of all data on it, so I'm still not opposed to making it a full replica.
Did I miss something here?
Thanks!
On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler <ckozleriii@gmail.com> wrote:
These can get a little confusing but this explains it best: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes
Basically, in the first paragraph they explain why you can't have HA with quorum for 2 nodes. Here is another overview doc that explains some more:
http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
From my understanding, the arbiter is good for resolving split brain. Quorum and arbiter are two different things: quorum is a mechanism to help you **avoid** split brain, and the arbiter is there to help gluster resolve split brain by voting and other internal mechanics (as outlined in link 1). How did you create the volume exactly - what command? It looks to me like you created it with 'gluster volume create replica 2 arbiter 1 {....}', per your earlier mention of "replica 2 arbiter 1". That being said, if you did that and then set up quorum in the volume configuration, losing quorum would cause gluster to halt (as you saw, until you recovered node 1).
As you can see from the docs, there is still a corner case for getting in to split brain with replica 3, which again, is where arbiter would help gluster resolve it
I need to amend my previous statement: I was told that the arbiter brick does not store data, only metadata. I cannot find anything in the docs backing this up, however it would make sense. That being said, in my setup I would not include my arbiter or my third node in the oVirt VM cluster component; I would keep it completely separate.
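For comparison, a sketch of the two layouts being discussed - hostnames and brick paths are purely illustrative, and depending on the gluster version the arbiter form is spelled 'replica 3 arbiter 1' (current docs) or 'replica 2 arbiter 1' (older phrasing); both mean two data bricks plus one metadata-only brick:

# gluster volume create data replica 3 arbiter 1 node1:/bricks/data node2:/bricks/data node3:/bricks/data
# gluster volume create data replica 3 node1:/bricks/data node2:/bricks/data node3:/bricks/data

The first creates the arbiter layout (the third brick holds metadata only); the second is a full replica 3 where all three bricks hold complete copies.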
On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir <jim@palousetech.com> wrote:

Hi Charles,

The right option is 'backup-volfile-servers', not 'backupvolfile-server'. Can you please use the first one and test?

Thanks
kasturi
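As a side note, for the hosted engine the edited value in /etc/ovirt-hosted-engine/hosted-engine.conf is only picked up once the HA services re-read it - a rough sketch of the restart step mentioned earlier in the thread (standard service names; adjust if yours differ):

# systemctl restart ovirt-ha-broker ovirt-ha-agent

Run that on each hosted-engine host after editing the conf file.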

Hi Jim,

I looked at the gluster volume info and that looks fine to me. The recommended config is arbiter for the data and vmstore volumes; for engine it should be replica 3, since we want the HE to be available always.

If I understand right, the problem you are facing is that when you shut down one of the nodes, the HE VM and the app VMs all go to paused state, right?

For further debugging, and to ensure the volume has been mounted using the backup-volfile-servers option, you can move the storage domain to maintenance (which will unmount the volume) and then activate it back (which will mount it again). During this you can check the mount command in the vdsm logs; it should include the backup-volfile-servers option (see the sketch below).

Can you please confirm that you have ovirt-guest-agent installed on the app VMs and power management enabled? ovirt-guest-agent is required on the app VMs to ensure HA functionality.

Thanks
kasturi
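A rough sketch of those checks, assuming the usual locations on my installs (/var/log/vdsm/vdsm.log on the host, the ovirt-guest-agent service inside the guests); adjust to your environment. On the host, after re-activating the domain:

# grep -i backup-volfile-servers /var/log/vdsm/vdsm.log | tail

should show the mount command carrying the option, and inside each app VM:

# systemctl status ovirt-guest-agent

should report the agent installed and running.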

On 9/1/2017 8:53 AM, Jim Kusznir wrote:
Huh...OK, how do I convert the arbiter to a full replica, then? I was misinformed when I created this setup. I thought the arbiter held enough metadata that it could validate or repudiate any one replica (kinda like the parity drive for a RAID-4 array). I was also under the impression that one replica + arbiter is enough to keep the array online and functional.
I cannot speak for the oVirt implementation of Rep2+Arbiter as I've not used it, but on a standalone libvirt VM host cluster, the arbiter does exactly what you want: you can lose one of the two replicas and stay online, because the arbiter maintains quorum. Of course, if you lose the second replica before you have repaired the first failure, you have completely lost your data, as the arbiter doesn't hold it. So Rep2+Arb is not as SAFE as Rep3, however it can be faster, especially on less-than-10G networks.

When any node fails, Gluster will pause for 42 seconds or so (it's configurable) before marking the bad node as bad; then normal activity resumes. On most people's systems the 'pause' (I think it's a read-only event) is noticeable, but not enough to cause issues. One person has reported that his VMs went read-only during that period, but others have not reported that.

-wk
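The roughly 42-second window mentioned above corresponds to gluster's network.ping-timeout volume option. A quick sketch of inspecting and tuning it - the volume name is only an example, and whether lowering it is a good idea depends on your network:

# gluster volume get datatest network.ping-timeout
# gluster volume set datatest network.ping-timeout 30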

I had the very same impression. It doesn't look like it works that way, then. So for a fully redundant setup, where you can lose a complete host, you must have at least 3 nodes then?

Fernando

On 01/09/2017 12:53, Jim Kusznir wrote:
Huh... OK, how do I convert the arbiter to a full replica, then? I was misinformed when I created this setup. I thought the arbiter held enough metadata that it could validate or repudiate any one replica (kind of like the parity drive for a RAID-4 array). I was also under the impression that one replica + arbiter is enough to keep the array online and functional.
--Jim
On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler <ckozleriii@gmail.com <mailto:ckozleriii@gmail.com>> wrote:
@ Jim - you have only two data volumes and lost quorum. Arbitrator only stores metadata, no actual files. So yes, you were running in degraded mode so some operations were hindered.
@ Sahina - Yes, this actually worked fine for me once I did that. However, the issue I am still facing, is when I go to create a new gluster storage domain (replica 3, hyperconverged) and I tell it "Host to use" and I select that host. If I fail that host, all VMs halt. I do not recall this in 3.6 or early 4.0. This to me makes it seem like this is "pinning" a node to a volume and vice versa like you could, for instance, for a singular hyperconverged to ex: export a local disk via NFS and then mount it via ovirt domain. But of course, this has its caveats. To that end, I am using gluster replica 3, when configuring it I say "host to use: " node 1, then in the connection details I give it node1:/data. I fail node1, all VMs halt. Did I miss something?
On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose <sabose@redhat.com <mailto:sabose@redhat.com>> wrote:
To the OP question, when you set up a gluster storage domain, you need to specify backup-volfile-servers=<server2>:<server3> where server2 and server3 also have bricks running. When server1 is down, and the volume is mounted again - server2 or server3 are queried to get the gluster volfiles.
@Jim, if this does not work, are you using 4.1.5 build with libgfapi access? If not, please provide the vdsm and gluster mount logs to analyse
If VMs go to paused state - this could mean the storage is not available. You can check "gluster volume status <volname>" to see if at least 2 bricks are running.
On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson <johan@kafit.se <mailto:johan@kafit.se>> wrote:
If gluster drops in quorum so that it has fewer votes than it should, it will stop file operations until quorum is back to normal. If I remember it right, you need two bricks to write to for quorum to be met, and the arbiter is only a vote to avoid split brain.
Basically what you have is a raid5 solution without a spare. And when one disk dies it will run in degraded mode. And some raid systems will stop the raid until you have removed the disk or forced it to run anyway.
You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
/Johan
On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
Hi all:
Sorry to hijack the thread, but I was about to start essentially the same thread.
I have a 3 node cluster, all three are hosts and gluster nodes (replica 2 + arbitrar). I DO have the mnt_options=backup-volfile-servers= set:
storage=192.168.8.11:/engine mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
I had an issue today where 192.168.8.11 went down. ALL VMs immediately paused, including the engine (all VMs were running on host2:192.168.8.12). I couldn't get any gluster stuff working until host1 (192.168.8.11) was restored.
What's wrong / what did I miss?
(this was set up "manually" through the article on setting up self-hosted gluster cluster back when 4.0 was new..I've upgraded it to 4.1 since).
Thanks! --Jim
On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler <ckozleriii@gmail.com <mailto:ckozleriii@gmail.com>> wrote:
Typo..."Set it up and then failed that **HOST**"
And upon that host going down, the storage domain went down. I only have hosted storage domain and this new one - is this why the DC went down and no SPM could be elected?
I dont recall this working this way in early 4.0 or 3.6
On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler <ckozleriii@gmail.com <mailto:ckozleriii@gmail.com>> wrote:
So I've tested this today and I failed a node. Specifically, I setup a glusterfs domain and selected "host to use: node1". Set it up and then failed that VM
However, this did not work and the datacenter went down. My engine stayed up, however, it seems configuring a domain to pin to a host to use will obviously cause it to fail
This seems counter-intuitive to the point of glusterfs or any redundant storage. If a single host has to be tied to its function, this introduces a single point of failure
Am I missing something obvious?
On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra <knarra@redhat.com <mailto:knarra@redhat.com>> wrote:
yes, right. What you can do is edit the hosted-engine.conf file and there is a parameter as shown below [1] and replace h2 and h3 with your second and third storage servers. Then you will need to restart ovirt-ha-agent and ovirt-ha-broker services in all the nodes .
[1] 'mnt_options=backup-volfile-servers=<h2>:<h3>'
On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler <ckozleriii@gmail.com <mailto:ckozleriii@gmail.com>> wrote:
Hi Kasturi -
Thanks for feedback
> If cockpit+gdeploy plugin would be have been used then that would have automatically detected glusterfs replica 3 volume created during Hosted Engine deployment and this question would not have been asked Actually, doing hosted-engine --deploy it too also auto detects glusterfs. I know glusterfs fuse client has the ability to failover between all nodes in cluster, but I am still curious given the fact that I see in ovirt config node1:/engine (being node1 I set it to in hosted-engine --deploy). So my concern was to ensure and find out exactly how engine works when one node goes away and the fuse client moves over to the other node in the gluster cluster
But you did somewhat answer my question, the answer seems to be no (as default) and I will have to use hosted-engine.conf and change the parameter as you list
So I need to do something manual to create HA for engine on gluster? Yes?
Thanks so much!
On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra <knarra@redhat.com <mailto:knarra@redhat.com>> wrote: > Hi, > > During Hosted Engine setup question about > glusterfs volume is being asked because you have > setup the volumes yourself. If cockpit+gdeploy > plugin would be have been used then that would have > automatically detected glusterfs replica 3 volume > created during Hosted Engine deployment and this > question would not have been asked. > > During new storage domain creation when glusterfs > is selected there is a feature called 'use managed > gluster volumes' and upon checking this all > glusterfs volumes managed will be listed and you > could choose the volume of your choice from the > dropdown list. > > There is a conf file called > /etc/hosted-engine/hosted-engine.conf where there is > a parameter called backup-volfile-servers="h1:h2" > and if one of the gluster node goes down engine uses > this parameter to provide ha / failover. > > Hope this helps !! > > Thanks > kasturi > > > > On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler > <ckozleriii@gmail.com <mailto:ckozleriii@gmail.com>> > wrote: >> Hello - >> >> I have successfully created a hyperconverged hosted >> engine setup consisting of 3 nodes - 2 for VM's and >> the third purely for storage. I manually configured >> it all, did not use ovirt node or anything. Built >> the gluster volumes myself >> >> However, I noticed that when setting up the hosted >> engine and even when adding a new storage domain >> with glusterfs type, it still asks for >> hostname:/volumename >> >> This leads me to believe that if that one node goes >> down (ex: node1:/data), then ovirt engine wont be >> able to communicate with that volume because its >> trying to reach it on node 1 and thus, go down >> >> I know glusterfs fuse client can connect to all >> nodes to provide failover/ha but how does the >> engine handle this? >> >> _______________________________________________ >> Users mailing list >> Users@ovirt.org <mailto:Users@ovirt.org> >> http://lists.ovirt.org/mailman/listinfo/users >> <http://lists.ovirt.org/mailman/listinfo/users> >> > >
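On Jim's question about converting the arbiter to a full replica: this is normally done by replacing the arbiter brick with a full data brick rather than by flipping a setting. A rough sketch, assuming a volume named 'data' and an arbiter brick on node3 with room for a full copy (the paths and hostnames are placeholders, not values from this thread):

# remove the arbiter brick, leaving a plain replica 2 volume
gluster volume remove-brick data replica 2 node3:/gluster/arbiter/data force

# add a full-sized brick in its place, raising the volume to replica 3
gluster volume add-brick data replica 3 node3:/gluster/bricks/data

# trigger self-heal to copy the data onto the new brick and watch progress
gluster volume heal data full
gluster volume heal data info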

Hey All -

So I haven't tested this yet, but what I do know is that I did set up the backupvol option when I added the data gluster volume; however, the mount options in mount -l do not show it as being used:

n1:/data on /rhev/data-center/mnt/glusterSD/n1:_data type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

I will delete it and re-add it, but I think this might be part of the problem. Perhaps Jim and I have the same issue because oVirt is actually not passing the additional mount options from the web UI to the backend to mount with said parameters?

Thoughts?

On Mon, Sep 4, 2017 at 10:51 AM, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
I had the very same impression. It doesn't look like it works that way, then. So for a fully redundant setup where you can lose a complete host, you must have at least 3 nodes?
Fernando
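A note on the mount -l observation: the backup-volfile-servers option is consumed by the mount.glusterfs helper rather than handed to the kernel, so it normally does not appear in mount -l or /proc/mounts even when it is in effect. One way to check, assuming a GlusterFS version where the helper turns the option into extra --volfile-server arguments, is to look at the fuse client process itself:

# each configured backup server should show up as an extra --volfile-server argument
ps -ef | grep '[g]lusterfs.*_data'

# illustrative shape of the expected command line (not actual output from this setup):
#   /usr/sbin/glusterfs --volfile-server=n1 --volfile-server=n2 --volfile-server=n3 \
#       --volfile-id=/data /rhev/data-center/mnt/glusterSD/n1:_data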

The same goes for my engine storage domain. Shouldn't we see the mount options in the mount -l output? Fault tolerance did appear to work (sort of - see below) during my test.

[root@appovirtp01 ~]# grep -i mnt_options /etc/ovirt-hosted-engine/hosted-engine.conf
mnt_options=backup-volfile-servers=n2:n3

[root@appovirtp02 ~]# grep -i mnt_options /etc/ovirt-hosted-engine/hosted-engine.conf
mnt_options=backup-volfile-servers=n2:n3

[root@appovirtp03 ~]# grep -i mnt_options /etc/ovirt-hosted-engine/hosted-engine.conf
mnt_options=backup-volfile-servers=n2:n3

Meanwhile, it is not visible in the mount -l output:

[root@appovirtp01 ~]# mount -l | grep -i n1:/engine
n1:/engine on /rhev/data-center/mnt/glusterSD/n1:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@appovirtp02 ~]# mount -l | grep -i n1:/engine
n1:/engine on /rhev/data-center/mnt/glusterSD/n1:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@appovirtp03 ~]# mount -l | grep -i n1:/engine
n1:/engine on /rhev/data-center/mnt/glusterSD/n1:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

So, since everything is "pointed" at node 1 for engine storage, I decided to hard shut down node 1 while the hosted engine VM was running on node 3.

The result was that after ~30 seconds the engine crashed, likely because of the gluster 42-second timeout. The hosted engine VM came back up (with node 1 still down) after about 5-7 minutes.

Is it expected for the VM to go down? I thought the gluster fuse client mounted all bricks in the volume (http://lists.gluster.org/pipermail/gluster-users/2015-May/021989.html), so I would have imagined this to be more seamless?

On Tue, Sep 12, 2017 at 7:04 PM, Charles Kozler <ckozleriii@gmail.com> wrote:
Hey All -
So I haven't tested this yet, but what I do know is that I did set up the backupvol option when I added the data gluster volume; however, the mount options in mount -l do not show it as being used:
n1:/data on /rhev/data-center/mnt/glusterSD/n1:_data type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
I will delete it and re-add it, but I think this might be part of the problem. Perhaps Jim and I have the same issue because oVirt is actually not passing the additional mount options from the web UI to the backend to mount with said parameters?
Thoughts?
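On the question of whether the fuse client really holds a connection to every brick, two places to look besides mount output (the volume name 'engine' and the log file name are assumptions based on the mount paths quoted in this thread, so adjust as needed):

# on any gluster node: list the clients connected to each brick of the engine volume
gluster volume status engine clients

# on the hypervisor: the fuse mount log should show a "Connected to engine-client-N"
# line for each brick the client actually reached
grep -i 'connected to' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-n1:_engine.log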
participants (7)
- Charles Kozler
- FERNANDO FREDIANI
- Jim Kusznir
- Johan Bernhardsson
- Kasturi Narra
- Sahina Bose
- WK