Re: [Users] GlusterFS Distributed Replicate

Hi, in a 2-node cluster you can set the path to localhost:volume. If one host goes down and the SPM role switches to the remaining running host, your master domain is still accessible and so your VMs stay up and running. Regards, Alex -----Original message-----
From: gregoire.leroy@retenodus.net <gregoire.leroy@retenodus.net> Sent: Friday 20th December 2013 15:10 To: Andrew Lau <andrew@andrewklau.com> Cc: users <users@ovirt.org> Subject: Re: [Users] GlusterFS Distributed Replicate
Hi,
There are some things I don't understand. First of all, why do we need keepalived? I thought it would be transparent at this layer and that glusterfs would manage all the replication by itself. Is that because I use POSIXFS instead of GlusterFS, or is it totally unrelated?
Secondly, about the split-brain: when you say that I can read but not write, does that mean I can't write data on the VM storage space, or that I can't create VMs? If I can't write data, what would be the workaround? Am I forced to have 3 (or 4, I guess, as I want replication) nodes?
To conclude: can I get real HA (except for the engine) with oVirt / GlusterFS with 2 nodes?
Thank you very much, Regards, Grégoire Leroy
On 2013-12-19 23:03, Andrew Lau wrote:
Hi,
What I learned about the way glusterfs works is that you specify the host only to grab the initial volume information; the client then connects directly to the other hosts to reach the datastore - this avoids the bottleneck issue that NFS has.
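To illustrate (a rough sketch with hypothetical host, volume, and mount-point names; the exact mount option name varies between gluster releases), the host named in the path only serves the volume file at mount time, and a backup volfile server can be given for that first contact:

    # client-side mount: host1 is only contacted to fetch the volume file,
    # after which the client talks to every brick host directly
    mount -t glusterfs -o backupvolfile-server=host2 host1:/data /mnt/data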
Knowing this, the workaround I used was to set up keepalived on the gluster hosts (make sure you set it up on an interface other than your ovirtmgmt or you'll clash with the live migration components). So now if one of my hosts drops from the cluster, storage access is not lost. I haven't fully tested the whole infrastructure yet, but my only fear is the VMs may drop into "PAUSE" mode during the keepalived transition period.
Also - you may need to change your glusterfs ports so they don't interfere with vdsm. My post here is a little outdated, but it still has my findings on keepalived etc.: http://www.andrewklau.com/returning-to-glusterized-ovirt-3-3/
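Something along these lines on each gluster host (a minimal sketch only; interface name, VIP, and password are hypothetical and not from this thread):

    # /etc/keepalived/keepalived.conf - run on both gluster hosts,
    # the VIP floats to whichever host is still alive
    vrrp_instance gluster_vip {
        state BACKUP            # let priority decide the initial MASTER
        interface em2           # storage interface, NOT ovirtmgmt
        virtual_router_id 51
        priority 100            # use e.g. 90 on the second host
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass gluster1
        }
        virtual_ipaddress {
            10.0.0.50/24        # VIP then used in the storage domain path
        }
    }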
The other thing to note is that you've only got two gluster hosts. I believe by default oVirt now sets the quorum setting, which enforces that there must be at least 2 nodes alive in your configuration. This means when there is only 1 gluster server up, you'll be able to read but not write; this is to avoid split-brain.
Thanks, Andrew
On Thu, Dec 19, 2013 at 5:12 AM, <gregoire.leroy@retenodus.net> wrote:
Hello,
As I said in a previous email, I have this configuration with oVirt 3.3: 1 oVirt Engine and 2 CentOS 6.5 hosts.
I successfully set up GlusterFS. I created a distributed replicate volume with 2 bricks: host1:/gluster and host2:/gluster.
Then I created a POSIXFS storage domain, storage_gluster, with the option glusterfs, and I gave it the path "host1:/gluster".
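A sketch of that kind of setup, using the names from this message and gluster 3.4-era syntax (not the exact commands run here), would be:

    # on host1, after both hosts are peers
    gluster peer probe host2
    gluster volume create gluster replica 2 host1:/gluster host2:/gluster
    gluster volume start gluster

    # oVirt: New Domain -> POSIX compliant FS
    #   Path:      host1:/gluster
    #   VFS Type:  glusterfs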
First, I'm rather surprised I have to specify a host for the storage, as I wish to have distributed replicated storage. I expected to specify both hosts.
Then I created a VM on this storage. The expected behaviour if I shut down host1 should be that my VM keeps running on the second brick. Yet not only do I lose my VM, but host2 goes into a non-operational status because one of its data storage domains is not reachable.
Did I miss something in the configuration? How could I get the desired behaviour?
Thanks a lot, Regards, Grégoire Leroy

Hi,
keepalived is only for grabbing the gluster volume info (e.g. the servers which host the bricks), after which, from what I've noticed, your clients will then connect to the gluster servers directly (not using keepalived anymore).
Can't keepalived be replaced by using localhost as the hostname, as Alex says?
If you disable quorum then you won't have the issue of "read only" when you lose a host, but you won't have protection from split-brain (if your two hosts lose network connectivity). VMs will keep writing to the hosts; as you have the gluster server and client on the same host, this is inevitable.
Ok, I get the problem caused by disabling the quorum. So, what if, while I have two hosts, the lack of HA is not so dramatic, but it will be necessary when I have more hosts (3 or 4)? Here is the scenario I would like to have:

1) I have two hosts: HOSTA and HOSTB. They have glusterfs bricks configured as Distributed-Replicate and data is replicated. => For now, I'm totally OK with the fact that if a node fails, the VMs on this host are stopped and unreachable. However, I would like that if a node fails, the DC keeps running, so that VMs on the other host are not stopped and a human intervention makes it possible to start the VMs on the other host. Would that be possible without disabling the quorum?

2) In a few months, I'll add two other hosts to the glusterfs volume. Their bricks will be replicated. => At that time, I would like to be able to evolve my architecture (without shutting down my VMs and exporting/importing them on a new cluster) so that if a node fails, the VMs on this host start to run on the other host of the same brick (without manual intervention). Is it possible?

On 2013-12-20 16:22, a.ludas@gmail.com wrote:
Hi,
in a 2-node cluster you can set the path to localhost:volume. If one host goes down and the SPM role switches to the remaining running host, your master domain is still accessible and so your VMs stay up and running.
Regards, Alex
I tried it, but the storage/cluster were still shut down, probably because of the quorum. Thank you very much, Regards, Grégoire Leroy

But where is this quorum configured? In the oVirt engine GUI or gluster-wide?

On Sat, Dec 21, 2013 at 4:43 AM, <gregoire.leroy@retenodus.net> wrote:
Hi,
keepalived is only for grabbing the gluster volume info (e.g. the servers which host the bricks), after which, from what I've noticed, your clients will then connect to the gluster servers directly (not using keepalived anymore).
Can't keepalived be replaced by using localhost as the hostname, as Alex says?
Yes it could, sorry, I forgot about that method. I used to do it that way, but we had plans to possibly implement nodes without glusterfs bricks in the future, so we moved to keepalived.
If you disable quorum then you won't have the issue of "read only" when you lose a host, but you won't have protection from split-brain (if your two hosts lose network connectivity). VMs will keep writing to the hosts; as you have the gluster server and client on the same host, this is inevitable.
Ok, I get the problem caused by disabling the quorum. So, what if, while I have two hosts, the lack of HA is not so dramatic, but it will be necessary when I have more hosts (3 or 4)? Here is the scenario I would like to have:
Quorum generally requires 3 hosts. I believe the default configuration when you press "Optimize for Virt Store" will require a minimum of 2 bricks connected before writing is allowed.
1) I have two hosts: HOSTA and HOSTB. They have glusterfs bricks configured as Distributed-Replicate and data is replicated. => For now, I'm totally OK with the fact that if a node fails, the VMs on this host are stopped and unreachable. However, I would like that if a node fails, the DC keeps running, so that VMs on the other host are not stopped and a human intervention makes it possible to start the VMs on the other host. Would that be possible without disabling the quorum?
For the 2-host scenario, disabling quorum will allow you to do this.
2) In a few months, I'll add two other hosts to the glusterfs volume. Their bricks will be replicated. => At that time, I would like to be able to evolve my architecture (without shutting down my VMs and exporting/importing them on a new cluster) so that if a node fails, the VMs on this host start to run on the other host of the same brick (without manual intervention).
Later on you just enable quorum; it's only a setting in the gluster volume: gluster volume set DATA cluster.quorum-type auto
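For instance, a sketch of the two states, using the volume name DATA from the example above (option names as I recall them from the gluster documentation, so double-check on your version):

    # 2-host period: turn quorum off so writes survive a single host
    gluster volume set DATA cluster.quorum-type none
    gluster volume set DATA cluster.server-quorum-type none

    # once a third host/brick has joined, turn it back on
    gluster volume set DATA cluster.quorum-type auto
    gluster volume set DATA cluster.server-quorum-type server

    # verify the reconfigured options
    gluster volume info DATA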
Is it possible ?
On 2013-12-20 16:22, a.ludas@gmail.com wrote:
Hi,
in a 2-node cluster you can set the path to localhost:volume. If one host goes down and the SPM role switches to the remaining running host, your master domain is still accessible and so your VMs stay up and running.
Regards, Alex
I tried it, but the storage/cluster were still shut down, probably because of the quorum.
Thank you very much, Regards, Grégoire Leroy

Hello,
If you disable quorum then you won't have the issue of "read only" when you lose a host, but you won't have protection from split-brain (if your two hosts lose network connectivity). VMs will keep writing to the hosts; as you have the gluster server and client on the same host, this is inevitable.
Ok, I get the problem caused by disabling the quorum. So, what if, while I have two hosts, the lack of HA is not so dramatic, but it will be necessary when I have more hosts (3 or 4)? Here is the scenario I would like to have:
Quorum generally requires 3 hosts. I believe the default configuration when you press "Optimize for Virt Store" will require a minimum of 2 bricks connected before writing is allowed.
Ok, if I understand, the quorum thing is very specific to gluster (bricks) and not to oVirt (hosts). So maybe what I need is just another gluster server with very little space, on a dummy VM (not hosted by an oVirt host but outside of my cluster), to add as a brick. It wouldn't be used at all, just to check connectivity.

Then, if a host loses connectivity, it can't reach either the real gluster server or the "dummy" one, and so it doesn't run VMs. The other one, which is able to reach the dummy one, becomes the SPM (the dummy wouldn't have a vdsm server, so it couldn't become SPM) and runs the VMs.

Maybe this way I could have HA with two hosts, right? Is there a reason it shouldn't work?
1) I have two hosts: HOSTA and HOSTB. They have glusterfs bricks configured as Distributed-Replicate and data is replicated. => For now, I'm totally OK with the fact that if a node fails, the VMs on this host are stopped and unreachable. However, I would like that if a node fails, the DC keeps running, so that VMs on the other host are not stopped and a human intervention makes it possible to start the VMs on the other host. Would that be possible without disabling the quorum?
For the 2-host scenario, disabling quorum will allow you to do this.
Unfortunately, not for all cases. If the network interface glusterfs uses to reach the other host goes down, I get the following behaviour:

1) HOSTB, on which the VMs run, detects that HOSTA's brick is unreachable, so it keeps running. Fine. 2) HOSTA detects that HOSTB's brick is unreachable, so it starts to run the VMs => split-brain. If the network interfaces used for the VMs (not for cluster management) are OK, I'm going to have a split network. 3) Conclusion: the failure of HOSTA has an impact on the VMs of HOSTB.

Does this scenario seem correct to you, or have I missed something? Maybe power management could avoid this issue.
2) In a few months, I'll add two other hosts to the glusterfs volume. Their bricks will be replicated. => At that time, I would like to be able to evolve my architecture (without shutting down my VMs and exporting/importing them on a new cluster) so that if a node fails, the VMs on this host start to run on the other host of the same brick (without manual intervention).
Later on you just enable quorum; it's only a setting in the gluster volume: gluster volume set DATA cluster.quorum-type auto
Thank you, Regards, Grégoire Leroy

On Sat, Dec 21, 2013 at 11:56 PM, Grégoire Leroy <gregoire.leroy@retenodus.net> wrote:
Hello,
If you disable quorum then you won't have the issue of "read only" when you lose a host, but you won't have protection from split-brain (if your two hosts lose network connectivity). VMs will keep writing to the hosts; as you have the gluster server and client on the same host, this is inevitable.
Ok, I get the problem caused by disabling the quorum. So, what if, while I have two hosts, the lack of HA is not so dramatic, but it will be necessary when I have more hosts (3 or 4)? Here is the scenario I would like to have:
Quorum generally requires 3 hosts. I believe the default configuration when you press "Optimize for Virt Store" will require a minimum of 2 bricks connected before writing is allowed.
Ok, if I understand, the quorum thing is very specific to gluster (bricks) and not to oVirt (hosts). So maybe what I need is just another gluster server with very little space, on a dummy VM (not hosted by an oVirt host but outside of my cluster), to add as a brick. It wouldn't be used at all, just to check connectivity.
Then, if a host loses connectivity, it can't reach either the real gluster server or the "dummy" one, and so it doesn't run VMs. The other one, which is able to reach the dummy one, becomes the SPM (the dummy wouldn't have a vdsm server, so it couldn't become SPM) and runs the VMs.
Maybe this way I could have HA with two hosts, right? Is there a reason it shouldn't work?
I guess this would work, as quorum is based on how many peers are in the cluster. Actually quite a good idea and I'd love to hear from you on how it goes. I'd be interested to see how gluster will work with this though; I assume it has to be a part of the volume. If you're doing distribute-replicate, I think this "dummy" VM will need to hold the full replicated data?

cluster.server-quorum-ratio - this is % > 50. If the volume is not set with any ratio, the equation for quorum is: active_peer_count > 50% of all peers in the cluster. But when the percentage (P) is specified, the equation for quorum is: active_peer_count >= P% of all the befriended peers in the cluster.
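A rough sketch of what that could look like on the command line (the host name "dummy" is hypothetical, and whether the extra peer must also carry a brick is exactly the open question above; the ratio option is, as far as I know, cluster-wide, hence "all"):

    # from one of the real gluster hosts: add the small third machine as a peer
    gluster peer probe dummy
    gluster peer status

    # server-side quorum counts peers, so with 3 peers the loss of one host
    # still leaves active_peer_count (2) above the ratio and bricks stay up
    gluster volume set DATA cluster.server-quorum-type server
    gluster volume set all cluster.server-quorum-ratio 51%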
1) I have two hosts: HOSTA and HOSTB. They have glusterfs bricks configured as Distributed-Replicate and data is replicated. => For now, I'm totally OK with the fact that if a node fails, the VMs on this host are stopped and unreachable. However, I would like that if a node fails, the DC keeps running, so that VMs on the other host are not stopped and a human intervention makes it possible to start the VMs on the other host. Would that be possible without disabling the quorum?
For the 2-host scenario, disabling quorum will allow you to do this.
Unfortunately, not for all cases. If the network interface glusterfs uses to reach the other host goes down, I get the following behaviour:

1) HOSTB, on which the VMs run, detects that HOSTA's brick is unreachable, so it keeps running. Fine. 2) HOSTA detects that HOSTB's brick is unreachable, so it starts to run the VMs => split-brain. If the network interfaces used for the VMs (not for cluster management) are OK, I'm going to have a split network. 3) Conclusion: the failure of HOSTA has an impact on the VMs of HOSTB.

Does this scenario seem correct to you, or have I missed something? Maybe power management could avoid this issue.
Yes, you'll need power management, which they call "fencing". It ensures that a host which has dropped from the cluster is sent for a reboot, so any VMs running on it are shut off immediately and are ready to be brought up on another oVirt host.
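As an illustration of what fencing does at the lowest level (a hedged sketch; the agent, address, and credentials are hypothetical, and in oVirt this is configured per host under Power Management rather than run by hand):

    # query and reset a host over IPMI - this is roughly what oVirt's power
    # management does through the fence agents when a host stops responding
    fence_ipmilan -a 10.0.0.11 -l admin -p secret -o status
    fence_ipmilan -a 10.0.0.11 -l admin -p secret -o reboot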
2) In a few months, I'll add two other hosts to the glusterfs volume. Their bricks will be replicated. => At that time, I would like to be able to evolve my architecture (without shutting down my VMs and exporting/importing them on a new cluster) so that if a node fails, the VMs on this host start to run on the other host of the same brick (without manual intervention).
Later on you just enable quorum; it's only a setting in the gluster volume: gluster volume set DATA cluster.quorum-type auto
Thank you, Regards, Grégoire Leroy

Hi,
For the 2-host scenario, disabling quorum will allow you to do this.
I just disabled quorum and disabled auto migration for my cluster. Here is what I get:

As a reminder, the path of my storage is localhost:/path and I selected "HOSTA" as host. Volume options are: cluster.server-quorum-type none, cluster.quorum-type fixed, cluster.quorum-count 1.

If one host is shut down, the storage and cluster become shut down. => Do you have any idea why I get this behaviour? Is there a way to avoid it? VMs on the UP host are OK, which is the expected behaviour. I can migrate VMs from one host to another when they're both UP.

However, when a host is down, the VMs on this host don't become down but go into an unknown state instead. Is this normal behaviour? If so, how am I supposed to manually boot them on the other host?

Thank you, Regards, Grégoire Leroy
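For reference, a sketch of how those options map to the gluster CLI (VOLNAME is a placeholder; "fixed" with quorum-count 1 means writes need only one brick to be up):

    # allow writes as long as a single brick is reachable
    gluster volume set VOLNAME cluster.server-quorum-type none
    gluster volume set VOLNAME cluster.quorum-type fixed
    gluster volume set VOLNAME cluster.quorum-count 1

    # confirm the reconfigured options and that both bricks are online
    gluster volume info VOLNAME
    gluster volume status VOLNAME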

On Mon, Dec 23, 2013 at 11:54 PM, <gregoire.leroy@retenodus.net> wrote:
Hi,
For the 2-host scenario, disabling quorum will allow you to do this.
I just disabled quorum and disabled auto migration for my cluster. Here is what I get:
Try shutting down the host which isn't your SPM.
As a reminder, the path of my storage is localhost:/path and I selected "HOSTA" as host. Volume options are: cluster.server-quorum-type none, cluster.quorum-type fixed, cluster.quorum-count 1.
If one host is shut down, the storage and cluster become shut down. => Do you have any idea why I get this behaviour? Is there a way to avoid it? VMs on the UP host are OK, which is the expected behaviour. I can migrate VMs from one host to another when they're both UP.
However, when a host is down, the VMs on this host don't become down but go into an unknown state instead. Is this normal behaviour? If so, how am I supposed to manually boot them on the other host?
You could try right-clicking on the shut-down host and pressing "Confirm host has rebooted".
Thank you, Regards, Grégoire Leroy

On 12/23/2013 06:24 PM, gregoire.leroy@retenodus.net wrote:
Hi,
For the 2-host scenario, disabling quorum will allow you to do this.
I just disabled quorum and disabled auto migration for my cluster. Here is what I get:
As a reminder, the path of my storage is localhost:/path and I selected "HOSTA" as host. Volume options are: cluster.server-quorum-type none, cluster.quorum-type fixed, cluster.quorum-count 1.
With this configuration, client-side quorum is enabled and allows operations to continue as long as one brick is available. Is this the intended behaviour?
If one host is shut down, the storage and cluster become shut down. => Do you have any idea why I get this behaviour?
Are the bricks seen as online in gluster volume status <volname>? Thanks, Vijay
Is there a way to avoid it? VMs on the UP host are OK, which is the expected behaviour. I can migrate VMs from one host to another when they're both UP.
However, when a host is down, the VMs on this host don't become down but go into an unknown state instead. Is this normal behaviour? If so, how am I supposed to manually boot them on the other host?
Thank you, Regards, Grégoire Leroy

Hello,

I finally took the following configuration:
- Migration policy is "don't migrate"
- cluster.server-quorum-type is none
- cluster.quorum-type is none

When a host is down, a manual migration allows me to use the other. Later, I'll add another host so that I get real HA.

Thank you, Regards, Grégoire Leroy

Hi,

I finally got around to running some tests on our environment. Are you getting the same case, where when one host drops, the VM ends up in a paused state and can't be migrated?

With your case, you should be able to obtain full HA; quorum is just a protection against split-brain. If you set the migration policy to migrate all VMs, in theory the VMs from the crashed node should, I assume, migrate to the other node when it sees the node is offline.

I was wondering if this may be because the VM reads directly from the gluster storage server and there doesn't seem to be any failover? Would an NFS solution with keepalived across the two servers fix this issue, as the connection would be isolated to an IP address rather than the single gluster node? I'm not too familiar with how the libgfapi protocol works. Could anyone else chime in?

Cheers, Andrew.

On Fri, Jan 3, 2014 at 2:05 AM, <gregoire.leroy@retenodus.net> wrote:
Hello,
I finally took the following configuration:
- Migration policy is "don't migrate"
- cluster.server-quorum-type is none
- cluster.quorum-type is none
When a host is down, a manual migration allows me to use the other. Later, I'll add another host so that I get real HA.
Thank you, Regards, Grégoire Leroy

Hi, On 2014-01-08 10:24, Andrew Lau wrote:
Hi,
I finally got around to running some tests on our environment. Are you getting the same case, where when one host drops, the VM ends up in a paused state and can't be migrated?
Yes, in the hosts panel, I have to manually confirm it was really rebooted. Otherwise, the VM stays in a paused state. Regards, Grégoire Leroy

On Thu, Jan 9, 2014 at 12:51 AM, <gregoire.leroy@retenodus.net> wrote:
Hi,
On 2014-01-08 10:24, Andrew Lau wrote:
Hi,
I finally got around to running some tests on our environment. Are you getting the same case, where when one host drops, the VM ends up in a paused state and can't be migrated?
Yes, in the hosts panel, I have to manually confirm it was really rebooted. Otherwise, the VM stays in a paused state.
Is that the only workaround you managed to find? That sort of means it won't be possible to get fully automated HA no matter how many nodes you add...
Is this because of fencing? I'm in a similar situation, so I'm still digging to find the best approach.
participants (6): a.ludas@gmail.com, Andrew Lau, Gianluca Cecchi, gregoire.leroy@retenodus.net, Grégoire Leroy, Vijay Bellur