You would need six storage hosts in total (replica 3 across two
distribute subvolumes) to keep quorum when one of the hosts goes down.
With replica 2 there is no way to decide which copy is right once one
side is gone. With replica 3, two out of three are still online, so
majority rules.

I have a four-node cluster doing replica 4, no distribute. I can take
one host down. If two are down, quorum is not met and the volumes go
read-only. The same issue applies: only 50% is online.
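
To make the rule concrete, here is a rough Python sketch of how I understand cluster.quorum-type=auto to be documented for a single replica set: more than half of the bricks must be up, or exactly half provided the first brick is among them. The helper names (client_quorum_met, sets_without_quorum) are made up for illustration and are not part of Gluster; the replica sets are taken from the brick order in the volume info quoted below. It only models client quorum, not server-side quorum.

```python
# Illustrative model only; these helpers are hypothetical, not Gluster APIs.

def client_quorum_met(bricks_up):
    """bricks_up: list of booleans, one per brick of ONE replica set,
    in the order the bricks appear in the volume definition."""
    total = len(bricks_up)
    up = sum(1 for b in bricks_up if b)
    if 2 * up > total:
        # Strict majority of the replica set is reachable.
        return True
    if 2 * up == total:
        # Exactly half: assumed tie-break is that the first brick must be up.
        return bool(bricks_up[0])
    return False


# Replica sets from the quoted "gluster volume info" output (2 x 2 = 4).
REPLICA_SETS = [
    ["gluster004", "gluster005"],
    ["gluster006", "gluster007"],
]


def sets_without_quorum(down_hosts, replica_sets=REPLICA_SETS):
    """Return the replica sets that lose client quorum when down_hosts are offline."""
    lost = []
    for hosts in replica_sets:
        state = [h not in down_hosts for h in hosts]
        if not client_quorum_met(state):
            lost.append(hosts)
    return lost


if __name__ == "__main__":
    for host in ["gluster004", "gluster005", "gluster006", "gluster007"]:
        result = sets_without_quorum({host})
        print(host, "down ->", result if result else "quorum kept")
```

Under this model, taking down gluster004 or gluster006 (the first brick of each pair) loses quorum, while taking down gluster005 or gluster007 does not, which matches what Wesley saw. As far as I know, client quorum is counted per replica set over its bricks, so a peer without bricks does not change the result.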
On 2/16/2015 5:20 AM, Wesley Schaft wrote:
Hi,
I've set up 4 oVirt nodes with Gluster storage to provide highly available virtual
machines.
The Gluster volumes are Distributed-Replicate with a replica count of 2.
The following extra volume options are configured:
cat /var/lib/glusterd/groups/virt
quick-read=off
read-ahead=off
io-cache=off
stat-prefetch=off
eager-lock=enable
remote-dio=enable
quorum-type=auto
server-quorum-type=server
Volume for the self-hosted engine:
gluster volume info engine
Volume Name: engine
Type: Distributed-Replicate
Volume ID: 9e7a3265-1e91-46e1-a0ba-09c5cc1fc1c1
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gluster004:/gluster/engine/004
Brick2: gluster005:/gluster/engine/005
Brick3: gluster006:/gluster/engine/006
Brick4: gluster007:/gluster/engine/007
Options Reconfigured:
cluster.quorum-type: auto
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
network.ping-timeout: 10
Volume for the virtual machines:
gluster volume info data
Volume Name: data
Type: Distributed-Replicate
Volume ID: 896db323-7ac4-4023-82a6-a8815a4d06b4
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gluster004:/gluster/data/004
Brick2: gluster005:/gluster/data/005
Brick3: gluster006:/gluster/data/006
Brick4: gluster007:/gluster/data/007
Options Reconfigured:
cluster.quorum-type: auto
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 36
storage.owner-gid: 36
cluster.server-quorum-type: server
network.ping-timeout: 10
Everything seems to be working fine.
However, when I stop the storage network on gluster004 or gluster006, client-quorum is
lost.
Client-quorum isn't lost when the storage network is stopped on gluster005 or
gluster007.
[2015-02-16 07:05:58.541531] W [MSGID: 108001] [afr-common.c:3635:afr_notify]
0-data-replicate-1: Client-quorum is not met
[2015-02-16 07:05:58.541579] W [MSGID: 108001] [afr-common.c:3635:afr_notify]
0-engine-replicate-1: Client-quorum is not met
As a result, the volumes are read-only and the VMs are paused.
I've added a "dummy" gluster node for quorum use (no bricks, only running
glusterd), but that didn't help.
gluster peer status
Number of Peers: 4
Hostname: gluster005
Uuid: 6c5253b4-b1c6-4d0a-9e6b-1f3efc1e8086
State: Peer in Cluster (Connected)
Hostname: gluster006
Uuid: 4b3d15c4-2de0-4d2e-aa4c-3981e47dadbd
State: Peer in Cluster (Connected)
Hostname: gluster007
Uuid: 165e9ada-addb-496e-abf7-4a4efda4d5d3
State: Peer in Cluster (Connected)
Hostname: glusterdummy
Uuid: 3ef8177b-2394-429b-a58e-ecf0f6ce79a0
State: Peer in Cluster (Connected)
The 4 nodes are running CentOS 7, with the following oVirt / Gluster packages:
glusterfs-3.6.2-1.el7.x86_64
glusterfs-api-3.6.2-1.el7.x86_64
glusterfs-cli-3.6.2-1.el7.x86_64
glusterfs-fuse-3.6.2-1.el7.x86_64
glusterfs-libs-3.6.2-1.el7.x86_64
glusterfs-rdma-3.6.2-1.el7.x86_64
glusterfs-server-3.6.2-1.el7.x86_64
ovirt-engine-sdk-python-3.5.1.0-1.el7.centos.noarch
ovirt-host-deploy-1.3.1-1.el7.noarch
ovirt-hosted-engine-ha-1.2.5-1.el7.centos.noarch
ovirt-hosted-engine-setup-1.2.2-1.el7.centos.noarch
vdsm-gluster-4.16.10-8.gitc937927.el7.noarch
The self-hosted engine is running CentOS 6 with ovirt-engine-3.5.1-1.el6.noarch
Regards,
Wesley