
With replica 2 there is no way to decide who's right when one copy drops out, so you would need six storage hosts in total (2 x replica 3) to maintain quorum if even one host goes down: with 2 out of 3 online, majority rules. I have a four-node cluster doing replica 4, no distribute, and I can take one host down. If two are down, quorum is not met and the volumes go read-only; the same issue applies, since only 50% is online.
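As a rough illustration of the client-quorum arithmetic per replica set (quorum-type=auto; the counts below are just examples, not your exact layout):

  # replica 2: more than half of 2 bricks means both, with one exception
  2 of 2 bricks up -> quorum met
  1 of 2 bricks up -> quorum met only if the surviving brick is the first one of the pair
  # replica 3: a plain majority is enough
  2 of 3 bricks up -> quorum met
  1 of 3 bricks up -> quorum lost, volume goes read-only
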
On 2/16/2015 5:20 AM, Wesley Schaft wrote:

Hi,
I've set up 4 oVirt nodes with Gluster storage to provide highly available virtual machines. The Gluster volumes are Distributed-Replicate with a replica count of 2.
The extra volume options are configured:
cat /var/lib/glusterd/groups/virt
quick-read=off
read-ahead=off
io-cache=off
stat-prefetch=off
eager-lock=enable
remote-dio=enable
quorum-type=auto
server-quorum-type=server
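In case it matters for reproducing this, the usual way to apply that option group to a volume is something like the following (the volume names are just the ones from this setup):

  gluster volume set engine group virt
  gluster volume set data group virt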
Volume for the self-hosted engine:

gluster volume info engine
Volume Name: engine
Type: Distributed-Replicate
Volume ID: 9e7a3265-1e91-46e1-a0ba-09c5cc1fc1c1
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gluster004:/gluster/engine/004
Brick2: gluster005:/gluster/engine/005
Brick3: gluster006:/gluster/engine/006
Brick4: gluster007:/gluster/engine/007
Options Reconfigured:
cluster.quorum-type: auto
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
network.ping-timeout: 10
Volume for the virtual machines:

gluster volume info data
Volume Name: data
Type: Distributed-Replicate
Volume ID: 896db323-7ac4-4023-82a6-a8815a4d06b4
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gluster004:/gluster/data/004
Brick2: gluster005:/gluster/data/005
Brick3: gluster006:/gluster/data/006
Brick4: gluster007:/gluster/data/007
Options Reconfigured:
cluster.quorum-type: auto
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 36
storage.owner-gid: 36
cluster.server-quorum-type: server
network.ping-timeout: 10
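Going by the brick order in both volumes, the replica pairs should work out as below (inferred from the 2 x 2 layout, so worth double-checking):

  # replicate-0: gluster004 + gluster005
  # replicate-1: gluster006 + gluster007
  # with quorum-type=auto and an even replica count, the first brick of each
  # pair (gluster004, gluster006) acts as the tie-breaker when only one copy is up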
Everything seems to be working fine. However, when I stop the storage network on gluster004 or gluster006, client-quorum is lost. Client-quorum isn't lost when the storage network is stopped on gluster005 or gluster007.
[2015-02-16 07:05:58.541531] W [MSGID: 108001] [afr-common.c:3635:afr_notify] 0-data-replicate-1: Client-quorum is not met
[2015-02-16 07:05:58.541579] W [MSGID: 108001] [afr-common.c:3635:afr_notify] 0-engine-replicate-1: Client-quorum is not met
And as a result, the volumes are read-only and the VMs are paused.
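My reading of those messages, which should line up with the brick order above (please verify against your own layout):

  # <volume>-replicate-1 is the second replica pair, gluster006 + gluster007,
  # so the warnings above correspond to taking the storage network down on
  # gluster006, the first brick of that pair; with quorum-type=auto the
  # remaining brick (gluster007) alone cannot keep client-quorum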
I've added a "dummy" gluster node for quorum use (no bricks, only running glusterd), but that didn't help.
gluster peer status
Number of Peers: 4
Hostname: gluster005
Uuid: 6c5253b4-b1c6-4d0a-9e6b-1f3efc1e8086
State: Peer in Cluster (Connected)

Hostname: gluster006
Uuid: 4b3d15c4-2de0-4d2e-aa4c-3981e47dadbd
State: Peer in Cluster (Connected)

Hostname: gluster007
Uuid: 165e9ada-addb-496e-abf7-4a4efda4d5d3
State: Peer in Cluster (Connected)

Hostname: glusterdummy
Uuid: 3ef8177b-2394-429b-a58e-ecf0f6ce79a0
State: Peer in Cluster (Connected)
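One note on the dummy peer, as far as I understand the two quorum mechanisms: server-quorum is evaluated by glusterd over the peers in the trusted pool, while client-quorum is evaluated by AFR per replica set over its bricks, so a brick-less peer cannot break the tie inside a 2-brick pair. A minimal sketch of the two knobs involved (the 51% value is only an example, not necessarily your setting):

  # server-quorum: counts connected peers in the pool; when the ratio drops
  # below the threshold, glusterd stops the local bricks
  gluster volume set all cluster.server-quorum-ratio 51%
  # client-quorum: per replica set; 'auto' means majority, with the first
  # brick acting as the tie-breaker for an even replica count
  gluster volume set data cluster.quorum-type auto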
The 4 nodes are running CentOS 7, with the following oVirt / Gluster packages:
glusterfs-3.6.2-1.el7.x86_64
glusterfs-api-3.6.2-1.el7.x86_64
glusterfs-cli-3.6.2-1.el7.x86_64
glusterfs-fuse-3.6.2-1.el7.x86_64
glusterfs-libs-3.6.2-1.el7.x86_64
glusterfs-rdma-3.6.2-1.el7.x86_64
glusterfs-server-3.6.2-1.el7.x86_64
ovirt-engine-sdk-python-3.5.1.0-1.el7.centos.noarch
ovirt-host-deploy-1.3.1-1.el7.noarch
ovirt-hosted-engine-ha-1.2.5-1.el7.centos.noarch
ovirt-hosted-engine-setup-1.2.2-1.el7.centos.noarch
vdsm-gluster-4.16.10-8.gitc937927.el7.noarch
The self-hosted engine is running CentOS 6 with ovirt-engine-3.5.1-1.el6.noarch
Regards, Wesley