
Hi,

I've set up 4 oVirt nodes with Gluster storage to provide highly available virtual machines. The Gluster volumes are Distributed-Replicate with a replica count of 2.

The extra volume options are configured:

cat /var/lib/glusterd/groups/virt
quick-read=off
read-ahead=off
io-cache=off
stat-prefetch=off
eager-lock=enable
remote-dio=enable
quorum-type=auto
server-quorum-type=server

Volume for the self-hosted engine:

gluster volume info engine

Volume Name: engine
Type: Distributed-Replicate
Volume ID: 9e7a3265-1e91-46e1-a0ba-09c5cc1fc1c1
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gluster004:/gluster/engine/004
Brick2: gluster005:/gluster/engine/005
Brick3: gluster006:/gluster/engine/006
Brick4: gluster007:/gluster/engine/007
Options Reconfigured:
cluster.quorum-type: auto
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
network.ping-timeout: 10

Volume for the virtual machines:

gluster volume info data

Volume Name: data
Type: Distributed-Replicate
Volume ID: 896db323-7ac4-4023-82a6-a8815a4d06b4
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gluster004:/gluster/data/004
Brick2: gluster005:/gluster/data/005
Brick3: gluster006:/gluster/data/006
Brick4: gluster007:/gluster/data/007
Options Reconfigured:
cluster.quorum-type: auto
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 36
storage.owner-gid: 36
cluster.server-quorum-type: server
network.ping-timeout: 10

Everything seems to be working fine. However, when I stop the storage network on gluster004 or gluster006, client-quorum is lost. Client-quorum isn't lost when the storage network is stopped on gluster005 or gluster007.

[2015-02-16 07:05:58.541531] W [MSGID: 108001] [afr-common.c:3635:afr_notify] 0-data-replicate-1: Client-quorum is not met
[2015-02-16 07:05:58.541579] W [MSGID: 108001] [afr-common.c:3635:afr_notify] 0-engine-replicate-1: Client-quorum is not met

As a result, the volumes become read-only and the VMs are paused.

I've added a "dummy" Gluster node for quorum purposes (no bricks, only running glusterd), but that didn't help.

gluster peer status
Number of Peers: 4

Hostname: gluster005
Uuid: 6c5253b4-b1c6-4d0a-9e6b-1f3efc1e8086
State: Peer in Cluster (Connected)

Hostname: gluster006
Uuid: 4b3d15c4-2de0-4d2e-aa4c-3981e47dadbd
State: Peer in Cluster (Connected)

Hostname: gluster007
Uuid: 165e9ada-addb-496e-abf7-4a4efda4d5d3
State: Peer in Cluster (Connected)

Hostname: glusterdummy
Uuid: 3ef8177b-2394-429b-a58e-ecf0f6ce79a0
State: Peer in Cluster (Connected)

The 4 nodes are running CentOS 7, with the following oVirt / Gluster packages:

glusterfs-3.6.2-1.el7.x86_64
glusterfs-api-3.6.2-1.el7.x86_64
glusterfs-cli-3.6.2-1.el7.x86_64
glusterfs-fuse-3.6.2-1.el7.x86_64
glusterfs-libs-3.6.2-1.el7.x86_64
glusterfs-rdma-3.6.2-1.el7.x86_64
glusterfs-server-3.6.2-1.el7.x86_64
ovirt-engine-sdk-python-3.5.1.0-1.el7.centos.noarch
ovirt-host-deploy-1.3.1-1.el7.noarch
ovirt-hosted-engine-ha-1.2.5-1.el7.centos.noarch
ovirt-hosted-engine-setup-1.2.2-1.el7.centos.noarch
vdsm-gluster-4.16.10-8.gitc937927.el7.noarch

The self-hosted engine is running CentOS 6 with ovirt-engine-3.5.1-1.el6.noarch.

Regards,

Wesley
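P.S. For completeness, the volume options shown above were applied via the stock virt option group plus a few explicit settings. Roughly like this, reconstructed from the "Options Reconfigured" output, so treat it as a sketch rather than a verbatim command history:

    # apply /var/lib/glusterd/groups/virt to both volumes
    gluster volume set engine group virt
    gluster volume set data group virt
    # vdsm/kvm ownership and a short ping timeout, per volume
    gluster volume set engine storage.owner-uid 36
    gluster volume set engine storage.owner-gid 36
    gluster volume set engine network.ping-timeout 10
    gluster volume set data storage.owner-uid 36
    gluster volume set data storage.owner-gid 36
    gluster volume set data network.ping-timeout 10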
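P.P.S. In case it matters, this is how I reproduce it (eth1 just stands in for the storage-network interface on the node, and the log path is the generic glusterfs client log location, not a specific file from my setup):

    # on gluster004 (or gluster006): take the storage network down
    ifdown eth1
    # on a hypervisor mounting the volumes: watch the FUSE client logs
    grep -i "Client-quorum" /var/log/glusterfs/*.log

Shortly after the interface goes down (within the 10-second network.ping-timeout configured above), the "Client-quorum is not met" warnings quoted earlier appear and the VMs pause.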