Hi,
I've set up 4 oVirt nodes with Gluster storage to provide highly available virtual
machines.
The Gluster volumes are Distributed-Replicate with a replica count of 2.
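(For reference, the data volume was created along these lines; the engine volume is
analogous. The brick order matches the volume info below, so gluster004+gluster005 and
gluster006+gluster007 form the two replica pairs:)
gluster volume create data replica 2 transport tcp \
  gluster004:/gluster/data/004 gluster005:/gluster/data/005 \
  gluster006:/gluster/data/006 gluster007:/gluster/data/007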
The extra volume options from the virt group are configured on both volumes:
cat /var/lib/glusterd/groups/virt
quick-read=off
read-ahead=off
io-cache=off
stat-prefetch=off
eager-lock=enable
remote-dio=enable
quorum-type=auto
server-quorum-type=server
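(These group options were applied per volume, presumably with something like:)
gluster volume set engine group virt
gluster volume set data group virt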
Volume for the self-hosted engine:
gluster volume info engine
Volume Name: engine
Type: Distributed-Replicate
Volume ID: 9e7a3265-1e91-46e1-a0ba-09c5cc1fc1c1
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gluster004:/gluster/engine/004
Brick2: gluster005:/gluster/engine/005
Brick3: gluster006:/gluster/engine/006
Brick4: gluster007:/gluster/engine/007
Options Reconfigured:
cluster.quorum-type: auto
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
network.ping-timeout: 10
Volume for the virtual machines:
gluster volume info data
Volume Name: data
Type: Distributed-Replicate
Volume ID: 896db323-7ac4-4023-82a6-a8815a4d06b4
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gluster004:/gluster/data/004
Brick2: gluster005:/gluster/data/005
Brick3: gluster006:/gluster/data/006
Brick4: gluster007:/gluster/data/007
Options Reconfigured:
cluster.quorum-type: auto
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 36
storage.owner-gid: 36
cluster.server-quorum-type: server
network.ping-timeout: 10
Everything seems to be working fine.
However, when I stop the storage network on gluster004 or gluster006, client-quorum is
lost. Client-quorum is not lost when the storage network is stopped on gluster005 or
gluster007. (gluster004 and gluster006 are the first brick of their respective replica
pairs.)
[2015-02-16 07:05:58.541531] W [MSGID: 108001] [afr-common.c:3635:afr_notify]
0-data-replicate-1: Client-quorum is not met
[2015-02-16 07:05:58.541579] W [MSGID: 108001] [afr-common.c:3635:afr_notify]
0-engine-replicate-1: Client-quorum is not met
As a result, the volumes become read-only and the VMs are paused.
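(For what it's worth, I'm aware the client quorum could be relaxed per volume with
something like the commands below, but with a 2-way replica that presumably just trades
the read-only behaviour for a split-brain risk, which is not what I want:)
# example only -- allow a single brick per replica pair to satisfy client quorum
gluster volume set data cluster.quorum-type fixed
gluster volume set data cluster.quorum-count 1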
I've added a "dummy" gluster node for quorum use (no bricks, only running
glusterd), but that didn't help.
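(The dummy node was added with a plain peer probe, i.e. something like:)
gluster peer probe glusterdummy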
gluster peer status
Number of Peers: 4
Hostname: gluster005
Uuid: 6c5253b4-b1c6-4d0a-9e6b-1f3efc1e8086
State: Peer in Cluster (Connected)
Hostname: gluster006
Uuid: 4b3d15c4-2de0-4d2e-aa4c-3981e47dadbd
State: Peer in Cluster (Connected)
Hostname: gluster007
Uuid: 165e9ada-addb-496e-abf7-4a4efda4d5d3
State: Peer in Cluster (Connected)
Hostname: glusterdummy
Uuid: 3ef8177b-2394-429b-a58e-ecf0f6ce79a0
State: Peer in Cluster (Connected)
The 4 nodes are running CentOS 7, with the following oVirt / Gluster packages:
glusterfs-3.6.2-1.el7.x86_64
glusterfs-api-3.6.2-1.el7.x86_64
glusterfs-cli-3.6.2-1.el7.x86_64
glusterfs-fuse-3.6.2-1.el7.x86_64
glusterfs-libs-3.6.2-1.el7.x86_64
glusterfs-rdma-3.6.2-1.el7.x86_64
glusterfs-server-3.6.2-1.el7.x86_64
ovirt-engine-sdk-python-3.5.1.0-1.el7.centos.noarch
ovirt-host-deploy-1.3.1-1.el7.noarch
ovirt-hosted-engine-ha-1.2.5-1.el7.centos.noarch
ovirt-hosted-engine-setup-1.2.2-1.el7.centos.noarch
vdsm-gluster-4.16.10-8.gitc937927.el7.noarch
The self-hosted engine is running CentOS 6 with ovirt-engine-3.5.1-1.el6.noarch.
Regards,
Wesley