Ovirt/Gluster replica 3 distributed-replicated problem

29 Sep 2016

Hello,

maybe this is more GlusterFS than oVirt related, but since oVirt integrates
Gluster management and I'm experiencing the problem in an oVirt cluster,
I'm writing here.

The problem is simple: I have a data domain mapped on a replica 3 arbiter 1
Gluster volume with 6 bricks, like this:

Status of volume: data_ssd
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick vm01.storage.billy:/gluster/ssd/data/
brick                                       49153     0          Y       19298
Brick vm02.storage.billy:/gluster/ssd/data/
brick                                       49153     0          Y       6146
Brick vm03.storage.billy:/gluster/ssd/data/
arbiter_brick                               49153     0          Y       6552
Brick vm03.storage.billy:/gluster/ssd/data/
brick                                       49154     0          Y       6559
Brick vm04.storage.billy:/gluster/ssd/data/
brick                                       49152     0          Y       6077
Brick vm02.storage.billy:/gluster/ssd/data/
arbiter_brick                               49154     0          Y       6153
Self-heal Daemon on localhost               N/A       N/A        Y       30746
Self-heal Daemon on vm01.storage.billy      N/A       N/A        Y       196058
Self-heal Daemon on vm03.storage.billy      N/A       N/A        Y       23205
Self-heal Daemon on vm04.storage.billy      N/A       N/A        Y       8246

I then put the vm04 host into maintenance from oVirt, ticking the "Stop
gluster" checkbox, and oVirt didn't complain about anything. But when I
tried to run a new VM, it complained about a "storage I/O problem", while
the data storage domain status stayed UP the whole time.

Looking at the Gluster logs, I can see this:

[2016-09-29 11:01:01.556908] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-09-29 11:02:28.124151] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing READ on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
[2016-09-29 11:02:28.126580] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
[2016-09-29 11:02:28.127374] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing FGETXATTR on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
[2016-09-29 11:02:28.128130] W [MSGID: 108027] [afr-common.c:2403:afr_discover_done] 0-data_ssd-replicate-1: no read subvols for (null)
[2016-09-29 11:02:28.129890] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 8201: READ => -1 gfid=bf5922b7-19f3-4ce3-98df-71e981ecca8d fd=0x7f09b749d210 (Input/output error)
[2016-09-29 11:02:28.130824] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing FSTAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
[2016-09-29 11:02:28.133879] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 8202: FSTAT() /ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527 => -1 (Input/output error)
The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)" repeated 11 times between [2016-09-29 11:02:28.126580] and [2016-09-29 11:02:28.517744]
[2016-09-29 11:02:28.518607] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing STAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
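
To see how many distinct files are affected, the error lines can be reduced
to one row per gfid and failing operation. This is only a minimal sketch: it
assumes the log is fed on standard input with one entry per line (as
glusterfs writes it on disk), and the function name split_brain_gfids is
made up for illustration.

```shell
# Reduce "split-brain observed" errors to unique "<gfid> <operation>" pairs.
# Reads a glusterfs client log on stdin; prints one pair per line, sorted.
split_brain_gfids() {
    grep -F 'split-brain observed' |
        sed -nE 's/.*Failing ([A-Z]+) on gfid ([0-9a-f-]{36}).*/\2 \1/p' |
        LC_ALL=C sort -u
}
```

It would be fed the FUSE mount log of the storage domain, for example
`split_brain_gfids < mount.log`, where mount.log stands in for the actual
log file path on the host. On the excerpt above, every error refers to the
same gfid (READ, FGETXATTR, FSTAT and STAT all fail on it), so a single
file is involved.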

Now, how is it possible to have a split-brain if I stopped just ONE server,
which hosted just ONE of the six bricks, and it was cleanly shut down via
maintenance mode from oVirt?
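
One way to check whether this is a real split-brain would be to look at the
file's AFR extended attributes directly on each brick of the affected
replica set. On a brick, a file can be reached by its gfid under the
.glusterfs directory. The helper below only builds that path; its name,
gfid_backend_path, is hypothetical and not part of Gluster.

```shell
# Build the on-brick path of a file from its gfid:
#   <brick>/.glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>
# Usage: gfid_backend_path BRICK_DIR GFID
gfid_backend_path() {
    brick=${1%/}                              # drop a trailing slash, if any
    gfid=$2
    p1=$(printf '%s' "$gfid" | cut -c1-2)     # first directory level
    p2=$(printf '%s' "$gfid" | cut -c3-4)     # second directory level
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$p1" "$p2" "$gfid"
}

# Possible use, as root on each brick host (not executed here):
#   getfattr -d -m . -e hex "$(gfid_backend_path /gluster/ssd/data/brick \
#       bf5922b7-19f3-4ce3-98df-71e981ecca8d)"
#   gluster volume heal data_ssd info split-brain
```

Comparing the trusted.afr.* values reported by getfattr across the bricks of
the set should show which copies blame each other.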

I originally created the volume this way:
# gluster volume create data_ssd replica 3 arbiter 1 \
    vm01.storage.billy:/gluster/ssd/data/brick \
    vm02.storage.billy:/gluster/ssd/data/brick \
    vm03.storage.billy:/gluster/ssd/data/arbiter_brick \
    vm03.storage.billy:/gluster/ssd/data/brick \
    vm04.storage.billy:/gluster/ssd/data/brick \
    vm02.storage.billy:/gluster/ssd/data/arbiter_brick
# gluster volume set data_ssd group virt
# gluster volume set data_ssd storage.owner-uid 36 && \
    gluster volume set data_ssd storage.owner-gid 36
# gluster volume start data_ssd
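
For reference, Gluster groups bricks into replica sets in the order they are
given to "volume create", so the layout can be reproduced from the command
line alone. The sketch below is illustrative: replica_sets is a made-up
helper name, and it assumes that with "arbiter 1" the last brick of each set
is the arbiter.

```shell
# Print "replicate-N <role> <brick>" for bricks given in creation order.
# Usage: replica_sets REPLICA_COUNT BRICK...
replica_sets() {
    count=$1
    shift
    i=0
    for brick in "$@"; do
        set_idx=$((i / count))        # which replica set this brick joins
        pos=$((i % count))            # position inside that set
        # Assumption: the last brick of each set acts as the arbiter.
        if [ "$pos" -eq $((count - 1)) ]; then
            role=arbiter
        else
            role=data
        fi
        printf 'replicate-%d %s %s\n' "$set_idx" "$role" "$brick"
        i=$((i + 1))
    done
}
```

Called with 3 and the six bricks in the order above, this places the vm04
brick in replicate-1 together with the vm03 data brick and the vm02 arbiter
brick, which is the subvolume named in the log errors
(0-data_ssd-replicate-1).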

-- 
Davide Ferrari
Senior Systems Engineer
