Setting up GeoReplication

13 May 2017

      Hi All:

I've been trying to set up georeplication for a while now, but can't seem
to make it work.  I've found documentation on the web (mostly
https://gluster.readthedocs.io/en/refactor/Administrator%20Guide/Geo%20Repli...),
and I found
http://blog.gluster.org/2015/09/introducing-georepsetup-gluster-geo-replicat...

Unfortunately, it seems that some critical steps are missing from both, and
I can't figure out for sure what they are.

My environment:

Production: replica 2 + arbitrator running on my 3-node oVirt cluster, 3
volumes (engine, data, iso).

New geo-replication: Raspberry Pi3 with USB hard drive shoved in some other
data closet off-site.

I've installed rasbian-lite, and after much fighting, got
glusterfs-*-3.8.11 installed.  I've created my mountpoint (USB hard drive,
much larger than my gluster volumes), and then ran the command.  I get this
far:

[    OK] georep.nwfiber.com is Reachable(Port 22)
[    OK] SSH Connection established root@georep.nwfiber.com
[    OK] Master Volume and Slave Volume are compatible (Version: 3.8.11)
[NOT OK] Unable to Mount Gluster Volume georep.nwfiber.com:engine-rep

Trying it with the steps in the gluster docs also has the same problem.  No
long files are generated on the slave.  Log files on the master include:

[root@ovirt1 geo-replication]# more georepsetup.mount.log
[2017-05-13 17:26:27.318599] I [MSGID: 100030] [glusterfsd.c:2454:main]
0-glusterfs: Started running glusterfs version 3.8.11 (args:
 glusterfs --xlator-option="*dht.lookup-unhashed=off" --volfile-server
localhost --volfile-id engine -l /var/log/glusterfs/geo-repli
cation/georepsetup.mount.log --client-pid=-1 /tmp/georepsetup_wZtfkN)
[2017-05-13 17:26:27.341170] I [MSGID: 101190]
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2017-05-13 17:26:27.341260] E [socket.c:2309:socket_connect_finish]
0-glusterfs: connection to ::1:24007 failed (Connection refused
)
[2017-05-13 17:26:27.341846] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify]
0-glusterfsd-mgmt: failed to connect with remote-host: local
host (Transport endpoint is not connected)
[2017-05-13 17:26:31.335849] I [MSGID: 101190]
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 2
[2017-05-13 17:26:31.337545] I [MSGID: 114020] [client.c:2356:notify]
0-engine-client-0: parent translators are ready, attempting co
nnect on transport
[2017-05-13 17:26:31.344485] I [MSGID: 114020] [client.c:2356:notify]
0-engine-client-1: parent translators are ready, attempting co
nnect on transport
[2017-05-13 17:26:31.345146] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-engine-client-0: changing port to 49157 (from 0)
[2017-05-13 17:26:31.350868] I [MSGID: 114020] [client.c:2356:notify]
0-engine-client-2: parent translators are ready, attempting co
nnect on transport
[2017-05-13 17:26:31.355946] I [MSGID: 114057]
[client-handshake.c:1440:select_server_supported_programs]
0-engine-client-0: Using P
rogram GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-13 17:26:31.356280] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-engine-client-1: changing port to 49157 (from 0)
Final graph:
+------------------------------------------------------------------------------+
  1: volume engine-client-0
  2:     type protocol/client
  3:     option clnt-lk-version 1
  4:     option volfile-checksum 0
  5:     option volfile-key engine
  6:     option client-version 3.8.11
  7:     option process-uuid
ovirt1.nwfiber.com-25660-2017/05/13-17:26:27:311929-engine-client-0-0-0
  8:     option fops-version 1298437
  9:     option ping-timeout 30
 10:     option remote-host ovirt1.nwfiber.com
 11:     option remote-subvolume /gluster/brick1/engine
 12:     option transport-type socket
 13:     option username 028984cf-0399-42e6-b04b-bb9b1685c536
 14:     option password eae737cc-9659-405f-865e-9a7ef97a3307
 15:     option filter-O_DIRECT off
 16:     option send-gids true
 17: end-volume
 18:
 19: volume engine-client-1
 20:     type protocol/client
 21:     option ping-timeout 30
 22:     option remote-host ovirt2.nwfiber.com
 23:     option remote-subvolume /gluster/brick1/engine
 24:     option transport-type socket
 25:     option username 028984cf-0399-42e6-b04b-bb9b1685c536
 26:     option password eae737cc-9659-405f-865e-9a7ef97a3307
 27:     option filter-O_DIRECT off
 28:     option send-gids true
 29: end-volume
 30:
 31: volume engine-client-2
 32:     type protocol/client
 33:     option ping-timeout 30
 34:     option remote-host ovirt3.nwfiber.com
 35:     option remote-subvolume /gluster/brick1/engine
 36:     option transport-type socket
 37:     option username 028984cf-0399-42e6-b04b-bb9b1685c536
 38:     option password eae737cc-9659-405f-865e-9a7ef97a3307
 39:     option filter-O_DIRECT off
 40:     option send-gids true
 41: end-volume
 42:
 43: volume engine-replicate-0
 44:     type cluster/replicate
 45:     option arbiter-count 1
 46:     option data-self-heal-algorithm full
 47:     option eager-lock enable
 48:     option quorum-type auto
 49:     option shd-max-threads 6
 50:     option shd-wait-qlength 10000
 51:     option locking-scheme granular
 52:     subvolumes engine-client-0 engine-client-1 engine-client-2
 53: end-volume
 54:
 55: volume engine-dht
 56:     type cluster/distribute
 57:     option lock-migration off
 58:     subvolumes engine-replicate-0
 59: end-volume
 60:
 61: volume engine-shard
 62:     type features/shard
 63:     option shard-block-size 512MB
 64:     subvolumes engine-dht
 65: end-volume
 66:
 67: volume engine-write-behind
 68:     type performance/write-behind
 69:     option strict-O_DIRECT on
 70:     subvolumes engine-shard
 71: end-volume
 72:
 73: volume engine-readdir-ahead
 74:     type performance/readdir-ahead
 75:     subvolumes engine-write-behind
 76: end-volume
 77:
 78: volume engine-open-behind
 79:     type performance/open-behind
 80:     subvolumes engine-readdir-ahead
 81: end-volume
 82:
 83: volume engine
 84:     type debug/io-stats
 85:     option log-level INFO
 86:     option latency-measurement off
 87:     option count-fop-hits off
 88:     subvolumes engine-open-behind
 89: end-volume
 90:
 91: volume meta-autoload
 92:     type meta
 93:     subvolumes engine
 94: end-volume
 95:
+------------------------------------------------------------------------------+
[2017-05-13 17:26:31.360579] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-engine-client-0: Connected
to engine
-client-0, attached to remote volume '/gluster/brick1/engine'.
[2017-05-13 17:26:31.360599] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-engine-client-0: Server
and Client l
k-version numbers are not same, reopening the fds
[2017-05-13 17:26:31.360707] I [MSGID: 108005]
[afr-common.c:4387:afr_notify] 0-engine-replicate-0: Subvolume
'engine-client-0' came
 back up; going online.
[2017-05-13 17:26:31.360793] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-engine-client-0:
Server lk versi
on = 1
[2017-05-13 17:26:31.361284] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-engine-client-2: changing port to 49158 (from 0)
[2017-05-13 17:26:31.365070] I [MSGID: 114057]
[client-handshake.c:1440:select_server_supported_programs]
0-engine-client-1: Using P
rogram GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-13 17:26:31.365788] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-engine-client-1: Connected
to engine
-client-1, attached to remote volume '/gluster/brick1/engine'.
[2017-05-13 17:26:31.365821] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-engine-client-1: Server
and Client l
k-version numbers are not same, reopening the fds
[2017-05-13 17:26:31.366059] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-engine-client-1:
Server lk versi
on = 1
[2017-05-13 17:26:31.369948] I [MSGID: 114057]
[client-handshake.c:1440:select_server_supported_programs]
0-engine-client-2: Using P
rogram GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-13 17:26:31.370657] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-engine-client-2: Connected
to engine
-client-2, attached to remote volume '/gluster/brick1/engine'.
[2017-05-13 17:26:31.370683] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-engine-client-2: Server
and Client l
k-version numbers are not same, reopening the fds
[2017-05-13 17:26:31.383548] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-engine-client-2:
Server lk versi
on = 1
[2017-05-13 17:26:31.383649] I [fuse-bridge.c:4147:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 k
ernel 7.22
[2017-05-13 17:26:31.383676] I [fuse-bridge.c:4832:fuse_graph_sync] 0-fuse:
switched to graph 0
[2017-05-13 17:26:31.385453] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-engine-replicate-0: selecting
local rea
d_child engine-client-0
[2017-05-13 17:26:31.396741] I [fuse-bridge.c:5080:fuse_thread_proc]
0-fuse: unmounting /tmp/georepsetup_wZtfkN
[2017-05-13 17:26:31.397086] W [glusterfsd.c:1327:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7dc5) [0x7f8838df6dc5] -->glusterf
s(glusterfs_sigwaiter+0xe5) [0x7f883a488cd5]
-->glusterfs(cleanup_and_exit+0x6b) [0x7f883a488b4b] ) 0-: received signum
(15), shutti
ng down
[2017-05-13 17:26:31.397112] I [fuse-bridge.c:5788:fini] 0-fuse: Unmounting
'/tmp/georepsetup_wZtfkN'.
[2017-05-13 17:26:31.413901] I [MSGID: 100030] [glusterfsd.c:2454:main]
0-glusterfs: Started running glusterfs version 3.8.11 (args:
 glusterfs --xlator-option="*dht.lookup-unhashed=off" --volfile-server
georep.nwfiber.com --volfile-id engine -l /var/log/glusterfs/
geo-replication/georepsetup.mount.log --client-pid=-1
/tmp/georepsetup_M5poIr)
[2017-05-13 17:26:31.458733] I [MSGID: 101190]
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2017-05-13 17:26:31.458833] E [socket.c:2309:socket_connect_finish]
0-glusterfs: connection to 192.168.8.126:24007 failed (Connecti
on refused)
[2017-05-13 17:26:31.458886] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify]
0-glusterfsd-mgmt: failed to connect with remote-host: geore
p.nwfiber.com (Transport endpoint is not connected)
[2017-05-13 17:26:31.458900] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify]
0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-05-13 17:26:31.459173] W [glusterfsd.c:1327:cleanup_and_exit]
(-->/lib64/libgfrpc.so.0(rpc_clnt_notify+0xdb) [0x7f18d6c89aab]
-->glusterfs(+0x10309) [0x7f18d73b9309] -->glusterfs(cleanup_and_exit+0x6b)
[0x7f18d73b2b4b] ) 0-: received signum (1), shutting dow
n
[2017-05-13 17:26:31.459218] I [fuse-bridge.c:5788:fini] 0-fuse: Unmounting
'/tmp/georepsetup_M5poIr'.
[2017-05-13 17:26:31.459887] W [glusterfsd.c:1327:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7dc5) [0x7f18d5d20dc5] -->glusterf
s(glusterfs_sigwaiter+0xe5) [0x7f18d73b2cd5]
-->glusterfs(cleanup_and_exit+0x6b) [0x7f18d73b2b4b] ) 0-: received signum
(15), shutti
ng down

I don't know what to make of that.

On a whim, I thought that perhaps the georep setup does not set up the
remote volume (I assumed it would, I thought that was what the ssh was
required for, and none of the instructions mentioned create your
destination (replication) volume.  So I tried to create it, but it won't
let me create a volume with replica 1.  this is already a backup, I don't
need a backup of a backup.  This further supported my thought that the
volume needs to be created by the georep setup commands.

Where am I wrong / what do I need to do to fix this?

--Jim

Jim Kusznir

Sahina Bose

Jim Kusznir

Sahina Bose

tags

participants (2)