> All these IPs are pingable and the hosts are resolvable across all 3
> nodes, but only the 10.10.10.0 network is the dedicated network for
> gluster (resolved using the gdnode* host names) ... Do you think that
> removing the other entries can fix the problem? Sorry, but how can I
> remove the other entries?
>
I don't think having the extra entries should be a problem. Did you check
the fuse mount logs for the disconnect messages that I referred to in the
other email?
tail -f /var/log/glusterfs/rhev-data-center-mnt-glusterSD-dvirtgluster\:engine.log
NODE01:
[2017-07-24 07:34:00.799347] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 07:44:46.687334] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-07-24 09:04:25.951350] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 09:15:11.839357] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-07-24 10:34:51.231353] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 10:45:36.991321] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-07-24 12:05:16.383323] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 12:16:02.271320] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-07-24 13:35:41.535308] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 13:46:27.423304] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
Why gdnode03 again? It was removed from gluster! It was the arbiter node...
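A quick way to double-check that gdnode03 is really gone from both the pool
and the volume (the volume name "engine" is inferred from the log file name
above; adjust if yours differs):

    gluster peer status
    gluster volume info engine

If I read those "Exhausted all volfile servers" messages right, the fuse
client keeps retrying the volfile servers it was given at mount time, so a
gdnode03 left over in the backup-volfile-servers mount option could also
explain this until the volume is remounted.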
NODE02:
[2017-07-24 14:08:18.709209] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de. sources=0 [1] sinks=2
[2017-07-24 14:08:38.746688] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-24 14:08:38.749379] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=0 [1] sinks=2
[2017-07-24 14:08:46.068001] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de. sources=0 [1] sinks=2
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81" repeated 3 times between [2017-07-24 14:08:38.746688] and [2017-07-24 14:10:09.088625]
The message "I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=0 [1] sinks=2" repeated 3 times between [2017-07-24 14:08:38.749379] and [2017-07-24 14:10:09.091377]
[2017-07-24 14:10:19.384379] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de. sources=0 [1] sinks=2
[2017-07-24 14:10:39.433155] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-24 14:10:39.435847] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=0 [1] sinks=2
NODE04:
[2017-07-24 14:08:56.789598] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2
[2017-07-24 14:09:17.231987] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de. sources=[0] 1 sinks=2
[2017-07-24 14:09:38.039541] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2
[2017-07-24 14:09:48.875602] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de. sources=[0] 1 sinks=2
[2017-07-24 14:10:39.832068] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2
The message "I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2" repeated 3 times between [2017-07-24 14:10:39.832068] and [2017-07-24 14:12:22.686142]
The last message was (I think) because I re-executed a "heal" command.
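For reference, a heal like that would normally be triggered with the
standard gluster CLI; the volume name "engine" is the one from the logs:

    gluster volume heal engine          # heal only files that need it
    gluster volume heal engine full     # force a full heal
    gluster volume heal engine info     # list entries still pending heal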
N.B. dvirtgluster is the round-robin DNS name for all the gluster nodes.
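For what it's worth, the round-robin resolution can be checked quickly with
dig; the addresses shown below are placeholders on the dedicated
10.10.10.0 network, not my real ones:

    $ dig +short dvirtgluster
    10.10.10.1
    10.10.10.2
    10.10.10.3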
> And what about SELinux?
>
Not sure about this. See if there are disconnect messages in the mount
logs first.
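Something like this should surface them (the log path is the one tailed
above; the patterns are just what those messages typically contain):

    grep -E 'disconnect|Transport endpoint' \
        /var/log/glusterfs/rhev-data-center-mnt-glusterSD-dvirtgluster\:engine.log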
-Ravi
> Thank you
>
There are no SELinux-related messages...
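In case it helps, this is the kind of check that would show denials if
there were any (ausearch comes from the audit package; "gluster" is just a
filter term):

    getenforce
    ausearch -m avc -ts today | grep -i gluster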
Thank you!