
Hi,
all glusterd daemons were running correctly at the time, with no firewall/iptables restrictions.
But the bricks reported as "not connected" keep changing over time without any intervention.
It looks like glusterd has unstable cross-communication, especially with the different LAN ranges used by the nodes in the oVirt environment (volume bricks on the 16.0.0.0 net and oVirt nodes on the 172.0.0.0 net).
So I decided to reinstall the whole cluster, but I'm afraid these problems will occur again.
Regards and thanks for your answers,
Pavel
On 27.11.2015 10:16, knarra wrote:
On 11/27/2015 11:04 AM, knarra wrote:
Hi Paf1,
Looks like when you reboot the nodes, glusterd does not start up on one node, and because of this that node gets disconnected from the other nodes (that is what I see in the logs). After a reboot, once your systems are up and running, can you check whether glusterd is running on all the nodes? Can you please let me know which build of gluster you are using?
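For reference, a minimal way to check both things on each node might look like this (a sketch assuming systemd-based hosts with the glusterfs RPM packages; adjust to your distribution):
# systemctl status glusterd
# systemctl is-enabled glusterd
# gluster --version
# rpm -q glusterfs-server
If glusterd turns out not to be enabled, "systemctl enable glusterd" should make it come up again on the next boot.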
For more info please read, http://www.gluster.org/pipermail/gluster-users.old/2015-June/022377.html - (please ignore this line)
Thanks kasturi
On 11/27/2015 10:52 AM, Sahina Bose wrote:
[+ gluster-users]
On 11/26/2015 08:37 PM, paf1@email.cz wrote:
Hello, can anybody help me with these timeouts? The volumes are not active yet (bricks down).
Description of the gluster setup below ...
/var/log/glusterfs/etc-glusterfs-glusterd.vol.log
[2015-11-26 14:44:47.174221] I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <1hp1-SAN> (<87fc7db8-aba8-41f2-a1cd-b77e83b17436>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-11-26 14:44:47.174354] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P1 not held
[2015-11-26 14:44:47.174444] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P3 not held
[2015-11-26 14:44:47.174521] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P1 not held
[2015-11-26 14:44:47.174662] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P3 not held
[2015-11-26 14:44:47.174532] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P1
[2015-11-26 14:44:47.174675] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P3
[2015-11-26 14:44:49.423334] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
The message "I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 4 times between [2015-11-26 14:44:49.423334] and [2015-11-26 14:44:49.429781]
[2015-11-26 14:44:51.148711] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:52.177266] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Invalid argument
[2015-11-26 14:44:52.177291] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:53.180426] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 17, Invalid argument
[2015-11-26 14:44:53.180447] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:52.395468] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:54.851958] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-11-26 14:44:57.183969] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 19, Invalid argument
[2015-11-26 14:44:57.183990] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
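To pull just the relevant events out of that log, a quick sketch (using the log path shown above) is to filter for the disconnects, lock warnings and keep-alive failures:
# grep -E 'disconnected from glusterd|Lock for vol|Lock not released|TCP_USER_TIMEOUT' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log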
After the volumes were created everything worked fine (volumes up), but then, after several reboots (yum updates), the volumes failed due to timeouts.
Gluster description:
4 nodes with 4 volumes, replica 2
oVirt 3.6 - the latest
gluster 3.7.6 - the latest
vdsm 4.17.999 - from the git repo
oVirt - mgmt. nodes 172.16.0.0
oVirt - bricks 16.0.0.0 ("SAN" - defined as the "gluster" network)
Network works fine, no lost packets
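Given the two address ranges, it may also be worth confirming from every node that the *-SAN names resolve to the brick network and are reachable; a rough sketch using the hostnames from this setup:
# getent hosts 1hp1-SAN 1hp2-SAN 2hp1-SAN 2hp2-SAN
# ping -c 3 1hp1-SAN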
# gluster volume status
Staging failed on 2hp1-SAN. Please check log file for details.
Staging failed on 1hp2-SAN. Please check log file for details.
Staging failed on 2hp2-SAN. Please check log file for details.
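When "gluster volume status" fails with "Staging failed on <node>", a first check (sketch only) is whether all peers are still seen as connected, run on each node:
# gluster peer status
# gluster pool list
On a default install, the recorded peer entries can also be inspected under /var/lib/glusterd/peers/ to see which hostname (and therefore which network) each peer was probed with.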
# gluster volume info
Volume Name: 1HP12-P1
Type: Replicate
Volume ID: 6991e82c-9745-4203-9b0a-df202060f455
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 1hp1-SAN:/STORAGE/p1/G
Brick2: 1hp2-SAN:/STORAGE/p1/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 1HP12-P3
Type: Replicate
Volume ID: 8bbdf0cb-f9b9-4733-8388-90487aa70b30
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 1hp1-SAN:/STORAGE/p3/G
Brick2: 1hp2-SAN:/STORAGE/p3/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 2HP12-P1
Type: Replicate
Volume ID: e2cd5559-f789-4636-b06a-683e43e0d6bb
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 2hp1-SAN:/STORAGE/p1/G
Brick2: 2hp2-SAN:/STORAGE/p1/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 2HP12-P3
Type: Replicate
Volume ID: b5300c68-10b3-4ebe-9f29-805d3a641702
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 2hp1-SAN:/STORAGE/p3/G
Brick2: 2hp2-SAN:/STORAGE/p3/G
Options Reconfigured:
performance.readdir-ahead: on
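Since the volumes report Status: Started while the bricks are down, one thing that sometimes brings the brick processes back (a sketch, not a guaranteed fix; repeat per volume) is a force start followed by a status check:
# gluster volume start 1HP12-P1 force
# gluster volume status 1HP12-P1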
Regards, and thanks for any hints.
Paf1