
Hello,
can anybody help me with these timeouts? The volumes are not active yet (bricks down).

A description of my gluster setup is below...

/var/log/glusterfs/etc-glusterfs-glusterd.vol.log

[2015-11-26 14:44:47.174221] I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <1hp1-SAN> (<87fc7db8-aba8-41f2-a1cd-b77e83b17436>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-11-26 14:44:47.174354] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P1 not held
[2015-11-26 14:44:47.174444] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P3 not held
[2015-11-26 14:44:47.174521] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P1 not held
[2015-11-26 14:44:47.174662] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P3 not held
[2015-11-26 14:44:47.174532] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P1
[2015-11-26 14:44:47.174675] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P3
[2015-11-26 14:44:49.423334] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
The message "I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 4 times between [2015-11-26 14:44:49.423334] and [2015-11-26 14:44:49.429781]
[2015-11-26 14:44:51.148711] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:52.177266] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Invalid argument
[2015-11-26 14:44:52.177291] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:53.180426] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 17, Invalid argument
[2015-11-26 14:44:53.180447] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:52.395468] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:54.851958] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-11-26 14:44:57.183969] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 19, Invalid argument
[2015-11-26 14:44:57.183990] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument

After volume creation everything worked fine (volumes up), but then, after several reboots (for yum updates), the volumes failed due to timeouts.

Gluster description:

4 nodes with 4 volumes, replica 2
oVirt 3.6 - the latest
gluster 3.7.6 - the latest
vdsm 4.17.999 - from the git repo
oVirt - mgmt. nodes 172.16.0.0
oVirt - bricks 16.0.0.0 ("SAN" - defined as the "gluster" network)
The network works fine, no lost packets.

# gluster volume status
Staging failed on 2hp1-SAN. Please check log file for details.
Staging failed on 1hp2-SAN. Please check log file for details.
Staging failed on 2hp2-SAN. Please check log file for details.

# gluster volume info

Volume Name: 1HP12-P1
Type: Replicate
Volume ID: 6991e82c-9745-4203-9b0a-df202060f455
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 1hp1-SAN:/STORAGE/p1/G
Brick2: 1hp2-SAN:/STORAGE/p1/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 1HP12-P3
Type: Replicate
Volume ID: 8bbdf0cb-f9b9-4733-8388-90487aa70b30
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 1hp1-SAN:/STORAGE/p3/G
Brick2: 1hp2-SAN:/STORAGE/p3/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 2HP12-P1
Type: Replicate
Volume ID: e2cd5559-f789-4636-b06a-683e43e0d6bb
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 2hp1-SAN:/STORAGE/p1/G
Brick2: 2hp2-SAN:/STORAGE/p1/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 2HP12-P3
Type: Replicate
Volume ID: b5300c68-10b3-4ebe-9f29-805d3a641702
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 2hp1-SAN:/STORAGE/p3/G
Brick2: 2hp2-SAN:/STORAGE/p3/G
Options Reconfigured:
performance.readdir-ahead: on
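The repeated "failed to set TCP_USER_TIMEOUT -1000 ... Invalid argument" pair in the log is the kernel rejecting a negative timeout: on Linux, TCP_USER_TIMEOUT takes a millisecond value that must be non-negative, so passing -1000 yields EINVAL. A minimal sketch reproducing the same errno (assumption: Linux, where the option number is 18; the helper name is mine, not glusterd's):

```python
import socket

# Linux socket-option number for TCP_USER_TIMEOUT; older Pythons
# have no symbolic constant for it.
TCP_USER_TIMEOUT = 18

def set_user_timeout(sock, timeout_ms):
    """Return None on success, or the errno the kernel reported."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
        return None
    except OSError as e:
        return e.errno

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(set_user_timeout(s, -1000))  # EINVAL (22) on Linux: negative values are invalid
s.close()
```

This suggests glusterd computed a bogus (negative) timeout from its keepalive settings rather than a genuine network problem on these sockets.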
Regards, and thanks for any hints,
Paf1

[+ gluster-users]

On 11/26/2015 08:37 PM, paf1@email.cz wrote:
> [snip - original message quoted in full above]
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

On 11/27/2015 10:52 AM, Sahina Bose wrote:
[+ gluster-users]
On 11/26/2015 08:37 PM, paf1@email.cz wrote:
> [snip - original message quoted in full above]
> # gluster volume status
> Staging failed on 2hp1-SAN. Please check log file for details.
> Staging failed on 1hp2-SAN. Please check log file for details.
> Staging failed on 2hp2-SAN. Please check log file for details.

Looking at the glusterd log from the above nodes (2hp1-SAN, 1hp2-SAN, 2hp2-SAN) will give you the exact reason for the failure. Could you attach the glusterd log from any one of these nodes?
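When digging through etc-glusterfs-glusterd.vol.log on the failing nodes, it helps to pull out just the E (error) and W (warning) entries. A small filter sketch, based on the `[date time] LEVEL [MSGID: ...]` layout visible in the excerpts in this thread:

```python
def error_lines(log_text):
    """Return the E (error) and W (warning) entries from a glusterd log."""
    hits = []
    for line in log_text.splitlines():
        parts = line.split()
        # glusterd format: [date time] LEVEL [MSGID: ...] [file:line:func] ...
        # so the severity letter is the third whitespace-separated field.
        if len(parts) >= 3 and parts[2] in ("E", "W"):
            hits.append(line)
    return hits
```

Running it over the log posted above would keep the keepalive/TCP_USER_TIMEOUT errors and the lock warnings while dropping the informational handshake and "Received get vol req" noise.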
> [snip - volume info quoted in full above]
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Hi Paf1,

It looks like when you reboot the nodes, glusterd does not start up on one node, and because of this that node gets disconnected from the other nodes (that is what I see from the logs). After a reboot, once your systems are up and running, can you check whether glusterd is running on all the nodes? Can you also let me know which build of gluster you are using?

For more info, please read http://www.gluster.org/pipermail/gluster-users.old/2015-June/022377.html

Thanks,
kasturi

On 11/27/2015 10:52 AM, Sahina Bose wrote:
[+ gluster-users]
On 11/26/2015 08:37 PM, paf1@email.cz wrote:
Hello, can anybody help me with these timeouts? The volumes are not active yet (bricks down).
Description of the gluster setup below ...
/var/log/glusterfs/etc-glusterfs-glusterd.vol.log
[2015-11-26 14:44:47.174221] I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <1hp1-SAN> (<87fc7db8-aba8-41f2-a1cd-b77e83b17436>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-11-26 14:44:47.174354] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P1 not held
[2015-11-26 14:44:47.174444] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P3 not held
[2015-11-26 14:44:47.174521] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P1 not held
[2015-11-26 14:44:47.174662] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P3 not held
[2015-11-26 14:44:47.174532] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P1
[2015-11-26 14:44:47.174675] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P3
[2015-11-26 14:44:49.423334] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
The message "I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 4 times between [2015-11-26 14:44:49.423334] and [2015-11-26 14:44:49.429781]
[2015-11-26 14:44:51.148711] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:52.177266] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Invalid argument
[2015-11-26 14:44:52.177291] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:53.180426] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 17, Invalid argument
[2015-11-26 14:44:53.180447] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:52.395468] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:54.851958] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-11-26 14:44:57.183969] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 19, Invalid argument
[2015-11-26 14:44:57.183990] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
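The repeated "failed to set TCP_USER_TIMEOUT -1000 ... Invalid argument" warnings look like glusterd handing the kernel a negative timeout value, and the resulting EINVAL is easy to reproduce directly. This is a sketch assuming Linux; the fallback constant 18 is the Linux option number for Python builds where the socket module lacks the symbol:

```python
import errno
import socket

# Linux value for the TCP_USER_TIMEOUT socket option, used as a fallback
# when the running Python's socket module does not export the constant.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # A negative timeout (as in the log: -1000 ms) is rejected by the
    # kernel with EINVAL - the "Invalid argument" seen above.
    s.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, -1000)
    outcome = "accepted"
except OSError as e:
    outcome = "EINVAL" if e.errno == errno.EINVAL else "errno=%d" % e.errno
finally:
    s.close()
print(outcome)
```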
After the volumes were created everything worked fine (volumes up), but then, after several reboots (yum updates), the volumes failed due to timeouts.
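Since the failures track reboots, one commonly suggested mitigation on systemd-based distros is to make glusterd wait until the network (including the storage net) is fully online before it starts. This is an assumption on my part rather than something confirmed in this thread, and the unit and target names should be checked against your distro:

```ini
# /etc/systemd/system/glusterd.service.d/wait-for-network.conf
# Hypothetical drop-in: delay glusterd until network-online.target so
# the bricks on the storage network are reachable when glusterd starts.
[Unit]
Wants=network-online.target
After=network-online.target
```

After adding the drop-in, run `systemctl daemon-reload` and reboot one node to verify that glusterd comes up with all peers connected before rolling it out everywhere.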

On 11/27/2015 11:04 AM, knarra wrote:
Hi Paf1,
Looks like when you reboot the nodes, glusterd does not start up in one node and due to this the node gets disconnected from other node(that is what i see from logs). After reboot, once your systems are up and running , can you check if glusterd is running on all the nodes? Can you please let me know which build of gluster are you using ?
For more info please read, http://www.gluster.org/pipermail/gluster-users.old/2015-June/022377.html - (please ignore this line)
Thanks kasturi

Hi,
all glusterd daemons were running correctly at that time, with no firewall/iptables restrictions. But the "not connected" bricks keep changing over time, without any intervention. It looks like glusterd has unstable cross-communication, especially with a LAN range different from the oVirt nodes in the oVirt environment (volume bricks on the 16.0.0.0 net and oVirt nodes on the 172.16.0.0 net).
So I decided to reinstall the whole cluster, but I'm afraid that these problems will occur again - I will let you know.

Regards, and thanks for your answers.
Pavel

On 27.11.2015 10:16, knarra wrote:
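Whether a given brick address actually sits on the intended storage network can be checked mechanically. A small sketch using the two ranges mentioned above; the prefix lengths and the concrete host addresses are assumptions for illustration:

```python
import ipaddress

# Networks taken from this thread; the /8 and /16 masks are assumptions -
# substitute your real prefix lengths.
STORAGE_NET = ipaddress.ip_network("16.0.0.0/8")
MGMT_NET = ipaddress.ip_network("172.16.0.0/16")

def classify(addr: str) -> str:
    """Return which of the two expected networks an address belongs to."""
    ip = ipaddress.ip_address(addr)
    if ip in STORAGE_NET:
        return "storage"
    if ip in MGMT_NET:
        return "mgmt"
    return "other"

# Illustrative addresses only - a brick should resolve into "storage".
print(classify("16.0.0.11"))
print(classify("172.16.0.11"))
```

Resolving each brick hostname (e.g. 1hp1-SAN) and feeding the result through such a check would quickly show whether any peer is talking over the wrong interface.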
On 11/27/2015 11:04 AM, knarra wrote:
Hi Paf1,
Looks like when you reboot the nodes, glusterd does not start up on one node, and because of this that node gets disconnected from the other nodes (that is what I see from the logs). After reboot, once your systems are up and running, can you check if glusterd is running on all the nodes? Can you please let me know which build of gluster you are using?
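The check suggested above can be scripted. Below is a minimal sketch that parses `gluster peer status`-style output and lists peers reported as disconnected; the sample text and UUIDs are illustrative (in practice you would feed it the real output, e.g. via `subprocess.check_output(["gluster", "peer", "status"], text=True)`):

```python
def disconnected_peers(status_output: str) -> list:
    """Return hostnames whose State line contains 'Disconnected'."""
    peers = []
    hostname = None
    for line in status_output.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and "Disconnected" in line:
            peers.append(hostname)
    return peers

# Illustrative sample modelled on `gluster peer status` output;
# the second UUID is made up.
SAMPLE = """\
Hostname: 1hp2-SAN
Uuid: 87fc7db8-aba8-41f2-a1cd-b77e83b17436
State: Peer in Cluster (Connected)

Hostname: 2hp1-SAN
Uuid: 11111111-2222-3333-4444-555555555555
State: Peer in Cluster (Disconnected)
"""

print(disconnected_peers(SAMPLE))  # ['2hp1-SAN']
```

Running this on each node after a reboot would quickly show which peer dropped out of the cluster.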
For more info please read, http://www.gluster.org/pipermail/gluster-users.old/2015-June/022377.html
Thanks kasturi
On 11/27/2015 10:52 AM, Sahina Bose wrote:
[+ gluster-users]
On 11/26/2015 08:37 PM, paf1@email.cz wrote:
Hello, can anybody help me with these timeouts? Volumes are not active yet ( bricks down )
desc. of gluster below ...
*/var/log/glusterfs/etc-glusterfs-glusterd.vol.log*
[2015-11-26 14:44:47.174221] I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <1hp1-SAN> (<87fc7db8-aba8-41f2-a1cd-b77e83b17436>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-11-26 14:44:47.174354] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P1 not held
[2015-11-26 14:44:47.174444] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P3 not held
[2015-11-26 14:44:47.174521] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P1 not held
[2015-11-26 14:44:47.174662] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P3 not held
[2015-11-26 14:44:47.174532] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P1
[2015-11-26 14:44:47.174675] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P3
[2015-11-26 14:44:49.423334] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
The message "I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 4 times between [2015-11-26 14:44:49.423334] and [2015-11-26 14:44:49.429781]
[2015-11-26 14:44:51.148711] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:52.177266] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Invalid argument
[2015-11-26 14:44:52.177291] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:53.180426] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 17, Invalid argument
[2015-11-26 14:44:53.180447] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:52.395468] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:54.851958] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-11-26 14:44:57.183969] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 19, Invalid argument
[2015-11-26 14:44:57.183990] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
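A side note on the repeated `failed to set TCP_USER_TIMEOUT -1000 ... Invalid argument` warnings: on Linux, the TCP_USER_TIMEOUT socket option expects a non-negative number of milliseconds, so the kernel rejects a negative value with EINVAL, which matches these log lines. A minimal sketch reproducing that behaviour (assumes Linux; option number 18 from `<linux/tcp.h>` is used as a fallback for Pythons that do not expose the constant):

```python
import errno
import socket

# socket.TCP_USER_TIMEOUT exists on Linux in Python 3.6+; 18 is the
# option's number in <linux/tcp.h>, used here only as a fallback.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # A negative timeout (like the -1000 in the log) is rejected by the kernel.
    s.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, -1000)
    rejected = False
except OSError as e:
    rejected = (e.errno == errno.EINVAL)
finally:
    s.close()

print("kernel rejected negative TCP_USER_TIMEOUT:", rejected)
```

These warnings point at a bad (negative) option value being passed down, not at the network itself, so they are likely unrelated to the peer disconnects above.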
After volume creation everything works fine ( volumes up ), but then, after several reboots ( yum updates ), the volumes fail due to timeouts.
Gluster description:
4 nodes with 4 volumes, replica 2
oVirt 3.6 - the latest
gluster 3.7.6 - the latest
vdsm 4.17.999 - from git repo
oVirt - mgmt. nodes 172.16.0.0
oVirt - bricks 16.0.0.0 ( "SAN" - defined as "gluster" net)
Network works fine, no lost packets
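Since the management and brick traffic live on different ranges, one quick sanity check is that every brick address really resolves into the "SAN" network. A minimal sketch, assuming /8 and /16 prefixes guessed from the ranges above (adjust to your real netmasks):

```python
import ipaddress

# Assumed prefixes based on the ranges mentioned in this thread;
# replace with the actual netmasks of your environment.
GLUSTER_NET = ipaddress.ip_network("16.0.0.0/8")     # brick ("SAN") net
MGMT_NET = ipaddress.ip_network("172.16.0.0/16")     # oVirt mgmt net

def on_gluster_net(addr: str) -> bool:
    """Return True if the address belongs to the brick ("SAN") network."""
    return ipaddress.ip_address(addr) in GLUSTER_NET

# Example: a brick-side address vs. a management-side address
print(on_gluster_net("16.0.0.10"))    # brick network -> True
print(on_gluster_net("172.16.0.10"))  # management network -> False
```

In practice you would resolve each brick hostname (e.g. 1hp1-SAN) and feed the result to this check, to rule out bricks accidentally registered with their management-side address.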
# gluster volume status
Staging failed on 2hp1-SAN. Please check log file for details.
Staging failed on 1hp2-SAN. Please check log file for details.
Staging failed on 2hp2-SAN. Please check log file for details.
# gluster volume info
Volume Name: 1HP12-P1
Type: Replicate
Volume ID: 6991e82c-9745-4203-9b0a-df202060f455
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 1hp1-SAN:/STORAGE/p1/G
Brick2: 1hp2-SAN:/STORAGE/p1/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 1HP12-P3
Type: Replicate
Volume ID: 8bbdf0cb-f9b9-4733-8388-90487aa70b30
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 1hp1-SAN:/STORAGE/p3/G
Brick2: 1hp2-SAN:/STORAGE/p3/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 2HP12-P1
Type: Replicate
Volume ID: e2cd5559-f789-4636-b06a-683e43e0d6bb
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 2hp1-SAN:/STORAGE/p1/G
Brick2: 2hp2-SAN:/STORAGE/p1/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 2HP12-P3
Type: Replicate
Volume ID: b5300c68-10b3-4ebe-9f29-805d3a641702
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 2hp1-SAN:/STORAGE/p3/G
Brick2: 2hp2-SAN:/STORAGE/p3/G
Options Reconfigured:
performance.readdir-ahead: on
regs. for any hints
Paf1
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
participants (4)
-
Atin Mukherjee
-
knarra
-
paf1@email.cz
-
Sahina Bose