
Hello,
can anybody help me with these timeouts?
The volumes are not active yet (bricks down).

Description of the Gluster setup is below...

/var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

[2015-11-26 14:44:47.174221] I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <1hp1-SAN> (<87fc7db8-aba8-41f2-a1cd-b77e83b17436>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-11-26 14:44:47.174354] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P1 not held
[2015-11-26 14:44:47.174444] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P3 not held
[2015-11-26 14:44:47.174521] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P1 not held
[2015-11-26 14:44:47.174662] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P3 not held
[2015-11-26 14:44:47.174532] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P1
[2015-11-26 14:44:47.174675] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P3
[2015-11-26 14:44:49.423334] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
The message "I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 4 times between [2015-11-26 14:44:49.423334] and [2015-11-26 14:44:49.429781]
[2015-11-26 14:44:51.148711] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:52.177266] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Invalid argument
[2015-11-26 14:44:52.177291] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:53.180426] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 17, Invalid argument
[2015-11-26 14:44:53.180447] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:52.395468] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:54.851958] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-11-26 14:44:57.183969] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 19, Invalid argument
[2015-11-26 14:44:57.183990] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
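The "failed to set TCP_USER_TIMEOUT -1000" warnings look like a negative keepalive value being passed down to the socket, so perhaps the timeouts come from glusterd itself rather than from the network. This is only a sketch of what I can check from each node; the hostnames are the ones from my setup, and I am assuming "gluster volume get" is available in 3.7.6:

# is glusterd (management port 24007) reachable from this node over the SAN network?
gluster peer status
for h in 1hp1-SAN 1hp2-SAN 2hp1-SAN 2hp2-SAN; do nc -zv "$h" 24007; done

# which keepalive / ping-timeout values are actually in effect for one of the volumes?
gluster volume get 1HP12-P1 all | grep -iE 'keepalive|ping-timeout'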
After creating the volumes everything works fine (volumes up), but after several reboots (yum updates) the volumes failed due to these timeouts.

Gluster description:

4 nodes with 4 volumes, replica 2
oVirt 3.6 - the latest
gluster 3.7.6 - the latest
vdsm 4.17.999 - from the git repo
oVirt - mgmt nodes 172.16.0.0
oVirt - bricks 16.0.0.0 ("SAN" - defined as the "gluster" net)
The network works fine, no lost packets.

# gluster volume status
Staging failed on 2hp1-SAN. Please check log file for details.
Staging failed on 1hp2-SAN. Please check log file for details.
Staging failed on 2hp2-SAN. Please check log file for details.

# gluster volume info

Volume Name: 1HP12-P1
Type: Replicate
Volume ID: 6991e82c-9745-4203-9b0a-df202060f455
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 1hp1-SAN:/STORAGE/p1/G
Brick2: 1hp2-SAN:/STORAGE/p1/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 1HP12-P3
Type: Replicate
Volume ID: 8bbdf0cb-f9b9-4733-8388-90487aa70b30
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 1hp1-SAN:/STORAGE/p3/G
Brick2: 1hp2-SAN:/STORAGE/p3/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 2HP12-P1
Type: Replicate
Volume ID: e2cd5559-f789-4636-b06a-683e43e0d6bb
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 2hp1-SAN:/STORAGE/p1/G
Brick2: 2hp2-SAN:/STORAGE/p1/G
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: 2HP12-P3
Type: Replicate
Volume ID: b5300c68-10b3-4ebe-9f29-805d3a641702
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 2hp1-SAN:/STORAGE/p3/G
Brick2: 2hp2-SAN:/STORAGE/p3/G
Options Reconfigured:
performance.readdir-ahead: on
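If the "Staging failed" messages simply mean that glusterd on the other nodes lost its view of the cluster after the reboots, restarting the management daemon everywhere and re-checking the peer and volume state would be my first attempt. This is only a rough sketch of what I would run on each node in turn (assuming systemd-based hosts); I have not applied it yet:

systemctl restart glusterd            # restart the management daemon on this node
gluster peer status                   # every peer should show "Peer in Cluster (Connected)"
gluster volume status                 # staging should succeed again once all peers are back
gluster volume start 1HP12-P1 force   # only if a volume still reports its bricks as down

Does that sequence make sense here, or is something else needed first?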
Regards, and thanks for any hints.
Paf1