
Hi Paf1,

It looks like when you reboot the nodes, glusterd does not start up on one node, and because of this that node gets disconnected from the other nodes (that is what I see from the logs). After a reboot, once your systems are up and running, can you check whether glusterd is running on all the nodes? Can you also let me know which build of gluster you are using?

For more info please read http://www.gluster.org/pipermail/gluster-users.old/2015-June/022377.html

Thanks
kasturi

On 11/27/2015 10:52 AM, Sahina Bose wrote:
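A quick way to do that check from one host is a loop over the peers. This is only a sketch, not part of the original thread: the node names are taken from the volume list below, and it assumes passwordless ssh between the nodes (adjust to your environment).

```shell
#!/bin/sh
# Hostnames assumed from the "gluster volume info" output below.
NODES="1hp1-SAN 1hp2-SAN 2hp1-SAN 2hp2-SAN"

for node in $NODES; do
  echo "== $node =="
  # Ask each node whether the glusterd management daemon is active.
  # BatchMode/ConnectTimeout keep the loop from hanging on an unreachable node.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" \
      'systemctl is-active glusterd 2>/dev/null || service glusterd status' \
    || echo "could not reach $node, or glusterd is not active"
done
```

Any node that reports inactive (or is unreachable) is the one that will show up as disconnected in the peer logs.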
[+ gluster-users]
On 11/26/2015 08:37 PM, paf1@email.cz wrote:
Hello, can anybody help me with these timeouts? The volumes are not active yet (bricks down).
Description of the gluster setup below ...
/var/log/glusterfs/etc-glusterfs-glusterd.vol.log:
[2015-11-26 14:44:47.174221] I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <1hp1-SAN> (<87fc7db8-aba8-41f2-a1cd-b77e83b17436>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-11-26 14:44:47.174354] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P1 not held
[2015-11-26 14:44:47.174444] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P3 not held
[2015-11-26 14:44:47.174521] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P1 not held
[2015-11-26 14:44:47.174662] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P3 not held
[2015-11-26 14:44:47.174532] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P1
[2015-11-26 14:44:47.174675] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P3
[2015-11-26 14:44:49.423334] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
The message "I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 4 times between [2015-11-26 14:44:49.423334] and [2015-11-26 14:44:49.429781]
[2015-11-26 14:44:51.148711] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:52.177266] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Invalid argument
[2015-11-26 14:44:52.177291] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:53.180426] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 17, Invalid argument
[2015-11-26 14:44:53.180447] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:52.395468] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
[2015-11-26 14:44:54.851958] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-11-26 14:44:57.183969] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 19, Invalid argument
[2015-11-26 14:44:57.183990] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
After volume creation everything works fine (volumes up), but then, after several reboots (yum updates), the volumes failed due to timeouts.
Gluster description:
4 nodes with 4 volumes, replica 2
oVirt 3.6 - the latest
gluster 3.7.6 - the latest
vdsm 4.17.999 - from the git repo
oVirt - mgmt. nodes 172.16.0.0
oVirt - bricks 16.0.0.0 ("SAN" - defined as the "gluster" net)
Network works fine, no lost packets
# gluster volume status
Staging failed on 2hp1-SAN. Please check log file for details.
Staging failed on 1hp2-SAN. Please check log file for details.
Staging failed on 2hp2-SAN. Please check log file for details.
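When "Staging failed on <node>" shows up like this, the usual first step is to confirm which peers are actually connected before touching the volumes. The following is a hedged sketch (not from the original thread) using standard gluster and systemd commands; run it on any node in the cluster:

```shell
#!/bin/sh
# Sanity-check the cluster before retrying volume operations.
check_cluster() {
  if ! command -v gluster >/dev/null 2>&1; then
    # Guard so the sketch degrades gracefully on a host without gluster.
    echo "gluster CLI not installed on this host"
    return 0
  fi
  # Every peer should report: State: Peer in Cluster (Connected)
  gluster peer status
  # Staging errors here point at a node whose glusterd is not running.
  gluster volume status
}

check_cluster
# On a node where glusterd is down, restarting it usually reconnects the peer:
#   systemctl restart glusterd     (or: service glusterd restart)
```

If a peer shows Disconnected, fix glusterd on that node first; the staging errors should clear once all peers are back in the cluster.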
# gluster volume info
Volume Name: 1HP12-P1 Type: Replicate Volume ID: 6991e82c-9745-4203-9b0a-df202060f455 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: 1hp1-SAN:/STORAGE/p1/G Brick2: 1hp2-SAN:/STORAGE/p1/G Options Reconfigured: performance.readdir-ahead: on
Volume Name: 1HP12-P3 Type: Replicate Volume ID: 8bbdf0cb-f9b9-4733-8388-90487aa70b30 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: 1hp1-SAN:/STORAGE/p3/G Brick2: 1hp2-SAN:/STORAGE/p3/G Options Reconfigured: performance.readdir-ahead: on
Volume Name: 2HP12-P1 Type: Replicate Volume ID: e2cd5559-f789-4636-b06a-683e43e0d6bb Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: 2hp1-SAN:/STORAGE/p1/G Brick2: 2hp2-SAN:/STORAGE/p1/G Options Reconfigured: performance.readdir-ahead: on
Volume Name: 2HP12-P3 Type: Replicate Volume ID: b5300c68-10b3-4ebe-9f29-805d3a641702 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: 2hp1-SAN:/STORAGE/p3/G Brick2: 2hp2-SAN:/STORAGE/p3/G Options Reconfigured: performance.readdir-ahead: on
Regards, and thanks for any hints
Paf1
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users