[ovirt-users] timeouts
paf1 at email.cz
Fri Nov 27 12:46:25 UTC 2015
Hi,
all glusterd daemons were running correctly at that time, with no
firewall/iptables restrictions.
But the "not connected" bricks keep changing over time without any intervention.
It looks like glusterd has unstable cross-communication, especially
when the volume bricks are on a different LAN range than the oVirt nodes
( volume bricks in the 16.0.0.0 net and oVirt nodes in the 172.0.0.0 net ).
So I decided to reinstall the whole cluster, but I'm afraid that these
problems will occur again - do you know why this happens?
Regards, and thanks for your answers,
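A quick way to see which of the two networks a given peer address falls into (a minimal shell sketch; the prefix lengths are my assumption, since the thread only names the networks themselves, and I use 172.16.0.0 per the detailed setup description later in the thread):

```shell
# Classify an address into the two networks from the setup above.
# Prefix lengths are assumed (/8 for 16.0.0.0, /16 for 172.16.0.0).
classify() {
    case "$1" in
        16.*)     echo "gluster" ;;  # brick/"SAN" network
        172.16.*) echo "mgmt" ;;     # oVirt management network
        *)        echo "other" ;;
    esac
}

classify 16.0.0.10     # peers should all land here
classify 172.16.0.10   # a peer address here would explain cross-network flapping
```

If any address that glusterd uses for its peers classifies as "mgmt" or "other", the peers are talking across the wrong network, which would fit the unstable behaviour described above.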
Pavel
On 27.11.2015 10:16, knarra wrote:
> On 11/27/2015 11:04 AM, knarra wrote:
>> Hi Paf1,
>>
>> Looks like when you reboot the nodes, glusterd does not start
>> on one node, and because of this that node gets disconnected from the
>> other nodes (that is what I see from the logs). After a reboot, once your
>> systems are up and running, can you check whether glusterd is running on
>> all the nodes? Also, could you let me know which build of gluster you are using?
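That check could be scripted in one loop (a sketch; the node names are taken from the volume info later in the thread, and it assumes key-based ssh from wherever you run it):

```shell
# Sketch: report glusterd state and the installed gluster build on every node.
# Node names come from the thread's volume info; assumes key-based ssh.
NODES="1hp1-SAN 1hp2-SAN 2hp1-SAN 2hp2-SAN"
for node in $NODES; do
    echo "== $node =="
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" \
        'systemctl is-active glusterd; gluster --version | head -1' \
        || echo "$node: unreachable"
done
```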
>>
>> For more info please read,
>> http://www.gluster.org/pipermail/gluster-users.old/2015-June/022377.html
>
>>
>> Thanks
>> kasturi
>>
>> On 11/27/2015 10:52 AM, Sahina Bose wrote:
>>> [+ gluster-users]
>>>
>>> On 11/26/2015 08:37 PM, paf1 at email.cz wrote:
>>>> Hello,
>>>> can anybody help me with these timeouts?
>>>> The volumes are not active yet ( bricks down )
>>>>
>>>> Description of the gluster setup below ...
>>>>
>>>> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>>>> [2015-11-26 14:44:47.174221] I [MSGID: 106004]
>>>> [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management:
>>>> Peer <1hp1-SAN> (<87fc7db8-aba8-41f2-a1cd-b77e83b17436>), in state
>>>> <Peer in Cluster>, has disconnected from glusterd.
>>>> [2015-11-26 14:44:47.174354] W
>>>> [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>>> (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
>>>> [0x7fb7039d44dc]
>>>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
>>>> [0x7fb7039de542]
>>>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
>>>> [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P1 not held
>>>> [2015-11-26 14:44:47.174444] W
>>>> [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>>> (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
>>>> [0x7fb7039d44dc]
>>>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
>>>> [0x7fb7039de542]
>>>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
>>>> [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P3 not held
>>>> [2015-11-26 14:44:47.174521] W
>>>> [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>>> (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
>>>> [0x7fb7039d44dc]
>>>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
>>>> [0x7fb7039de542]
>>>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
>>>> [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P1 not held
>>>> [2015-11-26 14:44:47.174662] W
>>>> [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>>> (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
>>>> [0x7fb7039d44dc]
>>>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
>>>> [0x7fb7039de542]
>>>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
>>>> [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P3 not held
>>>> [2015-11-26 14:44:47.174532] W [MSGID: 106118]
>>>> [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management:
>>>> Lock not released for 2HP12-P1
>>>> [2015-11-26 14:44:47.174675] W [MSGID: 106118]
>>>> [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management:
>>>> Lock not released for 2HP12-P3
>>>> [2015-11-26 14:44:49.423334] I [MSGID: 106488]
>>>> [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume]
>>>> 0-glusterd: Received get vol req
>>>> The message "I [MSGID: 106488]
>>>> [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume]
>>>> 0-glusterd: Received get vol req" repeated 4 times between
>>>> [2015-11-26 14:44:49.423334] and [2015-11-26 14:44:49.429781]
>>>> [2015-11-26 14:44:51.148711] I [MSGID: 106163]
>>>> [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
>>>> 0-management: using the op-version 30702
>>>> [2015-11-26 14:44:52.177266] W [socket.c:869:__socket_keepalive]
>>>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12,
>>>> Invalid argument
>>>> [2015-11-26 14:44:52.177291] E [socket.c:2965:socket_connect]
>>>> 0-management: Failed to set keep-alive: Invalid argument
>>>> [2015-11-26 14:44:53.180426] W [socket.c:869:__socket_keepalive]
>>>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 17,
>>>> Invalid argument
>>>> [2015-11-26 14:44:53.180447] E [socket.c:2965:socket_connect]
>>>> 0-management: Failed to set keep-alive: Invalid argument
>>>> [2015-11-26 14:44:52.395468] I [MSGID: 106163]
>>>> [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
>>>> 0-management: using the op-version 30702
>>>> [2015-11-26 14:44:54.851958] I [MSGID: 106488]
>>>> [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume]
>>>> 0-glusterd: Received get vol req
>>>> [2015-11-26 14:44:57.183969] W [socket.c:869:__socket_keepalive]
>>>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 19,
>>>> Invalid argument
>>>> [2015-11-26 14:44:57.183990] E [socket.c:2965:socket_connect]
>>>> 0-management: Failed to set keep-alive: Invalid argument
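The repeated "TCP_USER_TIMEOUT -1000" warnings look like an unset timeout of -1 seconds being scaled to milliseconds before the setsockopt() call, which the kernel rejects with EINVAL for negative values; that reading is a guess on my part, not confirmed against the glusterfs source, and the kernel simply keeps its defaults, so the warnings are probably harmless. The missing guard can be sketched as:

```shell
# Sketch of the suspected conversion behind "-1000": a sentinel of -1 s
# ("not configured") multiplied by 1000 and handed to setsockopt().
tcp_user_timeout_ms() {
    if [ "$1" -lt 0 ]; then
        # Negative means "not configured": skip setsockopt() entirely
        # instead of passing -1 * 1000 = -1000 to the kernel.
        return 1
    fi
    echo $(( $1 * 1000 ))
}

tcp_user_timeout_ms 30
tcp_user_timeout_ms -1 || echo "timeout not configured, skipping setsockopt"
```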
>>>>
>>>> After volume creation everything worked fine ( volumes up ), but then,
>>>> after several reboots ( yum updates ) the volumes failed due to timeouts.
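One remedy that comes up often for glusterd misbehaving right after a reboot is to delay its start until the network is fully online, so peers are reachable the moment it probes them. A hedged example of a systemd drop-in, not a confirmed fix for this particular case:

```ini
# /etc/systemd/system/glusterd.service.d/wait-for-network.conf
# Make glusterd start only after the network is fully up.
[Unit]
After=network-online.target
Wants=network-online.target
```

After creating the drop-in, `systemctl daemon-reload` makes it take effect on the next boot.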
>>>>
>>>> Gluster description:
>>>>
>>>> 4 nodes with 4 volumes, replica 2
>>>> oVirt 3.6 - the latest
>>>> gluster 3.7.6 - the latest
>>>> vdsm 4.17.999 - from the git repo
>>>> oVirt mgmt. nodes: 172.16.0.0
>>>> oVirt bricks: 16.0.0.0 ( "SAN" - defined as the "gluster" network )
>>>> The network works fine, no lost packets
>>>>
>>>> # gluster volume status
>>>> Staging failed on 2hp1-SAN. Please check log file for details.
>>>> Staging failed on 1hp2-SAN. Please check log file for details.
>>>> Staging failed on 2hp2-SAN. Please check log file for details.
>>>>
>>>> # gluster volume info
>>>>
>>>> Volume Name: 1HP12-P1
>>>> Type: Replicate
>>>> Volume ID: 6991e82c-9745-4203-9b0a-df202060f455
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 1hp1-SAN:/STORAGE/p1/G
>>>> Brick2: 1hp2-SAN:/STORAGE/p1/G
>>>> Options Reconfigured:
>>>> performance.readdir-ahead: on
>>>>
>>>> Volume Name: 1HP12-P3
>>>> Type: Replicate
>>>> Volume ID: 8bbdf0cb-f9b9-4733-8388-90487aa70b30
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 1hp1-SAN:/STORAGE/p3/G
>>>> Brick2: 1hp2-SAN:/STORAGE/p3/G
>>>> Options Reconfigured:
>>>> performance.readdir-ahead: on
>>>>
>>>> Volume Name: 2HP12-P1
>>>> Type: Replicate
>>>> Volume ID: e2cd5559-f789-4636-b06a-683e43e0d6bb
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 2hp1-SAN:/STORAGE/p1/G
>>>> Brick2: 2hp2-SAN:/STORAGE/p1/G
>>>> Options Reconfigured:
>>>> performance.readdir-ahead: on
>>>>
>>>> Volume Name: 2HP12-P3
>>>> Type: Replicate
>>>> Volume ID: b5300c68-10b3-4ebe-9f29-805d3a641702
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 2hp1-SAN:/STORAGE/p3/G
>>>> Brick2: 2hp2-SAN:/STORAGE/p3/G
>>>> Options Reconfigured:
>>>> performance.readdir-ahead: on
>>>>
>>>> Regards, and thanks for any hints
>>>> Paf1
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/users