Hi,
all glusterd daemons were running correctly at the time, with no
firewall/iptables restrictions.
But the "not connected" bricks keep changing over time without any intervention.
It looks like glusterd has unstable cross communication, especially when
the bricks sit in a different LAN range than the oVirt nodes
( volume bricks on the 16.0.0.0 net, oVirt nodes on the 172.16.0.0 net ).
So I decided to reinstall the whole cluster, but I'm afraid these problems
will occur again - do you know whether they will?

Thanks for your answers,
Pavel
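[Editor's note: before reinstalling, it may be worth capturing what each node actually reports. A minimal sketch; the `peers_down` helper is an illustration of mine, not something from this thread - it just counts peers that `gluster peer status` does not report as Connected:

```shell
# peers_down: read `gluster peer status` output on stdin and count
# peers whose "State:" line is anything other than "(Connected)".
peers_down() {
  grep '^State:' | grep -vc '(Connected)'
}

# On a live node you would run (commented out here):
#   gluster peer status | peers_down
```

Running this on every node, a few minutes apart, would show whether the "not connected" set really moves around or whether one node's glusterd is flapping.]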
On 27.11.2015 10:16, knarra wrote:
On 11/27/2015 11:04 AM, knarra wrote:
> Hi Paf1,
>
> It looks like when you reboot the nodes, glusterd does not start up
> on one node, and because of this that node gets disconnected from the
> other nodes (that is what I see from the logs). After a reboot, once
> your systems are up and running, can you check whether glusterd is
> running on all the nodes? Can you please let me know which build of
> gluster you are using?
>
> For more info, please read:
> http://www.gluster.org/pipermail/gluster-users.old/2015-June/022377.html
>
> Thanks
> kasturi
>
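[Editor's note: the check suggested above can be scripted. A sketch, assuming passwordless ssh and taking the four node names from the volume info later in this thread:

```shell
# Run the same health check on every node: is glusterd active, and
# which gluster build is installed?
NODES="1hp1-SAN 1hp2-SAN 2hp1-SAN 2hp2-SAN"
CHECK='systemctl is-active glusterd; gluster --version | head -n1'
for h in $NODES; do
  echo "== $h =="
  # ssh "$h" "$CHECK"   # uncomment on the live cluster
done
```

Any node where `systemctl is-active glusterd` prints something other than `active`, or where the version differs from the rest, is the first place to look.]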
> On 11/27/2015 10:52 AM, Sahina Bose wrote:
>> [+ gluster-users]
>>
>> On 11/26/2015 08:37 PM, paf1(a)email.cz wrote:
>>> Hello,
>>> Can anybody help me with these timeouts?
>>> Volumes are not active yet ( bricks down )
>>>
>>> Description of the gluster setup below ...
>>>
>>> */var/log/glusterfs/**etc-glusterfs-glusterd.vol.log*
>>> [2015-11-26 14:44:47.174221] I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <1hp1-SAN> (<87fc7db8-aba8-41f2-a1cd-b77e83b17436>), in state <Peer in Cluster>, has disconnected from glusterd.
>>> [2015-11-26 14:44:47.174354] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P1 not held
>>> [2015-11-26 14:44:47.174444] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P3 not held
>>> [2015-11-26 14:44:47.174521] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P1 not held
>>> [2015-11-26 14:44:47.174662] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c) [0x7fb7039d44dc] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fb7039de542] -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P3 not held
>>> [2015-11-26 14:44:47.174532] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P1
>>> [2015-11-26 14:44:47.174675] W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for 2HP12-P3
>>> [2015-11-26 14:44:49.423334] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
>>> The message "I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 4 times between [2015-11-26 14:44:49.423334] and [2015-11-26 14:44:49.429781]
>>> [2015-11-26 14:44:51.148711] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
>>> [2015-11-26 14:44:52.177266] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Invalid argument
>>> [2015-11-26 14:44:52.177291] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
>>> [2015-11-26 14:44:53.180426] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 17, Invalid argument
>>> [2015-11-26 14:44:53.180447] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
>>> [2015-11-26 14:44:52.395468] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30702
>>> [2015-11-26 14:44:54.851958] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
>>> [2015-11-26 14:44:57.183969] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 19, Invalid argument
>>> [2015-11-26 14:44:57.183990] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
>>>
>>> After volume creation everything works fine ( volumes up ), but then,
>>> after several reboots ( yum updates ) the volumes failed due to timeouts.
>>>
>>> Gluster description:
>>>
>>> 4 nodes with 4 volumes, replica 2
>>> oVirt 3.6 - the latest
>>> gluster 3.7.6 - the latest
>>> vdsm 4.17.999 - from the git repo
>>> oVirt mgmt. nodes: 172.16.0.0
>>> oVirt bricks: 16.0.0.0 ( "SAN" - defined as the "gluster" net )
>>> Network works fine, no lost packets
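[Editor's note: one way to back the "network works fine" claim with data is to probe glusterd's management port, 24007/tcp, on every node over the SAN addresses. A sketch using bash's /dev/tcp; the node names are taken from the volume info below:

```shell
# Probe TCP port 24007 (glusterd) on each node. Brick ports
# (49152 and up on gluster 3.7) could be checked the same way.
for h in 1hp1-SAN 1hp2-SAN 2hp1-SAN 2hp2-SAN; do
  if timeout 3 bash -c "echo > /dev/tcp/$h/24007" 2>/dev/null; then
    echo "$h: 24007 reachable"
  else
    echo "$h: 24007 NOT reachable"
  fi
done
```

ICMP working ("no lost packets") does not guarantee the gluster ports are open, so an explicit TCP probe rules out a filtering problem between the two subnets.]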
>>>
>>> # gluster volume status
>>> Staging failed on 2hp1-SAN. Please check log file for details.
>>> Staging failed on 1hp2-SAN. Please check log file for details.
>>> Staging failed on 2hp2-SAN. Please check log file for details.
>>>
>>> # gluster volume info
>>>
>>> Volume Name: 1HP12-P1
>>> Type: Replicate
>>> Volume ID: 6991e82c-9745-4203-9b0a-df202060f455
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 1hp1-SAN:/STORAGE/p1/G
>>> Brick2: 1hp2-SAN:/STORAGE/p1/G
>>> Options Reconfigured:
>>> performance.readdir-ahead: on
>>>
>>> Volume Name: 1HP12-P3
>>> Type: Replicate
>>> Volume ID: 8bbdf0cb-f9b9-4733-8388-90487aa70b30
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 1hp1-SAN:/STORAGE/p3/G
>>> Brick2: 1hp2-SAN:/STORAGE/p3/G
>>> Options Reconfigured:
>>> performance.readdir-ahead: on
>>>
>>> Volume Name: 2HP12-P1
>>> Type: Replicate
>>> Volume ID: e2cd5559-f789-4636-b06a-683e43e0d6bb
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 2hp1-SAN:/STORAGE/p1/G
>>> Brick2: 2hp2-SAN:/STORAGE/p1/G
>>> Options Reconfigured:
>>> performance.readdir-ahead: on
>>>
>>> Volume Name: 2HP12-P3
>>> Type: Replicate
>>> Volume ID: b5300c68-10b3-4ebe-9f29-805d3a641702
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 2hp1-SAN:/STORAGE/p3/G
>>> Brick2: 2hp2-SAN:/STORAGE/p3/G
>>> Options Reconfigured:
>>> performance.readdir-ahead: on
>>>
>>> Thanks for any hints
>>> Paf1
>>>
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users(a)ovirt.org
>>>
http://lists.ovirt.org/mailman/listinfo/users