Hi,
Are you using a thin-LVM-based backend on which the bricks are created?
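One quick way to check, assuming standard lvm2 tooling (brick/volume names will of course differ on your systems):

    # list logical volumes with their thin-pool association;
    # thin volumes show a pool in the Pool column and 'V' as the
    # first lv_attr character, thin pools themselves show 't'
    lvs -o lv_name,vg_name,pool_lv,lv_attr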
Pranith
On 03/18/2015 02:05 AM, Alastair Neil wrote:
I have an oVirt cluster with 6 VM hosts and 4 gluster nodes. There are
two virtualisation clusters, one with two Nehalem nodes and one with
four Sandy Bridge nodes. My master storage domain is a GlusterFS domain
backed by a replica 3 gluster volume from 3 of the gluster nodes. The
engine is a hosted engine 3.5.1 on 3 of the Sandy Bridge nodes, with
storage provided by NFS from a different gluster volume. All the
hosts are CentOS 6.6.
vdsm-4.16.10-8.gitc937927.el6
glusterfs-3.6.2-1.el6
kernel-2.6.32-504.8.1.el6.x86_64
Problems happen when I try to add a new brick or replace a brick:
eventually the self-heal will kill the VMs. In the VMs' logs I see
kernel hung-task messages.
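For concreteness, the kind of operation that triggers it looks like this (volume name and brick paths below are placeholders, not my real ones):

    # grow the replica set by one brick
    gluster volume add-brick myvol replica 4 gluster5:/export/brick1
    # or swap a brick out for a new one
    gluster volume replace-brick myvol gluster1:/export/brick1 \
        gluster5:/export/brick1 commit force

Shortly after either of these, the self-heal starts and the guests begin stalling.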
Mar 12 23:05:16 static1 kernel: INFO: task nginx:1736 blocked for more than 120 seconds.
Mar 12 23:05:16 static1 kernel: Not tainted 2.6.32-504.3.3.el6.x86_64 #1
Mar 12 23:05:16 static1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 12 23:05:16 static1 kernel: nginx D 0000000000000001 0 1736 1735 0x00000080
Mar 12 23:05:16 static1 kernel: ffff8800778b17a8 0000000000000082 0000000000000000 00000000000126c0
Mar 12 23:05:16 static1 kernel: ffff88007e5c6500 ffff880037170080 0006ce5c85bd9185 ffff88007e5c64d0
Mar 12 23:05:16 static1 kernel: ffff88007a614ae0 00000001722b64ba ffff88007a615098 ffff8800778b1fd8
Mar 12 23:05:16 static1 kernel: Call Trace:
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a885>] schedule_timeout+0x215/0x2e0
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a503>] wait_for_common+0x123/0x180
Mar 12 23:05:16 static1 kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] ? _xfs_buf_read+0x46/0x60 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a61d>] wait_for_completion+0x1d/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffffa020ff5b>] xfs_buf_iowait+0x9b/0x100 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] _xfs_buf_read+0x46/0x60 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210b3b>] xfs_buf_read+0xab/0x100 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01ee6a4>] xfs_imap_to_bp+0x54/0x130 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01f077b>] xfs_iread+0x7b/0x1b0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff811ab77e>] ? inode_init_always+0x11e/0x1c0
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eb5ee>] xfs_iget+0x27e/0x6e0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae1d>] ? xfs_iunlock+0x5d/0xd0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0209366>] xfs_lookup+0xc6/0x110 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0216024>] xfs_vn_lookup+0x54/0xa0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff8119dc65>] do_lookup+0x1a5/0x230
Mar 12 23:05:16 static1 kernel: [<ffffffff8119e8f4>] __link_path_walk+0x7a4/0x1000
Mar 12 23:05:16 static1 kernel: [<ffffffff811738e7>] ? cache_grow+0x217/0x320
Mar 12 23:05:16 static1 kernel: [<ffffffff8119f40a>] path_walk+0x6a/0xe0
Mar 12 23:05:16 static1 kernel: [<ffffffff8119f61b>] filename_lookup+0x6b/0xc0
Mar 12 23:05:16 static1 kernel: [<ffffffff811a0747>] user_path_at+0x57/0xa0
Mar 12 23:05:16 static1 kernel: [<ffffffffa0204e74>] ? _xfs_trans_commit+0x214/0x2a0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae3e>] ? xfs_iunlock+0x7e/0xd0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff81193bc0>] vfs_fstatat+0x50/0xa0
Mar 12 23:05:16 static1 kernel: [<ffffffff811aaf5d>] ? touch_atime+0x14d/0x1a0
Mar 12 23:05:16 static1 kernel: [<ffffffff81193d3b>] vfs_stat+0x1b/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffff81193d64>] sys_newstat+0x24/0x50
Mar 12 23:05:16 static1 kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
Mar 12 23:05:16 static1 kernel: [<ffffffff810e5a7e>] ? __audit_syscall_exit+0x25e/0x290
Mar 12 23:05:16 static1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
From the trace it looks like a simple stat() call in the guest is stuck
waiting on XFS buffer I/O, i.e. the virtual disk stops responding while
the heal runs. I am wondering if my volume settings are causing this.
Can anyone with more knowledge take a look and let me know:
network.remote-dio: on
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.export-volumes: on
network.ping-timeout: 20
cluster.self-heal-readdir-size: 64KB
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: diff
cluster.self-heal-window-size: 8
cluster.heal-timeout: 500
cluster.self-heal-daemon: on
cluster.entry-self-heal: on
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.readdir-optimize: on
cluster.background-self-heal-count: 20
cluster.rebalance-stats: on
cluster.min-free-disk: 5%
cluster.eager-lock: enable
storage.owner-uid: 36
storage.owner-gid: 36
auth.allow: *
user.cifs: disable
cluster.server-quorum-ratio: 51%
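If it is just heal aggressiveness, I assume I could throttle it with something like the following ("myvol" is a placeholder for my actual volume name), though I don't know whether these values are sensible:

    # heal fewer files in parallel
    gluster volume set myvol cluster.background-self-heal-count 4
    # use a smaller per-file heal window
    gluster volume set myvol cluster.self-heal-window-size 2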
Many Thanks, Alastair