Re: [ovirt-users] VMs freezing during heals

Pranith

I have run a pretty straightforward test. I created a two brick 50 G replica volume with normal lvm bricks, and installed two servers, one centos 6.6 and one centos 7.0. I kicked off bonnie++ on both to generate some file system activity and then made the volume replica 3. I saw no issues on the servers.

Not clear if this is a sufficiently rigorous test, and the volume I have had issues on is a 3TB volume with about 2TB used.
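In case it helps anyone reproduce it, the sequence was roughly the following; the volume name, host names and brick paths are placeholders rather than the exact ones I used:

    # create a two-brick replica volume on plain (thick) lvm bricks
    gluster volume create testvol replica 2 \
        server1:/bricks/testvol/brick server2:/bricks/testvol/brick
    gluster volume start testvol

    # generate file system activity from both test clients
    bonnie++ -d /mnt/testvol -u root

    # grow the volume to replica 3, which kicks off a full self-heal to the new brick
    gluster volume add-brick testvol replica 3 server3:/bricks/testvol/brick
    gluster volume heal testvol info

-Alastair

On 19 March 2015 at 12:30, Alastair Neil <ajneil.tech@gmail.com> wrote: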
I don't think I have the resources to test it meaningfully. I have about 50 vms on my primary storage domain. I might be able to set up a small 50 GB volume and provision 2 or 3 vms running test loads but I'm not sure it would be comparable. I'll give it a try and let you know if I see similar behaviour.
On 19 March 2015 at 11:34, Pranith Kumar Karampuri <pkarampu@redhat.com> wrote:
Without thinly provisioned lvm.
Pranith
On 03/19/2015 08:01 PM, Alastair Neil wrote:
do you mean raw partitions as bricks or simply without thin provisioned lvm?
On 19 March 2015 at 00:32, Pranith Kumar Karampuri <pkarampu@redhat.com> wrote:
Could you let me know if you see this problem without lvm as well?
Pranith
On 03/18/2015 08:25 PM, Alastair Neil wrote:
I am in the process of replacing the bricks with thinly provisioned lvs yes.
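For clarity, the thin-provisioned bricks are being set up roughly along these lines (volume group, LV names and sizes below are illustrative, not the actual values):

    # carve a thin pool out of the brick volume group
    lvcreate -L 900G --thinpool brickpool vg_gluster

    # create a thin volume for the brick; its virtual size may exceed the pool size
    lvcreate -V 1T --thin -n brick1 vg_gluster/brickpool

    # format and mount it as the gluster brick
    mkfs.xfs -i size=512 /dev/vg_gluster/brick1
    mkdir -p /bricks/brick1
    mount /dev/vg_gluster/brick1 /bricks/brick1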
On 18 March 2015 at 09:35, Pranith Kumar Karampuri <pkarampu@redhat.com> wrote:
hi, Are you using thin-lvm based backend on which the bricks are created?
Pranith
On 03/18/2015 02:05 AM, Alastair Neil wrote:
I have an oVirt cluster with 6 VM hosts and 4 gluster nodes. There are two virtualisation clusters, one with two Nehalem nodes and one with four Sandy Bridge nodes. My master storage domain is a GlusterFS domain backed by a replica 3 gluster volume from 3 of the gluster nodes. The engine is a hosted engine 3.5.1 on 3 of the Sandy Bridge nodes, with storage provided by nfs from a different gluster volume. All the hosts are CentOS 6.6.
vdsm-4.16.10-8.gitc937927.el6
glusterfs-3.6.2-1.el6
2.6.32-504.8.1.el6.x86_64
Problems happen when I try to add a new brick or replace a brick: eventually the self-heal will kill the VMs. In the VMs' logs I see kernel hung task messages.
Mar 12 23:05:16 static1 kernel: INFO: task nginx:1736 blocked for more than 120 seconds.
Mar 12 23:05:16 static1 kernel: Not tainted 2.6.32-504.3.3.el6.x86_64 #1
Mar 12 23:05:16 static1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 12 23:05:16 static1 kernel: nginx D 0000000000000001 0 1736 1735 0x00000080
Mar 12 23:05:16 static1 kernel: ffff8800778b17a8 0000000000000082 0000000000000000 00000000000126c0
Mar 12 23:05:16 static1 kernel: ffff88007e5c6500 ffff880037170080 0006ce5c85bd9185 ffff88007e5c64d0
Mar 12 23:05:16 static1 kernel: ffff88007a614ae0 00000001722b64ba ffff88007a615098 ffff8800778b1fd8
Mar 12 23:05:16 static1 kernel: Call Trace:
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a885>] schedule_timeout+0x215/0x2e0
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a503>] wait_for_common+0x123/0x180
Mar 12 23:05:16 static1 kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] ? _xfs_buf_read+0x46/0x60 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a61d>] wait_for_completion+0x1d/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffffa020ff5b>] xfs_buf_iowait+0x9b/0x100 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] _xfs_buf_read+0x46/0x60 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210b3b>] xfs_buf_read+0xab/0x100 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01ee6a4>] xfs_imap_to_bp+0x54/0x130 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01f077b>] xfs_iread+0x7b/0x1b0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff811ab77e>] ? inode_init_always+0x11e/0x1c0
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eb5ee>] xfs_iget+0x27e/0x6e0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae1d>] ? xfs_iunlock+0x5d/0xd0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0209366>] xfs_lookup+0xc6/0x110 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0216024>] xfs_vn_lookup+0x54/0xa0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff8119dc65>] do_lookup+0x1a5/0x230
Mar 12 23:05:16 static1 kernel: [<ffffffff8119e8f4>] __link_path_walk+0x7a4/0x1000
Mar 12 23:05:16 static1 kernel: [<ffffffff811738e7>] ? cache_grow+0x217/0x320
Mar 12 23:05:16 static1 kernel: [<ffffffff8119f40a>] path_walk+0x6a/0xe0
Mar 12 23:05:16 static1 kernel: [<ffffffff8119f61b>] filename_lookup+0x6b/0xc0
Mar 12 23:05:16 static1 kernel: [<ffffffff811a0747>] user_path_at+0x57/0xa0
Mar 12 23:05:16 static1 kernel: [<ffffffffa0204e74>] ? _xfs_trans_commit+0x214/0x2a0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae3e>] ? xfs_iunlock+0x7e/0xd0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff81193bc0>] vfs_fstatat+0x50/0xa0
Mar 12 23:05:16 static1 kernel: [<ffffffff811aaf5d>] ? touch_atime+0x14d/0x1a0
Mar 12 23:05:16 static1 kernel: [<ffffffff81193d3b>] vfs_stat+0x1b/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffff81193d64>] sys_newstat+0x24/0x50
Mar 12 23:05:16 static1 kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
Mar 12 23:05:16 static1 kernel: [<ffffffff810e5a7e>] ? __audit_syscall_exit+0x25e/0x290
Mar 12 23:05:16 static1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
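For completeness, the brick operations that kick off the heal are the standard ones, roughly as follows; volume, host and path names here are placeholders:

    # adding a brick to grow the replica count
    gluster volume add-brick vmstore replica 3 newhost:/bricks/vmstore/brick

    # or swapping a brick out, which recent gluster releases handle via "commit force"
    gluster volume replace-brick vmstore oldhost:/bricks/vmstore/brick \
        newhost:/bricks/vmstore/brick commit force

    # the resulting self-heal can be watched with
    gluster volume heal vmstore info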
I am wondering if my volume settings are causing this. Can anyone with more knowledge take a look and let me know:
network.remote-dio: on
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.export-volumes: on
network.ping-timeout: 20
cluster.self-heal-readdir-size: 64KB
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: diff
cluster.self-heal-window-size: 8
cluster.heal-timeout: 500
cluster.self-heal-daemon: on
cluster.entry-self-heal: on
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.readdir-optimize: on
cluster.background-self-heal-count: 20
cluster.rebalance-stats: on
cluster.min-free-disk: 5%
cluster.eager-lock: enable
storage.owner-uid: 36
storage.owner-gid: 36
auth.allow:*
user.cifs: disable
cluster.server-quorum-ratio: 51%
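These were all applied with the normal volume-set mechanism, along the lines of the following sketch (volume name is a placeholder):

    # set individual options
    gluster volume set vmstore network.ping-timeout 20
    gluster volume set vmstore cluster.data-self-heal-algorithm diff

    # the reconfigured options above come from
    gluster volume info vmstore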
Many Thanks, Alastair

CentOS 6.6
vdsm-4.16.10-8.gitc937927.el6
glusterfs-3.6.2-1.el6
2.6.32-504.8.1.el6.x86_64
moved to 3.6 specifically to get the snapshotting feature, hence my desire to migrate to thinly provisioned lvm bricks.
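The attraction of the thin bricks is the volume snapshot support that arrived in 3.6; a rough sketch of the intended use (snapshot and volume names are placeholders):

    # gluster volume snapshots require every brick to live on a thin-provisioned lv
    gluster snapshot create vmstore-snap1 vmstore
    gluster snapshot list vmstore
    gluster snapshot info vmstore-snap1

    # rolling back requires the volume to be stopped first
    gluster volume stop vmstore
    gluster snapshot restore vmstore-snap1

On 20 March 2015 at 14:57, Darrell Budic <budic@onholyground.com> wrote: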
What version of gluster are you running on these?
I’ve seen high load during heals bounce my hosted engine around due to overall system load, but never pause anything else. Cent 7 combo storage/host systems, gluster 3.5.2.
On Mar 20, 2015, at 9:57 AM, Alastair Neil <ajneil.tech@gmail.com> wrote:
Pranith
I have run a pretty straightforward test. I created a two brick 50 G replica volume with normal lvm bricks, and installed two servers, one centos 6.6 and one centos 7.0. I kicked off bonnie++ on both to generate some file system activity and then made the volume replica 3. I saw no issues on the servers.
Not clear if this is a sufficiently rigorous test and the Volume I have had issues on is a 3TB volume with about 2TB used.
-Alastair
On 19 March 2015 at 12:30, Alastair Neil <ajneil.tech@gmail.com> wrote:
I don't think I have the resources to test it meaningfully. I have about 50 vms on my primary storage domain. I might be able to set up a small 50 GB volume and provision 2 or 3 vms running test loads but I'm not sure it would be comparable. I'll give it a try and let you know if I see similar behaviour.
On 19 March 2015 at 11:34, Pranith Kumar Karampuri <pkarampu@redhat.com> wrote:
Without thinly provisioned lvm.
Pranith
On 03/19/2015 08:01 PM, Alastair Neil wrote:
do you mean raw partitions as bricks or simply without thin provisioned lvm?
On 19 March 2015 at 00:32, Pranith Kumar Karampuri <pkarampu@redhat.com> wrote:
Could you let me know if you see this problem without lvm as well?
Pranith
On 03/18/2015 08:25 PM, Alastair Neil wrote:
I am in the process of replacing the bricks with thinly provisioned lvs yes.
On 18 March 2015 at 09:35, Pranith Kumar Karampuri <pkarampu@redhat.com> wrote:
hi, Are you using thin-lvm based backend on which the bricks are created?
Pranith
On 03/18/2015 02:05 AM, Alastair Neil wrote:
I have an oVirt cluster with 6 VM hosts and 4 gluster nodes. There are two virtualisation clusters, one with two Nehalem nodes and one with four Sandy Bridge nodes. My master storage domain is a GlusterFS domain backed by a replica 3 gluster volume from 3 of the gluster nodes. The engine is a hosted engine 3.5.1 on 3 of the Sandy Bridge nodes, with storage provided by nfs from a different gluster volume. All the hosts are CentOS 6.6.
vdsm-4.16.10-8.gitc937927.el6
glusterfs-3.6.2-1.el6
2.6.32-504.8.1.el6.x86_64
Problems happen when I try to add a new brick or replace a brick: eventually the self-heal will kill the VMs. In the VMs' logs I see kernel hung task messages.
Mar 12 23:05:16 static1 kernel: INFO: task nginx:1736 blocked for more than 120 seconds.
Mar 12 23:05:16 static1 kernel: Not tainted 2.6.32-504.3.3.el6.x86_64 #1
Mar 12 23:05:16 static1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 12 23:05:16 static1 kernel: nginx D 0000000000000001 0 1736 1735 0x00000080
Mar 12 23:05:16 static1 kernel: ffff8800778b17a8 0000000000000082 0000000000000000 00000000000126c0
Mar 12 23:05:16 static1 kernel: ffff88007e5c6500 ffff880037170080 0006ce5c85bd9185 ffff88007e5c64d0
Mar 12 23:05:16 static1 kernel: ffff88007a614ae0 00000001722b64ba ffff88007a615098 ffff8800778b1fd8
Mar 12 23:05:16 static1 kernel: Call Trace:
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a885>] schedule_timeout+0x215/0x2e0
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a503>] wait_for_common+0x123/0x180
Mar 12 23:05:16 static1 kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] ? _xfs_buf_read+0x46/0x60 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a61d>] wait_for_completion+0x1d/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffffa020ff5b>] xfs_buf_iowait+0x9b/0x100 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] _xfs_buf_read+0x46/0x60 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210b3b>] xfs_buf_read+0xab/0x100 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01ee6a4>] xfs_imap_to_bp+0x54/0x130 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01f077b>] xfs_iread+0x7b/0x1b0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff811ab77e>] ? inode_init_always+0x11e/0x1c0
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eb5ee>] xfs_iget+0x27e/0x6e0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae1d>] ? xfs_iunlock+0x5d/0xd0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0209366>] xfs_lookup+0xc6/0x110 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0216024>] xfs_vn_lookup+0x54/0xa0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff8119dc65>] do_lookup+0x1a5/0x230
Mar 12 23:05:16 static1 kernel: [<ffffffff8119e8f4>] __link_path_walk+0x7a4/0x1000
Mar 12 23:05:16 static1 kernel: [<ffffffff811738e7>] ? cache_grow+0x217/0x320
Mar 12 23:05:16 static1 kernel: [<ffffffff8119f40a>] path_walk+0x6a/0xe0
Mar 12 23:05:16 static1 kernel: [<ffffffff8119f61b>] filename_lookup+0x6b/0xc0
Mar 12 23:05:16 static1 kernel: [<ffffffff811a0747>] user_path_at+0x57/0xa0
Mar 12 23:05:16 static1 kernel: [<ffffffffa0204e74>] ? _xfs_trans_commit+0x214/0x2a0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae3e>] ? xfs_iunlock+0x7e/0xd0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff81193bc0>] vfs_fstatat+0x50/0xa0
Mar 12 23:05:16 static1 kernel: [<ffffffff811aaf5d>] ? touch_atime+0x14d/0x1a0
Mar 12 23:05:16 static1 kernel: [<ffffffff81193d3b>] vfs_stat+0x1b/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffff81193d64>] sys_newstat+0x24/0x50
Mar 12 23:05:16 static1 kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
Mar 12 23:05:16 static1 kernel: [<ffffffff810e5a7e>] ? __audit_syscall_exit+0x25e/0x290
Mar 12 23:05:16 static1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
I am wondering if my volume settings are causing this. Can anyone with more knowledge take a look and let me know:
network.remote-dio: on
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.export-volumes: on
network.ping-timeout: 20
cluster.self-heal-readdir-size: 64KB
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: diff
cluster.self-heal-window-size: 8
cluster.heal-timeout: 500
cluster.self-heal-daemon: on
cluster.entry-self-heal: on
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.readdir-optimize: on
cluster.background-self-heal-count: 20
cluster.rebalance-stats: on
cluster.min-free-disk: 5%
cluster.eager-lock: enable
storage.owner-uid: 36
storage.owner-gid: 36
auth.allow:*
user.cifs: disable
cluster.server-quorum-ratio: 51%
Many Thanks, Alastair

Any follow up on this?

Are there known issues using a replica 3 gluster datastore with lvm thin provisioned bricks?

On 20 March 2015 at 15:22, Alastair Neil <ajneil.tech@gmail.com> wrote:
CentOS 6.6
vdsm-4.16.10-8.gitc937927.el6
glusterfs-3.6.2-1.el6
2.6.32-504.8.1.el6.x86_64
moved to 3.6 specifically to get the snapshotting feature, hence my desire to migrate to thinly provisioned lvm bricks.
On 20 March 2015 at 14:57, Darrell Budic <budic@onholyground.com> wrote:
What version of gluster are you running on these?
I’ve seen high load during heals bounce my hosted engine around due to overall system load, but never pause anything else. Cent 7 combo storage/host systems, gluster 3.5.2.
On Mar 20, 2015, at 9:57 AM, Alastair Neil <ajneil.tech@gmail.com> wrote:
Pranith
I have run a pretty straightforward test. I created a two brick 50 G replica volume with normal lvm bricks, and installed two servers, one centos 6.6 and one centos 7.0. I kicked off bonnie++ on both to generate some file system activity and then made the volume replica 3. I saw no issues on the servers.
Not clear if this is a sufficiently rigorous test and the Volume I have had issues on is a 3TB volume with about 2TB used.
-Alastair
On 19 March 2015 at 12:30, Alastair Neil <ajneil.tech@gmail.com> wrote:
I don't think I have the resources to test it meaningfully. I have about 50 vms on my primary storage domain. I might be able to set up a small 50 GB volume and provision 2 or 3 vms running test loads but I'm not sure it would be comparable. I'll give it a try and let you know if I see similar behaviour.
On 19 March 2015 at 11:34, Pranith Kumar Karampuri <pkarampu@redhat.com> wrote:
Without thinly provisioned lvm.
Pranith
On 03/19/2015 08:01 PM, Alastair Neil wrote:
do you mean raw partitions as bricks or simply without thin provisioned lvm?
On 19 March 2015 at 00:32, Pranith Kumar Karampuri <pkarampu@redhat.com> wrote:
Could you let me know if you see this problem without lvm as well?
Pranith
On 03/18/2015 08:25 PM, Alastair Neil wrote:
I am in the process of replacing the bricks with thinly provisioned lvs yes.
On 18 March 2015 at 09:35, Pranith Kumar Karampuri <pkarampu@redhat.com> wrote:
hi, Are you using thin-lvm based backend on which the bricks are created?
Pranith
On 03/18/2015 02:05 AM, Alastair Neil wrote:
I have an oVirt cluster with 6 VM hosts and 4 gluster nodes. There are two virtualisation clusters, one with two Nehalem nodes and one with four Sandy Bridge nodes. My master storage domain is a GlusterFS domain backed by a replica 3 gluster volume from 3 of the gluster nodes. The engine is a hosted engine 3.5.1 on 3 of the Sandy Bridge nodes, with storage provided by nfs from a different gluster volume. All the hosts are CentOS 6.6.
vdsm-4.16.10-8.gitc937927.el6
glusterfs-3.6.2-1.el6
2.6.32-504.8.1.el6.x86_64
Problems happen when I try to add a new brick or replace a brick: eventually the self-heal will kill the VMs. In the VMs' logs I see kernel hung task messages.
Mar 12 23:05:16 static1 kernel: INFO: task nginx:1736 blocked for more than 120 seconds.
Mar 12 23:05:16 static1 kernel: Not tainted 2.6.32-504.3.3.el6.x86_64 #1
Mar 12 23:05:16 static1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 12 23:05:16 static1 kernel: nginx D 0000000000000001 0 1736 1735 0x00000080
Mar 12 23:05:16 static1 kernel: ffff8800778b17a8 0000000000000082 0000000000000000 00000000000126c0
Mar 12 23:05:16 static1 kernel: ffff88007e5c6500 ffff880037170080 0006ce5c85bd9185 ffff88007e5c64d0
Mar 12 23:05:16 static1 kernel: ffff88007a614ae0 00000001722b64ba ffff88007a615098 ffff8800778b1fd8
Mar 12 23:05:16 static1 kernel: Call Trace:
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a885>] schedule_timeout+0x215/0x2e0
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a503>] wait_for_common+0x123/0x180
Mar 12 23:05:16 static1 kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] ? _xfs_buf_read+0x46/0x60 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff8152a61d>] wait_for_completion+0x1d/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffffa020ff5b>] xfs_buf_iowait+0x9b/0x100 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] _xfs_buf_read+0x46/0x60 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0210b3b>] xfs_buf_read+0xab/0x100 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] xfs_trans_read_buf+0x197/0x410 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01ee6a4>] xfs_imap_to_bp+0x54/0x130 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01f077b>] xfs_iread+0x7b/0x1b0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff811ab77e>] ? inode_init_always+0x11e/0x1c0
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eb5ee>] xfs_iget+0x27e/0x6e0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae1d>] ? xfs_iunlock+0x5d/0xd0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0209366>] xfs_lookup+0xc6/0x110 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa0216024>] xfs_vn_lookup+0x54/0xa0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff8119dc65>] do_lookup+0x1a5/0x230
Mar 12 23:05:16 static1 kernel: [<ffffffff8119e8f4>] __link_path_walk+0x7a4/0x1000
Mar 12 23:05:16 static1 kernel: [<ffffffff811738e7>] ? cache_grow+0x217/0x320
Mar 12 23:05:16 static1 kernel: [<ffffffff8119f40a>] path_walk+0x6a/0xe0
Mar 12 23:05:16 static1 kernel: [<ffffffff8119f61b>] filename_lookup+0x6b/0xc0
Mar 12 23:05:16 static1 kernel: [<ffffffff811a0747>] user_path_at+0x57/0xa0
Mar 12 23:05:16 static1 kernel: [<ffffffffa0204e74>] ? _xfs_trans_commit+0x214/0x2a0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae3e>] ? xfs_iunlock+0x7e/0xd0 [xfs]
Mar 12 23:05:16 static1 kernel: [<ffffffff81193bc0>] vfs_fstatat+0x50/0xa0
Mar 12 23:05:16 static1 kernel: [<ffffffff811aaf5d>] ? touch_atime+0x14d/0x1a0
Mar 12 23:05:16 static1 kernel: [<ffffffff81193d3b>] vfs_stat+0x1b/0x20
Mar 12 23:05:16 static1 kernel: [<ffffffff81193d64>] sys_newstat+0x24/0x50
Mar 12 23:05:16 static1 kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
Mar 12 23:05:16 static1 kernel: [<ffffffff810e5a7e>] ? __audit_syscall_exit+0x25e/0x290
Mar 12 23:05:16 static1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
I am wondering if my volume settings are causing this. Can anyone with more knowledge take a look and let me know:
network.remote-dio: on
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.export-volumes: on
network.ping-timeout: 20
cluster.self-heal-readdir-size: 64KB
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: diff
cluster.self-heal-window-size: 8
cluster.heal-timeout: 500
cluster.self-heal-daemon: on
cluster.entry-self-heal: on
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.readdir-optimize: on
cluster.background-self-heal-count: 20
cluster.rebalance-stats: on
cluster.min-free-disk: 5%
cluster.eager-lock: enable
storage.owner-uid: 36
storage.owner-gid: 36
auth.allow:*
user.cifs: disable
cluster.server-quorum-ratio: 51%
Many Thanks, Alastair

On 04/03/2015 10:04 PM, Alastair Neil wrote:
> Any follow up on this?
>
> Are there known issues using a replica 3 gluster datastore with lvm thin provisioned bricks?
>
> CentOS 6.6
>
> vdsm-4.16.10-8.gitc937927.el6
> glusterfs-3.6.2-1.el6
> 2.6.32-504.8.1.el6.x86_64
>
> moved to 3.6 specifically to get the snapshotting feature, hence my desire to migrate to thinly provisioned lvm bricks.

Well, on the glusterfs mailing list there have been discussions:

> 3.6.2 is a major release and introduces some new features in cluster wide concept. Additionally it is not stable yet.

With kind regards,

Jorick Astrego
Netbulae Virtualization Experts

Tel: 053 20 30 270   info@netbulae.eu   Staalsteden 4-3A   KvK 08198180
Fax: 053 20 30 271   www.netbulae.eu    7547 TA Enschede   BTW NL821234584B01
target=3D=22=5Fblank=22=3Ebudic=40onholyground=2Ecom= =3C/a=3E=26gt=3B=3C/span=3E wrote=3A=3Cbr=3E =3Cblockquote class=3D=22gmail=5Fquote=22 style=3D=22marg= in=3A0 0 0 =2E8ex=3Bborder-left=3A1px =23ccc solid=3Bpadding-left= =3A1ex=22=3E =3Cdiv style=3D=22word-wrap=3Abreak-word=22=3EWhat vers= ion of gluster are you running on these=3F =3Cdiv=3E=3Cbr=3E =3C/div=3E =3Cdiv=3EI=92ve seen high load during heals bounce my= hosted engine around due to overall system load=2C= but never pause anything else=2E Cent 7 combo storage/host systems=2C gluster 3=2E5=2E2=2E=3C/div= =3E =3Cdiv=3E =3Cdiv=3E =3Cdiv=3E=3Cbr=3E =3C/div=3E =3Cdiv=3E=3Cbr=3E =3Cdiv=3E =3Cblockquote type=3D=22cite=22=3E =3Cdiv=3EOn Mar 20=2C 2015=2C at 9=3A57 AM= =2C Alastair Neil =26lt=3B=3Ca moz-do-not-send=3D=22true=22 href=3D=22mailto=3Aajneil=2Etech=40gmai= l=2Ecom=22 target=3D=22=5Fblank=22=3Eajneil=2Etech= =40gmail=2Ecom=3C/a=3E=26gt=3B wrote=3A=3C/div=3E =3Cbr=3E =3Cdiv=3E =3Cdiv dir=3D=22ltr=22=3EPranith =3Cdiv=3E=3Cbr=3E =3C/div=3E =3Cdiv=3EI have run a pretty straightforward test=2E=A0 I created= a two brick 50 G replica volume with normal lvm bricks=2C and installed two servers=2C one centos 6=2E6 and= one centos 7=2E0=2E=A0 I kicked off= bonnie++ on both to generate some file system activity and then made the volume replica 3=2E=A0 I saw no= issues on the servers=2E =A0=A0=3C/di= v=3E =3Cdiv=3E=3Cbr=3E =3C/div=3E =3Cdiv=3ENot clear if this is a sufficiently rigorous test and the Volume I have had issues on is a 3TB volume =A0with about 2TB used=2E= =3C/div=3E =3Cdiv=3E=3Cbr=3E =3C/div=3E =3Cdiv=3E-Alastair=3C/div=3E =3Cdiv=3E=3Cbr=3E =3C/div=3E =3Cdiv class=3D=22gmail=5Fextra=22=3E= =3Cbr=3E =3Cdiv class=3D=22gmail=5Fquote=22=3E= On 19 March 2015 at 12=3A30=2C Alastair= Neil =3Cspan dir=3D=22ltr=22=3E=26l= t=3B=3Ca moz-do-not-send=3D=22true=22 href=3D=22mailto=3Aajneil=2Etec= h=40gmail=2Ecom=22 target=3D=22=5Fblank=22=3Eajnei= l=2Etech=40gmail=2Ecom=3C/a=3E=26gt=3B=3C/span=3E wrote=3A=3Cbr=3E =3Cblockquote class=3D=22gmail=5Fqu= ote=22 style=3D=22margin=3A0 0 0 =2E8ex=3Bborder-left=3A1px =23ccc= solid=3Bpadding-left=3A1ex=22=3E= =3Cdiv dir=3D=22ltr=22=3EI don=27= t think I have the resources to test it meaningfully=2E=A0 I have about 50 vms on my primary storage domain=2E=A0 I might be= able to set up a small 50 GB volume and provision 2 or 3 vms running test loads but I=27m not sure it would be comparable=2E=A0 I=27ll give it= a try and let you know if I see similar behaviour=2E=3C/div= =3E =3Cdiv=3E =3Cdiv=3E =3Cdiv class=3D=22gmail=5Fext= ra=22=3E=3Cbr=3E =3Cdiv class=3D=22gmail=5Fq= uote=22=3EOn 19 March 2015 at 11=3A34=2C Pranith Kumar= Karampuri =3Cspan dir=3D=22ltr=22=3E=26lt= =3B=3Ca moz-do-not-send=3D=22= true=22 href=3D=22mailto=3Apkarampu=40redhat=2Ecom=22 target=3D=22=5Fblank=22=3Epka= rampu=40redhat=2Ecom=3C/a=3E=26gt=3B=3C/span=3E wrote=3A=3Cbr=3E =3Cblockquote class=3D=22gmail=5Fquot= e=22 style=3D=22margin=3A0 0= 0 =2E8ex=3Bborder-left=3A= 1px =23ccc solid=3Bpadding-left=3A= 1ex=22=3E =3Cdiv text=3D=22=23000= 000=22 bgcolor=3D=22=23FFFFF= F=22=3E Without thinly provisioned lvm=2E=3C= span=3E=3Cfont color=3D=22=23888888=22=3E=3Cbr=3E =3Cbr=3E Pranith=3C/font= =3E=3C/span=3E =3Cdiv=3E =3Cdiv=3E=3Cbr=3E= =3Cdiv=3EOn 03/19/2015 08=3A01 PM=2C Alastair Neil wrote=3A=3Cbr=3E= =3C/div=3E =3Cblockquote type=3D=22cite=22= =3E =3Cdiv dir=3D=22l= tr=22=3Edo you mean raw partitions as bricks or simply with out thin provisioned lvm=3F =3Cdiv=3E=3Cbr=3E= =3C/div=3E =3Cdiv=3E=3Cbr=3E= =3C/div=3E =3C/div=3E =3Cdiv 
class=3D=22gmail= =5Fextra=22=3E=3Cbr=3E =3Cdiv class=3D=22gmail= =5Fquote=22=3EOn 19 March 2015 at 00=3A32=2C Pranith Kumar Karampuri =3Cspan= dir=3D=22ltr=22= =3E=26lt=3B=3Ca moz-do-not-send=3D=22true=22 href=3D=22mailto=3Apkarampu=40redhat=2Ecom=22= target=3D=22=5Fblank=22=3Epkarampu=40redhat=2Ecom=3C/a=3E=26gt=3B=3C/span= =3E wrote=3A=3Cbr=3E= =3Cblockquote class=3D=22gmail= =5Fquote=22 style=3D=22margin= =3A0 0 0 =2E8ex=3Bborder-l= eft=3A1px =23ccc solid=3Bpadding-l= eft=3A1ex=22=3E =3Cdiv text=3D=22=230000= 00=22 bgcolor=3D=22=23FFFFFF=22=3E Could you let me know if you see this problem= without lvm as well=3F=3C= span=3E=3Cfont color=3D=22=23888888=22=3E=3Cbr=3E =3Cbr=3E Pranith=3C/font= =3E=3C/span=3E =3Cdiv=3E =3Cdiv=3E=3Cbr=3E= =3Cdiv=3EOn 03/18/2015 08=3A25 PM=2C Alastair Neil wrote=3A=3Cbr=3E= =3C/div=3E =3Cblockquote type=3D=22cite=22= =3E =3Cdiv dir=3D=22l= tr=22=3EI am in the process of replacing the bricks with thinly provisioned lvs yes=2E =3Cdiv=3E=3Cbr=3E= =3C/div=3E =3Cdiv=3E=3Cbr=3E= =3C/div=3E =3C/div=3E =3Cdiv class=3D=22gmail= =5Fextra=22=3E=3Cbr=3E =3Cdiv class=3D=22gmail= =5Fquote=22=3EOn 18 March 2015 at 09=3A35=2C Pranith Kumar Karampuri =3Cspan= dir=3D=22ltr=22= =3E=26lt=3B=3Ca moz-do-not-send=3D=22true=22 href=3D=22mailto=3Apkarampu=40redhat=2Ecom=22= target=3D=22=5Fblank=22=3Epkarampu=40redhat=2Ecom=3C/a=3E=26gt=3B=3C/span= =3E wrote=3A=3Cbr=3E= =3Cblockquote class=3D=22gmail= =5Fquote=22 style=3D=22margin= =3A0 0 0 =2E8ex=3Bborder-l= eft=3A1px =23ccc solid=3Bpadding-l= eft=3A1ex=22=3E =3Cdiv text=3D=22=230000= 00=22 bgcolor=3D=22=23FFFFFF=22=3E hi=2C=3Cbr=3E =A0=A0=A0=A0=A0 A= re you using thin-lvm based backend on which the bricks are created=3F=3Cbr= =3E =3Cbr=3E Pranith =3Cdiv=3E =3Cdiv=3E=3Cbr=3E= =3Cdiv=3EOn 03/18/2015 02=3A05 AM=2C Alastair Neil wrote=3A=3Cbr=3E= =3C/div=3E =3C/div=3E =3C/div=3E =3Cblockquote type=3D=22cite=22= =3E =3Cdiv=3E =3Cdiv=3E =3Cdiv dir=3D=22l= tr=22=3EI have a Ovirt cluster with 6 VM hosts and 4 gluster nodes=2E= There are two virtualisation clusters one with two nehelem nodes and one with =A0four =A0sandybridge nodes=2E My master storage domain is a GlusterFS backed by a replica 3 gluster volume from 3 of the gluster nodes=2E=A0 The= engine is a hosted engine 3=2E5=2E1 on 3 of= the sandybridge nodes=2C with storage broviede by nfs from a different gluster volume=2E=A0 All= the hosts are CentOS 6=2E6=2E= =3Cdiv=3E=3Cbr=3E= =3C/div=3E =3Cblockquote class=3D=22gmail= =5Fquote=22 style=3D=22margin= =3A0px 0px 0px 0=2E8ex=3Bborder-left-width=3A1px=3Bborder-left-color=3Argb=28204=2C204=2C2= 04=29=3Bborder-left-style=3Asolid=3Bpadding-left=3A1ex=22=3E=A0vdsm-4=2E16= =2E10-8=2Egitc937927=2Eel6=3Cbr=3E glusterfs-3=2E6=2E2-1=2Eel6=3Cbr=3E 2=2E6=2E32 - 504=2E8=2E1=2Eel6= =2Ex86=5F64=3C/blockquote=3E =3Cdiv=3E=3Cbr=3E= =3C/div=3E =3Cdiv=3EProblems= happen when I try to add a new brick or replace a brick eventually the self heal will kill the VMs=2E= In the VM=27s logs I see kernel hung task messages=2E=A0=3C= /div=3E =3Cdiv=3E=3Cbr=3E= =3C/div=3E =3Cdiv=3E =3Cblockquote class=3D=22gmail= =5Fquote=22 style=3D=22margin= =3A0px 0px 0px 0=2E8ex=3Bborder-left-width=3A1px=3Bborder-left-color=3Argb=28204=2C204=2C2= 04=29=3Bborder-left-style=3Asolid=3Bpadding-left=3A1ex=22=3E=3Cfont face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A INFO=3A= task nginx=3A1736 blocked for more than 120 seconds=2E=3Cbr= =3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =A0=20= 
=A0 =A0Not tainted 2=2E6=2E32-504=2E= 3=2E3=2Eel6=2Ex86=5F64 =231=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =22echo= 0 =26gt=3B /proc/sys/kernel/= hung=5Ftask=5Ftimeout=5Fsecs=22 disables this message=2E=3Cbr= =3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A nginx= =A0 =A0 =A0 =A0 D= 0000000000000001= =A0 =A0 0 =A01736= =A0 1735 0x00000080=3Cbr= =3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A ffff8800778b17a8= 0000000000000082= 0000000000000000= 00000000000126c0=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A ffff88007e5c6500= ffff880037170080= 0006ce5c85bd9185= ffff88007e5c64d0=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A ffff88007a614ae0= 00000001722b64ba= ffff88007a615098= ffff8800778b1fd8=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A Call Trace=3A=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff8152a885=26gt=3B=5D schedule=5Ftimeout+0x215/0x2e0=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff8152a503=26gt=3B=5D wait=5Ffor=5Fcommon+0x123/0x180=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff81064b90=26gt=3B=5D =3F default=5Fwake=5F= function+0x0/0x20=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa0210a76=26gt=3B=5D =3F =5Fxfs=5Fbuf=5Fre= ad+0x46/0x60 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa02063c7=26gt=3B=5D =3F xfs=5Ftrans=5Frea= d=5Fbuf+0x197/0x410 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff8152a61d=26gt=3B=5D wait=5Ffor=5Fcompletion+0x1d/0x20=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa020ff5b=26gt=3B=5D xfs=5Fbuf=5Fiowai= t+0x9b/0x100 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa02063c7=26gt=3B=5D =3F xfs=5Ftrans=5Frea= d=5Fbuf+0x197/0x410 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa0210a76=26gt=3B=5D =5Fxfs=5Fbuf=5Fre= ad+0x46/0x60 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa0210b3b=26gt=3B=5D xfs=5Fbuf=5Fread+= 0xab/0x100 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa02063c7=26gt=3B=5D xfs=5Ftrans=5Frea= d=5Fbuf+0x197/0x410 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa01ee6a4=26gt=3B=5D xfs=5Fimap=5Fto= =5Fbp+0x54/0x130 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 
23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa01f077b=26gt=3B=5D xfs=5Firead+0x7b/= 0x1b0 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff811ab77e=26gt=3B=5D =3F inode=5Finit=5Fal= ways+0x11e/0x1c0=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa01eb5ee=26gt=3B=5D xfs=5Figet+0x27e/= 0x6e0 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa01eae1d=26gt=3B=5D =3F xfs=5Fiunlock+0x5= d/0xd0 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa0209366=26gt=3B=5D xfs=5Flookup+0xc6= /0x110 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa0216024=26gt=3B=5D xfs=5Fvn=5Flookup= +0x54/0xa0 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff8119dc65=26gt=3B=5D do=5Flookup+0x1a5/0x230=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff8119e8f4=26gt=3B=5D =5F=5Flink=5Fpath=5Fwalk+0x7a4/0x1000=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff811738e7=26gt=3B=5D =3F cache=5Fgrow+0x21= 7/0x320=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff8119f40a=26gt=3B=5D path=5Fwalk+0x6a/0xe0=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff8119f61b=26gt=3B=5D filename=5Flookup+0x6b/0xc0=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff811a0747=26gt=3B=5D user=5Fpath=5Fat+0x57/0xa0=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa0204e74=26gt=3B=5D =3F =5Fxfs=5Ftrans=5F= commit+0x214/0x2a0 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ffa01eae3e=26gt=3B=5D =3F xfs=5Fiunlock+0x7= e/0xd0 =5Bxfs=5D=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff81193bc0=26gt=3B=5D vfs=5Ffstatat+0x50/0xa0=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff811aaf5d=26gt=3B=5D =3F touch=5Fatime+0x1= 4d/0x1a0=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff81193d3b=26gt=3B=5D vfs=5Fstat+0x1b/0x20=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff81193d64=26gt=3B=5D sys=5Fnewstat+0x24/0x50=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff810e5c87=26gt=3B=5D =3F audit=5Fsyscall= =5Fentry+0x1d7/0x200=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A 
=5B=26lt=3Bffffff= ff810e5a7e=26gt=3B=5D =3F =5F=5Faudit=5Fsys= call=5Fexit+0x25e/0x290=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3EMa= r 12 23=3A05=3A16= static1 kernel=3A =5B=26lt=3Bffffff= ff8100b072=26gt=3B=5D system=5Fcall=5Ffastpath+0x16/0x1b=3C/font=3E=3C/blockquote=3E =3C/div=3E =3Cdiv=3E=3Cbr=3E= =3C/div=3E =3Cdiv=3E=3Cbr=3E= =3C/div=3E =3Cdiv=3EI am wondering if my volume settings are causing this=2E= =A0 Can anyone with more knowledge take a look and let me know=3A=3C/div= =3E =3Cdiv=3E=3Cbr=3E= =3C/div=3E =3Cdiv=3E =3Cblockquote class=3D=22gmail= =5Fquote=22 style=3D=22margin= =3A0px 0px 0px 0=2E8ex=3Bborder-left-width=3A1px=3Bborder-left-color=3Argb=28204=2C204=2C2= 04=29=3Bborder-left-style=3Asolid=3Bpadding-left=3A1ex=22=3E=3Cfont face=3D=22monospa= ce=2C monospace=22=3Ene= twork=2Eremote-dio=3A on=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Epe= rformance=2Estat-prefetch=3A off=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Epe= rformance=2Eio-cache=3A off=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Epe= rformance=2Eread-ahead=3A off=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Epe= rformance=2Equick-read=3A off=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Enf= s=2Eexport-volumes=3A on=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ene= twork=2Eping-timeout=3A 20=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Eself-heal-readdir-size=3A 64KB=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Equorum-type=3A auto=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Edata-self-heal-algorithm=3A diff=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Eself-heal-window-size=3A 8=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Eheal-timeout=3A 500=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Eself-heal-daemon=3A on=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Eentry-self-heal=3A on=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Edata-self-heal=3A on=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Emetadata-self-heal=3A on=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Ereaddir-optimize=3A on=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Ebackground-self-heal-count=3A 20=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Erebalance-stats=3A on=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Emin-free-disk=3A 5=25=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Eeager-lock=3A enable=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Est= orage=2Eowner-uid=3A 36=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Est= orage=2Eowner-gid=3A 36=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Eau= th=2Eallow=3A*=3Cbr=3E =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Eus= er=2Ecifs=3A disable=3Cbr=3E= =3C/font=3E=3Cfon= t face=3D=22monospa= ce=2C monospace=22=3Ecl= uster=2Eserver-quorum-ratio=3A 51=25=3C/font=3E= =3C/blockquote=3E =3C/div=3E =3Cdiv=3E=3Cbr=3E= =3C/div=3E =3Cdiv=3EMany Thanks=2C =A0Alastair=3C/di= v=3E 
=3Cdiv=3E=3Cbr=3E= =3C/div=3E =3C/div=3E =3Cbr=3E =3Cfieldset=3E=3C= /fieldset=3E =3Cbr=3E =3C/div=3E =3C/div=3E =3Cpre=3E=5F=5F= =5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F= =5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F Users mailing list =3Ca moz-do-not-send=3D=22true=22 href=3D=22mailto=3AUsers=40ovirt=2Eorg=22= target=3D=22=5Fblank=22=3EUsers=40ovirt=2Eorg=3C/a=3E =3Ca moz-do-not-send=3D=22true=22 href=3D=22http=3A//lists=2Eovirt=2Eorg/ma= ilman/listinfo/users=22 target=3D=22=5Fblank=22=3Ehttp=3A//lists=2Eovirt=2E= org/mailman/listinfo/users=3C/a=3E =3C/pre=3E =3C/blockquote=3E= =3Cbr=3E =3C/div=3E =3Cbr=3E =5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F= =5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=3Cbr=3E= Users mailing list=3Cbr=3E =3Ca moz-do-not-send= =3D=22true=22 href=3D=22mailto=3AUsers=40ovirt=2Eorg=22 target=3D=22=5Fblank=22=3EUsers= =40ovirt=2Eorg=3C/a=3E=3Cbr=3E =3Ca moz-do-not-send= =3D=22true=22 href=3D=22http=3A//lists=2Eovirt=2Eorg/mailman/listinfo/users=22 target=3D= =22=5Fblank=22=3Ehttp=3A//lists=2Eovirt=2Eorg/mailman/listinfo/users=3C/a= =3E=3Cbr=3E =3Cbr=3E =3C/blockquote=3E= =3C/div=3E =3Cbr=3E =3C/div=3E =3C/blockquote=3E= =3Cbr=3E =3C/div=3E =3C/div=3E =3C/div=3E =3C/blockquote=3E= =3C/div=3E =3Cbr=3E =3C/div=3E =3C/blockquote=3E= =3Cbr=3E =3C/div=3E =3C/div=3E =3C/div=3E =3C/blockquote=3E =3C/div=3E =3Cbr=3E =3C/div=3E =3C/div=3E =3C/div=3E =3C/blockquote=3E =3C/div=3E =3Cbr=3E =3C/div=3E =3C/div=3E =5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F= =5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=3Cbr=3E= Users mailing list=3Cbr=3E =3Ca moz-do-not-send=3D=22true=22 href=3D=22mailto=3AUsers=40ovirt=2Eorg= =22 target=3D=22=5Fblank=22=3EUsers=40ovirt= =2Eorg=3C/a=3E=3Cbr=3E =3Ca moz-do-not-send=3D=22true=22 href=3D=22http=3A//lists=2Eovirt=2Eorg/= mailman/listinfo/users=22 target=3D=22=5Fblank=22=3Ehttp=3A//list= s=2Eovirt=2Eorg/mailman/listinfo/users=3C/a=3E=3Cbr=3E =3C/div=3E =3C/blockquote=3E =3C/div=3E =3Cbr=3E =3C/div=3E =3C/div=3E =3C/div=3E =3C/div=3E =3C/blockquote=3E =3C/div=3E =3Cbr=3E =3C/div=3E =3C/div=3E =3Cdiv class=3D=22HOEnZb=22=3E =3Cdiv class=3D=22h5=22=3E=3Cbr=3E =3C/div=3E =3C/div=3E =3C/blockquote=3E =3C/div=3E =3C/div=3E =3C/blockquote=3E =3Cbr=3E =3Cbr=3E =3Cblockquote cite=3D=22mid=3ACA+SarwqNuvVGUDDjhDRbNii-foMGAyaVibxyMGM5AEPzRkDu+w=40mail= =2Egmail=2Ecom=22 type=3D=22cite=22=3E =3Cdiv class=3D=22gmail=5Fextra=22=3E =3Cdiv class=3D=22gmail=5Fquote=22=3E =3Cblockquote class=3D=22gmail=5Fquote=22 style=3D=22margin=3A0 0= 0 =2E8ex=3Bborder-left=3A1px =23ccc solid=3Bpadding-left=3A1ex=22= =3E =3Cdiv class=3D=22HOEnZb=22=3E =3Cdiv class=3D=22h5=22=3E =3C/div=3E =3C/div=3E =3C/blockquote=3E =3C/div=3E =3Cbr=3E =3C/div=3E =3Cbr=3E =3Cfieldset class=3D=22mimeAttachmentHeader=22=3E=3C/fieldset=3E =3Cbr=3E =3Cpre wrap=3D=22=22=3E=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F= =5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F=5F= =5F=5F=5F=5F=5F=5F=5F Users mailing list =3Ca class=3D=22moz-txt-link-abbreviated=22 href=3D=22mailto=3AUsers=40ovir= t=2Eorg=22=3EUsers=40ovirt=2Eorg=3C/a=3E =3Ca class=3D=22moz-txt-link-freetext=22 href=3D=22http=3A//lists=2Eovirt= =2Eorg/mailman/listinfo/users=22=3Ehttp=3A//lists=2Eovirt=2Eorg/mailman/lis= tinfo/users=3C/a=3E =3C/pre=3E =3C/blockquote=3E =3Cbr=3E =20= =3CBR /=3E =3CBR /=3E =3Cb style=3D=22color=3A=23604c78=22=3E=3C/b=3E=3Cbr=3E=3Cspan 
style=3D=22c= olor=3A=23604c78=3B=22=3E=3Cfont color=3D=22000000=22=3E=3Cspan style=3D=22= mso-fareast-language=3Aen-gb=3B=22 lang=3D=22NL=22=3EMet vriendelijke groet= =2C With kind regards=2C=3Cbr=3E=3Cbr=3E=3C/span=3EJorick Astrego=3C/font= =3E=3C/span=3E=3Cb style=3D=22color=3A=23604c78=22=3E=3Cbr=3E=3Cbr=3ENetbul= ae Virtualization Experts =3C/b=3E=3Cbr=3E=3Chr style=3D=22border=3Anone=3B= border-top=3A1px solid =23ccc=3B=22=3E=3Ctable style=3D=22width=3A 522px=22= =3E=3Ctbody=3E=3Ctr=3E=3Ctd style=3D=22width=3A 130px=3Bfont-size=3A 10px= =22=3ETel=3A 053 20 30 270=3C/td=3E =3Ctd style=3D=22width=3A 130px=3Bf= ont-size=3A 10px=22=3Einfo=40netbulae=2Eeu=3C/td=3E =3Ctd style=3D=22wid= th=3A 130px=3Bfont-size=3A 10px=22=3EStaalsteden 4-3A=3C/td=3E =3Ctd sty= le=3D=22width=3A 130px=3Bfont-size=3A 10px=22=3EKvK 08198180=3C/td=3E=3C/tr= =3E=3Ctr=3E =3Ctd style=3D=22width=3A 130px=3Bfont-size=3A 10px=22=3EFax= =3A 053 20 30 271=3C/td=3E =3Ctd style=3D=22width=3A 130px=3Bfont-size= =3A 10px=22=3Ewww=2Enetbulae=2Eeu=3C/td=3E =3Ctd style=3D=22width=3A 130= px=3Bfont-size=3A 10px=22=3E7547 TA Enschede=3C/td=3E =3Ctd style=3D=22w= idth=3A 130px=3Bfont-size=3A 10px=22=3EBTW NL821234584B01=3C/td=3E=3C/tr=3E= =3C/tbody=3E=3C/table=3E=3Cbr=3E=3Chr style=3D=22border=3Anone=3Bborder-top= =3A1px solid =23ccc=3B=22=3E=3CBR /=3E =3C/body=3E =3C/html=3E --------------030308050700080206070001--

I hadn't revisited it yet, but it is possible to use cgroups to limit glusterfs's cpu usage, might help you out.

Andrew Klau has a blog post about it: http://www.andrewklau.com/controlling-glusterfsd-cpu-outbreaks-with-cgroups/

Careful about how far you throttle it down: if it's your VM's disk it's rebuilding, you'll pause it anyway, I'd expect.
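For illustration only, a throttle along those lines can be put in place with the libcgroup tools that ship with EL6; the group name "glusterfs" and the cpu.shares value below are placeholders, not settings taken from this thread:

    # create a cgroup with a reduced CPU weight (1024 is the default share)
    cgcreate -g cpu:/glusterfs
    cgset -r cpu.shares=256 glusterfs
    # move the running brick daemons into the group
    cgclassify -g cpu:/glusterfs $(pidof glusterfsd)

On EL6 the same group definition can be made persistent via /etc/cgconfig.conf and the cgconfig service, so the limit survives a restart of the brick processes.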
On Apr 4, 2015, at 8:57 AM, Jorick Astrego <j.astrego@netbulae.eu> wrote:

On 04/03/2015 10:04 PM, Alastair Neil wrote:

Any follow up on this?

Are there known issues using a replica 3 gluster datastore with lvm thin provisioned bricks?

On 20 March 2015 at 15:22, Alastair Neil <ajneil.tech@gmail.com> wrote:

CentOS 6.6

vdsm-4.16.10-8.gitc937927.el6
glusterfs-3.6.2-1.el6
2.6.32 - 504.8.1.el6.x86_64

moved to 3.6 specifically to get the snapshotting feature, hence my desire to migrate to thinly provisioned lvm bricks.

Well, on the glusterfs mailing list there have been discussions:

"3.6.2 is a major release and introduces some new features in cluster wide concept. Additionally it is not stable yet."

On 20 March 2015 at 14:57, Darrell Budic <budic@onholyground.com> wrote:

What version of gluster are you running on these?

I've seen high load during heals bounce my hosted engine around due to overall system load, but never pause anything else. Cent 7 combo storage/host systems, gluster 3.5.2.
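For reference, the thinly provisioned lvm bricks Alastair mentions are what the gluster 3.6 volume snapshot feature expects: each brick sits on a thin LV inside a thin pool. A minimal sketch, with the device, volume group and sizes as placeholders rather than values from this thread:

    pvcreate /dev/sdb
    vgcreate vg_bricks /dev/sdb
    # thin pool sized below the VG; the thin LV on top can be overcommitted
    lvcreate -L 900G --thinpool brickpool vg_bricks
    lvcreate -V 1T --thin -n brick1 vg_bricks/brickpool
    # XFS with 512-byte inodes is the usual layout for gluster bricks
    mkfs.xfs -i size=512 /dev/vg_bricks/brick1
    mkdir -p /bricks/brick1 && mount /dev/vg_bricks/brick1 /bricks/brick1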
--Apple-Mail=_328EDD08-7E1B-45BB-8C4D-3B30BE5DF33F Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=windows-1252 <html><head><meta http-equiv=3D"Content-Type" content=3D"text/html = charset=3Dwindows-1252"></head><body style=3D"word-wrap: break-word; = -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" = class=3D"">I hadn=92t revisited it yet, but it is possible to use = cgroups to limit glusterfs=92s cpu usage, might help you out.<div = class=3D""><br class=3D""></div><div class=3D"">Andrew Wklau has a blog = post about it: <a = href=3D"http://www.andrewklau.com/controlling-glusterfsd-cpu-outbreaks-wit= h-cgroups/" = class=3D"">http://www.andrewklau.com/controlling-glusterfsd-cpu-outbreaks-= with-cgroups/</a></div><div class=3D""><br class=3D""></div><div = class=3D"">Careful about how far you throttle it down, if it=92s your = VMs disk it=92s rebuilding, you=92ll pause it anyway I=92d = expect.</div><div class=3D""><br class=3D""><div><blockquote type=3D"cite"= class=3D""><div class=3D"">On Apr 4, 2015, at 8:57 AM, Jorick Astrego = <<a href=3D"mailto:j.astrego@netbulae.eu" = class=3D"">j.astrego@netbulae.eu</a>> wrote:</div><br = class=3D"Apple-interchange-newline"><div class=3D""> =20 <meta content=3D"text/html; charset=3Dwindows-1252" = http-equiv=3D"Content-Type" class=3D""> =20 <div bgcolor=3D"#FFFFFF" text=3D"#000000" class=3D""> <br class=3D""> <br class=3D""> <div class=3D"moz-cite-prefix">On 04/03/2015 10:04 PM, Alastair Neil wrote:<br class=3D""> </div> <blockquote = cite=3D"mid:CA+SarwqNuvVGUDDjhDRbNii-foMGAyaVibxyMGM5AEPzRkDu+w@mail.gmail= .com" type=3D"cite" class=3D""> <div dir=3D"ltr" class=3D"">Any follow up on this? <div class=3D""><br class=3D""> </div> <div class=3D""> Are there known issues using a replica 3 = glsuter datastore with lvm thin provisioned bricks?</div> </div> <div class=3D"gmail_extra"><br class=3D""> <div class=3D"gmail_quote">On 20 March 2015 at 15:22, Alastair Neil <span dir=3D"ltr" class=3D""><<a = moz-do-not-send=3D"true" href=3D"mailto:ajneil.tech@gmail.com" = target=3D"_blank" class=3D"">ajneil.tech@gmail.com</a>></span> wrote:<br class=3D""> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div dir=3D"ltr" class=3D""> <div class=3D"">CentOS 6.6</div> <span class=3D""> <div class=3D""> </div> <blockquote class=3D"gmail_quote" = style=3D"font-size:13px;margin:0px 0px 0px = 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left= -style:solid;padding-left:1ex"> vdsm-4.16.10-8.gitc937927.el6<br = class=3D""> glusterfs-3.6.2-1.el6<br class=3D""> 2.6.32 - 504.8.1.el6.x86_64</blockquote> <div class=3D""><br class=3D""> </div> </span> <div class=3D"">moved to 3.6 specifically to get the = snapshotting feature, hence my desire to migrate to thinly provisioned lvm bricks.</div> </div> </blockquote> </div> </div> </blockquote> <br class=3D""> <br class=3D""> Well on the glusterfs mailinglist there have been discussions:<br = class=3D""> <br class=3D""> <br class=3D""> <blockquote type=3D"cite" class=3D"">3.6.2 is a major release and = introduces some new features in cluster wide concept. 
Additionally it is not stable yet.</blockquote> <br class=3D""> <br class=3D""> <br class=3D""> <br class=3D""> <blockquote = cite=3D"mid:CA+SarwqNuvVGUDDjhDRbNii-foMGAyaVibxyMGM5AEPzRkDu+w@mail.gmail= .com" type=3D"cite" class=3D""> <div class=3D"gmail_extra"> <div class=3D"gmail_quote"> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div dir=3D"ltr" class=3D""> <div class=3D""><br class=3D""> </div> <div class=3D"gmail_extra"><br class=3D""> <div class=3D"gmail_quote">On 20 March 2015 at 14:57, Darrell Budic <span dir=3D"ltr" class=3D""><<a = moz-do-not-send=3D"true" href=3D"mailto:budic@onholyground.com" = target=3D"_blank" class=3D"">budic@onholyground.com</a>></span> wrote:<br class=3D""> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 = 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div style=3D"word-wrap:break-word" class=3D"">What = version of gluster are you running on these? <div class=3D""><br class=3D""> </div> <div class=3D"">I=92ve seen high load during heals = bounce my hosted engine around due to overall system load, but never pause anything else. Cent 7 combo storage/host systems, gluster 3.5.2.</div> <div class=3D""> <div class=3D""> <div class=3D""><br class=3D""> </div> <div class=3D""><br class=3D""> <div class=3D""> <blockquote type=3D"cite" class=3D""> <div class=3D"">On Mar 20, 2015, at 9:57 = AM, Alastair Neil <<a = moz-do-not-send=3D"true" href=3D"mailto:ajneil.tech@gmail.com" = target=3D"_blank" class=3D"">ajneil.tech@gmail.com</a>> wrote:</div> <br class=3D""> <div class=3D""> <div dir=3D"ltr" class=3D"">Pranith <div class=3D""><br class=3D""> </div> <div class=3D"">I have run a pretty straightforward test. I = created a two brick 50 G replica volume with normal lvm bricks, and installed two servers, one centos 6.6 and one centos 7.0. I kicked off bonnie++ on both to generate some file system activity and then made the volume replica 3. I saw = no issues on the servers. = </div> <div class=3D""><br class=3D""> </div> <div class=3D"">Not clear if this is = a sufficiently rigorous test and the Volume I have had issues on is a 3TB volume with about 2TB = used.</div> <div class=3D""><br class=3D""> </div> <div class=3D"">-Alastair</div> <div class=3D""><br class=3D""> </div> <div class=3D"gmail_extra"><br = class=3D""> <div class=3D"gmail_quote">On 19 March 2015 at 12:30, Alastair Neil <span dir=3D"ltr" = class=3D""><<a moz-do-not-send=3D"true" = href=3D"mailto:ajneil.tech@gmail.com" target=3D"_blank" = class=3D"">ajneil.tech@gmail.com</a>></span> wrote:<br class=3D""> <blockquote class=3D"gmail_quote" = style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div dir=3D"ltr" class=3D"">I = don't think I have the resources to test it meaningfully. I = have about 50 vms on my primary storage domain. I = might be able to set up a small 50 GB volume and provision 2 or 3 vms running test loads but I'm not sure it would be comparable. 
I'll give = it a try and let you know if I see similar behaviour.</div> <div class=3D""> <div class=3D""> <div = class=3D"gmail_extra"><br class=3D""> <div = class=3D"gmail_quote">On 19 March 2015 at 11:34, Pranith Kumar Karampuri <span = dir=3D"ltr" class=3D""><<a moz-do-not-send=3D"true" = href=3D"mailto:pkarampu@redhat.com" target=3D"_blank" = class=3D"">pkarampu@redhat.com</a>></span> wrote:<br class=3D""> <blockquote = class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc = solid;padding-left:1ex"> <div text=3D"#000000" = bgcolor=3D"#FFFFFF" class=3D""> Without thinly provisioned = lvm.<span class=3D""><font color=3D"#888888" class=3D""><br class=3D""> <br class=3D""> = Pranith</font></span> <div class=3D""> <div = class=3D""><br class=3D""> <div = class=3D"">On 03/19/2015 08:01 PM, Alastair Neil wrote:<br = class=3D""> </div> <blockquote = type=3D"cite" class=3D""> <div dir=3D"ltr"= class=3D"">do you mean raw partitions as bricks or simply with out thin provisioned lvm? <div = class=3D""><br class=3D""> </div> <div = class=3D""><br class=3D""> </div> </div> <div = class=3D"gmail_extra"><br class=3D""> <div = class=3D"gmail_quote">On 19 March 2015 at 00:32, Pranith Kumar Karampuri = <span dir=3D"ltr" class=3D""><<a moz-do-not-send=3D"true" = href=3D"mailto:pkarampu@redhat.com" target=3D"_blank" = class=3D"">pkarampu@redhat.com</a>></span> wrote:<br = class=3D""> <blockquote = class=3D"gmail_quote" style=3D"margin:0 0 0 = .8ex;border-left:1px #ccc = solid;padding-left:1ex"> <div = text=3D"#000000" bgcolor=3D"#FFFFFF" class=3D""> Could you let me know = if you see this problem without lvm as = well?<span class=3D""><font color=3D"#888888" class=3D""><br class=3D""> <br class=3D""> = Pranith</font></span> <div class=3D"">= <div = class=3D""><br class=3D""> <div = class=3D"">On 03/18/2015 08:25 PM, Alastair Neil wrote:<br = class=3D""> </div> <blockquote = type=3D"cite" class=3D""> <div dir=3D"ltr"= class=3D"">I am in the process of replacing the bricks with thinly provisioned lvs yes. <div = class=3D""><br class=3D""> </div> <div = class=3D""><br class=3D""> </div> </div> <div = class=3D"gmail_extra"><br class=3D""> <div = class=3D"gmail_quote">On 18 March 2015 at 09:35, Pranith Kumar Karampuri = <span dir=3D"ltr" class=3D""><<a moz-do-not-send=3D"true" = href=3D"mailto:pkarampu@redhat.com" target=3D"_blank" = class=3D"">pkarampu@redhat.com</a>></span> wrote:<br = class=3D""> <blockquote = class=3D"gmail_quote" style=3D"margin:0 0 0 = .8ex;border-left:1px #ccc = solid;padding-left:1ex"> <div = text=3D"#000000" bgcolor=3D"#FFFFFF" class=3D""> hi,<br class=3D""> = Are you using thin-lvm based backend on which the bricks are created?<br = class=3D""> <br class=3D""> Pranith <div class=3D"">= <div = class=3D""><br class=3D""> <div = class=3D"">On 03/18/2015 02:05 AM, Alastair Neil wrote:<br = class=3D""> </div> </div> </div> <blockquote = type=3D"cite" class=3D""> <div class=3D"">= <div class=3D"">= <div dir=3D"ltr"= class=3D"">I have a Ovirt cluster with 6 VM hosts and 4 gluster nodes. There are two virtualisation clusters one with two nehelem nodes and one with four = sandybridge nodes. My master storage domain is a GlusterFS backed by a replica 3 gluster volume from 3 of the gluster nodes. = The engine is a hosted engine 3.5.1 on 3 of the sandybridge nodes, with storage broviede by nfs from a different gluster volume. = All the hosts are CentOS 6.6. 
<div = class=3D""><br class=3D""> </div> <blockquote = class=3D"gmail_quote" style=3D"margin:0px 0px 0px = 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left= -style:solid;padding-left:1ex"> vdsm-4.16.10-8.gitc937927.el6<br = class=3D""> glusterfs-3.6.2-1.el6<br class=3D""> 2.6.32 - = 504.8.1.el6.x86_64</blockquote> <div = class=3D""><br class=3D""> </div> <div = class=3D"">Problems happen when I try to add a new brick or replace a brick eventually the self heal will kill the VMs. In the VM's logs I see kernel hung task = messages. </div> <div = class=3D""><br class=3D""> </div> <div class=3D"">= <blockquote = class=3D"gmail_quote" style=3D"margin:0px 0px 0px = 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left= -style:solid;padding-left:1ex"><font face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: INFO: task nginx:1736 blocked for more than 120 seconds.<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = Not = tainted = 2.6.32-504.3.3.el6.x86_64 #1<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: "echo 0 > = /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: nginx = D = 0000000000000001 = 0 1736 1735 0x00000080<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = ffff8800778b17a8 = 0000000000000082 = 0000000000000000 00000000000126c0<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = ffff88007e5c6500 = ffff880037170080 = 0006ce5c85bd9185 ffff88007e5c64d0<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = ffff88007a614ae0 = 00000001722b64ba = ffff88007a615098 ffff8800778b1fd8<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: Call Trace:<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff8152a885>] schedule_timeout+0x215/0x2e0<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff8152a503>] wait_for_common+0x123/0x180<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff81064b90>] ? = default_wake_function+0x0/0x20<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa0210a76>] ? = _xfs_buf_read+0x46/0x60 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa02063c7>] ? = xfs_trans_read_buf+0x197/0x410 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff8152a61d>] wait_for_completion+0x1d/0x20<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa020ff5b>] = xfs_buf_iowait+0x9b/0x100 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa02063c7>] ? 
= xfs_trans_read_buf+0x197/0x410 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa0210a76>] = _xfs_buf_read+0x46/0x60 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa0210b3b>] = xfs_buf_read+0xab/0x100 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa02063c7>] = xfs_trans_read_buf+0x197/0x410 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa01ee6a4>] = xfs_imap_to_bp+0x54/0x130 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa01f077b>] = xfs_iread+0x7b/0x1b0 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff811ab77e>] ? = inode_init_always+0x11e/0x1c0<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa01eb5ee>] = xfs_iget+0x27e/0x6e0 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa01eae1d>] ? = xfs_iunlock+0x5d/0xd0 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa0209366>] = xfs_lookup+0xc6/0x110 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa0216024>] = xfs_vn_lookup+0x54/0xa0 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff8119dc65>] do_lookup+0x1a5/0x230<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff8119e8f4>] __link_path_walk+0x7a4/0x1000<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff811738e7>] ? = cache_grow+0x217/0x320<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff8119f40a>] path_walk+0x6a/0xe0<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff8119f61b>] filename_lookup+0x6b/0xc0<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff811a0747>] user_path_at+0x57/0xa0<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa0204e74>] ? = _xfs_trans_commit+0x214/0x2a0 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffffa01eae3e>] ? = xfs_iunlock+0x7e/0xd0 [xfs]<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff81193bc0>] vfs_fstatat+0x50/0xa0<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff811aaf5d>] ? 
= touch_atime+0x14d/0x1a0<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff81193d3b>] vfs_stat+0x1b/0x20<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff81193d64>] sys_newstat+0x24/0x50<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff810e5c87>] ? = audit_syscall_entry+0x1d7/0x200<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff810e5a7e>] ? = __audit_syscall_exit+0x25e/0x290<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">Mar 12 23:05:16 static1 kernel: = [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b</font></blockquote> </div> <div = class=3D""><br class=3D""> </div> <div = class=3D""><br class=3D""> </div> <div = class=3D"">I am wondering if my volume settings are causing = this. Can anyone with more knowledge take a look and let me know:</div> <div = class=3D""><br class=3D""> </div> <div class=3D"">= <blockquote = class=3D"gmail_quote" style=3D"margin:0px 0px 0px = 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left= -style:solid;padding-left:1ex"><font face=3D"monospace, monospace" = class=3D"">network.remote-dio: on<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">performance.stat-prefetch: off<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">performance.io-cache: off<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">performance.read-ahead: off<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">performance.quick-read: off<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">nfs.export-volumes: on<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">network.ping-timeout: 20<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.self-heal-readdir-size: 64KB<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.quorum-type: auto<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.data-self-heal-algorithm: diff<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.self-heal-window-size: 8<br class=3D"">= </font><font = face=3D"monospace, monospace" = class=3D"">cluster.heal-timeout: 500<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.self-heal-daemon: on<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.entry-self-heal: on<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.data-self-heal: on<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.metadata-self-heal: on<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.readdir-optimize: on<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.background-self-heal-count: 20<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.rebalance-stats: on<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.min-free-disk: 5%<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.eager-lock: enable<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">storage.owner-uid: 36<br = 
class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">storage.owner-gid: 36<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">auth.allow:*<br class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">user.cifs: disable<br = class=3D""> </font><font = face=3D"monospace, monospace" = class=3D"">cluster.server-quorum-ratio: = 51%</font></blockquote> </div> <div = class=3D""><br class=3D""> </div> <div = class=3D"">Many Thanks, = Alastair</div> <div = class=3D""><br class=3D""> </div> </div> <br class=3D""> <fieldset = class=3D""></fieldset> <br class=3D""> </div> </div> <pre = class=3D"">_______________________________________________ Users mailing list <a moz-do-not-send=3D"true" href=3D"mailto:Users@ovirt.org" = target=3D"_blank" class=3D"">Users@ovirt.org</a> <a moz-do-not-send=3D"true" = href=3D"http://lists.ovirt.org/mailman/listinfo/users" target=3D"_blank" = class=3D"">http://lists.ovirt.org/mailman/listinfo/users</a> </pre> </blockquote> <br class=3D""> </div> <br class=3D""> _______________________________________________<br class=3D""> Users mailing list<br = class=3D""> <a = moz-do-not-send=3D"true" href=3D"mailto:Users@ovirt.org" target=3D"_blank"= class=3D"">Users@ovirt.org</a><br class=3D""> <a = moz-do-not-send=3D"true" = href=3D"http://lists.ovirt.org/mailman/listinfo/users" target=3D"_blank" = class=3D"">http://lists.ovirt.org/mailman/listinfo/users</a><br = class=3D""> <br class=3D""> </blockquote> </div> <br class=3D""> </div> </blockquote> <br class=3D""> </div> </div> </div> </blockquote> </div> <br class=3D""> </div> </blockquote> <br class=3D""> </div> </div> </div> </blockquote> </div> <br class=3D""> </div> </div> </div> </blockquote> </div> <br class=3D""> </div> </div> _______________________________________________<br class=3D""> Users mailing list<br class=3D""> <a moz-do-not-send=3D"true" = href=3D"mailto:Users@ovirt.org" target=3D"_blank" = class=3D"">Users@ovirt.org</a><br class=3D""> <a moz-do-not-send=3D"true" = href=3D"http://lists.ovirt.org/mailman/listinfo/users" target=3D"_blank" = class=3D"">http://lists.ovirt.org/mailman/listinfo/users</a><br = class=3D""> </div> </blockquote> </div> <br class=3D""> </div> </div> </div> </div> </blockquote> </div> <br class=3D""> </div> </div> <div class=3D"HOEnZb"> <div class=3D"h5"><br class=3D""> </div> </div> </blockquote> </div> </div> </blockquote> <br class=3D""> <br class=3D""> <blockquote = cite=3D"mid:CA+SarwqNuvVGUDDjhDRbNii-foMGAyaVibxyMGM5AEPzRkDu+w@mail.gmail= .com" type=3D"cite" class=3D""> <div class=3D"gmail_extra"> <div class=3D"gmail_quote"> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div class=3D"HOEnZb"> <div class=3D"h5"> </div> </div> </blockquote> </div> <br class=3D""> </div> <br class=3D""> <fieldset class=3D"mimeAttachmentHeader"></fieldset> <br class=3D""> <pre wrap=3D"" = class=3D"">_______________________________________________ Users mailing list <a class=3D"moz-txt-link-abbreviated" = href=3D"mailto:Users@ovirt.org">Users@ovirt.org</a> <a class=3D"moz-txt-link-freetext" = href=3D"http://lists.ovirt.org/mailman/listinfo/users">http://lists.ovirt.= org/mailman/listinfo/users</a> </pre> </blockquote> <br class=3D""> <br class=3D""> <br class=3D""> <b style=3D"color:#604c78" class=3D""></b><br class=3D""><span = style=3D"color:#604c78;" class=3D""><span = style=3D"mso-fareast-language:en-gb;" lang=3D"NL" class=3D"">Met = vriendelijke groet, With kind regards,<br class=3D""><br 
= class=3D""></span>Jorick Astrego</span><b style=3D"color:#604c78" = class=3D""><br class=3D""><br class=3D"">Netbulae Virtualization Experts = </b><br class=3D""><hr style=3D"border:none;border-top:1px solid #ccc;" = class=3D""><table style=3D"width: 522px" class=3D""><tbody class=3D""><tr = class=3D""><td style=3D"width: 130px;font-size: 10px" class=3D"">Tel: = 053 20 30 270</td> <td style=3D"width: 130px;font-size: 10px" = class=3D""><a href=3D"mailto:info@netbulae.eu" = class=3D"">info@netbulae.eu</a></td> <td style=3D"width: = 130px;font-size: 10px" class=3D"">Staalsteden 4-3A</td> <td = style=3D"width: 130px;font-size: 10px" class=3D"">KvK = 08198180</td></tr><tr class=3D""> <td style=3D"width: = 130px;font-size: 10px" class=3D"">Fax: 053 20 30 271</td> <td = style=3D"width: 130px;font-size: 10px" class=3D""><a = href=3D"http://www.netbulae.eu" class=3D"">www.netbulae.eu</a></td> = <td style=3D"width: 130px;font-size: 10px" class=3D"">7547 TA = Enschede</td> <td style=3D"width: 130px;font-size: 10px" class=3D"">BTW= NL821234584B01</td></tr></tbody></table><br class=3D""><hr = style=3D"border:none;border-top:1px solid #ccc;" class=3D""><br = class=3D""> </div> _______________________________________________<br class=3D"">Users = mailing list<br class=3D""><a href=3D"mailto:Users@ovirt.org" = class=3D"">Users@ovirt.org</a><br = class=3D"">http://lists.ovirt.org/mailman/listinfo/users<br = class=3D""></div></blockquote></div><br class=3D""></div></body></html>= --Apple-Mail=_328EDD08-7E1B-45BB-8C4D-3B30BE5DF33F--
participants (3)
- Alastair Neil
- Darrell Budic
- Jorick Astrego