vdsm-4.16.10-8.gitc937927.el6
glusterfs-3.6.2-1.el6
2.6.32-504.8.1.el6.x86_64
I moved to 3.6 specifically to get the snapshotting feature, hence my desire
to migrate to thinly provisioned LVM bricks.
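For reference, the sort of thin-provisioned brick layout I have in mind is
roughly the sketch below (the volume group, pool and brick names and sizes are
just placeholders, not my actual layout):

    # carve a thin pool out of an existing volume group, then a thin LV per brick
    lvcreate -L 500G -T vg_gluster/brickpool
    lvcreate -V 1T -T vg_gluster/brickpool -n brick1
    # XFS with 512-byte inodes is the usual recommendation for gluster bricks
    mkfs.xfs -i size=512 /dev/vg_gluster/brick1
    mkdir -p /bricks/brick1
    mount /dev/vg_gluster/brick1 /bricks/brick1
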
On 20 March 2015 at 14:57, Darrell Budic <budic(a)onholyground.com> wrote:
What version of gluster are you running on these?
I’ve seen high load during heals bounce my hosted engine around due to
overall system load, but never pause anything else. Cent 7 combo
storage/host systems, gluster 3.5.2.
On Mar 20, 2015, at 9:57 AM, Alastair Neil <ajneil.tech(a)gmail.com> wrote:
Pranith
I have run a pretty straightforward test. I created a two-brick 50 GB
replica volume with normal LVM bricks, and installed two servers, one
CentOS 6.6 and one CentOS 7.0. I kicked off bonnie++ on both to generate
some file system activity and then made the volume replica 3. I saw no
issues on the servers.
I am not sure this is a sufficiently rigorous test, though; the volume I have
had issues with is a 3 TB volume with about 2 TB used.
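In case it helps to reproduce, the test amounted to roughly the following
(hostnames, brick paths and the bonnie++ target directory are placeholders):

    # create and start a small replica 2 volume on normal LVM bricks
    gluster volume create testvol replica 2 gl1:/bricks/test/b1 gl2:/bricks/test/b2
    gluster volume start testvol
    # inside each test VM, generate filesystem activity on its gluster-backed disk
    bonnie++ -d /var/tmp/bonnie -u root
    # while bonnie++ is running, grow the volume to replica 3 and watch the heal
    gluster volume add-brick testvol replica 3 gl3:/bricks/test/b3
    gluster volume heal testvol info
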
-Alastair
On 19 March 2015 at 12:30, Alastair Neil <ajneil.tech(a)gmail.com> wrote:
> I don't think I have the resources to test it meaningfully. I have about
> 50 VMs on my primary storage domain. I might be able to set up a small 50
> GB volume and provision 2 or 3 VMs running test loads, but I'm not sure it
> would be comparable. I'll give it a try and let you know if I see similar
> behaviour.
>
> On 19 March 2015 at 11:34, Pranith Kumar Karampuri <pkarampu(a)redhat.com>
> wrote:
>
>> Without thinly provisioned lvm.
>>
>> Pranith
>>
>> On 03/19/2015 08:01 PM, Alastair Neil wrote:
>>
>> Do you mean raw partitions as bricks, or simply without thinly provisioned
>> LVM?
>>
>>
>>
>> On 19 March 2015 at 00:32, Pranith Kumar Karampuri <pkarampu(a)redhat.com>
>> wrote:
>>
>>> Could you let me know if you see this problem without lvm as well?
>>>
>>> Pranith
>>>
>>> On 03/18/2015 08:25 PM, Alastair Neil wrote:
>>>
>>> Yes, I am in the process of replacing the bricks with thinly provisioned
>>> LVs.
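>>>
>>> The replacement itself is the usual one-brick-at-a-time procedure, roughly
>>> as below (hostnames, brick paths and the volume name are placeholders):
>>>
>>>   # swap an old plain-LVM brick for a new thin-LVM one, then let it heal
>>>   gluster volume replace-brick <vol> gl1:/bricks/old/b1 gl1:/bricks/thin/b1 commit force
>>>   gluster volume heal <vol> info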
>>>
>>>
>>>
>>> On 18 March 2015 at 09:35, Pranith Kumar Karampuri <pkarampu(a)redhat.com
>>> > wrote:
>>>
>>>> hi,
>>>> Are you using a thin-LVM-based backend on which the bricks are
>>>> created?
>>>>
>>>> Pranith
>>>>
>>>> On 03/18/2015 02:05 AM, Alastair Neil wrote:
>>>>
>>>> I have an oVirt cluster with 6 VM hosts and 4 gluster nodes. There
>>>> are two virtualisation clusters, one with two Nehalem nodes and one with
>>>> four Sandy Bridge nodes. My master storage domain is GlusterFS, backed by
>>>> a replica 3 gluster volume from 3 of the gluster nodes. The engine is a
>>>> hosted engine 3.5.1 on 3 of the Sandy Bridge nodes, with storage provided
>>>> by NFS from a different gluster volume. All the hosts are CentOS 6.6.
>>>>
>>>>> vdsm-4.16.10-8.gitc937927.el6
>>>>> glusterfs-3.6.2-1.el6
>>>>> 2.6.32-504.8.1.el6.x86_64
>>>>
>>>>
>>>> Problems happen when I try to add a new brick or replace a brick:
>>>> eventually the self-heal will kill the VMs. In the VMs' logs I see
>>>> kernel hung-task messages.
>>>>
>>>>> Mar 12 23:05:16 static1 kernel: INFO: task nginx:1736 blocked for more than 120 seconds.
>>>>> Mar 12 23:05:16 static1 kernel: Not tainted 2.6.32-504.3.3.el6.x86_64 #1
>>>>> Mar 12 23:05:16 static1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Mar 12 23:05:16 static1 kernel: nginx D 0000000000000001 0 1736 1735 0x00000080
>>>>> Mar 12 23:05:16 static1 kernel: ffff8800778b17a8 0000000000000082 0000000000000000 00000000000126c0
>>>>> Mar 12 23:05:16 static1 kernel: ffff88007e5c6500 ffff880037170080 0006ce5c85bd9185 ffff88007e5c64d0
>>>>> Mar 12 23:05:16 static1 kernel: ffff88007a614ae0 00000001722b64ba ffff88007a615098 ffff8800778b1fd8
>>>>> Mar 12 23:05:16 static1 kernel: Call Trace:
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8152a885>] schedule_timeout+0x215/0x2e0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8152a503>] wait_for_common+0x123/0x180
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] ? _xfs_buf_read+0x46/0x60 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8152a61d>] wait_for_completion+0x1d/0x20
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa020ff5b>] xfs_buf_iowait+0x9b/0x100 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] _xfs_buf_read+0x46/0x60 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0210b3b>] xfs_buf_read+0xab/0x100 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] xfs_trans_read_buf+0x197/0x410 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01ee6a4>] xfs_imap_to_bp+0x54/0x130 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01f077b>] xfs_iread+0x7b/0x1b0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811ab77e>] ? inode_init_always+0x11e/0x1c0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01eb5ee>] xfs_iget+0x27e/0x6e0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae1d>] ? xfs_iunlock+0x5d/0xd0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0209366>] xfs_lookup+0xc6/0x110 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0216024>] xfs_vn_lookup+0x54/0xa0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119dc65>] do_lookup+0x1a5/0x230
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119e8f4>] __link_path_walk+0x7a4/0x1000
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811738e7>] ? cache_grow+0x217/0x320
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119f40a>] path_walk+0x6a/0xe0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119f61b>] filename_lookup+0x6b/0xc0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811a0747>] user_path_at+0x57/0xa0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0204e74>] ? _xfs_trans_commit+0x214/0x2a0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae3e>] ? xfs_iunlock+0x7e/0xd0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81193bc0>] vfs_fstatat+0x50/0xa0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811aaf5d>] ? touch_atime+0x14d/0x1a0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81193d3b>] vfs_stat+0x1b/0x20
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81193d64>] sys_newstat+0x24/0x50
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff810e5a7e>] ? __audit_syscall_exit+0x25e/0x290
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>>>>
>>>>
>>>>
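>>>> While one of these heals is running, its progress can be checked with
>>>> something along these lines (the volume name is a placeholder):
>>>>
>>>>   # files still pending heal, and a rough per-brick count
>>>>   gluster volume heal <vol> info
>>>>   gluster volume heal <vol> statistics heal-count
>>>>   # overall brick/client picture while the heal runs
>>>>   gluster volume status <vol> clients
>>>>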
>>>> I am wondering if my volume settings are causing this. Can anyone
>>>> with more knowledge take a look and let me know:
>>>>
>>>>> network.remote-dio: on
>>>>> performance.stat-prefetch: off
>>>>> performance.io-cache: off
>>>>> performance.read-ahead: off
>>>>> performance.quick-read: off
>>>>> nfs.export-volumes: on
>>>>> network.ping-timeout: 20
>>>>> cluster.self-heal-readdir-size: 64KB
>>>>> cluster.quorum-type: auto
>>>>> cluster.data-self-heal-algorithm: diff
>>>>> cluster.self-heal-window-size: 8
>>>>> cluster.heal-timeout: 500
>>>>> cluster.self-heal-daemon: on
>>>>> cluster.entry-self-heal: on
>>>>> cluster.data-self-heal: on
>>>>> cluster.metadata-self-heal: on
>>>>> cluster.readdir-optimize: on
>>>>> cluster.background-self-heal-count: 20
>>>>> cluster.rebalance-stats: on
>>>>> cluster.min-free-disk: 5%
>>>>> cluster.eager-lock: enable
>>>>> storage.owner-uid: 36
>>>>> storage.owner-gid: 36
>>>>> auth.allow: *
>>>>> user.cifs: disable
>>>>> cluster.server-quorum-ratio: 51%
>>>>
>>>>
>>>> Many Thanks, Alastair
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users