[ovirt-users] Huge Gluster Issues - oVirt 4.1.7
Kasturi Narra
knarra at redhat.com
Fri Nov 24 08:40:02 UTC 2017
Hi Florian,
Are you seeing these issues with gfapi or fuse access as well?
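For example, whether gfapi is in use can be checked on the engine host (a
quick sketch, assuming engine-config is available there):

    engine-config -g LibgfApiSupported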
Thanks
kasturi
On Fri, Nov 24, 2017 at 3:06 AM, Florian Nolden <f.nolden at xilloc.com> wrote:
> I have the same issue when I run backup tasks during the night.
>
> I have a Gluster setup with a 1TB SSD on each of the three nodes. Maybe it's
> related to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1430847
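>
> I watch the lease renewals during the backup window with something like
> this (a rough check against the default sanlock log location, not a
> proper benchmark):
>
>     grep -E 'delta_renew|check_our_lease' /var/log/sanlock.log | tail -n 50
>     sanlock client status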
>
> sanlock.log:
> 2017-11-23 00:46:42 3410597 [1114]: s15 check_our_lease warning 60
> last_success 3410537
> 2017-11-23 00:46:43 3410598 [1114]: s15 check_our_lease warning 61
> last_success 3410537
> 2017-11-23 00:46:44 3410599 [1114]: s15 check_our_lease warning 62
> last_success 3410537
> 2017-11-23 00:46:45 3410600 [1114]: s15 check_our_lease warning 63
> last_success 3410537
> 2017-11-23 00:46:46 3410601 [1114]: s15 check_our_lease warning 64
> last_success 3410537
> 2017-11-23 00:46:47 3410602 [1114]: s15 check_our_lease warning 65
> last_success 3410537
> 2017-11-23 00:46:48 3410603 [1114]: s15 check_our_lease warning 66
> last_success 3410537
> 2017-11-23 00:46:49 3410603 [28384]: s15 delta_renew long write time 46 sec
> 2017-11-23 00:46:49 3410603 [28384]: s15 renewed 3410557 delta_length 46
> too long
> 2017-11-23 02:48:04 3417878 [28384]: s15 delta_renew long write time 10 sec
> 2017-11-23 02:57:23 3418438 [28384]: s15 delta_renew long write time 34 sec
> 2017-11-23 02:57:23 3418438 [28384]: s15 renewed 3418404 delta_length 34
> too long
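>
> The "delta_renew long write time" lines mean a single lease write took
> that long. To get a feel for the raw write latency on the mount during
> the backups, a small direct write can be timed (ddtest is just a
> hypothetical scratch file, removed afterwards):
>
>     dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/ddtest bs=4k count=256 oflag=direct,sync
>     rm /rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/ddtest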
>
>
> grep "WARN" vdsm.log
> 2017-11-23 00:20:05,544+0100 WARN (jsonrpc/0) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=63.7199999997) (vm:5109)
> 2017-11-23 00:20:06,840+0100 WARN (check/loop) [storage.check] Checker
> u'/rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/f0e21aae-1237-4dd3-88ec-81254d29c372/dom_md/metadata'
> is blocked for 10.00 seconds (check:279)
> 2017-11-23 00:20:13,853+0100 WARN (periodic/170)
> [virt.periodic.VmDispatcher] could not run <class 'vdsm.virt.periodic.UpdateVolumes'>
> on [u'e1f26ea9-9294-4d9c-8f70-d59f96dec5f7'] (periodic:308)
> 2017-11-23 00:20:15,031+0100 WARN (jsonrpc/2) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=73.21) (vm:5109)
> 2017-11-23 00:20:20,586+0100 WARN (jsonrpc/4) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=78.7599999998) (vm:5109)
> 2017-11-23 00:21:06,849+0100 WARN (check/loop) [storage.check] Checker
> u'/rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/f0e21aae-1237-4dd3-88ec-81254d29c372/dom_md/metadata'
> is blocked for 10.01 seconds (check:279)
> 2017-11-23 00:21:13,847+0100 WARN (periodic/167)
> [virt.periodic.VmDispatcher] could not run <class 'vdsm.virt.periodic.UpdateVolumes'>
> on [u'd8f22423-9fe3-4c06-97dc-5c9e9f5b33c8'] (periodic:308)
> 2017-11-23 00:22:13,854+0100 WARN (periodic/172)
> [virt.periodic.VmDispatcher] could not run <class 'vdsm.virt.periodic.UpdateVolumes'>
> on [u'd8f22423-9fe3-4c06-97dc-5c9e9f5b33c8'] (periodic:308)
> 2017-11-23 00:22:16,846+0100 WARN (check/loop) [storage.check] Checker
> u'/rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/f0e21aae-1237-4dd3-88ec-81254d29c372/dom_md/metadata'
> is blocked for 9.99 seconds (check:279)
> 2017-11-23 00:23:06,040+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=64.2199999997) (vm:5109)
> 2017-11-23 00:23:06,850+0100 WARN (check/loop) [storage.check] Checker
> u'/rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/f0e21aae-1237-4dd3-88ec-81254d29c372/dom_md/metadata'
> is blocked for 9.98 seconds (check:279)
> 2017-11-23 00:23:13,845+0100 WARN (periodic/169)
> [virt.periodic.VmDispatcher] could not run <class 'vdsm.virt.periodic.UpdateVolumes'>
> on [u'5ef506de-44b9-4ced-9b7f-b90ee098f4f7'] (periodic:308)
> 2017-11-23 00:23:16,855+0100 WARN (jsonrpc/7) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=75.0300000003) (vm:5109)
> 2017-11-23 00:23:21,082+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=79.2599999998) (vm:5109)
> 2017-11-23 00:25:31,488+0100 WARN (libvirt/events) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') unknown eventid 8 args
> ('/rhev/data-center/00000001-0001-0001-0001-000000000370/f0e21aae-1237-4dd3-88ec-81254d29c372/images/1a1b9620-52fc-4008-9047-15cd725f8bd8/90b913ba-e03f-46c5-bccf-bae011fcdd55',
> 4, 3, 8) (clientIF:549)
> 2017-11-23 00:25:32,372+0100 WARN (libvirt/events) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') unknown eventid 8 args
> ('/rhev/data-center/00000001-0001-0001-0001-000000000370/f0e21aae-1237-4dd3-88ec-81254d29c372/images/1a1b9620-52fc-4008-9047-15cd725f8bd8/90b913ba-e03f-46c5-bccf-bae011fcdd55',
> 4, 0, 8) (clientIF:549)
> 2017-11-23 00:45:56,851+0100 WARN (check/loop) [storage.check] Checker
> u'/rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/f0e21aae-1237-4dd3-88ec-81254d29c372/dom_md/metadata'
> is blocked for 10.00 seconds (check:279)
> 2017-11-23 00:46:13,850+0100 WARN (periodic/172)
> [virt.periodic.VmDispatcher] could not run <class 'vdsm.virt.periodic.UpdateVolumes'>
> on [u'e1f26ea9-9294-4d9c-8f70-d59f96dec5f7', u'5ef506de-44b9-4ced-9b7f-b90ee098f4f7']
> (periodic:308)
> 2017-11-23 00:46:36,013+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
> (command timeout, age=64.0899999999) (vm:5109)
> 2017-11-23 00:46:38,805+0100 WARN (jsonrpc/2) [virt.vm]
> (vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
> (command timeout, age=66.8799999999) (vm:5109)
> 2017-11-23 00:46:40,439+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='930ecaca-ef2f-490a-a4df-e4f0dad218aa') monitor became unresponsive
> (command timeout, age=68.5199999996) (vm:5109)
> 2017-11-23 00:46:40,440+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='e1f26ea9-9294-4d9c-8f70-d59f96dec5f7') monitor became unresponsive
> (command timeout, age=68.5199999996) (vm:5109)
> 2017-11-23 00:46:40,441+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=68.5199999996) (vm:5109)
> 2017-11-23 00:46:40,442+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
> (command timeout, age=68.5199999996) (vm:5109)
> 2017-11-23 00:46:40,442+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='0cf9b0cb-7c53-4bab-b879-0bdf190b293c') monitor became unresponsive
> (command timeout, age=68.5199999996) (vm:5109)
> 2017-11-23 00:46:40,443+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
> (command timeout, age=68.5199999996) (vm:5109)
> 2017-11-23 00:46:40,444+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
> (command timeout, age=68.5199999996) (vm:5109)
> 2017-11-23 00:46:40,445+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='d8f22423-9fe3-4c06-97dc-5c9e9f5b33c8') monitor became unresponsive
> (command timeout, age=68.5199999996) (vm:5109)
> 2017-11-23 00:46:40,446+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='ea36f7bd-1790-4b42-b7e1-6d8e2ef0487b') monitor became unresponsive
> (command timeout, age=68.5199999996) (vm:5109)
> 2017-11-23 00:46:40,446+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='82ed235e-37bb-4d67-8db9-61d39340f951') monitor became unresponsive
> (command timeout, age=68.5199999996) (vm:5109)
> 2017-11-23 00:46:46,116+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='930ecaca-ef2f-490a-a4df-e4f0dad218aa') monitor became unresponsive
> (command timeout, age=74.1899999995) (vm:5109)
> 2017-11-23 00:46:46,118+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='e1f26ea9-9294-4d9c-8f70-d59f96dec5f7') monitor became unresponsive
> (command timeout, age=74.1899999995) (vm:5109)
> 2017-11-23 00:46:46,119+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=74.1999999993) (vm:5109)
> 2017-11-23 00:46:46,120+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
> (command timeout, age=74.1999999993) (vm:5109)
> 2017-11-23 00:46:46,121+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='0cf9b0cb-7c53-4bab-b879-0bdf190b293c') monitor became unresponsive
> (command timeout, age=74.1999999993) (vm:5109)
> 2017-11-23 00:46:46,123+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
> (command timeout, age=74.1999999993) (vm:5109)
> 2017-11-23 00:46:46,124+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
> (command timeout, age=74.1999999993) (vm:5109)
> 2017-11-23 00:46:46,125+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='d8f22423-9fe3-4c06-97dc-5c9e9f5b33c8') monitor became unresponsive
> (command timeout, age=74.1999999993) (vm:5109)
> 2017-11-23 00:46:46,127+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='ea36f7bd-1790-4b42-b7e1-6d8e2ef0487b') monitor became unresponsive
> (command timeout, age=74.1999999993) (vm:5109)
> 2017-11-23 00:46:46,128+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='82ed235e-37bb-4d67-8db9-61d39340f951') monitor became unresponsive
> (command timeout, age=74.21) (vm:5109)
> 2017-11-23 00:46:46,509+0100 WARN (jsonrpc/3) [virt.vm]
> (vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
> (command timeout, age=74.5899999999) (vm:5109)
> 2017-11-23 00:46:48,187+0100 WARN (jsonrpc/7) [virt.vm]
> (vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
> (command timeout, age=76.2599999998) (vm:5109)
> 2017-11-23 00:46:49,825+0100 WARN (periodic/173)
> [virt.sampling.StatsCache] dropped stale old sample: sampled 7705208.650000
> stored 7705268.650000 (sampling:442)
> 2017-11-23 00:46:49,835+0100 WARN (periodic/176)
> [virt.sampling.StatsCache] dropped stale old sample: sampled 7705253.650000
> stored 7705268.650000 (sampling:442)
> 2017-11-23 00:46:49,854+0100 WARN (periodic/171)
> [virt.sampling.StatsCache] dropped stale old sample: sampled 7705238.650000
> stored 7705268.650000 (sampling:442)
> 2017-11-23 00:46:49,866+0100 WARN (periodic/174)
> [virt.sampling.StatsCache] dropped stale old sample: sampled 7705223.650000
> stored 7705268.650000 (sampling:442)
> 2017-11-23 00:46:55,488+0100 WARN (jsonrpc/0) [virt.vm]
> (vmId='e1f26ea9-9294-4d9c-8f70-d59f96dec5f7') monitor became unresponsive
> (command timeout, age=83.5699999994) (vm:5109)
> 2017-11-23 00:46:55,488+0100 WARN (jsonrpc/0) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=83.5699999994) (vm:5109)
> 2017-11-23 00:46:55,489+0100 WARN (jsonrpc/0) [virt.vm]
> (vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
> (command timeout, age=83.5699999994) (vm:5109)
> 2017-11-23 00:46:55,491+0100 WARN (jsonrpc/0) [virt.vm]
> (vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
> (command timeout, age=83.5699999994) (vm:5109)
> 2017-11-23 00:47:01,742+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='e1f26ea9-9294-4d9c-8f70-d59f96dec5f7') monitor became unresponsive
> (command timeout, age=89.8199999994) (vm:5109)
> 2017-11-23 00:47:01,743+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=89.8199999994) (vm:5109)
> 2017-11-23 00:47:01,744+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
> (command timeout, age=89.8199999994) (vm:5109)
> 2017-11-23 00:47:01,746+0100 WARN (jsonrpc/1) [virt.vm]
> (vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
> (command timeout, age=89.8199999994) (vm:5109)
> 2017-11-23 00:47:10,531+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=98.6099999994) (vm:5109)
> 2017-11-23 00:47:10,532+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
> (command timeout, age=98.6099999994) (vm:5109)
> 2017-11-23 00:47:10,534+0100 WARN (jsonrpc/6) [virt.vm]
> (vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
> (command timeout, age=98.6099999994) (vm:5109)
> 2017-11-23 00:47:16,950+0100 WARN (jsonrpc/7) [virt.vm]
> (vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
> (command timeout, age=105.029999999) (vm:5109)
> 2017-11-23 00:47:16,951+0100 WARN (jsonrpc/7) [virt.vm]
> (vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
> (command timeout, age=105.029999999) (vm:5109)
> 2017-11-23 00:47:16,953+0100 WARN (jsonrpc/7) [virt.vm]
> (vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
> (command timeout, age=105.029999999) (vm:5109)
> 2017-11-23 00:47:25,578+0100 WARN (jsonrpc/4) [virt.vm]
> (vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
> (command timeout, age=113.659999999) (vm:5109)
> 2017-11-23 00:47:25,581+0100 WARN (jsonrpc/4) [virt.vm]
> (vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
> (command timeout, age=113.659999999) (vm:5109)
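>
> To see which bricks are slow while the checker is blocked, gluster's
> profiler can capture per-brick latencies (a sketch, assuming the volume
> behind the _fastIO mount is simply named fastIO):
>
>     gluster volume profile fastIO start
>     gluster volume profile fastIO info
>     gluster volume profile fastIO stop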
>
> Kind regards,
>
>
> Florian Nolden
>
> Head of IT at Xilloc Medical B.V.
>
> ———————————————————————————————
>
> Disclaimer: The content of this e-mail, including any attachments, is
> confidential and intended for the sole use of the individual or entity
> to which it is addressed. If you have received it by mistake, please let
> us know by reply and then delete it from your system. Any distribution,
> copying or dissemination of this message is expected to conform to all
> legal stipulations governing the use of information.
>
> 2017-11-23 11:25 GMT+01:00 Sven Achtelik <Sven.Achtelik at eps.aero>:
>
>> Hi All,
>>
>>
>>
>> I’m experiencing huge issues when working with big VMs on Gluster
>> volumes. Doing a snapshot or removing a big disk leads to the SPM node
>> becoming unresponsive. Fencing then kicks in and takes the node down
>> with a hard reset/reboot.
>>
>>
>>
>> My setup has three nodes with 10 Gbit/s NICs for the Gluster network. The
>> bricks are on RAID-6 with a 1 GB cache on the RAID controller, and the
>> volumes are set up as follows:
>>
>>
>>
>> Volume Name: data
>> Type: Replicate
>> Volume ID: c734d678-91e3-449c-8a24-d26b73bef965
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: ovirt-node01-gfs.storage.lan:/gluster/brick2/data
>> Brick2: ovirt-node02-gfs.storage.lan:/gluster/brick2/data
>> Brick3: ovirt-node03-gfs.storage.lan:/gluster/brick2/data
>> Options Reconfigured:
>> features.barrier: disable
>> cluster.granular-entry-heal: enable
>> performance.readdir-ahead: on
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: on
>> cluster.eager-lock: enable
>> network.remote-dio: off
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> features.shard: on
>> features.shard-block-size: 512MB
>> performance.low-prio-threads: 32
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-wait-qlength: 10000
>> cluster.shd-max-threads: 6
>> network.ping-timeout: 30
>> user.cifs: off
>> nfs.disable: on
>> performance.strict-o-direct: on
>> server.event-threads: 4
>> client.event-threads: 4
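>>
>> For comparison, the full effective option set (including defaults not
>> shown above) can be dumped with:
>>
>>     gluster volume get data all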
>>
>>
>>
>> It feels like the system locks up during snapshotting or removal of a big
>> disk, and this delay triggers things to go wrong. Is there anything that
>> is not set up right on my Gluster volumes, or is this behavior normal with
>> bigger disks (50GB+)? Is there a reliable option for caching with SSDs?
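>> For example, would lvmcache in front of the brick LVs be an option? A
>> minimal sketch of what I have in mind (device and LV names are
>> hypothetical):
>>
>>     pvcreate /dev/nvme0n1
>>     vgextend vg_bricks /dev/nvme0n1
>>     lvcreate -L 800G -n brickcache vg_bricks /dev/nvme0n1
>>     lvcreate -L 1G -n brickcache_meta vg_bricks /dev/nvme0n1
>>     lvconvert --type cache-pool --poolmetadata vg_bricks/brickcache_meta vg_bricks/brickcache
>>     lvconvert --type cache --cachepool vg_bricks/brickcache vg_bricks/brick2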
>>
>>
>>
>> Thank you,
>>
>> Sven
>>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>