[ovirt-users] High latency on storage domains and sanlock renewal error

Stefano Bovina bovy89 at gmail.com
Sat May 13 20:11:06 UTC 2017


It's FC/FCoE.

This is with the configuration suggested by EMC/Red Hat:

360060160a62134002818778f949de411 dm-5 DGC,VRAID
size=11T features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:1:2 sdr  65:16  active ready running
| `- 2:0:1:2 sdy  65:128 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:2 sdd  8:48   active ready running
  `- 2:0:0:2 sdk  8:160  active ready running
360060160a6213400e622de69949de411 dm-2 DGC,VRAID
size=6.0T features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:1:0 sdp  8:240  active ready running
| `- 2:0:1:0 sdw  65:96  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:0 sdb  8:16   active ready running
  `- 2:0:0:0 sdi  8:128  active ready running
360060160a6213400cce46e40949de411 dm-4 DGC,VRAID
size=560G features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:1:3 sds  65:32  active ready running
| `- 2:0:1:3 sdz  65:144 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:3 sde  8:64   active ready running
  `- 2:0:0:3 sdl  8:176  active ready running
360060160a6213400c4b39e80949de411 dm-3 DGC,VRAID
size=500G features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:1:1 sdq  65:0   active ready running
| `- 2:0:1:1 sdx  65:112 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:1 sdc  8:32   active ready running
  `- 2:0:0:1 sdj  8:144  active ready running
360060160a6213400fa2d31acbbfce511 dm-8 DGC,RAID 5
size=5.4T features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:0:6 sdh  8:112  active ready running
| `- 2:0:0:6 sdo  8:224  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:1:6 sdv  65:80  active ready running
  `- 2:0:1:6 sdac 65:192 active ready running
360060160a621340040652b7582f5e511 dm-7 DGC,RAID 5
size=3.6T features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:0:4 sdf  8:80   active ready running
| `- 2:0:0:4 sdm  8:192  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:1:4 sdt  65:48  active ready running
  `- 2:0:1:4 sdaa 65:160 active ready running
360060160a621340064b1034cbbfce511 dm-6 DGC,RAID 5
size=1.0T features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:1:5 sdu  65:64  active ready running
| `- 2:0:1:5 sdab 65:176 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:5 sdg  8:96   active ready running
  `- 2:0:0:5 sdn  8:208  active ready running


This is with the oVirt default configuration:

360060160a6213400848e60af82f5e511 dm-3 DGC     ,RAID 5
size=3.6T features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 12:0:0:4 sdj 8:144 active ready  running
| `- 13:0:1:4 sdd 8:48  active ready  running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 12:0:1:4 sdf 8:80  active ready  running
  `- 13:0:0:4 sdh 8:112 active ready  running
360060160a6213400005e425b6b10e611 dm-2 DGC     ,RAID 10
size=4.2T features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 12:0:1:0 sde 8:64  active ready  running
| `- 13:0:0:0 sdg 8:96  active ready  running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 13:0:1:0 sdc 8:32  active ready  running
  `- 12:0:0:0 sdi 8:128 active ready  running


2017-05-13 18:50 GMT+02:00 Juan Pablo <pablo.localhost at gmail.com>:

> Can you please give the output of:
> multipath -ll
> and
> iscsiadm -m session -P3
>
> JP
>
> 2017-05-13 6:48 GMT-03:00 Stefano Bovina <bovy89 at gmail.com>:
>
>> Hi,
>>
>> 2.6.32-696.1.1.el6.x86_64
>> 3.10.0-514.10.2.el7.x86_64
>>
>> I tried the ioping test from different groups of servers using multipath,
>> members of different storage groups (different LUNs, different RAID levels,
>> etc.), and every one of them reported latency.
>> I tried the same test (ioping) on a server with PowerPath instead of
>> multipath, on a dedicated RAID group, and ioping did not report latency.
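>>
>> For a repeatable comparison, something like the following can be run on
>> both the multipath and the PowerPath hosts (a sketch; -D forces direct I/O
>> to bypass the page cache, and the map name is just one of the WWIDs above):
>>
>>     ioping -c 100 -D /dev/mapper/360060160a62134002818778f949de411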
>>
>>
>> 2017-05-13 2:00 GMT+02:00 Juan Pablo <pablo.localhost at gmail.com>:
>>
>>> Sorry to jump in, but what kernel version are you using? I had a similar
>>> issue with kernels 4.10/4.11.
>>>
>>>
>>> 2017-05-12 16:36 GMT-03:00 Stefano Bovina <bovy89 at gmail.com>:
>>>
>>>> Hi,
>>>> a little update:
>>>>
>>>> The command multipath -ll hangs when executed on the host while the
>>>> problem occurs (nothing is logged in /var/log/messages or dmesg).
>>>>
>>>> I tested latency with ioping:
>>>> ioping /dev/6a386652-629d-4045-835b-21d2f5c104aa/metadata
>>>>
>>>> Usually it returns "time=15.6 ms"; sometimes it returns "time=19 s"
>>>> (yes, seconds).
>>>>
>>>> The systems are up to date, and I tried both path_checker settings
>>>> (emc_clariion and directio) without results.
>>>> (https://access.redhat.com/solutions/139193 refers to Rev A31 of the EMC
>>>> document; the latest is A42 and suggests emc_clariion.)
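>>>>
>>>> For reference, switching checkers amounts to setting path_checker in the
>>>> DGC device stanza of /etc/multipath.conf and reloading; the running value
>>>> can then be verified (a sketch):
>>>>
>>>>     multipathd -k'reconfigure'
>>>>     multipathd -k'show config' | grep path_checker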
>>>>
>>>> Any idea or suggestion?
>>>>
>>>> Thanks,
>>>>
>>>> Stefano
>>>>
>>>> 2017-05-08 11:56 GMT+02:00 Yaniv Kaul <ykaul at redhat.com>:
>>>>
>>>>>
>>>>>
>>>>> On Mon, May 8, 2017 at 11:50 AM, Stefano Bovina <bovy89 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes,
>>>>>> this configuration is the one suggested by EMC for EL7.
>>>>>>
>>>>>
>>>>> https://access.redhat.com/solutions/139193 suggests that for ALUA, the
>>>>> path checker needs to be different.
>>>>>
>>>>> Anyway, it is very likely that you have storage issues; they need to
>>>>> be resolved first, and I believe they have little to do with oVirt at
>>>>> the moment.
>>>>> Y.
>>>>>
>>>>>
>>>>>>
>>>>>> By the way,
>>>>>> "The parameters rr_min_io vs. rr_min_io_rq mean the same thing but
>>>>>> are used for device-mapper-multipath on differing kernel versions." and
>>>>>> rr_min_io_rq default value is 1, rr_min_io default value is 1000, so it
>>>>>> should be fine.
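>>>>>>
>>>>>> To double-check which value the kernel is actually using, the live map
>>>>>> table can be dumped; the repeat count appears after each path device in
>>>>>> the path groups (a sketch, using one of the WWIDs above):
>>>>>>
>>>>>>     dmsetup table 360060160a6213400848e60af82f5e511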
>>>>>>
>>>>>>
>>>>>> 2017-05-08 9:39 GMT+02:00 Yaniv Kaul <ykaul at redhat.com>:
>>>>>>
>>>>>>>
>>>>>>> On Sun, May 7, 2017 at 1:27 PM, Stefano Bovina <bovy89 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Sense data are 0x0/0x0/0x0
>>>>>>>
>>>>>>>
>>>>>>> Interesting; this is the first time I'm seeing 0/0/0. The first is
>>>>>>> usually 0x2 (see [1]), and then the rest ([2], [3]) make sense.
>>>>>>>
>>>>>>> A Google search found another user with a CLARiiON hitting the exact
>>>>>>> same error [4], so I'm leaning toward a misconfiguration of
>>>>>>> multipathing/CLARiiON here.
>>>>>>>
>>>>>>> Is your multipathing configuration working well for you?
>>>>>>> Are you sure it's an EL7 configuration? For example, I believe you
>>>>>>> should have rr_min_io_rq and not rr_min_io.
>>>>>>> Y.
>>>>>>>
>>>>>>> [1] http://www.t10.org/lists/2status.htm
>>>>>>> [2] http://www.t10.org/lists/2sensekey.htm
>>>>>>> [3] http://www.t10.org/lists/asc-num.htm
>>>>>>> [4] http://www.linuxquestions.org/questions/centos-111/multipath-problems-4175544908/
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>