[ovirt-users] Storage domain experienced a high latency
Nir Soffer
nsoffer at redhat.com
Fri Feb 10 16:42:58 UTC 2017
On Thu, Feb 9, 2017 at 10:03 AM, Grundmann, Christian <
Christian.Grundmann at fabasoft.com> wrote:
> Hi,
>
> @ Can also be low level issue in kernel, hba, switch, server.
> I have the old storage on the same cable, so I don't think it's HBA or
> switch related.
> On the same switch I have a few ESXi servers with the same storage setup
> which are working without problems.
>
> @multipath
> I use stock ng-node multipath configuration
>
> # VDSM REVISION 1.3
>
> defaults {
> polling_interval 5
> no_path_retry fail
> user_friendly_names no
> flush_on_last_del yes
> fast_io_fail_tmo 5
> dev_loss_tmo 30
> max_fds 4096
> }
>
> # Remove devices entries when overrides section is available.
> devices {
> device {
> # These settings overrides built-in devices settings. It does not apply
> # to devices without built-in settings (these use the settings in the
> # "defaults" section), or to devices defined in the "devices" section.
> # Note: This is not available yet on Fedora 21. For more info see
> # https://bugzilla.redhat.com/1253799
> all_devs yes
> no_path_retry fail
> }
> }
>
> # Enable when this section is available on all supported platforms.
> # Options defined here override device specific options embedded into
> # multipathd.
> #
> # overrides {
> # no_path_retry fail
> # }
>
>
> multipath -r v3
> has no output
>
My mistake, the correct command is:
multipath -r -v3
It creates tons of output, so it is better to redirect it to a file and attach the file:
multipath -r -v3 > multipath-r-v3.out
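
It may also help to attach the configuration multipathd actually loaded,
in case it differs from the file on disk (assuming the multipathd
interactive interface is available, as on EL7-based nodes):

multipathd -k'show config' > multipathd-show-config.out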
>
>
> Thx Christian
>
>
> From: Nir Soffer [mailto:nsoffer at redhat.com]
> Sent: Wednesday, 08 February 2017 20:44
> To: Grundmann, Christian <Christian.Grundmann at fabasoft.com>
> Cc: users at ovirt.org
> Subject: Re: [ovirt-users] Storage domain experienced a high latency
>
> On Wed, Feb 8, 2017 at 6:11 PM, Grundmann, Christian <
> Christian.Grundmann at fabasoft.com> wrote:
> Hi,
> I got a new FC storage (EMC Unity 300F) which is seen by my hosts in
> addition to my old storage, for migration.
> The new storage has only one path until the migration is done.
> I already have a few VMs running on the new storage without problems.
> But after starting some VMs (I don't really know what the difference to
> the working ones is), the path for the new storage fails.
>
> Engine tells me: Storage Domain <storagedomain> experienced a high latency
> of 22.4875 seconds from host <host>
>
> Where can I start looking?
>
> In /var/log/messages I found:
>
> Feb 8 09:03:53 ovirtnode01 multipathd: 360060160422143002a38935800ae2760:
> sdd - emc_clariion_checker: Active path is healthy.
> Feb 8 09:03:53 ovirtnode01 multipathd: 8:48: reinstated
> Feb 8 09:03:53 ovirtnode01 multipathd: 360060160422143002a38935800ae2760:
> remaining active paths: 1
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 8
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 5833475
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 5833475
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967168
> Feb 8 09:03:53 ovirtnode01 kernel: Buffer I/O error on dev dm-207,
> logical block 97, async page read
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967168
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967280
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967280
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 0
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 0
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967168
> Feb 8 09:03:53 ovirtnode01 kernel: device-mapper: multipath: Reinstating
> path 8:48.
> Feb 8 09:03:53 ovirtnode01 kernel: sd 3:0:0:22: alua: port group 01 state
> A preferred supports tolUsNA
> Feb 8 09:03:53 ovirtnode01 sanlock[5192]: 2017-02-08 09:03:53+0100 151809
> [11772]: s59 add_lockspace fail result -202
> Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: remove map (uevent)
> Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: devmap not registered,
> can't remove
> Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: remove map (uevent)
> Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: remove map (uevent)
> Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: devmap not registered,
> can't remove
> Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: remove map (uevent)
> Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: remove map (uevent)
> Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: devmap not registered,
> can't remove
> Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: remove map (uevent)
> Feb 8 09:04:08 ovirtnode01 kernel: dd: sending ioctl 80306d02 to a
> partition!
> Feb 8 09:04:24 ovirtnode01 sanlock[5192]: 2017-02-08 09:04:24+0100 151840
> [15589]: read_sectors delta_leader offset 2560 rv -202
> /dev/f9b70017-0a34-47bc-bf2f-dfc70200a347/ids
> Feb 8 09:04:34 ovirtnode01 sanlock[5192]: 2017-02-08 09:04:34+0100 151850
> [15589]: f9b70017 close_task_aio 0 0x7fd78c0008c0 busy
> Feb 8 09:04:39 ovirtnode01 multipathd: 360060160422143002a38935800ae2760:
> sdd - emc_clariion_checker: Read error for WWN
> 60060160422143002a38935800ae2760. Sense data are 0x0/0x0/0x0.
> Feb 8 09:04:39 ovirtnode01 multipathd: checker failed path 8:48 in map
> 360060160422143002a38935800ae2760
> Feb 8 09:04:39 ovirtnode01 multipathd: 360060160422143002a38935800ae2760:
> remaining active paths: 0
> Feb 8 09:04:39 ovirtnode01 kernel: qla2xxx [0000:11:00.0]-801c:3: Abort
> command issued nexus=3:0:22 -- 1 2002.
> Feb 8 09:04:39 ovirtnode01 kernel: device-mapper: multipath: Failing path
> 8:48.
> Feb 8 09:04:40 ovirtnode01 kernel: qla2xxx [0000:11:00.0]-801c:3: Abort
> command issued nexus=3:0:22 -- 1 2002.
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: 8 callbacks
> suppressed
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967168
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967280
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 0
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967168
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967280
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 0
>
> Maybe you should consult the storage vendor about this?
>
> It can also be an incorrect multipath configuration: maybe the multipath
> checker failed, and because you have only one path the device moved to a
> faulty state, and sanlock failed to access the device.
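>
> As a quick check on the host, the commands below should show whether the
> checker currently sees the path as failed, and whether sanlock still holds
> the lockspace (assuming the multipathd interactive interface and the
> sanlock client tool are available):
>
> multipathd -k'show paths'
> multipathd -k'show maps status'
> sanlock client status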
>
> Can also be low level issue in kernel, hba, switch, server.
>
> Let's start by inspecting the multipath configuration, can you share the
> output of:
>
> cat /etc/multipath.conf
> multipath -r v3
>
> Maybe you can expose one lun for testing, and blacklist this lun in
> multipath.conf. You will not be able to use this lun in oVirt, but it can
> be used to validate the layers below multipath. If the plain lun is ok,
> and the same lun used as a multipath device fails, the problem is likely
> to be the multipath configuration.
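>
> A minimal sketch of what that test could look like (the wwid and /dev/sdX
> below are placeholders for the dedicated test lun, not for the existing
> storage domain):
>
> # /etc/multipath.conf - blacklist only the dedicated test lun
> blacklist {
>     wwid "<wwid-of-test-lun>"
> }
>
> # reload the configuration and verify the lun is no longer mapped
> multipathd -k'reconfigure'
> multipath -ll
>
> # read directly from the underlying SCSI device, bypassing multipath,
> # to validate the lower layers
> dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct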
>
> Nir
>
>
>
> multipath -ll output for this Domain
>
> 360060160422143002a38935800ae2760 dm-10 DGC ,VRAID
> size=2.0T features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
> `-+- policy='service-time 0' prio=50 status=active
> `- 3:0:0:22 sdd 8:48 active ready running
>
>
> Thx Christian
>
>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>