[ovirt-users] Storage domain experienced a high latency
Nir Soffer
nsoffer at redhat.com
Fri Feb 10 16:42:58 UTC 2017
On Thu, Feb 9, 2017 at 10:03 AM, Grundmann, Christian <
Christian.Grundmann at fabasoft.com> wrote:
> Hi,
>
> @ Can also be low level issue in kernel, hba, switch, server.
> I have the old storage on the same cable, so I don't think it's HBA or
> switch related.
> On the same switch I have a few ESXi servers with the same storage setup
> which are working without problems.
>
> @multipath
> I use stock ng-node multipath configuration
>
> # VDSM REVISION 1.3
>
> defaults {
> polling_interval 5
> no_path_retry fail
> user_friendly_names no
> flush_on_last_del yes
> fast_io_fail_tmo 5
> dev_loss_tmo 30
> max_fds 4096
> }
>
> # Remove devices entries when overrides section is available.
> devices {
> device {
> # These settings overrides built-in devices settings. It does not apply
> # to devices without built-in settings (these use the settings in the
> # "defaults" section), or to devices defined in the "devices" section.
> # Note: This is not available yet on Fedora 21. For more info see
> # https://bugzilla.redhat.com/1253799
> all_devs yes
> no_path_retry fail
> }
> }
>
> # Enable when this section is available on all supported platforms.
> # Options defined here override device specific options embedded into
> # multipathd.
> #
> # overrides {
> # no_path_retry fail
> # }
>
>
> multipath -r v3
> has no output
>
My mistake, the correct command is:
multipath -r -v3
It creates tons of output, so it is better to redirect it to a file and attach the file:
multipath -r -v3 > multipath-r-v3.out
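
It may also help to attach the configuration multipathd actually loaded,
in case it differs from the file on disk (assuming the multipathd
interactive interface is available, as on EL7-based nodes):

multipathd -k'show config' > multipathd-show-config.out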
>
>
> Thx Christian
>
>
> From: Nir Soffer [mailto:nsoffer at redhat.com]
> Sent: Wednesday, 08 February 2017 20:44
> To: Grundmann, Christian <Christian.Grundmann at fabasoft.com>
> Cc: users at ovirt.org
> Subject: Re: [ovirt-users] Storage domain experienced a high latency
>
> On Wed, Feb 8, 2017 at 6:11 PM, Grundmann, Christian <
> Christian.Grundmann at fabasoft.com> wrote:
> Hi,
> I got a new FC storage (EMC Unity 300F) which is seen by my hosts in
> addition to my old storage, for migration.
> The new storage has only one path until the migration is done.
> I already have a few VMs running on the new storage without problems.
> But after starting some VMs (I don't really know what the difference to
> the working ones is), the path for the new storage fails.
>
> Engine tells me: Storage Domain <storagedomain> experienced a high latency
> of 22.4875 seconds from host <host>
>
> Where can I start looking?
>
> In /var/log/messages I found:
>
> Feb 8 09:03:53 ovirtnode01 multipathd: 360060160422143002a38935800ae2760:
> sdd - emc_clariion_checker: Active path is healthy.
> Feb 8 09:03:53 ovirtnode01 multipathd: 8:48: reinstated
> Feb 8 09:03:53 ovirtnode01 multipathd: 360060160422143002a38935800ae2760:
> remaining active paths: 1
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 8
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 5833475
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 5833475
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967168
> Feb 8 09:03:53 ovirtnode01 kernel: Buffer I/O error on dev dm-207,
> logical block 97, async page read
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967168
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967280
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967280
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 0
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 0
> Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967168
> Feb 8 09:03:53 ovirtnode01 kernel: device-mapper: multipath: Reinstating
> path 8:48.
> Feb 8 09:03:53 ovirtnode01 kernel: sd 3:0:0:22: alua: port group 01 state
> A preferred supports tolUsNA
> Feb 8 09:03:53 ovirtnode01 sanlock[5192]: 2017-02-08 09:03:53+0100 151809
> [11772]: s59 add_lockspace fail result -202
> Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: remove map (uevent)
> Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: devmap not registered,
> can't remove
> Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: remove map (uevent)
> Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: remove map (uevent)
> Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: devmap not registered,
> can't remove
> Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: remove map (uevent)
> Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: remove map (uevent)
> Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: devmap not registered,
> can't remove
> Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: remove map (uevent)
> Feb 8 09:04:08 ovirtnode01 kernel: dd: sending ioctl 80306d02 to a
> partition!
> Feb 8 09:04:24 ovirtnode01 sanlock[5192]: 2017-02-08 09:04:24+0100 151840
> [15589]: read_sectors delta_leader offset 2560 rv -202
> /dev/f9b70017-0a34-47bc-bf2f-dfc70200a347/ids
> Feb 8 09:04:34 ovirtnode01 sanlock[5192]: 2017-02-08 09:04:34+0100 151850
> [15589]: f9b70017 close_task_aio 0 0x7fd78c0008c0 busy
> Feb 8 09:04:39 ovirtnode01 multipathd: 360060160422143002a38935800ae2760:
> sdd - emc_clariion_checker: Read error for WWN
> 60060160422143002a38935800ae2760. Sense data are 0x0/0x0/0x0.
> Feb 8 09:04:39 ovirtnode01 multipathd: checker failed path 8:48 in map
> 360060160422143002a38935800ae2760
> Feb 8 09:04:39 ovirtnode01 multipathd: 360060160422143002a38935800ae2760:
> remaining active paths: 0
> Feb 8 09:04:39 ovirtnode01 kernel: qla2xxx [0000:11:00.0]-801c:3: Abort
> command issued nexus=3:0:22 -- 1 2002.
> Feb 8 09:04:39 ovirtnode01 kernel: device-mapper: multipath: Failing path
> 8:48.
> Feb 8 09:04:40 ovirtnode01 kernel: qla2xxx [0000:11:00.0]-801c:3: Abort
> command issued nexus=3:0:22 -- 1 2002.
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: 8 callbacks
> suppressed
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967168
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967280
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 0
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967168
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 4294967280
> Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev
> dm-10, sector 0
>
> Maybe you should consult the storage vendor about this?
>
> It can also be an incorrect multipath configuration: maybe the multipath
> checker failed, and because you have only one path the device moved to a
> faulty state, and sanlock failed to access the device.
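>
> As a quick check on the host, the commands below should show whether the
> checker currently sees the path as failed, and whether sanlock still holds
> the lockspace (assuming the multipathd interactive interface and the
> sanlock client tool are available):
>
> multipathd -k'show paths'
> multipathd -k'show maps status'
> sanlock client status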
>
> Can also be low level issue in kernel, hba, switch, server.
>
> Let's start by inspecting the multipath configuration, can you share the
> output of:
>
> cat /etc/multipath.conf
> multipath -r v3
>
> Maybe you can expose one lun for testing, and blacklist this lun in
> multipath.conf. You will not be able to use this lun in oVirt, but it can
> be used to validate the layers below multipath. If the plain lun is ok,
> and the same lun used as a multipath device fails, the problem is likely
> to be the multipath configuration.
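>
> A minimal sketch of what that test could look like (the wwid and /dev/sdX
> below are placeholders for the dedicated test lun, not for the existing
> storage domain):
>
> # /etc/multipath.conf - blacklist only the dedicated test lun
> blacklist {
>     wwid "<wwid-of-test-lun>"
> }
>
> # reload the configuration and verify the lun is no longer mapped
> multipathd -k'reconfigure'
> multipath -ll
>
> # read directly from the underlying SCSI device, bypassing multipath,
> # to validate the lower layers
> dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct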
>
> Nir
>
>
>
> multipath -ll output for this Domain
>
> 360060160422143002a38935800ae2760 dm-10 DGC ,VRAID
> size=2.0T features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
> `-+- policy='service-time 0' prio=50 status=active
> `- 3:0:0:22 sdd 8:48 active ready running
>
>
> Thx Christian
>
>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>