On Thu, Feb 9, 2017 at 10:03 AM, Grundmann, Christian <Christian.Grundmann@fabasoft.com> wrote:
Hi,

@ It can also be a low-level issue in the kernel, HBA, switch, or server.
I have the old storage on the same cable, so I don’t think it’s HBA- or switch-related.
On the same switch I have a few ESXi servers with the same storage setup, and they work without problems.

@multipath
I use the stock ng-node multipath configuration:

# VDSM REVISION 1.3

defaults {
    polling_interval            5
    no_path_retry               fail
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
}

# Remove devices entries when overrides section is available.
devices {
    device {
        # These settings overrides built-in devices settings. It does not apply
        # to devices without built-in settings (these use the settings in the
        # "defaults" section), or to devices defined in the "devices" section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs                yes
        no_path_retry           fail
    }
}

# Enable when this section is available on all supported platforms.
# Options defined here override device specific options embedded into
# multipathd.
#
# overrides {
#      no_path_retry           fail
# }


multipath -r v3
has no output

My mistake, the correct command is:

multipath -r -v3

It produces a lot of output, so it is better to redirect it to a file and attach the file:

multipath -r -v3 > multipath-r-v3.out
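
A minimal sketch for capturing that output, assuming a POSIX shell and root privileges. The function name and the default file name are made up for illustration. It also redirects stderr, in case some messages are written there rather than to stdout.

```shell
# Hypothetical helper: reload the multipath maps with verbose output and
# save everything to a file that can be attached to a reply.
capture_multipath_debug() {
    out="${1:-multipath-r-v3.out}"
    # -r forces a reload of the device maps, -v3 raises the verbosity.
    # 2>&1 sends stderr to the same file as stdout.
    multipath -r -v3 > "$out" 2>&1
    echo "wrote $(wc -l < "$out") lines to $out"
}

# Usage (as root):
#   capture_multipath_debug
#   capture_multipath_debug /tmp/multipath-debug.out
```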
 


Thx Christian


From: Nir Soffer [mailto:nsoffer@redhat.com]
Sent: Wednesday, 08 February 2017 20:44
To: Grundmann, Christian <Christian.Grundmann@fabasoft.com>
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Storage domain experienced a high latency

On Wed, Feb 8, 2017 at 6:11 PM, Grundmann, Christian <Christian.Grundmann@fabasoft.com> wrote:
Hi,
I got a new FC storage array (EMC Unity 300F), which my hosts see in addition to my old storage, for migration.
The new storage has only one path until the migration is done.
I already have a few VMs running on the new storage without problems.
But after starting some VMs (I don’t really know what the difference is to the working ones), the path to the new storage fails.
 
Engine tells me: Storage Domain <storagedomain> experienced a high latency of 22.4875 seconds from host <host>
 
Where can I start looking?
 
In /var/log/messages I found:
 
Feb  8 09:03:53 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: sdd - emc_clariion_checker: Active path is healthy.
Feb  8 09:03:53 ovirtnode01 multipathd: 8:48: reinstated
Feb  8 09:03:53 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: remaining active paths: 1
Feb  8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 8
Feb  8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 5833475
Feb  8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 5833475
Feb  8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb  8 09:03:53 ovirtnode01 kernel: Buffer I/O error on dev dm-207, logical block 97, async page read
Feb  8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb  8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb  8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb  8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0
Feb  8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0
Feb  8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb  8 09:03:53 ovirtnode01 kernel: device-mapper: multipath: Reinstating path 8:48.
Feb  8 09:03:53 ovirtnode01 kernel: sd 3:0:0:22: alua: port group 01 state A preferred supports tolUsNA
Feb  8 09:03:53 ovirtnode01 sanlock[5192]: 2017-02-08 09:03:53+0100 151809 [11772]: s59 add_lockspace fail result -202
Feb  8 09:04:05 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb  8 09:04:05 ovirtnode01 multipathd: dm-33: devmap not registered, can't remove
Feb  8 09:04:05 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb  8 09:04:06 ovirtnode01 multipathd: dm-34: remove map (uevent)
Feb  8 09:04:06 ovirtnode01 multipathd: dm-34: devmap not registered, can't remove
Feb  8 09:04:06 ovirtnode01 multipathd: dm-34: remove map (uevent)
Feb  8 09:04:08 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb  8 09:04:08 ovirtnode01 multipathd: dm-33: devmap not registered, can't remove
Feb  8 09:04:08 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb  8 09:04:08 ovirtnode01 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  8 09:04:24 ovirtnode01 sanlock[5192]: 2017-02-08 09:04:24+0100 151840 [15589]: read_sectors delta_leader offset 2560 rv -202 /dev/f9b70017-0a34-47bc-bf2f-dfc70200a347/ids
Feb  8 09:04:34 ovirtnode01 sanlock[5192]: 2017-02-08 09:04:34+0100 151850 [15589]: f9b70017 close_task_aio 0 0x7fd78c0008c0 busy
Feb  8 09:04:39 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: sdd - emc_clariion_checker: Read error for WWN 60060160422143002a38935800ae2760.  Sense data are 0x0/0x0/0x0.
Feb  8 09:04:39 ovirtnode01 multipathd: checker failed path 8:48 in map 360060160422143002a38935800ae2760
Feb  8 09:04:39 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: remaining active paths: 0
Feb  8 09:04:39 ovirtnode01 kernel: qla2xxx [0000:11:00.0]-801c:3: Abort command issued nexus=3:0:22 --  1 2002.
Feb  8 09:04:39 ovirtnode01 kernel: device-mapper: multipath: Failing path 8:48.
Feb  8 09:04:40 ovirtnode01 kernel: qla2xxx [0000:11:00.0]-801c:3: Abort command issued nexus=3:0:22 --  1 2002.
Feb  8 09:04:42 ovirtnode01 kernel: blk_update_request: 8 callbacks suppressed
Feb  8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb  8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb  8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0
Feb  8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb  8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb  8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0
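
To see how often the path fails and recovers, the multipathd lines in a log like the one above can be reduced to a short timeline. This is a minimal sketch, assuming the syslog format shown above and standard grep and awk; the function name is hypothetical.

```shell
# Hypothetical helper: reduce syslog text to multipathd path-state events.
# Reads log lines on stdin and prints "timestamp  message" for each event.
path_events() {
    grep 'multipathd:' |
    grep -E 'checker failed path|reinstated|remaining active paths' |
    awk '{
        ts = $1 " " $2 " " $3            # month, day, time
        sub(/^.*multipathd: /, "")       # keep only the multipathd message
        print ts "  " $0
    }'
}

# Usage:
#   path_events < /var/log/messages
```

A path that alternates between "reinstated" and "checker failed path" within a minute or so, as in the excerpt above, is flapping rather than failing once.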

Maybe you should consult the storage vendor about this?

It can also be an incorrect multipath configuration: maybe the multipath
checker fails, and because you have only one path, the device moves to the
faulty state and sanlock fails to access the device.

It can also be a low-level issue in the kernel, HBA, switch, or server.

Let's start by inspecting the multipath configuration. Can you share the
output of:

cat /etc/multipath.conf
multipath -r v3

Maybe you can expose one LUN for testing, and blacklist this LUN in
multipath.conf. You will not be able to use this LUN in oVirt, but it can
be used to validate the layers below multipath. If the plain LUN is OK,
and the same LUN used as a multipath device fails, the problem is likely
to be the multipath configuration.
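
The test suggested above could be set up roughly as follows. This is a sketch under assumptions: the helper name, the <wwid> placeholder and /dev/sdX are made up, the host is assumed to be systemd-based with a multipathd unit that supports reload, and on an oVirt host vdsm manages /etc/multipath.conf, so check how your vdsm version preserves local changes before editing.

```shell
# Hypothetical helper: print a multipath.conf blacklist stanza for one LUN.
# $1 is the WWID of the test LUN (the map name shown by "multipath -ll"
# when user_friendly_names is "no").
blacklist_stanza() {
    printf 'blacklist {\n    wwid "%s"\n}\n' "$1"
}

# Usage sketch (as root); <wwid> and /dev/sdX are placeholders:
#   blacklist_stanza <wwid>        # review, then add the stanza to multipath.conf
#   multipath -f <wwid>            # flush the existing map for this LUN
#   systemctl reload multipathd    # make multipathd re-read its configuration
#   dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct   # read the plain LUN
```

If reads from the plain device stay clean under the same load while the multipath device fails, that points at the multipath layer rather than at the layers below it.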
 
Nir

 
 
multipath -ll output for this domain:
 
360060160422143002a38935800ae2760 dm-10 DGC     ,VRAID
size=2.0T features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 3:0:0:22 sdd 8:48  active ready  running
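
A side note on the I/O errors logged for dm-10: several of them hit sector 0 or sectors just below 2^32. The kernel reports sectors in 512-byte units, and 2^32 such sectors is exactly 2 TiB. That would match the size=2.0T shown above if the LUN is exactly 2 TiB, which is an assumption because the displayed size is rounded. Under that assumption, those reads fail at the very start and the very end of the device. The helper name below is made up for illustration.

```shell
# Hypothetical helper: distance, in 512-byte sectors, between a sector
# number and the 2 TiB boundary (2^32 sectors).
sectors_below_2tib() {
    echo $(( (1 << 32) - $1 ))
}

sectors_below_2tib 4294967168   # prints 128 (64 KiB before the boundary)
sectors_below_2tib 4294967280   # prints 16  (8 KiB before the boundary)
```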
 
 
Thx Christian
 
 
