<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Feb 10, 2017 at 9:24 PM, Grundmann, Christian <span dir="ltr"><<a href="mailto:Christian.Grundmann@fabasoft.com" target="_blank">Christian.Grundmann@fabasoft.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="DE-AT">
<div class="gmail-m_6819784024837909801WordSection1">
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Attached</span></p></div></div></blockquote><div><br></div><div>I don't see anything unusual.</div><div><br></div><div>Having one path, your system is very sensitive to </div><div>errors on the single path.</div><div><br></div><div>The default ovirt configuration assume that you have several</div><div>paths for each device, and is optimized for fast failover when</div><div>a path has an error.</div><div><br></div><div>This optimization makes your system more fragile when you</div><div>have only one path.</div><div><br></div><div>I would use this setting:</div><div><br></div><div> no_path_retry 4</div><div><br></div><div>This will do 4 retries when all paths are faulty before failing the io.</div><div>If you have short outage for some reason, this may hide the issue</div><div>on the host.</div><div><br></div><div>If you change multipath.conf, you should mark it as private, so </div><div>vdsm will not touch this file on upgrades. To mark the file as</div><div>private the second line of the file must be:</div><div><br></div><div># VDSM PRIVATE</div><div><br></div><div>To check that your file is considered private, you can run:</div><div><br></div><div><div># vdsm-tool is-configured</div><div>Manual override for multipath.conf detected - preserving current configuration</div></div><div><br></div><div>You should see the message about Manual override.</div><div><br></div><div>The real question is why the path failed - did you have anything</div><div>in the server logs at that time? did you have issues on other hosts</div><div>in the same time?</div><div><br></div><div>Nir</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div lang="DE-AT"><div class="gmail-m_6819784024837909801WordSection1"><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Thx Christian<u></u><u></u></span></p>
<p class="MsoNormal"><span><u></u> <u></u></span></p>
<p class="MsoNormal"><b><span lang="DE" style="font-size:11pt;font-family:calibri,sans-serif">Von:</span></b><span lang="DE" style="font-size:11pt;font-family:calibri,sans-serif"> Nir Soffer [mailto:<a href="mailto:nsoffer@redhat.com" target="_blank">nsoffer@redhat.com</a>]
<br>
<b>Gesendet:</b> Freitag, 10. Februar 2017 17:43</span></p><div><div class="gmail-h5"><br>
<b>An:</b> Grundmann, Christian <<a href="mailto:Christian.Grundmann@fabasoft.com" target="_blank">Christian.Grundmann@fabasoft.<wbr>com</a>><br>
<b>Cc:</b> <a href="mailto:users@ovirt.org" target="_blank">users@ovirt.org</a><br>
<b>Betreff:</b> Re: [ovirt-users] Storage domain experienced a high latency<u></u><u></u></div></div><p></p><div><div class="gmail-h5">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<div>
<p class="MsoNormal">On Thu, Feb 9, 2017 at 10:03 AM, Grundmann, Christian <<a href="mailto:Christian.Grundmann@fabasoft.com" target="_blank">Christian.Grundmann@fabasoft.<wbr>com</a>> wrote:<u></u><u></u></p>
<p class="MsoNormal">Hi,<br>
<br>
@ Can also be low level issue in kernel, hba, switch, server.<br>
I have the old storage on the same cable so I don’t think its hba or switch related<br>
On the same Switch I have a few ESXi Server with same storage setup which are working without problems.<br>
<br>
@multipath<br>
I use stock ng-node multipath configuration<br>
<br>

# VDSM REVISION 1.3

defaults {
    polling_interval            5
    no_path_retry               fail
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
}

# Remove devices entries when overrides section is available.
devices {
    device {
        # These settings overrides built-in devices settings. It does not apply
        # to devices without built-in settings (these use the settings in the
        # "defaults" section), or to devices defined in the "devices" section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs                yes
        no_path_retry           fail
    }
}

# Enable when this section is available on all supported platforms.
# Options defined here override device specific options embedded into
# multipathd.
#
# overrides {
#     no_path_retry            fail
# }

multipath -r v3
has no output

My mistake, the correct command is:

    multipath -r -v3

It creates tons of output, so better redirect it to a file and attach the file:

    multipath -r -v3 > multipath-r-v3.out
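(A small aside, not from the original advice: if part of the verbose output is written to stderr rather than stdout, capturing both streams avoids losing it.)

    multipath -r -v3 > multipath-r-v3.out 2>&1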

Thx Christian

From: Nir Soffer [mailto:nsoffer@redhat.com]
Sent: Wednesday, February 8, 2017 20:44
To: Grundmann, Christian <Christian.Grundmann@fabasoft.com>
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Storage domain experienced a high latency

On Wed, Feb 8, 2017 at 6:11 PM, Grundmann, Christian <Christian.Grundmann@fabasoft.com> wrote:
Hi,
I got a new FC storage (EMC Unity 300F) which is seen by my hosts in addition to my old storage, for migration.
The new storage has only one path until the migration is done.
I already have a few VMs running on the new storage without problems.
But after starting some VMs (I don't really know what the difference to the working ones is), the path for the new storage fails.

The engine tells me: Storage Domain <storagedomain> experienced a high latency of 22.4875 seconds from host <host>

Where can I start looking?

In /var/log/messages I found:

Feb 8 09:03:53 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: sdd - emc_clariion_checker: Active path is healthy.
Feb 8 09:03:53 ovirtnode01 multipathd: 8:48: reinstated
Feb 8 09:03:53 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: remaining active paths: 1
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 8
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 5833475
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 5833475
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb 8 09:03:53 ovirtnode01 kernel: Buffer I/O error on dev dm-207, logical block 97, async page read
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb 8 09:03:53 ovirtnode01 kernel: device-mapper: multipath: Reinstating path 8:48.
Feb 8 09:03:53 ovirtnode01 kernel: sd 3:0:0:22: alua: port group 01 state A preferred supports tolUsNA
Feb 8 09:03:53 ovirtnode01 sanlock[5192]: 2017-02-08 09:03:53+0100 151809 [11772]: s59 add_lockspace fail result -202
Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: devmap not registered, can't remove
Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: remove map (uevent)
Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: devmap not registered, can't remove
Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: remove map (uevent)
Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: devmap not registered, can't remove
Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb 8 09:04:08 ovirtnode01 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 8 09:04:24 ovirtnode01 sanlock[5192]: 2017-02-08 09:04:24+0100 151840 [15589]: read_sectors delta_leader offset 2560 rv -202 /dev/f9b70017-0a34-47bc-bf2f-dfc70200a347/ids
Feb 8 09:04:34 ovirtnode01 sanlock[5192]: 2017-02-08 09:04:34+0100 151850 [15589]: f9b70017 close_task_aio 0 0x7fd78c0008c0 busy
Feb 8 09:04:39 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: sdd - emc_clariion_checker: Read error for WWN 60060160422143002a38935800ae2760. Sense data are 0x0/0x0/0x0.
Feb 8 09:04:39 ovirtnode01 multipathd: checker failed path 8:48 in map 360060160422143002a38935800ae2760
Feb 8 09:04:39 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: remaining active paths: 0
Feb 8 09:04:39 ovirtnode01 kernel: qla2xxx [0000:11:00.0]-801c:3: Abort command issued nexus=3:0:22 -- 1 2002.
Feb 8 09:04:39 ovirtnode01 kernel: device-mapper: multipath: Failing path 8:48.
Feb 8 09:04:40 ovirtnode01 kernel: qla2xxx [0000:11:00.0]-801c:3: Abort command issued nexus=3:0:22 -- 1 2002.
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: 8 callbacks suppressed
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0

Maybe you should consult the storage vendor about this?

Can also be an incorrect multipath configuration - maybe the multipath checker failed, and because you have only one path the device moved to a faulty state, and sanlock failed to access the device.

Can also be a low level issue in kernel, hba, switch, server.

Let's start by inspecting the multipath configuration. Can you share the output of:

    cat /etc/multipath.conf
    multipath -r v3

Maybe you can expose one LUN for testing, and blacklist this LUN in multipath.conf. You will not be able to use this LUN in oVirt, but it can be used to validate the layers below multipath. If a plain LUN is OK, and the same LUN used as a multipath device fails, the problem is likely the multipath configuration.

Nir
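(For illustration only, not part of the original reply: a blacklist entry for such a test LUN is keyed on its WWID; the WWID and the /dev/sdX device name below are placeholders, not values from this thread.)

blacklist {
    # hypothetical WWID of the LUN exposed only for testing
    wwid "36006016042214300xxxxxxxxxxxxxxxx"
}

# raw read test against the plain SCSI device, bypassing the multipath map and the page cache
dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct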

multipath -ll output for this Domain:

360060160422143002a38935800ae2760 dm-10 DGC ,VRAID
size=2.0T features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 3:0:0:22 sdd 8:48 active ready running

Thx Christian

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users