<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Feb 10, 2017 at 9:24 PM, Grundmann, Christian <span dir="ltr"><<a href="mailto:Christian.Grundmann@fabasoft.com" target="_blank">Christian.Grundmann@fabasoft.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="DE-AT">
<div class="gmail-m_6819784024837909801WordSection1">
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Attached</span></p></div></div></blockquote><div><br></div><div>I don't see anything unusual.</div><div><br></div><div>Having one path, your system is very sensitive to </div><div>errors on the single path.</div><div><br></div><div>The default ovirt configuration assume that you have several</div><div>paths for each device, and is optimized for fast failover when</div><div>a path has an error.</div><div><br></div><div>This optimization makes your system more fragile when you</div><div>have only one path.</div><div><br></div><div>I would use this setting:</div><div><br></div><div> no_path_retry 4</div><div><br></div><div>This will do 4 retries when all paths are faulty before failing the io.</div><div>If you have short outage for some reason, this may hide the issue</div><div>on the host.</div><div><br></div><div>If you change multipath.conf, you should mark it as private, so </div><div>vdsm will not touch this file on upgrades. To mark the file as</div><div>private the second line of the file must be:</div><div><br></div><div># VDSM PRIVATE</div><div><br></div><div>To check that your file is considered private, you can run:</div><div><br></div><div><div># vdsm-tool is-configured</div><div>Manual override for multipath.conf detected - preserving current configuration</div></div><div><br></div><div>You should see the message about Manual override.</div><div><br></div><div>The real question is why the path failed - did you have anything</div><div>in the server logs at that time? did you have issues on other hosts</div><div>in the same time?</div><div><br></div><div>Nir</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div lang="DE-AT"><div class="gmail-m_6819784024837909801WordSection1"><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Thx Christian<u></u><u></u></span></p>
<p class="MsoNormal"><span><u></u> <u></u></span></p>
<p class="MsoNormal"><b><span lang="DE" style="font-size:11pt;font-family:calibri,sans-serif">Von:</span></b><span lang="DE" style="font-size:11pt;font-family:calibri,sans-serif"> Nir Soffer [mailto:<a href="mailto:nsoffer@redhat.com" target="_blank">nsoffer@redhat.com</a>]
<br>
<b>Gesendet:</b> Freitag, 10. Februar 2017 17:43</span></p><div><div class="gmail-h5"><br>
<b>An:</b> Grundmann, Christian <<a href="mailto:Christian.Grundmann@fabasoft.com" target="_blank">Christian.Grundmann@fabasoft.<wbr>com</a>><br>
<b>Cc:</b> <a href="mailto:users@ovirt.org" target="_blank">users@ovirt.org</a><br>
<b>Betreff:</b> Re: [ovirt-users] Storage domain experienced a high latency<u></u><u></u></div></div><p></p><div><div class="gmail-h5">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<div>
<p class="MsoNormal">On Thu, Feb 9, 2017 at 10:03 AM, Grundmann, Christian <<a href="mailto:Christian.Grundmann@fabasoft.com" target="_blank">Christian.Grundmann@fabasoft.<wbr>com</a>> wrote:<u></u><u></u></p>
<p class="MsoNormal">Hi,<br>
<br>
@ Can also be low level issue in kernel, hba, switch, server.<br>
I have the old storage on the same cable so I don’t think its hba or switch related<br>
On the same Switch I have a few ESXi Server with same storage setup which are working without problems.<br>
<br>
@multipath<br>
I use stock ng-node multipath configuration<br>
<br>

# VDSM REVISION 1.3

defaults {
    polling_interval            5
    no_path_retry               fail
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
}

# Remove devices entries when overrides section is available.
devices {
    device {
        # These settings overrides built-in devices settings. It does not apply
        # to devices without built-in settings (these use the settings in the
        # "defaults" section), or to devices defined in the "devices" section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs                yes
        no_path_retry           fail
    }
}

# Enable when this section is available on all supported platforms.
# Options defined here override device specific options embedded into
# multipathd.
#
# overrides {
#     no_path_retry            fail
# }

multipath -r v3
has no output

My mistake, the correct command is:

    multipath -r -v3

It creates tons of output, so better redirect it to a file and attach the file:

    multipath -r -v3 > multipath-r-v3.out
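(A small aside, not from the original advice: if part of the verbose output is written to stderr rather than stdout, capturing both streams avoids losing it.)

    multipath -r -v3 > multipath-r-v3.out 2>&1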

Thx Christian

From: Nir Soffer [mailto:nsoffer@redhat.com]
Sent: Wednesday, February 8, 2017 20:44
To: Grundmann, Christian <Christian.Grundmann@fabasoft.com>
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Storage domain experienced a high latency

On Wed, Feb 8, 2017 at 6:11 PM, Grundmann, Christian <Christian.Grundmann@fabasoft.com> wrote:
Hi,
I got a new FC storage (EMC Unity 300F) which is seen by my hosts in addition to my old storage, for migration.
The new storage has only one path until the migration is done.
I already have a few VMs running on the new storage without problems.
But after starting some VMs (I don't really know what the difference to the working ones is), the path for the new storage fails.

The engine tells me: Storage Domain <storagedomain> experienced a high latency of 22.4875 seconds from host <host>

Where can I start looking?

In /var/log/messages I found:

Feb 8 09:03:53 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: sdd - emc_clariion_checker: Active path is healthy.
Feb 8 09:03:53 ovirtnode01 multipathd: 8:48: reinstated
Feb 8 09:03:53 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: remaining active paths: 1
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 8
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 5833475
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 5833475
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb 8 09:03:53 ovirtnode01 kernel: Buffer I/O error on dev dm-207, logical block 97, async page read
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0
Feb 8 09:03:53 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb 8 09:03:53 ovirtnode01 kernel: device-mapper: multipath: Reinstating path 8:48.
Feb 8 09:03:53 ovirtnode01 kernel: sd 3:0:0:22: alua: port group 01 state A preferred supports tolUsNA
Feb 8 09:03:53 ovirtnode01 sanlock[5192]: 2017-02-08 09:03:53+0100 151809 [11772]: s59 add_lockspace fail result -202
Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: devmap not registered, can't remove
Feb 8 09:04:05 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: remove map (uevent)
Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: devmap not registered, can't remove
Feb 8 09:04:06 ovirtnode01 multipathd: dm-34: remove map (uevent)
Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: devmap not registered, can't remove
Feb 8 09:04:08 ovirtnode01 multipathd: dm-33: remove map (uevent)
Feb 8 09:04:08 ovirtnode01 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 8 09:04:24 ovirtnode01 sanlock[5192]: 2017-02-08 09:04:24+0100 151840 [15589]: read_sectors delta_leader offset 2560 rv -202 /dev/f9b70017-0a34-47bc-bf2f-dfc70200a347/ids
Feb 8 09:04:34 ovirtnode01 sanlock[5192]: 2017-02-08 09:04:34+0100 151850 [15589]: f9b70017 close_task_aio 0 0x7fd78c0008c0 busy
Feb 8 09:04:39 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: sdd - emc_clariion_checker: Read error for WWN 60060160422143002a38935800ae2760. Sense data are 0x0/0x0/0x0.
Feb 8 09:04:39 ovirtnode01 multipathd: checker failed path 8:48 in map 360060160422143002a38935800ae2760
Feb 8 09:04:39 ovirtnode01 multipathd: 360060160422143002a38935800ae2760: remaining active paths: 0
Feb 8 09:04:39 ovirtnode01 kernel: qla2xxx [0000:11:00.0]-801c:3: Abort command issued nexus=3:0:22 -- 1 2002.
Feb 8 09:04:39 ovirtnode01 kernel: device-mapper: multipath: Failing path 8:48.
Feb 8 09:04:40 ovirtnode01 kernel: qla2xxx [0000:11:00.0]-801c:3: Abort command issued nexus=3:0:22 -- 1 2002.
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: 8 callbacks suppressed
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967168
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 4294967280
Feb 8 09:04:42 ovirtnode01 kernel: blk_update_request: I/O error, dev dm-10, sector 0

Maybe you should consult the storage vendor about this?

Can also be an incorrect multipath configuration - maybe the multipath checker failed, and because you have only one path the device moved to a faulty state, and sanlock failed to access the device.

Can also be a low level issue in kernel, hba, switch, server.

Let's start by inspecting the multipath configuration. Can you share the output of:

    cat /etc/multipath.conf
    multipath -r v3

Maybe you can expose one LUN for testing, and blacklist this LUN in multipath.conf. You will not be able to use this LUN in oVirt, but it can be used to validate the layers below multipath. If a plain LUN is OK, and the same LUN used as a multipath device fails, the problem is likely the multipath configuration.

Nir
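(For illustration only, not part of the original reply: a blacklist entry for such a test LUN is keyed on its WWID; the WWID and the /dev/sdX device name below are placeholders, not values from this thread.)

blacklist {
    # hypothetical WWID of the LUN exposed only for testing
    wwid "36006016042214300xxxxxxxxxxxxxxxx"
}

# raw read test against the plain SCSI device, bypassing the multipath map and the page cache
dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct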

multipath -ll output for this Domain:

360060160422143002a38935800ae2760 dm-10 DGC ,VRAID
size=2.0T features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 3:0:0:22 sdd 8:48 active ready running

Thx Christian

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users