Re: [ovirt-users] Ovirt host activation and lvm looping with high CPU load trying to mount iSCSI storage

12 Jan 2017

Hi,

As we are using a very similar hardware and usage as Mark (Dell 
poweredge hosts, Dell Equallogic SAN, iSCSI, and tons of LUNs for all 
those VMs), I'm jumping into this thread.

On 12/01/2017 at 16:29, Yaniv Kaul wrote:
...
> While it's a bit of a religious war on what is preferred with iSCSI -
> network level bonding (LACP) or multipathing on the iSCSI level, I'm
> on the multipathing side. The main reason is that you may easily end
> up using just one of the paths in a bond if your policy for
> distributing connections between the physical links is not set
> correctly (remember that each connection sticks to a single physical
> link, so it really depends on the hash policy, and even then I'm not
> so sure).
> With iSCSI multipathing you have more control - and it can also be
> determined by queue depth, etc.
> (In your example, if you have SRC A -> DST 1 and SRC B -> DST 1 (as
> you seem to have), both connections may end up on the same physical NIC.)
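
To illustrate the point about hash policies, here is a minimal sketch
of how two connections can land on the same physical link of a bond.
The hash below is a deliberately simplified toy (XOR of the two IP
addresses, modulo the link count), not the exact computation the Linux
bonding driver performs, and the addresses are hypothetical.

```python
import ipaddress


def pick_link(src_ip: str, dst_ip: str, link_count: int) -> int:
    """Toy address-based hash: XOR the two IPv4 addresses, then take the
    result modulo the number of physical links in the bond."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % link_count


# Hypothetical addresses: two host-side sources and one SAN portal.
SRC_A, SRC_B, DST_1 = "10.0.0.11", "10.0.0.13", "10.0.0.100"

link_a = pick_link(SRC_A, DST_1, 2)
link_b = pick_link(SRC_B, DST_1, 2)
print(link_a, link_b)  # both flows are assigned the same link index
```

With these particular addresses both flows hash to the same link, so
the second physical link carries none of this traffic. A different
source address (for example 10.0.0.12) would hash to the other link,
which is why the outcome depends so much on the policy and on the
addresses involved.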
>> If we reduce the number of storage domains, we reduce the number
>> of devices and therefore the number of LVM physical volumes that
>> appear in Linux, correct? At the moment each connection results in
>> a Linux device which has its own queue. We have some guests with
>> high IO loads on their device whilst others are low. All the
>> storage domain / datastore sizing guides we found seem to imply
>> it's a trade-off between ease of management (i.e. not having
>> millions of domains to manage), IO contention between guests on a
>> single large storage domain / datastore, and possible wasted space
>> on storage domains. If you have further information on
>> recommendations, I am more than willing to change things, as this
>> problem is making our environment somewhat unusable at the moment.
>> I have hosts that I can't bring online and therefore reduced
>> resiliency in clusters. They used to work just fine, but the
>> environment has grown over the last year and we also upgraded the
>> oVirt version from 3.6 to 4.x. We certainly had other problems,
>> but host activation wasn't one of them, and it's a problem that's
>> driving me mad.
> I would say that each path has its own device (and therefore its own
> queue). So I'd argue that you may want to have (for example) 4 paths
> to each LUN or perhaps more (8?). For example, with 2 NICs, each
> connecting to two controllers, each controller having 2 NICs (so no
> SPOF and a nice number of paths).
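
The path arithmetic in that example can be sketched as follows. This
is only an illustration: the NIC, controller and port names are
hypothetical, and the LUN count of 50 is an arbitrary figure chosen to
show how quickly the number of per-path devices grows.

```python
from itertools import product

# Hypothetical topology matching the example: 2 host NICs, 2 storage
# controllers, 2 ports (NICs) per controller.
host_nics = ["eth0", "eth1"]
controller_ports = [
    ("ctrlA", "p0"), ("ctrlA", "p1"),
    ("ctrlB", "p0"), ("ctrlB", "p1"),
]

# One path per (host NIC, controller port) combination.
paths = [(nic, ctrl, port) for nic, (ctrl, port) in product(host_nics, controller_ports)]


def path_device_count(lun_count: int, paths_per_lun: int) -> int:
    """Each path to each LUN shows up as its own device with its own queue."""
    return lun_count * paths_per_lun


print(len(paths))                          # paths per LUN in this topology
print(path_device_count(50, len(paths)))   # per-path devices for 50 LUNs
```

More paths per LUN means more queues and better redundancy, but the
number of devices each host has to scan grows as LUNs times paths,
which is the other side of the storage domain sizing trade-off raised
earlier in the thread.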

Here is one key point I have been trying (to no avail) to discuss with
Red Hat people for years. Either I did not understand, or I wasn't
clear enough, or the Red Hat people answered that they owned no
Equallogic SAN to test with:
My Equallogic SAN (and maybe many others') has two controllers, but
publishes only *ONE* virtual IP address.
On one of our other SANs, an EMC, which publishes *TWO* IP addresses
that can be placed in two different subnets, I fully understand the
benefits and workings of multipathing (and even in the same subnet,
our oVirt setup is happily using multipath).

But on one of our oVirt setups using the Equallogic SAN, we have no
choice but to point our hosts' iSCSI interfaces to one single SAN IP,
so no multipath here.

At this point, we saw no other means than using bonding mode 1
(active-backup) to reach our SAN, which storage experts consider
terrible.

To come back to Mark's story, we are still using 3.6.5 DCs and planning 
to upgrade.
Reading all this is making me delay this step.

-- 
Nicolas ECARNOT



Nicolas Ecarnot