[ovirt-users] Ovirt host activation and lvm looping with high CPU load trying to mount iSCSI storage

Yaniv Kaul ykaul at redhat.com
Thu Jan 12 18:45:54 UTC 2017


On Thu, Jan 12, 2017 at 6:01 PM, Nicolas Ecarnot <nicolas at ecarnot.net>
wrote:

> Hi,
>
> As we are using very similar hardware and a very similar usage pattern to
> Mark's (Dell PowerEdge hosts, Dell EqualLogic SAN, iSCSI, and tons of LUNs
> for all those VMs), I'm jumping into this thread.
>
> On 12/01/2017 at 16:29, Yaniv Kaul wrote:
>
>
> While it's a bit of a religious war as to what is preferred with iSCSI -
> network-level bonding (LACP) or multipathing at the iSCSI level - I'm on
> the multipathing side. The main reason is that with a bond you may easily
> end up using just one of the paths, if your policy on how to distribute
> connections between the physical links is not set correctly (remember that
> each TCP connection sticks to a single physical link, so it really depends
> on the hash policy, and even then it's not guaranteed). With iSCSI
> multipathing you have more control - and path selection can also be driven
> by queue depth, etc.
> (In your example, with SRC A -> DST 1 and SRC B -> DST 1, as you seem to
> have, both connections may end up on the same physical NIC.)
>
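For the iSCSI-level side, a minimal sketch of binding one session per
physical NIC with iscsiadm (the iface names, NIC names and portal address
below are placeholders, not taken from this thread):

    # Create one iSCSI iface per physical NIC, so each NIC carries its own
    # session (and therefore its own path):
    iscsiadm -m iface -I iface0 --op new
    iscsiadm -m iface -I iface0 --op update -n iface.net_ifacename -v eth2
    iscsiadm -m iface -I iface1 --op new
    iscsiadm -m iface -I iface1 --op update -n iface.net_ifacename -v eth3
    # Discover and log in through both ifaces - two sessions, two paths:
    iscsiadm -m discovery -t sendtargets -p 10.1.2.3:3260 -I iface0 -I iface1
    iscsiadm -m node -l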
>
>>
>> If we reduce the number of storage domains, we reduce the number of
>> devices, and therefore the number of LVM physical volumes that appear in
>> Linux, correct? At the moment each connection results in a Linux device,
>> which has its own queue. Some of our guests place a high I/O load on
>> their device whilst others are low. All the storage domain / datastore
>> sizing guides we found seem to imply it’s a trade-off between ease of
>> management (i.e. not having millions of domains to manage), I/O
>> contention between guests on a single large storage domain / datastore,
>> and possible wasted space on storage domains. If you have further
>> information or recommendations, I am more than willing to change things,
>> as this problem is making our environment somewhat unusable at the
>> moment. I have hosts that I can’t bring online, and therefore reduced
>> resiliency in clusters. They used to work just fine, but the environment
>> has grown over the last year and we also upgraded the oVirt version from
>> 3.6 to 4.x. We certainly had other problems, but host activation wasn’t
>> one of them, and it’s a problem that’s driving me mad.
>>
>
> I would say that each path has its own device (and therefore its own
> queue). So I'd argue that you may want to have (for example) 4 paths to
> each LUN, or perhaps more (8?). For example, with 2 host NICs, each
> connecting to two controllers, and each controller having 2 NICs, you get
> 2 x 4 = 8 paths (so no SPOF and a nice number of paths).
>
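This is easy to see on a host: multipath aggregates the per-path SCSI
devices into one dm device per LUN, and each per-path device keeps its own
request queue (a sketch; device names vary from host to host):

    # One entry per LUN, listing the sdX device behind each path:
    multipath -ll
    # Each per-path device has its own queue with its own tunables:
    cat /sys/block/sdb/queue/nr_requests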
> Here is one key point I have been trying (to no avail) to discuss for
> years with Red Hat people - and either I did not understand, or I wasn't
> clear enough, or the Red Hat people answered that they owned no EqualLogic
> SAN to test it on:
> My (and maybe many others') EqualLogic SAN has two controllers, but it
> publishes only *ONE* virtual IP address.
>

You are completely right - you keep saying that and I keep forgetting that.
I apologize.


> On one of our other SANs, an EMC publishing *TWO* IP addresses, which can
> be placed in two different subnets, I fully understand the benefits and
> workings of multipathing (and even in the same subnet, our oVirt setup is
> happily using multipath).
>
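With a two-portal array like that, a plain sendtargets discovery against
either portal normally returns both, and logging in creates one session -
hence one path - per portal (the addresses and IQN below are placeholders):

    iscsiadm -m discovery -t sendtargets -p 192.168.10.5:3260
    #  would print something like:
    #  192.168.10.5:3260,1 iqn.1992-04.com.emc:example-target
    #  192.168.20.5:3260,2 iqn.1992-04.com.emc:example-target
    iscsiadm -m node -l    # one session per discovered portal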
> But on the oVirt setup using the EqualLogic SAN, we have no choice but to
> point our hosts' iSCSI interfaces at one single SAN IP, so there is no
> multipath there.
>
> At that point, we saw no other means than using bonding mode 1
> (active-backup) to reach our SAN, which is terrible in the eyes of storage
> experts.
>

You could, if you do it properly, have an active-active mode, no? And if
the hash policy is correct (for example, layer3+4) you might get both
slaves utilized. Also, multiple sessions per target can be achieved with
iscsid.conf's node.session.nr_sessions (though I'm not sure we don't have a
bug where we fail to disconnect all of those sessions).
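For the single-portal EqualLogic case, a minimal sketch of that combination
(the bonding options are the standard kernel ones; the portal address and
IQN are placeholders):

    # Active-active LACP bond with a layer3+4 transmit hash, so different
    # TCP connections can land on different slaves (the switch side must be
    # configured for 802.3ad as well). Bonding module options:
    #   mode=802.3ad xmit_hash_policy=layer3+4 miimon=100

    # /etc/iscsi/iscsid.conf - open several sessions to the single portal,
    # so multipath still sees several devices (and queues) per LUN:
    #   node.session.nr_sessions = 4

    # Re-discover and log in again for the setting to take effect:
    iscsiadm -m discovery -t sendtargets -p 10.1.2.3:3260
    iscsiadm -m node -T iqn.2001-05.com.equallogic:example -l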


>
>
> To come back to Mark's story, we are still using 3.6.5 DCs and planning to
> upgrade.
> Reading all this is making me delay this step.
>

Well, it'd be nice to get to the bottom of it, but I'm quite sure it has
little or nothing to do with 4.0.
Y.


>
> --
> Nicolas ECARNOT
>

