Hi Yaniv,
>> 1. There is no point in so many connections.
>> 2. Certainly not the same portal - you really should have more.
>> 3. Note that some go via bond1 - and some via 'default' interface. Is that intended?
>> 4. Your multipath.conf is using rr_min_io - where it should use rr_min_io_rq most likely.
We have a single 68TB EqualLogic unit with 24 disks. Each oVirt host has 2 HBAs on the iSCSI network. We use oVirt and the Cisco switches to create an LACP group with those 2 HBAs. I have always assumed that the two connections come one from each HBA (i.e. I should have two paths and two connections to each target).
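One rough way to check the "two connections per target" assumption is to count sessions per target IQN in the output of `iscsiadm -m session`. The sketch below is only illustrative: the sample lines, addresses and IQNs are made up, the helper names are hypothetical, and it assumes the usual one-line-per-session output format.

```python
import re
from collections import Counter

# Matches session lines of the assumed form:
#   tcp: [1] 10.0.0.10:3260,1 iqn.2001-05.com.equallogic:example-vol1 (non-flash)
# group(1) = portal address, group(2) = target IQN
SESSION_RE = re.compile(r"^\w+:\s+\[\d+\]\s+(\S+?),\d+\s+(\S+)")


def sessions_per_target(iscsiadm_output):
    """Count how many iSCSI sessions exist for each target IQN."""
    counts = Counter()
    for line in iscsiadm_output.splitlines():
        match = SESSION_RE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


def unexpected_targets(counts, expected=2):
    """Return the targets whose session count differs from `expected`."""
    return {target: n for target, n in counts.items() if n != expected}


# Hypothetical sample output; real data would come from `iscsiadm -m session`.
SAMPLE = """\
tcp: [1] 10.0.0.10:3260,1 iqn.2001-05.com.equallogic:example-vol1 (non-flash)
tcp: [2] 10.0.0.10:3260,1 iqn.2001-05.com.equallogic:example-vol1 (non-flash)
tcp: [3] 10.0.0.10:3260,1 iqn.2001-05.com.equallogic:example-vol2 (non-flash)
"""

if __name__ == "__main__":
    counts = sessions_per_target(SAMPLE)
    for target, n in sorted(unexpected_targets(counts).items()):
        print(f"{target}: {n} session(s), expected 2")
```

This only counts sessions. It does not show which interface (bond1 or 'default') each session uses, which would need the more detailed `iscsiadm -m session -P 1` output.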
If we reduce the number of storage domains, we reduce the number of devices and therefore the number of LVM physical volumes that appear in Linux, correct? At the moment each connection results in a Linux device which has its own queue. We have some guests with high IO loads on their device whilst others are low.
All the storage domain / datastore sizing guides we found seem to imply it's a trade-off between ease of management (i.e. not having millions of domains to manage), IO contention between guests on a single large storage domain / datastore, and possible wasted space on storage domains. If you have further information on recommendations, I am more than willing to change things, as this problem is making our environment somewhat unusable at the moment.
I have hosts that I can't bring online and therefore reduced resiliency in clusters. They used to work just fine, but the environment has grown over the last year and we also upgraded the oVirt version from 3.6 to 4.x. We certainly had other problems, but host activation wasn't one of them, and it's a problem that's driving me mad.
Thanks for the pointer on rr_min_io – I see that was for an older kernel. We had that set from a Dell guide. I’ve now removed that setting as it seems the default value has changed now anyway.
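For anyone wanting to check other hosts for the same leftover setting, here is a minimal sketch that scans multipath.conf text for the legacy `rr_min_io` keyword while ignoring `rr_min_io_rq` and commented-out lines. The function name and the sample configuration are hypothetical and are not taken from our hosts.

```python
import re

# "rr_min_io" must be followed by whitespace, so "rr_min_io_rq" never matches.
LEGACY_RE = re.compile(r"^\s*rr_min_io\s+(\S+)")


def find_legacy_rr_min_io(conf_text):
    """Return (line_number, value) for each active legacy rr_min_io setting."""
    hits = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop trailing or whole-line comments
        match = LEGACY_RE.match(code)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical sample; a real check would read /etc/multipath.conf instead.
SAMPLE_CONF = """\
defaults {
    path_grouping_policy multibus
    rr_min_io 10
}
devices {
    device {
        vendor "EQLOGIC"
        product "100E-00"
        # rr_min_io 10
        rr_min_io_rq 1
    }
}
"""

if __name__ == "__main__":
    for lineno, value in find_legacy_rr_min_io(SAMPLE_CONF):
        print(f"line {lineno}: legacy rr_min_io set to {value}")
```

As far as I understand it, `rr_min_io` only applies to kernels older than 2.6.31, and request-based multipath on newer kernels reads `rr_min_io_rq` instead, so any hit from this check is a setting that is probably being ignored.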
>> Unrelated, your engine.log is quite flooded with:
>> 2017-01-11 15:07:46,085 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder] (DefaultQuartzScheduler9) [31a71bf5] Invalid or unknown guest architecture type '' received from guest agent
>> Any idea what kind of guest you are running?
Do you have any idea which guest that's coming from? We almost exclusively run Linux (various CentOS versions) and Windows (various versions) as guest operating systems.
Thanks again,
Mark