<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 12, 2017 at 12:02 PM, Mark Greenall <span dir="ltr"><<a href="mailto:m.greenall@iontrading.com" target="_blank">m.greenall@iontrading.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Firstly, thanks @Yaniv and thanks @Nir for your responses.<br>
<br>
@Yaniv, in answer to this:<br>
<span class="gmail-"><br>
>> Why do you have 1 SD per VM?<br>
<br>
</span>It's a combination of performance and ease of management. We ran some IO tests with various configurations and settled on this one for a balance of reduced IO contention and ease of management. If there is a better recommended way of handling these then I'm all ears. If you believe having a large amount of storage domains adds to the problem then we can also review the setup.<br></blockquote><div><br></div><div>I don't see how it can improve performance. Having several iSCSI connections to a (single!) target may help, but certainly not too much. Just from looking at your /var/log/messages:</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection1:0 to [target: iqn.2001-05.com.equallogic:4-42a846-37a238a33-4e21185c70857594-uk1-amd-cluster2-template-dstore01, portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection2:0 to [target: iqn.2001-05.com.equallogic:4-42a846-37a238a33-4e21185c70857594-uk1-amd-cluster2-template-dstore01, portal: 10.100.214.77,3260] through [iface: default] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection3:0 to [target: iqn.2001-05.com.equallogic:4-42a846-192238a33-1f71185c70b57598-cuuk1ionhurap02-dstore01, portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection4:0 to [target: iqn.2001-05.com.equallogic:4-42a846-192238a33-1f71185c70b57598-cuuk1ionhurap02-dstore01, portal: 10.100.214.77,3260] through [iface: default] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection5:0 to [target: iqn.2001-05.com.equallogic:4-42a846-223238a33-7301185c70e57598-cuuk1ionhurdb02-dstore01, portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection6:0 to [target: iqn.2001-05.com.equallogic:4-42a846-223238a33-7301185c70e57598-cuuk1ionhurdb02-dstore01, portal: 10.100.214.77,3260] 
through [iface: default] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection7:0 to [target: iqn.2001-05.com.equallogic:4-42a846-212238a33-2a61185c719576bd-lnd-ion-anv-test-lin-64-dstore01, portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection8:0 to [target: iqn.2001-05.com.equallogic:4-42a846-212238a33-2a61185c719576bd-lnd-ion-anv-test-lin-64-dstore01, portal: 10.100.214.77,3260] through [iface: default] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection9:0 to [target: iqn.2001-05.com.equallogic:4-42a846-ad4238a33-1b31185c75157c7e-lnd-ion-lindev-14-dstore01, portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection10:0 to [target: iqn.2001-05.com.equallogic:4-42a846-ad4238a33-1b31185c75157c7e-lnd-ion-lindev-14-dstore01, portal: 10.100.214.77,3260] through [iface: default] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection11:0 to [target: iqn.2001-05.com.equallogic:4-42a846-b99479033-9a788b6aa6857d3b-lnd-anv-sup-03-dstore01, portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection12:0 to [target: iqn.2001-05.com.equallogic:4-42a846-b99479033-9a788b6aa6857d3b-lnd-anv-sup-03-dstore01, portal: 10.100.214.77,3260] through [iface: default] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection13:0 to [target: iqn.2001-05.com.equallogic:4-42a846-cd9479033-ffc88b6aa6b57d3b-lnd-linsup-02-dstore01, portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection14:0 to [target: iqn.2001-05.com.equallogic:4-42a846-cd9479033-ffc88b6aa6b57d3b-lnd-linsup-02-dstore01, portal: 10.100.214.77,3260] through [iface: default] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 
iscsid: Connection15:0 to [target: iqn.2001-05.com.equallogic:4-42a846-db8479033-96f88b6aa6e57d3b-lnd-linsup-03-dstore01, portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection16:0 to [target: iqn.2001-05.com.equallogic:4-42a846-db8479033-96f88b6aa6e57d3b-lnd-linsup-03-dstore01, portal: 10.100.214.77,3260] through [iface: default] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection17:0 to [target: iqn.2001-05.com.equallogic:4-42a846-eae479033-f6588b6aa7157d3b-lnd-linsup-04-dstore01, portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection18:0 to [target: iqn.2001-05.com.equallogic:4-42a846-eae479033-f6588b6aa7157d3b-lnd-linsup-04-dstore01, portal: 10.100.214.77,3260] through [iface: default] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection19:0 to [target: iqn.2001-05.com.equallogic:4-42a846-fac479033-bf888b6aa7757d3b-lnd-linsup-u01-dstore01, portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now</div><div>Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection20:0 to [target: iqn.2001-05.com.equallogic:4-42a846-fac479033-bf888b6aa7757d3b-lnd-linsup-u01-dstore01, portal: 10.100.214.77,3260] through [iface: default] is operational now </div><div><br></div><div><br></div><div>1. There is no point in so many connections.</div><div>2. Certainly not the same portal - you really should have more.</div><div>3. Note that some go via bond1 - and some via 'default' interface. Is that intended?</div><div>4. 
Your multipath.conf is using rr_min_io, where it should most likely use rr_min_io_rq (rr_min_io only applies to the older BIO-based multipath; request-based multipath, the default on current kernels, uses rr_min_io_rq).</div><div><br></div><div><br></div><div>Unrelated: your engine.log is quite flooded with:</div><div>2017-01-11 15:07:46,085 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder] (DefaultQuartzScheduler9) [31a71bf5] Invalid or unknown guest architecture type '' received from guest agent<br></div><div><br></div><div>Any idea what kind of guest you are running?</div><div><br></div><div><br></div><div>You have a lot of host devices - we have patches to improve their enumeration (coming in 4.0.7).</div><div>Y.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<span class="gmail-"><br>
>> Can you try and disable (mask) the lvmetad service on the hosts and see if it improves matters?<br>
<br>
</span>Disabled and masked the lvmetad service and tried again this morning. The load seemed lower and the initial host activation was quicker, but the end result was still the same: just under ten minutes later the node went non-operational and the cycle began again. By 09:27 we had the high CPU load and the repeating LVM cycle.<br>
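For reference, the masking described above amounts to something like the following (run as root on each host; a sketch, not verified on this particular setup, and guarded so it is a no-op on systems without systemd):

```shell
# Stop and mask lvmetad (service + socket) so LVM reads metadata from disk
# directly instead of going through the caching daemon.
if command -v systemctl >/dev/null 2>&1; then
    systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket 2>/dev/null || true
    systemctl mask lvm2-lvmetad.service lvm2-lvmetad.socket 2>/dev/null || true
    status="masked"
else
    status="no-systemd"
fi
# /etc/lvm/lvm.conf should agree with the masked daemon, otherwise LVM
# commands will warn about a missing lvmetad:
#   global {
#       use_lvmetad = 0
#   }
```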
<br>
Host Activation: 09:06<br>
Host Up: 09:08<br>
Non-Operational: 09:16<br>
LVM Load: 09:27<br>
Host Reboot: 09:30<br>
<br>
From yesterday and today I've attached the messages, sanlock.log and multipath.conf files too, although I'm not sure the messages file will be of much use, as it looks like log rate limiting kicked in and suppressed messages for the duration of the process. I'm booted into the kernel with debugging, but maybe that's generating too much info? Let me know if you want me to change anything here to get additional information.<br>
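If the suppression is coming from systemd-journald (my assumption; rsyslog has its own equivalent settings), its rate limiting can be relaxed while debugging with something like:

```ini
# /etc/systemd/journald.conf -- relax rate limiting while debugging,
# then restart the daemon: systemctl restart systemd-journald
RateLimitInterval=0
RateLimitBurst=0
```

Setting either value to 0 disables the rate limiting entirely; remember to revert it afterwards, since an unthrottled journal can grow quickly on a busy host.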
<br>
As additional configuration information, we also have the following settings from the Equallogic Linux install guide:<br>
<br>
/etc/sysctl.conf:<br>
<br>
# Prevent ARP Flux for multiple NICs on the same subnet:<br>
net.ipv4.conf.all.arp_ignore = 1<br>
net.ipv4.conf.all.arp_announce = 2<br>
# Loosen RP Filter to allow multiple iSCSI connections<br>
net.ipv4.conf.all.rp_filter = 2<br>
<br>
<br>
And the following /lib/udev/rules.d/99-eqlsd.rules:<br>
<br>
#------------------------------------------------------------------------------<br>
# Copyright (c) 2010-2012 by Dell, Inc.<br>
#<br>
# All rights reserved. This software may not be copied, disclosed,<br>
# transferred, or used except in accordance with a license granted<br>
# by Dell, Inc. This software embodies proprietary information<br>
# and trade secrets of Dell, Inc.<br>
#<br>
#------------------------------------------------------------------------------<br>
#<br>
# Various Settings for Dell Equallogic disks based on Dell Optimizing SAN Environment for Linux Guide<br>
#<br>
# Modify disk scheduler mode to noop<br>
ACTION=="add|change", SUBSYSTEM=="block", ATTRS{vendor}=="EQLOGIC", RUN+="/bin/sh -c 'echo noop > /sys/${DEVPATH}/queue/<wbr>scheduler'"<br>
# Modify disk timeout value to 60 seconds<br>
ACTION!="remove", SUBSYSTEM=="block", ATTRS{vendor}=="EQLOGIC", RUN+="/bin/sh -c 'echo 60 > /sys/%p/device/timeout'"<br>
# Modify read ahead value to 1024<br>
ACTION!="remove", SUBSYSTEM=="block", ATTRS{vendor}=="EQLOGIC", RUN+="/bin/sh -c 'echo 1024 > /sys/${DEVPATH}/bdi/read_<wbr>ahead_kb'"<br>
<br>
Many Thanks,<br>
Mark<br>
</blockquote></div><br></div></div>