[ovirt-users] oVirt Instability with Dell Compellent via iSCSI/Multipath
Chris Jones - BookIt.com Systems Administrator
chris.jones at bookit.com
Thu May 21 00:47:17 UTC 2015
> Another issue may be that the settings for COMPELNT/Compellent Vol are wrong;
> the settings we ship are missing a lot of settings that exist in the builtin
> settings, and this may have a bad effect. If your devices match this, I would
> try this multipath configuration instead of the one vdsm configures.
>
> device {
> vendor "COMPELNT"
> product "Compellent Vol"
> path_grouping_policy "multibus"
> path_checker "tur"
> features "0"
> hardware_handler "0"
> prio "const"
> failback "immediate"
> rr_weight "uniform"
> no_path_retry fail
> }
I wish I could. We're using the CentOS 7 ovirt-node-iso. The
multipath.conf is less than ideal, but when I tried updating it, oVirt
instantly overwrote it. To be clear: yes, I know changes do not survive
reboots, and yes, I know about persist, but oVirt rewrites the file on the
running system, so persist won't help there.
I also tried building a CentOS 7 "thick client" where I set up CentOS 7
first, added the oVirt repo, then let the engine provision it. Same
problem with multipath.conf being overwritten with the default oVirt setup.
So I tried to be slick about it. I made the multipath.conf immutable.
That prevented the engine from being able to activate the node. It would
fail on a vds command that gets the node's capabilities; part of what
that command does is read and then overwrite multipath.conf.
How do I safely update multipath.conf?
>
> To verify that your devices match this, you can check the devices' vendor and product
> strings in the output of "multipath -ll". I would like to see the output of this
> command.
The output of multipath -ll (default setup) can be seen here:
http://paste.linux-help.org/view/430c7538
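
For anyone comparing their own devices against the vendor/product strings in the suggested device section, here is a minimal sketch that pulls those strings out of "multipath -ll" output. It assumes each map header line ends in "dm-N VENDOR,PRODUCT", which is the usual layout; the function name mpath_vendor_product is made up for this example and is not part of multipath-tools.

```shell
#!/bin/sh
# Print the unique "VENDOR,PRODUCT" pairs found in `multipath -ll` output
# read from stdin. Assumption: map header lines look like
#   <wwid or alias> dm-N VENDOR,PRODUCT
mpath_vendor_product() {
    sed -n 's/^.* dm-[0-9][0-9]*  *\([^ ,][^,]*,.*[^ ]\) *$/\1/p' | sort -u
}

# Usage on a real host (as root):
#   multipath -ll | mpath_vendor_product
```

If the result is "COMPELNT,Compellent Vol", the devices match the vendor and product strings in the quoted device section.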
> Another platform issue is the bad default value of the iSCSI setting
> node.session.timeo.replacement_timeout, which is 120 seconds. This setting means
> that the SCSI layer will wait 120 seconds for I/O to complete on one path before
> failing the I/O request. So you may have one bad path causing a 120 second delay,
> while you could complete the request using another path.
>
> Multipath is trying to set this value to 5 seconds, but the value reverts
> to the default 120 seconds after a device has trouble. There is an open bug about
> this, which we hope to get fixed in RHEL/CentOS 7.2.
> https://bugzilla.redhat.com/1139038
>
> This issue together with "no_path_retry queue" is a very bad mix for ovirt.
>
> You can fix this timeout by setting:
>
> # /etc/iscsi/iscsid.conf
> node.session.timeo.replacement_timeout = 5
I'll see if that's possible with persist. Will this change survive node
upgrades?
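
A rough sketch of how the effective timeout could be checked per session, assuming the kernel exposes it as recovery_tmo under /sys/class/iscsi_session (the usual location). The function name show_recovery_tmo and its optional directory argument are made up for this example; the argument exists only so the function can be pointed at a test tree.

```shell
#!/bin/sh
# Print "<session> <recovery_tmo>" for every iSCSI session found.
# $1 (optional): directory to scan, defaults to /sys/class/iscsi_session.
show_recovery_tmo() {
    root=${1:-/sys/class/iscsi_session}
    for f in "$root"/session*/recovery_tmo; do
        # Skip the unexpanded glob when no sessions exist.
        [ -r "$f" ] || continue
        printf '%s %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

# iscsid.conf values are defaults for newly created node records, so records
# that already exist may also need updating (as root), for example:
#   iscsiadm -m node -o update -n node.session.timeo.replacement_timeout -v 5
# The updated value applies to sessions started after the change.
#   show_recovery_tmo
```

A session still reporting 120 after the change would be consistent with the revert behaviour described in the bug above, though other causes are possible.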
Thanks for the reply and the suggestions.