[ovirt-users] oVirt Instability with Dell Compellent via iSCSI/Multipath

Chris Jones - BookIt.com Systems Administrator chris.jones at bookit.com
Wed May 20 20:47:17 EDT 2015


> Another issue may be that the settings for COMPELNT/Compellent Vol are wrong;
> the configuration we ship is missing a lot of settings that exist in the builtin
> configuration, and this may have a bad effect. If your devices match this, I would
> try this multipath configuration instead of the one vdsm configures.
>
>     device {
>         vendor "COMPELNT"
>         product "Compellent Vol"
>         path_grouping_policy "multibus"
>         path_checker "tur"
>         features "0"
>         hardware_handler "0"
>         prio "const"
>         failback "immediate"
>         rr_weight "uniform"
>         no_path_retry fail
>     }

I wish I could. We're using the CentOS 7 ovirt-node-iso. The 
multipath.conf is less than ideal, but when I tried updating it, oVirt 
instantly overwrote it. To be clear: yes, I know changes do not survive 
reboots, and yes, I know about persist, but oVirt rewrites the file while 
the node is running. Live! Persist won't help there.

I also tried building a CentOS 7 "thick client" where I set up CentOS 7 
first, added the oVirt repo, then let the engine provision it. Same 
problem with multipath.conf being overwritten with the default oVirt setup.

So I tried to be slick about it. I made multipath.conf immutable. 
That prevented the engine from activating the node. It would 
fail on a vds command that gets the node's capabilities; part of what 
that command does is read and then overwrite multipath.conf.

How do I safely update multipath.conf?


>
> To verify that your devices match this, you can check the devices' vendor and product
> strings in the output of "multipath -ll". I would like to see the output of this
> command.

The multipath -ll output (default setup) can be seen here:
http://paste.linux-help.org/view/430c7538
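
For anyone comparing their own devices against the device section quoted
above, here is a minimal sketch that pulls the vendor and product strings
out of "multipath -ll" output. It assumes the usual map header layout
(alias or WWID, then dm-N, then VENDOR,PRODUCT); the function name
mpath_vendor_product is made up for illustration and is not part of
multipath-tools.

```shell
# Print the distinct vendor/product pairs found in "multipath -ll" output.
# Reads the output on stdin. Map header lines are assumed to look like:
#   <alias> (<wwid>) dm-N VENDOR,PRODUCT
# Path and "size=" lines contain no " dm-N " field and are skipped.
mpath_vendor_product() {
    sed -nE 's/^.* dm-[0-9]+ +([^,]+),(.*[^ ]) *$/vendor="\1" product="\2"/p' | sort -u
}

# Typical use on the node (as root):
#   multipath -ll | mpath_vendor_product
```

If the printed pair is vendor="COMPELNT" product="Compellent Vol", the
devices match the quoted device section.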

> Another platform issue is the bad default SCSI node.session.timeo.replacement_timeout
> value, which is set to 120 seconds. This setting means that the SCSI layer will
> wait 120 seconds for I/O to complete on one path before failing the I/O request.
> So you may have one bad path causing a 120 second delay, while you could complete
> the request using another path.
>
> Multipath tries to set this value to 5 seconds, but the value reverts
> to the default 120 seconds after a device has trouble. There is an open bug about
> this, which we hope to get fixed in rhel/centos 7.2.
> https://bugzilla.redhat.com/1139038
>
> This issue together with "no_path_retry queue" is a very bad mix for ovirt.
>
> You can fix this timeout by setting:
>
> # /etc/iscsi/iscsid.conf
> node.session.timeo.replacement_timeout = 5

I'll see if that's possible with persist. Will this change survive node 
upgrades?
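
Whatever the answer, the setting can at least be checked after a reboot
or an upgrade. A minimal sketch, assuming the stock config path
/etc/iscsi/iscsid.conf; the helper name get_replacement_timeout is
hypothetical. As far as I know, iscsid.conf only supplies defaults for
newly created node records, so already-discovered targets may also need
the iscsiadm update shown in the comments.

```shell
# Print the last uncommented replacement_timeout value in an
# iscsid.conf-style file (default path assumed: /etc/iscsi/iscsid.conf).
get_replacement_timeout() {
    conf=${1:-/etc/iscsi/iscsid.conf}
    sed -nE 's/^[[:space:]]*node\.session\.timeo\.replacement_timeout[[:space:]]*=[[:space:]]*([0-9]+).*$/\1/p' "$conf" | tail -n 1
}

# Value the kernel is actually using for each logged-in session:
#   grep . /sys/class/iscsi_session/session*/recovery_tmo
#
# Existing node records keep their old value until updated:
#   iscsiadm -m node -o update -n node.session.timeo.replacement_timeout -v 5
```

Comparing the file value with recovery_tmo should also show whether a
session has reverted to 120 seconds, as described in the bug above.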

Thanks for the reply and the suggestions.

