[ovirt-users] oVirt Instability with Dell Compellent via iSCSI/Multipath
Daniel Helgenberger
daniel.helgenberger at m-box.de
Thu May 21 09:25:09 UTC 2015
On 21.05.2015 02:48, Chris Jones - BookIt.com Systems Administrator wrote:
>> Another issue may be that the settings for COMPELNT/Compellent Vol are wrong;
>> the configuration we ship is missing a lot of settings that exist in the built-in
>> configuration, and this may have a bad effect. If your devices match this, I would
>> try this multipath configuration instead of the one vdsm configures.
>>
>> device {
>> vendor "COMPELNT"
>> product "Compellent Vol"
>> path_grouping_policy "multibus"
>> path_checker "tur"
>> features "0"
>> hardware_handler "0"
>> prio "const"
>> failback "immediate"
>> rr_weight "uniform"
>> no_path_retry fail
>> }
>
> I wish I could. We're using the CentOS 7 ovirt-node-iso. The
> multipath.conf is less than ideal, but when I tried updating it, oVirt
> instantly overwrites it. To be clear, yes I know changes do not survive
> reboots and yes I know about persist, but it changes it while running.
> Live! Persist won't help there.

I have this issue also. I am thinking about opening a BZ ;)
>
> I also tried building a CentOS 7 "thick client" where I set up CentOS 7
> first, added the oVirt repo, then let the engine provision it. Same
> problem with multipath.conf being overwritten with the default oVirt setup.
>
> So I tried to be slick about it. I made the multipath.conf immutable.
> That prevented the engine from being able to activate the node. It would
> fail on a vds command that gets the node's capabilities, and part of what
> it does is read and then overwrite multipath.conf.
>
> How do I safely update multipath.conf?
On the second line of your multipath.conf, add:
# RHEV PRIVATE
Then host deploy will leave the file alone and never change it.
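
A minimal sketch of how that could be scripted, assuming a POSIX shell and that
the first line of the file is the revision header vdsm wrote. The function name
mark_private is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: put "# RHEV PRIVATE" on line 2 of a multipath.conf.
# Assumes line 1 already holds the revision header and should stay first.
mark_private() {
    conf="$1"
    marker="# RHEV PRIVATE"

    # Nothing to do if line 2 already carries the marker.
    if [ "$(sed -n '2p' "$conf")" = "$marker" ]; then
        return 0
    fi

    tmp=$(mktemp) || return 1
    # Copy line 1, emit the marker, then copy the rest unchanged.
    awk -v m="$marker" 'NR == 1 { print; print m; next } { print }' \
        "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so owner, mode and SELinux context are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

It would be called as "mark_private /etc/multipath.conf". On ovirt-node the
file presumably still has to be persisted afterwards to survive a reboot.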
>
>
>>
>> To verify that your devices match this, you can check the device vendor and product
>> strings in the output of "multipath -ll". I would like to see the output of this
>> command.
>
> multipath -ll (default setup) can be seen here.
> http://paste.linux-help.org/view/430c7538
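
To check the vendor and product strings without reading the whole listing,
something like the following could work. It is only a sketch: it assumes the
map header lines in the "multipath -ll" output carry the strings joined as
"VENDOR,PRODUCT", and count_compellent is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: count multipath maps whose header line reports the
# Compellent vendor/product pair. Reads "multipath -ll" output on stdin.
# Assumes the header shows the strings joined as "VENDOR,PRODUCT".
count_compellent() {
    # grep -c prints 0 but exits non-zero when nothing matches; "|| :"
    # keeps the helper usable under "set -e".
    grep -c 'COMPELNT,Compellent Vol' || :
}
```

Usage would be "multipath -ll | count_compellent". A result of 0 would suggest
that a device section written for COMPELNT/Compellent Vol does not match.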
>
>> Another platform issue is the bad default SCSI node.session.timeo.replacement_timeout
>> value, which is set to 120 seconds. This setting means that the SCSI layer will
>> wait 120 seconds for I/O to complete on one path before failing the I/O request.
>> So you may have one bad path causing a 120-second delay, while you could complete
>> the request using another path.
>>
>> Multipath is trying to set this value to 5 seconds, but the value reverts
>> to the default 120 seconds after a device has trouble. There is an open bug about
>> this, which we hope to get fixed in RHEL/CentOS 7.2.
>> https://bugzilla.redhat.com/1139038
>>
>> This issue together with "no_path_retry queue" is a very bad mix for oVirt.
>>
>> You can fix this timeout by setting:
>>
>> # /etc/iscsi/iscsid.conf
>> node.session.timeo.replacement_timeout = 5
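
As a rough sketch, the edit could be scripted like this. The function name is
made up, and as far as I know iscsid.conf only supplies defaults for node
records created later, so already discovered targets would probably need
updating as well (for example with iscsiadm in node mode and "-o update").

```shell
#!/bin/sh
# Hypothetical helper: set node.session.timeo.replacement_timeout in an
# iscsid.conf-style file, replacing an existing assignment or appending one.
set_replacement_timeout() {
    conf="$1"
    secs="$2"
    key="node.session.timeo.replacement_timeout"

    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
        tmp=$(mktemp) || return 1
        # Rewrite every uncommented assignment of the key.
        sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${secs}|" \
            "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
        # Write back in place so owner and mode are kept.
        cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        # No assignment yet: append one at the end of the file.
        printf '%s = %s\n' "$key" "$secs" >> "$conf"
    fi
}
```

For example: "set_replacement_timeout /etc/iscsi/iscsid.conf 5".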
>
> I'll see if that's possible with persist. Will this change survive node
> upgrades?
>
> Thanks for the reply and the suggestions.
--
Daniel Helgenberger
m box bewegtbild GmbH
P: +49/30/2408781-22
F: +49/30/2408781-10
ACKERSTR. 19
D-10115 BERLIN
www.m-box.de www.monkeymen.tv
Managing directors: Martin Retschitzegger / Michaela Göllner
Commercial register: Amtsgericht Charlottenburg / HRB 112767