[ovirt-users] iSCSI Multipathing -> host inactive

Yaniv Kaul ykaul at redhat.com
Fri Aug 26 20:40:15 UTC 2016


On Fri, Aug 26, 2016 at 1:33 PM, InterNetX - Juergen Gotteswinter
<jg at internetx.com> wrote:

>
>
> > On 25.08.2016 at 15:53, Yaniv Kaul wrote:
> >
> >
> > On Wed, Aug 24, 2016 at 6:15 PM, InterNetX - Juergen Gotteswinter
> > <juergen.gotteswinter at internetx.com> wrote:
> >
> >     iSCSI & oVirt is an awful combination, no matter if multipathed
> >     or bonded. It's always a gamble how long it will work and, when
> >     it fails, why it failed.
> >
> >
> > I disagree. In most cases it's actually a lower-layer issue, and
> > quite often, btw, it's because multipathing was not configured (or
> > not configured correctly).
> >
>
> Experience tells me it is as I said; this is something I have seen
> from 3.0 up to 3.6, in oVirt and - surprise - RHEV. Both act the same
> way. I am absolutely familiar with multipath configurations; iSCSI
> multipathing is very widely used in our DC. But such problems are an
> exclusive oVirt/RHEV feature.
>

I don't think the resentful tone is appropriate for the oVirt community
mailing list.
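
To make "configured correctly" slightly more concrete: the host-side
piece that most often matters is the dm-multipath configuration. A
minimal illustrative sketch of /etc/multipath.conf defaults follows;
the values are examples to check against your array vendor's
documentation rather than copy verbatim, and on oVirt hosts vdsm
typically manages this file anyway:

defaults {
    # fail I/O quickly instead of queueing forever when all paths drop
    no_path_retry       fail
    # how often multipathd checks path state, in seconds
    polling_interval    5
    # keep WWID-based device names, as commonly used on oVirt hosts
    user_friendly_names no
}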


>
> >
> >
> >     It's super-sensitive to latency, and super-fast at setting a
> >     host to inactive because the engine thinks something is wrong
> >     with it. In most cases there was no real reason for it.
> >
> >
> > Did you open bugs for those issues? I'm not aware of 'no real reason'
> > issues.
> >
>
> Support tickets for the RHEV installation. After support (even after
> massive escalation requests) kept telling me the same thing again and
> again, I gave up and we dropped the RHEV subscriptions to migrate the
> VMs to a different platform solution (still with an iSCSI backend).
> Problems gone.
>

I wish you peace of mind with your new platform solution.
From a (shallow!) search I've made on oVirt bugs, I could not find any
oVirt issue you've reported or commented on.
I am aware of the request to set rp_filter correctly for setups with
multiple interfaces in the same IP subnet.
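
For the archives, what's usually suggested there is loose reverse-path
filtering on the iSCSI NICs. A minimal sketch, with the interface names
as placeholders for your own:

# set loose (2) rp_filter on the iSCSI interfaces; names are examples
sysctl -w net.ipv4.conf.enp9s0f0.rp_filter=2
sysctl -w net.ipv4.conf.enp9s0f1.rp_filter=2
# persist the same keys in a file under /etc/sysctl.d/ to survive reboots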


>
>
> >
> >
> >     We had this with several different hardware combinations:
> >     self-built filers based on FreeBSD/Illumos & ZFS, an Equallogic
> >     SAN, a Nexenta filer.
> >
> >     Been there, done that, won't do it again.
> >
> >
> > We've had good success and reliability with most enterprise-level
> > storage, such as EMC, NetApp and Dell filers.
> > When properly configured, of course.
> > Y.
> >
>
> Dell Equallogic? I can't really believe that, since oVirt/RHEV and the
> Equallogic network configuration won't play nicely together (EQL wants
> all interfaces in the same subnet). And they only work as expected
> when their HIT Kit driver package is installed; without it, path
> failover is like Russian roulette. But oVirt hates the HIT Kit, so
> this combo ends up in a huge mess, because oVirt changes the iSCSI
> configuration, as does the HIT Kit -> kaboom, host not available.
>

Thanks - I'll look into this specific storage.
I'm aware it's unique in some cases, but I don't have experience with it
specifically.
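
For what it's worth, the usual open-iscsi approach for arrays that want
every portal in one subnet is to create one iscsiadm iface per NIC and
bind it explicitly. A rough sketch only - the iface names, NIC names and
the group address 192.0.2.10 are placeholders, and I can't say how this
interacts with the HIT Kit:

# one iSCSI iface per physical NIC, bound by interface name
iscsiadm -m iface -I eql0 --op=new
iscsiadm -m iface -I eql0 --op=update -n iface.net_ifacename -v enp9s0f0
iscsiadm -m iface -I eql1 --op=new
iscsiadm -m iface -I eql1 --op=update -n iface.net_ifacename -v enp9s0f1

# discover the targets through each bound iface, then log in everywhere
iscsiadm -m discovery -t sendtargets -p 192.0.2.10:3260 -I eql0
iscsiadm -m discovery -t sendtargets -p 192.0.2.10:3260 -I eql1
iscsiadm -m node -L all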


>
> There are several KB articles on RHN, without a real solution.
>
>
> But, as you are trying to suggest between the lines, this must be the
> customer's misconfiguration. Yep, a typical support-killer answer.
> Same style as in the RHN tickets; I am done with this.
>

A funny sentence I read yesterday, Schrödinger's backup: "the condition
of any backup is unknown until a restore is attempted."

In a sense, this is similar to a no-SPOF high-availability setup: in
many cases you don't know whether it works well until it's needed.
There are simply many variables and components involved.
That was all I meant, nothing between the lines, and I apologize if
I've given you a different impression.
Y.


> Thanks.
>
> >
> >
> >
> >     On 24.08.2016 at 16:04, Uwe Laverenz wrote:
> >     > Hi Elad,
> >     >
> >     > thank you very much for clearing things up.
> >     >
> >     > Initiator/iface 'a' tries to connect to target 'b' and vice
> >     > versa. As 'a' and 'b' are in completely separate networks, this
> >     > can never work as long as there is no routing between the
> >     > networks.
> >     >
> >     > So it seems the iSCSI-bonding feature is not useful for my setup. I
> >     > still wonder how and where this feature is supposed to be used?
> >     >
> >     > thank you,
> >     > Uwe
> >     >
> >     > On 24.08.2016 at 15:35, Elad Ben Aharon wrote:
> >     >> Thanks.
> >     >>
> >     >> You're getting an iSCSI connection timeout [1], [2]. It means
> >     >> the host cannot connect to the targets from either iface
> >     >> enp9s0f1 or iface enp9s0f0.
> >     >>
> >     >> This causes the host to lose its connection to the storage
> >     >> and, in addition, its connection to the engine becomes
> >     >> inactive. Therefore, the host changes its status to
> >     >> Non-responsive [3] and, since it's the SPM, the whole DC, with
> >     >> all its storage domains, becomes inactive.
> >     >>
> >     >>
> >     >> vdsm.log:
> >     >> [1]
> >     >> Traceback (most recent call last):
> >     >>   File "/usr/share/vdsm/storage/hsm.py", line 2400, in connectStorageServer
> >     >>     conObj.connect()
> >     >>   File "/usr/share/vdsm/storage/storageServer.py", line 508, in connect
> >     >>     iscsi.addIscsiNode(self._iface, self._target, self._cred)
> >     >>   File "/usr/share/vdsm/storage/iscsi.py", line 204, in addIscsiNode
> >     >>     iscsiadm.node_login(iface.name, portalStr, target.iqn)
> >     >>   File "/usr/share/vdsm/storage/iscsiadm.py", line 336, in node_login
> >     >>     raise IscsiNodeError(rc, out, err)
> >     >> IscsiNodeError: (8, ['Logging in to [iface: enp9s0f0, target:
> >     >> iqn.2005-10.org.freenas.ctl:tgtb, portal: 10.0.132.121,3260]
> >     >> (multiple)'], ['iscsiadm: Could not login to [iface: enp9s0f0,
> >     >> target: iqn.2005-10.org.freenas.ctl:tgtb, portal: 10.0.132.121,3260].',
> >     >> 'iscsiadm: initiator reported error (8 - connection timed out)',
> >     >> 'iscsiadm: Could not log into all portals'])
> >     >>
> >     >>
> >     >>
> >     >> vdsm.log:
> >     >> [2]
> >     >> Traceback (most recent call last):
> >     >>   File "/usr/share/vdsm/storage/hsm.py", line 2400, in connectStorageServer
> >     >>     conObj.connect()
> >     >>   File "/usr/share/vdsm/storage/storageServer.py", line 508, in connect
> >     >>     iscsi.addIscsiNode(self._iface, self._target, self._cred)
> >     >>   File "/usr/share/vdsm/storage/iscsi.py", line 204, in addIscsiNode
> >     >>     iscsiadm.node_login(iface.name, portalStr, target.iqn)
> >     >>   File "/usr/share/vdsm/storage/iscsiadm.py", line 336, in node_login
> >     >>     raise IscsiNodeError(rc, out, err)
> >     >> IscsiNodeError: (8, ['Logging in to [iface: enp9s0f1, target:
> >     >> iqn.2005-10.org.freenas.ctl:tgta, portal: 10.0.131.121,3260]
> >     >> (multiple)'], ['iscsiadm: Could not login to [iface: enp9s0f1,
> >     >> target: iqn.2005-10.org.freenas.ctl:tgta, portal: 10.0.131.121,3260].',
> >     >> 'iscsiadm: initiator reported error (8 - connection timed out)',
> >     >> 'iscsiadm: Could not log into all portals'])
> >     >>
> >     >>
> >     >> engine.log:
> >     >> [3]
> >     >>
> >     >>
> >     >> 2016-08-24 14:10:23,222 WARN
> >     >> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> >     >> (default task-25) [15d1637f] Correlation ID: 15d1637f, Call Stack: null,
> >     >> Custom Event ID: -1, Message: iSCSI bond 'iBond' was successfully created
> >     >> in Data Center 'Default' but some of the hosts encountered connection issues.
> >     >>
> >     >>
> >     >>
> >     >> 2016-08-24 14:10:23,208 INFO
> >     >> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
> >     >> (org.ovirt.thread.pool-8-thread-25) [15d1637f] Command
> >     >> 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand'
> >     >> return value 'ServerConnectionStatusReturnForXmlRpc:{status='StatusForXmlRpc
> >     >> [code=5022, message=Message timeout which can be caused by communication
> >     >> issues]'}
> >     >>
> >     >>
> >     >>
> >     >> On Wed, Aug 24, 2016 at 4:04 PM, Uwe Laverenz
> >     >> <uwe at laverenz.de> wrote:
> >     >>
> >     >>     Hi Elad,
> >     >>
> >     >>     I sent you a download message.
> >     >>
> >     >>     thank you,
> >     >>     Uwe
> >     >>     _______________________________________________
> >     >>     Users mailing list
> >     >>     Users at ovirt.org
> >     >>     http://lists.ovirt.org/mailman/listinfo/users
> >     >>
> >     >>
> >     > _______________________________________________
> >     > Users mailing list
> >     > Users at ovirt.org
> >     > http://lists.ovirt.org/mailman/listinfo/users
> >     _______________________________________________
> >     Users mailing list
> >     Users at ovirt.org
> >     http://lists.ovirt.org/mailman/listinfo/users
> >
> >
>
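
P.S. On the timeouts quoted above: before creating an iSCSI bond it can
help to check, per initiator interface, which portals are actually
reachable, since the bond will try every iface/portal combination. A
quick sketch using the interface names and portal addresses from the
logs above; with no routing between the two storage networks, the
cross-subnet pairs are exactly the ones expected to time out:

# probe every iface/portal pair the bond will attempt
for nic in enp9s0f0 enp9s0f1; do
    for portal in 10.0.131.121 10.0.132.121; do
        echo "== $nic -> $portal =="
        ping -c 3 -I "$nic" "$portal"
    done
done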