
Hi all,

I'd like to test iSCSI multipathing with oVirt 4.0.2 and I see the following problem: if I try to add an iSCSI bond, the host loses its connection to _all_ storage domains. I guess I'm doing something wrong. :)

I have built a small test environment for this: the storage is provided by a FreeNAS box with two dedicated interfaces for two separate iSCSI networks. Each interface has one address in one network (no VLANs, no trunking). For each network there is one portal configured. Both portals point to the same target, and the target has 4 LUNs.

The host also has two dedicated interfaces for iSCSI and finds the target over both portals; all LUNs are selected for use and show 2 paths.

My questions:

1) Is this setup OK, or did I miss something?
2) The LUNs already show 2 paths (multipath -ll), one "active" and one "enabled". What difference would a data center iSCSI bond make?
3) What combination of checkboxes do I have to use?

Logical Networks
[ ] ISCSIA
[ ] ISCSIB

Storage Targets
[ ] iqn.2005-10.org.freenas.ctl:tgt01 10.0.131.121 3260
[ ] iqn.2005-10.org.freenas.ctl:tgt01 10.0.132.121 3260

As stated in the beginning: all my tests made the host lose connection to all storage domains (NFS included), and I cannot see what I am doing wrong.

Thank you very much!

cu,
Uwe
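The setup described above can be sanity-checked from the host with plain open-iscsi discovery against both portals. A minimal sketch, using the addresses from the message; both portals should report the same target:

    # discovery against the first iSCSI network
    iscsiadm -m discovery -t sendtargets -p 10.0.131.121:3260
    # discovery against the second iSCSI network
    iscsiadm -m discovery -t sendtargets -p 10.0.132.121:3260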

Hi,

Is the iSCSI domain that is supposed to be connected through the bond the current master domain? Also, can you please provide the output of 'iscsiadm -m session -P3'?

Thanks

Hi,

On 15.08.2016 at 16:53, Elad Ben Aharon wrote:
Is the iSCSI domain that supposed to be connected through the bond the current master domain?
No, it isn't. An NFS share is the master domain.
Also, can you please provide the output of 'iscsiadm -m session -P3' ?
Yes, of course (meanwhile I have switched to 2 targets, 1 per portal). This is _without_ iSCSI bond:

[root@ovh01 ~]# iscsiadm -m session -P3
iSCSI Transport Class version 2.0-870
version 6.2.0.873-33.2
Target: iqn.2005-10.org.freenas.ctl:tgta (non-flash)
        Current Portal: 10.0.131.121:3260,257
        Persistent Portal: 10.0.131.121:3260,257
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.redhat:cda91b279ac5
                Iface IPaddress: 10.0.131.122
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 34
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 5
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 131072
                FirstBurstLength: 131072
                MaxBurstLength: 16776192
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 44 State: running
                scsi44 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdf  State: running
                scsi44 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sdg  State: running
                scsi44 Channel 00 Id 0 Lun: 2
                        Attached scsi disk sdh  State: running
                scsi44 Channel 00 Id 0 Lun: 3
                        Attached scsi disk sdi  State: running
Target: iqn.2005-10.org.freenas.ctl:tgtb (non-flash)
        Current Portal: 10.0.132.121:3260,258
        Persistent Portal: 10.0.132.121:3260,258
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.redhat:cda91b279ac5
                Iface IPaddress: 10.0.132.122
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 35
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 5
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 131072
                FirstBurstLength: 131072
                MaxBurstLength: 16776192
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 45 State: running
                scsi45 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdj  State: running
                scsi45 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sdk  State: running
                scsi45 Channel 00 Id 0 Lun: 2
                        Attached scsi disk sdl  State: running
                scsi45 Channel 00 Id 0 Lun: 3
                        Attached scsi disk sdm  State: running

And `multipath -ll`:

[root@ovh01 ~]# multipath -ll
36589cfc000000fafcc87da5ddd69c7e2 dm-2 FreeNAS ,iSCSI Disk
size=128G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 45:0:0:2 sdl 8:176 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 44:0:0:2 sdh 8:112 active ready running
36589cfc000000d403b89dc6ad86ef2c0 dm-0 FreeNAS ,iSCSI Disk
size=128G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 45:0:0:3 sdm 8:192 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 44:0:0:3 sdi 8:128 active ready running
36589cfc0000001976e4e013ce9074fa7 dm-1 FreeNAS ,iSCSI Disk
size=128G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 45:0:0:1 sdk 8:160 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 44:0:0:1 sdg 8:96 active ready running
36589cfc0000002af0e93a13f7a73a54b dm-3 FreeNAS ,iSCSI Disk
size=128G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 45:0:0:0 sdj 8:144 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 44:0:0:0 sdf 8:80 active ready running

Thanks,
Uwe

Currently, your host is connected through a single initiator, the 'Default' interface (Iface Name: default), to 2 targets: tgta and tgtb (Target: iqn.2005-10.org.freenas.ctl:tgta and Target: iqn.2005-10.org.freenas.ctl:tgtb). Hence, each LUN is exposed from the storage server via 2 paths. Since the connection to the storage is done via the 'Default' interface and not via the 2 iSCSI networks you've configured, currently, the iSCSI bond is not operational.

For the iSCSI bond to be operational, you'll have to do the following:

- Create 2 networks in RHEV-M under the relevant cluster (not sure if you've already done it) - iSCSI1 and iSCSI2. Configure both networks to be non-required networks for the cluster (they should also be non-VM networks).
- Attach the networks to the host's 2 interfaces using the host's Setup Networks.
- Create a new iSCSI bond / modify the bond you've created and pick the 2 newly created networks along with all storage targets. Make sure that the Default network is not part of the bond (usually, the Default network is the management one - 'ovirtmgmt').
- Put the host in maintenance and re-activate it so the iSCSI sessions will be refreshed with the new connection specifications.

Please let me know if it works for you.

Elad
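Once those steps are done, each portal should be logged in through a network-bound iSCSI interface rather than through 'default'. A minimal way to verify this from the host, as a sketch (the interface names enp9s0f0/enp9s0f1 are the ones that appear later in this thread):

    # list the iSCSI interface definitions the bond created
    iscsiadm -m iface

    # show which iface and which portal each active session uses
    iscsiadm -m session -P1 | grep -E 'Iface Name|Current Portal'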

Hi,

On 16.08.2016 at 09:26, Elad Ben Aharon wrote:
Currently, your host is connected through a single initiator, the 'Default' interface (Iface Name: default), to 2 targets: tgta and tgtb
I see what you mean, but the "Iface Name" is somewhat irritating here; it does not mean that the wrong interface (ovirtmgmt) is used. If you have a look at "Iface IPaddress" for both, you can see that the correct, dedicated interfaces are used:

Iface IPaddress: 10.0.131.122 (iSCSIA network)
Iface IPaddress: 10.0.132.122 (iSCSIB network)
(Target: iqn.2005-10.org.freenas.ctl:tgta and Target: iqn.2005-10.org.freenas.ctl:tgtb). Hence, each LUN is exposed from the storage server via 2 paths. Since the connection to the storage is done via the 'Default' interface and not via the 2 iSCSI networks you've configured, currently, the iSCSI bond is not operational.
Please see above. The storage server's iSCSI addresses aren't even reachable from the ovirtmgmt net; they are in completely isolated networks.
For the iSCSI bond to be operational, you'll have to do the following:

- Create 2 networks in RHEV-M under the relevant cluster (not sure if you've already done it) - iSCSI1 and iSCSI2. Configure both networks to be non-required networks for the cluster (they should also be non-VM networks).
- Attach the networks to the host's 2 interfaces using the host's Setup Networks.
- Create a new iSCSI bond / modify the bond you've created and pick the 2 newly created networks along with all storage targets. Make sure that the Default network is not part of the bond (usually, the Default network is the management one - 'ovirtmgmt').
- Put the host in maintenance and re-activate it so the iSCSI sessions will be refreshed with the new connection specifications.
This is exactly what I did, except that I had to add the iSCSI storage first, otherwise the "iSCSI Multipathing" tab does not appear in the data center section. I configured an iSCSI bond, and the problem seems to be that it leads to conflicting iSCSI settings on the host. The host uses the very same interface twice, only with a different "Iface Name":

iSCSIA:

Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.redhat:cda91b279ac5
Iface IPaddress: 10.0.131.122

Iface Name: enp9s0f0
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.redhat:cda91b279ac5
Iface IPaddress: 10.0.131.122

iSCSIB:

Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.redhat:cda91b279ac5
Iface IPaddress: 10.0.132.122

Iface Name: enp9s0f1
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.redhat:cda91b279ac5
Iface IPaddress: 10.0.132.122

I guess this is the reason why the host has problems attaching the storage domain; it toggles all storage domains on and off all the time.

Thank you,
Uwe
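For background on the duplication shown above: open-iscsi keeps one node record, and hence one session, per (target, portal, iface) tuple, so a record bound to the generic 'default' iface plus a record bound to a named iface means two logins over the same physical NIC. A sketch of how to spot this, assuming the names from the output above:

    # node records, printed with the iface each one is bound to
    iscsiadm -m node -P1

    # a duplicate shows up as two sessions (two SIDs) to the same portal,
    # one via 'default' and one via 'enp9s0f0'
    iscsiadm -m session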

Please be sure that ovirtmgmt is not part of the iSCSI bond. There does seem to be a conflict between 'default' and enp9s0f0 / enp9s0f1. Try to put the host in maintenance and then delete the iSCSI nodes using 'iscsiadm -m node -o delete'. Then activate the host.

Thanks,
Elad
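A sketch of that cleanup sequence on the host, to be run while it is in maintenance (the logout step and the targeted variant are my additions, not part of the advice above):

    # log out of all sessions, then drop all stale node records
    iscsiadm -m node -u
    iscsiadm -m node -o delete

    # or delete a single record, e.g. one of the records from this thread:
    iscsiadm -m node -T iqn.2005-10.org.freenas.ctl:tgta -p 10.0.131.121:3260 -o delete

    # verify nothing is left before re-activating the host
    iscsiadm -m node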

Hi Elad,

On 16.08.2016 at 10:52, Elad Ben Aharon wrote:
Please be sure that ovirtmgmt is not part of the iSCSI bond.
Yes, I made sure it is not part of the bond.
There does seem to be a conflict between 'default' and enp9s0f0 / enp9s0f1. Try to put the host in maintenance and then delete the iSCSI nodes using 'iscsiadm -m node -o delete'. Then activate the host.
I tried that and managed to get the iSCSI interfaces clean, no "default" anymore. But that didn't solve the problem of the host becoming "inactive"; not even the NFS domains would come up. As soon as I remove the iSCSI bond, the host becomes responsive again and I can activate all storage domains. Removing the bond also brings the duplicated "Iface Name" back (but this time it causes no problems).

...

I wonder if there is a basic misunderstanding on my side: wouldn't all targets have to be reachable from all interfaces that are configured into the bond for it to work? But this would mean either two interfaces in the same network or routing between the iSCSI networks.

Thanks,
Uwe
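That reachability question is easy to test per interface. A minimal sketch, using the addresses from this thread; with two isolated subnets the two cross-network checks would be expected to fail:

    # same-network paths (should answer)
    ping -c1 -I enp9s0f0 10.0.131.121
    ping -c1 -I enp9s0f1 10.0.132.121

    # cross-network paths, which an iSCSI bond also attempts to use
    ping -c1 -I enp9s0f0 10.0.132.121
    ping -c1 -I enp9s0f1 10.0.131.121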

I don't think it's necessary. Please provide the host's routing table and interface list ('ip a' or ifconfig) while it's configured with the bond.

Thanks

Hi,

sorry for the delay. I reinstalled everything, configured the networks, attached the iSCSI storage with 2 interfaces and finally created the iSCSI bond:
[root@ovh01 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         hp5406-1-srv.mo 0.0.0.0         UG    0      0        0 ovirtmgmt
10.0.24.0       0.0.0.0         255.255.255.0   U     0      0        0 ovirtmgmt
10.0.131.0      0.0.0.0         255.255.255.0   U     0      0        0 enp9s0f0
10.0.132.0      0.0.0.0         255.255.255.0   U     0      0        0 enp9s0f1
link-local      0.0.0.0         255.255.0.0     U     1005   0        0 enp9s0f0
link-local      0.0.0.0         255.255.0.0     U     1006   0        0 enp9s0f1
link-local      0.0.0.0         255.255.0.0     U     1008   0        0 ovirtmgmt
link-local      0.0.0.0         255.255.0.0     U     1015   0        0 bond0
link-local      0.0.0.0         255.255.0.0     U     1017   0        0 ADMIN
link-local      0.0.0.0         255.255.0.0     U     1021   0        0 SRV
and:
[root@ovh01 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp13s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP qlen 1000
    link/ether e0:3f:49:6d:68:c4 brd ff:ff:ff:ff:ff:ff
3: enp8s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 90:e2:ba:11:21:d0 brd ff:ff:ff:ff:ff:ff
4: enp8s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 90:e2:ba:11:21:d0 brd ff:ff:ff:ff:ff:ff
5: enp9s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:11:21:d4 brd ff:ff:ff:ff:ff:ff
    inet 10.0.131.181/24 brd 10.0.131.255 scope global enp9s0f0
       valid_lft forever preferred_lft forever
6: enp9s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:11:21:d5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.132.181/24 brd 10.0.132.255 scope global enp9s0f1
       valid_lft forever preferred_lft forever
7: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 26:b2:4e:5e:f0:60 brd ff:ff:ff:ff:ff:ff
8: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether e0:3f:49:6d:68:c4 brd ff:ff:ff:ff:ff:ff
    inet 10.0.24.181/24 brd 10.0.24.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
14: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UNKNOWN qlen 500
    link/ether fe:16:3e:79:25:86 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe79:2586/64 scope link
       valid_lft forever preferred_lft forever
15: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 90:e2:ba:11:21:d0 brd ff:ff:ff:ff:ff:ff
16: bond0.32@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ADMIN state UP
    link/ether 90:e2:ba:11:21:d0 brd ff:ff:ff:ff:ff:ff
17: ADMIN: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 90:e2:ba:11:21:d0 brd ff:ff:ff:ff:ff:ff
20: bond0.24@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master SRV state UP
    link/ether 90:e2:ba:11:21:d0 brd ff:ff:ff:ff:ff:ff
21: SRV: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 90:e2:ba:11:21:d0 brd ff:ff:ff:ff:ff:ff
The host keeps toggling all storage domains on and off as soon as there is an iSCSI bond configured.

Thank you for your patience.

cu,
Uwe

The network configuration seems OK. Please provide engine.log and vdsm.log.

Thanks

Thanks. You're getting an iSCSI connection timeout [1], [2]. It means the host cannot connect to the targets from iface enp9s0f1 nor from iface enp9s0f0.

This causes the host to lose its connection to the storage, and the connection to the engine also becomes inactive. Therefore, the host changes its status to Non-responsive [3], and since it's the SPM, the whole DC with all its storage domains becomes inactive.

vdsm.log:
[1]
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2400, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 508, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 204, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, target.iqn)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 336, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: enp9s0f0, target: iqn.2005-10.org.freenas.ctl:tgtb, portal: 10.0.132.121,3260] (multiple)'], ['iscsiadm: Could not login to [iface: enp9s0f0, target: iqn.2005-10.org.freenas.ctl:tgtb, portal: 10.0.132.121,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])

vdsm.log:
[2]
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2400, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 508, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 204, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, target.iqn)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 336, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: enp9s0f1, target: iqn.2005-10.org.freenas.ctl:tgta, portal: 10.0.131.121,3260] (multiple)'], ['iscsiadm: Could not login to [iface: enp9s0f1, target: iqn.2005-10.org.freenas.ctl:tgta, portal: 10.0.131.121,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])

engine.log:
[3]
2016-08-24 14:10:23,222 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-25) [15d1637f] Correlation ID: 15d1637f, Call Stack: null, Custom Event ID: -1, Message: iSCSI bond 'iBond' was successfully created in Data Center 'Default' but some of the hosts encountered connection issues.
2016-08-24 14:10:23,208 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-8-thread-25) [15d1637f] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand' return value 'ServerConnectionStatusReturnForXmlRpc:{status='StatusForXmlRpc [code=5022, message=Message timeout which can be caused by communication issues]'}

On Wed, Aug 24, 2016 at 4:04 PM, Uwe Laverenz <uwe@laverenz.de> wrote:
Hi Elad,
I sent you a download message.
thank you,
Uwe
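The failing logins in [1] and [2] can also be reproduced by hand, which makes the cross-network binding visible. A sketch, with the iface/target/portal values copied from the first traceback; in this setup the login would be expected to time out:

    # the bond asks enp9s0f0 (10.0.131.x) to log in to the portal on 10.0.132.x
    iscsiadm -m node -T iqn.2005-10.org.freenas.ctl:tgtb -p 10.0.132.121:3260 \
            -I enp9s0f0 --login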

Hi Elad,

thank you very much for clearing things up. Initiator/iface 'a' tries to connect to target 'b' and vice versa. As 'a' and 'b' are in completely separate networks, this can never work as long as there is no routing between the networks.

So it seems the iSCSI bonding feature is not useful for my setup. I still wonder how and where this feature is supposed to be used?

thank you,
Uwe

iSCSI & oVirt is an awful combination, no matter if multipathed or bonded. It's always gambling how long it will work, and when it fails, why it failed. It's supersensitive to latency, and superfast with setting a host to inactive because the engine thinks something is wrong with it; in most cases there was no real reason for it. We had this in several different hardware combinations: self-built filers based on FreeBSD/Illumos & ZFS, an Equallogic SAN, a Nexenta filer.

Been there, done that, won't do again.

Hi Jürgen,

On 24.08.2016 at 17:15, InterNetX - Juergen Gotteswinter wrote:
iSCSI & oVirt is an awful combination, no matter if multipathed or bonded. It's always gambling how long it will work, and when it fails, why it failed.
It's supersensitive to latency, and superfast with setting a host to inactive because the engine thinks something is wrong with it; in most cases there was no real reason for it.
We had this in several different hardware combinations: self-built filers based on FreeBSD/Illumos & ZFS, an Equallogic SAN, a Nexenta filer.
Been there, done that, won't do again.
Thank you, I take this as a warning. :)

For my testbed I chose to ignore the iSCSI bond feature and changed the multipath default to round robin instead. What kind of storage do you use in production? Fibre Channel, Gluster, Ceph, ...?

thanks,
Uwe
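For reference, switching the multipath default to round robin is a small /etc/multipath.conf change. A minimal sketch; note that on oVirt hosts vdsm manages this file and, to my knowledge, only leaves it alone if it carries a private marker (the exact keyword varies by version, '# RHEV PRIVATE' on older hosts, '# VDSM PRIVATE' on newer ones, so treat this header as an assumption for your release):

    # RHEV PRIVATE
    defaults {
        path_selector         "round-robin 0"
        path_grouping_policy  multibus
    }

    # then reload the daemon, e.g.:
    # systemctl restart multipathd

With path_grouping_policy multibus both paths land in one priority group, so round robin spreads I/O across them instead of the active/enabled failover layout shown earlier in the thread.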

On 25.08.2016 at 08:42, Uwe Laverenz wrote:
What kind of storage do you use in production? Fibre Channel, Gluster, Ceph, ...?
Currently iSCSI, multipathed, with a Solaris-based filer as backend. But this is already in the process of being migrated to a different, less fragile platform. oVirt is nice, but too bleeding edge and way too much acting like a girly.

On 25/08/2016 at 13:37, InterNetX - Juergen Gotteswinter wrote:
On 24.08.2016 at 17:15, InterNetX - Juergen Gotteswinter wrote:
iSCSI & oVirt is an awful combination, no matter if multipathed or bonded. It's always gambling how long it will work, and when it fails, why it failed.
We are using oVirt + iSCSI, but with bonding mode 1 (active/passive) and no multipathing. It is running OK. So far. (A sketch of such a bond follows below.)
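For contrast with the iSCSI bond feature discussed above: bonding mode 1 is plain NIC failover below the iSCSI layer, so only one slave carries traffic at a time. A sketch of such a bond on an EL7 host (device names and address are hypothetical):

    # /etc/sysconfig/network-scripts/ifcfg-bond1
    DEVICE=bond1
    TYPE=Bond
    BONDING_MASTER=yes
    BONDING_OPTS="mode=active-backup miimon=100"
    IPADDR=10.0.131.50
    PREFIX=24
    ONBOOT=yes

This gives redundancy but not the added bandwidth that multipathing with round robin can provide.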
It's supersensitive to latency, and superfast with setting a host to inactive because the engine thinks something is wrong with it; in most cases there was no real reason for it.
You are not wrong...
We had this in several different hardware combinations: self-built filers based on FreeBSD/Illumos & ZFS, an Equallogic SAN, a Nexenta filer.
Been there, done that, won't do again.
Thank you, I take this as a warning. :)
For my testbed I chose to ignore the iSCSI bond feature and changed the multipath default to round robin instead.
What kind of storage do you use in production? Fibre Channel, Gluster, Ceph, ...?
Currently iSCSI, multipathed, with a Solaris-based filer as backend. But this is already in the process of being migrated to a different, less fragile platform. oVirt is nice, but too bleeding edge and way too much acting like a _girly_.
Man, you killed me! :) :) :)

--
Nicolas ECARNOT, crying laughing :) :) :)

On Thu, Aug 25, 2016 at 2:37 PM, InterNetX - Juergen Gotteswinter <jg@internetx.com> wrote:
Currently iSCSI, multipathed, with a Solaris-based filer as backend. But this is already in the process of being migrated to a different, less fragile platform. oVirt is nice, but too bleeding edge and way too much acting like a girly.
"acting like a girly" is not appropriate for this list. Nir

On 29.08.2016 at 12:25, Nir Soffer wrote:
On Thu, Aug 25, 2016 at 2:37 PM, InterNetX - Juergen Gotteswinter <jg@internetx.com> wrote:
Currently iSCSI, multipathed, with a Solaris-based filer as backend. But this is already in the process of being migrated to a different, less fragile platform. oVirt is nice, but too bleeding edge and way too much acting like a girly.
"acting like a girly" is not appropriate for this list.
Nir
I am sorry, this was never meant to discriminate against any human. If it did, I promise that it was not meant to.

On Wed, Aug 24, 2016 at 6:15 PM, InterNetX - Juergen Gotteswinter <juergen.gotteswinter@internetx.com> wrote:
iSCSI & oVirt is an awful combination, no matter if multipathed or bonded. It's always gambling how long it will work, and when it fails, why it failed.
I disagree. In most cases, it's actually a lower-layer issue. In most cases, btw, it's because multipathing was not configured (or not configured correctly).
It's supersensitive to latency, and superfast with setting a host to inactive because the engine thinks something is wrong with it; in most cases there was no real reason for it.
Did you open bugs for those issues? I'm not aware of 'no real reason' issues.
We had this in several different hardware combinations: self-built filers based on FreeBSD/Illumos & ZFS, an Equallogic SAN, a Nexenta filer.
Been there, done that, won't do again.
We've had good success and reliability with most enterprise-level storage, such as EMC, NetApp and Dell filers. When properly configured, of course.
Y.

On 25.08.2016 at 15:53, Yaniv Kaul wrote:
On Wed, Aug 24, 2016 at 6:15 PM, InterNetX - Juergen Gotteswinter <juergen.gotteswinter@internetx.com> wrote:
iSCSI & oVirt is an awful combination, no matter if multipathed or bonded. It's always gambling how long it will work, and when it fails, why it failed.
I disagree. In most cases, it's actually a lower-layer issue. In most cases, btw, it's because multipathing was not configured (or not configured correctly).
Experience tells me it is like I said; this is something I have seen from 3.0 up to 3.6. oVirt and, surprise, RHEV both act the same way. I am absolutely aware of multipath configurations; iSCSI multipathing is in very widespread use in our DC. But such problems are an exclusive oVirt/RHEV feature.
It's supersensitive to latency, and superfast with setting a host to inactive because the engine thinks something is wrong with it; in most cases there was no real reason for it.
Did you open bugs for those issues? I'm not aware of 'no real reason' issues.
Support tickets for the RHEV installation. After support (even after massive escalation requests) kept telling me the same thing again and again, I gave up and we dropped the RHEV subscriptions to migrate the VMs to a different platform solution (still with an iSCSI backend). Problems gone.
We had this in several different hardware combinations: self-built filers based on FreeBSD/Illumos & ZFS, an Equallogic SAN, a Nexenta filer.
Been there, done that, won't do again.
We've had good success and reliability with most enterprise-level storage, such as EMC, NetApp and Dell filers. When properly configured, of course.
Y.
Dell Equallogic? I can't really believe that, since oVirt / RHEV and the Equallogic network configuration won't play nicely together (EQL wants all interfaces in the same subnet). And they only work as expected when their HIT Kit driver package is installed; without it, path failover is like Russian roulette. But oVirt hates the HIT Kit, so this combo ends up in a huge mess, because oVirt makes changes to iSCSI, as does the HIT Kit -> kaboom, host not available. There are several KB articles in the RHN, without a real solution. But like you try to tell between the lines, this must be the customer's misconfiguration. Yep, a typical support-killer answer. The same style as in RHN tickets; I am done with this. Thanks.

One more thing, which I am sure most people are not aware of: when using thin-provisioned disks for VMs hosted on an iSCSI SAN, oVirt uses what is, to me, an unusual way to do this. oVirt adds a new LVM LV for a VM and generates a thin qcow2 image which is written directly, raw, onto that LV. So far, ok, it can be done like this. But try generating some write load from within the guest and see for yourself what will happen. Support's answer to this is: use raw, without thin. It seems to me like a wrong design decision, and I can only warn everyone against using it.
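To make that layout concrete, here is a minimal sketch of how one can see it from a host; the VG name (on block domains, oVirt names the VG after the storage domain UUID) and the image LV name below are illustrative placeholders:

vgs                                      # each block storage domain is one VG
lvs -o lv_name,lv_size 0af1b30a-...      # one LV per virtual disk image
qemu-img info /dev/0af1b30a-.../3c7d-... # for a thin disk this reports
                                         # "file format: qcow2" - a qcow2
                                         # container written raw onto the LV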

On 26/08/2016 at 12:33, InterNetX - Juergen Gotteswinter wrote:
Dell Equallogic? Can't really believe it, since oVirt / RHEV and the Equallogic network configuration won't play nice together (EQL wants all interfaces in the same subnet). And they only work as expected when their HIT Kit driver package is installed; without it, path failover is like Russian roulette. But oVirt hates the HIT Kit, so this combo ends up in a huge mess, because oVirt makes changes to the iSCSI configuration, and so does the HIT Kit -> kaboom, host not available.
There are several KB articles in the RHN, with no real solution.
I've been working with Dell, EMC and EQL for 7 years, and with oVirt for 4 years, and I must admit that what Juergen said is true: HIT is not compatible with the way oVirt is using iSCSI (or the other way round). What he says about LVM is also true. I still love oVirt, and won't quit. I'm just very patient and hoping my BZ and RFE get fixed. -- Nicolas ECARNOT

On Fri, Aug 26, 2016 at 1:33 PM, InterNetX - Juergen Gotteswinter <jg@internetx.com> wrote:
On 25.08.2016 at 15:53, Yaniv Kaul wrote:
On Wed, Aug 24, 2016 at 6:15 PM, InterNetX - Juergen Gotteswinter <juergen.gotteswinter@internetx.com> wrote:
iSCSI & oVirt is an awful combination, no matter if multipathed or bonded. It's always gambling how long it will work, and when it fails, why it failed.
I disagree. In most cases, it's actually a lower-layer issue. In most cases, btw, it's because multipathing was not configured (or not configured correctly).
Experience tells me it is like I said; this is something I have seen from 3.0 up to 3.6, on oVirt and - surprise - RHEV. Both act the same way. I am absolutely aware of multipath configurations; iSCSI multipathing is in widespread use in our DC. But such problems are an exclusive oVirt/RHEV feature.
I don't think the resentful tone is appropriate for the oVirt community mailing list.
It's supersensitive to latency, and superfast with setting a host to inactive because the engine thinks something is wrong with it. In most cases there was no real reason for it.
Did you open bugs for those issues? I'm not aware of 'no real reason' issues.
We opened support tickets for the RHEV installation; after support (even after massive escalation requests) kept telling me the same thing again and again, I gave up and we dropped the RHEV subscriptions to migrate the VMs to a different platform solution (still with an iSCSI backend). Problems gone.
I wish you peace of mind with your new platform solution.
From a (shallow!) search I've made on oVirt bugs, I could not find any oVirt issue you've reported or commented on. I am aware of the request to set rp_filter correctly for setups with multiple interfaces in the same IP subnet.
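For the record, a sketch of that rp_filter tweak, assuming the NIC names from Uwe's host and the loose mode (2) that is commonly recommended when several iSCSI NICs share a subnet:

# 0 = off, 1 = strict, 2 = loose reverse-path filtering
cat > /etc/sysctl.d/90-iscsi-rp_filter.conf <<'EOF'
net.ipv4.conf.enp9s0f0.rp_filter = 2
net.ipv4.conf.enp9s0f1.rp_filter = 2
EOF
sysctl -p /etc/sysctl.d/90-iscsi-rp_filter.conf

Note that Uwe's setup uses two separate subnets, where this is not needed; it matters for the one-subnet designs (e.g. Equallogic) discussed above.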
We had this with several different hardware combinations: self-built filers based on FreeBSD/Illumos & ZFS, Equallogic SAN, Nexenta filers.
Been there, done that, won't do it again.
We've had good success and reliability with most enterprise-level storage, such as EMC, NetApp and Dell filers. When properly configured, of course. Y.
Dell Equallogic? Can't really believe it, since oVirt / RHEV and the Equallogic network configuration won't play nice together (EQL wants all interfaces in the same subnet). And they only work as expected when their HIT Kit driver package is installed; without it, path failover is like Russian roulette. But oVirt hates the HIT Kit, so this combo ends up in a huge mess, because oVirt makes changes to the iSCSI configuration, and so does the HIT Kit -> kaboom, host not available.
Thanks - I'll look into this specific storage. I'm aware it's unique in some cases, but I don't have experience with it specifically.
There are several KB articles in the RHN, with no real solution.
But as you seem to suggest between the lines, this must be the customer's misconfiguration. Yep, a typical support-killer answer. Same style as in RHN tickets; I am done with this.
A funny sentence I read yesterday - Schrödinger's backup: "the condition of any backup is unknown until a restore is attempted." In a sense, this is similar to a no-SPOF high-availability setup - in many cases you don't know it works well until it's needed. There are simply many variables and components involved. That was all I meant, nothing between the lines, and I apologize if I've given you a different impression. Y.
Thanks.

On Wed, Aug 24, 2016 at 6:15 PM, InterNetX - Juergen Gotteswinter <juergen.gotteswinter@internetx.com> wrote:
iSCSI & oVirt is an awful combination, no matter if multipathed or bonded. It's always gambling how long it will work, and when it fails, why it failed.
it's supersensitive to latency,
Can you elaborate on this?
and superfast with setting a host to inactive because the engine thinks something is wrong with it.
Typically it takes at least 5 minutes under abnormal monitoring conditions before the engine will make a host non-operational; is this really superfast?
in most cases there was no real reason for it.
I think the main issues were the mixing of storage monitoring and LVM refreshes, unneeded serialization of LVM commands, and bad locking on the engine side. The engine side was fixed in 3.6, and the vdsm side in 4.0. See https://bugzilla.redhat.com/1081962

In RHEL/CentOS 7.2, a lot of multipath-related issues were fixed, and the oVirt multipath configuration was fixed to prevent unwanted I/O queuing with some devices, which could lead to long delays and failures in many flows. However, I think our configuration is too extreme, and you may like to use the configuration in this patch: https://gerrit.ovirt.org/61281

I guess trying 4.0 may be too bleeding edge for you, but hopefully you will find that your iSCSI setup is much more reliable now. Please file bugs if you still have issues with 4.0.

Nir
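For readers unfamiliar with the queuing behaviour Nir refers to, it is governed by multipath's no_path_retry setting. A hedged illustration of the trade-off (this is not the content of the patch above, and the values are examples only):

defaults {
    polling_interval   5
    # "fail": return I/O errors as soon as all paths are down. Nothing
    # queues, so monitoring never stalls, but a transient path loss
    # becomes an immediate I/O error.
    no_path_retry      fail
    # A number instead queues I/O and retries for that many polling
    # intervals (here 4 x 5s = ~20s) before failing - a softer compromise.
    # no_path_retry    4
}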
On 24.08.2016 at 16:04, Uwe Laverenz wrote:
Hi Elad,
thank you very much for clearing things up.
Initiator/iface 'a' tries to connect to target 'b' and vice versa. As 'a' and 'b' are in completely separate networks, this can never work as long as there is no routing between the networks.
So it seems the iSCSI-bonding feature is not useful for my setup. I still wonder how and where this feature is supposed to be used?
thank you, Uwe
On 24.08.2016 at 15:35, Elad Ben Aharon wrote:
Thanks.
You're getting an iSCSI connection timeout [1], [2]. It means the host cannot connect to the targets from either iface enp9s0f1 or iface enp9s0f0.
This causes the host to lose its connection to the storage and also, the connection to the engine becomes inactive. Therefore, the host changes its status to Non-responsive [3] and, since it's the SPM, the whole DC, with all its storage domains, becomes inactive.
vdsm.log:
[1]
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2400, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 508, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 204, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, target.iqn)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 336, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: enp9s0f0, target: iqn.2005-10.org.freenas.ctl:tgtb, portal: 10.0.132.121,3260] (multiple)'], ['iscsiadm: Could not login to [iface: enp9s0f0, target: iqn.2005-10.org.freenas.ctl:tgtb, portal: 10.0.132.121,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])
vdsm.log:
[2]
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2400, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 508, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 204, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, target.iqn)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 336, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: enp9s0f1, target: iqn.2005-10.org.freenas.ctl:tgta, portal: 10.0.131.121,3260] (multiple)'], ['iscsiadm: Could not login to [iface: enp9s0f1, target: iqn.2005-10.org.freenas.ctl:tgta, portal: 10.0.131.121,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])
engine.log: [3]
2016-08-24 14:10:23,222 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-25) [15d1637f] Correlation ID: 15d1637f, Call Stack: null, Custom Event ID: -1, Message: iSCSI bond 'iBond' was successfully created in Data Center 'Default' but some of the hosts encountered connection issues.
2016-08-24 14:10:23,208 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
(org.ovirt.thread.pool-8-thread-25) [15d1637f] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand' return value 'ServerConnectionStatusReturnForXmlRpc:{status='StatusForXmlRpc [code=5022, message=Message timeout which can be caused by communication issues]'}
On Wed, Aug 24, 2016 at 4:04 PM, Uwe Laverenz <uwe@laverenz.de> wrote:
Hi Elad,
I sent you a download message.
thank you, Uwe
participants (7)
- Elad Ben Aharon
- InterNetX - Juergen Gotteswinter
- InterNetX - Juergen Gotteswinter
- Nicolas Ecarnot
- Nir Soffer
- Uwe Laverenz
- Yaniv Kaul