[ovirt-users] R: Re: R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

Jarod Wilson jarod at redhat.com
Tue Feb 9 18:32:10 UTC 2016


On Mon, Feb 08, 2016 at 03:40:15PM +0200, Dan Kenigsberg wrote:
> On Thu, Feb 04, 2016 at 10:21:10PM +0100, Stefano Danzi wrote:
> > I have only one switch, so both interfaces are connected to the same switch. The configuration on the switch is correct: I opened a ticket with the switch's tech support and the configuration was validated.
> > This configuration worked without problems, 24 hours a day, for one year! All the problems started after a kernel update, so something changed in the kernel.
> 
> Jarod, do you have a clue why AggregatorIDs may be mismatching with
> recent el7.2 kernels?

Ah, the mail chain includes the reporter of bug 1295423. In that bug,
we've already discovered that even with the older, supposedly working
kernel, the two interfaces had different LAG IDs. So I don't think recent
kernels are causing the IDs to mismatch, but something recent certainly
makes the mismatch more fatal than it was before. I'm not sure what the
core problem is here, and I don't want to chase anything without being
able to say for sure that things are 100% correct on the switch side.
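
For anyone who wants to check their own hosts for the same symptom, here
is a minimal Python sketch (not taken from the bug report) that reads a
/proc/net/bonding dump like the one quoted below and reports whether the
slaves ended up in different aggregators. The function names and the
abbreviated SAMPLE text are illustrative assumptions; the field labels are
the ones the bonding driver prints in the output quoted in this thread.

```python
import re

# Abbreviated sample in the same layout as the dump quoted in this thread.
SAMPLE = """\
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1

Slave Interface: enp4s0
MII Status: up
Aggregator ID: 1

Slave Interface: enp5s0
MII Status: up
Aggregator ID: 2
"""


def slave_aggregator_ids(bonding_text):
    """Map each slave interface name to the Aggregator ID listed under it."""
    ids = {}
    current = None  # name of the slave section we are currently inside
    for raw in bonding_text.splitlines():
        line = raw.strip()
        m = re.match(r"Slave Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Aggregator ID:\s*(\d+)", line)
        # The "Active Aggregator Info" entry appears before any slave
        # section, so it is skipped while current is still None.
        if m and current is not None:
            ids[current] = int(m.group(1))
    return ids


def aggregators_mismatch(bonding_text):
    """Return True if the slaves report more than one distinct Aggregator ID."""
    return len(set(slave_aggregator_ids(bonding_text).values())) > 1
```

On a real host the input would be the contents of /proc/net/bonding/bond0
(or whichever bond is in use) instead of SAMPLE.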

> > -------- Original message --------
> > From: Dan Kenigsberg <danken at redhat.com> 
> > Date: 04/02/2016  22:02  (GMT+01:00) 
> > To: Stefano Danzi <s.danzi at hawai.it>, ydary at redhat.com 
> > Cc: Jon Archer <jon at rosslug.org.uk>, mburman at redhat.com, users at ovirt.org 
> > Subject: Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
> > 
> > On Thu, Feb 04, 2016 at 06:26:14PM +0100, Stefano Danzi wrote:
> > > 
> > > 
> > > On 04/02/2016 16:55, Dan Kenigsberg wrote:
> > > >On Wed, Jan 06, 2016 at 08:45:16AM +0200, Dan Kenigsberg wrote:
> > > >>On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> > > >>>On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> > > >>>>I did some tests:
> > > >>>>
> > > >>>>kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> > > >>>>one network cable the network is stable)
> > > >>>>kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> > > >>>Would you be so kind as to file a kernel bug in bugzilla.redhat.com?
> > > >>>Summarize the information from this thread (e.g. your ifcfgs and in what
> > > >>>way mode 4 doesn't work).
> > > >>>
> > > >>>To get the bug solved quickly we'd better find a paying RHEL7 customer
> > > >>>to subscribe to it. But I'll try to push from my direction.
> > > >>Stefano has been kind enough to open
> > > >>
> > > >>     Bug 1295423 - Unstable network link using bond mode = 4
> > > >>     https://bugzilla.redhat.com/show_bug.cgi?id=1295423
> > > >>
> > > >>which we fail to reproduce in our own lab. I'd be pleased if anybody who
> > > >>experiences it would add their networking config to the bug (if it is
> > > >>different). Can you also lay out your switch's hardware and
> > > >>configuration?
> > > >Stefano, could you share your /proc/net/bonding/* files with us?
> > > >I have heard of similar reports where the bond slaves had mismatching
> > > >aggregator IDs. Could that be your case as well?
> > > >
> > > 
> > > Here:
> > > 
> > > [root at ovirt01 ~]# cat /proc/net/bonding/bond0
> > > Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
> > > 
> > > Bonding Mode: IEEE 802.3ad Dynamic link aggregation
> > > Transmit Hash Policy: layer2 (0)
> > > MII Status: up
> > > MII Polling Interval (ms): 100
> > > Up Delay (ms): 0
> > > Down Delay (ms): 0
> > > 
> > > 802.3ad info
> > > LACP rate: slow
> > > Min links: 0
> > > Aggregator selection policy (ad_select): stable
> > > Active Aggregator Info:
> > >         Aggregator ID: 2
> > >         Number of ports: 1
> > >         Actor Key: 9
> > >         Partner Key: 1
> > >         Partner Mac Address: 00:00:00:00:00:00
> > > 
> > > Slave Interface: enp4s0
> > > MII Status: up
> > > Speed: 1000 Mbps
> > > Duplex: full
> > > Link Failure Count: 2
> > > Permanent HW addr: **:**:**:**:**:f1
> > > Slave queue ID: 0
> > > Aggregator ID: 1
> > 
> > ---------------^^^
> > 
> > 
> > > Actor Churn State: churned
> > > Partner Churn State: churned
> > > Actor Churned Count: 4
> > > Partner Churned Count: 5
> > > details actor lacp pdu:
> > >     system priority: 65535
> > >     port key: 9
> > >     port priority: 255
> > >     port number: 1
> > >     port state: 69
> > > details partner lacp pdu:
> > >     system priority: 65535
> > >     oper key: 1
> > >     port priority: 255
> > >     port number: 1
> > >     port state: 1
> > > 
> > > Slave Interface: enp5s0
> > > MII Status: up
> > > Speed: 1000 Mbps
> > > Duplex: full
> > > Link Failure Count: 1
> > > Permanent HW addr: **:**:**:**:**:f2
> > > Slave queue ID: 0
> > > Aggregator ID: 2
> > 
> > ---------------^^^
> > 
> > 
> > It sounds awfully familiar: mismatching aggregator IDs and an all-zero
> > partner MAC. Can you double-check that both your NICs are wired to the
> > same switch, and that the switch is properly configured to use LACP on
> > these two ports?
> > 
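
The "port state" values in the dump above are bit fields, and decoding
them supports the same reading. A minimal Python sketch, assuming the bit
names of the Actor_State/Partner_State field from IEEE 802.3ad (now
802.1AX); decode_port_state is an illustrative helper, not part of any
existing tool:

```python
# Bit positions of the LACP port state field, lowest bit first.
LACP_STATE_BITS = [
    (0x01, "LACP_Activity"),
    (0x02, "LACP_Timeout"),
    (0x04, "Aggregation"),
    (0x08, "Synchronization"),
    (0x10, "Collecting"),
    (0x20, "Distributing"),
    (0x40, "Defaulted"),
    (0x80, "Expired"),
]


def decode_port_state(value):
    """Return the names of the flags set in a decimal 'port state' value."""
    return [name for bit, name in LACP_STATE_BITS if value & bit]


if __name__ == "__main__":
    # Actor and partner values reported for enp4s0 in the dump above.
    for label, state in (("actor", 69), ("partner", 1)):
        print(label, state, decode_port_state(state))
```

With the values from enp4s0, 69 decodes to LACP_Activity, Aggregation and
Defaulted, and the partner's 1 to LACP_Activity only. Defaulted means the
actor is running on default partner information rather than information
received in an LACPDU, which fits the all-zero partner MAC. A port that
had completed negotiation would also have Synchronization, Collecting and
Distributing set.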

-- 
Jarod Wilson
jarod at redhat.com



