[ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

Dan Kenigsberg danken at redhat.com
Thu Feb 4 21:02:43 UTC 2016


On Thu, Feb 04, 2016 at 06:26:14PM +0100, Stefano Danzi wrote:
> 
> 
> On 04/02/2016 16:55, Dan Kenigsberg wrote:
> >On Wed, Jan 06, 2016 at 08:45:16AM +0200, Dan Kenigsberg wrote:
> >>On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> >>>On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> >>>>I did some tests:
> >>>>
> >>>>kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> >>>>one network cable the network is stable)
> >>>>kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> >>>Would you be so kind as to file a kernel bug in bugzilla.redhat.com?
> >>>Summarize the information from this thread (e.g. your ifcfgs and in what
> >>>way mode 4 doesn't work).
> >>>
> >>>To get the bug solved quickly we'd better find a paying RHEL7 customer
> >>>to subscribe to it. But I'll try to push from my direction.
> >>Stefano has been kind enough to open
> >>
> >>     Bug 1295423 - Unstable network link using bond mode = 4
> >>     https://bugzilla.redhat.com/show_bug.cgi?id=1295423
> >>
> >>which we fail to reproduce in our own lab. I'd be pleased if anybody who
> >>experiences it would add their networking config to the bug (if it is
> >>different). Can you also lay out your switch's hardware and
> >>configuration?
> >Stefano, could you share your /proc/net/bonding/* files with us?
> >I have heard similar reports where the bond slaves had mismatching
> >aggregator IDs. Could it be your case as well?
> >
> 
> Here:
> 
> [root@ovirt01 ~]# cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
> 
> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
> Transmit Hash Policy: layer2 (0)
> MII Status: up
> MII Polling Interval (ms): 100
> Up Delay (ms): 0
> Down Delay (ms): 0
> 
> 802.3ad info
> LACP rate: slow
> Min links: 0
> Aggregator selection policy (ad_select): stable
> Active Aggregator Info:
>         Aggregator ID: 2
>         Number of ports: 1
>         Actor Key: 9
>         Partner Key: 1
>         Partner Mac Address: 00:00:00:00:00:00
> 
> Slave Interface: enp4s0
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 2
> Permanent HW addr: **:**:**:**:**:f1
> Slave queue ID: 0
> Aggregator ID: 1

---------------^^^


> Actor Churn State: churned
> Partner Churn State: churned
> Actor Churned Count: 4
> Partner Churned Count: 5
> details actor lacp pdu:
>     system priority: 65535
>     port key: 9
>     port priority: 255
>     port number: 1
>     port state: 69
> details partner lacp pdu:
>     system priority: 65535
>     oper key: 1
>     port priority: 255
>     port number: 1
>     port state: 1
> 
> Slave Interface: enp5s0
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 1
> Permanent HW addr: **:**:**:**:**:f2
> Slave queue ID: 0
> Aggregator ID: 2

---------------^^^


It sounds awfully familiar: mismatching aggregator IDs and an all-zero
partner MAC. Can you double-check that both your NICs are wired to the
same switch, and that the switch is properly configured to use LACP on
these two ports?
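
If it helps anybody else compare their own hosts, here is a minimal
sketch (my own illustration, not part of vdsm or the bonding driver) of
the check I am doing by eye above. It assumes the /proc/net/bonding
layout shown in your output. The names check_bond, decode_port_state and
SAMPLE are hypothetical, and SAMPLE is just a trimmed copy of your paste.
The port state bit names follow the 802.3ad actor/partner state byte,
lowest bit first.

```python
ZERO_MAC = "00:00:00:00:00:00"

# LACP port state bits, lowest bit first (802.3ad actor/partner state byte).
LACP_STATE_BITS = [
    "activity", "timeout", "aggregation", "synchronization",
    "collecting", "distributing", "defaulted", "expired",
]

# Trimmed copy of the output quoted in this thread, for illustration only.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1
        Partner Mac Address: 00:00:00:00:00:00

Slave Interface: enp4s0
MII Status: up
Aggregator ID: 1

Slave Interface: enp5s0
MII Status: up
Aggregator ID: 2
"""


def decode_port_state(value):
    """Return the names of the LACP state bits set in `value`."""
    return [name for bit, name in enumerate(LACP_STATE_BITS)
            if value & (1 << bit)]


def check_bond(text):
    """Compare each slave's aggregator ID with the bond's active one."""
    active_id = None
    partner_mac = None
    slave_ids = {}
    current_slave = None  # None while we are still in the bond-level header

    for raw in text.splitlines():
        key, sep, val = raw.strip().partition(":")
        if not sep:
            continue
        val = val.strip()
        if key == "Slave Interface":
            current_slave = val
        elif key == "Aggregator ID" and val.isdigit():
            if current_slave is None:
                active_id = int(val)
            else:
                slave_ids[current_slave] = int(val)
        elif key == "Partner Mac Address" and current_slave is None:
            partner_mac = val

    mismatched = {name: agg for name, agg in slave_ids.items()
                  if agg != active_id}
    return {
        "active_aggregator": active_id,
        "mismatched_slaves": mismatched,
        "partner_mac_is_zero": partner_mac == ZERO_MAC,
    }
```

On a host you would pass it the contents of /proc/net/bonding/bond0
instead of SAMPLE. A non-empty "mismatched_slaves" together with
"partner_mac_is_zero" set is the pattern I am pointing at. Decoding your
actor port state 69 with decode_port_state gives activity, aggregation
and defaulted; as far as I understand the protocol, "defaulted" means the
port is running on default partner information rather than on anything
received from the switch, which fits the all-zero partner MAC.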



