I have only one switch, so both interfaces are connected to the same
switch. The switch configuration is correct: I opened a ticket with the switch's tech
support and the configuration was validated.
This configuration worked without problems 24/7 for one year! All the problems started
after a kernel update, so something changed in the kernel.
Jarod, do you have a clue why AggregatorIDs may be mismatching with
recent el7.2 kernels?
-------- Original message --------
From: Dan Kenigsberg <danken(a)redhat.com>
Date: 04/02/2016 22:02 (GMT+01:00)
To: Stefano Danzi <s.danzi(a)hawai.it>, ydary(a)redhat.com
Cc: Jon Archer <jon(a)rosslug.org.uk>, mburman(a)redhat.com, users(a)ovirt.org
Subject: Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
On Thu, Feb 04, 2016 at 06:26:14PM +0100, Stefano Danzi wrote:
>
>
> On 04/02/2016 16:55, Dan Kenigsberg wrote:
> >On Wed, Jan 06, 2016 at 08:45:16AM +0200, Dan Kenigsberg wrote:
> >>On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> >>>On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> >>>>I did some tests:
> >>>>
> >>>>kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> >>>>one network cable the network is stable)
> >>>>kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> >>>Would you be kind enough to file a kernel bug in bugzilla.redhat.com?
> >>>Summarize the information from this thread (e.g. your ifcfgs and in what
> >>>way mode 4 doesn't work).
> >>>
> >>>To get the bug solved quickly we'd better find a paying RHEL7 customer
> >>>subscribing to it. But I'll try to push from my direction.
> >>Stefano has been kind enough to open
> >>
> >> Bug 1295423 - Unstable network link using bond mode = 4
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1295423
> >>
> >>which we fail to reproduce in our own lab. I'd be pleased if anybody who
> >>experiences it adds their networking config to the bug (if it is
> >>different). Can you also lay out your switch's hardware and
> >>configuration?
> >Stefano, could you share your /proc/net/bonding/* files with us?
> >I heard of similar reports where the bond slaves had mismatching
> >aggregator IDs. Could it be your case as well?
> >
>
> Here:
>
> [root@ovirt01 ~]# cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
>
> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
> Transmit Hash Policy: layer2 (0)
> MII Status: up
> MII Polling Interval (ms): 100
> Up Delay (ms): 0
> Down Delay (ms): 0
>
> 802.3ad info
> LACP rate: slow
> Min links: 0
> Aggregator selection policy (ad_select): stable
> Active Aggregator Info:
> Aggregator ID: 2
> Number of ports: 1
> Actor Key: 9
> Partner Key: 1
> Partner Mac Address: 00:00:00:00:00:00
>
> Slave Interface: enp4s0
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 2
> Permanent HW addr: **:**:**:**:**:f1
> Slave queue ID: 0
> Aggregator ID: 1
---------------^^^
> Actor Churn State: churned
> Partner Churn State: churned
> Actor Churned Count: 4
> Partner Churned Count: 5
> details actor lacp pdu:
> system priority: 65535
> port key: 9
> port priority: 255
> port number: 1
> port state: 69
> details partner lacp pdu:
> system priority: 65535
> oper key: 1
> port priority: 255
> port number: 1
> port state: 1
>
> Slave Interface: enp5s0
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 1
> Permanent HW addr: **:**:**:**:**:f2
> Slave queue ID: 0
> Aggregator ID: 2
---------------^^^
It sounds awfully familiar: mismatching aggregator IDs, and an all-zero
partner MAC. Can you double-check that both your NICs are wired to the
same switch, which is properly configured to use LACP on these two
ports?
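
For anyone who wants to check their own hosts for the same symptoms, here is a
minimal Python sketch. It assumes the /proc/net/bonding layout shown in the
output above (other kernel versions may print different fields), and the
function names parse_bond, diagnose and decode_port_state are made up for this
example. It reports slaves with differing aggregator IDs and an all-zero
partner MAC, and decodes the "port state" byte using the bit order from
IEEE 802.3ad.

```python
import re

# LACP port-state bits per IEEE 802.3ad, least significant bit first.
LACP_STATE_BITS = [
    "activity", "timeout", "aggregation", "synchronization",
    "collecting", "distributing", "defaulted", "expired",
]


def decode_port_state(value):
    """Return the names of the bits set in an LACP port-state byte."""
    return [name for bit, name in enumerate(LACP_STATE_BITS)
            if value & (1 << bit)]


def parse_bond(text):
    """Parse the text of /proc/net/bonding/<bond>.

    Returns (active_aggregator_id, partner_mac, {slave_name: aggregator_id}).
    Assumes the bond-wide section comes before the first "Slave Interface".
    """
    active_id = None
    partner_mac = None
    slaves = {}
    current = None  # name of the slave section we are in, if any
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Slave Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Aggregator ID:\s*(\d+)", line)
        if m:
            if current is None:
                active_id = int(m.group(1))
            else:
                slaves[current] = int(m.group(1))
            continue
        m = re.match(r"Partner Mac Address:\s*(\S+)", line)
        if m and current is None:
            partner_mac = m.group(1)
    return active_id, partner_mac, slaves


def diagnose(text):
    """Return a list of human-readable problems found in the bond status."""
    active_id, partner_mac, slaves = parse_bond(text)
    problems = []
    if len(set(slaves.values())) > 1:
        problems.append("slaves have mismatching aggregator IDs: %r" % slaves)
    if partner_mac == "00:00:00:00:00:00":
        problems.append("partner MAC is all zero (no LACP partner seen)")
    return problems
```

On the host you would feed it the file contents, e.g.
diagnose(open("/proc/net/bonding/bond0").read()). For the values in the output
above, decode_port_state(69) gives activity, aggregation and defaulted;
"defaulted" means the port is running on default partner information rather
than on data from a received LACPDU, which fits the all-zero partner MAC.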