[ovirt-users] R: R: R: R: R: R: R: R: PXE boot of a VM on vdsm don't read DHCP offer

NUNIN Roberto Roberto.Nunin at comifar.it
Thu Aug 20 19:51:40 UTC 2015


> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at redhat.com]
> Sent: Wednesday, 29 July 2015 12:03
> To: NUNIN Roberto
> Cc: Fabian Deutsch; users at ovirt.org
> Subject: Re: R: [ovirt-users] R: R: R: R: R: R: PXE boot of a VM on vdsm don't
> read DHCP offer
>
> On Wed, Jul 29, 2015 at 12:00:38PM +0200, NUNIN Roberto wrote:
> >
> > > -----Original Message-----
> > > From: users-bounces at ovirt.org [mailto:users-bounces at ovirt.org] On behalf of
> > > Michael S. Tsirkin
> > > Sent: Thursday, 9 July 2015 15:15
> > > To: Fabian Deutsch
> > > Cc: users at ovirt.org
> > > Subject: Re: [ovirt-users] R: R: R: R: R: R: PXE boot of a VM on vdsm don't read
> > > DHCP offer
> > >
> > > On Thu, Jul 09, 2015 at 08:57:50AM -0400, Fabian Deutsch wrote:
> > > > ----- Original Message -----
> > > > > On Wed, Jul 08, 2015 at 09:11:42AM +0300, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jul 07, 2015 at 05:13:28PM +0100, Dan Kenigsberg wrote:
> > > > > > > On Tue, Jul 07, 2015 at 10:14:54AM +0200, NUNIN Roberto wrote:
> > > > > > > > >
> > > > > > > > > > On Mon, Jul 06, 2015 at 10:33:59AM +0200, NUNIN Roberto wrote:
> > > > > > > > > > Hi Dan
> > > > > > > > > >
> > > > > > > > > > > Sorry for the question: what do you mean by interface vnetxxxx?
> > > > > > > > > > > Currently our path is:
> > > > > > > > > > > eno1 - eno2  ---- bond0 ----- bond.3500 (VLAN) ------ bridge ----- vm.
> > > > > > > > > > >
> > > > > > > > > > > Which one of these?
> > > > > > > > > > > Moreover, reading Fabian's statements about bonding limits, today I
> > > > > > > > > > > can try to switch to a config without bonding.
> > > > > > > > >
> > > > > > > > > "vm" is a complicated term.
> > > > > > > > >
> > > > > > > > > > `brctl show` would not show you a "vm" connected to a bridge. What
> > > > > > > > > > you WOULD see is a vnet888 tap device. The "other side" of this
> > > > > > > > > > device is held by qemu, which implements the VM.
> > > > > > > >
> > > > > > > > Ok, understood and found it, vnet2
> > > > > > > >
> > > > > > > > >
> > > > > > > > > I'm asking if the dhcp offer has reached that tap device.
> > > > > > > >
> > > > > > > > > No, the DHCP offer packet does not reach the vnet2 interface; I can
> > > > > > > > > see only DHCP DISCOVER.
> > > > > > >
> > > > > > > Ok, so it seems that we have a problem in the host bridging.
> > > > > > >
> > > > > > > Is it the latest kernel-3.10.0-229.7.2.el7.x86_64 ?
> > > > > > >
> > > > > > > > Michael, a DHCP DISCOVER is sent out of a just-booted guest, and the
> > > > > > > > OFFER returns to the bridge, but is not propagated to the tap device.
> > > > > > > Can you suggest how to debug this further?
> > > > > >
> > > > > > Dump packets including the ethernet headers.
> > > > > > Likely something interfered with them so the eth address is wrong.
> > > > > >
> > > > > > Since bonding does this sometimes, this is the most likely culprit.
> > > > >
> > > > > We've ruled this out already - Roberto reproduces the issue without a
> > > > > bond.
> > > >
> > > > > To me this looks like a regression in the host-side bridging. But otoh it
> > > > > doesn't look like it's happening always, because otherwise I'd expect more
> > > > > noise around this issue.
> > > >
> > > > - fabian
> > >
> > > Hard to say. E.g. forwarding delay would do this for a while.
> > > > If eth address of the packets is okay, poke at the fdb, maybe there's
> > > something wrong there. Maybe stp is detecting a loop - try checking that.
> >
> > > Is someone checking this?
> > > In the tested config STP was off.
>
> Then maybe you have a loop :)

No, it is not a loop.

I've done further tests today and have finally pinned down the following conditions.
The erratic behavior is seen only within a cluster whose nodes are HP ProLiant BL660c Gen8, connected to Cisco Nexus 7K through HP FEX B22 blade interconnects and Cisco Nexus 5596 switches. All NICs are 10Gbit.

It doesn't happen with two HP ProLiant DL380 G5 with 10Gbit NICs, connected directly to Cisco Nexus 5548UP switches, nor with two HP ProLiant ML350e Gen8 with 1Gbit NICs connected to a Cisco 4948 and then to the same Nexus 5548UP.

All nodes are running CentOS 7.1 with the latest updates, and all networks are configured in the same way: bonding over two NICs, then VLAN interfaces and a bridge towards the VMs. The bonding mode is 4 (802.3ad) on all of them and works correctly on the DL380 and ML350 clusters.

Well, I've tried changing the bonding mode on the BL660 cluster to mode 1 (active-backup) and the issue disappears.
In all other bonding modes it doesn't work: the bridge interfaces receive the DHCP offers and do NOT reject packets, but the tap interfaces never receive the offer. It works only with mode 1.
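
Since the behavior depends on the bonding mode, one thing that could be compared between the working and failing clusters is what the bonding driver itself reports. The following is only a minimal sketch in Python: it parses text in the layout of /proc/net/bonding/bond0 and lists the mode and, per slave, the link status and aggregator ID. The SAMPLE text is illustrative and was not captured from these hosts, and the function names are hypothetical. With mode 4, slaves reporting different aggregator IDs may indicate that LACP did not form a single aggregate with the switch side, which would be worth checking.

```python
# Sketch: summarise a bonding status file (normally /proc/net/bonding/bond0).
# SAMPLE is illustrative text in the layout the Linux bonding driver prints;
# it is NOT output captured from the hosts in this thread.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eno1
MII Status: up
Aggregator ID: 1

Slave Interface: eno2
MII Status: up
Aggregator ID: 2
"""


def parse_bonding(text):
    """Return (mode, {slave: {field: value}}) from bonding status text."""
    mode = None
    slaves = {}
    current = None  # slave section being read; None while in the bond header
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = {}
        elif current is not None and key in ("MII Status", "Aggregator ID"):
            # keep only the first value seen for each slave
            slaves[current].setdefault(key, value)
    return mode, slaves


def aggregator_ids(slaves):
    """Distinct aggregator IDs across slaves (only present in 802.3ad mode)."""
    return {s["Aggregator ID"] for s in slaves.values() if "Aggregator ID" in s}


if __name__ == "__main__":
    mode, slaves = parse_bonding(SAMPLE)
    print(mode)
    for name, fields in slaves.items():
        print(name, fields)
    if len(aggregator_ids(slaves)) > 1:
        print("slaves are in different aggregators")
```

On a real host the text would come from open("/proc/net/bonding/bond0").read() instead of SAMPLE.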

How can I investigate further? The goal is to use mode 4, to aggregate the available bandwidth.
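
One possible way to narrow this down, following the earlier suggestion in this thread to dump packets including the ethernet headers, is to capture DHCP traffic at every hop of the path and see where the OFFER is last visible and what its destination MAC is. The sketch below only builds the tcpdump command lines; it does not run them. The names eno1, eno2, bond0 and vnet2 come from this thread, while "bond0.3500" and "br3500" are assumed names for the VLAN device and the bridge and must be adapted.

```python
import shlex

# Hop list for the path described in this thread. eno1, eno2, bond0 and vnet2
# are from the thread; "bond0.3500" and "br3500" are ASSUMED device names.
HOPS = ["eno1", "eno2", "bond0", "bond0.3500", "br3500", "vnet2"]

# DHCP uses UDP ports 67/68. The second clause is meant to also match frames
# that still carry an 802.1Q tag (e.g. on the physical NICs and the bond).
DHCP_FILTER = (
    "(udp port 67 or udp port 68) or "
    "(vlan and (udp port 67 or udp port 68))"
)


def capture_commands(hops, bpf=DHCP_FILTER):
    """Build one tcpdump command line per interface.

    -e prints the link-level (ethernet) header, -n disables name resolution,
    -i selects the interface.
    """
    cmds = []
    for iface in hops:
        argv = ["tcpdump", "-e", "-n", "-i", iface, bpf]
        cmds.append(" ".join(shlex.quote(a) for a in argv))
    return cmds


if __name__ == "__main__":
    for cmd in capture_commands(HOPS):
        print(cmd)
```

Running the printed commands in parallel (as root) while the VM PXE-boots would show on which interface the OFFER disappears.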

RN

>
> > RN
> > >
> > > --
> > > MST
> > > _______________________________________________
> > > Users mailing list
> > > Users at ovirt.org
> > > http://lists.ovirt.org/mailman/listinfo/users
> >
> > This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise private information. If you have received
> it in error, please notify the sender immediately, deleting the original and all
> copies and destroying any hard copies. Any other use is strictly prohibited
> and may be unlawful.
