Hi,
I've done a lot more testing today. I've narrowed the issues down to
two specific VMs. When I'm running either of these two VMs I get
horrific network performance. When both of those two are stopped, my
network is just fine (like 99% of the time).
I've been spending the day gathering packet dumps. I'm running
wireshark on my host listening to the ovirtmgmt bridge (which is my only
network). So, that SHOULD be capturing everything, right?
I have not noticed anything out of the ordinary except for one odd
thing -- correlated with my network wonkiness, wireshark reports a bunch
of duplicate or out-of-order TCP packets! I'll just note that
correlation does not imply causation, but I'm not seeing anything else
out of the ordinary. I certainly don't see anything that would imply
I've been hacked.
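To make "duplicate or out-of-order" concrete, here is a rough sketch of how a capture could be classified by sequence number. It is a simplified approximation, not wireshark's actual analysis: the function name and the (flow, seq, length) tuple format are assumptions for illustration, and sequence-number wraparound is ignored.

```python
from collections import defaultdict


def classify_segments(segments):
    """Label each TCP segment 'ok', 'duplicate', or 'out-of-order'.

    segments: iterable of (flow, seq, length) in capture order, where
    flow is any hashable key for one direction of one connection.
    Simplification: sequence-number wraparound is not handled.
    """
    next_expected = {}        # flow -> highest (seq + length) seen so far
    seen = defaultdict(set)   # flow -> {(seq, length)} already captured
    labels = []
    for flow, seq, length in segments:
        if length == 0:
            # Pure ACKs carry no payload; this sketch does not analyse them.
            labels.append("ok")
            continue
        key = (seq, length)
        if key in seen[flow]:
            # Exactly the same byte range was captured before.
            labels.append("duplicate")
        elif flow in next_expected and seq < next_expected[flow]:
            # New data that starts before bytes we have already seen.
            labels.append("out-of-order")
        else:
            labels.append("ok")
        seen[flow].add(key)
        next_expected[flow] = max(next_expected.get(flow, 0), seq + length)
    return labels
```

Feeding it the sequence numbers and payload lengths from a capture, one flow per direction of a connection, gives a per-flow count that can be lined up against the times the network misbehaves.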
Is there something with CentOS/ovirt-host/vdsm networking that could
cause this? Or could it be a router issue? Specifically my host and my
hosted-engine are on separate logical networks (different /24s) but
both networks are on the same physical wire; my router, an ERPro8, uses
a single interface with both /24s assigned and routes between them. But
some of the duplicate/out-of-order was for the periodic host <-> engine
health checks.
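A small sketch of what that single-interface routing means for traffic on the shared wire. The addresses are made up (the real /24s are not given here), and the function names are hypothetical. If this model applies, a capture point on that wire could plausibly see a routed packet twice, once on the way to the router and once on the way back, though I have not confirmed that this is what wireshark is flagging.

```python
import ipaddress


def needs_router(src, dst, prefix_len=24):
    """True if src and dst fall in different subnets of the given size."""
    src_net = ipaddress.ip_interface(f"{src}/{prefix_len}").network
    dst_net = ipaddress.ip_interface(f"{dst}/{prefix_len}").network
    return src_net != dst_net


def wire_crossings(src, dst, prefix_len=24):
    """How many times one packet crosses the shared physical segment.

    Assumes a router with a single interface on that segment: routed
    traffic goes to the router and comes back over the same wire.
    """
    return 2 if needs_router(src, dst, prefix_len) else 1


# Hypothetical addresses, for illustration only.
host = "192.168.1.10"
engine = "192.168.2.10"
same_net_vm = "192.168.1.20"

print(wire_crossings(host, engine))       # routed between the two /24s
print(wire_crossings(host, same_net_vm))  # delivered directly
```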
Still, I'm not sure why it's these two specific VMs that are causing my
issues, other than that they have the most network traffic
coming/going. If it IS a router problem (the router is relatively new,
and also updated with the latest firmware), I'm honestly not sure how to
properly test that.
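One possible way to test it, sketched under assumptions: collect round-trip times to the router's LAN address and to the modem, once with the two VMs running and once with them stopped, then compare the distributions rather than eyeballing them. The helper below is hypothetical and the sample values are illustrative only; real samples would come from ping or mtr.

```python
import math
import statistics


def summarize_rtts(samples_ms):
    """Return min, median, 95th percentile (nearest rank), and max in ms."""
    if not samples_ms:
        raise ValueError("need at least one RTT sample")
    ordered = sorted(samples_ms)
    # Nearest-rank percentile: the smallest value with at least 95% of
    # the samples at or below it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Illustrative values only, not measurements.
vms_stopped = [3, 4, 5, 4, 3, 4, 5, 3, 4, 4]
vms_running = [300, 1000, 450, 5, 800]

print("stopped:", summarize_rtts(vms_stopped))
print("running:", summarize_rtts(vms_running))
```

If the times to the router itself stay flat while the times through it to the modem blow up, that would point past the router's LAN side; if both degrade together, the router or the shared wire looks more suspect.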
Any more ideas where I can look, or what I can/should be looking for?
I'm extremely comfortable with internet technologies (25+ years of
experience) but this has got me stumped!
Thanks,
-derek
Jason Keltz <jas(a)cse.yorku.ca> writes:
Derek,
Have you used tcpdump to check what network traffic is coming out of
your box? Is it possible that it is some kind of DoS attack from
outside in or that your VM was compromised and is attacking other
external hosts?
Hope you get to the bottom of it!
Jason.
On October 2, 2017 4:56:54 PM Derek Atkins <derek(a)ihtfp.com> wrote:
> Hi,
>
> I'm at my wits' end so I'm tossing this here in the hopes that SOMEONE
> will be able to help me.
>
> tl;dr: Ovirt is doing something on my network that is causing my fiber
> modem to go from 3-5ms to 300-1000+ms round trip times. I know it's
> ovirt because when I unplug ovirt from my network the issue goes away;
> when I plug it back in, the issue recurs.
>
> Long version:
>
> I've been running Ovirt 4.0.6 happily on CentOS 7.3 for several months
> on a single host machine. Indeed, the host had an uptime of 200+ days
> and was working great until approximately midnight, September 21/22
> (just over a week ago). I was on an airplane halfway across the
> Atlantic at that time, so it wasn't anything I did.
>
> My network is configured as:
>
> fiber modem <-> edgerouter <-> switch <-> everything else
>
> ovirt is living in the "everything else" area.
>
> When I sit with a laptop connected to either the everything else range
> or even directly connected to the fiber modem, I run 'mtr' and see
> network times (starting at the fiber modem) that bounce all over the
> place. When I unplug ovirt I see consistent 3-5ms times. Plug it back
> in, voom, back up to badness.
>
> I've spent several hours plugging and unplugging different devices
> trying to isolate the issue. The only "device" that has any effect is
> my ovirt box.
>
> I have tried to debug this in several ways, but really the only thing
> that seems to have helped at all is shutting down all the VMs and the
> hosted engine. Once nothing else is running (but the host itself), only
> then does the network seem to return to normal.
>
> I'm really at my wits' end on this; I have no idea what is causing this
> or what might have changed to cause the issue right at that time. I
> also can't imagine what ovirt is doing over the network that could cause
> the modem, two physical hops away, to lose its mind in this way. But my
> experimentation is definitely showing a direct correlation.
>
> Help!!
>
> -derek
>
> --
> Derek Atkins 617-623-3745
> derek(a)ihtfp.com
> www.ihtfp.com
> Computer and Internet Security Consultant
--
Derek Atkins 617-623-3745
derek(a)ihtfp.com
www.ihtfp.com
Computer and Internet Security Consultant