Ovirt causing strange network issues?

Hi,

I'm at my wits' end so I'm tossing this here in the hopes that SOMEONE will be able to help me.

tl;dr: Ovirt is doing something on my network that is causing my fiber modem to go from 3-5ms to 300-1000+ms round trip times. I know it's ovirt because when I unplug ovirt from my network the issue goes away; when I plug it back in, the issue recurs.

Long version:

I've been running Ovirt 4.0.6 happily on CentOS 7.3 for several months on a single host machine. Indeed, the host had an uptime of 200+ days and was working great until approximately midnight, September 21/22 (just over a week ago). I was on an airplane halfway across the Atlantic at that time, so it wasn't anything I did.

My network is configured as:

fiber modem <-> edgerouter <-> switch <-> everything else

ovirt is living in the "everything else" area.

When I sit with a laptop connected to either the everything else range or even directly connected to the fiber modem, I run 'mtr' and see network times (starting at the fiber modem) that bounce all over the place. When I unplug ovirt I see consistent 3-5ms times. Plug it back in, voom, back up to badness.

I've spent several hours plugging and unplugging different devices trying to isolate the issue. The only "device" that has any effect is my ovirt box.

I have tried to debug this in several ways, but really the only thing that seems to have helped at all is shutting down all the VMs and the hosted engine. Once nothing else is running (but the host itself), only then does the network seem to return to normal.

I'm really at my wits' end on this; I have no idea what is causing this or what might have changed to cause the issue right at that time. I also can't imagine what ovirt is doing over the network that could cause the modem, two physical hops away, to lose its mind in this way. But my experimentation is definitely showing a direct correlation.

Help!!
-derek

--
Derek Atkins 617-623-3745
derek@ihtfp.com www.ihtfp.com
Computer and Internet Security Consultant

Hi

We saw something very similar to this a couple of years ago. In our case, it was caused by STP being enabled on our hypervisors.

HTH

On 3 Oct. 2017 04:56, "Derek Atkins" <derek@ihtfp.com> wrote:
tl;dr: Ovirt is doing something on my network that is causing my fiber modem to go from 3-5ms to 300-1000+ms round trip times. I know it's ovirt because when I unplug ovirt from my network the issue goes away; when I plug it back in, the issue recurs.

I'm sorry. What is STP?
And how do I turn that off?

-derek
Sent using my mobile device. Please excuse any typos.

On October 2, 2017 7:41:15 PM Colin Coe <colin.coe@gmail.com> wrote:
Hi
We saw something very similar to this a couple of years ago. In our case, it was caused by STP being enabled on our hypervisors.
HTH

Do you mean spanning tree protocol?
I'm not sure how that could cross a router boundary, but it is something to look into..

-derek

On October 3, 2017 7:12:00 AM Derek Atkins <derek@ihtfp.com> wrote:
I'm sorry. What is STP? And how do I turn that off?

Spanning Tree Protocol.

Make sure the /etc/sysconfig/network-scripts/ifcfg-eth0 (or whatever) does not have an STP=yes line.

CC

On 3 Oct. 2017 19:11, "Derek Atkins" <derek@ihtfp.com> wrote:
I'm sorry. What is STP? And how do I turn that off?
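
The check Colin describes can be scripted when a host has several interface files. The following is only an illustrative sketch, not something posted in the thread: it assumes the CentOS 7 layout he mentions (`/etc/sysconfig/network-scripts/ifcfg-*`), and the helper names `stp_setting` and `scan_ifcfg` are hypothetical.

```python
import glob
import os
import re

# Matches an uncommented STP= line; the value may be quoted.
STP_LINE = re.compile(r'^\s*STP\s*=\s*["\']?(\w+)["\']?\s*$', re.IGNORECASE)


def stp_setting(ifcfg_text):
    """Return the value of the last STP= line in an ifcfg file's text, or None."""
    value = None
    for line in ifcfg_text.splitlines():
        match = STP_LINE.match(line)
        if match:
            value = match.group(1).lower()
    return value


def scan_ifcfg(directory="/etc/sysconfig/network-scripts"):
    """Map each ifcfg-* file name in `directory` to its STP setting (or None)."""
    results = {}
    for path in sorted(glob.glob(os.path.join(directory, "ifcfg-*"))):
        with open(path, encoding="utf-8", errors="replace") as handle:
            results[os.path.basename(path)] = stp_setting(handle.read())
    return results
```

Calling `scan_ifcfg()` on the host returns a dictionary such as `{"ifcfg-eth0": None, ...}`, where `None` means the file has no STP line at all. This only reports what is written in the files; it says nothing about whether the running bridge has picked the setting up.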

On the host or in the guests?

-derek

On October 3, 2017 7:15:35 AM Colin Coe <colin.coe@gmail.com> wrote:
Spanning Tree Protocol.
Make sure the /etc/sysconfig/network-scripts/ifcfg-eth0 (or whatever) does not have an STP=yes line.
CC

A quick check of the host shows STP=off in ifcfg-ovirtmgmt. I see nothing about STP elsewhere in the configuration on the host.

-derek

On October 3, 2017 7:15:35 AM Colin Coe <colin.coe@gmail.com> wrote:
Spanning Tree Protocol.
Make sure the /etc/sysconfig/network-scripts/ifcfg-eth0 (or whatever) does not have an STP=yes line.
CC
On 3 Oct. 2017 19:11, "Derek Atkins" <derek@ihtfp.com> wrote:
I'm sorry. What is STP? And how do I turn that off?
-derek Sent using my mobile device. Please excuse any typos.
On October 2, 2017 7:41:15 PM Colin Coe <colin.coe@gmail.com> wrote:
Hi
We saw something very similar to this a couple of years ago. In our case, it was caused by STP being enabled on our hypervisors.
HTH

Derek,

Have you used tcpdump to check what network traffic is coming out of your box? Is it possible that it is some kind of DoS attack from outside in, or that your VM was compromised and is attacking other external hosts?

Hope you get to the bottom of it!

Jason.

Hi,

Yes, I have (well, wireshark, but effectively the same thing). Nothing is standing out. I'm trying to visually coordinate the wireshark traces with my mtr run to try to see "what's going on when my RTTs skyrocket". Honestly the only correlation I'm seeing is that it's when the ovirt host is checking the ovirt engine health (and I get a bunch of TCP out of order messages).

I've already ruled out overflow of my Arris modem NAT/forwarding table. I've already ruled out Ethernet Pause Frames.

I don't understand how something inside my network can affect the Arris in such a profound way across both the switch and router.

-derek
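The by-eye comparison of the mtr run against the capture could also be done mechanically. A rough sketch in Python, assuming the output of both tools has already been reduced to timestamped records. The function names, the 100 ms threshold, the 1-second window, and the sample data are all hypothetical:

```python
from bisect import bisect_left, bisect_right


def rtt_spikes(samples, threshold_ms=100.0):
    """Return the timestamps of RTT samples above threshold_ms.

    samples: iterable of (unix_time_seconds, rtt_ms) pairs.
    """
    return [t for t, rtt in samples if rtt > threshold_ms]


def packets_near(spike_times, packet_times, window_s=1.0):
    """Map each spike time to the packet timestamps within +/- window_s.

    packet_times must be sorted in ascending order.
    """
    result = {}
    for t in spike_times:
        lo = bisect_left(packet_times, t - window_s)
        hi = bisect_right(packet_times, t + window_s)
        result[t] = packet_times[lo:hi]
    return result


# Made-up data for illustration only.
pings = [(10.0, 4.1), (11.0, 3.8), (12.0, 640.0), (13.0, 5.0)]
capture = [9.2, 11.4, 11.9, 12.3, 14.8]

for when, nearby in packets_near(rtt_spikes(pings), capture).items():
    print(when, nearby)  # 12.0 [11.4, 11.9, 12.3]
```

In practice the packet list would carry more than a timestamp (addresses, ports, flags), so that whatever keeps showing up near the spikes can be identified.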

Hi,

I've done a lot more testing today. I've narrowed the issues down to two specific VMs. When I'm running either of these two VMs I get horrific network performance. When both of those two are stopped, my network is just fine (like 99% of the time).

I've been spending the day gathering packet dumps. I'm running wireshark on my host listening to the ovirtmgmt bridge (which is my only network). So, that SHOULD be capturing everything, right?

I have not noticed anything out of the ordinary except for one odd thing -- correlated with my network wonkiness, wireshark reports a bunch of duplicate or out-of-order TCP packets! I'll just note that correlation does not imply causation, but I'm not seeing anything else out of the ordinary. I certainly don't see anything that would imply I've been hacked.

Is there something with CentOS/ovirt-host/vdsm networking that could cause this?

Or could it be a router issue? Specifically my host and my hosted-engine are on separate logical networks (different /24s) but both networks are on the same physical wire; my router, an ERPro8, uses a single interface with both /24s assigned and routes between them. But some of the duplicate/out-of-order was for the periodic host <-> engine health checks. Still, I'm not sure why it's these two specific VMs that are causing my issues, other than that they have the most network traffic coming/going.

If it IS a router problem (the router is relatively new, and also updated with the latest firmware), I'm honestly not sure how to properly test that.

Any more ideas where I can look, or what I can/should be looking for? I'm extremely comfortable with internet technologies (25+ years experience) but this has got me stumped!

Thanks,

-derek
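To make the two-/24s-on-one-wire layout concrete: a packet between addresses in different subnets is handed to the router even when both machines sit on the same segment. A small sketch using Python's standard `ipaddress` module. The addresses and the helper name `needs_router` are invented, since the thread does not give the real ones:

```python
import ipaddress


def needs_router(src, dst, prefix_len=24):
    """Return True if dst lies outside src's subnet of the given prefix length."""
    src_net = ipaddress.ip_network(f"{src}/{prefix_len}", strict=False)
    return ipaddress.ip_address(dst) not in src_net


# Hypothetical addresses for the host and the hosted engine.
host_ip = "192.168.1.10"
engine_ip = "192.168.2.10"

print(needs_router(host_ip, engine_ip))       # True: different /24s
print(needs_router(host_ip, "192.168.1.20"))  # False: same /24
```

With a layout like the one described, each host <-> engine health check would go out to the router and come back over the same physical link instead of staying local to the bridge.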
participants (3)
- Colin Coe
- Derek Atkins
- Jason Keltz