[ovirt-users] Re: Need to enable STP on ovirt bridges

24 Aug 2019

What is your bandwidth threshold for the network used for VM migration?
Can you set a 90 Mbit/s threshold (yes, less than 100 Mbit/s) and try to migrate a small (1 GB RAM) VM?

Do you see disconnects?

If not, raise the threshold a little and check again.
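To put the 90 Mbit/s suggestion in perspective, here is a back-of-the-envelope Python sketch (an illustration, not an oVirt API) that estimates how long a single pass over a VM's RAM takes at a given threshold, and lists thresholds to step through. It assumes "1 GB" means 1 GiB and ignores dirty-page re-copying and protocol overhead. The 100 Mbit/s step and 1000 Mbit/s link speed are hypothetical values, since the actual link speed isn't stated in the thread.

```python
# Rough single-pass estimate for live-migrating a VM's RAM under a bandwidth cap.
# Ignores pages re-sent because they were dirtied during the copy, compression,
# and protocol overhead, so real migrations take longer.


def migration_seconds(ram_gib: float, threshold_mbit_s: float) -> float:
    """Seconds to copy ram_gib GiB once at threshold_mbit_s Mbit/s."""
    bits = ram_gib * 1024**3 * 8                  # GiB -> bits
    return bits / (threshold_mbit_s * 1_000_000)  # Mbit/s -> bit/s


def threshold_steps(start_mbit_s: int, step_mbit_s: int, link_mbit_s: int) -> list[int]:
    """Thresholds to try in order: start, start + step, ... up to the link speed.

    step_mbit_s and link_mbit_s are hypothetical; pick values for your hardware.
    """
    return list(range(start_mbit_s, link_mbit_s + 1, step_mbit_s))


if __name__ == "__main__":
    print(f"{migration_seconds(1, 90):.1f} s")  # 1 GiB VM at 90 Mbit/s
    print(threshold_steps(90, 100, 1000))       # assumes a 1 Gbit/s link
```

At 90 Mbit/s, one pass over 1 GiB takes roughly 95 seconds, so each test migration gives the network a sustained load for well over a minute.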

Best Regards,
Strahil Nikolov
On Aug 23, 2019 23:19, "Curtis E. Combs Jr." <ej.albany@gmail.com> wrote:
...
It took a while for my servers to come back on the network this time.
I think it's due to oVirt continuing to try to migrate the VMs around
like I requested. The three servers are named "swm-01", "swm-02", and
"swm-03". Eventually (about 2-3 minutes ago) they all came back online.
So I disabled and stopped the lldpad service.
Nope. Started some more migrations and swm-02 and swm-03 disappeared 
again. No ping, SSH hung, same as before - almost as soon as the 
migration started.
If you all have any ideas about what switch-level setting might be enabled,
let me know, because I'm stumped. I can add it to the ticket that's
requesting the port configurations. I've already added the port
numbers and switch name that I got from CDP.
Thanks again, I really appreciate the help! 
cecjr
On Fri, Aug 23, 2019 at 3:28 PM Dominik Holler <dholler@redhat.com> wrote:
...
On Fri, Aug 23, 2019 at 9:19 PM Dominik Holler <dholler@redhat.com> wrote:
...
On Fri, Aug 23, 2019 at 8:03 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
...
This little cluster isn't in production or anything like that yet.
So, I went ahead and used your ethtool commands to disable pause
frames on both interfaces of each server. I then chose a few VMs to
migrate around at random.
swm-02 and swm-03 both went out again. Unreachable. Can't ping, can't 
ssh, and the SSH session that I had open was unresponsive.
Any other ideas?
Sorry, no. It looks like two different NICs with different drivers and firmware go down together.
This is a strong indication that the root cause is related to the switch. 
Maybe you can get some information about the switch config by 
'lldptool get-tlv -n -i em1'
Another guess:
after the optional 'lldptool get-tlv -n -i em1',
run 'systemctl stop lldpad'
and try another migration.
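The order suggested above (optional LLDP query, stop lldpad, then migrate again) can be scripted as a small Python sketch. STEPS and run_steps are hypothetical names, not part of lldptool or oVirt. By default the helper only returns the commands (a dry run); with dry_run=False it runs them via subprocess, which needs root for the systemctl step. The migration itself is still started from the oVirt engine.

```python
import subprocess

# Hypothetical helper: the command sequence suggested above, for one interface.
STEPS = [
    ["lldptool", "get-tlv", "-n", "-i", "{iface}"],  # optional: show TLVs received from the switch
    ["systemctl", "stop", "lldpad"],                  # stop the LLDP agent (needs root)
]


def run_steps(iface: str = "em1", dry_run: bool = True) -> list[list[str]]:
    """Return the commands for iface; execute them in order unless dry_run."""
    cmds = [[arg.format(iface=iface) for arg in step] for step in STEPS]
    if not dry_run:
        for cmd in cmds:
            try:
                # check=False: a failing optional lldptool query should not stop the sequence
                result = subprocess.run(cmd, capture_output=True, text=True, check=False)
                print(result.stdout or result.stderr)
            except FileNotFoundError:
                print(f"{cmd[0]} not installed, skipping")
    return cmds


if __name__ == "__main__":
    for cmd in run_steps("em1"):
        print(" ".join(cmd))
    # After stopping lldpad, start another VM migration from the oVirt engine.
```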
...
...
On Fri, Aug 23, 2019 at 1:50 PM Dominik Holler <dholler@redhat.com> wrote:
...
On Fri, Aug 23, 2019 at 6:45 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
...
Unfortunately, I can't check on the switch. Trust me, I've tried. 
These servers are in a Co-Lo and I've put 5 tickets in asking about 
the port configuration. They just get ignored - but that's par for the
course for IT here. Only about 2 out of 10 of our tickets get any
response and usually the response doesn't help. Then the system they 
use auto-closes the ticket. That was why I was suspecting STP before.
I can run ethtool, though; I do have root on these servers. Are you
trying to get me to turn off link-speed auto-negotiation? Would you 
like me to try that?
It is just a suspicion that pause frames are the cause.
Let's start with a NIC that is not used for ovirtmgmt, I guess em1.
Does 'ethtool -S em1 | grep pause' show something?
Does 'ethtool em1 | grep pause' indicate support for pause?
The current config is shown by 'ethtool -a em1'. 
'-A autoneg' "Specifies whether pause autonegotiation should be enabled." according to ethtool doc. 
Assuming flow control is enabled by default, I would try to disable it via
'ethtool -A em1 autoneg off rx off tx off' 
and check if it is applied via 
'ethtool -a em1' 
and check if the behavior under load changes.
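Checking pause-frame state by eye on several hosts gets tedious, so here is a minimal Python sketch that parses the text output of 'ethtool -a em1' and 'ethtool -S em1'. It assumes the usual "Autonegotiate:", "RX:", "TX:" layout of 'ethtool -a'. Statistics counter names are driver-specific, so the sketch simply keeps any counter whose name contains "pause". The function names are hypothetical helpers, not part of ethtool.

```python
import re


def parse_pause_params(text: str) -> dict[str, bool]:
    """Parse `ethtool -a IFACE` output into {'autonegotiate': ..., 'rx': ..., 'tx': ...}."""
    params = {}
    for line in text.splitlines():
        # Exact "RX:"/"TX:" match, so "RX negotiated:" lines are ignored.
        m = re.match(r"^\s*(Autonegotiate|RX|TX):\s*(on|off)\s*$", line)
        if m:
            params[m.group(1).lower()] = m.group(2) == "on"
    return params


def pause_counters(text: str) -> dict[str, int]:
    """Pick pause-related counters out of `ethtool -S IFACE` output.

    Counter names differ between drivers, so any name containing 'pause' is kept.
    """
    counters = {}
    for line in text.splitlines():
        m = re.match(r"^\s*([\w.]*pause[\w.]*):\s*(\d+)\s*$", line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters


def flow_control_disabled(text: str) -> bool:
    """True if `ethtool -a` output shows autoneg, RX and TX pause all off."""
    p = parse_pause_params(text)
    return p.get("autonegotiate") is False and p.get("rx") is False and p.get("tx") is False
```

Feed it captured output, for example subprocess.run(["ethtool", "-a", "em1"], capture_output=True, text=True).stdout, before and after the 'ethtool -A' change. Pause counters that keep rising during a migration would support the pause-frame theory.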
...
On Fri, Aug 23, 2019 at 12:24 PM Dominik Holler <dholler@redhat.com> wrote: 
> 
> 
> 
> On Fri, Aug 23, 2019 at 5:49 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote: 
>> 
>> Sure! Right now, I only have a 5