What is your bandwidth threshold for the network used for VM migration?
Can you set a 90 Mbit/s threshold (yes, less than 100 Mbit/s) and try to migrate a small
(1 GB RAM) VM?
Do you see disconnects?
If not, raise the threshold a little and check again.
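
To make "do you see disconnects" easy to answer, something like the following could run on a machine outside the cluster while the test migration is in progress. It is only a rough sketch: the function names, PROBE_CMD and INTERVAL are made up for illustration, and it assumes the Linux iputils ping (-W is the reply timeout in seconds).

```shell
#!/bin/sh
# Hypothetical helper: probe a host once, return 0 if it answers.
# PROBE_CMD may be overridden, e.g. for a dry run without a network.
probe_host() {
    ${PROBE_CMD:-ping -c 1 -W 1} "$1" >/dev/null 2>&1
}

# watch_hosts ROUNDS HOST...: probe every host ROUNDS times and print a
# timestamped line for each probe that gets no answer.
watch_hosts() {
    rounds=$1
    shift
    i=0
    while [ "$i" -lt "$rounds" ]; do
        for host in "$@"; do
            if ! probe_host "$host"; then
                echo "$(date '+%H:%M:%S') UNREACHABLE $host"
            fi
        done
        i=$((i + 1))
        # Pause between rounds, but not after the last one.
        if [ "$i" -lt "$rounds" ]; then
            sleep "${INTERVAL:-1}"
        fi
    done
}
```

For example, 'watch_hosts 300 swm-01 swm-02 swm-03' started just before the migration gives roughly five minutes of coverage. No output means every probe was answered.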
Best Regards,
Strahil Nikolov

On Aug 23, 2019 23:19, "Curtis E. Combs Jr." <ej.albany(a)gmail.com> wrote:
>
> It took a while for my servers to come back on the network this time.
> I think it's due to ovirt continuing to try to migrate the VMs around
> like I requested. The 3 servers' names are "swm-01, swm-02 and
> swm-03". Eventually (about 2-3 minutes ago) they all came back online.
>
> So I disabled and stopped the lldpad service.
>
> Nope. Started some more migrations and swm-02 and swm-03 disappeared
> again. No ping, SSH hung, same as before - almost as soon as the
> migration started.
>
> If you all have any ideas what switch-level setting might be enabled,
> let me know, cause I'm stumped. I can add it to the ticket that's
> requesting the port configurations. I've already added the port
> numbers and switch name that I got from CDP.
>
> Thanks again, I really appreciate the help!
> cecjr
>
>
>
> On Fri, Aug 23, 2019 at 3:28 PM Dominik Holler <dholler(a)redhat.com> wrote:
> >
> >
> >
> > On Fri, Aug 23, 2019 at 9:19 PM Dominik Holler <dholler(a)redhat.com> wrote:
> >>
> >>
> >>
> >> On Fri, Aug 23, 2019 at 8:03 PM Curtis E. Combs Jr. <ej.albany(a)gmail.com> wrote:
> >>>
> >>> This little cluster isn't in production or anything like that yet.
> >>>
> >>> So, I went ahead and used your ethtool commands to disable pause
> >>> frames on both interfaces of each server. I then chose a few VMs to
> >>> migrate around at random.
> >>>
> >>> swm-02 and swm-03 both went out again. Unreachable. Can't ping, can't
> >>> ssh, and the SSH session that I had open was unresponsive.
> >>>
> >>> Any other ideas?
> >>>
> >>
> >> Sorry, no. It looks like two different NICs with different drivers and firmware go down together.
> >> This is a strong indication that the root cause is related to the switch.
> >> Maybe you can get some information about the switch config by
> >> 'lldptool get-tlv -n -i em1'
> >>
> >
> > Another guess:
> > After the optional 'lldptool get-tlv -n -i em1', run
> > 'systemctl stop lldpad'
> > and try to migrate again.
> >
> >
> >>
> >>
> >>>
> >>> On Fri, Aug 23, 2019 at 1:50 PM Dominik Holler <dholler(a)redhat.com> wrote:
> >>> >
> >>> >
> >>> >
> >>> > On Fri, Aug 23, 2019 at 6:45 PM Curtis E. Combs Jr. <ej.albany(a)gmail.com> wrote:
> >>> >>
> >>> >> Unfortunately, I can't check on the switch. Trust me, I've tried.
> >>> >> These servers are in a Co-Lo and I've put 5 tickets in asking about
> >>> >> the port configuration. They just get ignored - but that's par for the
> >>> >> course for IT here. Only about 2 out of 10 of our tickets get any
> >>> >> response and usually the response doesn't help. Then the system they
> >>> >> use auto-closes the ticket. That was why I was suspecting STP before.
> >>> >>
> >>> >> I can do ethtool. I do have root on these servers, though. Are you
> >>> >> trying to get me to turn off link-speed auto-negotiation? Would you
> >>> >> like me to try that?
> >>> >>
> >>> >
> >>> > It is just a suspicion that the cause is pause frames.
> >>> > Let's start on a NIC which is not used for ovirtmgmt, I guess em1.
> >>> > Does 'ethtool -S em1 | grep pause' show something?
> >>> > Does 'ethtool em1 | grep pause' indicate support for pause?
> >>> > The current config is shown by 'ethtool -a em1'.
> >>> > '-A autoneg' "Specifies whether pause autonegotiation should be enabled." according to ethtool doc.
> >>> > Assuming flow control is enabled by default, I would try to disable it via
> >>> > 'ethtool -A em1 autoneg off rx off tx off'
> >>> > and check if it is applied via
> >>> > 'ethtool -a em1'
> >>> > and check if the behavior under load changes.
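
The ethtool steps quoted above could be wrapped up roughly as follows, so the same sequence can be repeated on each interface of each host. This is only a sketch: the function names and the ETHTOOL override are made up for illustration, and only the ethtool invocations already quoted in this thread are used. Note that settings applied with ethtool are runtime settings and are not expected to survive a reboot.

```shell
#!/bin/sh
# ETHTOOL can be overridden for a dry run (an assumption of this sketch).
ETHTOOL=${ETHTOOL:-ethtool}

# show_pause NIC: print pause-related counters and the current pause config.
show_pause() {
    "$ETHTOOL" -S "$1" | grep -i pause || echo "$1: no pause counters reported"
    "$ETHTOOL" -a "$1"
}

# disable_pause NIC...: switch off pause autonegotiation and rx/tx pause on
# each interface, printing the configuration before and after the change.
disable_pause() {
    for nic in "$@"; do
        echo "== $nic: before =="
        show_pause "$nic"
        # ethtool may exit non-zero (for example if the driver refuses a
        # change), so report it and carry on with the next interface.
        "$ETHTOOL" -A "$nic" autoneg off rx off tx off ||
            echo "$nic: ethtool -A returned an error" >&2
        echo "== $nic: after =="
        "$ETHTOOL" -a "$nic"
    done
}
```

Run as root, for example 'disable_pause em1', then compare the "before" and "after" output and repeat the migration under load.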
> >>> >
> >>> >
> >>> >
> >>> >>
> >>> >> On Fri, Aug 23, 2019 at 12:24 PM Dominik Holler <dholler(a)redhat.com> wrote:
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > On Fri, Aug 23, 2019 at 5:49 PM Curtis E. Combs Jr. <ej.albany(a)gmail.com> wrote:
> >>> >> >>
> >>> >> >> Sure! Right now, I only have a 5