interfaces of 2 of the hosts. My hosts' names are swm-01 and swm-02.
Creating a small VM from a Cinder template and running it gave me a test VM.
it as "NonResponsive" soon after the VM finished. The VM did finish
migrating; however, I'm unsure if that's a good migration or not.
Thank you, Strahil.
On Sat, Aug 24, 2019 at 12:39 PM Strahil <hunter86_bg@yahoo.com> wrote:
>
> What is your bandwidth threshold for the network used for VM migration?
> Can you set a 90 Mbit/s threshold (yes, less than 100 Mbit/s) and try to migrate a small (1 GB RAM) VM?
>
> Do you see disconnects?
>
> If not, raise the threshold a little and check again.
>
> Best Regards,
> Strahil Nikolov
>
> On Aug 23, 2019 23:19, "Curtis E. Combs Jr." <ej.albany@gmail.com> wrote:
> >
> > It took a while for my servers to come back on the network this time.
> > I think it's due to ovirt continuing to try to migrate the VMs around
> > like I requested. The 3 servers' names are "swm-01, swm-02 and
> > swm-03". Eventually (about 2-3 minutes ago) they all came back online.
> >
> > So I disabled and stopped the lldpad service.
> >
> > Nope. Started some more migrations and swm-02 and swm-03 disappeared
> > again. No ping, SSH hung, same as before - almost as soon as the
> > migration started.
> >
> > If you all have any ideas what switch-level setting might be enabled,
> > let me know, because I'm stumped. I can add it to the ticket that's
> > requesting the port configurations. I've already added the port
> > numbers and switch name that I got from CDP.
> >
> > Thanks again, I really appreciate the help!
> > cecjr
> >
> >
> >
> > On Fri, Aug 23, 2019 at 3:28 PM Dominik Holler <dholler@redhat.com> wrote:
> > >
> > >
> > >
> > > On Fri, Aug 23, 2019 at 9:19 PM Dominik Holler <dholler@redhat.com> wrote:
> > >>
> > >>
> > >>
> > >> On Fri, Aug 23, 2019 at 8:03 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
> > >>>
> > >>> This little cluster isn't in production or anything like that yet.
> > >>>
> > >>> So, I went ahead and used your ethtool commands to disable pause
> > >>> frames on both interfaces of each server. I then chose a few VMs to
> > >>> migrate around at random.
> > >>>
> > >>> swm-02 and swm-03 both went out again. Unreachable. Can't ping, can't
> > >>> ssh, and the SSH session that I had open was unresponsive.
> > >>>
> > >>> Any other ideas?
> > >>>
> > >>
> > >> Sorry, no. It looks like two different NICs with different drivers and firmware go down together.
> > >> This is a strong indication that the root cause is related to the switch.
> > >> Maybe you can get some information about the switch config by
> > >> 'lldptool get-tlv -n -i em1'
> > >>
> > >
> > > Another guess:
> > > After the optional 'lldptool get-tlv -n -i em1'
> > > 'systemctl stop lldpad'
> > > another try to migrate.
> > >
> > >
> > >>
> > >>
> > >>>
> > >>> On Fri, Aug 23, 2019 at 1:50 PM Dominik Holler <dholler@redhat.com> wrote:
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Fri, Aug 23, 2019 at 6:45 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
> > >>> >>
> > >>> >> Unfortunately, I can't check on the switch. Trust me, I've tried.
> > >>> >> These servers are in a Co-Lo and I've put 5 tickets in asking about
> > >>> >> the port configuration. They just get ignored - but that's par for the
> > >>> >> course for IT here. Only about 2 out of 10 of our tickets get any
> > >>> >> response and usually the response doesn't help. Then the system they
> > >>> >> use auto-closes the ticket. That was why I was suspecting STP before.
> > >>> >>
> > >>> >> I can do ethtool. I do have root on these servers, though. Are you
> > >>> >> trying to get me to turn off link-speed auto-negotiation? Would you
> > >>> >> like me to try that?
> > >>> >>
> > >>> >
> > >>> > It is just a suspicion that the cause is pause frames.
> > >>> > Let's start on a NIC which is not used for ovirtmgmt, I guess em1.
> > >>> > Does 'ethtool -S em1 | grep pause' show something?
> > >>> > Does 'ethtool em1 | grep pause' indicate support for pause?
> > >>> > The current config is shown by 'ethtool -a em1'.
> > >>> > '-A autoneg' "Specifies whether pause autonegotiation should be enabled." according to ethtool doc.
> > >>> > Assuming flow control is enabled by default, I would try to disable it via
> > >>> > 'ethtool -A em1 autoneg off rx off tx off'
> > >>> > and check if it is applied via
> > >>> > 'ethtool -a em1'
> > >>> > and check if the behavior under load changes.
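
The verification step above ("check if it is applied via 'ethtool -a em1'") can be sketched as a small shell helper. This is only a minimal sketch, assuming the usual 'ethtool -a' output layout with "Autonegotiate:", "RX:" and "TX:" lines; the function name pause_summary is hypothetical and does not come from the thread or from ethtool itself.

```shell
#!/bin/sh
# pause_summary (hypothetical helper): reads `ethtool -a <iface>` output on
# stdin and reports whether flow control looks fully disabled.
# Assumption: the output contains "Autonegotiate:", "RX:" and "TX:" lines.
pause_summary() {
    awk -F: '
        /^(Autonegotiate|RX|TX):/ {
            gsub(/[ \t]/, "", $2)        # strip the padding before on/off
            if ($2 != "off") on = 1      # any setting still enabled
            seen++
        }
        END {
            if (seen < 3)  print "pause: unknown"   # unexpected layout
            else if (on)   print "pause: on"
            else           print "pause: off"
        }'
}

# Usage on a host (root is needed for -A; em1 is the NIC named in the thread):
#   ethtool -A em1 autoneg off rx off tx off
#   ethtool -a em1 | pause_summary
```

If the helper prints "pause: on" after the 'ethtool -A' call, the driver or NIC probably did not accept the change, which would be worth knowing before retrying a migration.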
> > >>> >
> > >>> >
> > >>> >
> > >>> >>
> > >>> >> On Fri, Aug 23, 2019 at 12:24 PM Dominik Holler <dholler@redhat.com> wrote:
> > >>> >> >
> > >>> >> >
> > >>> >> >
> > >>> >> > On Fri, Aug 23, 2019 at 5:49 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
> > >>> >> >>
> > >>> >> >> Sure! Right now, I only have a 5