Sorry to dead bump this, but I'm beginning to suspect that maybe it's
not STP that's the problem.
Two of my hosts just went down when a few VMs tried to migrate.
Do any of you have any idea what might be going on here? I don't even
know where to start. I've included the dmesg output below in case it helps.
This happens on both hosts whenever a migration starts.
[68099.245833] bnx2 0000:01:00.0 em1: NIC Copper Link is Down
[68099.246055] internal: port 1(em1) entered disabled state
[68184.177343] ixgbe 0000:03:00.0 p1p1: NIC Link is Down
[68184.177789] ovirtmgmt: port 1(p1p1) entered disabled state
[68184.177856] ovirtmgmt: topology change detected, propagating
[68277.078671] INFO: task qemu-kvm:8888 blocked for more than 120 seconds.
[68277.078700] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[68277.078723] qemu-kvm D ffff9db40c359040 0 8888 1 0x000001a0
[68277.078727] Call Trace:
[68277.078738] [<ffffffff978fd2ac>] ? avc_has_perm_flags+0xdc/0x1c0
[68277.078743] [<ffffffff97d69f19>] schedule+0x29/0x70
[68277.078746] [<ffffffff9785f3d9>] inode_dio_wait+0xd9/0x100
[68277.078751] [<ffffffff976c4010>] ? wake_bit_function+0x40/0x40
[68277.078765] [<ffffffffc09d6dd6>] nfs_getattr+0x1b6/0x250 [nfs]
[68277.078768] [<ffffffff97848109>] vfs_getattr+0x49/0x80
[68277.078769] [<ffffffff97848185>] vfs_fstat+0x45/0x80
[68277.078771] [<ffffffff978486f4>] SYSC_newfstat+0x24/0x60
[68277.078774] [<ffffffff97d76d21>] ? system_call_after_swapgs+0xae/0x146
[68277.078778] [<ffffffff97739f34>] ? __audit_syscall_entry+0xb4/0x110
[68277.078782] [<ffffffff9763aaeb>] ? syscall_trace_enter+0x16b/0x220
[68277.078784] [<ffffffff97848ace>] SyS_newfstat+0xe/0x10
[68277.078786] [<ffffffff97d7706b>] tracesys+0xa3/0xc9
[68397.072384] INFO: task qemu-kvm:8888 blocked for more than 120 seconds.
[68397.072413] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[68397.072436] qemu-kvm D ffff9db40c359040 0 8888 1 0x000001a0
[68397.072439] Call Trace:
[68397.072453] [<ffffffff978fd2ac>] ? avc_has_perm_flags+0xdc/0x1c0
[68397.072458] [<ffffffff97d69f19>] schedule+0x29/0x70
[68397.072462] [<ffffffff9785f3d9>] inode_dio_wait+0xd9/0x100
[68397.072467] [<ffffffff976c4010>] ? wake_bit_function+0x40/0x40
[68397.072480] [<ffffffffc09d6dd6>] nfs_getattr+0x1b6/0x250 [nfs]
[68397.072485] [<ffffffff97848109>] vfs_getattr+0x49/0x80
[68397.072486] [<ffffffff97848185>] vfs_fstat+0x45/0x80
[68397.072488] [<ffffffff978486f4>] SYSC_newfstat+0x24/0x60
[68397.072491] [<ffffffff97d76d21>] ? system_call_after_swapgs+0xae/0x146
[68397.072495] [<ffffffff97739f34>] ? __audit_syscall_entry+0xb4/0x110
[68397.072498] [<ffffffff9763aaeb>] ? syscall_trace_enter+0x16b/0x220
[68397.072500] [<ffffffff97848ace>] SyS_newfstat+0xe/0x10
[68397.072502] [<ffffffff97d7706b>] tracesys+0xa3/0xc9
[68401.573141] bnx2 0000:01:00.0 em1: NIC Copper Link is Up, 1000 Mbps full duplex
[68401.573247] internal: port 1(em1) entered blocking state
[68401.573255] internal: port 1(em1) entered listening state
[68403.576985] internal: port 1(em1) entered learning state
[68405.580907] internal: port 1(em1) entered forwarding state
[68405.580916] internal: topology change detected, propagating
[68469.565589] nfs: server swm-01.hpc.moffitt.org not responding, timed out
[68469.565840] nfs: server swm-01.hpc.moffitt.org not responding, timed out
[68487.193932] ixgbe 0000:03:00.0 p1p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[68487.194105] ovirtmgmt: port 1(p1p1) entered blocking state
[68487.194114] ovirtmgmt: port 1(p1p1) entered listening state
[68489.196508] ovirtmgmt: port 1(p1p1) entered learning state
[68491.200400] ovirtmgmt: port 1(p1p1) entered forwarding state
[68491.200405] ovirtmgmt: topology change detected, sending tcn bpdu
[68493.672423] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[68494.777996] NFSD: client 10.15.28.22 testing state ID with incorrect client ID
[68494.778580] NFSD: client 10.15.28.22 testing state ID with incorrect client ID
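Both NIC outages in this log last about five minutes (em1 from 68099.2 s to 68401.6 s, p1p1 from 68184.2 s to 68487.2 s), and the hung qemu-kvm task reports (68277, 68397) and the NFS timeouts (68469) all fall inside the p1p1 window. A minimal sketch for pulling such Down/Up pairs out of a longer dmesg dump follows. The regular expression is an assumption: it only covers the bnx2/ixgbe wording seen here and the default `[seconds.microseconds]` prefix (not `dmesg -T`), and `link_outages` is a hypothetical helper name.

```python
import re
import sys

# Assumed format, taken from the messages above, e.g.
#   "[68099.245833] bnx2 0000:01:00.0 em1: NIC Copper Link is Down"
#   "[68487.193932] ixgbe 0000:03:00.0 p1p1: NIC Link is Up 10 Gbps, ..."
LINK_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\S+\s+\S+\s+(?P<iface>[^:\s]+):"
    r"\s+NIC (?:Copper )?Link is (?P<state>Up|Down)"
)


def link_outages(lines):
    """Return (iface, down_ts, up_ts, seconds_down) for each Down -> Up pair."""
    down_at = {}  # iface -> timestamp of the most recent "Link is Down"
    outages = []
    for line in lines:
        m = LINK_RE.match(line)
        if not m:
            continue  # bridge, NFS and call-trace lines are ignored
        ts = float(m.group("ts"))
        iface = m.group("iface")
        if m.group("state") == "Down":
            down_at[iface] = ts
        elif iface in down_at:
            start = down_at.pop(iface)
            outages.append((iface, start, ts, round(ts - start, 3)))
    return outages


if __name__ == "__main__":
    # Usage: dmesg | python3 link_outages.py
    for iface, down, up, secs in link_outages(sys.stdin):
        print(f"{iface}: down at {down:.3f}s, up at {up:.3f}s ({secs}s)")
```

Timestamps are seconds since boot, so they are only comparable between hosts if you also record each host's boot time.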
On Thu, Aug 22, 2019 at 2:53 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
>
> Thanks, I'm just going to revert back to bridges.
>
> On Thu, Aug 22, 2019 at 11:50 AM Dominik Holler <dholler@redhat.com> wrote:
> >
> >
> >
> > On Thu, Aug 22, 2019 at 3:06 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
> >>
> >> It seems like the STP option is so common and necessary that it
> >> should take priority over the seldom-used bridge_opts. I know what STP
> >> is and I'm not even a networking guy; I've never even heard of half of
> >> the bridge_opts that have switches in the UI.
> >>
> >> Anyway, I wanted to try Open vSwitch, so I reinstalled all of my
> >> nodes and chose "openvswitch (Technology Preview)" as the engine-setup
> >> option for the first host. I made a new Cluster for my nodes, added
> >> them all to the new cluster, created a new "logical network" for the
> >> internal network and attached it to the internal network ports.
> >>
> >> Now, when I go to create a new VM, I don't have either the
> >> ovirtmgmt switch or the internal switch as an option. The drop-down is
> >> empty, as if I don't have any vNIC profiles.
> >>
> >
> > Open vSwitch clusters are limited to OVN networks.
> > You can create one as described in
> > https://www.ovirt.org/documentation/admin-guide/chap-External_Providers.html#connecting-an-ovn-network-to-a-physical-network
> >
> >
> >>
> >> On Thu, Aug 22, 2019 at 7:34 AM Tony Pearce <tonyppe@gmail.com> wrote:
> >> >
> >> > Hi Dominik, would you mind sharing why STP is available via the API only? I'm keen to know.
> >> > Thanks
> >> >
> >> >
> >> > On Thu., 22 Aug. 2019, 19:24 Dominik Holler, <dholler@redhat.com> wrote:
> >> >>
> >> >>
> >> >>
> >> >> On Thu, Aug 22, 2019 at 1:08 PM Miguel Duarte de Mora Barroso <mdbarroso@redhat.com> wrote:
> >> >>>
> >> >>> On Sat, Aug 17, 2019 at 11:27 AM <ej.albany@gmail.com> wrote:
> >> >>> >
> >> >>> > Hello. I have been trying to figure out an issue for a very long time:
> >> >>> > the Ethernet and 10Gb FC links on my cluster are disabled any time a
> >> >>> > migration occurs.
> >> >>> >
> >> >>> > I believe this is because I need to have STP turned on in order to
> >> >>> > participate with the switch. However, there does not seem to be any
> >> >>> > way to tell oVirt to stop turning it off! Very frustrating.
> >> >>> >
> >> >>> > After adding a cron job that enables STP on all bridges every
> >> >>> > minute, the migration issue disappears.
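The cron job itself isn't shown, so the following is only a guess at what such a job might do: a minimal sketch, assuming it sets the kernel bridge attribute `/sys/class/net/<bridge>/bridge/stp_state` to 1 on every Linux bridge. Open vSwitch devices such as `ovs-system` and `br-int` have no `bridge/` directory and are skipped, but `;vdsmdummy;` is a Linux bridge and would be touched too. Writing to sysfs needs root; `enable_stp_on_all_bridges` is a hypothetical name.

```python
from pathlib import Path


def enable_stp_on_all_bridges(sysfs_net="/sys/class/net"):
    """Set <iface>/bridge/stp_state to 1 for every Linux bridge under sysfs_net.

    Returns the names of the bridges whose STP state was actually changed.
    The sysfs_net argument exists so the function can be pointed at a test tree.
    """
    changed = []
    for iface in sorted(Path(sysfs_net).iterdir()):
        stp_state = iface / "bridge" / "stp_state"
        if not stp_state.is_file():
            continue  # not a Linux bridge (plain NIC, vnet tap, OVS device, ...)
        if stp_state.read_text().strip() != "1":
            stp_state.write_text("1\n")  # 1 = kernel STP enabled
            changed.append(iface.name)
    return changed
```

Because vdsm can reset the bridge the next time it applies its own configuration, a periodic job like this only papers over the setting; the per-network `stp` option discussed below is the persistent route.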
> >> >>> >
> >> >>> > Is there any way at all to keep STP on without this cron job and
> >> >>> > without having to resort to such a silly workaround?
> >> >>>
> >> >>> Vdsm exposes a per-bridge STP knob that you can use for this. By
> >> >>> default it is set to false, which is probably why you had to resort
> >> >>> to this shenanigan.
> >> >>>
> >> >>> You can, for instance:
> >> >>>
> >> >>> # show present state
> >> >>> [vagrant@vdsm ~]$ ip a
> >> >>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
> >> >>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >> >>>     inet 127.0.0.1/8 scope host lo
> >> >>>        valid_lft forever preferred_lft forever
> >> >>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
> >> >>>     link/ether 52:54:00:41:fb:37 brd ff:ff:ff:ff:ff:ff
> >> >>> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
> >> >>>     link/ether 52:54:00:83:5b:6f brd ff:ff:ff:ff:ff:ff
> >> >>>     inet 192.168.50.50/24 brd 192.168.50.255 scope global noprefixroute eth1
> >> >>>        valid_lft forever preferred_lft forever
> >> >>>     inet6 fe80::5054:ff:fe83:5b6f/64 scope link
> >> >>>        valid_lft forever preferred_lft forever
> >> >>> 19: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >> >>>     link/ether 8e:5c:2e:87:fa:0b brd ff:ff:ff:ff:ff:ff
> >> >>>
> >> >>> # show example bridge configuration - you're looking for the STP knob here.
> >> >>> [root@vdsm ~]$ cat bridged_net_with_stp
> >> >>> {
> >> >>>     "bondings": {},
> >> >>>     "networks": {
> >> >>>         "test-network": {
> >> >>>             "nic": "eth0",
> >> >>>             "switch": "legacy",
> >> >>>             "bridged": true,
> >> >>>             "stp": true
> >> >>>         }
> >> >>>     },
> >> >>>     "options": {
> >> >>>         "connectivityCheck": false
> >> >>>     }
> >> >>> }
> >> >>>
> >> >>> # issue setup networks command:
> >> >>> [root@vdsm ~]$ vdsm-client -f bridged_net_with_stp Host setupNetworks
> >> >>> {
> >> >>>     "code": 0,
> >> >>>     "message": "Done"
> >> >>> }
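If several networks need the same treatment, the request body can be generated instead of written by hand. A sketch, assuming every network is a bridged, legacy-switch network on a single NIC exactly like `test-network` above; `setup_networks_payload` and `write_payload` are hypothetical helpers, and the resulting file is passed to `vdsm-client -f ... Host setupNetworks` as in the example. On an engine-managed host, changes made directly through vdsm may be reported by the engine as out of sync, so treat this as a test-bench approach.

```python
import json


def setup_networks_payload(networks, connectivity_check=False):
    """Build a 'Host setupNetworks' body mirroring the example above.

    networks maps network name -> NIC name; every entry becomes a bridged,
    legacy-switch network with STP turned on. No IP configuration is set.
    """
    return {
        "bondings": {},
        "networks": {
            name: {"nic": nic, "switch": "legacy", "bridged": True, "stp": True}
            for name, nic in networks.items()
        },
        "options": {"connectivityCheck": connectivity_check},
    }


def write_payload(path, networks):
    """Write the payload as indented JSON, ready for `vdsm-client -f path`."""
    with open(path, "w") as f:
        json.dump(setup_networks_payload(networks), f, indent=4)
```

For example, `write_payload("bridged_net_with_stp", {"test-network": "eth0"})` produces the same configuration as the file shown above.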
> >> >>>
> >> >>> # show bridges
> >> >>> [root@vdsm ~]$ brctl show
> >> >>> bridge name     bridge id               STP enabled     interfaces
> >> >>> ;vdsmdummy;     8000.000000000000       no
> >> >>> test-network    8000.52540041fb37       yes             eth0
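To check a host quickly for bridges that still have STP off, the `brctl show` table can be parsed. A minimal sketch, assuming the layout shown above: the header row comes first, the third column is `STP enabled`, and rows starting with whitespace only list additional ports of the previous bridge. `bridges_without_stp` is a hypothetical name.

```python
def bridges_without_stp(brctl_output):
    """Return the bridge names whose 'STP enabled' column is 'no'."""
    missing = []
    for line in brctl_output.splitlines()[1:]:  # skip the header row
        if not line.strip() or line[0].isspace():
            continue  # blank line, or continuation row with extra ports
        fields = line.split()
        # fields: bridge name, bridge id, STP enabled, [first interface]
        if len(fields) >= 3 and fields[2] == "no":
            missing.append(fields[0])
    return missing
```

Feeding it the output from this thread would flag only `;vdsmdummy;`, since `test-network` now reports `yes`.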
> >> >>>
> >> >>> # show final state
> >> >>> [root@vdsm ~]$ ip a
> >> >>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
> >> >>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >> >>>     inet 127.0.0.1/8 scope host lo
> >> >>>        valid_lft forever preferred_lft forever
> >> >>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master test-network state UP group default qlen 1000
> >> >>>     link/ether 52:54:00:41:fb:37 brd ff:ff:ff:ff:ff:ff
> >> >>> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
> >> >>>     link/ether 52:54:00:83:5b:6f brd ff:ff:ff:ff:ff:ff
> >> >>>     inet 192.168.50.50/24 brd 192.168.50.255 scope global noprefixroute eth1
> >> >>>        valid_lft forever preferred_lft forever
> >> >>>     inet6 fe80::5054:ff:fe83:5b6f/64 scope link
> >> >>>        valid_lft forever preferred_lft forever
> >> >>> 19: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >> >>>     link/ether 8e:5c:2e:87:fa:0b brd ff:ff:ff:ff:ff:ff
> >> >>> 432: test-network: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> >> >>>     link/ether 52:54:00:41:fb:37 brd ff:ff:ff:ff:ff:ff
> >> >>>
> >> >>> I don't think this STP parameter is exposed via the engine UI; @Dominik
> >> >>> Holler, could you confirm? What are our plans for it?
> >> >>>
> >> >>
> >> >> STP is only available via REST-API, see
> >> >> http://ovirt.github.io/ovirt-engine-api-model/4.3/#types/network
> >> >> please find an example how to enable STP in
> >> >> https://gist.github.com/dominikholler/4e70c9ef9929d93b6807f56d43a70b95
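For readers who want to see the shape of the REST call without the SDK, here is a sketch that builds (but does not send) a PUT against a logical network resource with `stp` set to true, following the `network` type in the API model linked above. The engine FQDN, network id and credentials are placeholders, HTTP basic authentication with a `user@domain` login is assumed, and `build_enable_stp_request` is a hypothetical helper; the gist remains the authoritative example.

```python
import base64
import urllib.request


def build_enable_stp_request(engine_fqdn, network_id, user, password):
    """Build (but do not send) a PUT that sets stp=true on one logical network.

    network_id is the engine's id for the network, which can be looked up
    in the networks collection (GET /ovirt-engine/api/networks).
    """
    url = f"https://{engine_fqdn}/ovirt-engine/api/networks/{network_id}"
    body = b"<network><stp>true</stp></network>"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/xml",
            "Accept": "application/xml",
            "Authorization": f"Basic {token}",
        },
    )
```

Sending it would look roughly like `urllib.request.urlopen(req, context=ssl.create_default_context(cafile="engine-ca.pem"))`, with the engine's CA certificate saved locally under whatever name you choose.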
> >> >>
> >> >> We have no plans to add STP to the web ui,
> >> >> but new feature requests are always welcome on
> >> >> https://bugzilla.redhat.com/enter_bug.cgi?product=ovirt-engine
> >> >>
> >> >>
> >> >>>
> >> >>> >
> >> >>> > Here are some details about my systems, if you need them.
> >> >>> >
> >> >>> > SELinux is disabled.
> >> >>> >
> >> >>> > [root@swm-02 ~]# rpm -qa | grep ovirt
> >> >>> > ovirt-imageio-common-1.5.1-0.el7.x86_64
> >> >>> > ovirt-release43-4.3.5.2-1.el7.noarch
> >> >>> > ovirt-imageio-daemon-1.5.1-0.el7.noarch
> >> >>> > ovirt-vmconsole-host-1.0.7-2.el7.noarch
> >> >>> > ovirt-hosted-engine-setup-2.3.11-1.el7.noarch
> >> >>> > ovirt-ansible-hosted-engine-setup-1.0.26-1.el7.noarch
> >> >>> > python2-ovirt-host-deploy-1.8.0-1.el7.noarch
> >> >>> > ovirt-ansible-engine-setup-1.1.9-1.el7.noarch
> >> >>> > python2-ovirt-setup-lib-1.2.0-1.el7.noarch
> >> >>> > cockpit-machines-ovirt-195.1-1.el7.noarch
> >> >>> > ovirt-hosted-engine-ha-2.3.3-1.el7.noarch
> >> >>> > ovirt-vmconsole-1.0.7-2.el7.noarch
> >> >>> > cockpit-ovirt-dashboard-0.13.5-1.el7.noarch
> >> >>> > ovirt-provider-ovn-driver-1.2.22-1.el7.noarch
> >> >>> > ovirt-host-deploy-common-1.8.0-1.el7.noarch
> >> >>> > ovirt-host-4.3.4-1.el7.x86_64
> >> >>> > python-ovirt-engine-sdk4-4.3.2-2.el7.x86_64
> >> >>> > ovirt-host-dependencies-4.3.4-1.el7.x86_64
> >> >>> > ovirt-ansible-repositories-1.1.5-1.el7.noarch
> >> >>> > [root@swm-02 ~]# cat /etc/redhat-release
> >> >>> > CentOS Linux release 7.6.1810 (Core)
> >> >>> > [root@swm-02 ~]# uname -r
> >> >>> > 3.10.0-957.27.2.el7.x86_64
> >> >>> > [root@swm-02 ~]# ip a
> >> >>> > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
> >> >>> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >> >>> >     inet 127.0.0.1/8 scope host lo
> >> >>> >        valid_lft forever preferred_lft forever
> >> >>> > 2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master test state UP group default qlen 1000
> >> >>> >     link/ether d4:ae:52:8d:50:48 brd ff:ff:ff:ff:ff:ff
> >> >>> > 3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >> >>> >     link/ether d4:ae:52:8d:50:49 brd ff:ff:ff:ff:ff:ff
> >> >>> > 4: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
> >> >>> >     link/ether 90:e2:ba:1e:14:80 brd ff:ff:ff:ff:ff:ff
> >> >>> > 5: p1p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >> >>> >     link/ether 90:e2:ba:1e:14:81 brd ff:ff:ff:ff:ff:ff
> >> >>> > 6: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >> >>> >     link/ether a2:b8:d6:e8:b3:d8 brd ff:ff:ff:ff:ff:ff
> >> >>> > 7: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >> >>> >     link/ether 96:a0:c1:4a:45:4b brd ff:ff:ff:ff:ff:ff
> >> >>> > 25: test: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> >> >>> >     link/ether d4:ae:52:8d:50:48 brd ff:ff:ff:ff:ff:ff
> >> >>> >     inet 10.15.11.21/24 brd 10.15.11.255 scope global test
> >> >>> >        valid_lft forever preferred_lft forever
> >> >>> > 26: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> >> >>> >     link/ether 90:e2:ba:1e:14:80 brd ff:ff:ff:ff:ff:ff
> >> >>> >     inet 10.15.28.31/24 brd 10.15.28.255 scope global ovirtmgmt
> >> >>> >        valid_lft forever preferred_lft forever
> >> >>> > 27: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >> >>> >     link/ether 62:e5:e5:07:99:eb brd ff:ff:ff:ff:ff:ff
> >> >>> > 29: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UNKNOWN group default qlen 1000
> >> >>> >     link/ether fe:6f:9c:95:00:02 brd ff:ff:ff:ff:ff:ff
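To tie the NICs in the dmesg log to their bridges (here `em1` is a port of `test`, while `p1p1` and `vnet0` are ports of `ovirtmgmt`), the `master` field can be read from `ip -o link show`, which prints one interface per line. A sketch; the regular expression is an assumption about that one-line format, and `bridge_ports` is a hypothetical name.

```python
import re

# Assumed one-line format from `ip -o link show`, e.g.
#   "2: em1: <BROADCAST,...> mtu 1500 qdisc mq master test state UP ..."
# VLAN-style names such as "em1.100@em1" are reduced to the part before "@".
MASTER_RE = re.compile(
    r"^\d+:\s+(?P<iface>[^:@\s]+)(?:@\S+)?:\s.*?\bmaster\s+(?P<master>\S+)"
)


def bridge_ports(ip_o_link_output):
    """Map each master device (bridge or bond) to the list of its ports."""
    ports = {}
    for line in ip_o_link_output.splitlines():
        m = MASTER_RE.match(line)
        if m:
            ports.setdefault(m.group("master"), []).append(m.group("iface"))
    return ports
```

Interfaces without a `master` field (loopback, the bridges themselves, unused NICs) are simply left out of the result.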
> >> >>> > [root@swm-02 ~]# free -m
> >> >>> >               total        used        free      shared  buff/cache   available
> >> >>> > Mem:          64413        1873       61804           9         735       62062
> >> >>> > Swap:         16383           0       16383
> >> >>> > [root@swm-02 ~]# free -h
> >> >>> >               total        used        free      shared  buff/cache   available
> >> >>> > Mem:            62G        1.8G         60G        9.5M        735M         60G
> >> >>> > Swap:           15G          0B         15G
> >> >>> > [root@swm-02 ~]# lscpu
> >> >>> > Architecture:          x86_64
> >> >>> > CPU op-mode(s):        32-bit, 64-bit
> >> >>> > Byte Order:            Little Endian
> >> >>> > CPU(s):                16
> >> >>> > On-line CPU(s) list:   0-15
> >> >>> > Thread(s) per core:    2
> >> >>> > Core(s) per socket:    4
> >> >>> > Socket(s):             2
> >> >>> > NUMA node(s):          2
> >> >>> > Vendor ID:             GenuineIntel
> >> >>> > CPU family:            6
> >> >>> > Model:                 44
> >> >>> > Model name:            Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
> >> >>> > Stepping:              2
> >> >>> > CPU MHz:               3192.064
> >> >>> > BogoMIPS:              6384.12
> >> >>> > Virtualization:        VT-x
> >> >>> > L1d cache:             32K
> >> >>> > L1i cache:             32K
> >> >>> > L2 cache:              256K
> >> >>> > L3 cache:              12288K
> >> >>> > NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
> >> >>> > NUMA node1 CPU(s):     1,3,5,7,9,11,13,15
> >> >>> > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat spec_ctrl intel_stibp flush_l1d
> >> >>> > [root@swm-02 ~]#
> >> >>> > _______________________________________________
> >> >>> > Users mailing list -- users@ovirt.org
> >> >>> > To unsubscribe send an email to users-leave@ovirt.org
> >> >>> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> >> >>> > oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> >> >>> > List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/MTMZ5MF4CF2VR2D25VVPDNFN2IKE24AR/
> >> >>