
Hey Dominik, Thanks for helping. I really want to try to use ovirt. When these events happen, I cannot even SSH to the nodes due to the link being down. After a little while, the hosts come back... On Fri, Aug 23, 2019 at 11:30 AM Dominik Holler <dholler@redhat.com> wrote:
Is your storage connected via NFS? Can you manually access the storage on the host?
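For reference, a hedged sketch of checking that by hand from a host. The server name is taken from the NFS timeouts quoted below; the export path under /rhev/data-center/mnt/ is a typical oVirt mount point, not a confirmed one:

# Is the NFS server reachable and the export visible from this host?
ping -c 3 swm-01.hpc.moffitt.org
showmount -e swm-01.hpc.moffitt.org

# Can the mounted storage domain actually be listed? A hang here would mirror
# the hung qemu-kvm tasks in the dmesg below.
ls -l /rhev/data-center/mnt/swm-01.hpc.moffitt.org:_exports_data/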
On Fri, Aug 23, 2019 at 5:19 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
Sorry to dead bump this, but I'm beginning to suspect that maybe it's not STP that's the problem.
2 of my hosts just went down when a few VMs tried to migrate.
Do any of you have any idea what might be going on here? I don't even know where to start. I'm going to include the dmesg in case it helps. This happens on both of the hosts whenever any migration attempts to start.
[68099.245833] bnx2 0000:01:00.0 em1: NIC Copper Link is Down
[68099.246055] internal: port 1(em1) entered disabled state
[68184.177343] ixgbe 0000:03:00.0 p1p1: NIC Link is Down
[68184.177789] ovirtmgmt: port 1(p1p1) entered disabled state
[68184.177856] ovirtmgmt: topology change detected, propagating
[68277.078671] INFO: task qemu-kvm:8888 blocked for more than 120 seconds.
[68277.078700] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[68277.078723] qemu-kvm        D ffff9db40c359040     0  8888      1 0x000001a0
[68277.078727] Call Trace:
[68277.078738]  [<ffffffff978fd2ac>] ? avc_has_perm_flags+0xdc/0x1c0
[68277.078743]  [<ffffffff97d69f19>] schedule+0x29/0x70
[68277.078746]  [<ffffffff9785f3d9>] inode_dio_wait+0xd9/0x100
[68277.078751]  [<ffffffff976c4010>] ? wake_bit_function+0x40/0x40
[68277.078765]  [<ffffffffc09d6dd6>] nfs_getattr+0x1b6/0x250 [nfs]
[68277.078768]  [<ffffffff97848109>] vfs_getattr+0x49/0x80
[68277.078769]  [<ffffffff97848185>] vfs_fstat+0x45/0x80
[68277.078771]  [<ffffffff978486f4>] SYSC_newfstat+0x24/0x60
[68277.078774]  [<ffffffff97d76d21>] ? system_call_after_swapgs+0xae/0x146
[68277.078778]  [<ffffffff97739f34>] ? __audit_syscall_entry+0xb4/0x110
[68277.078782]  [<ffffffff9763aaeb>] ? syscall_trace_enter+0x16b/0x220
[68277.078784]  [<ffffffff97848ace>] SyS_newfstat+0xe/0x10
[68277.078786]  [<ffffffff97d7706b>] tracesys+0xa3/0xc9
[68397.072384] INFO: task qemu-kvm:8888 blocked for more than 120 seconds.
[68397.072413] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[68397.072436] qemu-kvm        D ffff9db40c359040     0  8888      1 0x000001a0
[68397.072439] Call Trace:
[68397.072453]  [<ffffffff978fd2ac>] ? avc_has_perm_flags+0xdc/0x1c0
[68397.072458]  [<ffffffff97d69f19>] schedule+0x29/0x70
[68397.072462]  [<ffffffff9785f3d9>] inode_dio_wait+0xd9/0x100
[68397.072467]  [<ffffffff976c4010>] ? wake_bit_function+0x40/0x40
[68397.072480]  [<ffffffffc09d6dd6>] nfs_getattr+0x1b6/0x250 [nfs]
[68397.072485]  [<ffffffff97848109>] vfs_getattr+0x49/0x80
[68397.072486]  [<ffffffff97848185>] vfs_fstat+0x45/0x80
[68397.072488]  [<ffffffff978486f4>] SYSC_newfstat+0x24/0x60
[68397.072491]  [<ffffffff97d76d21>] ? system_call_after_swapgs+0xae/0x146
[68397.072495]  [<ffffffff97739f34>] ? __audit_syscall_entry+0xb4/0x110
[68397.072498]  [<ffffffff9763aaeb>] ? syscall_trace_enter+0x16b/0x220
[68397.072500]  [<ffffffff97848ace>] SyS_newfstat+0xe/0x10
[68397.072502]  [<ffffffff97d7706b>] tracesys+0xa3/0xc9
[68401.573141] bnx2 0000:01:00.0 em1: NIC Copper Link is Up, 1000 Mbps full duplex
[68401.573247] internal: port 1(em1) entered blocking state
[68401.573255] internal: port 1(em1) entered listening state
[68403.576985] internal: port 1(em1) entered learning state
[68405.580907] internal: port 1(em1) entered forwarding state
[68405.580916] internal: topology change detected, propagating
[68469.565589] nfs: server swm-01.hpc.moffitt.org not responding, timed out
[68469.565840] nfs: server swm-01.hpc.moffitt.org not responding, timed out
[68487.193932] ixgbe 0000:03:00.0 p1p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[68487.194105] ovirtmgmt: port 1(p1p1) entered blocking state
[68487.194114] ovirtmgmt: port 1(p1p1) entered listening state
[68489.196508] ovirtmgmt: port 1(p1p1) entered learning state
[68491.200400] ovirtmgmt: port 1(p1p1) entered forwarding state
[68491.200405] ovirtmgmt: topology change detected, sending tcn bpdu
[68493.672423] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[68494.777996] NFSD: client 10.15.28.22 testing state ID with incorrect client ID
[68494.778580] NFSD: client 10.15.28.22 testing state ID with incorrect client ID
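A hedged set of commands for watching what STP is doing on those bridges while this happens (bridge names taken from the dmesg above):

# Per-port STP state and timers for the bridges named in the log.
brctl showstp ovirtmgmt
brctl showstp internal

# Kernel view: stp_state is 0 (off), 1 (kernel STP), or 2 (user-space STP);
# forward_delay is in hundredths of a second.
cat /sys/class/net/ovirtmgmt/bridge/stp_state
cat /sys/class/net/ovirtmgmt/bridge/forward_delay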
On Thu, Aug 22, 2019 at 2:53 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
Thanks, I'm just going to revert to bridges.
On Thu, Aug 22, 2019 at 11:50 AM Dominik Holler <dholler@redhat.com> wrote:
On Thu, Aug 22, 2019 at 3:06 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
Seems like STP is common and necessary enough that it would be a priority over the seldom-used bridge_opts. I know what STP is and I'm not even a networking guy - I've never even heard of half of the bridge_opts that have switches in the UI.
Anyway, I wanted to try openvswitch, so I reinstalled all of my nodes and used "openvswitch (Technology Preview)" as the engine-setup option on the first host. I made a new cluster for my nodes, added them all to it, created a new "logical network" for the internal network, and attached it to the internal network ports.
Now, when I go to create a new VM, I don't have either the ovirtmgmt switch OR the internal switch as an option. The drop-down is empty, as if I don't have any vnic-profiles.
openvswitch clusters are limited to OVN networks. You can create one as described in https://www.ovirt.org/documentation/admin-guide/chap-External_Providers.html...
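For what it's worth, a hedged sketch of creating such an OVN network through the engine REST API; the linked guide describes the supported flow, and PROVIDER-ID, DC-ID, the credentials, and the engine address here are all placeholders that should be checked against your setup:

# Create a logical network backed by the OVN external provider, so the new
# openvswitch cluster has something to attach vNICs to.
curl -s -k -u admin@internal:PASSWORD \
    -H 'Content-Type: application/xml' -X POST \
    -d '<network><name>ovn-internal</name><external_provider id="PROVIDER-ID"/></network>' \
    'https://engine.example.com/ovirt-engine/api/datacenters/DC-ID/networks'

Once the network is assigned to the cluster, its automatically created vNIC profile should show up in the VM NIC drop-down, which sounds like the missing piece above.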
On Thu, Aug 22, 2019 at 7:34 AM Tony Pearce <tonyppe@gmail.com> wrote:
Hi Dominik, would you mind sharing the use case for STP being configurable via the API only? I am keen to know. Thanks
On Thu., 22 Aug. 2019, 19:24 Dominik Holler, <dholler@redhat.com> wrote:
>
> On Thu, Aug 22, 2019 at 1:08 PM Miguel Duarte de Mora Barroso <mdbarroso@redhat.com> wrote:
>>
>> On Sat, Aug 17, 2019 at 11:27 AM <ej.albany@gmail.com> wrote:
>> >
>> > Hello. I have been trying to figure out an issue for a very long time.
>> > That issue relates to the ethernet and 10gb fc links that I have on my
>> > cluster being disabled any time a migration occurs.
>> >
>> > I believe this is because I need to have STP turned on in order to
>> > participate with the switch. However, there does not seem to be any
>> > way to tell oVirt to stop turning it off! Very frustrating.
>> >
>> > After entering a cronjob that enables stp on all bridges every 1
>> > minute, the migration issue disappears....
>> >
>> > Is there any way at all to do without this cronjob and set STP to be
>> > ON without having to resort to such a silly solution?
>>
>> Vdsm exposes a per-bridge STP knob that you can use for this. By
>> default it is set to false, which is probably why you had to use this
>> shenanigan.
>>
>> You can, for instance:
>>
>> # show present state
>> [vagrant@vdsm ~]$ ip a
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>        valid_lft forever preferred_lft forever
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
>>     link/ether 52:54:00:41:fb:37 brd ff:ff:ff:ff:ff:ff
>> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
>>     link/ether 52:54:00:83:5b:6f brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.50.50/24 brd 192.168.50.255 scope global noprefixroute eth1
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::5054:ff:fe83:5b6f/64 scope link
>>        valid_lft forever preferred_lft forever
>> 19: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>>     link/ether 8e:5c:2e:87:fa:0b brd ff:ff:ff:ff:ff:ff
>>
>> # show example bridge configuration - you're looking for the STP knob here.
>> [root@vdsm ~]$ cat bridged_net_with_stp
>> {
>>     "bondings": {},
>>     "networks": {
>>         "test-network": {
>>             "nic": "eth0",
>>             "switch": "legacy",
>>             "bridged": true,
>>             "stp": true
>>         }
>>     },
>>     "options": {
>>         "connectivityCheck": false
>>     }
>> }
>>
>> # issue setup networks command:
>> [root@vdsm ~]$ vdsm-client -f bridged_net_with_stp Host setupNetworks
>> {
>>     "code": 0,
>>     "message": "Done"
>> }
>>
>> # show bridges
>> [root@vdsm ~]$ brctl show
>> bridge name     bridge id               STP enabled     interfaces
>> ;vdsmdummy;     8000.000000000000       no
>> test-network    8000.52540041fb37       yes             eth0
>>
>> # show final state
>> [root@vdsm ~]$ ip a
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>        valid_lft forever preferred_lft forever
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master test-network state UP group default qlen 1000
>>     link/ether 52:54:00:41:fb:37 brd ff:ff:ff:ff:ff:ff
>> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
>>     link/ether 52:54:00:83:5b:6f brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.50.50/24 brd 192.168.50.255 scope global noprefixroute eth1
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::5054:ff:fe83:5b6f/64 scope link
>>        valid_lft forever preferred_lft forever
>> 19: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>>     link/ether 8e:5c:2e:87:fa:0b brd ff:ff:ff:ff:ff:ff
>> 432: test-network: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>>     link/ether 52:54:00:41:fb:37 brd ff:ff:ff:ff:ff:ff
>>
>> I don't think this STP parameter is exposed via engine UI; @Dominik
>> Holler, could you confirm? What are our plans for it?
>>
> STP is only available via REST-API, see
> http://ovirt.github.io/ovirt-engine-api-model/4.3/#types/network
> please find an example how to enable STP in
> https://gist.github.com/dominikholler/4e70c9ef9929d93b6807f56d43a70b95
> (a hedged sketch of such a call is also included at the end of this message)
>
> We have no plans to add STP to the web ui,
> but new feature requests are always welcome on
> https://bugzilla.redhat.com/enter_bug.cgi?product=ovirt-engine
>
>> > Here are some details about my systems, if you need it.
>> >
>> > selinux is disabled.
>> >
>> > [root@swm-02 ~]# rpm -qa | grep ovirt
>> > ovirt-imageio-common-1.5.1-0.el7.x86_64
>> > ovirt-release43-4.3.5.2-1.el7.noarch
>> > ovirt-imageio-daemon-1.5.1-0.el7.noarch
>> > ovirt-vmconsole-host-1.0.7-2.el7.noarch
>> > ovirt-hosted-engine-setup-2.3.11-1.el7.noarch
>> > ovirt-ansible-hosted-engine-setup-1.0.26-1.el7.noarch
>> > python2-ovirt-host-deploy-1.8.0-1.el7.noarch
>> > ovirt-ansible-engine-setup-1.1.9-1.el7.noarch
>> > python2-ovirt-setup-lib-1.2.0-1.el7.noarch
>> > cockpit-machines-ovirt-195.1-1.el7.noarch
>> > ovirt-hosted-engine-ha-2.3.3-1.el7.noarch
>> > ovirt-vmconsole-1.0.7-2.el7.noarch
>> > cockpit-ovirt-dashboard-0.13.5-1.el7.noarch
>> > ovirt-provider-ovn-driver-1.2.22-1.el7.noarch
>> > ovirt-host-deploy-common-1.8.0-1.el7.noarch
>> > ovirt-host-4.3.4-1.el7.x86_64
>> > python-ovirt-engine-sdk4-4.3.2-2.el7.x86_64
>> > ovirt-host-dependencies-4.3.4-1.el7.x86_64
>> > ovirt-ansible-repositories-1.1.5-1.el7.noarch
>> > [root@swm-02 ~]# cat /etc/redhat-release
>> > CentOS Linux release 7.6.1810 (Core)
>> > [root@swm-02 ~]# uname -r
>> > 3.10.0-957.27.2.el7.x86_64
>> > You have new mail in /var/spool/mail/root
>> > [root@swm-02 ~]# ip a
>> > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> >     inet 127.0.0.1/8 scope host lo
>> >        valid_lft forever preferred_lft forever
>> > 2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master test state UP group default qlen 1000
>> >     link/ether d4:ae:52:8d:50:48 brd ff:ff:ff:ff:ff:ff
>> > 3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>> >     link/ether d4:ae:52:8d:50:49 brd ff:ff:ff:ff:ff:ff
>> > 4: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
>> >     link/ether 90:e2:ba:1e:14:80 brd ff:ff:ff:ff:ff:ff
>> > 5: p1p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>> >     link/ether 90:e2:ba:1e:14:81 brd ff:ff:ff:ff:ff:ff
>> > 6: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>> >     link/ether a2:b8:d6:e8:b3:d8 brd ff:ff:ff:ff:ff:ff
>> > 7: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>> >     link/ether 96:a0:c1:4a:45:4b brd ff:ff:ff:ff:ff:ff
>> > 25: test: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>> >     link/ether d4:ae:52:8d:50:48 brd ff:ff:ff:ff:ff:ff
>> >     inet 10.15.11.21/24 brd 10.15.11.255 scope global test
>> >        valid_lft forever preferred_lft forever
>> > 26: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>> >     link/ether 90:e2:ba:1e:14:80 brd ff:ff:ff:ff:ff:ff
>> >     inet 10.15.28.31/24 brd 10.15.28.255 scope global ovirtmgmt
>> >        valid_lft forever preferred_lft forever
>> > 27: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>> >     link/ether 62:e5:e5:07:99:eb brd ff:ff:ff:ff:ff:ff
>> > 29: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UNKNOWN group default qlen 1000
>> >     link/ether fe:6f:9c:95:00:02 brd ff:ff:ff:ff:ff:ff
>> > [root@swm-02 ~]# free -m
>> >               total        used        free      shared  buff/cache   available
>> > Mem:          64413        1873       61804           9         735       62062
>> > Swap:         16383           0       16383
>> > [root@swm-02 ~]# free -h
>> >               total        used        free      shared  buff/cache   available
>> > Mem:            62G        1.8G         60G        9.5M        735M         60G
>> > Swap:           15G          0B         15G
>> > [root@swm-02 ~]# ls
>> > ls           lsb_release  lshw         lslocks      lsmod        lspci        lssubsys     lsusb.py
>> > lsattr       lscgroup     lsinitrd     lslogins     lsns         lss16toppm   lstopo-no-graphics
>> > lsblk        lscpu        lsipc        lsmem        lsof         lsscsi       lsusb
>> > [root@swm-02 ~]# lscpu
>> > Architecture:          x86_64
>> > CPU op-mode(s):        32-bit, 64-bit
>> > Byte Order:            Little Endian
>> > CPU(s):                16
>> > On-line CPU(s) list:   0-15
>> > Thread(s) per core:    2
>> > Core(s) per socket:    4
>> > Socket(s):             2
>> > NUMA node(s):          2
>> > Vendor ID:             GenuineIntel
>> > CPU family:            6
>> > Model:                 44
>> > Model name:            Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
>> > Stepping:              2
>> > CPU MHz:               3192.064
>> > BogoMIPS:              6384.12
>> > Virtualization:        VT-x
>> > L1d cache:             32K
>> > L1i cache:             32K
>> > L2 cache:              256K
>> > L3 cache:              12288K
>> > NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
>> > NUMA node1 CPU(s):     1,3,5,7,9,11,13,15
>> > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat spec_ctrl intel_stibp flush_l1d
>> > [root@swm-02 ~]#
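Following up on Dominik's REST-API pointer above, here is a hedged sketch of what enabling STP on an existing logical network can look like. It is reconstructed from the network type documentation linked above, not copied from the gist; the engine address, credentials, and NETWORK-ID are placeholders:

# Look up the network's ID first (the networks collection is searchable).
curl -s -k -u admin@internal:PASSWORD \
    'https://engine.example.com/ovirt-engine/api/networks?search=name%3Dovirtmgmt'

# Then flip the stp flag on that network.
curl -s -k -u admin@internal:PASSWORD \
    -H 'Content-Type: application/xml' -X PUT \
    -d '<network><stp>true</stp></network>' \
    'https://engine.example.com/ovirt-engine/api/networks/NETWORK-ID'

If running hosts do not pick the flag up immediately, re-syncing the host networks from the engine should apply it to the host bridges.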