
Hey Dominik, Thanks for helping. I really want to try to use ovirt. When these events happen, I cannot even SSH to the nodes due to the link being down. After a little while, the hosts come back... On Fri, Aug 23, 2019 at 11:30 AM Dominik Holler <dholler@redhat.com> wrote:
Is your storage connected via NFS? Can you manually access the storage on the host?
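For reference, a hedged sketch of checking that by hand from a host. The server name is taken from the NFS timeouts quoted below; the export path under /rhev/data-center/mnt/ is a typical oVirt mount point, not a confirmed one:

# Is the NFS server reachable and the export visible from this host?
ping -c 3 swm-01.hpc.moffitt.org
showmount -e swm-01.hpc.moffitt.org

# Can the mounted storage domain actually be listed? A hang here would mirror
# the hung qemu-kvm tasks in the dmesg below.
ls -l /rhev/data-center/mnt/swm-01.hpc.moffitt.org:_exports_data/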
On Fri, Aug 23, 2019 at 5:19 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
Sorry to dead bump this, but I'm beginning to suspect that maybe it's not STP that's the problem.
2 of my hosts just went down when a few VMs tried to migrate.
Do any of you have any idea what might be going on here? I don't even know where to start. I'm going to include the dmesg in case it helps. This happens on both of the hosts whenever any migration attempts to start.
[68099.245833] bnx2 0000:01:00.0 em1: NIC Copper Link is Down
[68099.246055] internal: port 1(em1) entered disabled state
[68184.177343] ixgbe 0000:03:00.0 p1p1: NIC Link is Down
[68184.177789] ovirtmgmt: port 1(p1p1) entered disabled state
[68184.177856] ovirtmgmt: topology change detected, propagating
[68277.078671] INFO: task qemu-kvm:8888 blocked for more than 120 seconds.
[68277.078700] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[68277.078723] qemu-kvm        D ffff9db40c359040     0  8888      1 0x000001a0
[68277.078727] Call Trace:
[68277.078738]  [<ffffffff978fd2ac>] ? avc_has_perm_flags+0xdc/0x1c0
[68277.078743]  [<ffffffff97d69f19>] schedule+0x29/0x70
[68277.078746]  [<ffffffff9785f3d9>] inode_dio_wait+0xd9/0x100
[68277.078751]  [<ffffffff976c4010>] ? wake_bit_function+0x40/0x40
[68277.078765]  [<ffffffffc09d6dd6>] nfs_getattr+0x1b6/0x250 [nfs]
[68277.078768]  [<ffffffff97848109>] vfs_getattr+0x49/0x80
[68277.078769]  [<ffffffff97848185>] vfs_fstat+0x45/0x80
[68277.078771]  [<ffffffff978486f4>] SYSC_newfstat+0x24/0x60
[68277.078774]  [<ffffffff97d76d21>] ? system_call_after_swapgs+0xae/0x146
[68277.078778]  [<ffffffff97739f34>] ? __audit_syscall_entry+0xb4/0x110
[68277.078782]  [<ffffffff9763aaeb>] ? syscall_trace_enter+0x16b/0x220
[68277.078784]  [<ffffffff97848ace>] SyS_newfstat+0xe/0x10
[68277.078786]  [<ffffffff97d7706b>] tracesys+0xa3/0xc9
[68397.072384] INFO: task qemu-kvm:8888 blocked for more than 120 seconds.
[68397.072413] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[68397.072436] qemu-kvm        D ffff9db40c359040     0  8888      1 0x000001a0
[68397.072439] Call Trace:
[68397.072453]  [<ffffffff978fd2ac>] ? avc_has_perm_flags+0xdc/0x1c0
[68397.072458]  [<ffffffff97d69f19>] schedule+0x29/0x70
[68397.072462]  [<ffffffff9785f3d9>] inode_dio_wait+0xd9/0x100
[68397.072467]  [<ffffffff976c4010>] ? wake_bit_function+0x40/0x40
[68397.072480]  [<ffffffffc09d6dd6>] nfs_getattr+0x1b6/0x250 [nfs]
[68397.072485]  [<ffffffff97848109>] vfs_getattr+0x49/0x80
[68397.072486]  [<ffffffff97848185>] vfs_fstat+0x45/0x80
[68397.072488]  [<ffffffff978486f4>] SYSC_newfstat+0x24/0x60
[68397.072491]  [<ffffffff97d76d21>] ? system_call_after_swapgs+0xae/0x146
[68397.072495]  [<ffffffff97739f34>] ? __audit_syscall_entry+0xb4/0x110
[68397.072498]  [<ffffffff9763aaeb>] ? syscall_trace_enter+0x16b/0x220
[68397.072500]  [<ffffffff97848ace>] SyS_newfstat+0xe/0x10
[68397.072502]  [<ffffffff97d7706b>] tracesys+0xa3/0xc9
[68401.573141] bnx2 0000:01:00.0 em1: NIC Copper Link is Up, 1000 Mbps full duplex
[68401.573247] internal: port 1(em1) entered blocking state
[68401.573255] internal: port 1(em1) entered listening state
[68403.576985] internal: port 1(em1) entered learning state
[68405.580907] internal: port 1(em1) entered forwarding state
[68405.580916] internal: topology change detected, propagating
[68469.565589] nfs: server swm-01.hpc.moffitt.org not responding, timed out
[68469.565840] nfs: server swm-01.hpc.moffitt.org not responding, timed out
[68487.193932] ixgbe 0000:03:00.0 p1p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[68487.194105] ovirtmgmt: port 1(p1p1) entered blocking state
[68487.194114] ovirtmgmt: port 1(p1p1) entered listening state
[68489.196508] ovirtmgmt: port 1(p1p1) entered learning state
[68491.200400] ovirtmgmt: port 1(p1p1) entered forwarding state
[68491.200405] ovirtmgmt: topology change detected, sending tcn bpdu
[68493.672423] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[68494.777996] NFSD: client 10.15.28.22 testing state ID with incorrect client ID
[68494.778580] NFSD: client 10.15.28.22 testing state ID with incorrect client ID
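A hedged set of commands for watching what STP is doing on those bridges while this happens (bridge names taken from the dmesg above):

# Per-port STP state and timers for the bridges named in the log.
brctl showstp ovirtmgmt
brctl showstp internal

# Kernel view: stp_state is 0 (off), 1 (kernel STP), or 2 (user-space STP);
# forward_delay is in hundredths of a second.
cat /sys/class/net/ovirtmgmt/bridge/stp_state
cat /sys/class/net/ovirtmgmt/bridge/forward_delay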
On Thu, Aug 22, 2019 at 2:53 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
Thanks, I'm just going to revert to bridges.
On Thu, Aug 22, 2019 at 11:50 AM Dominik Holler <dholler@redhat.com> wrote:
On Thu, Aug 22, 2019 at 3:06 PM Curtis E. Combs Jr. <ej.albany@gmail.com> wrote:
Seems like STP is common and necessary enough that it would be a priority over the seldom-used bridge_opts. I know what STP is and I'm not even a networking guy - I've never even heard of half of the bridge_opts that have switches in the UI.
Anyway, I wanted to try openvswitch, so I reinstalled all of my nodes and used "openvswitch (Technology Preview)" as the engine-setup option on the first host. I made a new cluster for my nodes, added them all to it, created a new "logical network" for the internal network, and attached it to the internal network ports.
Now, when I go to create a new VM, I don't have either the ovirtmgmt switch OR the internal switch as an option. The drop-down is empty, as if I don't have any vnic-profiles.
openvswitch clusters are limited to OVN networks. You can create one as described in https://www.ovirt.org/documentation/admin-guide/chap-External_Providers.html...
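For what it's worth, a hedged sketch of creating such an OVN network through the engine REST API; the linked guide describes the supported flow, and PROVIDER-ID, DC-ID, the credentials, and the engine address here are all placeholders that should be checked against your setup:

# Create a logical network backed by the OVN external provider, so the new
# openvswitch cluster has something to attach vNICs to.
curl -s -k -u admin@internal:PASSWORD \
    -H 'Content-Type: application/xml' -X POST \
    -d '<network><name>ovn-internal</name><external_provider id="PROVIDER-ID"/></network>' \
    'https://engine.example.com/ovirt-engine/api/datacenters/DC-ID/networks'

Once the network is assigned to the cluster, its automatically created vNIC profile should show up in the VM NIC drop-down, which sounds like the missing piece above.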
On Thu, Aug 22, 2019 at 7:34 AM Tony Pearce <tonyppe@gmail.com> wrote:
Hi Dominik, would you mind sharing the use case for STP being configurable via the API only? I am keen to know. Thanks
On Thu., 22 Aug. 2019, 19:24 Dominik Holler, <dholler@redhat.com> wrote:
>
> On Thu, Aug 22, 2019 at 1:08 PM Miguel Duarte de Mora Barroso <mdbarroso@redhat.com> wrote:
>>
>> On Sat, Aug 17, 2019 at 11:27 AM <ej.albany@gmail.com> wrote:
>> >
>> > Hello. I have been trying to figure out an issue for a very long time.
>> > That issue relates to the ethernet and 10gb fc links that I have on my
>> > cluster being disabled any time a migration occurs.
>> >
>> > I believe this is because I need to have STP turned on in order to
>> > participate with the switch. However, there does not seem to be any
>> > way to tell oVirt to stop turning it off! Very frustrating.
>> >
>> > After entering a cronjob that enables stp on all bridges every 1
>> > minute, the migration issue disappears....
>> >
>> > Is there any way at all to do without this cronjob and set STP to be
>> > ON without having to resort to such a silly solution?
>>
>> Vdsm exposes a per-bridge STP knob that you can use for this. By
>> default it is set to false, which is probably why you had to use this
>> shenanigan.
>>
>> You can, for instance:
>>
>> # show present state
>> [vagrant@vdsm ~]$ ip a
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>        valid_lft forever preferred_lft forever
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
>>     link/ether 52:54:00:41:fb:37 brd ff:ff:ff:ff:ff:ff
>> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
>>     link/ether 52:54:00:83:5b:6f brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.50.50/24 brd 192.168.50.255 scope global noprefixroute eth1
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::5054:ff:fe83:5b6f/64 scope link
>>        valid_lft forever preferred_lft forever
>> 19: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>>     link/ether 8e:5c:2e:87:fa:0b brd ff:ff:ff:ff:ff:ff
>>
>> # show example bridge configuration - you're looking for the STP knob here.
>> [root@vdsm ~]$ cat bridged_net_with_stp
>> {
>>     "bondings": {},
>>     "networks": {
>>         "test-network": {
>>             "nic": "eth0",
>>             "switch": "legacy",
>>             "bridged": true,
>>             "stp": true
>>         }
>>     },
>>     "options": {
>>         "connectivityCheck": false
>>     }
>> }
>>
>> # issue setup networks command:
>> [root@vdsm ~]$ vdsm-client -f bridged_net_with_stp Host setupNetworks
>> {
>>     "code": 0,
>>     "message": "Done"
>> }
>>
>> # show bridges
>> [root@vdsm ~]$ brctl show
>> bridge name     bridge id               STP enabled     interfaces
>> ;vdsmdummy;     8000.000000000000       no
>> test-network    8000.52540041fb37       yes             eth0
>>
>> # show final state
>> [root@vdsm ~]$ ip a
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>        valid_lft forever preferred_lft forever
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master test-network state UP group default qlen 1000
>>     link/ether 52:54:00:41:fb:37 brd ff:ff:ff:ff:ff:ff
>> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
>>     link/ether 52:54:00:83:5b:6f brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.50.50/24 brd 192.168.50.255 scope global noprefixroute eth1
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::5054:ff:fe83:5b6f/64 scope link
>>        valid_lft forever preferred_lft forever
>> 19: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>>     link/ether 8e:5c:2e:87:fa:0b brd ff:ff:ff:ff:ff:ff
>> 432: test-network: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>>     link/ether 52:54:00:41:fb:37 brd ff:ff:ff:ff:ff:ff
>>
>> I don't think this STP parameter is exposed via engine UI; @Dominik
>> Holler, could you confirm? What are our plans for it?
>>
> STP is only available via REST-API, see
> http://ovirt.github.io/ovirt-engine-api-model/4.3/#types/network
> please find an example how to enable STP in
> https://gist.github.com/dominikholler/4e70c9ef9929d93b6807f56d43a70b95
> (a hedged sketch of such a call is also included at the end of this message)
>
> We have no plans to add STP to the web ui,
> but new feature requests are always welcome on
> https://bugzilla.redhat.com/enter_bug.cgi?product=ovirt-engine
>
>> > Here are some details about my systems, if you need it.
>> >
>> > selinux is disabled.
>> >
>> > [root@swm-02 ~]# rpm -qa | grep ovirt
>> > ovirt-imageio-common-1.5.1-0.el7.x86_64
>> > ovirt-release43-4.3.5.2-1.el7.noarch
>> > ovirt-imageio-daemon-1.5.1-0.el7.noarch
>> > ovirt-vmconsole-host-1.0.7-2.el7.noarch
>> > ovirt-hosted-engine-setup-2.3.11-1.el7.noarch
>> > ovirt-ansible-hosted-engine-setup-1.0.26-1.el7.noarch
>> > python2-ovirt-host-deploy-1.8.0-1.el7.noarch
>> > ovirt-ansible-engine-setup-1.1.9-1.el7.noarch
>> > python2-ovirt-setup-lib-1.2.0-1.el7.noarch
>> > cockpit-machines-ovirt-195.1-1.el7.noarch
>> > ovirt-hosted-engine-ha-2.3.3-1.el7.noarch
>> > ovirt-vmconsole-1.0.7-2.el7.noarch
>> > cockpit-ovirt-dashboard-0.13.5-1.el7.noarch
>> > ovirt-provider-ovn-driver-1.2.22-1.el7.noarch
>> > ovirt-host-deploy-common-1.8.0-1.el7.noarch
>> > ovirt-host-4.3.4-1.el7.x86_64
>> > python-ovirt-engine-sdk4-4.3.2-2.el7.x86_64
>> > ovirt-host-dependencies-4.3.4-1.el7.x86_64
>> > ovirt-ansible-repositories-1.1.5-1.el7.noarch
>> > [root@swm-02 ~]# cat /etc/redhat-release
>> > CentOS Linux release 7.6.1810 (Core)
>> > [root@swm-02 ~]# uname -r
>> > 3.10.0-957.27.2.el7.x86_64
>> > You have new mail in /var/spool/mail/root
>> > [root@swm-02 ~]# ip a
>> > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> >     inet 127.0.0.1/8 scope host lo
>> >        valid_lft forever preferred_lft forever
>> > 2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master test state UP group default qlen 1000
>> >     link/ether d4:ae:52:8d:50:48 brd ff:ff:ff:ff:ff:ff
>> > 3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>> >     link/ether d4:ae:52:8d:50:49 brd ff:ff:ff:ff:ff:ff
>> > 4: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
>> >     link/ether 90:e2:ba:1e:14:80 brd ff:ff:ff:ff:ff:ff
>> > 5: p1p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>> >     link/ether 90:e2:ba:1e:14:81 brd ff:ff:ff:ff:ff:ff
>> > 6: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>> >     link/ether a2:b8:d6:e8:b3:d8 brd ff:ff:ff:ff:ff:ff
>> > 7: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>> >     link/ether 96:a0:c1:4a:45:4b brd ff:ff:ff:ff:ff:ff
>> > 25: test: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>> >     link/ether d4:ae:52:8d:50:48 brd ff:ff:ff:ff:ff:ff
>> >     inet 10.15.11.21/24 brd 10.15.11.255 scope global test
>> >        valid_lft forever preferred_lft forever
>> > 26: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>> >     link/ether 90:e2:ba:1e:14:80 brd ff:ff:ff:ff:ff:ff
>> >     inet 10.15.28.31/24 brd 10.15.28.255 scope global ovirtmgmt
>> >        valid_lft forever preferred_lft forever
>> > 27: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>> >     link/ether 62:e5:e5:07:99:eb brd ff:ff:ff:ff:ff:ff
>> > 29: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UNKNOWN group default qlen 1000
>> >     link/ether fe:6f:9c:95:00:02 brd ff:ff:ff:ff:ff:ff
>> > [root@swm-02 ~]# free -m
>> >               total        used        free      shared  buff/cache   available
>> > Mem:          64413        1873       61804           9         735       62062
>> > Swap:         16383           0       16383
>> > [root@swm-02 ~]# free -h
>> >               total        used        free      shared  buff/cache   available
>> > Mem:            62G        1.8G         60G        9.5M        735M         60G
>> > Swap:           15G          0B         15G
>> > [root@swm-02 ~]# ls
>> > ls           lsb_release  lshw         lslocks      lsmod        lspci        lssubsys     lsusb.py
>> > lsattr       lscgroup     lsinitrd     lslogins     lsns         lss16toppm   lstopo-no-graphics
>> > lsblk        lscpu        lsipc        lsmem        lsof         lsscsi       lsusb
>> > [root@swm-02 ~]# lscpu
>> > Architecture:          x86_64
>> > CPU op-mode(s):        32-bit, 64-bit
>> > Byte Order:            Little Endian
>> > CPU(s):                16
>> > On-line CPU(s) list:   0-15
>> > Thread(s) per core:    2
>> > Core(s) per socket:    4
>> > Socket(s):             2
>> > NUMA node(s):          2
>> > Vendor ID:             GenuineIntel
>> > CPU family:            6
>> > Model:                 44
>> > Model name:            Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
>> > Stepping:              2
>> > CPU MHz:               3192.064
>> > BogoMIPS:              6384.12
>> > Virtualization:        VT-x
>> > L1d cache:             32K
>> > L1i cache:             32K
>> > L2 cache:              256K
>> > L3 cache:              12288K
>> > NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
>> > NUMA node1 CPU(s):     1,3,5,7,9,11,13,15
>> > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat spec_ctrl intel_stibp flush_l1d
>> > [root@swm-02 ~]#
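Following up on Dominik's REST-API pointer above, here is a hedged sketch of what enabling STP on an existing logical network can look like. It is reconstructed from the network type documentation linked above, not copied from the gist; the engine address, credentials, and NETWORK-ID are placeholders:

# Look up the network's ID first (the networks collection is searchable).
curl -s -k -u admin@internal:PASSWORD \
    'https://engine.example.com/ovirt-engine/api/networks?search=name%3Dovirtmgmt'

# Then flip the stp flag on that network.
curl -s -k -u admin@internal:PASSWORD \
    -H 'Content-Type: application/xml' -X PUT \
    -d '<network><stp>true</stp></network>' \
    'https://engine.example.com/ovirt-engine/api/networks/NETWORK-ID'

If running hosts do not pick the flag up immediately, re-syncing the host networks from the engine should apply it to the host bridges.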