
Hi all,
I'm seeing new random network test failures - see below.
The interesting thing is I don't see these errors at all in travis.
Maybe mark this as broken_on_ci?

21:35:54 ======================================================================
21:35:54 ERROR: test_event_groups (network.netlink_test.NetlinkEventMonitorTests)
21:35:54 ----------------------------------------------------------------------
21:35:54 Traceback (most recent call last):
21:35:54   File "/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/tests/testValidation.py", line 97, in wrapper
21:35:54     return f(*args, **kwargs)
21:35:54   File "/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/tests/network/netlink_test.py", line 94, in test_event_groups
21:35:54     for event in mon_a:
21:35:54   File "/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/lib/vdsm/network/netlink/monitor.py", line 115, in __iter__
21:35:54     raise MonitorError(E_TIMEOUT)
21:35:54 MonitorError: 2
21:35:54 -------------------- >> begin captured logging << --------------------
21:35:54 2016-11-27 21:35:18,631 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-3 /sbin/ip link add name dummy_0JsyI type dummy (cwd None) (commands:69)
21:35:54 2016-11-27 21:35:18,944 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:93)
21:35:54 2016-11-27 21:35:18,962 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-3 /sbin/ip -4 addr add dev dummy_0JsyI 192.0.2.1/24 (cwd None) (commands:69)
21:35:54 2016-11-27 21:35:19,342 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:93)
21:35:54 2016-11-27 21:35:19,353 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-3 /sbin/ip link set dev dummy_0JsyI up (cwd None) (commands:69)
21:35:54 2016-11-27 21:35:19,649 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:93)
21:35:54 2016-11-27 21:35:19,661 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-3 /sbin/ip link del dev dummy_0JsyI (cwd None) (commands:69)
21:35:54 2016-11-27 21:35:19,948 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:93)
21:35:54 --------------------- >> end captured logging << ---------------------

Another instance...

00:55:39 ======================================================================
00:55:39 ERROR: test_event_groups (network.netlink_test.NetlinkEventMonitorTests)
00:55:39 ----------------------------------------------------------------------
00:55:39 Traceback (most recent call last):
00:55:39   File "/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/tests/testValidation.py", line 97, in wrapper
00:55:39     return f(*args, **kwargs)
00:55:39   File "/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/tests/network/netlink_test.py", line 94, in test_event_groups
00:55:39     for event in mon_a:
00:55:39   File "/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/lib/vdsm/network/netlink/monitor.py", line 115, in __iter__
00:55:39     raise MonitorError(E_TIMEOUT)
00:55:39 MonitorError: 2
00:55:39 -------------------- >> begin captured logging << --------------------
00:55:39 2016-11-28 00:55:01,538 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-3 /sbin/ip link add name dummy_bOihJ type dummy (cwd None) (commands:69)
00:55:39 2016-11-28 00:55:01,767 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:93)
00:55:39 2016-11-28 00:55:01,795 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-3 /sbin/ip -4 addr add dev dummy_bOihJ 192.0.2.1/24 (cwd None) (commands:69)
00:55:39 2016-11-28 00:55:02,138 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:93)
00:55:39 2016-11-28 00:55:02,152 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-3 /sbin/ip link set dev dummy_bOihJ up (cwd None) (commands:69)
00:55:39 2016-11-28 00:55:02,475 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:93)
00:55:39 2016-11-28 00:55:02,487 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-3 /sbin/ip link del dev dummy_bOihJ (cwd None) (commands:69)
00:55:39 2016-11-28 00:55:02,854 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:93)
00:55:39 --------------------- >> end captured logging << ---------------------

On Sun, Nov 27, 2016 at 11:52 PM, Nir Soffer <nsoffer@redhat.com> wrote:
Hi all,
I'm seeing new random network test failures - see below.
The interesting thing is I don't see these errors at all in travis.
Maybe mark this as broken_on_ci?

On Mon, Nov 28, 2016 at 03:03:38AM +0200, Nir Soffer wrote:
Another instance...
On Sun, Nov 27, 2016 at 11:52 PM, Nir Soffer <nsoffer@redhat.com> wrote:
Hi all,
I'm seeing new random network test failures - see below.
The interesting thing is I don't see these errors at all in travis.
Maybe mark this as broken_on_ci?
Adding Petr. I assume he won't have much time to fix the issue, so we should mark it as broken. Any test with a timeout is broken-on-ci by design.
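
For readers unfamiliar with the idea, here is a minimal sketch of what marking a test as broken on CI could look like. It is not vdsm's actual decorator: the name broken_on_ci is borrowed from the thread, while the DEMO_CI environment variable, the choice to skip before running the test, and the Demo test case are all assumptions made for illustration.

```python
import functools
import os
import unittest


def broken_on_ci(reason, env_var="DEMO_CI"):
    """Hypothetical decorator: skip the test when env_var is set.

    env_var is an assumed name, not something the thread specifies.
    """
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            if os.environ.get(env_var):
                # On CI the test is reported as skipped, not failed.
                raise unittest.SkipTest(reason)
            return test_func(*args, **kwargs)
        return wrapper
    return decorator


class Demo(unittest.TestCase):
    """Stand-in test case; the real test lives in netlink_test.py."""

    @broken_on_ci("netlink events may miss the timeout on slow slaves")
    def test_event_groups(self):
        self.assertTrue(True)


def count_skipped(on_ci):
    """Run the demo test with or without the CI flag; return skip count."""
    if on_ci:
        os.environ["DEMO_CI"] = "1"
    else:
        os.environ.pop("DEMO_CI", None)
    result = unittest.TestResult()
    Demo("test_event_groups").run(result)
    return len(result.skipped)
```

With the flag set, count_skipped(True) reports one skipped test; without it the test runs normally. The cost of this approach is that a real regression in the test also goes unnoticed on CI.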

On 28 November 2016 at 09:03, Dan Kenigsberg <danken@redhat.com> wrote:
Adding Petr. I assume he won't have much time to fix the issue, so we should mark it as broken. Any test with a timeout is broken-on-ci by design.
What exactly is timing out? Can we help to fix that? Is it waiting forever for something that will never "show up" because it is outside the mock env (And could be bind-mounted into it)?
--
Barak Korren
bkorren@redhat.com
RHEV-CI Team

On Mon, Nov 28, 2016 at 09:09:09AM +0200, Barak Korren wrote:
On 28 November 2016 at 09:03, Dan Kenigsberg <danken@redhat.com> wrote:
Adding Petr. I assume he won't have much time to fix the issue, so we should mark it as broken. Any test with a timeout is broken-on-ci by design.
What exactly is timing out? Can we help to fix that? Is it waiting forever for something that will never "show up" because it is outside the mock env (And could be bind-mounted into it)?
I did not research this deeply, but in that test we create network interfaces, and wait for the ensuing netlink events. If they don't arrive within 2 seconds, we declare failure. The failure we see is what I'd expect from a very slow slave.
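
The wait-with-timeout pattern described above can be sketched roughly as follows. This is not the code in monitor.py: it replaces the netlink socket with an in-process queue, and EventMonitor, publish, collect_events, run_scenario and the event names are hypothetical. Only MonitorError and E_TIMEOUT (value 2, as the traceback prints "MonitorError: 2") come from the thread.

```python
import queue
import threading
import time

E_TIMEOUT = 2  # matches "MonitorError: 2" in the traceback


class MonitorError(Exception):
    pass


class EventMonitor:
    """Hypothetical monitor: yields a fixed number of events, or
    raises MonitorError(E_TIMEOUT) if they do not all arrive in time."""

    def __init__(self, timeout, expected):
        self._queue = queue.Queue()
        self._timeout = timeout
        self._expected = expected

    def publish(self, event):
        self._queue.put(event)

    def __iter__(self):
        # One deadline covers the whole iteration, not each event.
        deadline = time.monotonic() + self._timeout
        for _ in range(self._expected):
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise MonitorError(E_TIMEOUT)
            try:
                event = self._queue.get(timeout=remaining)
            except queue.Empty:
                raise MonitorError(E_TIMEOUT)
            yield event


def collect_events(delay, timeout):
    """Publish two made-up events after `delay` seconds and collect them."""
    monitor = EventMonitor(timeout=timeout, expected=2)

    def produce():
        time.sleep(delay)  # simulates a slow host
        monitor.publish("new_link")
        monitor.publish("del_link")

    threading.Thread(target=produce, daemon=True).start()
    return list(monitor)


def run_scenario(delay, timeout):
    """Return the collected events, or the error text on timeout."""
    try:
        return collect_events(delay, timeout)
    except MonitorError as e:
        return "MonitorError: %d" % e.args[0]
```

When the producer is faster than the timeout, run_scenario returns both events; when the delay exceeds the timeout, it returns the same error text seen in the CI log. That is why a fixed deadline makes the test sensitive to how loaded the machine is.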

On 28 November 2016 at 09:31, Dan Kenigsberg <danken@redhat.com> wrote:
The failure we see is what I'd expect from a very slow slave.
There is no longer any reason for the slaves to be slow, and netlink events IMO should never have been slow (they are completely unrelated to storage, which was the slow part until we moved to local disks). Please investigate further, and open libvirt/oVirt bugs if you think the slave VMs are somehow inherently slow, as oVirt is not currently telling us the hypervisors are overloaded in any way.
--
Barak Korren
bkorren@redhat.com
RHCE, RHCi, RHV-DevOps Team
https://ifireball.wordpress.com/
participants (3)
- Barak Korren
- Dan Kenigsberg
- Nir Soffer