[VDSM] Random network tests failing

Seems that random network tests fail now in the CI. Did we change anything in the infrastructure that can explain this?

This is the same pattern we saw in other tests: all the ip calls are successful, but the test is not happy.

18:49:09 ======================================================================
18:49:09 FAIL: testDisablePromisc (network.ipwrapper_test.TestDrvinfo)
18:49:09 ----------------------------------------------------------------------
18:49:09 Traceback (most recent call last):
18:49:09 File "/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/network/ipwrapper_test.py", line 149, in testDisablePromisc
18:49:09 "Could not disable promiscuous mode.")
18:49:09 AssertionError: Could not disable promiscuous mode.
18:49:09 -------------------- >> begin captured logging << --------------------
18:49:09 2016-06-01 18:48:58,492 DEBUG [root] (MainThread) /usr/bin/taskset --cpu-list 0-1 /usr/sbin/brctl show (cwd None)
18:49:09 2016-06-01 18:48:58,501 DEBUG [root] (MainThread) SUCCESS: <err> = ''; <rc> = 0
18:49:09 2016-06-01 18:48:58,501 DEBUG [root] (MainThread) /usr/bin/taskset --cpu-list 0-1 /sbin/ip link add name vdsm-Cnvkvg type bridge (cwd None)
18:49:09 2016-06-01 18:48:58,510 DEBUG [root] (MainThread) SUCCESS: <err> = ''; <rc> = 0
18:49:09 2016-06-01 18:48:58,511 DEBUG [root] (MainThread) /usr/bin/taskset --cpu-list 0-1 /sbin/ip link set dev vdsm-Cnvkvg up (cwd None)
18:49:09 2016-06-01 18:48:58,522 DEBUG [root] (MainThread) SUCCESS: <err> = ''; <rc> = 0
18:49:09 2016-06-01 18:48:58,523 DEBUG [root] (MainThread) /usr/bin/taskset --cpu-list 0-1 /sbin/ip link set dev vdsm-Cnvkvg promisc on (cwd None)
18:49:09 2016-06-01 18:48:58,533 DEBUG [root] (MainThread) SUCCESS: <err> = ''; <rc> = 0
18:49:09 2016-06-01 18:48:58,533 DEBUG [root] (MainThread) /usr/bin/taskset --cpu-list 0-1 /sbin/ip link set dev vdsm-Cnvkvg promisc off (cwd None)
18:49:09 2016-06-01 18:48:58,542 DEBUG [root] (MainThread) SUCCESS: <err> = ''; <rc> = 0
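
In the report above every ip call returns rc 0, yet the assertion on promiscuous mode still fails. One way to inspect the device state directly is sketched below. This is only an illustration, not the code in ipwrapper_test.py: the helper names (read_flags, is_promisc, wait_for) are hypothetical, it assumes Python 3 and a Linux sysfs mounted at /sys, and nothing in the thread confirms that a delay in the flag update is the actual cause.

```python
import os
import time

# Interface flag bit for promiscuous mode, as defined in <linux/if.h>.
IFF_PROMISC = 0x100


def read_flags(dev, sysfs_root="/sys/class/net"):
    """Read the interface flags of `dev` from sysfs (a hex string such as 0x1003)."""
    with open(os.path.join(sysfs_root, dev, "flags")) as f:
        return int(f.read().strip(), 16)


def is_promisc(flags):
    """Return True if the IFF_PROMISC bit is set in `flags`."""
    return bool(flags & IFF_PROMISC)


def wait_for(predicate, timeout=2.0, interval=0.1):
    """Poll `predicate` until it returns True or `timeout` seconds pass.

    Hypothetical helper: returns the final outcome as a bool instead of
    asserting on the very first read.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test could then check `wait_for(lambda: not is_promisc(read_flags("vdsm-Cnvkvg")))` after the `promisc off` call. If the flag is still set after the timeout, the failure is not a matter of timing and the cause lies elsewhere.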

On Wed, Jun 1, 2016 at 10:53 PM, Nir Soffer <nsoffer@redhat.com> wrote:
> Seems that random network tests fail now in the CI. Did we change anything in the infrastructure that can explain this?
> This is the same pattern we saw in other tests: all the ip calls are successful, but the test is not happy.
How often has this been seen? Is this failing on a specific platform?

This is an integration test that should be run on a VM. We should aim to run the CI integration tests with Lago and a dedicated VDSM test image.

If this failure occurs too often, we can label the test so that it does not run on CI (but still runs locally).

Thanks,
Edy.
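
The idea of labeling a test so that it is skipped on CI but still runs locally could look roughly like the sketch below. It is a minimal illustration using the standard unittest module, not VDSM's own test helpers: the decorator name skip_on_ci and the environment variable OVIRT_CI are assumptions made for the example.

```python
import functools
import os
import unittest

# Assumed name of an environment variable that the CI job would export.
CI_ENV_VAR = "OVIRT_CI"


def running_on_ci():
    """Return True when the (assumed) CI marker variable is set and not "0"."""
    return os.environ.get(CI_ENV_VAR, "") not in ("", "0")


def skip_on_ci(reason):
    """Decorator (hypothetical): skip the test on CI, run it everywhere else."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            # Checked at run time, so the same module works on CI and locally.
            if running_on_ci():
                raise unittest.SkipTest("skipped on CI: %s" % reason)
            return test_func(*args, **kwargs)
        return wrapper
    return decorator


class ExampleNetworkTest(unittest.TestCase):

    @skip_on_ci("fails randomly on CI slaves")
    def test_disable_promisc(self):
        # Placeholder body; the real test would manipulate a bridge device.
        self.assertTrue(True)
```

Checking the variable when the test runs, rather than at import time, keeps the skip decision visible in the test report as a skipped test with its reason.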

On Thu, Jun 2, 2016 at 10:01 AM, Edward Haas <ehaas@redhat.com> wrote:
> On Wed, Jun 1, 2016 at 10:53 PM, Nir Soffer <nsoffer@redhat.com> wrote:
>> Seems that random network tests fail now in the CI. Did we change anything in the infrastructure that can explain this?
>> This is the same pattern we saw in other tests: all the ip calls are successful, but the test is not happy.
> How often has this been seen? Is this failing on a specific platform?

I have seen this test fail once. It seems that some network tests fail rarely; this is not a problem with a specific test.

> This is an integration test that should be run on a VM. We should aim to run the CI integration tests with Lago and a dedicated VDSM test image.
> If this failure occurs too often, we can label the test so that it does not run on CI (but still runs locally).

Do we understand why they fail randomly? Is it possible to make them reliable in mock?

Nir

> Do we understand why they fail randomly?

Could be due to dirty slave state.

> Is it possible to make them reliable in mock?

Mock will not help here as it only does FS isolation.

We are working on a stateless slave solution which will hopefully solve such issues. But it'll take a while because the oVirt infra needs a few upgrades to support that. A possible alternative is to use Lago.

--
Barak Korren
bkorren@redhat.com
RHEV-CI Team
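
To check whether a slave carries leftover state from earlier runs, one could look for network devices that previous tests did not remove. The sketch below is an assumption-laden illustration: it assumes that "dirty state" includes leftover test devices, and it takes the `vdsm-` prefix from the device name in the failure log (vdsm-Cnvkvg). The function names are hypothetical, and the commands are only built, not executed.

```python
import os


def leftover_test_devices(sysfs_root="/sys/class/net", prefix="vdsm-"):
    """Return the sorted names of network devices whose name starts with `prefix`.

    The prefix is an assumption based on the device name seen in the log.
    """
    return sorted(
        name for name in os.listdir(sysfs_root) if name.startswith(prefix)
    )


def cleanup_commands(devices):
    """Build (but do not run) one `ip link delete` command per device."""
    return [["ip", "link", "delete", "dev", name] for name in devices]
```

Running `leftover_test_devices()` before the test suite starts would show whether earlier runs left devices behind; a non-empty result on a supposedly fresh slave would support the dirty-state theory.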

On Thu, Jun 2, 2016 at 11:34 AM, Barak Korren <bkorren@redhat.com> wrote:
>> Do we understand why they fail randomly?
> Could be due to dirty slave state.
>> Is it possible to make them reliable in mock?
> Mock will not help here as it only does FS isolation.
> We are working on a stateless slave solution which will hopefully solve such issues. But it'll take a while because the oVirt infra needs a few upgrades to support that. A possible alternative is to use Lago.

I've offered [1] in the past. It was isolated and quick (running both F23 and EL7 tests in parallel) and could run on anyone's laptop.

Y.

[1] https://gerrit.ovirt.org/#/c/56389/
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
Participants (4): Barak Korren, Edward Haas, Nir Soffer, Yaniv Kaul