I remembered vaguely that restarting the VM helps, but I don't think we
know the root cause.
Adding Barak to help with the restart.
Right. If it's really about builders, then I'd favor creating a
dedicated builder for Node.
Currently it looks like the loop issue is caused by at least one job,
and at least two jobs are suffering (one of them Node).
But we are blocked by this issue, and having a dedicated builder would
help us move forward.
- fabian
On Jan 6, 2016 10:20 AM, "Fabian Deutsch"
<fdeutsch(a)redhat.com> wrote:
>
> Hey,
>
> our Node Next builds are also failing with some error around loop devices.
>
> This worked just before Christmas, but has been failing constantly this
> year.
>
> Is the root cause already known?
>
> Ryan and Tolik were looking into this from the Node side.
>
> - fabian
>
>
> On Wed, Dec 23, 2015 at 4:52 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
> > On Wed, Dec 23, 2015 at 5:11 PM, Eyal Edri <eedri(a)redhat.com> wrote:
> >> I'm guessing this will be solved by running it on Lago?
> >> Isn't that what Yaniv is working on now?
> >
> > Yes, this may be more stable, but I heard that Lago setup takes about
> > an hour, and the whole run about 3 hours, so a lot of work is needed
> > before we can use it.
> >
> >> or these are unit tests and not functional?
> >
> > That's the problem: these tests fail because they do not test our code,
> > but the integration of our code with the environment. For example, if a
> > test cannot find an available loop device, it will fail.
> >
> > I think we must move these tests to the integration test package,
> > which does not run on the CI. These tests can be run only on a VM with
> > root privileges, and only a single test per VM at a time, to avoid
> > races when accessing shared resources (devices, network, etc.).
> >
> > The best way to run such tests is to start a stateless VM based on a
> > template that includes all the requirements, so we don't need to pay
> > for yum install on each test (which may take 2-3 minutes).
> >
> > Some of our customers are using similar setups. Using such a setup for
> > our own tests is the best thing we can do to improve the product.
> >
> >>
> >> e.
> >>
> >> On Wed, Dec 23, 2015 at 4:48 PM, Dan Kenigsberg <danken(a)redhat.com>
> >> wrote:
> >>>
> >>> On Wed, Dec 23, 2015 at 03:21:31AM +0200, Nir Soffer wrote:
> >>> > Hi all,
> >>> >
> >>> > We see too many failures of tests using loop devices. Is it
> >>> > possible that we run tests concurrently on the same slave, using
> >>> > all the available loop devices, or maybe creating races between
> >>> > different tests?
> >>> >
> >>> > It seems that we need a new decorator for disabling tests on the CI
> >>> > slaves, since this environment is too fragile.
> >>> >
> >>> > Here are some failures:
> >>> >
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: testLoopMount (mountTests.MountTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py", line 128, in testLoopMount
> >>> > 01:10:33     m.mount(mntOpts="loop")
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 225, in mount
> >>> > 01:10:33     return self._runcmd(cmd, timeout)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 241, in _runcmd
> >>> > 01:10:33     raise MountError(rc, ";".join((out, err)))
> >>> > 01:10:33 MountError: (32, ';mount: /tmp/tmpZuJRNk: failed to setup loop device: No such file or directory\n')
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /tmp/tmpZuJRNk (cwd None)
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13 (17-May-2015)\n'; <rc> = 0
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /tmp/tmpZuJRNk /var/tmp/tmpJO52Xj (cwd None)
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> > 01:10:33
> >>> > 01:10:33
> >>> >
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: testSymlinkMount (mountTests.MountTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py", line 150, in testSymlinkMount
> >>> > 01:10:33     m.mount(mntOpts="loop")
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 225, in mount
> >>> > 01:10:33     return self._runcmd(cmd, timeout)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 241, in _runcmd
> >>> > 01:10:33     raise MountError(rc, ";".join((out, err)))
> >>> > 01:10:33 MountError: (32, ';mount: /var/tmp/tmp1UQFPz/backing.img: failed to setup loop device: No such file or directory\n')
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmp1UQFPz/backing.img (cwd None)
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13 (17-May-2015)\n'; <rc> = 0
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /var/tmp/tmp1UQFPz/link_to_image /var/tmp/tmp1UQFPz/mountpoint (cwd None)
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> > 01:10:33
> >>> > 01:10:33
> >>> >
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: test_getDevicePartedInfo (parted_utils_tests.PartedUtilsTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/testValidation.py", line 97, in wrapper
> >>> > 01:10:33     return f(*args, **kwargs)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/parted_utils_tests.py", line 61, in setUp
> >>> > 01:10:33     self.assertEquals(rc, 0)
> >>> > 01:10:33 AssertionError: 1 != 0
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 dd if=/dev/zero of=/tmp/tmpasV8TD bs=100M count=1 (cwd None)
> >>> > 01:10:33 root: DEBUG: SUCCESS: <err> = '1+0 records in\n1+0 records out\n104857600 bytes (105 MB) copied, 0.368498 s, 285 MB/s\n'; <rc> = 0
> >>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 losetup -f --show /tmp/tmpasV8TD (cwd None)
> >>> > 01:10:33 root: DEBUG: FAILED: <err> = 'losetup: /tmp/tmpasV8TD: failed to set up loop device: No such file or directory\n'; <rc> = 1
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> >
> >>>
> >>> I've reluctantly marked another test as broken in
> >>> https://gerrit.ovirt.org/50484 due to a similar problem.
> >>> Your idea of a @brokentest_ci decorator is slightly less bad - at
> >>> least we do not ignore errors in this test when run on non-CI
> >>> platforms.
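The proposed decorator might look like the following sketch. This is a hypothetical illustration, not the actual implementation; in particular the OVIRT_CI environment variable is an assumed way to detect a CI slave:

```python
# Hypothetical sketch of a @brokentest_ci decorator: skip a known-broken
# test only on CI slaves (detected here via an assumed OVIRT_CI environment
# variable), so the test still runs and reports errors on developer machines.
import functools
import os
import unittest


def brokentest_ci(reason):
    """Skip the decorated test on CI slaves; run it everywhere else."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            if os.environ.get("OVIRT_CI"):
                raise unittest.SkipTest("broken on CI: %s" % reason)
            return f(*args, **kwargs)
        return wrapper
    return decorator


class MountTests(unittest.TestCase):
    @brokentest_ci("no free loop devices on shared slaves")
    def testLoopMount(self):
        pass  # the real test body would mount an image via a loop device
```

Unlike a plain @broken decorator, failures on developer machines are still reported, which keeps some coverage for the fragile tests.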
> >>>
> >>> Regards,
> >>> Dan.
> >>>
> >>> _______________________________________________
> >>> Infra mailing list
> >>> Infra(a)ovirt.org
> >>> http://lists.ovirt.org/mailman/listinfo/infra
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Eyal Edri
> >> Associate Manager
> >> EMEA ENG Virtualization R&D
> >> Red Hat Israel
> >>
> >> phone: +972-9-7692018
> >> irc: eedri (on #tlv #rhev-dev #rhev-integ)
> > _______________________________________________
> > Infra mailing list
> > Infra(a)ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/infra
>
>
>
> --
> Fabian Deutsch <fdeutsch(a)redhat.com>
> RHEV Hypervisor
> Red Hat