I remembered vaguely that restarting the VM helps, but I don't think we
know the root cause.
Adding Barak to help with the restart.
Right. If it's really about builders, then I'd favor creating a
dedicated builder for Node.
Currently it looks like the loop issue is caused by at least one job,
and at least two jobs are suffering (one of them Node).
But we are blocked by this issue, and having a dedicated builder would
help us move forward.
- fabian
On Jan 6, 2016 10:20 AM, "Fabian Deutsch"
<fdeutsch(a)redhat.com> wrote:
>
> Hey,
>
> our Node Next builds are also failing with some error around loop devices.
>
> This worked just before Christmas, but has been failing constantly this
> year.
>
> Is the root cause already known?
>
> Ryan and Tolik were looking into this from the Node side.
>
> - fabian
>
>
> On Wed, Dec 23, 2015 at 4:52 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
> > On Wed, Dec 23, 2015 at 5:11 PM, Eyal Edri <eedri(a)redhat.com> wrote:
> >> I'm guessing this will be solved by running it on Lago?
> >> Isn't that what Yaniv is working on now?
> >
> > Yes, this may be more stable, but I heard that Lago setup takes about
> > an hour, and the whole run about 3 hours, so a lot of work is needed
> > before we can use it.
> >
> >> or these are unit tests and not functional?
> >
> > That's the problem: these tests fail because they do not test our code,
> > but the integration of our code with the environment. For example, if a
> > test cannot find an available loop device, it will fail.
> >
> > I think we must move these tests to the integration test package,
> > which does not run on the CI. These tests can be run only on a VM with
> > root privileges, and only a single test per VM at a time, to avoid
> > races when accessing shared resources (devices, network, etc.).
> >
> > The best way to run such tests is to start a stateless VM based on a
> > template that includes all the requirements, so we don't need to pay
> > for yum install on each test (which may take 2-3 minutes).
> >
> > Some of our customers are using similar setups. Using such a setup for
> > our own tests is the best thing we can do to improve the product.
> >
> >>
> >> e.
> >>
> >> On Wed, Dec 23, 2015 at 4:48 PM, Dan Kenigsberg <danken(a)redhat.com>
> >> wrote:
> >>>
> >>> On Wed, Dec 23, 2015 at 03:21:31AM +0200, Nir Soffer wrote:
> >>> > Hi all,
> >>> >
> >>> > We see too many failures of tests using loop devices. Is it
> >>> > possible that we run tests concurrently on the same slave, using
> >>> > all the available loop devices, or maybe creating races between
> >>> > different tests?
> >>> >
> >>> > It seems that we need a new decorator for disabling tests on the CI
> >>> > slaves, since this environment is too fragile.
> >>> >
> >>> > Here are some failures:
> >>> >
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: testLoopMount (mountTests.MountTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py", line 128, in testLoopMount
> >>> > 01:10:33     m.mount(mntOpts="loop")
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 225, in mount
> >>> > 01:10:33     return self._runcmd(cmd, timeout)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 241, in _runcmd
> >>> > 01:10:33     raise MountError(rc, ";".join((out, err)))
> >>> > 01:10:33 MountError: (32, ';mount: /tmp/tmpZuJRNk: failed to setup loop device: No such file or directory\n')
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /tmp/tmpZuJRNk (cwd None)
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13 (17-May-2015)\n'; <rc> = 0
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /tmp/tmpZuJRNk /var/tmp/tmpJO52Xj (cwd None)
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> > 01:10:33
> >>> > 01:10:33
> >>> >
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: testSymlinkMount (mountTests.MountTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py", line 150, in testSymlinkMount
> >>> > 01:10:33     m.mount(mntOpts="loop")
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 225, in mount
> >>> > 01:10:33     return self._runcmd(cmd, timeout)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 241, in _runcmd
> >>> > 01:10:33     raise MountError(rc, ";".join((out, err)))
> >>> > 01:10:33 MountError: (32, ';mount: /var/tmp/tmp1UQFPz/backing.img: failed to setup loop device: No such file or directory\n')
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmp1UQFPz/backing.img (cwd None)
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13 (17-May-2015)\n'; <rc> = 0
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /var/tmp/tmp1UQFPz/link_to_image /var/tmp/tmp1UQFPz/mountpoint (cwd None)
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> > 01:10:33
> >>> > 01:10:33
> >>> >
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: test_getDevicePartedInfo (parted_utils_tests.PartedUtilsTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/testValidation.py", line 97, in wrapper
> >>> > 01:10:33     return f(*args, **kwargs)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/parted_utils_tests.py", line 61, in setUp
> >>> > 01:10:33     self.assertEquals(rc, 0)
> >>> > 01:10:33 AssertionError: 1 != 0
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 dd if=/dev/zero of=/tmp/tmpasV8TD bs=100M count=1 (cwd None)
> >>> > 01:10:33 root: DEBUG: SUCCESS: <err> = '1+0 records in\n1+0 records out\n104857600 bytes (105 MB) copied, 0.368498 s, 285 MB/s\n'; <rc> = 0
> >>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 losetup -f --show /tmp/tmpasV8TD (cwd None)
> >>> > 01:10:33 root: DEBUG: FAILED: <err> = 'losetup: /tmp/tmpasV8TD: failed to set up loop device: No such file or directory\n'; <rc> = 1
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> >
> >>>
> >>> I've reluctantly marked another test as broken in
> >>> https://gerrit.ovirt.org/50484 due to a similar problem.
> >>> Your idea of a @brokentest_ci decorator is slightly less bad - at
> >>> least we do not ignore errors in this test when run on non-CI
> >>> platforms.
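The proposed decorator might look like the following sketch. This is a hypothetical illustration, not the actual implementation; in particular the OVIRT_CI environment variable is an assumed way to detect a CI slave:

```python
# Hypothetical sketch of a @brokentest_ci decorator: skip a known-broken
# test only on CI slaves (detected here via an assumed OVIRT_CI environment
# variable), so the test still runs and reports errors on developer machines.
import functools
import os
import unittest


def brokentest_ci(reason):
    """Skip the decorated test on CI slaves; run it everywhere else."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            if os.environ.get("OVIRT_CI"):
                raise unittest.SkipTest("broken on CI: %s" % reason)
            return f(*args, **kwargs)
        return wrapper
    return decorator


class MountTests(unittest.TestCase):
    @brokentest_ci("no free loop devices on shared slaves")
    def testLoopMount(self):
        pass  # the real test body would mount an image via a loop device
```

Unlike a plain @broken decorator, failures on developer machines are still reported, which keeps some coverage for the fragile tests.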
> >>>
> >>> Regards,
> >>> Dan.
> >>>
> >>> _______________________________________________
> >>> Infra mailing list
> >>> Infra(a)ovirt.org
> >>> http://lists.ovirt.org/mailman/listinfo/infra
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Eyal Edri
> >> Associate Manager
> >> EMEA ENG Virtualization R&D
> >> Red Hat Israel
> >>
> >> phone: +972-9-7692018
> >> irc: eedri (on #tlv #rhev-dev #rhev-integ)
> > _______________________________________________
> > Infra mailing list
> > Infra(a)ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/infra
>
>
>
> --
> Fabian Deutsch <fdeutsch(a)redhat.com>
> RHEV Hypervisor
> Red Hat