I vaguely remember that restarting the VM helps, but I don't think we know the root cause.

Adding Barak to help with the restart.

On Jan 6, 2016 10:20 AM, "Fabian Deutsch" <fdeutsch@redhat.com> wrote:
Hey,

our Node Next builds are also failing with some error around loop devices.

This worked just before Christmas, but has been failing constantly this year.

Is the root cause already known?

Ryan and Tolik were looking into this from the Node side.

- fabian


On Wed, Dec 23, 2015 at 4:52 PM, Nir Soffer <nsoffer@redhat.com> wrote:
> On Wed, Dec 23, 2015 at 5:11 PM, Eyal Edri <eedri@redhat.com> wrote:
>> I'm guessing this will be solved by running it on lago?
>> Isn't that what Yaniv is working on now?
>
> Yes, this may be more stable, but I heard that lago setup takes about
> an hour, and the whole run about 3 hours, so a lot of work is needed
> until we can use it.
>
>> Or are these unit tests and not functional tests?
>
> That's the problem: these tests fail because they do not test our code,
> but the integration of our code in the environment. For example, if the test
> cannot find an available loop device, the test will fail.
>
> I think we must move these tests to the integration test package,
> which does not run on the CI. These tests can run only on a VM with
> root privileges, and only a single test per VM at a time, to avoid races
> when accessing shared resources (devices, network, etc.).
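
A minimal sketch of how such a test could skip itself instead of failing when the environment cannot support it. It assumes Python 3 and that `losetup -f` prints the first unused loop device. The helper names `free_loop_device` and `requires_loop_device` are made up for illustration and are not part of vdsm.

```python
import functools
import os
import subprocess
import unittest


def free_loop_device():
    """Return the first unused loop device reported by losetup, or None."""
    try:
        out = subprocess.check_output(
            ["losetup", "-f"], stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        # losetup is missing, or it could not find a usable loop device.
        return None
    return out.decode().strip() or None


def requires_loop_device(f):
    """Skip the test unless we are root and a loop device is available."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        if os.geteuid() != 0:
            raise unittest.SkipTest("requires root privileges")
        if free_loop_device() is None:
            raise unittest.SkipTest("no free loop device")
        return f(*args, **kwargs)
    return wrapper
```

The check is racy (another job may take the device before the test does), so it only reduces noise. It does not replace running a single test per VM.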
>
> The best way to run such tests is to start a stateless VM based on a template
> that includes all the requirements, so we don't need to pay for yum install
> on each test (may take 2-3 minutes).
>
> Some of our customers are using similar setups. Using such a setup for our
> own tests is the best thing we can do to improve the product.
>
>>
>> e.
>>
>> On Wed, Dec 23, 2015 at 4:48 PM, Dan Kenigsberg <danken@redhat.com> wrote:
>>>
>>> On Wed, Dec 23, 2015 at 03:21:31AM +0200, Nir Soffer wrote:
>>> > Hi all,
>>> >
>>> > We see too many failures of tests using loop devices. Is it possible
>>> > that we run tests concurrently on the same slave, using all the
>>> > available loop devices, or maybe creating races between different tests?
>>> >
>>> > It seems that we need a new decorator for disabling tests on the CI
>>> > slaves, since this environment is too fragile.
>>> >
>>> > Here are some failures:
>>> >
>>> > 01:10:33
>>> > ======================================================================
>>> > 01:10:33 ERROR: testLoopMount (mountTests.MountTests)
>>> > 01:10:33
>>> > ----------------------------------------------------------------------
>>> > 01:10:33 Traceback (most recent call last):
>>> > 01:10:33   File
>>> >
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py",
>>> > line 128, in testLoopMount
>>> > 01:10:33     m.mount(mntOpts="loop")
>>> > 01:10:33   File
>>> >
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py",
>>> > line 225, in mount
>>> > 01:10:33     return self._runcmd(cmd, timeout)
>>> > 01:10:33   File
>>> >
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py",
>>> > line 241, in _runcmd
>>> > 01:10:33     raise MountError(rc, ";".join((out, err)))
>>> > 01:10:33 MountError: (32, ';mount: /tmp/tmpZuJRNk: failed to setup
>>> > loop device: No such file or directory\n')
>>> > 01:10:33 -------------------- >> begin captured logging <<
>>> > --------------------
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1
>>> > /sbin/mkfs.ext2 -F /tmp/tmpZuJRNk (cwd None)
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13
>>> > (17-May-2015)\n'; <rc> = 0
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1
>>> > /usr/bin/mount -o loop /tmp/tmpZuJRNk /var/tmp/tmpJO52Xj (cwd None)
>>> > 01:10:33 --------------------- >> end captured logging <<
>>> > ---------------------
>>> > 01:10:33
>>> > 01:10:33
>>> > ======================================================================
>>> > 01:10:33 ERROR: testSymlinkMount (mountTests.MountTests)
>>> > 01:10:33
>>> > ----------------------------------------------------------------------
>>> > 01:10:33 Traceback (most recent call last):
>>> > 01:10:33   File
>>> >
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py",
>>> > line 150, in testSymlinkMount
>>> > 01:10:33     m.mount(mntOpts="loop")
>>> > 01:10:33   File
>>> >
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py",
>>> > line 225, in mount
>>> > 01:10:33     return self._runcmd(cmd, timeout)
>>> > 01:10:33   File
>>> >
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py",
>>> > line 241, in _runcmd
>>> > 01:10:33     raise MountError(rc, ";".join((out, err)))
>>> > 01:10:33 MountError: (32, ';mount: /var/tmp/tmp1UQFPz/backing.img:
>>> > failed to setup loop device: No such file or directory\n')
>>> > 01:10:33 -------------------- >> begin captured logging <<
>>> > --------------------
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1
>>> > /sbin/mkfs.ext2 -F /var/tmp/tmp1UQFPz/backing.img (cwd None)
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13
>>> > (17-May-2015)\n'; <rc> = 0
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1
>>> > /usr/bin/mount -o loop /var/tmp/tmp1UQFPz/link_to_image
>>> > /var/tmp/tmp1UQFPz/mountpoint (cwd None)
>>> > 01:10:33 --------------------- >> end captured logging <<
>>> > ---------------------
>>> > 01:10:33
>>> > 01:10:33
>>> > ======================================================================
>>> > 01:10:33 ERROR: test_getDevicePartedInfo
>>> > (parted_utils_tests.PartedUtilsTests)
>>> > 01:10:33
>>> > ----------------------------------------------------------------------
>>> > 01:10:33 Traceback (most recent call last):
>>> > 01:10:33   File
>>> >
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/testValidation.py",
>>> > line 97, in wrapper
>>> > 01:10:33     return f(*args, **kwargs)
>>> > 01:10:33   File
>>> >
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/parted_utils_tests.py",
>>> > line 61, in setUp
>>> > 01:10:33     self.assertEquals(rc, 0)
>>> > 01:10:33 AssertionError: 1 != 0
>>> > 01:10:33 -------------------- >> begin captured logging <<
>>> > --------------------
>>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 dd if=/dev/zero
>>> > of=/tmp/tmpasV8TD bs=100M count=1 (cwd None)
>>> > 01:10:33 root: DEBUG: SUCCESS: <err> = '1+0 records in\n1+0 records
>>> > out\n104857600 bytes (105 MB) copied, 0.368498 s, 285 MB/s\n'; <rc> =
>>> > 0
>>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 losetup -f
>>> > --show /tmp/tmpasV8TD (cwd None)
>>> > 01:10:33 root: DEBUG: FAILED: <err> = 'losetup: /tmp/tmpasV8TD: failed
>>> > to set up loop device: No such file or directory\n'; <rc> = 1
>>> > 01:10:33 --------------------- >> end captured logging <<
>>> > ---------------------
>>> >
>>>
>>> I've reluctantly marked another test as broken in
>>> https://gerrit.ovirt.org/50484
>>> due to a similar problem.
>>> Your idea of a @brokentest_ci decorator is slightly less bad: at least we
>>> do not ignore errors in this test when run on non-CI platforms.
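
A rough sketch of what such a decorator might look like, to make the idea concrete. It is not the actual vdsm implementation. It assumes Python 3 and that the CI slaves can be recognized by the `JENKINS_URL` environment variable, which should be checked against the real slave configuration. The `environ` parameter is a hypothetical hook added only to make the sketch easy to try out.

```python
import functools
import os
import unittest


def running_on_ci(environ=None):
    """Return True if the assumed CI marker variable is set."""
    environ = os.environ if environ is None else environ
    return "JENKINS_URL" in environ


def brokentest_ci(reason, environ=None):
    """Skip the decorated test on CI only; run it normally elsewhere."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            if running_on_ci(environ):
                raise unittest.SkipTest("broken on CI: %s" % reason)
            # Outside CI the test runs as usual, so real errors still show.
            return f(*args, **kwargs)
        return wrapper
    return decorator
```

A test would then be marked with something like `@brokentest_ci("no loop devices on slaves")`, and developers running it locally as root would still see genuine failures.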
>>>
>>> Regards,
>>> Dan.
>>>
>>> _______________________________________________
>>> Infra mailing list
>>> Infra@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/infra
>>>
>>>
>>
>>
>>
>> --
>> Eyal Edri
>> Associate Manager
>> EMEA ENG Virtualization R&D
>> Red Hat Israel
>>
>> phone: +972-9-7692018
>> irc: eedri (on #tlv #rhev-dev #rhev-integ)



--
Fabian Deutsch <fdeutsch@redhat.com>
RHEV Hypervisor
Red Hat