<p dir="ltr">I remembered vaguely that restarting the VM helps, but I don't think we know the root cause. </p>
<p dir="ltr">Adding Barak to help with the restart. </p>
<div class="gmail_quote">On Jan 6, 2016 10:20 AM, "Fabian Deutsch" <<a href="mailto:fdeutsch@redhat.com">fdeutsch@redhat.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hey,<br>
<br>
our Node Next builds are also failing with some error around loop devices.<br>
<br>
This worked just before Christmas, but has been failing constantly this year.<br>
<br>
Is the root cause already known?<br>
<br>
Ryan and Tolik were looking into this from the Node side.<br>
<br>
- fabian<br>
<br>
<br>
On Wed, Dec 23, 2015 at 4:52 PM, Nir Soffer <<a href="mailto:nsoffer@redhat.com">nsoffer@redhat.com</a>> wrote:<br>
> On Wed, Dec 23, 2015 at 5:11 PM, Eyal Edri <<a href="mailto:eedri@redhat.com">eedri@redhat.com</a>> wrote:<br>
>> I'm guessing this will be solved by running it on lago?<br>
>> Isn't that what Yaniv is working on now?<br>
><br>
> Yes, this may be more stable, but I heard that lago setup takes about<br>
> an hour, and the whole<br>
> run about 3 hours, so a lot of work is needed until we can use it.<br>
><br>
>> or these are unit tests and not functional?<br>
><br>
> That's the problem: these tests fail because they do not test our code,<br>
> but the integration of our code with the environment. For example, if the test<br>
> cannot find an available loop device, the test will fail.<br>
><br>
> I think we must move these tests to the integration test package,<br>
> which does not run on the CI. These tests can only be run on a VM with<br>
> root privileges, and only a single test per VM at a time, to avoid races<br>
> when accessing shared resources (devices, network, etc.).<br>
><br>
> The best way to run such tests is to start a stateless VM based on a template<br>
> that includes all the requirements, so we don't need to pay for a yum install<br>
> on each test (which may take 2-3 minutes).<br>
><br>
> Some of our customers are using similar setups. Using such a setup for our<br>
> own tests is the best thing we can do to improve the product.<br>
><br>
>><br>
>> e.<br>
>><br>
>> On Wed, Dec 23, 2015 at 4:48 PM, Dan Kenigsberg <<a href="mailto:danken@redhat.com">danken@redhat.com</a>> wrote:<br>
>>><br>
>>> On Wed, Dec 23, 2015 at 03:21:31AM +0200, Nir Soffer wrote:<br>
>>> > Hi all,<br>
>>> ><br>
>>> > We see too many failures of tests using loop devices. Is it possible<br>
>>> > that we run tests<br>
>>> > concurrently on the same slave, using all the available loop devices, or<br>
>>> > maybe<br>
>>> > creating races between different tests?<br>
>>> ><br>
>>> > It seems that we need a new decorator for disabling tests on the CI<br>
>>> > slaves, since this<br>
>>> > environment is too fragile.<br>
>>> ><br>
>>> > Here are some failures:<br>
>>> ><br>
>>> > 01:10:33<br>
>>> > ======================================================================<br>
>>> > 01:10:33 ERROR: testLoopMount (mountTests.MountTests)<br>
>>> > 01:10:33<br>
>>> > ----------------------------------------------------------------------<br>
>>> > 01:10:33 Traceback (most recent call last):<br>
>>> > 01:10:33 File<br>
>>> ><br>
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py",<br>
>>> > line 128, in testLoopMount<br>
>>> > 01:10:33 m.mount(mntOpts="loop")<br>
>>> > 01:10:33 File<br>
>>> ><br>
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py",<br>
>>> > line 225, in mount<br>
>>> > 01:10:33 return self._runcmd(cmd, timeout)<br>
>>> > 01:10:33 File<br>
>>> ><br>
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py",<br>
>>> > line 241, in _runcmd<br>
>>> > 01:10:33 raise MountError(rc, ";".join((out, err)))<br>
>>> > 01:10:33 MountError: (32, ';mount: /tmp/tmpZuJRNk: failed to setup<br>
>>> > loop device: No such file or directory\n')<br>
>>> > 01:10:33 -------------------- >> begin captured logging <<<br>
>>> > --------------------<br>
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1<br>
>>> > /sbin/mkfs.ext2 -F /tmp/tmpZuJRNk (cwd None)<br>
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13<br>
>>> > (17-May-2015)\n'; <rc> = 0<br>
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1<br>
>>> > /usr/bin/mount -o loop /tmp/tmpZuJRNk /var/tmp/tmpJO52Xj (cwd None)<br>
>>> > 01:10:33 --------------------- >> end captured logging <<<br>
>>> > ---------------------<br>
>>> > 01:10:33<br>
>>> > 01:10:33<br>
>>> > ======================================================================<br>
>>> > 01:10:33 ERROR: testSymlinkMount (mountTests.MountTests)<br>
>>> > 01:10:33<br>
>>> > ----------------------------------------------------------------------<br>
>>> > 01:10:33 Traceback (most recent call last):<br>
>>> > 01:10:33 File<br>
>>> ><br>
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py",<br>
>>> > line 150, in testSymlinkMount<br>
>>> > 01:10:33 m.mount(mntOpts="loop")<br>
>>> > 01:10:33 File<br>
>>> ><br>
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py",<br>
>>> > line 225, in mount<br>
>>> > 01:10:33 return self._runcmd(cmd, timeout)<br>
>>> > 01:10:33 File<br>
>>> ><br>
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py",<br>
>>> > line 241, in _runcmd<br>
>>> > 01:10:33 raise MountError(rc, ";".join((out, err)))<br>
>>> > 01:10:33 MountError: (32, ';mount: /var/tmp/tmp1UQFPz/backing.img:<br>
>>> > failed to setup loop device: No such file or directory\n')<br>
>>> > 01:10:33 -------------------- >> begin captured logging <<<br>
>>> > --------------------<br>
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1<br>
>>> > /sbin/mkfs.ext2 -F /var/tmp/tmp1UQFPz/backing.img (cwd None)<br>
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13<br>
>>> > (17-May-2015)\n'; <rc> = 0<br>
>>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1<br>
>>> > /usr/bin/mount -o loop /var/tmp/tmp1UQFPz/link_to_image<br>
>>> > /var/tmp/tmp1UQFPz/mountpoint (cwd None)<br>
>>> > 01:10:33 --------------------- >> end captured logging <<<br>
>>> > ---------------------<br>
>>> > 01:10:33<br>
>>> > 01:10:33<br>
>>> > ======================================================================<br>
>>> > 01:10:33 ERROR: test_getDevicePartedInfo<br>
>>> > (parted_utils_tests.PartedUtilsTests)<br>
>>> > 01:10:33<br>
>>> > ----------------------------------------------------------------------<br>
>>> > 01:10:33 Traceback (most recent call last):<br>
>>> > 01:10:33 File<br>
>>> ><br>
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/testValidation.py",<br>
>>> > line 97, in wrapper<br>
>>> > 01:10:33 return f(*args, **kwargs)<br>
>>> > 01:10:33 File<br>
>>> ><br>
>>> > "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/parted_utils_tests.py",<br>
>>> > line 61, in setUp<br>
>>> > 01:10:33 self.assertEquals(rc, 0)<br>
>>> > 01:10:33 AssertionError: 1 != 0<br>
>>> > 01:10:33 -------------------- >> begin captured logging <<<br>
>>> > --------------------<br>
>>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 dd if=/dev/zero<br>
>>> > of=/tmp/tmpasV8TD bs=100M count=1 (cwd None)<br>
>>> > 01:10:33 root: DEBUG: SUCCESS: <err> = '1+0 records in\n1+0 records<br>
>>> > out\n104857600 bytes (105 MB) copied, 0.368498 s, 285 MB/s\n'; <rc> =<br>
>>> > 0<br>
>>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 losetup -f<br>
>>> > --show /tmp/tmpasV8TD (cwd None)<br>
>>> > 01:10:33 root: DEBUG: FAILED: <err> = 'losetup: /tmp/tmpasV8TD: failed<br>
>>> > to set up loop device: No such file or directory\n'; <rc> = 1<br>
>>> > 01:10:33 --------------------- >> end captured logging <<<br>
>>> > ---------------------<br>
>>> ><br>
>>><br>
>>> I've reluctantly marked another test as broken in<br>
>>> <a href="https://gerrit.ovirt.org/50484" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/50484</a><br>
>>> due to a similar problem.<br>
>>> Your idea of a @brokentest_ci decorator is slightly less bad - at least we<br>
>>> do not ignore errors in this test when run on non-CI platforms.<br>
>>><br>
>>> Regards,<br>
>>> Dan.<br>
>>><br>
>>> _______________________________________________<br>
>>> Infra mailing list<br>
>>> <a href="mailto:Infra@ovirt.org">Infra@ovirt.org</a><br>
>>> <a href="http://lists.ovirt.org/mailman/listinfo/infra" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/infra</a><br>
>>><br>
>>><br>
>><br>
>><br>
>><br>
>> --<br>
>> Eyal Edri<br>
>> Associate Manager<br>
>> EMEA ENG Virtualization R&D<br>
>> Red Hat Israel<br>
>><br>
>> phone: <a href="tel:%2B972-9-7692018" value="+97297692018">+972-9-7692018</a><br>
>> irc: eedri (on #tlv #rhev-dev #rhev-integ)<br>
> _______________________________________________<br>
> Infra mailing list<br>
> <a href="mailto:Infra@ovirt.org">Infra@ovirt.org</a><br>
> <a href="http://lists.ovirt.org/mailman/listinfo/infra" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/infra</a><br>
<br>
<br>
<br>
--<br>
Fabian Deutsch <<a href="mailto:fdeutsch@redhat.com">fdeutsch@redhat.com</a>><br>
RHEV Hypervisor<br>
Red Hat<br>
</blockquote></div>
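<p dir="ltr">[Editor's note: the @brokentest_ci decorator discussed in the thread could look roughly like the minimal sketch below. All names here, including detecting a CI slave via the JENKINS_URL environment variable, are assumptions for illustration, not actual VDSM code.]</p>

```python
# Hypothetical sketch of a @brokentest_ci decorator (names assumed, not
# actual VDSM code): skip a known-broken test only when running on a CI
# slave, but keep running it everywhere else so failures are not ignored.
import functools
import os
import unittest


def brokentest_ci(reason="Broken on CI slaves"):
    """Skip the decorated test on CI; run it normally elsewhere."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            # Crude CI detection via an environment variable Jenkins sets
            # (an assumption; real detection may differ).
            if os.environ.get("JENKINS_URL"):
                raise unittest.SkipTest(reason)
            return f(*args, **kwargs)
        return wrapper
    return decorator


@brokentest_ci(reason="loop devices unavailable on CI slaves")
def test_loop_mount():
    # A real test body would set up and mount a loop device here.
    pass
```

<p dir="ltr">This keeps the test active on developer machines while avoiding false failures from the fragile CI environment.</p>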