On Mon, Nov 12, 2018 at 7:03 AM Germano Veit Michel <germano(a)redhat.com>
wrote:
On Fri, Nov 9, 2018 at 3:25 PM Germano Veit Michel <germano(a)redhat.com>
wrote:
>
>
> On Wed, Nov 7, 2018 at 9:34 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>
>> On Wed, Nov 7, 2018 at 6:27 AM Germano Veit Michel <germano(a)redhat.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have seen CI failing with some "random" errors which makes
submitting
>>> a series of patches a bit annoying, as some patch always fails with some
>>> error, then next run another patch fails.
>>>
>>> One of these problems is TestMount.testSymlinkMount. Looks like its
>>> getting EBUSY when doing a rmdir. So probably something still using it?
>>>
>>> =================================== FAILURES
===================================
>>> __________________________ TestMount.testSymlinkMount
__________________________
>>>
>>> args = (<storage.mount_test.TestMount testMethod=testSymlinkMount>,)
>>> kwargs = {}
>>>
>>> @wraps(f)
>>> def wrapper(*args, **kwargs):
>>> if cond:
>>> raise SkipTest(reason)
>>> > return f(*args, **kwargs)
>>>
>>> testValidation.py:274:
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_
>>> testValidation.py:333: in wrapper
>>> return f(*args, **kwargs)
>>> storage/mount_test.py:167: in testSymlinkMount
>>> m.umount()
>>> /usr/lib64/python2.7/contextlib.py:35: in __exit__
>>> self.gen.throw(type, value, traceback)
>>> testlib.py:218: in namedTemporaryDir
>>> shutil.rmtree(tmpDir)
>>> /usr/lib64/python2.7/shutil.py:261: in rmtree
>>> rmtree(fullname, ignore_errors, onerror)
>>> /usr/lib64/python2.7/shutil.py:270: in rmtree
>>> onerror(os.rmdir, path, sys.exc_info())
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_
>>>
>>> path = '/var/tmp/tmpOYMiHx/mountpoint', ignore_errors = False
>>> onerror = <function onerror at 0x7fd68afc9938>
>>>
>>> def rmtree(path, ignore_errors=False, onerror=None):
>>> """Recursively delete a directory tree.
>>>
>>> If ignore_errors is set, errors are ignored; otherwise, if onerror
>>> is set, it is called to handle the error with arguments (func,
>>> path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
>>> path is the argument to that function that caused it to fail; and
>>> exc_info is a tuple returned by sys.exc_info(). If ignore_errors
>>> is false and onerror is None, an exception is raised.
>>>
>>> """
>>> if ignore_errors:
>>> def onerror(*args):
>>> pass
>>> elif onerror is None:
>>> def onerror(*args):
>>> raise
>>> try:
>>> if os.path.islink(path):
>>> # symlinks to directories are forbidden, see bug #1669
>>> raise OSError("Cannot call rmtree on a symbolic
link")
>>> except OSError:
>>> onerror(os.path.islink, path, sys.exc_info())
>>> # can't continue even if onerror hook returns
>>> return
>>> names = []
>>> try:
>>> names = os.listdir(path)
>>> except os.error, err:
>>> onerror(os.listdir, path, sys.exc_info())
>>> for name in names:
>>> fullname = os.path.join(path, name)
>>> try:
>>> mode = os.lstat(fullname).st_mode
>>> except os.error:
>>> mode = 0
>>> if stat.S_ISDIR(mode):
>>> rmtree(fullname, ignore_errors, onerror)
>>> else:
>>> try:
>>> os.remove(fullname)
>>> except os.error, err:
>>> onerror(os.remove, fullname, sys.exc_info())
>>> try:
>>> > os.rmdir(path)
>>> E OSError: [Errno 16] Device or resource busy:
'/var/tmp/tmpOYMiHx/mountpoint'
>>>
>>> /usr/lib64/python2.7/shutil.py:268: OSError
>>> ----------------------------- Captured stderr call
-----------------------------
>>> 2018-11-07 00:41:55,518 DEBUG (MainThread) [storage.Misc.excCmd]
/usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmpOYMiHx/backing.img (cwd
None) (commands:68)
>>> 2018-11-07 00:41:55,561 DEBUG (MainThread) [storage.Misc.excCmd] SUCCESS:
<err> = 'mke2fs 1.44.2 (14-May-2018)\n'; <rc> = 0 (commands:89)
>>> 2018-11-07 00:41:55,562 INFO (MainThread) [storage.Mount] mounting
/var/tmp/tmpOYMiHx/link_to_image at /var/tmp/tmpOYMiHx/mountpoint (mount:204)
>>> 2018-11-07 00:41:55,563 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list
0-1 /usr/bin/mount -o loop /var/tmp/tmpOYMiHx/link_to_image /var/tmp/tmpOYMiHx/mountpoint
(cwd None) (commands:68)
>>> 2018-11-07 00:41:55,598 DEBUG (MainThread) [root] SUCCESS: <err> =
''; <rc> = 0 (commands:89)
>>> 2018-11-07 00:41:55,599 DEBUG (MainThread) [storage.Mount]
/var/tmp/tmpOYMiHx/mountpoint mounted: 0.03 seconds (utils:454)
>>> 2018-11-07 00:41:55,599 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list
0-1 /sbin/udevadm settle --timeout=5 (cwd None) (commands:68)
>>> 2018-11-07 00:41:55,624 DEBUG (MainThread) [root] SUCCESS: <err> =
''; <rc> = 0 (commands:89)
>>> 2018-11-07 00:41:55,625 DEBUG (MainThread) [storage.Mount] Waiting for udev
mount events: 0.03 seconds (utils:454)
>>> 2018-11-07 00:41:55,628 INFO (MainThread) [storage.Mount] unmounting
/var/tmp/tmpOYMiHx/mountpoint (mount:212)
>>> 2018-11-07 00:41:55,629 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list
0-1 /usr/bin/umount /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68)
>>> 2018-11-07 00:41:55,658 DEBUG (MainThread) [root] FAILED: <err> =
'umount: /var/tmp/tmpOYMiHx/mountpoint: target is busy.\n'; <rc> = 32
(commands:89)
>>>
>>>
>> The test is doing:
>>
>> 163 m.mount(mntOpts="loop")
>> 164 try:
>> 165 self.assertTrue(m.isMounted())
>> 166 finally:
>> 167 m.umount()
>>
>> So nothing can use this mount - except systemd, creating a mount unit
>> for every mount
>> created by "mount".
>>
>> If you can reproduce this with calling "mount" and "umount"
in a loop,
>> we can open
>> a systemd bug for this.
>>
>> Adding little sleep before unmounting will probably avoid this issue. In
>> production we
>> never unmount write after mount.
>>
>> A possible fix is to use m.umount(force=True, lazy=True) - this is how
>> we unmount
>> file based storage mounts in lib/vdsm/storage/storageServer.py:
>>
>> 201 def disconnect(self):
>> 202 self._mount.umount(True, True)
>>
>> Nir
>>
>
> Thanks Nir!
>
> I'll see if I can reproduce it next week. I have only seen this one on
> FC28, so it could be a FC28 bug.
> Yesterday the rebase CI also failed on 2 patches, but the errors were
> different and I did not have time to investigate.
>
Hit it again, but now on testLoopMount() instead of testSymlinkMount().
Tried reproducing it on F29 (not 28), did not hit it.
#!/bin/bash
truncate -s 100M /tmp/a
mkfs.ext2 /tmp/a
mkdir /tmp/b
while [ True ]
do
mount -o loop /tmp/a /tmp/b
umount /tmp/b
done
Do you want me to submit a patch with m.umount(force=True, lazy=True) for
all those tests? Would it be correct to do this?
I reproduce these issue today in the CI, and look at git history.
We had force=True in the past, but it was removed in:
commit c8a216677e82aa61aab0c98a1021265121f8db57
Author: Nir Soffer <nsoffer(a)redhat.com>
Date: Sun Apr 24 01:05:33 2016 +0300
mount: Remove pointless force=True usage
Using force=True when unmounting loop device is pointless - this option
will cause umount(8) to call the umount(2) syscall with the MNT_FORCE
flag, which is useful only for NFS mounts.
Change-Id: I31ff97d882f292f9c6325903ed9072b7146f682e
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
Reviewed-on:
https://gerrit.ovirt.org/56524
Reviewed-by: Dan Kenigsberg <danken(a)redhat.com>
Reviewed-by: Allon Mureinik <amureini(a)redhat.com>
Continuous-Integration: Jenkins CI
We fixed similar issue later in:
commit d158b1a6f54ba67d81cb163593e3c330c77d36fd
Author: Nir Soffer <nsoffer(a)redhat.com>
Date: Sun Apr 24 22:36:47 2016 +0300
mount: Wait for systemd temporary mount unit
Systemd is monitoring mountpoints created during runtime by creating a
temporary mount units. For example:
# systemctl status /tmp/mount-1.DUNxySxI2R/wrk
● tmp-mount\x2d1.DUNxySxI2R-wrk.mount - /tmp/mount-1.DUNxySxI2R/wrk
Loaded: loaded (/proc/self/mountinfo)
Active: active (mounted) since Sun 2016-04-24 22:37:55 IDT; 14ms
ago
Where: /tmp/mount-1.DUNxySxI2R/wrk
What: /dev/loop2
Mounting and unmounting quickly as we do in the tests, or when creating
a floppy image may confuse systemd, and lead to various errors:
- mountpoint is not mounted when trying to access files or unmount
- mountpoint is busy when trying to unmount
These are probably systemd or mount bugs. Until these issues are
resolve, we need to find a workaround to avoid failures in the tests or
when creating a floppy images.
This patch add udevadm.settle() call after successful mount and umount
operations.
Change-Id: I9b0318eb534f781f127882983527e1a221820478
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
Reviewed-on:
https://gerrit.ovirt.org/56544
Reviewed-by: Dan Kenigsberg <danken(a)redhat.com>
Continuous-Integration: Jenkins CI
Reviewed-by: Francesco Romani <fromani(a)redhat.com>
Based on this, I posted
https://gerrit.ovirt.org/c/95719/
Please try to rebase your patches on top of this, and report if it fixes
this issue.
Nir