On Fri, Nov 9, 2018 at 3:25 PM Germano Veit Michel <germano@redhat.com> wrote:


On Wed, Nov 7, 2018 at 9:34 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Nov 7, 2018 at 6:27 AM Germano Veit Michel <germano@redhat.com> wrote:
Hi,

I have seen CI failing with some "random" errors which makes submitting a series of patches a bit annoying, as some patch always fails with some error, then next run another patch fails.

One of these problems is TestMount.testSymlinkMount. Looks like its getting EBUSY when doing a rmdir. So probably something still using it?

=================================== FAILURES ===================================
__________________________ TestMount.testSymlinkMount __________________________

args = (<storage.mount_test.TestMount testMethod=testSymlinkMount>,)
kwargs = {}

    @wraps(f)
    def wrapper(*args, **kwargs):
        if cond:
            raise SkipTest(reason)
>       return f(*args, **kwargs)

testValidation.py:274: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
testValidation.py:333: in wrapper
    return f(*args, **kwargs)
storage/mount_test.py:167: in testSymlinkMount
    m.umount()
/usr/lib64/python2.7/contextlib.py:35: in __exit__
    self.gen.throw(type, value, traceback)
testlib.py:218: in namedTemporaryDir
    shutil.rmtree(tmpDir)
/usr/lib64/python2.7/shutil.py:261: in rmtree
    rmtree(fullname, ignore_errors, onerror)
/usr/lib64/python2.7/shutil.py:270: in rmtree
    onerror(os.rmdir, path, sys.exc_info())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

path = '/var/tmp/tmpOYMiHx/mountpoint', ignore_errors = False
onerror = <function onerror at 0x7fd68afc9938>

    def rmtree(path, ignore_errors=False, onerror=None):
        """Recursively delete a directory tree.
    
        If ignore_errors is set, errors are ignored; otherwise, if onerror
        is set, it is called to handle the error with arguments (func,
        path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
        path is the argument to that function that caused it to fail; and
        exc_info is a tuple returned by sys.exc_info().  If ignore_errors
        is false and onerror is None, an exception is raised.
    
        """
        if ignore_errors:
            def onerror(*args):
                pass
        elif onerror is None:
            def onerror(*args):
                raise
        try:
            if os.path.islink(path):
                # symlinks to directories are forbidden, see bug #1669
                raise OSError("Cannot call rmtree on a symbolic link")
        except OSError:
            onerror(os.path.islink, path, sys.exc_info())
            # can't continue even if onerror hook returns
            return
        names = []
        try:
            names = os.listdir(path)
        except os.error, err:
            onerror(os.listdir, path, sys.exc_info())
        for name in names:
            fullname = os.path.join(path, name)
            try:
                mode = os.lstat(fullname).st_mode
            except os.error:
                mode = 0
            if stat.S_ISDIR(mode):
                rmtree(fullname, ignore_errors, onerror)
            else:
                try:
                    os.remove(fullname)
                except os.error, err:
                    onerror(os.remove, fullname, sys.exc_info())
        try:
>           os.rmdir(path)
E           OSError: [Errno 16] Device or resource busy: '/var/tmp/tmpOYMiHx/mountpoint'

/usr/lib64/python2.7/shutil.py:268: OSError
----------------------------- Captured stderr call -----------------------------
2018-11-07 00:41:55,518 DEBUG (MainThread) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmpOYMiHx/backing.img (cwd None) (commands:68)
2018-11-07 00:41:55,561 DEBUG (MainThread) [storage.Misc.excCmd] SUCCESS: <err> = 'mke2fs 1.44.2 (14-May-2018)\n'; <rc> = 0 (commands:89)
2018-11-07 00:41:55,562 INFO  (MainThread) [storage.Mount] mounting /var/tmp/tmpOYMiHx/link_to_image at /var/tmp/tmpOYMiHx/mountpoint (mount:204)
2018-11-07 00:41:55,563 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /var/tmp/tmpOYMiHx/link_to_image /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68)
2018-11-07 00:41:55,598 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89)
2018-11-07 00:41:55,599 DEBUG (MainThread) [storage.Mount] /var/tmp/tmpOYMiHx/mountpoint mounted: 0.03 seconds (utils:454)
2018-11-07 00:41:55,599 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /sbin/udevadm settle --timeout=5 (cwd None) (commands:68)
2018-11-07 00:41:55,624 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89)
2018-11-07 00:41:55,625 DEBUG (MainThread) [storage.Mount] Waiting for udev mount events: 0.03 seconds (utils:454)
2018-11-07 00:41:55,628 INFO  (MainThread) [storage.Mount] unmounting /var/tmp/tmpOYMiHx/mountpoint (mount:212)
2018-11-07 00:41:55,629 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/umount /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68)
2018-11-07 00:41:55,658 DEBUG (MainThread) [root] FAILED: <err> = 'umount: /var/tmp/tmpOYMiHx/mountpoint: target is busy.\n'; <rc> = 32 (commands:89)

The test is doing:

163             m.mount(mntOpts="loop")
164             try:
165                 self.assertTrue(m.isMounted())
166             finally:
167                 m.umount()

So nothing can use this mount - except systemd, creating a mount unit for every mount
created by "mount".

If you can reproduce this with calling "mount" and "umount" in a loop, we can open
a systemd bug for this.

Adding little sleep before unmounting will probably avoid this issue. In production we
never unmount write after mount.

A possible fix is to use m.umount(force=True, lazy=True) - this is how we unmount
file based storage mounts in lib/vdsm/storage/storageServer.py:

201     def disconnect(self):
202         self._mount.umount(True, True)

Nir 

Thanks Nir!

I'll see if I can reproduce it next week. I have only seen this one on FC28, so it could be a FC28 bug. 
Yesterday the rebase CI also failed on 2 patches, but the errors were different and I did not have time to investigate. 

Hit it again, but now on testLoopMount() instead of testSymlinkMount().

Tried reproducing it on F29 (not 28), did not hit it.

#!/bin/bash

truncate -s 100M /tmp/a
mkfs.ext2 /tmp/a
mkdir /tmp/b
while [ True ]
do
    mount -o loop /tmp/a /tmp/b
    umount /tmp/b
done

Do you want me to submit a patch with m.umount(force=True, lazy=True) for all those tests? Would it be correct to do this?