[VDSM] TestMount.testSymlinkMount failing

Hi, I have seen CI failing with some "random" errors which makes submitting a series of patches a bit annoying, as some patch always fails with some error, then next run another patch fails. One of these problems is TestMount.testSymlinkMount. Looks like its getting EBUSY when doing a rmdir. So probably something still using it? =================================== FAILURES =================================== __________________________ TestMount.testSymlinkMount __________________________ args = (<storage.mount_test.TestMount testMethod=testSymlinkMount>,) kwargs = {} @wraps(f) def wrapper(*args, **kwargs): if cond: raise SkipTest(reason)
return f(*args, **kwargs)
testValidation.py:274: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ testValidation.py:333: in wrapper return f(*args, **kwargs) storage/mount_test.py:167: in testSymlinkMount m.umount() /usr/lib64/python2.7/contextlib.py:35: in __exit__ self.gen.throw(type, value, traceback) testlib.py:218: in namedTemporaryDir shutil.rmtree(tmpDir) /usr/lib64/python2.7/shutil.py:261: in rmtree rmtree(fullname, ignore_errors, onerror) /usr/lib64/python2.7/shutil.py:270: in rmtree onerror(os.rmdir, path, sys.exc_info()) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ path = '/var/tmp/tmpOYMiHx/mountpoint', ignore_errors = False onerror = <function onerror at 0x7fd68afc9938> def rmtree(path, ignore_errors=False, onerror=None): """Recursively delete a directory tree. If ignore_errors is set, errors are ignored; otherwise, if onerror is set, it is called to handle the error with arguments (func, path, exc_info) where func is os.listdir, os.remove, or os.rmdir; path is the argument to that function that caused it to fail; and exc_info is a tuple returned by sys.exc_info(). If ignore_errors is false and onerror is None, an exception is raised. """ if ignore_errors: def onerror(*args): pass elif onerror is None: def onerror(*args): raise try: if os.path.islink(path): # symlinks to directories are forbidden, see bug #1669 raise OSError("Cannot call rmtree on a symbolic link") except OSError: onerror(os.path.islink, path, sys.exc_info()) # can't continue even if onerror hook returns return names = [] try: names = os.listdir(path) except os.error, err: onerror(os.listdir, path, sys.exc_info()) for name in names: fullname = os.path.join(path, name) try: mode = os.lstat(fullname).st_mode except os.error: mode = 0 if stat.S_ISDIR(mode): rmtree(fullname, ignore_errors, onerror) else: try: os.remove(fullname) except os.error, err: onerror(os.remove, fullname, sys.exc_info()) try:
os.rmdir(path)
E OSError: [Errno 16] Device or resource busy: '/var/tmp/tmpOYMiHx/mountpoint' /usr/lib64/python2.7/shutil.py:268: OSError ----------------------------- Captured stderr call ----------------------------- 2018-11-07 00:41:55,518 DEBUG (MainThread) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmpOYMiHx/backing.img (cwd None) (commands:68) 2018-11-07 00:41:55,561 DEBUG (MainThread) [storage.Misc.excCmd] SUCCESS: <err> = 'mke2fs 1.44.2 (14-May-2018)\n'; <rc> = 0 (commands:89) 2018-11-07 00:41:55,562 INFO (MainThread) [storage.Mount] mounting /var/tmp/tmpOYMiHx/link_to_image at /var/tmp/tmpOYMiHx/mountpoint (mount:204) 2018-11-07 00:41:55,563 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /var/tmp/tmpOYMiHx/link_to_image /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,598 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,599 DEBUG (MainThread) [storage.Mount] /var/tmp/tmpOYMiHx/mountpoint mounted: 0.03 seconds (utils:454) 2018-11-07 00:41:55,599 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /sbin/udevadm settle --timeout=5 (cwd None) (commands:68) 2018-11-07 00:41:55,624 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,625 DEBUG (MainThread) [storage.Mount] Waiting for udev mount events: 0.03 seconds (utils:454) 2018-11-07 00:41:55,628 INFO (MainThread) [storage.Mount] unmounting /var/tmp/tmpOYMiHx/mountpoint (mount:212) 2018-11-07 00:41:55,629 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/umount /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,658 DEBUG (MainThread) [root] FAILED: <err> = 'umount: /var/tmp/tmpOYMiHx/mountpoint: target is busy.\n'; <rc> = 32 (commands:89) Full logs here: https://jenkins.ovirt.org/job/vdsm_master_check-patch-fc28-x86_64/1811/artif...

On Wed, Nov 7, 2018 at 6:27 AM Germano Veit Michel <germano@redhat.com> wrote:
Hi,
I have seen CI failing with some "random" errors which makes submitting a series of patches a bit annoying, as some patch always fails with some error, then next run another patch fails.
One of these problems is TestMount.testSymlinkMount. Looks like its getting EBUSY when doing a rmdir. So probably something still using it?
=================================== FAILURES =================================== __________________________ TestMount.testSymlinkMount __________________________
args = (<storage.mount_test.TestMount testMethod=testSymlinkMount>,) kwargs = {}
@wraps(f) def wrapper(*args, **kwargs): if cond: raise SkipTest(reason)
return f(*args, **kwargs)
testValidation.py:274: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ testValidation.py:333: in wrapper return f(*args, **kwargs) storage/mount_test.py:167: in testSymlinkMount m.umount() /usr/lib64/python2.7/contextlib.py:35: in __exit__ self.gen.throw(type, value, traceback) testlib.py:218: in namedTemporaryDir shutil.rmtree(tmpDir) /usr/lib64/python2.7/shutil.py:261: in rmtree rmtree(fullname, ignore_errors, onerror) /usr/lib64/python2.7/shutil.py:270: in rmtree onerror(os.rmdir, path, sys.exc_info()) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
path = '/var/tmp/tmpOYMiHx/mountpoint', ignore_errors = False onerror = <function onerror at 0x7fd68afc9938>
def rmtree(path, ignore_errors=False, onerror=None): """Recursively delete a directory tree.
If ignore_errors is set, errors are ignored; otherwise, if onerror is set, it is called to handle the error with arguments (func, path, exc_info) where func is os.listdir, os.remove, or os.rmdir; path is the argument to that function that caused it to fail; and exc_info is a tuple returned by sys.exc_info(). If ignore_errors is false and onerror is None, an exception is raised.
""" if ignore_errors: def onerror(*args): pass elif onerror is None: def onerror(*args): raise try: if os.path.islink(path): # symlinks to directories are forbidden, see bug #1669 raise OSError("Cannot call rmtree on a symbolic link") except OSError: onerror(os.path.islink, path, sys.exc_info()) # can't continue even if onerror hook returns return names = [] try: names = os.listdir(path) except os.error, err: onerror(os.listdir, path, sys.exc_info()) for name in names: fullname = os.path.join(path, name) try: mode = os.lstat(fullname).st_mode except os.error: mode = 0 if stat.S_ISDIR(mode): rmtree(fullname, ignore_errors, onerror) else: try: os.remove(fullname) except os.error, err: onerror(os.remove, fullname, sys.exc_info()) try:
os.rmdir(path)
E OSError: [Errno 16] Device or resource busy: '/var/tmp/tmpOYMiHx/mountpoint'
/usr/lib64/python2.7/shutil.py:268: OSError ----------------------------- Captured stderr call ----------------------------- 2018-11-07 00:41:55,518 DEBUG (MainThread) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmpOYMiHx/backing.img (cwd None) (commands:68) 2018-11-07 00:41:55,561 DEBUG (MainThread) [storage.Misc.excCmd] SUCCESS: <err> = 'mke2fs 1.44.2 (14-May-2018)\n'; <rc> = 0 (commands:89) 2018-11-07 00:41:55,562 INFO (MainThread) [storage.Mount] mounting /var/tmp/tmpOYMiHx/link_to_image at /var/tmp/tmpOYMiHx/mountpoint (mount:204) 2018-11-07 00:41:55,563 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /var/tmp/tmpOYMiHx/link_to_image /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,598 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,599 DEBUG (MainThread) [storage.Mount] /var/tmp/tmpOYMiHx/mountpoint mounted: 0.03 seconds (utils:454) 2018-11-07 00:41:55,599 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /sbin/udevadm settle --timeout=5 (cwd None) (commands:68) 2018-11-07 00:41:55,624 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,625 DEBUG (MainThread) [storage.Mount] Waiting for udev mount events: 0.03 seconds (utils:454) 2018-11-07 00:41:55,628 INFO (MainThread) [storage.Mount] unmounting /var/tmp/tmpOYMiHx/mountpoint (mount:212) 2018-11-07 00:41:55,629 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/umount /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,658 DEBUG (MainThread) [root] FAILED: <err> = 'umount: /var/tmp/tmpOYMiHx/mountpoint: target is busy.\n'; <rc> = 32 (commands:89)
The test is doing: 163 m.mount(mntOpts="loop") 164 try: 165 self.assertTrue(m.isMounted()) 166 finally: 167 m.umount() So nothing can use this mount - except systemd, creating a mount unit for every mount created by "mount". If you can reproduce this with calling "mount" and "umount" in a loop, we can open a systemd bug for this. Adding little sleep before unmounting will probably avoid this issue. In production we never unmount write after mount. A possible fix is to use m.umount(force=True, lazy=True) - this is how we unmount file based storage mounts in lib/vdsm/storage/storageServer.py: 201 def disconnect(self): 202 self._mount.umount(True, True) Nir

On Wed, Nov 7, 2018 at 9:34 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Nov 7, 2018 at 6:27 AM Germano Veit Michel <germano@redhat.com> wrote:
Hi,
I have seen CI failing with some "random" errors which makes submitting a series of patches a bit annoying, as some patch always fails with some error, then next run another patch fails.
One of these problems is TestMount.testSymlinkMount. Looks like its getting EBUSY when doing a rmdir. So probably something still using it?
=================================== FAILURES =================================== __________________________ TestMount.testSymlinkMount __________________________
args = (<storage.mount_test.TestMount testMethod=testSymlinkMount>,) kwargs = {}
@wraps(f) def wrapper(*args, **kwargs): if cond: raise SkipTest(reason)
return f(*args, **kwargs)
testValidation.py:274: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ testValidation.py:333: in wrapper return f(*args, **kwargs) storage/mount_test.py:167: in testSymlinkMount m.umount() /usr/lib64/python2.7/contextlib.py:35: in __exit__ self.gen.throw(type, value, traceback) testlib.py:218: in namedTemporaryDir shutil.rmtree(tmpDir) /usr/lib64/python2.7/shutil.py:261: in rmtree rmtree(fullname, ignore_errors, onerror) /usr/lib64/python2.7/shutil.py:270: in rmtree onerror(os.rmdir, path, sys.exc_info()) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
path = '/var/tmp/tmpOYMiHx/mountpoint', ignore_errors = False onerror = <function onerror at 0x7fd68afc9938>
def rmtree(path, ignore_errors=False, onerror=None): """Recursively delete a directory tree.
If ignore_errors is set, errors are ignored; otherwise, if onerror is set, it is called to handle the error with arguments (func, path, exc_info) where func is os.listdir, os.remove, or os.rmdir; path is the argument to that function that caused it to fail; and exc_info is a tuple returned by sys.exc_info(). If ignore_errors is false and onerror is None, an exception is raised.
""" if ignore_errors: def onerror(*args): pass elif onerror is None: def onerror(*args): raise try: if os.path.islink(path): # symlinks to directories are forbidden, see bug #1669 raise OSError("Cannot call rmtree on a symbolic link") except OSError: onerror(os.path.islink, path, sys.exc_info()) # can't continue even if onerror hook returns return names = [] try: names = os.listdir(path) except os.error, err: onerror(os.listdir, path, sys.exc_info()) for name in names: fullname = os.path.join(path, name) try: mode = os.lstat(fullname).st_mode except os.error: mode = 0 if stat.S_ISDIR(mode): rmtree(fullname, ignore_errors, onerror) else: try: os.remove(fullname) except os.error, err: onerror(os.remove, fullname, sys.exc_info()) try:
os.rmdir(path)
E OSError: [Errno 16] Device or resource busy: '/var/tmp/tmpOYMiHx/mountpoint'
/usr/lib64/python2.7/shutil.py:268: OSError ----------------------------- Captured stderr call ----------------------------- 2018-11-07 00:41:55,518 DEBUG (MainThread) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmpOYMiHx/backing.img (cwd None) (commands:68) 2018-11-07 00:41:55,561 DEBUG (MainThread) [storage.Misc.excCmd] SUCCESS: <err> = 'mke2fs 1.44.2 (14-May-2018)\n'; <rc> = 0 (commands:89) 2018-11-07 00:41:55,562 INFO (MainThread) [storage.Mount] mounting /var/tmp/tmpOYMiHx/link_to_image at /var/tmp/tmpOYMiHx/mountpoint (mount:204) 2018-11-07 00:41:55,563 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /var/tmp/tmpOYMiHx/link_to_image /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,598 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,599 DEBUG (MainThread) [storage.Mount] /var/tmp/tmpOYMiHx/mountpoint mounted: 0.03 seconds (utils:454) 2018-11-07 00:41:55,599 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /sbin/udevadm settle --timeout=5 (cwd None) (commands:68) 2018-11-07 00:41:55,624 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,625 DEBUG (MainThread) [storage.Mount] Waiting for udev mount events: 0.03 seconds (utils:454) 2018-11-07 00:41:55,628 INFO (MainThread) [storage.Mount] unmounting /var/tmp/tmpOYMiHx/mountpoint (mount:212) 2018-11-07 00:41:55,629 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/umount /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,658 DEBUG (MainThread) [root] FAILED: <err> = 'umount: /var/tmp/tmpOYMiHx/mountpoint: target is busy.\n'; <rc> = 32 (commands:89)
The test is doing:
163 m.mount(mntOpts="loop") 164 try: 165 self.assertTrue(m.isMounted()) 166 finally: 167 m.umount()
So nothing can use this mount - except systemd, creating a mount unit for every mount created by "mount".
If you can reproduce this with calling "mount" and "umount" in a loop, we can open a systemd bug for this.
Adding little sleep before unmounting will probably avoid this issue. In production we never unmount write after mount.
A possible fix is to use m.umount(force=True, lazy=True) - this is how we unmount file based storage mounts in lib/vdsm/storage/storageServer.py:
201 def disconnect(self): 202 self._mount.umount(True, True)
Nir
Thanks Nir! I'll see if I can reproduce it next week. I have only seen this one on FC28, so it could be a FC28 bug. Yesterday the rebase CI also failed on 2 patches, but the errors were different and I did not have time to investigate.

On Fri, Nov 9, 2018 at 3:25 PM Germano Veit Michel <germano@redhat.com> wrote:
On Wed, Nov 7, 2018 at 9:34 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Nov 7, 2018 at 6:27 AM Germano Veit Michel <germano@redhat.com> wrote:
Hi,
I have seen CI failing with some "random" errors which makes submitting a series of patches a bit annoying, as some patch always fails with some error, then next run another patch fails.
One of these problems is TestMount.testSymlinkMount. Looks like its getting EBUSY when doing a rmdir. So probably something still using it?
=================================== FAILURES =================================== __________________________ TestMount.testSymlinkMount __________________________
args = (<storage.mount_test.TestMount testMethod=testSymlinkMount>,) kwargs = {}
@wraps(f) def wrapper(*args, **kwargs): if cond: raise SkipTest(reason)
return f(*args, **kwargs)
testValidation.py:274: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ testValidation.py:333: in wrapper return f(*args, **kwargs) storage/mount_test.py:167: in testSymlinkMount m.umount() /usr/lib64/python2.7/contextlib.py:35: in __exit__ self.gen.throw(type, value, traceback) testlib.py:218: in namedTemporaryDir shutil.rmtree(tmpDir) /usr/lib64/python2.7/shutil.py:261: in rmtree rmtree(fullname, ignore_errors, onerror) /usr/lib64/python2.7/shutil.py:270: in rmtree onerror(os.rmdir, path, sys.exc_info()) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
path = '/var/tmp/tmpOYMiHx/mountpoint', ignore_errors = False onerror = <function onerror at 0x7fd68afc9938>
def rmtree(path, ignore_errors=False, onerror=None): """Recursively delete a directory tree.
If ignore_errors is set, errors are ignored; otherwise, if onerror is set, it is called to handle the error with arguments (func, path, exc_info) where func is os.listdir, os.remove, or os.rmdir; path is the argument to that function that caused it to fail; and exc_info is a tuple returned by sys.exc_info(). If ignore_errors is false and onerror is None, an exception is raised.
""" if ignore_errors: def onerror(*args): pass elif onerror is None: def onerror(*args): raise try: if os.path.islink(path): # symlinks to directories are forbidden, see bug #1669 raise OSError("Cannot call rmtree on a symbolic link") except OSError: onerror(os.path.islink, path, sys.exc_info()) # can't continue even if onerror hook returns return names = [] try: names = os.listdir(path) except os.error, err: onerror(os.listdir, path, sys.exc_info()) for name in names: fullname = os.path.join(path, name) try: mode = os.lstat(fullname).st_mode except os.error: mode = 0 if stat.S_ISDIR(mode): rmtree(fullname, ignore_errors, onerror) else: try: os.remove(fullname) except os.error, err: onerror(os.remove, fullname, sys.exc_info()) try:
os.rmdir(path)
E OSError: [Errno 16] Device or resource busy: '/var/tmp/tmpOYMiHx/mountpoint'
/usr/lib64/python2.7/shutil.py:268: OSError ----------------------------- Captured stderr call ----------------------------- 2018-11-07 00:41:55,518 DEBUG (MainThread) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmpOYMiHx/backing.img (cwd None) (commands:68) 2018-11-07 00:41:55,561 DEBUG (MainThread) [storage.Misc.excCmd] SUCCESS: <err> = 'mke2fs 1.44.2 (14-May-2018)\n'; <rc> = 0 (commands:89) 2018-11-07 00:41:55,562 INFO (MainThread) [storage.Mount] mounting /var/tmp/tmpOYMiHx/link_to_image at /var/tmp/tmpOYMiHx/mountpoint (mount:204) 2018-11-07 00:41:55,563 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /var/tmp/tmpOYMiHx/link_to_image /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,598 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,599 DEBUG (MainThread) [storage.Mount] /var/tmp/tmpOYMiHx/mountpoint mounted: 0.03 seconds (utils:454) 2018-11-07 00:41:55,599 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /sbin/udevadm settle --timeout=5 (cwd None) (commands:68) 2018-11-07 00:41:55,624 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,625 DEBUG (MainThread) [storage.Mount] Waiting for udev mount events: 0.03 seconds (utils:454) 2018-11-07 00:41:55,628 INFO (MainThread) [storage.Mount] unmounting /var/tmp/tmpOYMiHx/mountpoint (mount:212) 2018-11-07 00:41:55,629 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/umount /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,658 DEBUG (MainThread) [root] FAILED: <err> = 'umount: /var/tmp/tmpOYMiHx/mountpoint: target is busy.\n'; <rc> = 32 (commands:89)
The test is doing:
163 m.mount(mntOpts="loop") 164 try: 165 self.assertTrue(m.isMounted()) 166 finally: 167 m.umount()
So nothing can use this mount - except systemd, creating a mount unit for every mount created by "mount".
If you can reproduce this with calling "mount" and "umount" in a loop, we can open a systemd bug for this.
Adding little sleep before unmounting will probably avoid this issue. In production we never unmount write after mount.
A possible fix is to use m.umount(force=True, lazy=True) - this is how we unmount file based storage mounts in lib/vdsm/storage/storageServer.py:
201 def disconnect(self): 202 self._mount.umount(True, True)
Nir
Thanks Nir!
I'll see if I can reproduce it next week. I have only seen this one on FC28, so it could be a FC28 bug. Yesterday the rebase CI also failed on 2 patches, but the errors were different and I did not have time to investigate.
Hit it again, but now on testLoopMount() instead of testSymlinkMount(). Tried reproducing it on F29 (not 28), did not hit it. #!/bin/bash truncate -s 100M /tmp/a mkfs.ext2 /tmp/a mkdir /tmp/b while [ True ] do mount -o loop /tmp/a /tmp/b umount /tmp/b done Do you want me to submit a patch with m.umount(force=True, lazy=True) for all those tests? Would it be correct to do this?

On Mon, Nov 12, 2018 at 7:03 AM Germano Veit Michel <germano@redhat.com> wrote:
On Fri, Nov 9, 2018 at 3:25 PM Germano Veit Michel <germano@redhat.com> wrote:
On Wed, Nov 7, 2018 at 9:34 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Nov 7, 2018 at 6:27 AM Germano Veit Michel <germano@redhat.com> wrote:
Hi,
I have seen CI failing with some "random" errors which makes submitting a series of patches a bit annoying, as some patch always fails with some error, then next run another patch fails.
One of these problems is TestMount.testSymlinkMount. Looks like its getting EBUSY when doing a rmdir. So probably something still using it?
=================================== FAILURES =================================== __________________________ TestMount.testSymlinkMount __________________________
args = (<storage.mount_test.TestMount testMethod=testSymlinkMount>,) kwargs = {}
@wraps(f) def wrapper(*args, **kwargs): if cond: raise SkipTest(reason)
return f(*args, **kwargs)
testValidation.py:274: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ testValidation.py:333: in wrapper return f(*args, **kwargs) storage/mount_test.py:167: in testSymlinkMount m.umount() /usr/lib64/python2.7/contextlib.py:35: in __exit__ self.gen.throw(type, value, traceback) testlib.py:218: in namedTemporaryDir shutil.rmtree(tmpDir) /usr/lib64/python2.7/shutil.py:261: in rmtree rmtree(fullname, ignore_errors, onerror) /usr/lib64/python2.7/shutil.py:270: in rmtree onerror(os.rmdir, path, sys.exc_info()) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
path = '/var/tmp/tmpOYMiHx/mountpoint', ignore_errors = False onerror = <function onerror at 0x7fd68afc9938>
def rmtree(path, ignore_errors=False, onerror=None): """Recursively delete a directory tree.
If ignore_errors is set, errors are ignored; otherwise, if onerror is set, it is called to handle the error with arguments (func, path, exc_info) where func is os.listdir, os.remove, or os.rmdir; path is the argument to that function that caused it to fail; and exc_info is a tuple returned by sys.exc_info(). If ignore_errors is false and onerror is None, an exception is raised.
""" if ignore_errors: def onerror(*args): pass elif onerror is None: def onerror(*args): raise try: if os.path.islink(path): # symlinks to directories are forbidden, see bug #1669 raise OSError("Cannot call rmtree on a symbolic link") except OSError: onerror(os.path.islink, path, sys.exc_info()) # can't continue even if onerror hook returns return names = [] try: names = os.listdir(path) except os.error, err: onerror(os.listdir, path, sys.exc_info()) for name in names: fullname = os.path.join(path, name) try: mode = os.lstat(fullname).st_mode except os.error: mode = 0 if stat.S_ISDIR(mode): rmtree(fullname, ignore_errors, onerror) else: try: os.remove(fullname) except os.error, err: onerror(os.remove, fullname, sys.exc_info()) try:
os.rmdir(path)
E OSError: [Errno 16] Device or resource busy: '/var/tmp/tmpOYMiHx/mountpoint'
/usr/lib64/python2.7/shutil.py:268: OSError ----------------------------- Captured stderr call ----------------------------- 2018-11-07 00:41:55,518 DEBUG (MainThread) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmpOYMiHx/backing.img (cwd None) (commands:68) 2018-11-07 00:41:55,561 DEBUG (MainThread) [storage.Misc.excCmd] SUCCESS: <err> = 'mke2fs 1.44.2 (14-May-2018)\n'; <rc> = 0 (commands:89) 2018-11-07 00:41:55,562 INFO (MainThread) [storage.Mount] mounting /var/tmp/tmpOYMiHx/link_to_image at /var/tmp/tmpOYMiHx/mountpoint (mount:204) 2018-11-07 00:41:55,563 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /var/tmp/tmpOYMiHx/link_to_image /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,598 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,599 DEBUG (MainThread) [storage.Mount] /var/tmp/tmpOYMiHx/mountpoint mounted: 0.03 seconds (utils:454) 2018-11-07 00:41:55,599 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /sbin/udevadm settle --timeout=5 (cwd None) (commands:68) 2018-11-07 00:41:55,624 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,625 DEBUG (MainThread) [storage.Mount] Waiting for udev mount events: 0.03 seconds (utils:454) 2018-11-07 00:41:55,628 INFO (MainThread) [storage.Mount] unmounting /var/tmp/tmpOYMiHx/mountpoint (mount:212) 2018-11-07 00:41:55,629 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/umount /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,658 DEBUG (MainThread) [root] FAILED: <err> = 'umount: /var/tmp/tmpOYMiHx/mountpoint: target is busy.\n'; <rc> = 32 (commands:89)
The test is doing:
163 m.mount(mntOpts="loop") 164 try: 165 self.assertTrue(m.isMounted()) 166 finally: 167 m.umount()
So nothing can use this mount - except systemd, creating a mount unit for every mount created by "mount".
If you can reproduce this with calling "mount" and "umount" in a loop, we can open a systemd bug for this.
Adding little sleep before unmounting will probably avoid this issue. In production we never unmount write after mount.
A possible fix is to use m.umount(force=True, lazy=True) - this is how we unmount file based storage mounts in lib/vdsm/storage/storageServer.py:
201 def disconnect(self): 202 self._mount.umount(True, True)
Nir
Thanks Nir!
I'll see if I can reproduce it next week. I have only seen this one on FC28, so it could be a FC28 bug. Yesterday the rebase CI also failed on 2 patches, but the errors were different and I did not have time to investigate.
Hit it again, but now on testLoopMount() instead of testSymlinkMount().
Tried reproducing it on F29 (not 28), did not hit it.
#!/bin/bash
truncate -s 100M /tmp/a mkfs.ext2 /tmp/a mkdir /tmp/b while [ True ] do mount -o loop /tmp/a /tmp/b umount /tmp/b done
Do you want me to submit a patch with m.umount(force=True, lazy=True) for all those tests? Would it be correct to do this?
I reproduce these issue today in the CI, and look at git history. We had force=True in the past, but it was removed in: commit c8a216677e82aa61aab0c98a1021265121f8db57 Author: Nir Soffer <nsoffer@redhat.com> Date: Sun Apr 24 01:05:33 2016 +0300 mount: Remove pointless force=True usage Using force=True when unmounting loop device is pointless - this option will cause umount(8) to call the umount(2) syscall with the MNT_FORCE flag, which is useful only for NFS mounts. Change-Id: I31ff97d882f292f9c6325903ed9072b7146f682e Signed-off-by: Nir Soffer <nsoffer@redhat.com> Reviewed-on: https://gerrit.ovirt.org/56524 Reviewed-by: Dan Kenigsberg <danken@redhat.com> Reviewed-by: Allon Mureinik <amureini@redhat.com> Continuous-Integration: Jenkins CI We fixed similar issue later in: commit d158b1a6f54ba67d81cb163593e3c330c77d36fd Author: Nir Soffer <nsoffer@redhat.com> Date: Sun Apr 24 22:36:47 2016 +0300 mount: Wait for systemd temporary mount unit Systemd is monitoring mountpoints created during runtime by creating a temporary mount units. For example: # systemctl status /tmp/mount-1.DUNxySxI2R/wrk ● tmp-mount\x2d1.DUNxySxI2R-wrk.mount - /tmp/mount-1.DUNxySxI2R/wrk Loaded: loaded (/proc/self/mountinfo) Active: active (mounted) since Sun 2016-04-24 22:37:55 IDT; 14ms ago Where: /tmp/mount-1.DUNxySxI2R/wrk What: /dev/loop2 Mounting and unmounting quickly as we do in the tests, or when creating a floppy image may confuse systemd, and lead to various errors: - mountpoint is not mounted when trying to access files or unmount - mountpoint is busy when trying to unmount These are probably systemd or mount bugs. Until these issues are resolve, we need to find a workaround to avoid failures in the tests or when creating a floppy images. This patch add udevadm.settle() call after successful mount and umount operations. Change-Id: I9b0318eb534f781f127882983527e1a221820478 Signed-off-by: Nir Soffer <nsoffer@redhat.com> Reviewed-on: https://gerrit.ovirt.org/56544 Reviewed-by: Dan Kenigsberg <danken@redhat.com> Continuous-Integration: Jenkins CI Reviewed-by: Francesco Romani <fromani@redhat.com> Based on this, I posted https://gerrit.ovirt.org/c/95719/ Please try to rebase your patches on top of this, and report if it fixes this issue. Nir

On Mon, Nov 26, 2018 at 8:24 AM Nir Soffer <nsoffer@redhat.com> wrote:
On Mon, Nov 12, 2018 at 7:03 AM Germano Veit Michel <germano@redhat.com> wrote:
On Fri, Nov 9, 2018 at 3:25 PM Germano Veit Michel <germano@redhat.com> wrote:
On Wed, Nov 7, 2018 at 9:34 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Nov 7, 2018 at 6:27 AM Germano Veit Michel <germano@redhat.com> wrote:
Hi,
I have seen CI failing with some "random" errors which makes submitting a series of patches a bit annoying, as some patch always fails with some error, then next run another patch fails.
One of these problems is TestMount.testSymlinkMount. Looks like its getting EBUSY when doing a rmdir. So probably something still using it?
=================================== FAILURES =================================== __________________________ TestMount.testSymlinkMount __________________________
args = (<storage.mount_test.TestMount testMethod=testSymlinkMount>,) kwargs = {}
@wraps(f) def wrapper(*args, **kwargs): if cond: raise SkipTest(reason)
return f(*args, **kwargs)
testValidation.py:274: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ testValidation.py:333: in wrapper return f(*args, **kwargs) storage/mount_test.py:167: in testSymlinkMount m.umount() /usr/lib64/python2.7/contextlib.py:35: in __exit__ self.gen.throw(type, value, traceback) testlib.py:218: in namedTemporaryDir shutil.rmtree(tmpDir) /usr/lib64/python2.7/shutil.py:261: in rmtree rmtree(fullname, ignore_errors, onerror) /usr/lib64/python2.7/shutil.py:270: in rmtree onerror(os.rmdir, path, sys.exc_info()) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
path = '/var/tmp/tmpOYMiHx/mountpoint', ignore_errors = False onerror = <function onerror at 0x7fd68afc9938>
def rmtree(path, ignore_errors=False, onerror=None): """Recursively delete a directory tree.
If ignore_errors is set, errors are ignored; otherwise, if onerror is set, it is called to handle the error with arguments (func, path, exc_info) where func is os.listdir, os.remove, or os.rmdir; path is the argument to that function that caused it to fail; and exc_info is a tuple returned by sys.exc_info(). If ignore_errors is false and onerror is None, an exception is raised.
""" if ignore_errors: def onerror(*args): pass elif onerror is None: def onerror(*args): raise try: if os.path.islink(path): # symlinks to directories are forbidden, see bug #1669 raise OSError("Cannot call rmtree on a symbolic link") except OSError: onerror(os.path.islink, path, sys.exc_info()) # can't continue even if onerror hook returns return names = [] try: names = os.listdir(path) except os.error, err: onerror(os.listdir, path, sys.exc_info()) for name in names: fullname = os.path.join(path, name) try: mode = os.lstat(fullname).st_mode except os.error: mode = 0 if stat.S_ISDIR(mode): rmtree(fullname, ignore_errors, onerror) else: try: os.remove(fullname) except os.error, err: onerror(os.remove, fullname, sys.exc_info()) try:
os.rmdir(path)
E OSError: [Errno 16] Device or resource busy: '/var/tmp/tmpOYMiHx/mountpoint'
/usr/lib64/python2.7/shutil.py:268: OSError ----------------------------- Captured stderr call ----------------------------- 2018-11-07 00:41:55,518 DEBUG (MainThread) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmpOYMiHx/backing.img (cwd None) (commands:68) 2018-11-07 00:41:55,561 DEBUG (MainThread) [storage.Misc.excCmd] SUCCESS: <err> = 'mke2fs 1.44.2 (14-May-2018)\n'; <rc> = 0 (commands:89) 2018-11-07 00:41:55,562 INFO (MainThread) [storage.Mount] mounting /var/tmp/tmpOYMiHx/link_to_image at /var/tmp/tmpOYMiHx/mountpoint (mount:204) 2018-11-07 00:41:55,563 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /var/tmp/tmpOYMiHx/link_to_image /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,598 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,599 DEBUG (MainThread) [storage.Mount] /var/tmp/tmpOYMiHx/mountpoint mounted: 0.03 seconds (utils:454) 2018-11-07 00:41:55,599 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /sbin/udevadm settle --timeout=5 (cwd None) (commands:68) 2018-11-07 00:41:55,624 DEBUG (MainThread) [root] SUCCESS: <err> = ''; <rc> = 0 (commands:89) 2018-11-07 00:41:55,625 DEBUG (MainThread) [storage.Mount] Waiting for udev mount events: 0.03 seconds (utils:454) 2018-11-07 00:41:55,628 INFO (MainThread) [storage.Mount] unmounting /var/tmp/tmpOYMiHx/mountpoint (mount:212) 2018-11-07 00:41:55,629 DEBUG (MainThread) [root] /usr/bin/taskset --cpu-list 0-1 /usr/bin/umount /var/tmp/tmpOYMiHx/mountpoint (cwd None) (commands:68) 2018-11-07 00:41:55,658 DEBUG (MainThread) [root] FAILED: <err> = 'umount: /var/tmp/tmpOYMiHx/mountpoint: target is busy.\n'; <rc> = 32 (commands:89)
The test is doing:
163 m.mount(mntOpts="loop") 164 try: 165 self.assertTrue(m.isMounted()) 166 finally: 167 m.umount()
So nothing can use this mount - except systemd, creating a mount unit for every mount created by "mount".
If you can reproduce this with calling "mount" and "umount" in a loop, we can open a systemd bug for this.
Adding little sleep before unmounting will probably avoid this issue. In production we never unmount write after mount.
A possible fix is to use m.umount(force=True, lazy=True) - this is how we unmount file based storage mounts in lib/vdsm/storage/storageServer.py:
201 def disconnect(self): 202 self._mount.umount(True, True)
Nir
Thanks Nir!
I'll see if I can reproduce it next week. I have only seen this one on FC28, so it could be a FC28 bug. Yesterday the rebase CI also failed on 2 patches, but the errors were different and I did not have time to investigate.
Hit it again, but now on testLoopMount() instead of testSymlinkMount().
Tried reproducing it on F29 (not 28), did not hit it.
#!/bin/bash
truncate -s 100M /tmp/a mkfs.ext2 /tmp/a mkdir /tmp/b while [ True ] do mount -o loop /tmp/a /tmp/b umount /tmp/b done
Do you want me to submit a patch with m.umount(force=True, lazy=True) for all those tests? Would it be correct to do this?
I reproduce these issue today in the CI, and look at git history.
We had force=True in the past, but it was removed in:
commit c8a216677e82aa61aab0c98a1021265121f8db57 Author: Nir Soffer <nsoffer@redhat.com> Date: Sun Apr 24 01:05:33 2016 +0300
mount: Remove pointless force=True usage
Using force=True when unmounting loop device is pointless - this option will cause umount(8) to call the umount(2) syscall with the MNT_FORCE flag, which is useful only for NFS mounts.
Change-Id: I31ff97d882f292f9c6325903ed9072b7146f682e Signed-off-by: Nir Soffer <nsoffer@redhat.com> Reviewed-on: https://gerrit.ovirt.org/56524 Reviewed-by: Dan Kenigsberg <danken@redhat.com> Reviewed-by: Allon Mureinik <amureini@redhat.com> Continuous-Integration: Jenkins CI
We fixed similar issue later in:
commit d158b1a6f54ba67d81cb163593e3c330c77d36fd Author: Nir Soffer <nsoffer@redhat.com> Date: Sun Apr 24 22:36:47 2016 +0300
mount: Wait for systemd temporary mount unit
Systemd is monitoring mountpoints created during runtime by creating a temporary mount units. For example:
# systemctl status /tmp/mount-1.DUNxySxI2R/wrk ● tmp-mount\x2d1.DUNxySxI2R-wrk.mount - /tmp/mount-1.DUNxySxI2R/wrk Loaded: loaded (/proc/self/mountinfo) Active: active (mounted) since Sun 2016-04-24 22:37:55 IDT; 14ms ago Where: /tmp/mount-1.DUNxySxI2R/wrk What: /dev/loop2
Mounting and unmounting quickly as we do in the tests, or when creating a floppy image may confuse systemd, and lead to various errors:
- mountpoint is not mounted when trying to access files or unmount - mountpoint is busy when trying to unmount
These are probably systemd or mount bugs. Until these issues are resolve, we need to find a workaround to avoid failures in the tests or when creating a floppy images.
This patch add udevadm.settle() call after successful mount and umount operations.
Change-Id: I9b0318eb534f781f127882983527e1a221820478 Signed-off-by: Nir Soffer <nsoffer@redhat.com> Reviewed-on: https://gerrit.ovirt.org/56544 Reviewed-by: Dan Kenigsberg <danken@redhat.com> Continuous-Integration: Jenkins CI Reviewed-by: Francesco Romani <fromani@redhat.com>
Based on this, I posted https://gerrit.ovirt.org/c/95719/
Please try to rebase your patches on top of this, and report if it fixes this issue.
Nir
Hi Nir, It makes sense. So I did rebase my patches on top of yours, and it seems to have fixed those random errors. It used to get at least 2 failures out of 12 patches. After the rebase I hit 0 of those errors. Still, I hit 4 of these on the latest round, but its not the same problem. So it looks like your patch did the job. ERROR: Command failed: # /usr/sbin/groupadd -g 1000 mock Thanks! Germano
participants (2)
-
Germano Veit Michel
-
Nir Soffer