[VDSM] travis tests fail consistently since Apr 14

There are several issues:
1. Coverage fails after this patch:
https://github.com/oVirt/vdsm/commit/6b905c2c134bcf344961d28eefbd05f2838d2ec...
https://travis-ci.org/oVirt/vdsm/builds/366574414
...
pwd
/vdsm/tests
ls .cov*
ls: cannot access .cov*: No such file or directory
make[1]: *** [check] Error 2
make[1]: Leaving directory `/vdsm/tests'
2. pywatch_test - gdb not installed
We need to install the gdb and python-debuginfo packages on the test images.
self = <pywatch_test.TestPyWatch object at 0x2bbc0d0>
    def test_timeout(self):
        rc, out, err = exec_cmd(['./py-watch', '0.1', 'sleep', '10'])
        assert b'Watched process timed out' in out
        assert rc == 128 + signal.SIGTERM
E       assert 1 == (128 + 15)
E        + where 15 = signal.SIGTERM
pywatch_test.py:45: AssertionError
------------------------------ Captured log call -------------------------------
cmdutils.py 151 DEBUG ./py-watch 0.1 sleep 10 (cwd None)
cmdutils.py 159 DEBUG FAILED: <err> = 'Traceback (most recent call last):\n File "./py-watch", line 59, in <module>\n dump_trace(watched_proc)\n File "./py-watch", line 32, in dump_trace\n \'thread apply all py-bt\'])\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 575, in call\n p = Popen(*popenargs, **kwargs)\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 822, in __init__\n restore_signals, start_new_session)\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1567, in _execute_child\n raise child_exception_type(errno_num, err_msg)\nOSError: [Errno 2] No such file or directory: \'gdb\'\n'; <rc> = 1
Nir
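
The traceback above shows py-watch failing with OSError because the gdb executable is not on the image. A minimal sketch of how a test environment could report missing external tools before running such a test is shown here. It assumes Python 3 (the traceback above is from Python 2.7), and the helper name `missing_tools` is hypothetical, not part of vdsm.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the machine running it.
    """
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    # Hypothetical usage: report which tools the image still lacks.
    absent = missing_tools(["gdb"])
    if absent:
        print("missing tools: " + ", ".join(absent))
    else:
        print("all required tools found")
```

A test runner could use the returned list to skip the timeout test with a clear message instead of failing with an unrelated exit code.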

Ping

2018-05-16 15:20 GMT+02:00 Nir Soffer <nsoffer@redhat.com>:
Ping
There's ongoing discussion about the real need of having travis testing. What do we test in travis that Jenkins is not testing already?
_______________________________________________ Devel mailing list -- devel@ovirt.org To unsubscribe send an email to devel-leave@ovirt.org
-- SANDRO BONAZZOLA ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D Red Hat EMEA <https://www.redhat.com/> sbonazzo@redhat.com <https://red.ht/sig> <https://redhat.com/summit>

2018-05-16 17:12 GMT+02:00 Sandro Bonazzola <sbonazzo@redhat.com>:
2018-05-16 15:20 GMT+02:00 Nir Soffer <nsoffer@redhat.com>:
Ping
There's ongoing discussion about the real need of having travis testing. What do we test in travis that Jenkins is not testing already?
Travis CI is still failing. https://travis-ci.org/oVirt/vdsm/jobs/386053194
OK (SKIP=63)
mv .coverage .coverage-nose-py2
mv: cannot stat '.coverage': No such file or directory
make[1]: *** [check] Error 1
make[1]: Leaving directory `/vdsm/tests'
ERROR: InvocationError: '/usr/bin/make -C tests check'
    def _run_cmd(cmd, cwd=None):
        rc, out, err = commands.execCmd(cmd, raw=True, cwd=cwd)
        if rc != 0:
            raise cmdutils.Error(cmd, rc, out, err)
E   Error: Command ['/usr/bin/qemu-img', 'info', '--output', 'json', '-U', '/var/tmp/tmp89QFPD/img.img'] failed with rc=1 out='' err="qemu-img: unrecognized option '-U'\nTry 'qemu-img --help' for more information\n"
../lib/vdsm/storage/qemuimg.py:399: Error
This happens 3 times in the job, causing the result to be:
3 failed, 1387 passed, 7 skipped, 125 deselected, 1 xfailed, 9 xpassed in 126.99 seconds
ERROR: InvocationError: '/vdsm/tests/py-watch 600 pytest -m not (slow or stress) --durations=10 --cov=vdsm.storage --cov-report=html:htmlcov-storage-py27 storage'
Also:
storage-py36 create: /vdsm/.tox/storage-py36
ERROR: InterpreterNotFound: python3.6
There are also several other errors.
-- SANDRO BONAZZOLA ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D Red Hat EMEA <https://www.redhat.com/> sbonazzo@redhat.com <https://red.ht/sig> <https://redhat.com/summit>

On Thu, May 31, 2018 at 10:34 AM Sandro Bonazzola <sbonazzo@redhat.com> wrote:
2018-05-16 17:12 GMT+02:00 Sandro Bonazzola <sbonazzo@redhat.com>:
2018-05-16 15:20 GMT+02:00 Nir Soffer <nsoffer@redhat.com>:
Ping
There's ongoing discussion about the real need of having travis testing. What do we test in travis that Jenkins is not testing already?
Travis gives us several advantages:
- Generally more reliable, fewer tests marked broken on travis. I think the key is having a fresh VM per build. I wish we had that in ovirt CI.
- Typically faster. Here is a random example:
  travis builds (8 minutes):
  - https://travis-ci.org/oVirt/vdsm/builds/386053193
  gerrit builds (20 minutes):
  - https://jenkins.ovirt.org/job/vdsm_master_check-patch-fc27-x86_64/3571/
  - https://jenkins.ovirt.org/job/vdsm_master_check-patch-el7-x86_64/23748/
- Need one url to refer to a build, instead of one url per platform
- Easier to set up and maintain - one yaml file, one dockerfile per platform
- We control the project configuration
- Easy to test project configuration - no need to merge to test a change
- Simpler, we run in a docker image, updated when the base image updates or manually by building from dockerfiles in vdsm source
- Easier to get contributions, lots of people know travis
- Anyone can run the tests in travis, no need for whitelists, just fork vdsm and enable travis builds in your private account
- Easy to test multiple python versions, even nightlies
- Travis errors never fail builds, and sometimes even restart a build. It understands the difference between "error - could not run the tests", "failure - some tests failed", and "success - no test failed". ovirt CI sometimes fails a successful build because it could not clean up after itself.
- Running on another platform (Ubuntu) - helps to reveal bugs that sometimes are hidden on CentOS/Fedora. (example: https://github.com/nirs/sanlock/commit/c7fd1b6915c470c6beb191a79c741fb1e6ca9... )
- No vendor lock-in (I don't like to depend on a single CI provider)
- I enjoy using it, I don't enjoy gerrit and jenkins
I know that new ovirt CI solved some of the issues, but nobody sent patches to convert vdsm to the new standard yet.
Issues with travis:
- no integration with gerrit, so people tend to break it. It would be nice if we could trigger a travis build for every patch, and fail the build if travis failed
- need too much manual image rebuilding
- image rebuilds are slow (cost of a free docker account)
- we depend on 3 different services: travis, github, and docker - for a single build
Travis CI is still failing. https://travis-ci.org/oVirt/vdsm/jobs/386053194
OK (SKIP=63) mv .coverage .coverage-nose-py2 mv: cannot stat '.coverage': No such file or directory make[1]: *** [check] Error 1
make[1]: Leaving directory `/vdsm/tests'
ERROR: InvocationError: '/usr/bin/make -C tests check'
I think this is the cause: https://github.com/oVirt/vdsm/commit/6b905c2c134bcf344961d28eefbd05f2838d2ec8
This also seems to be broken when running make locally.
def _run_cmd(cmd, cwd=None): rc, out, err = commands.execCmd(cmd, raw=True, cwd=cwd) if rc != 0:
raise cmdutils.Error(cmd, rc, out, err)
E Error: Command ['/usr/bin/qemu-img', 'info', '--output', 'json', '-U', '/var/tmp/tmp89QFPD/img.img'] failed with rc=1 out='' err="qemu-img: unrecognized option '-U'\nTry 'qemu-img --help' for more information\n" ../lib/vdsm/storage/qemuimg.py:399: Error
This means we are running an old qemu and need to update our images. I guess this is broken in the centos build?
this happen 3 times in the job causing the result to be:
3 failed, 1387 passed, 7 skipped, 125 deselected, 1 xfailed, 9 xpassed in 126.99 seconds ERROR: InvocationError: '/vdsm/tests/py-watch 600 pytest -m not (slow or stress) --durations=10 --cov=vdsm.storage --cov-report=html:htmlcov-storage-py27 storage'
Also:
storage-py36 create: /vdsm/.tox/storage-py36 ERROR: InterpreterNotFound: python3.6
This is just a warning that python 3.6 is not available, expected when running on centos. Maybe we can eliminate these errors in the CI, since we know what python version should be tested for every image. Nir
There are also several other errors.

Nir Soffer <nsoffer@redhat.com> writes:
I know that new ovirt CI solved some of the issues, but nobody sent patches to convert vdsm to the new standard yet.
I asked Barak about possible Vdsm conversion at his deep dive and he responded that the new CI may need more real-use testing before projects such as Vdsm switch. It's a couple of weeks since then and if there are no problems with projects that already use it (such as oVirt system tests), maybe we should start working on a conversion patch?

On 31 May 2018 at 13:00, Milan Zamazal <mzamazal@redhat.com> wrote:
Nir Soffer <nsoffer@redhat.com> writes:
I know that new ovirt CI solved some of the issues, but nobody sent patches to convert vdsm to the new standard yet.
I asked Barak about possible Vdsm conversion at his deep dive and he responded that the new CI may need more real-use testing before projects such as Vdsm switch. It's a couple of weeks since then and if there are no problems with projects that already use it (such as oVirt system tests), maybe we should start working on a conversion patch?
Let me clarify what I said back then a bit - since engine and VDSM are the two big flagship projects, I want them to be the last projects to be converted. So it's not a matter of time, it's a matter of converting all the other projects first.
Now the thing is, we will not do this on our own - maintainers need to be in the loop as we move projects, so while we do want to be proactive about this, given the other task load we have, things work best when the maintainers actively approach us, as some, like the ovirt-provider-ovn maintainers, did.
So please, if you're a small-ish project maintainer, shoot an email to infra-support@ovirt.org asking for your project to be covered and then monitor the Jira ticket. The actual setup takes just a few minutes and we will use the Jira ticket to update you on progress and relay any project-specific questions you may have.
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/BI2TZYNRFSYFZEKQJHZMDV5AKY2DF5QZ/
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

On Tue, May 8, 2018 at 11:59 AM, Nir Soffer <nsoffer@redhat.com> wrote:
There are several issues:
1. coverage fail after this patch: https://github.com/oVirt/vdsm/commit/6b905c2c134bcf344961d28eefbd05f2838d2ec...
https://travis-ci.org/oVirt/vdsm/builds/366574414 ... pwd /vdsm/tests ls .cov* ls: cannot access .cov*: No such file or directory make[1]: *** [check] Error 2 make[1]: Leaving directory `/vdsm/tests'
That was me, sorry. This should solve it: https://gerrit.ovirt.org/#/c/91925/
BTW, on my fc28 I see TestCountClusters.test_multiple_blocks failing with
E Error: Command ['/usr/bin/qemu-img', 'map', '--output', 'json', '/var/tmp/vdsm/test_multiple_blocks0/test'] failed with rc=-6 out='' err="qemu-img: /builddir/build/BUILD/qemu-2.12.0-rc1/qemu-img.c:2680: get_block_status: Assertion `bytes' failed.\n"
Any idea what's that?

On Mon, Jun 4, 2018 at 6:56 PM Dan Kenigsberg <danken@redhat.com> wrote:
On Tue, May 8, 2018 at 11:59 AM, Nir Soffer <nsoffer@redhat.com> wrote:
There are several issues:
1. coverage fail after this patch:
https://github.com/oVirt/vdsm/commit/6b905c2c134bcf344961d28eefbd05f2838d2ec...
https://travis-ci.org/oVirt/vdsm/builds/366574414 ... pwd /vdsm/tests ls .cov* ls: cannot access .cov*: No such file or directory make[1]: *** [check] Error 2 make[1]: Leaving directory `/vdsm/tests'
That was me, sorry. This should solve it: https://gerrit.ovirt.org/#/c/91925/
Thanks! BTW, on my fc28 I see TestCountClusters.test_multiple_blocks failing with
E Error: Command ['/usr/bin/qemu-img', 'map', '--output', 'json', '/var/tmp/vdsm/test_multiple_blocks0/test'] failed with rc=-6 out='' err="qemu-img: /builddir/build/BUILD/qemu-2.12.0-rc1/qemu-img.c:2680: get_block_status: Assertion `bytes' failed.\n"
Any idea what's that?
Looks like qemu-img bug. Can you file a qemu-img bug? Nir

On Mon, Jun 4, 2018 at 7:14 PM, Nir Soffer <nsoffer@redhat.com> wrote:
Looks like qemu-img bug.
Can you file a qemu-img bug?
I hope Maor can translate the test to qemu-img speak.

On Tue, Jun 5, 2018 at 9:40 AM, Dan Kenigsberg <danken@redhat.com> wrote:
I hope Maor can translate the test to qemu-img speak.
Opened the following bug: https://bugzilla.redhat.com/1589738

On Mon, Jun 11, 2018 at 1:03 PM Maor Lipchuk <mlipchuk@redhat.com> wrote:
Opened the following bug: https://bugzilla.redhat.com/1589738
Adding qemu-block

On 11.06.2018 at 12:43, Nir Soffer wrote:
Opened the following bug: https://bugzilla.redhat.com/1589738
Adding qemu-block
It's related to the unaligned image size. Correct image files should be aligned to 512 byte sectors, so something is wrong with your image to start with (hard disks don't have half sectors).
Anyway, git bisects points to this one:
a290f085901b528265787cd27ebda19c970be4ee is the first bad commit
commit a290f085901b528265787cd27ebda19c970be4ee
Author: Eric Blake <eblake@redhat.com>
Date: Tue Feb 13 14:26:44 2018 -0600
    file-posix: Switch to .bdrv_co_block_status()
    We are gradually moving away from sector-based interfaces, towards byte-based. Update the file protocol driver accordingly.
    In want_zero mode, we continue to report fine-grained hole information (the caller wants as much mapping detail as possible); but when not in that mode, the caller prefers larger *pnum and merely cares about what offsets are allocated at this layer, rather than where the holes live. Since holes still read as zeroes at this layer (rather than deferring to a backing layer), we can take the shortcut of skipping lseek(), and merely state that all bytes are allocated.
    We can also drop redundant bounds checks that are already guaranteed by the block layer.
    Signed-off-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
    Reviewed-by: Fam Zheng <famz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I think the problem is a bit higher up the call stack, but I'm not completely sure yet. It manifests in img_map(), in this code:
    while (curr.start + curr.length < length) {
        ...
        n = QEMU_ALIGN_DOWN(MIN(1 << 30, length - offset), BDRV_SECTOR_SIZE);
        ret = get_block_status(bs, offset, n, &next);
The loop condition is still true because a single byte is left to be processed, but n is aligned down to 0. I'm not sure why the QEMU_ALIGN_DOWN() is even there.
Eric, would just removing the QEMU_ALIGN_DOWN() be correct?
Kevin
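
The arithmetic behind that observation can be reproduced outside qemu. The following is only an illustrative Python model of the quoted expression, not qemu code. The function names are hypothetical, and the image size (one byte past a 16 KiB boundary) is an assumed example value.

```python
BDRV_SECTOR_SIZE = 512  # sector size used by the quoted expression


def align_down(n, m):
    """Round n down to the nearest multiple of m."""
    return n // m * m


def next_request_size(length, offset, align=True):
    """Model of: n = QEMU_ALIGN_DOWN(MIN(1 << 30, length - offset), 512).

    With align=False the rounding is skipped, which models the
    proposed change of dropping the alignment.
    """
    n = min(1 << 30, length - offset)
    return align_down(n, BDRV_SECTOR_SIZE) if align else n


# Assumed example: the image is one byte longer than 16 KiB, and
# everything up to the 16 KiB boundary has already been processed.
length = 16 * 1024 + 1
offset = 16 * 1024

aligned = next_request_size(length, offset)
unaligned = next_request_size(length, offset, align=False)
print(aligned, unaligned)
```

In this model the aligned request size is 0 while one byte is still unprocessed, which matches the failing `bytes` assertion in get_block_status. Without the rounding, the last request covers the remaining byte.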

On 06/11/2018 11:19 AM, Kevin Wolf wrote:
Opened the following bug: https://bugzilla.redhat.com/1589738
Adding qemu-block
It's related to the unaligned image size. Correct image files should be aligned to 512 byte sectors, so something is wrong with your image to start with (hard disks don't have half sectors).
Anyway, git bisects points to this one:
a290f085901b528265787cd27ebda19c970be4ee is the first bad commit commit a290f085901b528265787cd27ebda19c970be4ee Author: Eric Blake <eblake@redhat.com> Date: Tue Feb 13 14:26:44 2018 -0600
file-posix: Switch to .bdrv_co_block_status()
Hmm, definitely fallout from my changes.
I think the problem is a bit higher up the call stack, but I'm not completely sure yet. It manifests in img_map(), in this code:
    while (curr.start + curr.length < length) {
        ...
        n = QEMU_ALIGN_DOWN(MIN(1 << 30, length - offset), BDRV_SECTOR_SIZE);
        ret = get_block_status(bs, offset, n, &next);
The loop condition is still true because a single byte is left to be processed, but n is aligned down to 0. I'm not sure why the QEMU_ALIGN_DOWN() is even there.
Eric, would just removing the QEMU_ALIGN_DOWN() be correct?
I think so, but I'm testing it now. If so, the real culprit was that I added the rounding in commit 5e344dd8 when I switched qemu-img.c get_block_status() to take bytes but operate in sectors, but didn't remove it when I later removed sector-based limitations in commit 237d78f8. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org

On Mon, Jun 11, 2018 at 7:20 PM Kevin Wolf <kwolf@redhat.com> wrote:
It's related to the unaligned image size.
Right, I think we forgot the -1 in the test setup:
    f.seek(16 * 1024 - 1)
    f.write(b"x")
But it is good that we made this error :-)
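
To make the off-by-one concrete, here is a small self-contained sketch (not the vdsm test itself) that creates two files by seeking and writing a single byte, once without and once with the -1. The helper name and file names are hypothetical.

```python
import os
import tempfile

SECTOR = 512


def make_sparse_file(path, last_byte_offset):
    """Create a file by seeking to last_byte_offset and writing one byte.

    The resulting file size is last_byte_offset + 1.
    """
    with open(path, "wb") as f:
        f.seek(last_byte_offset)
        f.write(b"x")
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    # Seeking to 16 KiB and writing one byte yields 16 KiB + 1 bytes.
    without_minus_one = make_sparse_file(os.path.join(tmp, "a.raw"), 16 * 1024)
    # Seeking to 16 KiB - 1 yields exactly 16 KiB.
    with_minus_one = make_sparse_file(os.path.join(tmp, "b.raw"), 16 * 1024 - 1)

print(without_minus_one % SECTOR, with_minus_one % SECTOR)
```

Only the second file ends on a 512-byte sector boundary. The first one has a single trailing byte past the boundary.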
Correct image files should be aligned to 512 byte sectors, so something is wrong with your image to start with (hard disks don't have half sectors).
Right. We indeed forbid uploads of unaligned images in the UI.
But we found that qemu-img creates unaligned images:
$ qemu-img create -f qcow2 empty.raw 10g
Formatting 'empty.raw', fmt=qcow2 size=10737418240 cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ ls -l empty.raw
-rw-r--r--. 1 nsoffer nsoffer 196768 Jun 12 01:59 empty.raw
$ python -c "print 196768 % 512"
160
$ qemu-img map -f raw --output json test.raw
[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 0},
{ "start": 4096, "length": 12288, "depth": 0, "zero": true, "data": false, "offset": 4096},
{ "start": 16384, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 16384},
qemu-img: /builddir/build/BUILD/qemu-2.12.0-rc1/qemu-img.c:2680: get_block_status: Assertion `bytes' failed.
Aborted (core dumped)
The image becomes aligned once you write anything into it, but I still find it strange that qemu-img creates such files. Shouldn't it always create 3 complete clusters?

On 06/11/2018 06:07 PM, Nir Soffer wrote:
Correct image files should be aligned to 512 byte sectors, so something is wrong with your image to start with (hard disks don't have half sectors).
Ideally, a guest will never see an unaligned image size: qemu should round UP when creating an image, and then truncate DOWN when opening an unaligned image that it cannot resize. qemu-img DOES round the guest virtual size up on creation; and thus, for the raw format, a file created by qemu-img will be aligned, and you can only get an unaligned image by manual efforts.
But for qcow2 files, the qcow2 spec is clear that an incomplete final cluster (whether or not that last cluster end is also sector-aligned) is well-formed with the tail reading as zeroes; if the last cluster is guest-visible, qemu-img will have rounded it up to sector (but not necessarily cluster) alignment. But if the last cluster is something else, like the refcount, L1, or L2 table, then yes, it is very common that the consumed disk size is not sector aligned.
$ qemu-img create -f qcow2 test3 123
Formatting 'test3', fmt=qcow2 size=123 cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ qemu-img info test3
image: test3
file format: qcow2
virtual size: 512 (512 bytes)
disk size: 196K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
$ ls -l test3
-rw-r--r--. 1 eblake eblake 196616 Jun 11 20:32 test3
$ echo $((196616/512*512))
196608
So, you can see that the guest-visible size was rounded up (123 became 512), which is now sector-aligned but not cluster-aligned; and that the final cluster occupies only 8 bytes at the moment (the difference between 196616 and 196608).
Maybe we should improve qemu to always create cluster-aligned qcow2 images, but that's a bigger audit.
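
The size arithmetic in this message can be checked with a few lines of Python. This is only a sketch: the function name `describe` is hypothetical, and the sector and cluster sizes are the 512 and 65536 values shown in the qemu-img output quoted in this thread.

```python
SECTOR = 512      # sector size discussed in the thread
CLUSTER = 65536   # cluster_size reported by qemu-img above


def describe(size):
    """Summarize how a file size relates to sector and cluster boundaries."""
    return {
        "full_clusters": size // CLUSTER,
        "tail_bytes": size % CLUSTER,
        "sector_aligned": size % SECTOR == 0,
    }


# File sizes taken from the two `ls -l` outputs in the thread.
print(describe(196616))
print(describe(196768))
```

Both files consist of 3 full clusters plus a short tail (8 and 160 bytes respectively), so neither file size is sector-aligned, while the 512-byte virtual size reported for test3 is.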
Right. We indeed forbid uploads on unaligned images in the UI.
But we found that qemu-img creates unaligned images:
$ qemu-img create -f qcow2 empty.raw 10g Formatting 'empty.raw', fmt=qcow2 size=10737418240 cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ ls -l empty.raw -rw-r--r--. 1 nsoffer nsoffer 196768 Jun 12 01:59 empty.raw
$ python -c "print 196768 % 512" 160
$ qemu-img map -f raw --output json test.raw
Wait; where did test.raw come from; given that your earlier commands were dealing with empty.raw? And mapping a qcow2 file as though it were raw is indeed confusing (you don't want to do that for a guest using the image).
The image becomes aligned once you write anything into it, but I still find it strange that qemu-img create such files. Shouldn't it create always 3 complete clusters?
Because historical versions of qemu do not, we already have to cope with unaligned images (when opening them as qcow2; but not when transliterating and opening them as raw, since it took this long to find the regression); we could make future versions of qemu always create cluster-aligned images, but you'd still have to cope with existing images that aren't. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
participants (8)
- Barak Korren
- Dan Kenigsberg
- Eric Blake
- Kevin Wolf
- Maor Lipchuk
- Milan Zamazal
- Nir Soffer
- Sandro Bonazzola