On Fri, Dec 7, 2018 at 3:19 PM Milan Zamazal <mzamazal(a)redhat.com> wrote:
> Hi, I've seen a CI failure on qemu-io tests in ovirt-4.2 branch:
> http://jenkins.ovirt.org/job/vdsm_standard-check-patch/602/

We have several issues:

    00:12:02.358      def verify_pattern(path, format, offset=512, len=1024, pattern=5):
    00:12:02.358          read_cmd = 'read -P %d -s 0 -l %d %d %d' % (pattern, len, offset, len)
    00:12:02.358          cmd = ['qemu-io', '-f', format, '-c', read_cmd, path]
    00:12:02.358          rc, out, err = commands.execCmd(cmd, raw=True)
    00:12:02.358          if rc != 0 or err != b"":
    00:12:02.358  >           raise cmdutils.Error(cmd, rc, out, err)
    00:12:02.358  E           Error: Command ['qemu-io', '-f', 'qcow2', '-c',
                              'read -P 240 -s 0 -l 1024 0 1024',
                              '/var/tmp/tmpbxNeWA/mnt/blockSD/3fd2ac6f-929b-425a-a7d7-3b8519dedc91/images/e438933b-7d87-4fd6-a2fa-e53cb2e26a19/8ff32e26-b426-4ebf-a8cb-48353cf898e7']
                              failed with rc=1 out='Pattern verification failed at offset 0, 1024 bytes\nread 1024/1024 bytes at offset 0\n1 KiB, 1 ops; 0.0000 sec (17.756 MiB/sec and 18181.8182 ops/sec)\n' err=''

This is the same issue fixed in master by:

commit a341dcc3ab5c36afc7a536d3d4794c37fedcda6d
Author: Nir Soffer <nsoffer(a)redhat.com>
Date:   Sun Nov 25 22:48:54 2018 +0200

    qemuio: Support qemu 2.12 error handling

    qemu-io 2.10 was exiting with zero exit code and "Pattern verification"
    error in stdout. In 2.12, non-zero code is returned when pattern
    verification fails. This sounds better, but the exit code is not
    specific enough to detect the verification error.

    Check for pattern verification error before testing exit code, so we
    raise (expected) verification error in this case, and raise a generic
    error if verification error was not detected.

    Change-Id: Id108d2b780b6862bed3bb7088908d911788b544d
    Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>

Backporting this patch should fix this issue.
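
To illustrate the ordering described in the commit message above, here is a minimal sketch, not the actual vdsm code. The names check_qemu_io_result, VerificationError and CommandError are made up for this example; only the "Pattern verification failed" text comes from the qemu-io output in the log.

```python
class VerificationError(Exception):
    """Data read by qemu-io did not match the pattern (hypothetical name)."""


class CommandError(Exception):
    """qemu-io failed for some other reason (hypothetical name)."""


def check_qemu_io_result(rc, out, err):
    """Classify the result of a qemu-io "read -P" command.

    rc is the exit code, out and err are the raw stdout and stderr bytes.
    """
    # Check stdout first. qemu-io 2.10 reports a mismatch with rc=0 and
    # 2.12 reports it with a non-zero rc, so the exit code alone cannot
    # tell a pattern mismatch apart from any other failure.
    if b"Pattern verification failed" in out:
        raise VerificationError(out.decode("utf-8", "replace"))

    # No mismatch reported: anything else that looks wrong is a generic error.
    if rc != 0 or err != b"":
        raise CommandError("rc=%d out=%r err=%r" % (rc, out, err))
```

With the values from the failing run (rc=1, the "Pattern verification failed" text in out, empty err) this raises VerificationError instead of the generic error, which is what the tests expect.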

    00:12:02.352  Error: Command ['/usr/bin/taskset', '--cpu-list', '0-1',
                  '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3',
                  '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none',
                  '-f', 'raw',
                  '/var/tmp/tmpmwqNI9/9a7a895a-eaed-4de0-97b5-05009e336853/15ca6f4c-2753-450f-8e4b-1e3278fc664e/images/89432c41-c8c9-4d4e-ab86-ef892806162b/9dbca6ff-412a-4c85-a4eb-51dfa4fadbd8',
                  '-O', 'raw',
                  '/var/tmp/tmpmwqNI9/9a7a895a-eaed-4de0-97b5-05009e336853/15ca6f4c-2753-450f-8e4b-1e3278fc664e/images/51e0d2ba-0cb0-421c-901b-4842e8612ca4/3b857f90-3062-4698-9920-2c4849b7256f']
                  failed with rc=-6 out='' err=bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed.\n")

We found this error 6 months ago on Travis, and discussed it here:
https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/UGGNDS4SVJ2G...
We filed
https://bugzilla.redhat.com/1589738
Then we fixed the issue in the tests by using aligned images, since we
always use aligned images in the real system.

commit 0cde33b4ecfcc55dee3f441749fc72c106b4dd77
Author: Nir Soffer <nsoffer(a)redhat.com>
Date:   Sun Jun 24 22:14:57 2018 +0300

    tests: Use aligned image size

    The qcow2 tests used unaligned image size, which is not realistic test
    data, and triggered a segfault in qemu-img on Fedora 28. While exposing
    bugs in qemu-img is nice, this is not the place for such test.

    Change-Id: I2dca62187dce72638c3f92a4694b8f287d7a048f
    Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>

commit 3e43fff81f51002347277df83ecfff61d5c8a4a5
Author: Nir Soffer <nsoffer(a)redhat.com>
Date:   Sun Jun 24 22:01:21 2018 +0300

    tests: Use padded VM configuration volume

    The tests were using unpadded vm configuration volume. This causes
    qemu-img to segfault in Fedora 28. Since we always pad vm configuration
    volumes in production code, we don't need to test unpadded values.

    Change-Id: I74cb26dcc0c8633fa1de5f689d2e8ffa98013d48
    Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>

Backporting these patches should fix the failing tests.
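
For reference, "aligned" and "padded" here mean rounding a size up to a multiple of the block alignment. The sketch below shows the arithmetic only; the helper names round_up and pad are made up for this example, and the 4096 and 512 values in the usage lines are illustrative assumptions, not the values the vdsm tests use.

```python
def round_up(size, align):
    """Return the smallest multiple of align that is >= size."""
    if align <= 0:
        raise ValueError("align must be positive")
    return (size + align - 1) // align * align


def pad(data, align, fill=b"\0"):
    """Append fill bytes to data so its length is a multiple of align."""
    return data + fill * (round_up(len(data), align) - len(data))


# Illustrative values only (assumed, not taken from the vdsm tests).
image_size = round_up(1000, 4096)
vm_conf = pad(b"<ovf/>", 512)
```

An image created with a size like image_size, or a volume written with data like vm_conf, always ends on an alignment boundary, which matches what the production code does.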
The qemu bug was fixed a few months after we reported it.
The fix is available in:
qemu-img.x86_64 2:2.11.2-1.fc28
qemu-img.x86_64 2:3.0.0-1.fc28 (from virt-preview)
Are we using these versions in 4.2 CI? Do we have correct repos?
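
One way to answer this on a CI slave is to log the qemu-img version at the start of the run. This is a rough sketch, assuming the first line of "qemu-img --version" looks like "qemu-img version 2.11.2(qemu-2.11.2-1.fc28)"; the function names are made up for this example. It only reports the version and does not decide whether a build contains the fix, since the fix may be backported to other builds.

```python
import re
import subprocess


def parse_qemu_img_version(text):
    """Extract (major, minor, micro) from "qemu-img --version" output."""
    match = re.search(r"qemu-img version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("Unexpected qemu-img version output: %r" % text)
    return tuple(int(part) for part in match.groups())


def installed_qemu_img_version():
    """Run qemu-img and return its version tuple (requires qemu-img in PATH)."""
    out = subprocess.check_output(["qemu-img", "--version"])
    return parse_qemu_img_version(out.decode("utf-8", "replace"))
```

Printing installed_qemu_img_version() in the job output would show which qemu-img the 4.2 CI is actually running.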
Strangely, I see that we also have this bug, which was probably
tested on RHEL, but there is no info on the versions tested.
https://bugzilla.redhat.com/1649788
So it seems that this bug sneaked into some downstream versions.
Nir