On Fri, Dec 7, 2018 at 3:19 PM Milan Zamazal <mzamazal@redhat.com> wrote:
> Hi, I've seen a CI failure on qemu-io tests in ovirt-4.2 branch:
> http://jenkins.ovirt.org/job/vdsm_standard-check-patch/602/

We have several issues:

00:12:02.358     def verify_pattern(path, format, offset=512, len=1024, pattern=5):
00:12:02.358         read_cmd = 'read -P %d -s 0 -l %d %d %d' % (pattern, len, offset, len)
00:12:02.358         cmd = ['qemu-io', '-f', format, '-c', read_cmd, path]
00:12:02.358         rc, out, err = commands.execCmd(cmd, raw=True)
00:12:02.358         if rc != 0 or err != b"":
00:12:02.358 >           raise cmdutils.Error(cmd, rc, out, err)
00:12:02.358 E           Error: Command ['qemu-io', '-f', 'qcow2', '-c', 'read -P 240 -s 0 -l 1024 0 1024', '/var/tmp/tmpbxNeWA/mnt/blockSD/3fd2ac6f-929b-425a-a7d7-3b8519dedc91/images/e438933b-7d87-4fd6-a2fa-e53cb2e26a19/8ff32e26-b426-4ebf-a8cb-48353cf898e7'] failed with rc=1 out='Pattern verification failed at offset 0, 1024 bytes\nread 1024/1024 bytes at offset 0\n1 KiB, 1 ops; 0.0000 sec (17.756 MiB/sec and 18181.8182 ops/sec)\n' err=''

This is the same issue fixed in master by:

commit a341dcc3ab5c36afc7a536d3d4794c37fedcda6d
Author: Nir Soffer <nsoffer@redhat.com>
Date:   Sun Nov 25 22:48:54 2018 +0200

    qemuio: Support qemu 2.12 error handling
    
    qemu-io 2.10 was exiting with zero exit code and "Pattern verification"
    error in stdout. In 2.12, non-zero code is returned when pattern
    verification fails. This sounds better, but the exit code is not
    specific enough to detect the verification error.
    
    Check for pattern verification error before testing exit code, so we
    raise (expected) verification error in this case, and raise a generic
    error in verification error was not detected.
    
    Change-Id: Id108d2b780b6862bed3bb7088908d911788b544d
    Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Backporting this patch should fix this issue.
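
The ordering described in the commit message can be sketched roughly like
this. It is a simplified illustration, not the actual vdsm code:
VerificationError, QemuIOError and check_result are hypothetical names, and
it assumes the "Pattern verification failed" text always shows up in stdout,
as it does in the log above.

```python
class VerificationError(Exception):
    """Image content does not match the expected pattern."""


class QemuIOError(Exception):
    """qemu-io failed for a reason other than pattern verification."""


def check_result(rc, out, err):
    """
    Classify the result of a "qemu-io -c 'read -P ...'" run.

    Hypothetical helper: rc is the exit code, out and err are the raw
    stdout and stderr bytes.
    """
    # Look for the verification message first. qemu-io 2.10 reports it
    # with exit code 0, qemu-io 2.12 with a non-zero exit code, so the
    # exit code alone cannot tell the two kinds of failure apart.
    if b"Pattern verification failed" in out:
        raise VerificationError(out.decode("utf-8", "replace").strip())

    # Anything else that looks like a failure is a generic error.
    if rc != 0 or err != b"":
        raise QemuIOError("rc=%d out=%r err=%r" % (rc, out, err))
```

With this ordering the test gets the expected verification error on both
qemu versions, and other failures are still reported as generic errors.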

00:12:02.352 Error: Command ['/usr/bin/taskset', '--cpu-list', '0-1', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', '/var/tmp/tmpmwqNI9/9a7a895a-eaed-4de0-97b5-05009e336853/15ca6f4c-2753-450f-8e4b-1e3278fc664e/images/89432c41-c8c9-4d4e-ab86-ef892806162b/9dbca6ff-412a-4c85-a4eb-51dfa4fadbd8', '-O', 'raw', '/var/tmp/tmpmwqNI9/9a7a895a-eaed-4de0-97b5-05009e336853/15ca6f4c-2753-450f-8e4b-1e3278fc664e/images/51e0d2ba-0cb0-421c-901b-4842e8612ca4/3b857f90-3062-4698-9920-2c4849b7256f'] failed with rc=-6 out='' err=bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed.\n")

We found this error 6 months ago on Travis, and discussed it here:

We filed 

Then we fixed the issue in the tests by using aligned images, since we always use
aligned images in the real system.

commit 0cde33b4ecfcc55dee3f441749fc72c106b4dd77
Author: Nir Soffer <nsoffer@redhat.com>
Date:   Sun Jun 24 22:14:57 2018 +0300

    tests: Use aligned image size
    
    The qcow2 tests used unaligned image size, which is not realistic test
    data, and triggered a segfault in qemu-img on Fedora 28. While exposing
    bugs in qemu-img is nice, this is not the place for such test.
    
    Change-Id: I2dca62187dce72638c3f92a4694b8f287d7a048f
    Signed-off-by: Nir Soffer <nsoffer@redhat.com>

commit 3e43fff81f51002347277df83ecfff61d5c8a4a5
Author: Nir Soffer <nsoffer@redhat.com>
Date:   Sun Jun 24 22:01:21 2018 +0300

    tests: Use padded VM configuration volume
    
    The tests was using unpadded vm configuration volume. This causes
    qemu-img to segfault in Fedora 28. Since we always pad vm configuration
    volumes in production code, we don't need to test unpadded values.
    
    Change-Id: I74cb26dcc0c8633fa1de5f689d2e8ffa98013d48
    Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Backporting these patches should fix the failing tests.
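
To illustrate what "aligned" and "padded" mean here, a minimal sketch. The
4096 byte alignment, the zero padding byte, and the names ALIGNMENT,
round_up and pad are assumptions for illustration only; the commits above
do not state the values used in the vdsm tests.

```python
# Assumed alignment, for illustration only.
ALIGNMENT = 4096


def round_up(size, align=ALIGNMENT):
    """Return size rounded up to the next multiple of align."""
    return (size + align - 1) // align * align


def pad(data, align=ALIGNMENT):
    """Return data extended with zero bytes to a multiple of align."""
    return data + b"\0" * (round_up(len(data), align) - len(data))
```

A test would then create images with a size like round_up(size) instead of
an arbitrary size, and write pad(data) instead of raw unpadded data.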

The qemu bug was fixed a few months after we reported it.

The fix is available in:
qemu-img.x86_64 2:2.11.2-1.fc28
qemu-img.x86_64 2:3.0.0-1.fc28 (from virt-preview)

Are we using these versions in 4.2 CI? Do we have the correct repos?

Strangely, I see that we also have this bug, which was probably
tested on RHEL, but there is no info on the versions tested.

So it seems that this bug sneaked into some downstream versions.

Nir