
Hi, I've seen a CI failure on qemu-io tests in ovirt-4.2 branch: http://jenkins.ovirt.org/job/vdsm_standard-check-patch/602/
It fails only in some testing parts, while in others it succeeds: http://jenkins.ovirt.org/job/vdsm_4.2_check-patch-el7-x86_64/553/
Nir, could you please take a look whether we need to backport your fix from master and/or fix anything else (such as ensuring the right QEMU version is used)?
Thanks, Milan

Ok, I'll take a look.
On Fri, Dec 7, 2018, 15:19 Milan Zamazal <mzamazal@redhat.com> wrote:
Hi, I've seen a CI failure on qemu-io tests in ovirt-4.2 branch: http://jenkins.ovirt.org/job/vdsm_standard-check-patch/602/ It fails only in some testing parts, while in others it succeeds: http://jenkins.ovirt.org/job/vdsm_4.2_check-patch-el7-x86_64/553/
Nir, could you please take a look whether we need to backport your fix from master and/or fix anything else (such as ensuring the right QEMU version is used)?
Thanks, Milan

On Fri, Dec 7, 2018 at 3:19 PM Milan Zamazal <mzamazal@redhat.com> wrote:
Hi, I've seen a CI failure on qemu-io tests in ovirt-4.2 branch: http://jenkins.ovirt.org/job/vdsm_standard-check-patch/602/
We have several issues:
00:12:02.358 def verify_pattern(path, format, offset=512, len=1024, pattern=5):
00:12:02.358     read_cmd = 'read -P %d -s 0 -l %d %d %d' % (pattern, len, offset, len)
00:12:02.358     cmd = ['qemu-io', '-f', format, '-c', read_cmd, path]
00:12:02.358     rc, out, err = commands.execCmd(cmd, raw=True)
00:12:02.358     if rc != 0 or err != b"":
00:12:02.358 >       raise cmdutils.Error(cmd, rc, out, err)
00:12:02.358 E       Error: Command ['qemu-io', '-f', 'qcow2', '-c', 'read -P 240 -s 0 -l 1024 0 1024', '/var/tmp/tmpbxNeWA/mnt/blockSD/3fd2ac6f-929b-425a-a7d7-3b8519dedc91/images/e438933b-7d87-4fd6-a2fa-e53cb2e26a19/8ff32e26-b426-4ebf-a8cb-48353cf898e7'] failed with rc=1 out='Pattern verification failed at offset 0, 1024 bytes\nread 1024/1024 bytes at offset 0\n1 KiB, 1 ops; 0.0000 sec (17.756 MiB/sec and 18181.8182 ops/sec)\n' err=''
This is the same issue fixed in master by:
commit a341dcc3ab5c36afc7a536d3d4794c37fedcda6d Author: Nir Soffer <nsoffer@redhat.com> Date: Sun Nov 25 22:48:54 2018 +0200
qemuio: Support qemu 2.12 error handling
qemu-io 2.10 was exiting with zero exit code and "Pattern verification" error in stdout. In 2.12, non-zero code is returned when pattern verification fails. This sounds better, but the exit code is not specific enough to detect the verification error.
Check for pattern verification error before testing exit code, so we raise (expected) verification error in this case, and raise a generic error in verification error was not detected.
Change-Id: Id108d2b780b6862bed3bb7088908d911788b544d Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Backporting this patch should fix this issue.
00:12:02.352 Error: Command ['/usr/bin/taskset', '--cpu-list', '0-1', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', '/var/tmp/tmpmwqNI9/9a7a895a-eaed-4de0-97b5-05009e336853/15ca6f4c-2753-450f-8e4b-1e3278fc664e/images/89432c41-c8c9-4d4e-ab86-ef892806162b/9dbca6ff-412a-4c85-a4eb-51dfa4fadbd8', '-O', 'raw', '/var/tmp/tmpmwqNI9/9a7a895a-eaed-4de0-97b5-05009e336853/15ca6f4c-2753-450f-8e4b-1e3278fc664e/images/51e0d2ba-0cb0-421c-901b-4842e8612ca4/3b857f90-3062-4698-9920-2c4849b7256f'] failed with rc=-6 out='' err=bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed.\n")
We found this error 6 months ago on Travis, and discussed it here: https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/UGGNDS4SVJ2GHAQ...
We filed https://bugzilla.redhat.com/1589738
Then we fixed the issue in the tests by using aligned images, since we always use aligned images in the real system.
commit 0cde33b4ecfcc55dee3f441749fc72c106b4dd77 Author: Nir Soffer <nsoffer@redhat.com> Date: Sun Jun 24 22:14:57 2018 +0300
tests: Use aligned image size
The qcow2 tests used unaligned image size, which is not realistic test data, and triggered a segfault in qemu-img on Fedora 28. While exposing bugs in qemu-img is nice, this is not the place for such test.
Change-Id: I2dca62187dce72638c3f92a4694b8f287d7a048f Signed-off-by: Nir Soffer <nsoffer@redhat.com>
commit 3e43fff81f51002347277df83ecfff61d5c8a4a5 Author: Nir Soffer <nsoffer@redhat.com> Date: Sun Jun 24 22:01:21 2018 +0300
tests: Use padded VM configuration volume
The tests was using unpadded vm configuration volume. This causes qemu-img to segfault in Fedora 28. Since we always pad vm configuration volumes in production code, we don't need to test unpadded values.
Change-Id: I74cb26dcc0c8633fa1de5f689d2e8ffa98013d48 Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Backporting these patches should fix the failing tests.
The qemu bug was fixed a few months after we reported it.
The fix is available in: qemu-img.x86_64 2:2.11.2-1.fc28 qemu-img.x86_64 2:3.0.0-1.fc28 (from virt-preview)
Are we using these versions in 4.2 CI? Do we have correct repos?
Strangely, I see that we also have this bug, which was probably tested on RHEL, but there is no info on the versions tested. https://bugzilla.redhat.com/1649788
So it seems that this bug sneaked into some downstream versions.
Nir

Nir Soffer <nsoffer@redhat.com> writes:
On Fri, Dec 7, 2018 at 3:19 PM Milan Zamazal <mzamazal@redhat.com> wrote:
Hi, I've seen a CI failure on qemu-io tests in ovirt-4.2 branch: http://jenkins.ovirt.org/job/vdsm_standard-check-patch/602/
We have several issues:
Thank you Nir for looking.
00:12:02.358 def verify_pattern(path, format, offset=512, len=1024, pattern=5):
00:12:02.358     read_cmd = 'read -P %d -s 0 -l %d %d %d' % (pattern, len, offset, len)
00:12:02.358     cmd = ['qemu-io', '-f', format, '-c', read_cmd, path]
00:12:02.358     rc, out, err = commands.execCmd(cmd, raw=True)
00:12:02.358     if rc != 0 or err != b"":
00:12:02.358 >       raise cmdutils.Error(cmd, rc, out, err)
00:12:02.358 E       Error: Command ['qemu-io', '-f', 'qcow2', '-c', 'read -P 240 -s 0 -l 1024 0 1024', '/var/tmp/tmpbxNeWA/mnt/blockSD/3fd2ac6f-929b-425a-a7d7-3b8519dedc91/images/e438933b-7d87-4fd6-a2fa-e53cb2e26a19/8ff32e26-b426-4ebf-a8cb-48353cf898e7'] failed with rc=1 out='Pattern verification failed at offset 0, 1024 bytes\nread 1024/1024 bytes at offset 0\n1 KiB, 1 ops; 0.0000 sec (17.756 MiB/sec and 18181.8182 ops/sec)\n' err=''
This is the same issue fixed in master by:
commit a341dcc3ab5c36afc7a536d3d4794c37fedcda6d Author: Nir Soffer <nsoffer@redhat.com> Date: Sun Nov 25 22:48:54 2018 +0200
qemuio: Support qemu 2.12 error handling
qemu-io 2.10 was exiting with zero exit code and "Pattern verification" error in stdout. In 2.12, non-zero code is returned when pattern verification fails. This sounds better, but the exit code is not specific enough to detect the verification error.
Check for pattern verification error before testing exit code, so we raise (expected) verification error in this case, and raise a generic error in verification error was not detected.
Change-Id: Id108d2b780b6862bed3bb7088908d911788b544d Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Backporting this patch should fix this issue.
Backported, fixed.
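For reference, the ordering described in that commit message can be sketched roughly as below. This is only an illustration under assumptions, not the actual vdsm code: check_qemu_io_result, VerificationError and CommandError are hypothetical names, and the only detail taken from the thread is that qemu-io prints "Pattern verification failed" on stdout, with exit code 0 in 2.10 and non-zero in 2.12.

```python
class VerificationError(Exception):
    """The data read does not match the expected pattern (hypothetical name)."""


class CommandError(Exception):
    """Any other qemu-io failure (hypothetical name)."""


def check_qemu_io_result(rc, out, err):
    """Classify the result of a qemu-io 'read -P' command.

    rc is the exit code, out and err are the raw stdout/stderr bytes.
    """
    # Look at stdout first: qemu-io 2.10 reports a mismatch with rc=0,
    # while 2.12 reports the same message with a non-zero rc.
    if b"Pattern verification failed" in out:
        raise VerificationError(out.decode("utf-8", "replace"))

    # Only now treat a non-zero exit code or stderr output as a generic error.
    if rc != 0 or err != b"":
        raise CommandError("rc=%d out=%r err=%r" % (rc, out, err))
```

With this ordering the expected verification error is raised regardless of which exit code the installed qemu-io uses, and the generic error is kept for everything else.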
00:12:02.352 Error: Command ['/usr/bin/taskset', '--cpu-list', '0-1', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', '/var/tmp/tmpmwqNI9/9a7a895a-eaed-4de0-97b5-05009e336853/15ca6f4c-2753-450f-8e4b-1e3278fc664e/images/89432c41-c8c9-4d4e-ab86-ef892806162b/9dbca6ff-412a-4c85-a4eb-51dfa4fadbd8', '-O', 'raw', '/var/tmp/tmpmwqNI9/9a7a895a-eaed-4de0-97b5-05009e336853/15ca6f4c-2753-450f-8e4b-1e3278fc664e/images/51e0d2ba-0cb0-421c-901b-4842e8612ca4/3b857f90-3062-4698-9920-2c4849b7256f'] failed with rc=-6 out='' err=bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed.\n")
We found this error 6 months ago on Travis, and discussed it here: https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/UGGNDS4SVJ2GHAQ...
We filed https://bugzilla.redhat.com/1589738
Then we fixed the issue in the tests by using aligned images, since we always use aligned images in the real system.
commit 0cde33b4ecfcc55dee3f441749fc72c106b4dd77 Author: Nir Soffer <nsoffer@redhat.com> Date: Sun Jun 24 22:14:57 2018 +0300
tests: Use aligned image size
The qcow2 tests used unaligned image size, which is not realistic test data, and triggered a segfault in qemu-img on Fedora 28. While exposing bugs in qemu-img is nice, this is not the place for such test.
Change-Id: I2dca62187dce72638c3f92a4694b8f287d7a048f Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Not necessarily needed, but I'd take it anyway; see my comment in https://gerrit.ovirt.org/96224.
commit 3e43fff81f51002347277df83ecfff61d5c8a4a5 Author: Nir Soffer <nsoffer@redhat.com> Date: Sun Jun 24 22:01:21 2018 +0300
tests: Use padded VM configuration volume
The tests was using unpadded vm configuration volume. This causes qemu-img to segfault in Fedora 28. Since we always pad vm configuration volumes in production code, we don't need to test unpadded values.
Change-Id: I74cb26dcc0c8633fa1de5f689d2e8ffa98013d48 Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Backporting these patches should fix the failing tests.
The second one backported, CI fixed.
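As an illustration of what "aligned" and "padded" mean in the two test fixes above, a minimal sketch follows. The names round_up, pad and BLOCK_SIZE are hypothetical, and the 4096-byte value is an assumption chosen for the example; the thread does not say which alignment the vdsm tests actually use.

```python
# Assumed alignment for illustration only; not taken from the vdsm tests.
BLOCK_SIZE = 4096


def round_up(size, align=BLOCK_SIZE):
    """Return size rounded up to the next multiple of align."""
    return (size + align - 1) // align * align


def pad(data, align=BLOCK_SIZE):
    """Return data padded with zero bytes to a multiple of align."""
    return data + b"\0" * (round_up(len(data), align) - len(data))


# Example: an unaligned test image size and an unpadded payload.
image_size = round_up(1000)          # aligned size for a test image
payload = pad(b"vm configuration")   # padded payload for a test volume
```

Test data built this way keeps qemu-img away from the unaligned case that triggered the assertion in the affected versions.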
The qemu bug was fixed a few months after we reported it.
The fix is available in: qemu-img.x86_64 2:2.11.2-1.fc28 qemu-img.x86_64 2:3.0.0-1.fc28 (from virt-preview)
Are we using these versions in 4.2 CI? Do we have correct repos?
I think so. 4.2 CI should be using 2.12.0-18.el7_6.1.1 from http://mirror.centos.org/centos/7/virt/x86_64/kvm-common/. The fact that it fails without "qemuio: Support qemu 2.12 error handling" confirms that 2.12 is indeed used.
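If we wanted to check the version directly instead of inferring it from the failure, something like the following sketch could be used. It assumes that the first line of "qemu-img --version" has the form "qemu-img version X.Y.Z ..."; parse_qemu_version is a hypothetical helper and the sample string is made up for illustration, not copied from the CI log.

```python
import re


def parse_qemu_version(output):
    """Extract (major, minor, micro) from 'qemu-img --version' output."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("Cannot find version in %r" % output)
    return tuple(int(part) for part in match.groups())


# Sample output, assumed for illustration only.
sample = "qemu-img version 2.12.0 (qemu-kvm-ev-2.12.0-18.el7_6.1.1)"
version = parse_qemu_version(sample)

# Tuples compare element by element, so this is True for 2.12.0 and later.
uses_new_error_handling = version >= (2, 12)
```

In a real check the output string would come from running the command, for example with subprocess.check_output(["qemu-img", "--version"]).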
Strangely, I see that we also have this bug, which was probably tested on RHEL, but there is no info on the versions tested. https://bugzilla.redhat.com/1649788
So it seems that this bug sneaked into some downstream versions.
Nir
participants (2): Milan Zamazal, Nir Soffer