
On Tue, Jan 10, 2017 at 3:14 PM, Evgheni Dereveanchin <ederevea@redhat.com> wrote:
Not sure what the initial problem was, but on my laptop (Haswell-MB) I always use the lowest possible CPU family to ensure it's using as few features as possible in nested VMs:
;-) I'm doing the exact opposite, for two reasons: 1. I want the best possible performance - specifically, I'd like the tests to run as fast as possible. 2. I'd like to expose as many of the latest features as possible to the hosts (and VMs).
<cpu mode='custom' match='exact'>
  <model fallback='allow'>core2duo</model>
  <feature policy='require' name='vmx'/>
</cpu>
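(Not part of the original mail, just a generic sanity check.) With a definition like the one above, one can confirm inside the nested VM that the hardware-virt flag actually made it through:

```shell
# Quick check inside the nested VM: is a hardware-virt flag visible?
# vmx is the Intel flag; svm is the AMD equivalent.
if grep -q -w -e vmx -e svm /proc/cpuinfo 2>/dev/null; then
    echo "nested virt flag present"
else
    echo "nested virt flag missing"
fi
```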
Accordingly, I use model_Conroe on the oVirt side and haven't had problems with it. Do we really need to use newer CPU families in our tests?
We probably don't - we used to have Conroe hard-coded in the tests (until I changed it to use something different). It does mean it'll be a bit challenging to run on AMD if we decide to go back to hard-coding Conroe. Y.
Regards, Evgheni Dereveanchin
----- Original Message -----
From: "Milan Zamazal" <mzamazal@redhat.com>
To: "Yaniv Kaul" <ykaul@redhat.com>
Cc: "Lev Veyde" <lveyde@redhat.com>, "Eyal Edri" <eedri@redhat.com>, "Sandro Bonazzola" <sbonazzo@redhat.com>, "infra" <infra@ovirt.org>, "Gal Ben Haim" <gbenhaim@redhat.com>, "Martin Polednik" <mpoledni@redhat.com>, "Evgheni Dereveanchin" <ederevea@redhat.com>
Sent: Tuesday, 10 January, 2017 1:16:09 PM
Subject: Re: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
Yaniv Kaul <ykaul@redhat.com> writes:
On Tue, Jan 10, 2017 at 12:45 PM, Milan Zamazal <mzamazal@redhat.com> wrote:
Yaniv Kaul <ykaul@redhat.com> writes:
On Tue, Jan 10, 2017 at 12:08 PM, Lev Veyde <lveyde@redhat.com> wrote:
This patch is probably the one that caused it: https://github.com/lago-project/lago/commit/05ccf7240976f91b0c14d6a1f88016376d5e87f0
+Milan.
+Martin
I must confess that I did not like the patch to begin with... I did not understand what real problem it solved, but Michal assured me there was a real issue.
Yes, there was a real issue with nested virtualization. Some CPU flags are missing with Haswell and Lago doesn't run properly.
Is this a libvirt bug btw?
I'm not sure. When the sets of CPU flags on the host and in the VM with a copied host CPU are different, it's not clear what's the right thing to do.
Perhaps we need a switch to turn this feature on and off?
I think it would be useful to have a possibility to specify a particular CPU type in the Lago configuration.
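(Editorial illustration, not from the thread: the key names below are hypothetical, not Lago's actual schema at the time — just a sketch of what such an option could look like in a LagoInitFile.)

```yaml
# Hypothetical LagoInitFile fragment: 'cpu-model' is an illustrative
# key name, not a documented Lago option.
domains:
  host0:
    memory: 4096
    cpu-model: Westmere  # pin the virtual CPU instead of copying the host CPU
```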
I now have Engine with Java at 100% CPU - I hope it's unrelated to this as well.
I suggest we do a survey to see who doesn't have SandyBridge or above, and perhaps move higher than Westmere.
We've got Westmere servers in the Brno lab.
Do we know the scope of the problem? Does it happen only on Westmere, for example?
The problem was with Haswell-noTSX (on my Lenovo, but I think Martin has observed the same problem too). We don't know the scope of the problem, but if we want to be able to run Lago on Brno servers then we must be Westmere compatible.
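(Not from the thread, just a generic way to gauge the scope.) One can diff the 'flags' line of /proc/cpuinfo on the host against the one inside the nested VM to see which flags got lost; a minimal sketch with stand-in flag strings:

```shell
# Minimal sketch: list CPU flags present on the host but missing in the guest.
# The two strings below are stand-ins; in practice they come from the 'flags'
# line of /proc/cpuinfo on the host and inside the nested VM.
flags_host="fpu vme sse sse2 vmx aes avx"
flags_guest="fpu vme sse sse2 aes"

printf '%s\n' $flags_host | sort -u > /tmp/host_flags.txt
printf '%s\n' $flags_guest | sort -u > /tmp/guest_flags.txt

# comm -23 prints lines unique to the first file: flags the guest lacks.
comm -23 /tmp/host_flags.txt /tmp/guest_flags.txt
```

With the example strings above this prints the flags the guest is missing (here avx and vmx).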
Y.
What do we have in CI? Y.
Thanks in advance, Lev Veyde.
----- Original Message -----
From: "Lev Veyde" <lveyde@redhat.com>
To: "Eyal Edri" <eedri@redhat.com>, sbonazzo@redhat.com
Cc: infra@ovirt.org, "Gal Ben Haim" <gbenhaim@redhat.com>
Sent: Tuesday, January 10, 2017 11:50:05 AM
Subject: Re: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
Hi,
Checked the logs and see the following:
02:42:05 [WARNING] OVF does not contain a valid image description, using default.
02:42:05 The following CPU types are supported by this host:
02:42:05    - model_Westmere: Intel Westmere Family
02:42:05    - model_Nehalem: Intel Nehalem Family
02:42:05    - model_Penryn: Intel Penryn Family
02:42:05    - model_Conroe: Intel Conroe Family
02:42:05 [ ERROR ] Failed to execute stage 'Environment customization': Invalid CPU type specified: model_SandyBridge
Barak thinks that it may be related to the recent update in the Lago code.
Gal, any idea ?
Thanks in advance, Lev Veyde.
----- Original Message -----
From: jenkins@jenkins.phx.ovirt.org
To: sbonazzo@redhat.com, infra@ovirt.org, lveyde@redhat.com
Sent: Tuesday, January 10, 2017 4:42:14 AM
Subject: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
See <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/changes>
Changes:
[Lev Veyde] Mask NetworkManager service
[Eyal Edri] fix imgbased job names in jjb
[Daniel Belenky] fixing jjb version for cockpit-ovirt
[Gil Shinar] Add some more 4.1 to experimental
[Juan Hernandez] Don't build RPMs for the JBoss modules Maven plugin
[pkliczewski] jsonrpc 4.1 branch
------------------------------------------
[...truncated 749 lines...]
Finish: shell
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@ Tue Jan 10 02:42:07 UTC 2017 automation/he_basic_suite_4.0.sh chroot finished
@@ took 360 seconds
@@ rc = 1
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
==========Scrubbing chroot
mock \
    --configdir="http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests" \
    --root="mocker-epel-7-x86_64.el7" \
    --resultdir="./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.scrub" \
    --scrub=chroot
WARNING: Could not find required logging config file: http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/logging.ini. Using default...
INFO: mock.py version 1.2.21 starting (python version = 3.4.3)...
Start: init plugins
INFO: selinux enabled
Finish: init plugins
Start: run
Start: scrub ['chroot']
INFO: scrubbing chroot for mocker-epel-7-x86_64.el7
Finish: scrub ['chroot']
Finish: run
Scrub chroot took 6 seconds
============================
##########################################################
## Tue Jan 10 02:42:13 UTC 2017 Finished env: el7:epel-7-x86_64
## took 366 seconds
## rc = 1
##########################################################
find: ‘logs’: No such file or directory
No log files found, check command output
##!########################################################
Collecting mock logs
‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.clean_rpmdb’ -> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.clean_rpmdb’
‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.he_basic_suite_4.0.sh’ -> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.he_basic_suite_4.0.sh’
‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.init’ -> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.init’
##########################################################
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for :.* : True
Logical operation result is TRUE
Running script  : #!/bin/bash -xe
echo 'shell_scripts/system_tests.collect_logs.sh'
#
# Required jjb vars:
#    version
#
VERSION=4.0
SUITE_TYPE=

WORKSPACE="$PWD"
OVIRT_SUITE="$SUITE_TYPE_suite_$VERSION"
TESTS_LOGS="$WORKSPACE/ovirt-system-tests/exported-artifacts"

rm -rf "$WORKSPACE/exported-artifacts"
mkdir -p "$WORKSPACE/exported-artifacts"

if [[ -d "$TESTS_LOGS" ]]; then
    mv "$TESTS_LOGS/"* "$WORKSPACE/exported-artifacts/"
fi
[ovirt_4.0_he-system-tests] $ /bin/bash -xe /tmp/hudson302101162661598371.sh
+ echo shell_scripts/system_tests.collect_logs.sh
shell_scripts/system_tests.collect_logs.sh
+ VERSION=4.0
+ SUITE_TYPE=
+ WORKSPACE=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/
+ OVIRT_SUITE=4.0
+ TESTS_LOGS=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts
+ rm -rf http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts
+ mkdir -p http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts
+ [[ -d http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts ]]
+ mv http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/failure_msg.txt http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/lago_logs http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/mock_logs http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts/
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Match found for :.* : True
Logical operation result is TRUE
Running script  : #!/bin/bash -x
echo "shell-scripts/mock_cleanup.sh"
# Make clear this is the cleanup, helps reading the jenkins logs
cat <<EOC
____________________________________________________________
#######################################################################
#                                                                     #
#                              CLEANUP                                #
#                                                                     #
#######################################################################
EOC
shopt -s nullglob
WORKSPACE="${WORKSPACE:-$PWD}"
UMOUNT_RETRIES="${UMOUNT_RETRIES:-3}"
UMOUNT_RETRY_DELAY="${UMOUNT_RETRY_DELAY:-1s}"
safe_umount() {
    local mount="${1:?}"
    local attempt
    for ((attempt=0 ; attempt < $UMOUNT_RETRIES ; attempt++)); do
        # If this is not the 1st time through the loop, Sleep a while to let
        # the problem "solve itself"
        [[ attempt > 0 ]] && sleep "$UMOUNT_RETRY_DELAY"
        # Try to umount
        sudo umount --lazy "$mount" && return 0
        # See if the mount is already not there despite failing
        findmnt --kernel --first "$mount" > /dev/null && return 0
    done
    echo "ERROR: Failed to umount $mount."
    return 1
}
# restore the permissions in the working dir, as sometimes it leaves files
# owned by root and then the 'cleanup workspace' from jenkins job fails to
# clean and breaks the jobs
sudo chown -R "$USER" "$WORKSPACE"
# stop any processes running inside the chroot
failed=false
mock_confs=("$WORKSPACE"/*/mocker*)
# Clean current jobs mockroot if any
for mock_conf_file in "${mock_confs[@]}"; do
    [[ "$mock_conf_file" ]] || continue
    echo "Cleaning up mock $mock_conf"
    mock_root="${mock_conf_file##*/}"
    mock_root="${mock_root%.*}"
    my_mock="/usr/bin/mock"
    my_mock+=" --configdir=${mock_conf_file%/*}"
    my_mock+=" --root=${mock_root}"
    my_mock+=" --resultdir=$WORKSPACE"

    #TODO: investigate why mock --clean fails to umount certain dirs sometimes,
    #so we can use it instead of manually doing all this.
    echo "Killing all mock orphan processes, if any."
    $my_mock \
        --orphanskill \
    || {
        echo "ERROR: Failed to kill orphans on $chroot."
        failed=true
    }

    mock_root="$(\
        grep \
            -Po "(?<=config_opts\['root'\] = ')[^']*" \
            "$mock_conf_file" \
    )" || :
    [[ "$mock_root" ]] || continue
    mounts=($(mount | awk '{print $3}' | grep "$mock_root")) || :
    if [[ "$mounts" ]]; then
        echo "Found mounted dirs inside the chroot $chroot. Trying to umount."
    fi
    for mount in "${mounts[@]}"; do
        safe_umount "$mount" || failed=true
    done
done

# Clean any leftover chroot from other jobs
for mock_root in /var/lib/mock/*; do
    this_chroot_failed=false
    mounts=($(cut -d\  -f2 /proc/mounts | grep "$mock_root" | sort -r)) || :
    if [[ "$mounts" ]]; then
        echo "Found mounted dirs inside the chroot $mock_root." \
             "Trying to umount."
    fi
    for mount in "${mounts[@]}"; do
        safe_umount "$mount" && continue
        # If we got here, we failed $UMOUNT_RETRIES attempts so we should make
        # noise
        failed=true
        this_chroot_failed=true
    done
    if ! $this_chroot_failed; then
        sudo rm -rf "$mock_root"
    fi
done
# remove mock caches that are older than 2 days:
find /var/cache/mock/ -mindepth 1 -maxdepth 1 -type d -mtime +2 -print0 | \
    xargs -0 -tr sudo rm -rf
# We make no effort to leave around caches that may still be in use because
# packages installed in them may go out of date, so may as well recreate them
# Drop all left over libvirt domains
for UUID in $(virsh list --all --uuid); do
    virsh destroy $UUID || :
    sleep 2
    virsh undefine --remove-all-storage --storage vda --snapshots-metadata $UUID || :
done
if $failed; then
    echo "Cleanup script failed, propagating failure to job"
    exit 1
fi
[ovirt_4.0_he-system-tests] $ /bin/bash -x /tmp/hudson1888216492513466503.sh
+ echo shell-scripts/mock_cleanup.sh
shell-scripts/mock_cleanup.sh
+ cat
____________________________________________________________
#######################################################################
#                                                                     #
#                              CLEANUP                                #
#                                                                     #
#######################################################################
+ shopt -s nullglob
+ WORKSPACE=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/
+ UMOUNT_RETRIES=3
+ UMOUNT_RETRY_DELAY=1s
+ sudo chown -R jenkins http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/
+ failed=false
+ mock_confs=("$WORKSPACE"/*/mocker*)
+ for mock_conf_file in '"${mock_confs[@]}"'
+ [[ -n http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/mocker-epel-7-x86_64.el7.cfg ]]
+ echo 'Cleaning up mock '
Cleaning up mock
+ mock_root=mocker-epel-7-x86_64.el7.cfg
+ mock_root=mocker-epel-7-x86_64.el7
+ my_mock=/usr/bin/mock
+ my_mock+=' --configdir=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests'
+ my_mock+=' --root=mocker-epel-7-x86_64.el7'
+ my_mock+=' --resultdir=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/'
+ echo 'Killing all mock orphan processes, if any.'
Killing all mock orphan processes, if any.
+ /usr/bin/mock --configdir=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests --root=mocker-epel-7-x86_64.el7 --resultdir=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ --orphanskill
WARNING: Could not find required logging config file: http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/logging.ini. Using default...
INFO: mock.py version 1.2.21 starting (python version = 3.4.3)...
Start: init plugins
INFO: selinux enabled
Finish: init plugins
Start: run
Finish: run
++ grep -Po '(?<=config_opts\['\''root'\''\] = '\'')[^'\'']*' http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/mocker-epel-7-x86_64.el7.cfg
+ mock_root=epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b
+ [[ -n epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b ]]
+ mounts=($(mount | awk '{print $3}' | grep "$mock_root"))
++ mount
++ awk '{print $3}'
++ grep epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b
+ :
+ [[ -n '' ]]
+ find /var/cache/mock/ -mindepth 1 -maxdepth 1 -type d -mtime +2 -print0
+ xargs -0 -tr sudo rm -rf
++ virsh list --all --uuid
+ false
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 1
Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
Archiving artifacts