Build failed in Jenkins: ovirt_4.0_he-system-tests #627

Tue Jan 10 16:18:16 UTC 2017

Hi Yaniv,

Sent a pull request to fix the issue, at least for our tests, by returning the SandyBridge CPU family:
https://github.com/lago-project/lago/pull/424/
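
For illustration only (this is not the code in the pull request): a minimal Python sketch of falling back to a CPU family that the host actually reports as supported, using the family names from the log quoted further down. The helper name pick_cpu_family and the newest-to-oldest ordering are assumptions made for this example.

```python
# Hypothetical helper; not taken from Lago or from the pull request above.
# Families ordered newest to oldest. The ordering is an assumption made
# for this sketch; the names come from the hosted-engine setup log.
PREFERENCE = [
    "model_SandyBridge",
    "model_Westmere",
    "model_Nehalem",
    "model_Penryn",
    "model_Conroe",
]


def pick_cpu_family(requested, supported, preference=PREFERENCE):
    """Return `requested` if the host supports it, else the newest
    supported family that is not newer than `requested`."""
    if requested in supported:
        return requested
    # Start searching just below the requested family when it is known,
    # otherwise consider the whole preference list.
    start = preference.index(requested) + 1 if requested in preference else 0
    for family in preference[start:]:
        if family in supported:
            return family
    raise ValueError("no supported CPU family found for %r" % requested)


if __name__ == "__main__":
    host_supported = {
        "model_Westmere",
        "model_Nehalem",
        "model_Penryn",
        "model_Conroe",
    }
    print(pick_cpu_family("model_SandyBridge", host_supported))
```

With the four families listed in the failing build's log, a request for model_SandyBridge would resolve to model_Westmere instead of aborting.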

Thanks in advance,
Lev Veyde.

----- Original Message -----
From: "Yaniv Kaul" <ykaul at redhat.com>
To: "Evgheni Dereveanchin" <ederevea at redhat.com>
Cc: "Milan Zamazal" <mzamazal at redhat.com>, "Lev Veyde" <lveyde at redhat.com>, "Eyal Edri" <eedri at redhat.com>, "Sandro Bonazzola" <sbonazzo at redhat.com>, "infra" <infra at ovirt.org>, "Gal Ben Haim" <gbenhaim at redhat.com>, "Martin Polednik" <mpoledni at redhat.com>
Sent: Tuesday, January 10, 2017 3:22:07 PM
Subject: Re: Build failed in Jenkins: ovirt_4.0_he-system-tests #627

On Tue, Jan 10, 2017 at 3:14 PM, Evgheni Dereveanchin <ederevea at redhat.com>
wrote:

> Not sure what the initial problem was, but on my laptop (Haswell-MB)
> I always use the lowest possible CPU family to ensure it's using
> as few features as possible in nested VMs:
>

;-)

I'm doing the exact opposite, for two reasons:
1. I want the best possible performance. Specifically, I'd like the tests
to run as fast as possible.
2. I'd like to expose as many of the latest features up to the hosts (and
VMs).

>   <cpu mode='custom' match='exact'>
>     <model fallback='allow'>core2duo</model>
>     <feature policy='require' name='vmx'/>
>   </cpu>
>
> Respectively, I use model_Conroe on oVirt side and didn't have
> problems with it. Do we really need to use newer CPU families
> in our tests?
>

We probably don't - we used to have Conroe hard-coded in the tests (until I
changed it to use something different).

It does mean it'll be a bit challenging to run on AMD if we decide to go
back to hard-code Conroe.
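
A rough sketch of what a vendor-aware baseline could look like, in Python. The helper name cpu_xml, the vendor table, and the choice of Opteron_G1 with svm as the AMD counterpart are assumptions for illustration, not something Lago or the tests do today.

```python
import xml.etree.ElementTree as ET

# Assumed per-vendor baseline: (libvirt CPU model, nested-virt feature flag).
# Conroe and vmx follow the discussion above; Opteron_G1 and svm are this
# sketch's assumption for an AMD host.
BASELINE = {
    "GenuineIntel": ("Conroe", "vmx"),
    "AuthenticAMD": ("Opteron_G1", "svm"),
}


def cpu_xml(vendor_id):
    """Build a libvirt <cpu> element for the given CPU vendor string
    (the vendor_id value as reported in /proc/cpuinfo)."""
    model, virt_flag = BASELINE[vendor_id]
    cpu = ET.Element("cpu", mode="custom", match="exact")
    model_el = ET.SubElement(cpu, "model", fallback="allow")
    model_el.text = model
    # Require the hardware virtualization flag so nested VMs can run.
    ET.SubElement(cpu, "feature", policy="require", name=virt_flag)
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    print(cpu_xml("GenuineIntel"))
    print(cpu_xml("AuthenticAMD"))
```

The point is only that a hard-coded baseline needs one entry per vendor; an unknown vendor raises KeyError here rather than guessing.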
Y.

>
> Regards,
> Evgheni Dereveanchin
>
> ----- Original Message -----
> From: "Milan Zamazal" <mzamazal at redhat.com>
> To: "Yaniv Kaul" <ykaul at redhat.com>
> Cc: "Lev Veyde" <lveyde at redhat.com>, "Eyal Edri" <eedri at redhat.com>,
> "Sandro Bonazzola" <sbonazzo at redhat.com>, "infra" <infra at ovirt.org>, "Gal
> Ben Haim" <gbenhaim at redhat.com>, "Martin Polednik" <mpoledni at redhat.com>,
> "Evgheni Dereveanchin" <ederevea at redhat.com>
> Sent: Tuesday, 10 January, 2017 1:16:09 PM
> Subject: Re: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
>
> Yaniv Kaul <ykaul at redhat.com> writes:
>
> > On Tue, Jan 10, 2017 at 12:45 PM, Milan Zamazal <mzamazal at redhat.com> wrote:
> >
> >> Yaniv Kaul <ykaul at redhat.com> writes:
> >>
> >> > On Tue, Jan 10, 2017 at 12:08 PM, Lev Veyde <lveyde at redhat.com> wrote:
> >> >
> >> >> This patch is probably the one that caused it:
> >> >> https://github.com/lago-project/lago/commit/05ccf7240976f91b0c14d6a1f88016376d5e87f0
> >> >
> >> >
> >> > +Milan.
> >>
> >> +Martin
> >>
> >> > I must confess that I did not like the patch to begin with...
> >> > I did not understand what real problem it solved, but Michal assured me
> >> > there was a real issue.
> >>
> >> Yes, there was a real issue with nested virtualization.  Some CPU flags
> >> are missing with Haswell and Lago doesn't run properly.
> >>
> >
> > Is this a libvirt bug btw?
>
> I'm not sure.  When the sets of CPU flags on the host and in the VM with
> a copied host CPU are different, it's not clear what the right thing
> to do is.
>
> > Perhaps we need a switch to turn this feature on and off?
>
> I think it would be useful to be able to specify a particular CPU type
> in the Lago configuration.
>
> >> > I now have Engine with Java at 100% CPU - I hope it's unrelated to this
> >> > as well.
> >> >
> >> > I suggest we do a survey to see who doesn't have SandyBridge or above,
> >> > and perhaps move higher than Westmere.
> >>
> >> We've got Westmere servers in the Brno lab.
> >>
> >
> > Do we know the scope of the problem? Does it happen only on Westmere, for
> > example?
>
> The problem was with Haswell-noTSX (on my Lenovo, but I think Martin has
> observed the same problem too).  We don't know the scope of the problem,
> but if we want to be able to run Lago on Brno servers then we must be
> Westmere compatible.
>
> >  Y.
> >
> >
> >> > What do we have in CI?
> >> > Y.
> >> >
> >> >
> >> >>
> >> >> Thanks in advance,
> >> >> Lev Veyde.
> >> >>
> >> >> ----- Original Message -----
> >> >> From: "Lev Veyde" <lveyde at redhat.com>
> >> >> To: "Eyal Edri" <eedri at redhat.com>, sbonazzo at redhat.com
> >> >> Cc: infra at ovirt.org, "Gal Ben Haim" <gbenhaim at redhat.com>
> >> >> Sent: Tuesday, January 10, 2017 11:50:05 AM
> >> >> Subject: Re: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
> >> >>
> >> >> Hi,
> >> >>
> >> >> Checked the logs and see the following:
> >> >>
> >> >> 02:42:05 [WARNING] OVF does not contain a valid image description, using default.
> >> >> 02:42:05           The following CPU types are supported by this host:
> >> >> 02:42:05                 - model_Westmere: Intel Westmere Family
> >> >> 02:42:05                 - model_Nehalem: Intel Nehalem Family
> >> >> 02:42:05                 - model_Penryn: Intel Penryn Family
> >> >> 02:42:05                 - model_Conroe: Intel Conroe Family
> >> >> 02:42:05 [ ERROR ] Failed to execute stage 'Environment customization': Invalid CPU type specified: model_SandyBridge
> >> >>
> >> >> Barak thinks that it may be related to the recent update in the Lago code.
> >> >>
> >> >> Gal, any idea?
> >> >>
> >> >> Thanks in advance,
> >> >> Lev Veyde.
> >> >>
> >> >> ----- Original Message -----
> >> >> From: jenkins at jenkins.phx.ovirt.org
> >> >> To: sbonazzo at redhat.com, infra at ovirt.org, lveyde at redhat.com
> >> >> Sent: Tuesday, January 10, 2017 4:42:14 AM
> >> >> Subject: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
> >> >>
> >> >> See <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/changes>
> >> >>
> >> >> Changes:
> >> >>
> >> >> [Lev Veyde] Mask NetworkManager service
> >> >>
> >> >> [Eyal Edri] fix imgbased job names in jjb
> >> >>
> >> >> [Daniel Belenky] fixing jjb version for cockpit-ovirt
> >> >>
> >> >> [Gil Shinar] Add some more 4.1 to experimental
> >> >>
> >> >> [Juan Hernandez] Don't build RPMs for the JBoss modules Maven plugin
> >> >>
> >> >> [pkliczewski] jsonrpc 4.1 branch
> >> >>
> >> >> ------------------------------------------
> >> >> [...truncated 749 lines...]
> >> >> Finish: shell
> >> >> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> >> >> @@ Tue Jan 10 02:42:07 UTC 2017 automation/he_basic_suite_4.0.sh chroot finished
> >> >> @@      took 360 seconds
> >> >> @@      rc = 1
> >> >> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> >> >> ========== Scrubbing chroot
> >> >>     mock \
> >> >>         --configdir="<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests"> \
> >> >>         --root="mocker-epel-7-x86_64.el7" \
> >> >>         --resultdir="./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.scrub" \
> >> >>         --scrub=chroot
> >> >> WARNING: Could not find required logging config file: <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/logging.ini.> Using default...
> >> >> INFO: mock.py version 1.2.21 starting (python version = 3.4.3)...
> >> >> Start: init plugins
> >> >> INFO: selinux enabled
> >> >> Finish: init plugins
> >> >> Start: run
> >> >> Start: scrub ['chroot']
> >> >> INFO: scrubbing chroot for mocker-epel-7-x86_64.el7
> >> >> Finish: scrub ['chroot']
> >> >> Finish: run
> >> >> Scrub chroot took 6 seconds
> >> >> ============================
> >> >> ##########################################################
> >> >> ## Tue Jan 10 02:42:13 UTC 2017 Finished env: el7:epel-7-x86_64
> >> >> ##      took 366 seconds
> >> >> ##      rc = 1
> >> >> ##########################################################
> >> >> find: ‘logs’: No such file or directory
> >> >> No log files found, check command output
> >> >> ##!########################################################
> >> >> Collecting mock logs
> >> >> ‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.clean_rpmdb’ -> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.clean_rpmdb’
> >> >> ‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.he_basic_suite_4.0.sh’ -> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.he_basic_suite_4.0.sh’
> >> >> ‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.init’ -> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.init’
> >> >> ##########################################################
> >> >> Build step 'Execute shell' marked build as failure
> >> >> Performing Post build task...
> >> >> Match found for :.* : True
> >> >> Logical operation result is TRUE
> >> >> Running script  : #!/bin/bash -xe
> >> >> echo 'shell_scripts/system_tests.collect_logs.sh'
> >> >>
> >> >> #
> >> >> # Required jjb vars:
> >> >> #    version
> >> >> #
> >> >> VERSION=4.0
> >> >> SUITE_TYPE=
> >> >>
> >> >> WORKSPACE="$PWD"
> >> >> OVIRT_SUITE="$SUITE_TYPE_suite_$VERSION"
> >> >> TESTS_LOGS="$WORKSPACE/ovirt-system-tests/exported-artifacts"
> >> >>
> >> >> rm -rf "$WORKSPACE/exported-artifacts"
> >> >> mkdir -p "$WORKSPACE/exported-artifacts"
> >> >>
> >> >> if [[ -d "$TESTS_LOGS" ]]; then
> >> >>     mv "$TESTS_LOGS/"* "$WORKSPACE/exported-artifacts/"
> >> >> fi
> >> >>
> >> >> [ovirt_4.0_he-system-tests] $ /bin/bash -xe /tmp/hudson302101162661598371.sh
> >> >> + echo shell_scripts/system_tests.collect_logs.sh
> >> >> shell_scripts/system_tests.collect_logs.sh
> >> >> + VERSION=4.0
> >> >> + SUITE_TYPE=
> >> >> + WORKSPACE=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/>
> >> >> + OVIRT_SUITE=4.0
> >> >> + TESTS_LOGS=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts>
> >> >> + rm -rf <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts>
> >> >> + mkdir -p <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts>
> >> >> + [[ -d <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts> ]]
> >> >> + mv <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/failure_msg.txt> <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/lago_logs> <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/mock_logs> <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts/>
> >> >> POST BUILD TASK : SUCCESS
> >> >> END OF POST BUILD TASK : 0
> >> >> Match found for :.* : True
> >> >> Logical operation result is TRUE
> >> >> Running script  : #!/bin/bash -x
> >> >> echo "shell-scripts/mock_cleanup.sh"
> >> >> # Make clear this is the cleanup, helps reading the jenkins logs
> >> >> cat <<EOC
> >> >> _______________________________________________________________________
> >> >> #######################################################################
> >> >> #                                                                     #
> >> >> #                               CLEANUP                               #
> >> >> #                                                                     #
> >> >> #######################################################################
> >> >> EOC
> >> >>
> >> >> shopt -s nullglob
> >> >>
> >> >> WORKSPACE="${WORKSPACE:-$PWD}"
> >> >> UMOUNT_RETRIES="${UMOUNT_RETRIES:-3}"
> >> >> UMOUNT_RETRY_DELAY="${UMOUNT_RETRY_DELAY:-1s}"
> >> >>
> >> >> safe_umount() {
> >> >>     local mount="${1:?}"
> >> >>     local attempt
> >> >>     for ((attempt=0 ; attempt < $UMOUNT_RETRIES ; attempt++)); do
> >> >>         # If this is not the 1st time through the loop, Sleep a while to let
> >> >>         # the problem "solve itself"
> >> >>         [[ attempt > 0 ]] && sleep "$UMOUNT_RETRY_DELAY"
> >> >>         # Try to umount
> >> >>         sudo umount --lazy "$mount" && return 0
> >> >>         # See if the mount is already not there despite failing
> >> >>         findmnt --kernel --first "$mount" > /dev/null && return 0
> >> >>     done
> >> >>     echo "ERROR:  Failed to umount $mount."
> >> >>     return 1
> >> >> }
> >> >>
> >> >> # restore the permissions in the working dir, as sometimes it leaves files
> >> >> # owned by root and then the 'cleanup workspace' from jenkins job fails to
> >> >> # clean and breaks the jobs
> >> >> sudo chown -R "$USER" "$WORKSPACE"
> >> >>
> >> >> # stop any processes running inside the chroot
> >> >> failed=false
> >> >> mock_confs=("$WORKSPACE"/*/mocker*)
> >> >> # Clean current jobs mockroot if any
> >> >> for mock_conf_file in "${mock_confs[@]}"; do
> >> >>     [[ "$mock_conf_file" ]] || continue
> >> >>     echo "Cleaning up mock $mock_conf"
> >> >>     mock_root="${mock_conf_file##*/}"
> >> >>     mock_root="${mock_root%.*}"
> >> >>     my_mock="/usr/bin/mock"
> >> >>     my_mock+=" --configdir=${mock_conf_file%/*}"
> >> >>     my_mock+=" --root=${mock_root}"
> >> >>     my_mock+=" --resultdir=$WORKSPACE"
> >> >>
> >> >>     #TODO: investigate why mock --clean fails to umount certain dirs sometimes,
> >> >>     #so we can use it instead of manually doing all this.
> >> >>     echo "Killing all mock orphan processes, if any."
> >> >>     $my_mock \
> >> >>         --orphanskill \
> >> >>     || {
> >> >>         echo "ERROR:  Failed to kill orphans on $chroot."
> >> >>         failed=true
> >> >>     }
> >> >>
> >> >>     mock_root="$(\
> >> >>         grep \
> >> >>             -Po "(?<=config_opts\['root'\] = ')[^']*" \
> >> >>             "$mock_conf_file" \
> >> >>     )" || :
> >> >>     [[ "$mock_root" ]] || continue
> >> >>     mounts=($(mount | awk '{print $3}' | grep "$mock_root")) || :
> >> >>     if [[ "$mounts" ]]; then
> >> >>         echo "Found mounted dirs inside the chroot $chroot. Trying to umount."
> >> >>     fi
> >> >>     for mount in "${mounts[@]}"; do
> >> >>         safe_umount "$mount" || failed=true
> >> >>     done
> >> >> done
> >> >>
> >> >> # Clean any leftover chroot from other jobs
> >> >> for mock_root in /var/lib/mock/*; do
> >> >>     this_chroot_failed=false
> >> >>     mounts=($(cut -d\  -f2 /proc/mounts | grep "$mock_root" | sort -r)) || :
> >> >>     if [[ "$mounts" ]]; then
> >> >>         echo "Found mounted dirs inside the chroot $mock_root." \
> >> >>              "Trying to umount."
> >> >>     fi
> >> >>     for mount in "${mounts[@]}"; do
> >> >>         safe_umount "$mount" && continue
> >> >>         # If we got here, we failed $UMOUNT_RETRIES attempts so we should make
> >> >>         # noise
> >> >>         failed=true
> >> >>         this_chroot_failed=true
> >> >>     done
> >> >>     if ! $this_chroot_failed; then
> >> >>         sudo rm -rf "$mock_root"
> >> >>     fi
> >> >> done
> >> >>
> >> >> # remove mock caches that are older then 2 days:
> >> >> find /var/cache/mock/ -mindepth 1 -maxdepth 1 -type d -mtime +2 -print0 | \
> >> >>     xargs -0 -tr sudo rm -rf
> >> >> # We make no effort to leave around caches that may still be in use because
> >> >> # packages installed in them may go out of date, so may as well recreate them
> >> >>
> >> >> # Drop all left over libvirt domains
> >> >> for UUID in $(virsh list --all --uuid); do
> >> >>   virsh destroy $UUID || :
> >> >>   sleep 2
> >> >>   virsh undefine --remove-all-storage --storage vda --snapshots-metadata $UUID || :
> >> >> done
> >> >>
> >> >> if $failed; then
> >> >>     echo "Cleanup script failed, propegating failure to job"
> >> >>     exit 1
> >> >> fi
> >> >>
> >> >> [ovirt_4.0_he-system-tests] $ /bin/bash -x /tmp/hudson1888216492513466503.sh
> >> >> + echo shell-scripts/mock_cleanup.sh
> >> >> shell-scripts/mock_cleanup.sh
> >> >> + cat
> >> >> _______________________________________________________________________
> >> >> #######################################################################
> >> >> #                                                                     #
> >> >> #                               CLEANUP                               #
> >> >> #                                                                     #
> >> >> #######################################################################
> >> >> + shopt -s nullglob
> >> >> + WORKSPACE=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/>
> >> >> + UMOUNT_RETRIES=3
> >> >> + UMOUNT_RETRY_DELAY=1s
> >> >> + sudo chown -R jenkins <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/>
> >> >> + failed=false
> >> >> + mock_confs=("$WORKSPACE"/*/mocker*)
> >> >> + for mock_conf_file in '"${mock_confs[@]}"'
> >> >> + [[ -n <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/mocker-epel-7-x86_64.el7.cfg> ]]
> >> >> + echo 'Cleaning up mock '
> >> >> Cleaning up mock
> >> >> + mock_root=mocker-epel-7-x86_64.el7.cfg
> >> >> + mock_root=mocker-epel-7-x86_64.el7
> >> >> + my_mock=/usr/bin/mock
> >> >> + my_mock+=' --configdir=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests'>
> >> >> + my_mock+=' --root=mocker-epel-7-x86_64.el7'
> >> >> + my_mock+=' --resultdir=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/'>
> >> >> + echo 'Killing all mock orphan processes, if any.'
> >> >> Killing all mock orphan processes, if any.
> >> >> + /usr/bin/mock --configdir=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests> --root=mocker-epel-7-x86_64.el7 --resultdir=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/> --orphanskill
> >> >> WARNING: Could not find required logging config file: <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/logging.ini.> Using default...
> >> >> INFO: mock.py version 1.2.21 starting (python version = 3.4.3)...
> >> >> Start: init plugins
> >> >> INFO: selinux enabled
> >> >> Finish: init plugins
> >> >> Start: run
> >> >> Finish: run
> >> >> ++ grep -Po '(?<=config_opts\['\''root'\''\] = '\'')[^'\'']*' <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/mocker-epel-7-x86_64.el7.cfg>
> >> >> + mock_root=epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b
> >> >> + [[ -n epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b ]]
> >> >> + mounts=($(mount | awk '{print $3}' | grep "$mock_root"))
> >> >> ++ mount
> >> >> ++ awk '{print $3}'
> >> >> ++ grep epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b
> >> >> + :
> >> >> + [[ -n '' ]]
> >> >> + find /var/cache/mock/ -mindepth 1 -maxdepth 1 -type d -mtime +2 -print0
> >> >> + xargs -0 -tr sudo rm -rf
> >> >> ++ virsh list --all --uuid
> >> >> + false
> >> >> POST BUILD TASK : SUCCESS
> >> >> END OF POST BUILD TASK : 1
> >> >> Recording test results
> >> >> ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
> >> >> Archiving artifacts
> >> >>
> >>
>