
On Tue, Jan 10, 2017 at 3:14 PM, Evgheni Dereveanchin <ederevea@redhat.com> wrote:
Not sure what the initial problem was, but on my laptop (Haswell-MB) I always use the lowest possible CPU family to ensure it's using as few features as possible in nested VMs:
;-) I'm doing the exact opposite, for two reasons: 1. I want the best possible performance - specifically, I'd like the tests to run as fast as possible. 2. I'd like to expose as many of the latest features as possible to the hosts (and VMs).
<cpu mode='custom' match='exact'>
  <model fallback='allow'>core2duo</model>
  <feature policy='require' name='vmx'/>
</cpu>
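(Not part of the original mail, just a generic sanity check.) With a definition like the one above, one can confirm inside the nested VM that the hardware-virt flag actually made it through:

```shell
# Quick check inside the nested VM: is a hardware-virt flag visible?
# vmx is the Intel flag; svm is the AMD equivalent.
if grep -q -w -e vmx -e svm /proc/cpuinfo 2>/dev/null; then
    echo "nested virt flag present"
else
    echo "nested virt flag missing"
fi
```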
Accordingly, I use model_Conroe on the oVirt side and haven't had problems with it. Do we really need to use newer CPU families in our tests?
We probably don't - we used to have Conroe hard-coded in the tests (until I changed it to use something different). It does mean it'll be a bit challenging to run on AMD if we decide to go back to hard-coding Conroe. Y.
Regards, Evgheni Dereveanchin
----- Original Message -----
From: "Milan Zamazal" <mzamazal@redhat.com>
To: "Yaniv Kaul" <ykaul@redhat.com>
Cc: "Lev Veyde" <lveyde@redhat.com>, "Eyal Edri" <eedri@redhat.com>, "Sandro Bonazzola" <sbonazzo@redhat.com>, "infra" <infra@ovirt.org>, "Gal Ben Haim" <gbenhaim@redhat.com>, "Martin Polednik" <mpoledni@redhat.com>, "Evgheni Dereveanchin" <ederevea@redhat.com>
Sent: Tuesday, 10 January, 2017 1:16:09 PM
Subject: Re: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
Yaniv Kaul <ykaul@redhat.com> writes:
On Tue, Jan 10, 2017 at 12:45 PM, Milan Zamazal <mzamazal@redhat.com> wrote:
Yaniv Kaul <ykaul@redhat.com> writes:
On Tue, Jan 10, 2017 at 12:08 PM, Lev Veyde <lveyde@redhat.com> wrote:
This patch is probably the one that caused it: https://github.com/lago-project/lago/commit/05ccf7240976f91b0c14d6a1f88016376d5e87f0
+Milan.
+Martin
I must confess that I did not like the patch to begin with... I did not understand what real problem it solved, but Michal assured me there was a real issue.
Yes, there was a real issue with nested virtualization. Some CPU flags are missing with Haswell and Lago doesn't run properly.
Is this a libvirt bug btw?
I'm not sure. When the sets of CPU flags on the host and in the VM with a copied host CPU are different, it's not clear what's the right thing to do.
Perhaps we need a switch to turn this feature on and off?
I think it would be useful to have a possibility to specify a particular CPU type in the Lago configuration.
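(Editorial illustration, not from the thread: the key names below are hypothetical, not Lago's actual schema at the time — just a sketch of what such an option could look like in a LagoInitFile.)

```yaml
# Hypothetical LagoInitFile fragment: 'cpu-model' is an illustrative
# key name, not a documented Lago option.
domains:
  host0:
    memory: 4096
    cpu-model: Westmere  # pin the virtual CPU instead of copying the host CPU
```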
I now have Engine with Java at 100% CPU - I hope it's unrelated to this as well.
I suggest we do a survey to see who doesn't have SandyBridge or above, and perhaps move higher than Westmere.
We've got Westmere servers in the Brno lab.
Do we know the scope of the problem? Does it happen only on Westmere, for example?
The problem was with Haswell-noTSX (on my Lenovo, but I think Martin has observed the same problem too). We don't know the scope of the problem, but if we want to be able to run Lago on Brno servers then we must be Westmere compatible.
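(Not from the thread, just a generic way to gauge the scope.) One can diff the 'flags' line of /proc/cpuinfo on the host against the one inside the nested VM to see which flags got lost; a minimal sketch with stand-in flag strings:

```shell
# Minimal sketch: list CPU flags present on the host but missing in the guest.
# The two strings below are stand-ins; in practice they come from the 'flags'
# line of /proc/cpuinfo on the host and inside the nested VM.
flags_host="fpu vme sse sse2 vmx aes avx"
flags_guest="fpu vme sse sse2 aes"

printf '%s\n' $flags_host | sort -u > /tmp/host_flags.txt
printf '%s\n' $flags_guest | sort -u > /tmp/guest_flags.txt

# comm -23 prints lines unique to the first file: flags the guest lacks.
comm -23 /tmp/host_flags.txt /tmp/guest_flags.txt
```

With the example strings above this prints the flags the guest is missing (here avx and vmx).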
Y.
What do we have in CI? Y.
Thanks in advance, Lev Veyde.
----- Original Message -----
From: "Lev Veyde" <lveyde@redhat.com>
To: "Eyal Edri" <eedri@redhat.com>, sbonazzo@redhat.com
Cc: infra@ovirt.org, "Gal Ben Haim" <gbenhaim@redhat.com>
Sent: Tuesday, January 10, 2017 11:50:05 AM
Subject: Re: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
Hi,
Checked the logs and see the following:
02:42:05 [WARNING] OVF does not contain a valid image description, using default.
02:42:05 The following CPU types are supported by this host:
02:42:05    - model_Westmere: Intel Westmere Family
02:42:05    - model_Nehalem: Intel Nehalem Family
02:42:05    - model_Penryn: Intel Penryn Family
02:42:05    - model_Conroe: Intel Conroe Family
02:42:05 [ ERROR ] Failed to execute stage 'Environment customization': Invalid CPU type specified: model_SandyBridge
Barak thinks that it may be related to the recent update in the Lago code.
Gal, any idea ?
Thanks in advance, Lev Veyde.
----- Original Message -----
From: jenkins@jenkins.phx.ovirt.org
To: sbonazzo@redhat.com, infra@ovirt.org, lveyde@redhat.com
Sent: Tuesday, January 10, 2017 4:42:14 AM
Subject: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
See <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/changes>
Changes:
[Lev Veyde] Mask NetworkManager service
[Eyal Edri] fix imgbased job names in jjb
[Daniel Belenky] fixing jjb version for cockpit-ovirt
[Gil Shinar] Add some more 4.1 to experimental
[Juan Hernandez] Don't build RPMs for the JBoss modules Maven plugin
[pkliczewski] jsonrpc 4.1 branch
------------------------------------------
[...truncated 749 lines...]
Finish: shell
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@ Tue Jan 10 02:42:07 UTC 2017 automation/he_basic_suite_4.0.sh chroot finished
@@ took 360 seconds
@@ rc = 1
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
==========Scrubbing chroot
mock \
    --configdir="http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests" \
    --root="mocker-epel-7-x86_64.el7" \
    --resultdir="./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.scrub" \
    --scrub=chroot
WARNING: Could not find required logging config file: http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/logging.ini. Using default...
INFO: mock.py version 1.2.21 starting (python version = 3.4.3)...
Start: init plugins
INFO: selinux enabled
Finish: init plugins
Start: run
Start: scrub ['chroot']
INFO: scrubbing chroot for mocker-epel-7-x86_64.el7
Finish: scrub ['chroot']
Finish: run
Scrub chroot took 6 seconds
============================
##########################################################
## Tue Jan 10 02:42:13 UTC 2017 Finished env: el7:epel-7-x86_64
## took 366 seconds
## rc = 1
##########################################################
find: ‘logs’: No such file or directory
No log files found, check command output
##!########################################################
Collecting mock logs
‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.clean_rpmdb’ -> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.clean_rpmdb’
‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.he_basic_suite_4.0.sh’ -> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.he_basic_suite_4.0.sh’
‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.init’ -> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.init’
##########################################################
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for :.* : True
Logical operation result is TRUE
Running script  : #!/bin/bash -xe
echo 'shell_scripts/system_tests.collect_logs.sh'
#
# Required jjb vars:
#    version
#
VERSION=4.0
SUITE_TYPE=

WORKSPACE="$PWD"
OVIRT_SUITE="$SUITE_TYPE_suite_$VERSION"
TESTS_LOGS="$WORKSPACE/ovirt-system-tests/exported-artifacts"

rm -rf "$WORKSPACE/exported-artifacts"
mkdir -p "$WORKSPACE/exported-artifacts"

if [[ -d "$TESTS_LOGS" ]]; then
    mv "$TESTS_LOGS/"* "$WORKSPACE/exported-artifacts/"
fi
[ovirt_4.0_he-system-tests] $ /bin/bash -xe /tmp/hudson302101162661598371.sh
+ echo shell_scripts/system_tests.collect_logs.sh
shell_scripts/system_tests.collect_logs.sh
+ VERSION=4.0
+ SUITE_TYPE=
+ WORKSPACE=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/
+ OVIRT_SUITE=4.0
+ TESTS_LOGS=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts
+ rm -rf http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts
+ mkdir -p http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts
+ [[ -d http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts ]]
+ mv http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/failure_msg.txt http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/lago_logs http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/mock_logs http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts/
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Match found for :.* : True
Logical operation result is TRUE
Running script  : #!/bin/bash -x
echo "shell-scripts/mock_cleanup.sh"
# Make clear this is the cleanup, helps reading the jenkins logs
cat <<EOC
____________________________________________________________
#######################################################################
#                                                                     #
#                              CLEANUP                                #
#                                                                     #
#######################################################################
EOC
shopt -s nullglob
WORKSPACE="${WORKSPACE:-$PWD}"
UMOUNT_RETRIES="${UMOUNT_RETRIES:-3}"
UMOUNT_RETRY_DELAY="${UMOUNT_RETRY_DELAY:-1s}"
safe_umount() {
    local mount="${1:?}"
    local attempt
    for ((attempt=0 ; attempt < $UMOUNT_RETRIES ; attempt++)); do
        # If this is not the 1st time through the loop, Sleep a while to let
        # the problem "solve itself"
        [[ attempt > 0 ]] && sleep "$UMOUNT_RETRY_DELAY"
        # Try to umount
        sudo umount --lazy "$mount" && return 0
        # See if the mount is already not there despite failing
        findmnt --kernel --first "$mount" > /dev/null && return 0
    done
    echo "ERROR: Failed to umount $mount."
    return 1
}
# restore the permissions in the working dir, as sometimes it leaves files
# owned by root and then the 'cleanup workspace' from jenkins job fails to
# clean and breaks the jobs
sudo chown -R "$USER" "$WORKSPACE"
# stop any processes running inside the chroot
failed=false
mock_confs=("$WORKSPACE"/*/mocker*)
# Clean current jobs mockroot if any
for mock_conf_file in "${mock_confs[@]}"; do
    [[ "$mock_conf_file" ]] || continue
    echo "Cleaning up mock $mock_conf"
    mock_root="${mock_conf_file##*/}"
    mock_root="${mock_root%.*}"
    my_mock="/usr/bin/mock"
    my_mock+=" --configdir=${mock_conf_file%/*}"
    my_mock+=" --root=${mock_root}"
    my_mock+=" --resultdir=$WORKSPACE"

    #TODO: investigate why mock --clean fails to umount certain dirs sometimes,
    #so we can use it instead of manually doing all this.
    echo "Killing all mock orphan processes, if any."
    $my_mock \
        --orphanskill \
    || {
        echo "ERROR: Failed to kill orphans on $chroot."
        failed=true
    }

    mock_root="$(\
        grep \
            -Po "(?<=config_opts\['root'\] = ')[^']*" \
            "$mock_conf_file" \
    )" || :
    [[ "$mock_root" ]] || continue
    mounts=($(mount | awk '{print $3}' | grep "$mock_root")) || :
    if [[ "$mounts" ]]; then
        echo "Found mounted dirs inside the chroot $chroot. Trying to umount."
    fi
    for mount in "${mounts[@]}"; do
        safe_umount "$mount" || failed=true
    done
done

# Clean any leftover chroot from other jobs
for mock_root in /var/lib/mock/*; do
    this_chroot_failed=false
    mounts=($(cut -d\  -f2 /proc/mounts | grep "$mock_root" | sort -r)) || :
    if [[ "$mounts" ]]; then
        echo "Found mounted dirs inside the chroot $mock_root." \
             "Trying to umount."
    fi
    for mount in "${mounts[@]}"; do
        safe_umount "$mount" && continue
        # If we got here, we failed $UMOUNT_RETRIES attempts so we should make
        # noise
        failed=true
        this_chroot_failed=true
    done
    if ! $this_chroot_failed; then
        sudo rm -rf "$mock_root"
    fi
done
# remove mock caches that are older than 2 days:
find /var/cache/mock/ -mindepth 1 -maxdepth 1 -type d -mtime +2 -print0 | \
    xargs -0 -tr sudo rm -rf
# We make no effort to leave around caches that may still be in use because
# packages installed in them may go out of date, so may as well recreate them
# Drop all left over libvirt domains
for UUID in $(virsh list --all --uuid); do
    virsh destroy $UUID || :
    sleep 2
    virsh undefine --remove-all-storage --storage vda --snapshots-metadata $UUID || :
done
if $failed; then
    echo "Cleanup script failed, propagating failure to job"
    exit 1
fi
[ovirt_4.0_he-system-tests] $ /bin/bash -x /tmp/hudson1888216492513466503.sh
+ echo shell-scripts/mock_cleanup.sh
shell-scripts/mock_cleanup.sh
+ cat
____________________________________________________________
#######################################################################
#                                                                     #
#                              CLEANUP                                #
#                                                                     #
#######################################################################
+ shopt -s nullglob
+ WORKSPACE=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/
+ UMOUNT_RETRIES=3
+ UMOUNT_RETRY_DELAY=1s
+ sudo chown -R jenkins http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/
+ failed=false
+ mock_confs=("$WORKSPACE"/*/mocker*)
+ for mock_conf_file in '"${mock_confs[@]}"'
+ [[ -n http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/mocker-epel-7-x86_64.el7.cfg ]]
+ echo 'Cleaning up mock '
Cleaning up mock
+ mock_root=mocker-epel-7-x86_64.el7.cfg
+ mock_root=mocker-epel-7-x86_64.el7
+ my_mock=/usr/bin/mock
+ my_mock+=' --configdir=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests'
+ my_mock+=' --root=mocker-epel-7-x86_64.el7'
+ my_mock+=' --resultdir=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/'
+ echo 'Killing all mock orphan processes, if any.'
Killing all mock orphan processes, if any.
+ /usr/bin/mock --configdir=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests --root=mocker-epel-7-x86_64.el7 --resultdir=http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ --orphanskill
WARNING: Could not find required logging config file: http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/logging.ini. Using default...
INFO: mock.py version 1.2.21 starting (python version = 3.4.3)...
Start: init plugins
INFO: selinux enabled
Finish: init plugins
Start: run
Finish: run
++ grep -Po '(?<=config_opts\['\''root'\''\] = '\'')[^'\'']*' http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/mocker-epel-7-x86_64.el7.cfg
+ mock_root=epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b
+ [[ -n epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b ]]
+ mounts=($(mount | awk '{print $3}' | grep "$mock_root"))
++ mount
++ awk '{print $3}'
++ grep epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b
+ :
+ [[ -n '' ]]
+ find /var/cache/mock/ -mindepth 1 -maxdepth 1 -type d -mtime +2 -print0
+ xargs -0 -tr sudo rm -rf
++ virsh list --all --uuid
+ false
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 1
Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
Archiving artifacts