Not sure what the initial problem was, but on my laptop (Haswell-MB)
I always use the lowest possible CPU family so that nested VMs rely
on as few CPU features as possible:
<cpu mode='custom' match='exact'>
  <model fallback='allow'>core2duo</model>
  <feature policy='require' name='vmx'/>
</cpu>
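To double-check that a guest actually got the features it needs for nested virt, I grep the flags line inside the VM. A tiny helper, just a sketch (has_flag is my own name, not an oVirt/Lago API):

```shell
# Sketch only: check whether a given CPU flag appears in a
# /proc/cpuinfo-style "flags" line.
has_flag() {
    local flag="$1" flags="$2"
    grep -qw -- "$flag" <<< "$flags"
}

# Inside the guest you'd feed it the live flags line, e.g.:
#   has_flag vmx "$(grep -m1 '^flags' /proc/cpuinfo)"
```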
Accordingly, I use model_Conroe on the oVirt side and haven't had
any problems with it. Do we really need to use newer CPU families
in our tests?
Regards,
Evgheni Dereveanchin
----- Original Message -----
From: "Milan Zamazal" <mzamazal@redhat.com>
To: "Yaniv Kaul" <ykaul@redhat.com>
Cc: "Lev Veyde" <lveyde@redhat.com>, "Eyal Edri" <eedri@redhat.com>, "Sandro Bonazzola" <sbonazzo@redhat.com>, "infra" <infra@ovirt.org>, "Gal Ben Haim" <gbenhaim@redhat.com>, "Martin Polednik" <mpoledni@redhat.com>, "Evgheni Dereveanchin" <ederevea@redhat.com>
Sent: Tuesday, 10 January, 2017 1:16:09 PM
Subject: Re: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
Yaniv Kaul <ykaul@redhat.com> writes:
> On Tue, Jan 10, 2017 at 12:45 PM, Milan Zamazal <mzamazal@redhat.com> wrote:
>
>> Yaniv Kaul <ykaul@redhat.com> writes:
>>
>> > On Tue, Jan 10, 2017 at 12:08 PM, Lev Veyde <lveyde@redhat.com> wrote:
>> >
>> >> This patch is one that caused it probably:
>> >> https://github.com/lago-project/lago/commit/05ccf7240976f91b0c14d6a1f88016376d5e87f0
>> >
>> >
>> > +Milan.
>>
>> +Martin
>>
>> > I must confess that I did not like the patch to begin with...
>> > I did not understand what real problem it solved, but Michal assured me
>> > there was a real issue.
>>
>> Yes, there was a real issue with nested virtualization. Some CPU flags
>> are missing with Haswell and Lago doesn't run properly.
>>
>
> Is this a libvirt bug btw?
I'm not sure. When the set of CPU flags on the host differs from the
set in a VM with a copied host CPU, it's not clear what the right
thing to do is.
> Perhaps we need a switch to turn this feature on and off?
I think it would be useful to be able to specify a particular CPU type
in the Lago configuration.
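Something along these lines in the LagoInitFile, say (the `cpu_model` key and its placement here are my guess at what such an option could look like, not an existing Lago setting):

```yaml
domains:
  vm-host-0:
    memory: 2048
    # Hypothetical knob -- pin the guest to an older, widely
    # supported model instead of copying the host CPU:
    cpu_model: Westmere
```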
>> > I now have Engine with Java at 100% CPU - I hope it's unrelated to
>> > this as well.
>> >
>> > I suggest we do a survey to see who doesn't have SandyBridge or above,
>> > and perhaps move higher than Westmere.
>>
>> We've got Westmere servers in the Brno lab.
>>
>
> Do we know the scope of the problem? Does it happen only on Westmere, for
> example?
The problem was with Haswell-noTSX (on my Lenovo, but I think Martin has
observed the same problem too). We don't know the scope of the problem,
but if we want to be able to run Lago on Brno servers then we must be
Westmere compatible.
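One quick way to see what's at stake when pinning to an older model is to diff the flag sets. A throwaway helper (the name is mine, not from any tool mentioned here):

```shell
# Sketch: print the flags present in the first space-separated list
# but not in the second -- i.e. what a guest pinned to the older
# model would lose.
flags_missing_in() {
    comm -23 <(tr ' ' '\n' <<< "$1" | sort -u) \
             <(tr ' ' '\n' <<< "$2" | sort -u)
}

# e.g. compare the host's /proc/cpuinfo flags line against the one
# seen inside a Lago VM.
```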
> Y.
>
>
>> > What do we have in CI?
>> > Y.
>> >
>> >
>> >>
>> >> Thanks in advance,
>> >> Lev Veyde.
>> >>
>> >> ----- Original Message -----
>> >> From: "Lev Veyde" <lveyde@redhat.com>
>> >> To: "Eyal Edri" <eedri@redhat.com>, sbonazzo@redhat.com
>> >> Cc: infra@ovirt.org, "Gal Ben Haim" <gbenhaim@redhat.com>
>> >> Sent: Tuesday, January 10, 2017 11:50:05 AM
>> >> Subject: Re: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
>> >>
>> >> Hi,
>> >>
>> >> Checked the logs and see the following:
>> >>
>> >> 02:42:05 [WARNING] OVF does not contain a valid image description, using
>> >> default.
>> >> 02:42:05 The following CPU types are supported by this host:
>> >> 02:42:05 - model_Westmere: Intel Westmere Family
>> >> 02:42:05 - model_Nehalem: Intel Nehalem Family
>> >> 02:42:05 - model_Penryn: Intel Penryn Family
>> >> 02:42:05 - model_Conroe: Intel Conroe Family
>> >> 02:42:05 [ ERROR ] Failed to execute stage 'Environment customization':
>> >> Invalid CPU type specified: model_SandyBridge
>> >>
>> >> Barak thinks that it may be related to the recent update in the Lago code.
>> >>
>> >> Gal, any idea ?
>> >>
>> >> Thanks in advance,
>> >> Lev Veyde.
>> >>
>> >> ----- Original Message -----
>> >> From: jenkins@jenkins.phx.ovirt.org
>> >> To: sbonazzo@redhat.com, infra@ovirt.org, lveyde@redhat.com
>> >> Sent: Tuesday, January 10, 2017 4:42:14 AM
>> >> Subject: Build failed in Jenkins: ovirt_4.0_he-system-tests #627
>> >>
>> >> See <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/changes>
>> >>
>> >> Changes:
>> >>
>> >> [Lev Veyde] Mask NetworkManager service
>> >>
>> >> [Eyal Edri] fix imgbased job names in jjb
>> >>
>> >> [Daniel Belenky] fixing jjb version for cockpit-ovirt
>> >>
>> >> [Gil Shinar] Add some more 4.1 to experimental
>> >>
>> >> [Juan Hernandez] Don't build RPMs for the JBoss modules Maven plugin
>> >>
>> >> [pkliczewski] jsonrpc 4.1 branch
>> >>
>> >> ------------------------------------------
>> >> [...truncated 749 lines...]
>> >> Finish: shell
>> >> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>> >> @@ Tue Jan 10 02:42:07 UTC 2017 automation/he_basic_suite_4.0.sh chroot
>> >> finished
>> >> @@ took 360 seconds
>> >> @@ rc = 1
>> >> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>> >> ========== Scrubbing chroot
>> >> mock \
>> >>     --configdir="<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests"> \
>> >>     --root="mocker-epel-7-x86_64.el7" \
>> >>     --resultdir="./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.scrub" \
>> >>     --scrub=chroot
>> >> WARNING: Could not find required logging config file: <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/logging.ini.> Using default...
>> >> INFO: mock.py version 1.2.21 starting (python version = 3.4.3)...
>> >> Start: init plugins
>> >> INFO: selinux enabled
>> >> Finish: init plugins
>> >> Start: run
>> >> Start: scrub ['chroot']
>> >> INFO: scrubbing chroot for mocker-epel-7-x86_64.el7
>> >> Finish: scrub ['chroot']
>> >> Finish: run
>> >> Scrub chroot took 6 seconds
>> >> ============================
>> >> ##########################################################
>> >> ## Tue Jan 10 02:42:13 UTC 2017 Finished env: el7:epel-7-x86_64
>> >> ## took 366 seconds
>> >> ## rc = 1
>> >> ##########################################################
>> >> find: ‘logs’: No such file or directory
>> >> No log files found, check command output
>> >> ##!########################################################
>> >> Collecting mock logs
>> >> ‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.clean_rpmdb’ ->
>> >> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.clean_rpmdb’
>> >> ‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.he_basic_suite_4.0.sh’ ->
>> >> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.he_basic_suite_4.0.sh’
>> >> ‘./mock_logs.xGGwEk6V/mocker-epel-7-x86_64.el7.init’ ->
>> >> ‘exported-artifacts/mock_logs/mocker-epel-7-x86_64.el7.init’
>> >> ##########################################################
>> >> Build step 'Execute shell' marked build as failure
>> >> Performing Post build task...
>> >> Match found for :.* : True
>> >> Logical operation result is TRUE
>> >> Running script : #!/bin/bash -xe
>> >> echo 'shell_scripts/system_tests.collect_logs.sh '
>> >>
>> >> #
>> >> # Required jjb vars:
>> >> # version
>> >> #
>> >> VERSION=4.0
>> >> SUITE_TYPE=
>> >>
>> >> WORKSPACE="$PWD"
>> >> OVIRT_SUITE="$SUITE_TYPE_suite_$VERSION"
>> >> TESTS_LOGS="$WORKSPACE/ovirt-system-tests/exported-artifacts"
>> >>
>> >> rm -rf "$WORKSPACE/exported-artifacts"
>> >> mkdir -p "$WORKSPACE/exported-artifacts"
>> >>
>> >> if [[ -d "$TESTS_LOGS" ]]; then
>> >> mv "$TESTS_LOGS/"* "$WORKSPACE/exported-artifacts/"
>> >> fi
>> >>
>> >> [ovirt_4.0_he-system-tests] $ /bin/bash -xe /tmp/hudson302101162661598371.sh
>> >> + echo shell_scripts/system_tests.collect_logs.sh
>> >> shell_scripts/system_tests.collect_logs.sh
>> >> + VERSION=4.0
>> >> + SUITE_TYPE=
>> >> + WORKSPACE=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/>
>> >> + OVIRT_SUITE=4.0
>> >> + TESTS_LOGS=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts>
>> >> + rm -rf <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts>
>> >> + mkdir -p <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts>
>> >> + [[ -d <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts> ]]
>> >> + mv <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/failure_msg.txt> <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/lago_logs> <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/exported-artifacts/mock_logs> <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/627/artifact/exported-artifacts/>
>> >> POST BUILD TASK : SUCCESS
>> >> END OF POST BUILD TASK : 0
>> >> Match found for :.* : True
>> >> Logical operation result is TRUE
>> >> Running script : #!/bin/bash -x
>> >> echo "shell-scripts/mock_cleanup.sh"
>> >> # Make clear this is the cleanup, helps reading the jenkins logs
>> >> cat <<EOC
>> >> _______________________________________________________________________
>> >> #######################################################################
>> >> #                                                                     #
>> >> #                               CLEANUP                               #
>> >> #                                                                     #
>> >> #######################################################################
>> >> EOC
>> >>
>> >> shopt -s nullglob
>> >>
>> >> WORKSPACE="${WORKSPACE:-$PWD}"
>> >> UMOUNT_RETRIES="${UMOUNT_RETRIES:-3}"
>> >> UMOUNT_RETRY_DELAY="${UMOUNT_RETRY_DELAY:-1s}"
>> >>
>> >> safe_umount() {
>> >> local mount="${1:?}"
>> >> local attempt
>> >> for ((attempt=0 ; attempt < $UMOUNT_RETRIES ; attempt++)); do
>> >> # If this is not the 1st time through the loop, Sleep a while to let
>> >> # the problem "solve itself"
>> >> [[ attempt > 0 ]] && sleep "$UMOUNT_RETRY_DELAY"
>> >> # Try to umount
>> >> sudo umount --lazy "$mount" && return 0
>> >> # See if the mount is already not there despite failing
>> >> findmnt --kernel --first "$mount" > /dev/null && return 0
>> >> done
>> >> echo "ERROR: Failed to umount $mount."
>> >> return 1
>> >> }
>> >>
>> >> # restore the permissions in the working dir, as sometimes it leaves files
>> >> # owned by root and then the 'cleanup workspace' from jenkins job fails to
>> >> # clean and breaks the jobs
>> >> sudo chown -R "$USER" "$WORKSPACE"
>> >>
>> >> # stop any processes running inside the chroot
>> >> failed=false
>> >> mock_confs=("$WORKSPACE"/*/mocker*)
>> >> # Clean current jobs mockroot if any
>> >> for mock_conf_file in "${mock_confs[@]}"; do
>> >> [[ "$mock_conf_file" ]] || continue
>> >> echo "Cleaning up mock $mock_conf"
>> >> mock_root="${mock_conf_file##*/}"
>> >> mock_root="${mock_root%.*}"
>> >> my_mock="/usr/bin/mock"
>> >> my_mock+=" --configdir=${mock_conf_file%/*}"
>> >> my_mock+=" --root=${mock_root}"
>> >> my_mock+=" --resultdir=$WORKSPACE"
>> >>
>> >> #TODO: investigate why mock --clean fails to umount certain dirs sometimes,
>> >> #so we can use it instead of manually doing all this.
>> >> echo "Killing all mock orphan processes, if any."
>> >> $my_mock \
>> >> --orphanskill \
>> >> || {
>> >> echo "ERROR: Failed to kill orphans on $chroot."
>> >> failed=true
>> >> }
>> >>
>> >> mock_root="$(\
>> >> grep \
>> >> -Po "(?<=config_opts\['root'\] = ')[^']*" \
>> >> "$mock_conf_file" \
>> >> )" || :
>> >> [[ "$mock_root" ]] || continue
>> >> mounts=($(mount | awk '{print $3}' | grep "$mock_root")) || :
>> >> if [[ "$mounts" ]]; then
>> >> echo "Found mounted dirs inside the chroot $chroot. Trying to umount."
>> >> fi
>> >> for mount in "${mounts[@]}"; do
>> >> safe_umount "$mount" || failed=true
>> >> done
>> >> done
>> >>
>> >> # Clean any leftover chroot from other jobs
>> >> for mock_root in /var/lib/mock/*; do
>> >> this_chroot_failed=false
>> >> mounts=($(cut -d\  -f2 /proc/mounts | grep "$mock_root" | sort -r)) || :
>> >> if [[ "$mounts" ]]; then
>> >> echo "Found mounted dirs inside the chroot $mock_root." \
>> >> "Trying to umount."
>> >> fi
>> >> for mount in "${mounts[@]}"; do
>> >> safe_umount "$mount" && continue
>> >> # If we got here, we failed $UMOUNT_RETRIES attempts so we should make
>> >> # noise
>> >> failed=true
>> >> this_chroot_failed=true
>> >> done
>> >> if ! $this_chroot_failed; then
>> >> sudo rm -rf "$mock_root"
>> >> fi
>> >> done
>> >>
>> >> # remove mock caches that are older than 2 days:
>> >> find /var/cache/mock/ -mindepth 1 -maxdepth 1 -type d -mtime +2 -print0 | \
>> >>     xargs -0 -tr sudo rm -rf
>> >> # We make no effort to leave around caches that may still be in use because
>> >> # packages installed in them may go out of date, so may as well recreate them
>> >>
>> >> # Drop all left over libvirt domains
>> >> for UUID in $(virsh list --all --uuid); do
>> >> virsh destroy $UUID || :
>> >> sleep 2
>> >> virsh undefine --remove-all-storage --storage vda --snapshots-metadata $UUID || :
>> >> done
>> >>
>> >> if $failed; then
>> >> echo "Cleanup script failed, propagating failure to job"
>> >> exit 1
>> >> fi
>> >>
>> >> [ovirt_4.0_he-system-tests] $ /bin/bash -x /tmp/hudson1888216492513466503.sh
>> >> + echo shell-scripts/mock_cleanup.sh
>> >> shell-scripts/mock_cleanup.sh
>> >> + cat
>> >> _______________________________________________________________________
>> >> #######################################################################
>> >> #                                                                     #
>> >> #                               CLEANUP                               #
>> >> #                                                                     #
>> >> #######################################################################
>> >> + shopt -s nullglob
>> >> + WORKSPACE=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/>
>> >> + UMOUNT_RETRIES=3
>> >> + UMOUNT_RETRY_DELAY=1s
>> >> + sudo chown -R jenkins <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/>
>> >> + failed=false
>> >> + mock_confs=("$WORKSPACE"/*/mocker*)
>> >> + for mock_conf_file in '"${mock_confs[@]}"'
>> >> + [[ -n <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/mocker-epel-7-x86_64.el7.cfg> ]]
>> >> + echo 'Cleaning up mock '
>> >> Cleaning up mock
>> >> + mock_root=mocker-epel-7-x86_64.el7.cfg
>> >> + mock_root=mocker-epel-7-x86_64.el7
>> >> + my_mock=/usr/bin/mock
>> >> + my_mock+=' --configdir=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests'>
>> >> + my_mock+=' --root=mocker-epel-7-x86_64.el7'
>> >> + my_mock+=' --resultdir=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/'>
>> >> + echo 'Killing all mock orphan processes, if any.'
>> >> Killing all mock orphan processes, if any.
>> >> + /usr/bin/mock --configdir=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests> --root=mocker-epel-7-x86_64.el7 --resultdir=<http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/> --orphanskill
>> >> WARNING: Could not find required logging config file: <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/logging.ini.> Using default...
>> >> INFO: mock.py version 1.2.21 starting (python version = 3.4.3)...
>> >> Start: init plugins
>> >> INFO: selinux enabled
>> >> Finish: init plugins
>> >> Start: run
>> >> Finish: run
>> >> ++ grep -Po '(?<=config_opts\['\''root'\''\] = '\'')[^'\'']*' <http://jenkins.ovirt.org/job/ovirt_4.0_he-system-tests/ws/ovirt-system-tests/mocker-epel-7-x86_64.el7.cfg>
>> >> + mock_root=epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b
>> >> + [[ -n epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b ]]
>> >> + mounts=($(mount | awk '{print $3}' | grep "$mock_root"))
>> >> ++ mount
>> >> ++ awk '{print $3}'
>> >> ++ grep epel-7-x86_64-6f628e6dc1a827c86d5e1bd9d3b3d38b
>> >> + :
>> >> + [[ -n '' ]]
>> >> + find /var/cache/mock/ -mindepth 1 -maxdepth 1 -type d -mtime +2
>> -print0
>> >> + xargs -0 -tr sudo rm -rf
>> >> ++ virsh list --all --uuid
>> >> + false
>> >> POST BUILD TASK : SUCCESS
>> >> END OF POST BUILD TASK : 1
>> >> Recording test results
>> >> ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
>> >> Archiving artifacts
>> >>
>>