Move HE suite to CentOS 8
by Yedidyah Bar David
Hi all,
Galit has been working on [1] for some time now. Wanting to help with
that (and with getting the hosted-engine suites back to a stable
state), I looked at it a few days ago, after a manual deploy on
CentOS 8 with a nightly master snapshot worked for me. I rebased her
patches, and CI failed [2]. Out of the 5 suites it tested, 3 passed
(!) and 2 failed :-( . I checked the two that failed, and see:
1. One of them failed [3] because it could not parse the JSON output
of 'hosted-engine --vm-status --json'. The output seems to have been
cut off in the middle, after 1024 bytes:
2019-12-22 11:13:47,846::ssh.py::ssh::58::lago.ssh::DEBUG::Running
200606d0 on lago-he-basic-role-remote-suite-4-3-host-0: hosted-engine
--vm-status --json
2019-12-22 11:13:48,441::ssh.py::ssh::81::lago.ssh::DEBUG::Command
200606d0 on lago-he-basic-role-remote-suite-4-3-host-0 returned with 0
2019-12-22 11:13:48,441::ssh.py::ssh::89::lago.ssh::DEBUG::Command
200606d0 on lago-he-basic-role-remote-suite-4-3-host-0 output:
{"1": {"conf_on_shared_storage": true, "live-data": true, "extra":
"metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4534
(Sun Dec 22 06:13:39
2019)\nhost-id=1\nscore=3000\nvm_conf_refresh_time=4535 (Sun Dec 22
06:13:40 2019)\nconf_on_shared_storage=True\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n",
"hostname": "lago-he-basic-role-remote-suite-4-3-host-0.lago.local",
"host-id": 1, "engine-status": {"reason": "failed liveliness check",
"health": "bad", "vm": "up", "detail": "Powering down"}, "score":
3000, "stopped": false, "maintenance": false, "crc32": "76bfd2ac",
"local_conf_timestamp": 4535, "host-ts": 4534}, "2":
{"conf_on_shared_storage": true, "live-data": true, "extra":
"metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4532
(Sun Dec 22 06:13:38
2019)\nhost-id=2\nscore=3400\nvm_conf_refresh_time=4532 (Sun Dec 22
06:13:38 2019)\nconf_on_shared_storage=True\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n",
"hostname": "lago-he-basic-role-remote-s
2019-12-22 11:13:48,442::testlib.py::assert_equals_within::242::ovirtlago.testlib::ERROR::
* Unhandled exception in <function <lambda> at 0x7fcc58f38668>
Traceback (most recent call last):
If you check previous runs of this command in the same lago log, you
can see that their output is complete (and somewhat longer, but not by
much), e.g.:
2019-12-22 11:13:41,804::ssh.py::ssh::58::lago.ssh::DEBUG::Running
1c6c1226 on lago-he-basic-role-remote-suite-4-3-host-0: hosted-engine
--vm-status --json
2019-12-22 11:13:43,105::ssh.py::ssh::81::lago.ssh::DEBUG::Command
1c6c1226 on lago-he-basic-role-remote-suite-4-3-host-0 returned with 0
2019-12-22 11:13:43,106::ssh.py::ssh::89::lago.ssh::DEBUG::Command
1c6c1226 on lago-he-basic-role-remote-suite-4-3-host-0 output:
{"1": {"conf_on_shared_storage": true, "live-data": true, "extra":
"metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4534
(Sun Dec 22 06:13:39
2019)\nhost-id=1\nscore=3000\nvm_conf_refresh_time=4535 (Sun Dec 22
06:13:40 2019)\nconf_on_shared_storage=True\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n",
"hostname": "lago-he-basic-role-remote-suite-4-3-host-0.lago.local",
"host-id": 1, "engine-status": {"reason": "failed liveliness check",
"health": "bad", "vm": "up", "detail": "Powering down"}, "score":
3000, "stopped": false, "maintenance": false, "crc32": "76bfd2ac",
"local_conf_timestamp": 4535, "host-ts": 4534}, "2":
{"conf_on_shared_storage": true, "live-data": true, "extra":
"metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4532
(Sun Dec 22 06:13:38
2019)\nhost-id=2\nscore=3400\nvm_conf_refresh_time=4532 (Sun Dec 22
06:13:38 2019)\nconf_on_shared_storage=True\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n",
"hostname": "lago-he-basic-role-remote-suite-4-3-host-1", "host-id":
2, "engine-status": {"reason": "vm not running on this host",
"health": "bad", "vm": "down", "detail": "unknown"}, "score": 3400,
"stopped": false, "maintenance": false, "crc32": "2fc395a1",
"local_conf_timestamp": 4532, "host-ts": 4532}, "global_maintenance":
true}
So I suspect an infra issue - on the network level, ssh, etc.
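
Just to illustrate the failure mode, here is a rough sketch (my
assumption of what the test essentially does - not the actual OST
code): feed the command's stdout to json.loads(). A complete
--vm-status --json document parses fine, but the same text cut short
raises a JSONDecodeError, which matches the unhandled exception above.

    # Hypothetical sketch, not the OST test code: show how truncated
    # 'hosted-engine --vm-status --json' output fails to parse while
    # the complete document parses fine.
    import json

    complete = json.dumps({
        "1": {"conf_on_shared_storage": True, "live-data": True,
              "hostname": "lago-he-basic-role-remote-suite-4-3-host-0.lago.local",
              "engine-status": {"health": "bad", "vm": "up"}},
        "2": {"conf_on_shared_storage": True, "live-data": True,
              "hostname": "lago-he-basic-role-remote-suite-4-3-host-1",
              "engine-status": {"health": "bad", "vm": "down"}},
        "global_maintenance": True,
    })

    json.loads(complete)  # parses fine

    # Simulate the cut seen in the lago log (output ended mid-object).
    truncated = complete[:len(complete) // 2]
    try:
        json.loads(truncated)
    except json.JSONDecodeError as e:
        print("json.loads failed as expected:", e)
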
2. The other one seems to have failed [4] due to a timeout:
2019-12-22 12:57:08,595::cmd.py::exit_handler::921::cli::DEBUG::signal
15 was caught
The reason it took too long seems to be:
2019-12-22 11:08:13,233::log_utils.py::__enter__::600::lago.ssh::DEBUG::start
task:57a44b4e-859e-4ff2-b897-7c8b3ade4557:Get ssh client for
lago-he-node-ng-suite-4-3-host-0:
2019-12-22 11:08:13,637::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
task:57a44b4e-859e-4ff2-b897-7c8b3ade4557:Get ssh client for
lago-he-node-ng-suite-4-3-host-0:
2019-12-22 11:08:13,942::ssh.py::ssh::58::lago.ssh::DEBUG::Running
59005d88 on lago-he-node-ng-suite-4-3-host-0: true
2019-12-22 11:08:15,605::ssh.py::ssh::81::lago.ssh::DEBUG::Command
59005d88 on lago-he-node-ng-suite-4-3-host-0 returned with 0
2019-12-22 11:08:15,606::ssh.py::wait_for_ssh::153::lago.ssh::DEBUG::Wait
succeeded for ssh to lago-he-node-ng-suite-4-3-host-0
2019-12-22 11:08:15,608::log_utils.py::__enter__::600::lago.ssh::DEBUG::start
task:38e81fd0-50f0-497c-8fb0-38eedcfb3f55:Get ssh client for
lago-he-node-ng-suite-4-3-host-0:
2019-12-22 11:08:16,129::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
task:38e81fd0-50f0-497c-8fb0-38eedcfb3f55:Get ssh client for
lago-he-node-ng-suite-4-3-host-0:
2019-12-22 12:20:39,054::log_utils.py::__enter__::600::lago.ssh::DEBUG::start
task:4007e002-f8e2-4427-884f-af57f723aca0:Get ssh client for
lago-he-node-ng-suite-4-3-storage:
2019-12-22 12:20:39,639::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
task:4007e002-f8e2-4427-884f-af57f723aca0:Get ssh client for
lago-he-node-ng-suite-4-3-storage:
2019-12-22 12:20:40,408::ssh.py::ssh::58::lago.ssh::DEBUG::Running
77b26f28 on lago-he-node-ng-suite-4-3-storage: true
2019-12-22 12:20:40,475::ssh.py::ssh::81::lago.ssh::DEBUG::Command
77b26f28 on lago-he-node-ng-suite-4-3-storage returned with 0
2019-12-22 12:20:40,476::ssh.py::wait_for_ssh::153::lago.ssh::DEBUG::Wait
succeeded for ssh to lago-he-node-ng-suite-4-3-storage
Meaning, if I got it right, that it sshed to host-0 and succeeded,
then tried to ssh to the storage VM and also succeeded - but
establishing that connection took more than an hour. Again, I suspect
some infra issue, perhaps the same as in (1.).
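
In case it helps anyone dig further, this is the kind of small helper
I would use to spot such gaps (hypothetical, not part of lago/OST): it
scans a lago debug log for the timestamp prefix and reports any jump
between consecutive lines above a threshold, which makes the
11:08:16 -> 12:20:39 gap above stand out immediately.

    # Hypothetical helper, not part of lago/OST: report large gaps
    # between consecutive timestamped lines in a lago debug log.
    import re
    import sys
    from datetime import datetime, timedelta

    TS_RE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}::')
    THRESHOLD = timedelta(minutes=10)

    def find_gaps(path):
        prev_ts = prev_line = None
        with open(path) as f:
            for line in f:
                m = TS_RE.match(line)
                if not m:
                    continue
                ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')
                if prev_ts is not None and ts - prev_ts > THRESHOLD:
                    print('gap of %s between:' % (ts - prev_ts))
                    print('  ' + prev_line.rstrip())
                    print('  ' + line.rstrip())
                prev_ts, prev_line = ts, line

    if __name__ == '__main__':
        find_gaps(sys.argv[1])
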
Any clue?
I am now trying it again, anyway [5].
Thanks and best regards,
[1] https://gerrit.ovirt.org/104797
[2] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/7269/
[3] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/726...
[4] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/726...
[5] http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/7273/
--
Didi
[JIRA] (OVIRT-2850) Fedora 29 EOL
by Ehud Yonasi (oVirt JIRA)
Ehud Yonasi created OVIRT-2850:
----------------------------------
Summary: Fedora 29 EOL
Key: OVIRT-2850
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2850
Project: oVirt - virtualization made easy
Issue Type: Task
Reporter: Ehud Yonasi
Assignee: infra
We need to prepare patches to remove it and its mirrors, and let developers know to remove fc29 builds from CI.
--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100118)
Build failed in Jenkins:
system-sync_mirrors-centos-updates-7.6-el7-x86_64 #180
by jenkins@jenkins.phx.ovirt.org
See <http://jenkins.ovirt.org/job/system-sync_mirrors-centos-updates-7.6-el7-x...>
Changes:
------------------------------------------
Started by timer
Running as SYSTEM
[EnvInject] - Loading node environment variables.
Building remotely on mirrors.phx.ovirt.org (mirrors) in workspace <http://jenkins.ovirt.org/job/system-sync_mirrors-centos-updates-7.6-el7-x...>
No credentials specified
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url http://gerrit.ovirt.org/jenkins.git # timeout=10
Cleaning workspace
> git rev-parse --verify HEAD # timeout=10
Resetting working tree
> git reset --hard # timeout=10
> git clean -fdx # timeout=10
Pruning obsolete local branches
Fetching upstream changes from http://gerrit.ovirt.org/jenkins.git
> git --version # timeout=10
> git fetch --tags --progress --prune http://gerrit.ovirt.org/jenkins.git +refs/heads/*:refs/remotes/origin/*
> git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 9bb80cca44ae31e3f7c9f48c43f3386abb445d34 (origin/master)
> git config core.sparsecheckout # timeout=10
> git checkout -f 9bb80cca44ae31e3f7c9f48c43f3386abb445d34
Commit message: "ovirt-engine-api-explorer: Add to STDCI v2"
> git rev-list --no-walk 9bb80cca44ae31e3f7c9f48c43f3386abb445d34 # timeout=10
[system-sync_mirrors-centos-updates-7.6-el7-x86_64] $ /bin/bash -xe /tmp/jenkins4808560237896478089.sh
+ jenkins/scripts/mirror_mgr.sh resync_yum_mirror centos-updates-7.6-el7 x86_64 jenkins/data/mirrors-reposync.conf
+ MIRRORS_MP_BASE=/var/www/html/repos
+ MIRRORS_HTTP_BASE=http://mirrors.phx.ovirt.org/repos
+ MIRRORS_CACHE=/home/jenkins/mirrors_cache
+ MAX_LOCK_ATTEMPTS=120
+ LOCK_WAIT_INTERVAL=5
+ LOCK_BASE=/home/jenkins
+ OLD_MD_TO_KEEP=100
+ HTTP_SELINUX_TYPE=httpd_sys_content_t
+ HTTP_FILE_MODE=644
+ main resync_yum_mirror centos-updates-7.6-el7 x86_64 jenkins/data/mirrors-reposync.conf
+ local command=resync_yum_mirror
+ command_args=("${@:2}")
+ local command_args
+ cmd_resync_yum_mirror centos-updates-7.6-el7 x86_64 jenkins/data/mirrors-reposync.conf
+ local repo_name=centos-updates-7.6-el7
+ local repo_archs=x86_64
+ local reposync_conf=jenkins/data/mirrors-reposync.conf
+ local sync_needed
+ mkdir -p /home/jenkins/mirrors_cache
+ verify_repo_fs centos-updates-7.6-el7 yum
+ local repo_name=centos-updates-7.6-el7
+ local repo_type=yum
+ sudo install -o jenkins -d /var/www/html/repos/yum /var/www/html/repos/yum/centos-updates-7.6-el7 /var/www/html/repos/yum/centos-updates-7.6-el7/base
+ check_yum_sync_needed centos-updates-7.6-el7 x86_64 jenkins/data/mirrors-reposync.conf sync_needed
+ local repo_name=centos-updates-7.6-el7
+ local repo_archs=x86_64
+ local reposync_conf=jenkins/data/mirrors-reposync.conf
+ local p_sync_needed=sync_needed
+ local reposync_out
+ echo 'Checking if mirror needs a resync'
Checking if mirror needs a resync
+ rm -rf /home/jenkins/mirrors_cache/centos-updates-7.6-el7
++ IFS=,
++ echo x86_64
+ for arch in '$(IFS=,; echo $repo_archs)'
++ run_reposync centos-updates-7.6-el7 x86_64 jenkins/data/mirrors-reposync.conf --urls --quiet
++ local repo_name=centos-updates-7.6-el7
++ local repo_arch=x86_64
++ local reposync_conf=jenkins/data/mirrors-reposync.conf
++ extra_args=("${@:4}")
++ local extra_args
++ reposync --config=jenkins/data/mirrors-reposync.conf --repoid=centos-updates-7.6-el7 --arch=x86_64 --cachedir=/home/jenkins/mirrors_cache --download_path=/var/www/html/repos/yum/centos-updates-7.6-el7/base --norepopath --newest-only --urls --quiet
Error setting up repositories: Error making cache directory: /home/jenkins/mirrors_cache/centos-sclo-rh-release-7.6-el7/gen error was: [Errno 17] File exists: '/home/jenkins/mirrors_cache/centos-sclo-rh-release-7.6-el7/gen'
+ reposync_out=
Build step 'Execute shell' marked build as failure
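
For context on the error above: reposync aborted because a cache
directory for a different repo from the same reposync.conf
(centos-sclo-rh-release-7.6-el7) already existed under the shared
/home/jenkins/mirrors_cache, which looks like two mirror-sync jobs
stepping on the same cache. One possible mitigation - only a sketch of
an idea, not something mirror_mgr.sh does today - is to give each
reposync invocation its own throwaway cache directory, so concurrent
jobs cannot collide:

    # Hypothetical mitigation sketch (not the current mirror_mgr.sh
    # behaviour): run reposync with a per-invocation cache directory.
    import subprocess
    import tempfile

    def run_reposync(repo_name, arch, reposync_conf, download_path, *extra_args):
        # A throwaway cache dir per run avoids the
        # "[Errno 17] File exists" collision seen in this build log.
        with tempfile.TemporaryDirectory(prefix='mirrors_cache_') as cache_dir:
            cmd = [
                'reposync',
                '--config=%s' % reposync_conf,
                '--repoid=%s' % repo_name,
                '--arch=%s' % arch,
                '--cachedir=%s' % cache_dir,
                '--download_path=%s' % download_path,
                '--norepopath', '--newest-only',
            ] + list(extra_args)
            return subprocess.run(cmd, check=True)
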