Move HE suite to CentOS 8
by Yedidyah Bar David
Hi all,
Galit has been working on [1] for some time now. Wanting to help with
that (and with getting the hosted-engine suites back to a stable
state), I looked at it a few days ago, after a manual deploy on
CentOS 8 with a nightly master snapshot worked for me. I rebased her
patches, and CI failed [2]. Out of the 5 suites it tested, 3 passed
(!) and 2 failed :-( . I checked the two that failed, and see:
1. One of them failed [3] because it could not parse the JSON output
of 'hosted-engine --vm-status --json'. The output seems to have been
cut off in the middle, after 1024 bytes:
2019-12-22 11:13:47,846::ssh.py::ssh::58::lago.ssh::DEBUG::Running
200606d0 on lago-he-basic-role-remote-suite-4-3-host-0: hosted-engine
--vm-status --json
2019-12-22 11:13:48,441::ssh.py::ssh::81::lago.ssh::DEBUG::Command
200606d0 on lago-he-basic-role-remote-suite-4-3-host-0 returned with 0
2019-12-22 11:13:48,441::ssh.py::ssh::89::lago.ssh::DEBUG::Command
200606d0 on lago-he-basic-role-remote-suite-4-3-host-0 output:
{"1": {"conf_on_shared_storage": true, "live-data": true, "extra":
"metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4534
(Sun Dec 22 06:13:39
2019)\nhost-id=1\nscore=3000\nvm_conf_refresh_time=4535 (Sun Dec 22
06:13:40 2019)\nconf_on_shared_storage=True\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n",
"hostname": "lago-he-basic-role-remote-suite-4-3-host-0.lago.local",
"host-id": 1, "engine-status": {"reason": "failed liveliness check",
"health": "bad", "vm": "up", "detail": "Powering down"}, "score":
3000, "stopped": false, "maintenance": false, "crc32": "76bfd2ac",
"local_conf_timestamp": 4535, "host-ts": 4534}, "2":
{"conf_on_shared_storage": true, "live-data": true, "extra":
"metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4532
(Sun Dec 22 06:13:38
2019)\nhost-id=2\nscore=3400\nvm_conf_refresh_time=4532 (Sun Dec 22
06:13:38 2019)\nconf_on_shared_storage=True\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n",
"hostname": "lago-he-basic-role-remote-s
2019-12-22 11:13:48,442::testlib.py::assert_equals_within::242::ovirtlago.testlib::ERROR::
* Unhandled exception in <function <lambda> at 0x7fcc58f38668>
Traceback (most recent call last):
If you check previous runs of this command in the same lago log, you
can see that their output is complete (and somewhat longer, but not by
much), e.g.:
2019-12-22 11:13:41,804::ssh.py::ssh::58::lago.ssh::DEBUG::Running
1c6c1226 on lago-he-basic-role-remote-suite-4-3-host-0: hosted-engine
--vm-status --json
2019-12-22 11:13:43,105::ssh.py::ssh::81::lago.ssh::DEBUG::Command
1c6c1226 on lago-he-basic-role-remote-suite-4-3-host-0 returned with 0
2019-12-22 11:13:43,106::ssh.py::ssh::89::lago.ssh::DEBUG::Command
1c6c1226 on lago-he-basic-role-remote-suite-4-3-host-0 output:
{"1": {"conf_on_shared_storage": true, "live-data": true, "extra":
"metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4534
(Sun Dec 22 06:13:39
2019)\nhost-id=1\nscore=3000\nvm_conf_refresh_time=4535 (Sun Dec 22
06:13:40 2019)\nconf_on_shared_storage=True\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n",
"hostname": "lago-he-basic-role-remote-suite-4-3-host-0.lago.local",
"host-id": 1, "engine-status": {"reason": "failed liveliness check",
"health": "bad", "vm": "up", "detail": "Powering down"}, "score":
3000, "stopped": false, "maintenance": false, "crc32": "76bfd2ac",
"local_conf_timestamp": 4535, "host-ts": 4534}, "2":
{"conf_on_shared_storage": true, "live-data": true, "extra":
"metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4532
(Sun Dec 22 06:13:38
2019)\nhost-id=2\nscore=3400\nvm_conf_refresh_time=4532 (Sun Dec 22
06:13:38 2019)\nconf_on_shared_storage=True\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n",
"hostname": "lago-he-basic-role-remote-suite-4-3-host-1", "host-id":
2, "engine-status": {"reason": "vm not running on this host",
"health": "bad", "vm": "down", "detail": "unknown"}, "score": 3400,
"stopped": false, "maintenance": false, "crc32": "2fc395a1",
"local_conf_timestamp": 4532, "host-ts": 4532}, "global_maintenance":
true}
So I suspect an infra issue - on the network level, ssh, etc.
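
Just to illustrate the failure mode, here is a rough sketch (my
assumption of what the test essentially does - not the actual OST
code): feed the command's stdout to json.loads(). A complete
--vm-status --json document parses fine, but the same text cut short
raises a JSONDecodeError, which matches the unhandled exception above.

    # Hypothetical sketch, not the OST test code: show how truncated
    # 'hosted-engine --vm-status --json' output fails to parse while
    # the complete document parses fine.
    import json

    complete = json.dumps({
        "1": {"conf_on_shared_storage": True, "live-data": True,
              "hostname": "lago-he-basic-role-remote-suite-4-3-host-0.lago.local",
              "engine-status": {"health": "bad", "vm": "up"}},
        "2": {"conf_on_shared_storage": True, "live-data": True,
              "hostname": "lago-he-basic-role-remote-suite-4-3-host-1",
              "engine-status": {"health": "bad", "vm": "down"}},
        "global_maintenance": True,
    })

    json.loads(complete)  # parses fine

    # Simulate the cut seen in the lago log (output ended mid-object).
    truncated = complete[:len(complete) // 2]
    try:
        json.loads(truncated)
    except json.JSONDecodeError as e:
        print("json.loads failed as expected:", e)
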
2. The other one seems to have failed [4] due to a timeout:
2019-12-22 12:57:08,595::cmd.py::exit_handler::921::cli::DEBUG::signal
15 was caught
The reason it took too long seems to be:
2019-12-22 11:08:13,233::log_utils.py::__enter__::600::lago.ssh::DEBUG::start
task:57a44b4e-859e-4ff2-b897-7c8b3ade4557:Get ssh client for
lago-he-node-ng-suite-4-3-host-0:
2019-12-22 11:08:13,637::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
task:57a44b4e-859e-4ff2-b897-7c8b3ade4557:Get ssh client for
lago-he-node-ng-suite-4-3-host-0:
2019-12-22 11:08:13,942::ssh.py::ssh::58::lago.ssh::DEBUG::Running
59005d88 on lago-he-node-ng-suite-4-3-host-0: true
2019-12-22 11:08:15,605::ssh.py::ssh::81::lago.ssh::DEBUG::Command
59005d88 on lago-he-node-ng-suite-4-3-host-0 returned with 0
2019-12-22 11:08:15,606::ssh.py::wait_for_ssh::153::lago.ssh::DEBUG::Wait
succeeded for ssh to lago-he-node-ng-suite-4-3-host-0
2019-12-22 11:08:15,608::log_utils.py::__enter__::600::lago.ssh::DEBUG::start
task:38e81fd0-50f0-497c-8fb0-38eedcfb3f55:Get ssh client for
lago-he-node-ng-suite-4-3-host-0:
2019-12-22 11:08:16,129::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
task:38e81fd0-50f0-497c-8fb0-38eedcfb3f55:Get ssh client for
lago-he-node-ng-suite-4-3-host-0:
2019-12-22 12:20:39,054::log_utils.py::__enter__::600::lago.ssh::DEBUG::start
task:4007e002-f8e2-4427-884f-af57f723aca0:Get ssh client for
lago-he-node-ng-suite-4-3-storage:
2019-12-22 12:20:39,639::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
task:4007e002-f8e2-4427-884f-af57f723aca0:Get ssh client for
lago-he-node-ng-suite-4-3-storage:
2019-12-22 12:20:40,408::ssh.py::ssh::58::lago.ssh::DEBUG::Running
77b26f28 on lago-he-node-ng-suite-4-3-storage: true
2019-12-22 12:20:40,475::ssh.py::ssh::81::lago.ssh::DEBUG::Command
77b26f28 on lago-he-node-ng-suite-4-3-storage returned with 0
2019-12-22 12:20:40,476::ssh.py::wait_for_ssh::153::lago.ssh::DEBUG::Wait
succeeded for ssh to lago-he-node-ng-suite-4-3-storage
Meaning, if I got it right, that it sshed to host-0 and succeeded,
then tried to ssh to the storage VM and also succeeded - but
establishing that connection took more than an hour. Again, I suspect
some infra issue, perhaps the same as in (1.).
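
In case it helps anyone dig further, this is the kind of small helper
I would use to spot such gaps (hypothetical, not part of lago/OST): it
scans a lago debug log for the timestamp prefix and reports any jump
between consecutive lines above a threshold, which makes the
11:08:16 -> 12:20:39 gap above stand out immediately.

    # Hypothetical helper, not part of lago/OST: report large gaps
    # between consecutive timestamped lines in a lago debug log.
    import re
    import sys
    from datetime import datetime, timedelta

    TS_RE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}::')
    THRESHOLD = timedelta(minutes=10)

    def find_gaps(path):
        prev_ts = prev_line = None
        with open(path) as f:
            for line in f:
                m = TS_RE.match(line)
                if not m:
                    continue
                ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')
                if prev_ts is not None and ts - prev_ts > THRESHOLD:
                    print('gap of %s between:' % (ts - prev_ts))
                    print('  ' + prev_line.rstrip())
                    print('  ' + line.rstrip())
                prev_ts, prev_line = ts, line

    if __name__ == '__main__':
        find_gaps(sys.argv[1])
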
Any clue?
I am now trying it again, anyway [5].
Thanks and best regards,
[1] https://gerrit.ovirt.org/104797
[2] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/7269/
[3] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/726...
[4] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/726...
[5] http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/7273/
--
Didi
[JIRA] (OVIRT-2850) Fedora 29 EOL
by Ehud Yonasi (oVirt JIRA)
Ehud Yonasi created OVIRT-2850:
----------------------------------
Summary: Fedora 29 EOL
Key: OVIRT-2850
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2850
Project: oVirt - virtualization made easy
Issue Type: Task
Reporter: Ehud Yonasi
Assignee: infra
We need to prepare patches to remove it and its mirrors, and let developers know to remove fc29 builds from CI.
--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100118)
Build failed in Jenkins:
system-sync_mirrors-centos-updates-7.6-el7-x86_64 #180
by jenkins@jenkins.phx.ovirt.org
See <http://jenkins.ovirt.org/job/system-sync_mirrors-centos-updates-7.6-el7-x...>
Changes:
------------------------------------------
Started by timer
Running as SYSTEM
[EnvInject] - Loading node environment variables.
Building remotely on mirrors.phx.ovirt.org (mirrors) in workspace <http://jenkins.ovirt.org/job/system-sync_mirrors-centos-updates-7.6-el7-x...>
No credentials specified
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url http://gerrit.ovirt.org/jenkins.git # timeout=10
Cleaning workspace
> git rev-parse --verify HEAD # timeout=10
Resetting working tree
> git reset --hard # timeout=10
> git clean -fdx # timeout=10
Pruning obsolete local branches
Fetching upstream changes from http://gerrit.ovirt.org/jenkins.git
> git --version # timeout=10
> git fetch --tags --progress --prune http://gerrit.ovirt.org/jenkins.git +refs/heads/*:refs/remotes/origin/*
> git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 9bb80cca44ae31e3f7c9f48c43f3386abb445d34 (origin/master)
> git config core.sparsecheckout # timeout=10
> git checkout -f 9bb80cca44ae31e3f7c9f48c43f3386abb445d34
Commit message: "ovirt-engine-api-explorer: Add to STDCI v2"
> git rev-list --no-walk 9bb80cca44ae31e3f7c9f48c43f3386abb445d34 # timeout=10
[system-sync_mirrors-centos-updates-7.6-el7-x86_64] $ /bin/bash -xe /tmp/jenkins4808560237896478089.sh
+ jenkins/scripts/mirror_mgr.sh resync_yum_mirror centos-updates-7.6-el7 x86_64 jenkins/data/mirrors-reposync.conf
+ MIRRORS_MP_BASE=/var/www/html/repos
+ MIRRORS_HTTP_BASE=http://mirrors.phx.ovirt.org/repos
+ MIRRORS_CACHE=/home/jenkins/mirrors_cache
+ MAX_LOCK_ATTEMPTS=120
+ LOCK_WAIT_INTERVAL=5
+ LOCK_BASE=/home/jenkins
+ OLD_MD_TO_KEEP=100
+ HTTP_SELINUX_TYPE=httpd_sys_content_t
+ HTTP_FILE_MODE=644
+ main resync_yum_mirror centos-updates-7.6-el7 x86_64 jenkins/data/mirrors-reposync.conf
+ local command=resync_yum_mirror
+ command_args=("${@:2}")
+ local command_args
+ cmd_resync_yum_mirror centos-updates-7.6-el7 x86_64 jenkins/data/mirrors-reposync.conf
+ local repo_name=centos-updates-7.6-el7
+ local repo_archs=x86_64
+ local reposync_conf=jenkins/data/mirrors-reposync.conf
+ local sync_needed
+ mkdir -p /home/jenkins/mirrors_cache
+ verify_repo_fs centos-updates-7.6-el7 yum
+ local repo_name=centos-updates-7.6-el7
+ local repo_type=yum
+ sudo install -o jenkins -d /var/www/html/repos/yum /var/www/html/repos/yum/centos-updates-7.6-el7 /var/www/html/repos/yum/centos-updates-7.6-el7/base
+ check_yum_sync_needed centos-updates-7.6-el7 x86_64 jenkins/data/mirrors-reposync.conf sync_needed
+ local repo_name=centos-updates-7.6-el7
+ local repo_archs=x86_64
+ local reposync_conf=jenkins/data/mirrors-reposync.conf
+ local p_sync_needed=sync_needed
+ local reposync_out
+ echo 'Checking if mirror needs a resync'
Checking if mirror needs a resync
+ rm -rf /home/jenkins/mirrors_cache/centos-updates-7.6-el7
++ IFS=,
++ echo x86_64
+ for arch in '$(IFS=,; echo $repo_archs)'
++ run_reposync centos-updates-7.6-el7 x86_64 jenkins/data/mirrors-reposync.conf --urls --quiet
++ local repo_name=centos-updates-7.6-el7
++ local repo_arch=x86_64
++ local reposync_conf=jenkins/data/mirrors-reposync.conf
++ extra_args=("${@:4}")
++ local extra_args
++ reposync --config=jenkins/data/mirrors-reposync.conf --repoid=centos-updates-7.6-el7 --arch=x86_64 --cachedir=/home/jenkins/mirrors_cache --download_path=/var/www/html/repos/yum/centos-updates-7.6-el7/base --norepopath --newest-only --urls --quiet
Error setting up repositories: Error making cache directory: /home/jenkins/mirrors_cache/centos-sclo-rh-release-7.6-el7/gen error was: [Errno 17] File exists: '/home/jenkins/mirrors_cache/centos-sclo-rh-release-7.6-el7/gen'
+ reposync_out=
Build step 'Execute shell' marked build as failure
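
For context on the error above: reposync aborted because a cache
directory for a different repo from the same reposync.conf
(centos-sclo-rh-release-7.6-el7) already existed under the shared
/home/jenkins/mirrors_cache, which looks like two mirror-sync jobs
stepping on the same cache. One possible mitigation - only a sketch of
an idea, not something mirror_mgr.sh does today - is to give each
reposync invocation its own throwaway cache directory, so concurrent
jobs cannot collide:

    # Hypothetical mitigation sketch (not the current mirror_mgr.sh
    # behaviour): run reposync with a per-invocation cache directory.
    import subprocess
    import tempfile

    def run_reposync(repo_name, arch, reposync_conf, download_path, *extra_args):
        # A throwaway cache dir per run avoids the
        # "[Errno 17] File exists" collision seen in this build log.
        with tempfile.TemporaryDirectory(prefix='mirrors_cache_') as cache_dir:
            cmd = [
                'reposync',
                '--config=%s' % reposync_conf,
                '--repoid=%s' % repo_name,
                '--arch=%s' % arch,
                '--cachedir=%s' % cache_dir,
                '--download_path=%s' % download_path,
                '--norepopath', '--newest-only',
            ] + list(extra_args)
            return subprocess.run(cmd, check=True)
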