
On Tue, Dec 17, 2019 at 10:11 AM Yedidyah Bar David <didi@redhat.com> wrote:
Hi all,
$subject. [1] has ovirt-engine-4.4.0-0.0.master.20191204120550.git04d5d05.el7.noarch .
Tried to look around, and I have a few notes/questions:
1. Last successful run of [2] is 3 days old, but apparently it wasn't published. Any idea why?
2. Failed runs of [2] are reported to infra, with emails such as:
[CQ]: 105472, 5 (ovirt-engine) failed "ovirt-master" system tests, but isn't the failure root cause
Is anyone monitoring these?
Is this the only alerting that CI generates on such failures?
If the answer to the first question is No and to the second is Yes, then we need someone/something to start monitoring. This was discussed a lot, but I do not see any change. Ideally, such alerts should be To'ed or Cc'ed to the author and reviewers of the patch that CI found to be guilty (CI might be wrong about that, but that's not the point). Do we plan to have something like this? Any idea when it will be ready?
3. I looked at a few recent failures of [2], specifically [3][4]. Both seem to have been killed after a timeout, while running 'engine-config'. For [3] that's clear, see [5]:
2019-12-16 17:11:44,766::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:fb6611dc-55bb-4251-aeda-2578b2ec83a2:Get ssh client for lago-basic-suite-master-engine:
2019-12-16 17:11:44,931::ssh.py::ssh::58::lago.ssh::DEBUG::Running 22e2b6b6 on lago-basic-suite-master-engine: engine-config --set VdsmUseNmstate=true
2019-12-16 19:55:21,965::cmd.py::exit_handler::921::cli::DEBUG::signal 15 was caught
I can't find the stdout/stderr of engine-config, so it's hard to tell whether it output anything that would help explain why it was stuck.
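For reference, the time between the engine-config start line and the "signal 15" line can be computed from the two timestamps in the log excerpt above. This is only a minimal sketch: the format string is my assumption about lago's log timestamp layout (date, time, comma, milliseconds), and the variable names are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the lago.log excerpt in this mail.
STARTED = "2019-12-16 17:11:44,931"  # "Running 22e2b6b6 ... engine-config --set ..."
KILLED = "2019-12-16 19:55:21,965"   # "signal 15 was caught"

# Assumed layout of the timestamp field: date, time, comma, milliseconds.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(text):
    """Parse one log timestamp into a datetime object."""
    return datetime.strptime(text, LOG_TS_FORMAT)


elapsed = parse_ts(KILLED) - parse_ts(STARTED)
print(elapsed)  # 2:43:37.034000
```

So the command sat there for about 2h43m before the job was killed, which is the same order as the 2:42:57 reported for engine_config in the other run.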
It's harder to tell for [4], because very few artifacts were collected for it (no idea why), notably no lago.log. But [6] does show:
  # initialize_engine: Success (in 0:04:00)
  # engine_config:
    * Collect artifacts:
      - [Thread-34] lago-basic-suite-master-engine: ERROR (in 0:00:04)
    * Collect artifacts: ERROR (in 0:00:04)
  # engine_config: ERROR (in 2:42:57)
/bin/bash: line 31: 5225 Killed ${_STDCI_TIMEOUT_CMD} "3h" "$script_path" < /dev/null
If I run 'engine-config --set VdsmUseNmstate=true' on my 20191204120550.git04d5d05 engine, it returns quickly.
I also tried adding a repo pointing at the last successful run of [7], which is currently [8], and then engine-config prompts me to input a version, probably as a result of [9]. Ales/Martin, can you please have a look? Thanks.
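If the hang is indeed an interactive prompt waiting for input, one way to make such a failure visible quickly would be to run the command with stdin closed, a short timeout, and stdout/stderr captured. The following is only a rough sketch under that assumption: run_with_timeout is a hypothetical helper, not something that exists in lago or OST, and it runs the command locally rather than over ssh as the suite does.

```python
import subprocess


def _as_text(data):
    """Return captured output as str; it may be None or bytes after a timeout."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode("utf-8", errors="replace")
    return data


def run_with_timeout(cmd, timeout_s):
    """Run cmd with stdin closed and return (returncode, stdout, stderr).

    returncode is None if the command had to be killed after timeout_s
    seconds; whatever output was captured up to that point is still returned.
    """
    try:
        done = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # a prompt gets EOF instead of waiting
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        return None, _as_text(exc.stdout), _as_text(exc.stderr)
    return done.returncode, _as_text(done.stdout), _as_text(done.stderr)
```

It could then be called as, e.g., run_with_timeout(["engine-config", "--set", "VdsmUseNmstate=true"], 300), where the 300 seconds is an arbitrary value I picked. Whether engine-config exits cleanly on EOF at the prompt is something I have not checked.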
Something like this might be enough, please take over:
https://gerrit.ovirt.org/105784
But the main point of my mail was the first points.
[1] https://resources.ovirt.org/pub/ovirt-master-snapshot/rpm/el7/noarch/
[2] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/
[3] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17768/
[4] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17761/
[5] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17768/artifac...
[6] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17761/artifac...
[7] https://jenkins.ovirt.org/job/ovirt-engine_standard-on-merge/
[8] https://jenkins.ovirt.org/job/ovirt-engine_standard-on-merge/384/
[9] https://gerrit.ovirt.org/105440
-- Didi