On Tue, Dec 17, 2019 at 10:11 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
Hi all,
$subject. [1] has
ovirt-engine-4.4.0-0.0.master.20191204120550.git04d5d05.el7.noarch .
Tried to look around, and I have a few notes/questions:
1. Last successful run of [2] is 3 days old, but apparently it wasn't
published. Any idea why?
2. Failed runs of [2] are reported to infra, with emails such as:
[CQ]: 105472, 5 (ovirt-engine) failed "ovirt-master" system tests, but
isn't the failure root cause
Is anyone monitoring these?
Is this the only alerting that CI generates on such failures?
If the answer to the first question is No and to the second is Yes,
then we need someone/something to start monitoring. This has been
discussed a lot, but I do not see any change. Ideally, such alerts
should be To'ed or Cc'ed to the author
and reviewers of the patch that CI found to be guilty (which might be
wrong, that's not the point). Do we plan to have something like this?
Any idea when it will be ready?
3. I looked at a few recent failures of [2], specifically [3][4]. Both
seem to have been killed after a timeout, while running
'engine-config'. For [3] that's clear, see [5]:
2019-12-16 17:11:44,766::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:fb6611dc-55bb-4251-aeda-2578b2ec83a2:Get ssh client for lago-basic-suite-master-engine:
2019-12-16 17:11:44,931::ssh.py::ssh::58::lago.ssh::DEBUG::Running 22e2b6b6 on lago-basic-suite-master-engine: engine-config --set VdsmUseNmstate=true
2019-12-16 19:55:21,965::cmd.py::exit_handler::921::cli::DEBUG::signal 15 was caught
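To put a number on the gap between the last two log lines, here is a small sketch that subtracts the two timestamps. The timestamps are copied from the excerpt above; the format string and the helper name `elapsed` are my own assumptions and are not taken from lago.

```python
from datetime import datetime, timedelta

# Timestamps copied from the log excerpt above.
STARTED = "2019-12-16 17:11:44,931"  # "Running ... engine-config --set ..."
KILLED = "2019-12-16 19:55:21,965"   # "signal 15 was caught"

# Assumed layout of the timestamp prefix: date, time, comma, milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two timestamps written in LOG_TIME_FORMAT."""
    begin = datetime.strptime(start, LOG_TIME_FORMAT)
    finish = datetime.strptime(end, LOG_TIME_FORMAT)
    return finish - begin


gap = elapsed(STARTED, KILLED)
print(gap)
```

That comes to roughly 2 hours 43 minutes with nothing logged in between.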
I can't find the stdout/stderr of engine-config, so it's hard to tell
whether it printed anything that would explain why it got stuck.
It's harder to tell for [4], because very few artifacts were collected
(no idea why), notably no lago.log, but [6] does show:
# initialize_engine: Success (in 0:04:00)
# engine_config:
* Collect artifacts:
- [Thread-34] lago-basic-suite-master-engine: ERROR (in 0:00:04)
* Collect artifacts: ERROR (in 0:00:04)
# engine_config: ERROR (in 2:42:57)
/bin/bash: line 31: 5225 Killed ${_STDCI_TIMEOUT_CMD} "3h" "$script_path" < /dev/null
If I run 'engine-config --set VdsmUseNmstate=true' on my
20191204120550.git04d5d05 engine, it returns quickly.
I also tried adding a repo pointing at the last successful run of [7],
which is currently [8], and it prompts me to input a version, probably as a
result of [9]. Ales/Martin, can you please have a look? Thanks.
Something like this might be enough, please take over:
https://gerrit.ovirt.org/105784
But the main point of my mail was the first two points.
--
Didi