On Tue, Dec 17, 2019 at 9:55 AM Anton Marchukov <amarchuk@redhat.com> wrote:
Hi.

We do watch those, and this one was reported by Dafna, though the devel list was not included for some reason (usually we do include it). We strive to follow up on them daily, but sometimes we lag behind.

It would be good to notify the patch owner when the system identifies their patch as a possible cause (by bisection), but initially it was not done like that. Sometimes the reporting is misleading (e.g. changes in the external repos we use are not visible to the bisection, and infra problems are something we fix ourselves). Still, I am OK with trying to CC the patch owner, especially since we are working on gating as a long-term solution, and IMO this is a step in the right direction.
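
For example, the report job could pull the owner and reviewers of the
suspected change from the Gerrit REST API and add them to the
recipients. A rough sketch, assuming anonymous REST access to
gerrit.ovirt.org and curl and jq at hand (the change number would come
from the CQ report itself, e.g. 105472):

    change=105472
    # Gerrit prefixes JSON responses with an anti-XSSI ")]}'" line,
    # so drop the first line before parsing.
    curl -s "https://gerrit.ovirt.org/changes/${change}/detail" \
        | tail -n +2 \
        | jq -r '.owner.email, (.reviewers.REVIEWER[]?.email // empty)'

Whether the emails are actually exposed there depends on the server's
account-visibility settings, so it may have to go through the SSH query
interface instead.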

So please, let's try that; those alerts need more visibility.

Anton.

> On 17 Dec 2019, at 09:43, Yedidyah Bar David <didi@redhat.com> wrote:
>
> On Tue, Dec 17, 2019 at 10:11 AM Yedidyah Bar David <didi@redhat.com> wrote:
>>
>> Hi all,
>>
>> $subject. [1] has
>> ovirt-engine-4.4.0-0.0.master.20191204120550.git04d5d05.el7.noarch.
>>
>> Tried to look around, and I have a few notes/questions:
>>
>> 1. Last successful run of [2] is 3 days old, but apparently it wasn't
>> published. Any idea why?
>>
>> 2. Failed runs of [2] are reported to infra, with emails such as:
>>
>> [CQ]: 105472, 5 (ovirt-engine) failed "ovirt-master" system tests, but
>> isn't the failure root cause
>>
>> Is anyone monitoring these?
>>
>> Is this the only alerting that CI generates on such failures?
>>
>> If the answer to the first question is No and to the second is Yes,
>> then we need someone (or something) to start monitoring them. This
>> was discussed a lot, but I do not see any change. Ideally, such
>> alerts should be To'ed or Cc'ed to the author and reviewers of the
>> patch that CI found to be guilty (which might be wrong, but that's
>> not the point). Do we plan to have something like this? Any idea
>> when it will be ready?
>>
>> 3. I looked at a few recent failures of [2], specifically [3][4]. Both
>> seem to have been killed after a timeout, while running
>> 'engine-config'. For [3] that's clear, see [5]:
>>
>> 2019-12-16 17:11:44,766::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
>> task:fb6611dc-55bb-4251-aeda-2578b2ec83a2:Get ssh client for
>> lago-basic-suite-master-engine:
>> 2019-12-16 17:11:44,931::ssh.py::ssh::58::lago.ssh::DEBUG::Running
>> 22e2b6b6 on lago-basic-suite-master-engine: engine-config --set
>> VdsmUseNmstate=true
>> 2019-12-16 19:55:21,965::cmd.py::exit_handler::921::cli::DEBUG::signal
>> 15 was caught
>>
>> I can't find the stdout/stderr of engine-config, so it's hard to
>> tell whether it output anything helpful for understanding why it was
>> stuck. Note the gap: engine-config was started at 17:11:44 and the
>> SIGTERM only arrived at 19:55:21, almost 2:44 later.
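>>
>> If the suite captured that output explicitly, a hang like this would
>> at least leave a trace behind. A minimal sketch of the idea (the
>> hostname and log path here are just illustrative, not what OST
>> actually does):
>>
>>     ssh root@lago-basic-suite-master-engine \
>>         'engine-config --set VdsmUseNmstate=true 2>&1 \
>>          | tee /tmp/engine-config-nmstate.log'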
>>
>> It's harder to tell for [4], because it has very few artifacts
>> collected (no idea why; notably, there is no lago.log), but [6] does
>> show:
>>
>>   # initialize_engine:  Success (in 0:04:00)
>>   # engine_config:
>>     * Collect artifacts:
>>       - [Thread-34] lago-basic-suite-master-engine: ERROR (in 0:00:04)
>>     * Collect artifacts:  ERROR (in 0:00:04)
>>   # engine_config:  ERROR (in 2:42:57)
>> /bin/bash: line 31:  5225 Killed
>> ${_STDCI_TIMEOUT_CMD} "3h" "$script_path" < /dev/null
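>>
>> That combination, lago catching signal 15 and bash then reporting
>> the script as Killed, is consistent with a TERM-then-KILL timeout
>> wrapper. Assuming _STDCI_TIMEOUT_CMD resolves to GNU timeout, the
>> wrapper behaves roughly like this (the kill-after value is a guess):
>>
>>     # Send SIGTERM after 3 hours (the "signal 15" lago caught), then
>>     # SIGKILL if the script is still alive (the "Killed" from bash);
>>     # the wrapper itself exits with status 124 on a timeout.
>>     timeout --kill-after=1m 3h "$script_path" < /dev/null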
>>
>> If I run 'engine-config --set VdsmUseNmstate=true' on my
>> 20191204120550.git04d5d05 engine, it returns quickly.
>>
>> I also tried adding a repo pointing at the last successful run of
>> [7], which is currently [8], and then it prompts me to input a
>> version, probably as a result of [9]. Ales/Martin, can you please
>> have a look? Thanks.
>
> Something like this might be enough; please take over:
>
> https://gerrit.ovirt.org/105784
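>
> For reference, engine-config accepts the config version on the
> command line ('--cver'), so I'd expect the fix to be of roughly this
> shape (the exact version string here is a guess):
>
>     # Non-interactive: name the config version instead of being prompted.
>     engine-config --set VdsmUseNmstate=true --cver=general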
>
> But the main point of my mail was the first points.
>
>>
>> [1] https://resources.ovirt.org/pub/ovirt-master-snapshot/rpm/el7/noarch/
>> [2] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/
>> [3] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17768/
>> [4] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17761/
>> [5] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17768/artifact/basic-suite.el7.x86_64/lago_logs/lago.log
>> [6] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17761/artifact/basic-suite.el7.x86_64/mock_logs/script/stdout_stderr.log
>> [7] https://jenkins.ovirt.org/job/ovirt-engine_standard-on-merge/
>> [8] https://jenkins.ovirt.org/job/ovirt-engine_standard-on-merge/384/
>> [9] https://gerrit.ovirt.org/105440
>> --
>> Didi
>
>
>
> --
> Didi
> _______________________________________________
> Infra mailing list -- infra@ovirt.org
> To unsubscribe send an email to infra-leave@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/infra@ovirt.org/message/4TEQYJOB67NCPO7MNV2JEKDXRV5KZTVU/

--
Anton Marchukov
Associate Manager - RHV DevOps - Red Hat

--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.