On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <nsoffer(a)redhat.com>
wrote:
> On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <edwardh(a)redhat.com> wrote:
>
>>
>>
>> On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>>> On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh(a)redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh(a)redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer
<nsoffer(a)redhat.com>
>>>>> wrote:
>>>>>
>>>>>> We have this failure that pops randomly:
>>>>>>
>>>>>> 1. All tests pass
>>>>>>
>>>>>> *00:13:13.284* ___________________________________ summary
____________________________________*00:13:13.285* tests: commands
succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36:
commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36:
commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290*
network-py36: commands succeeded*00:13:13.291* virt-py27: commands
succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations
:)
>>>>>>
>>>>>>
>>>>>> 2. But we fail to collect logs at the end
>>>>>>
>>>>>> *00:14:35.992*
##########################################################*00:14:35.995* ## Wed Nov 28
17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764
seconds*00:14:35.997* ## rc = 1*00:14:35.997*
##########################################################*00:14:36.009* ##! ERROR
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries:
/tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012*
journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar:
journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read
it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015*
lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015*
libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016*
libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017*
libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017*
ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018*
samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018*
wtmp*00:14:36.018* Took 678 seconds*00:14:36.018*
===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019*
##!########################################################
>>>>>>
>>>>>>
>>>>>> This looks like an issue with vdsm check-patch.sh:
>>>>>>
>>>>>> function collect_logs {
>>>>>> res=$?
>>>>>> [ "$res" -ne 0 ] && echo "*** err:
$res"
>>>>>> cd /var/log
>>>>>> tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" *
>>>>>> cd /var/host_log
>>>>>> tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" *
>>>>>> }
>>>>>>
>>>>>> trap collect_logs EXIT
>>>>>>
>>>>>> Seems that tar fail to collect log if the log is modified while
>>>>>> copied, which makes sense.
>>>>>>
>>>>>> We can ignore errors in tar, since log collection should not
fail
>>>>>> the build, but I think
>>>>>> a better solution is to avoid collecting any logs since vdsm
writes
>>>>>> its own logs during
>>>>>> tests - all the info must be in vdsm log.
>>>>>>
>>>>>> Here is the list of collected logs:
>>>>>>
>>>>>> *00:13:47.280* + tar -cvzf
/home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz
btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog
libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285*
btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309*
dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312*
hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315*
libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317*
openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319*
README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342*
wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf
/home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz
anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron
dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt
ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356*
anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358*
anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395*
anaconda/ks-script-b5_08tmo.log*00:13:47.396*
anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398*
anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449*
anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465*
anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467*
anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504*
audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814*
audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008*
audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157*
chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161*
cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335*
dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424*
hawkey.log*00:13:51.704* journal/*00:13:51.708*
journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709*
journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar:
journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read
it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915*
lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926*
libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928*
libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929*
libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930*
ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933*
samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935*
wtmp
>>>>>>
>>>>>>
>>>>>> Most if not all are lot relevant to vdsm tests, and should not
be
>>>>>> collected.
>>>>>>
>>>>>> This was added in:
>>>>>>
>>>>>> commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9
>>>>>> Author: Edward Haas <edwardh(a)redhat.com>
>>>>>> Date: Tue Apr 17 10:53:11 2018 +0300
>>>>>>
>>>>>> automation: Collect logs even when check-patch fails
>>>>>>
>>>>>> Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a
>>>>>> Signed-off-by: Edward Haas <edwardh(a)redhat.com>
>>>>>>
>>>>>> Reviewed in:
>>>>>>
https://gerrit.ovirt.org/c/90370
>>>>>>
>>>>>> Edward, can you explain why do we need to collect logs during
>>>>>> check-patch,
>>>>>> and why do we need to collect all the logs in the system?
>>>>>>
>>>>>
>>>>> check-patch are running unit and integrations tests.
>>>>> The integration tests are touching the OS and other packages (like
>>>>> openvswitch).
>>>>> It was added so we can debug why tests failed.
>>>>>
>>>>> I guess we can now separate the unit and integration tests, but it
>>>>> will not solve
>>>>> the problem presented here.
>>>>> Failing to collect the logs silently sounds a good enough solution
to
>>>>> me.
>>>>>
>>>>
>>>> Barak suggested to just exclude the journal:
>>>>
https://gerrit.ovirt.org/#/c/95850/
>>>>
>>>
>>> This is fixed now, thanks!
>>>
>>
> Not fixed yet, we still fail collecting /var/host_log:
>
> + cd /var/host_log
>
> + tar --exclude 'journal/*' -czf
>
/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz
> btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm
> tallylog vdsm_tests.log wtmp yum.log
> + cd /var/host_log
> + tar -czf
>
/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz
> anaconda audit boot.log boot.log-20181210 boot.log-20181211
> boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215
> boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202
> cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby
> grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126
> maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats
> ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga
> secure spooler spooler-20181126 spooler-20181202 spooler-20181210
> spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log
> yum.log-20170101 yum.log-20180101
> tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file
> changed as we read it
>
>
>
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_...
>
> Copying the binary logs from journal is not the way to copy logs.
>
> If we need the host journal, it should be collected by CI infra using
> journalctl.
>
But these are collected by the CI, not us. We just take what was already
collected.
Gal, Barak, can you explain this?
This should fix the issue:
https://gerrit.ovirt.org/c/96244/
>
> Nir
>
>>
>>
>>> Any reason why we exclude journal only for /var/log, and do collect the
>>> binary journal
>>> from /var/host_log? I guess it can fail in the same way.
>>>
>>
>> As far as I know, /vatests/integration/vlan_test.pyr/host_log is
>> collected by the CI and placed (copy) there.
>> It does not collect everything, therefore we added the second one.
>>
>>
>>> If we collect logs for integration tests, the most important log is the
>>> journal, and now
>>> we skip it.
>>>
>>
>> I think it appears in /var/host_log.
>>
>>
>>>
>>> We can get the journal in a reliable way like this:
>>>
>>> journalctl --since build-date > /tmp/journal.log
>>>
>>> Nir
>>>
>>>>