On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>
> On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <edwardh(a)redhat.com> wrote:
>>
>>
>>
>> On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>
>>> On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh(a)redhat.com>
wrote:
>>>>
>>>>
>>>>
>>>> On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh(a)redhat.com>
wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer(a)redhat.com>
wrote:
>>>>>>
>>>>>> We have this failure that pops randomly:
>>>>>>
>>>>>> 1. All tests pass
>>>>>>
>>>>>> 00:13:13.284 ___________________________________ summary
____________________________________
>>>>>> 00:13:13.285 tests: commands succeeded
>>>>>> 00:13:13.286 storage-py27: commands succeeded
>>>>>> 00:13:13.286 storage-py36: commands succeeded
>>>>>> 00:13:13.286 lib-py27: commands succeeded
>>>>>> 00:13:13.287 lib-py36: commands succeeded
>>>>>> 00:13:13.288 network-py27: commands succeeded
>>>>>> 00:13:13.290 network-py36: commands succeeded
>>>>>> 00:13:13.291 virt-py27: commands succeeded
>>>>>> 00:13:13.292 virt-py36: commands succeeded
>>>>>> 00:13:13.293 congratulations :)
>>>>>>
>>>>>>
>>>>>> 2. But we fail to collect logs at the end
>>>>>>
>>>>>> 00:14:35.992
##########################################################
>>>>>> 00:14:35.995 ## Wed Nov 28 17:39:50 UTC 2018 Finished env:
fc28:fedora-28-x86_64
>>>>>> 00:14:35.996 ## took 764 seconds
>>>>>> 00:14:35.997 ## rc = 1
>>>>>> 00:14:35.997
##########################################################
>>>>>> 00:14:36.009 ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
>>>>>> 00:14:36.010 ##! Last 20 log entries:
/tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log
>>>>>> 00:14:36.011 ##!
>>>>>> 00:14:36.012
journal/b087148aba6d49b9bbef488e52a48752/system.journal
>>>>>> 00:14:36.013 tar:
journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it
>>>>>> 00:14:36.014
journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal
>>>>>> 00:14:36.015 lastlog
>>>>>> 00:14:36.015 libvirt/
>>>>>> 00:14:36.015 libvirt/lxc/
>>>>>> 00:14:36.015 libvirt/libxl/
>>>>>> 00:14:36.016 libvirt/qemu/
>>>>>> 00:14:36.016
libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log
>>>>>> 00:14:36.017 libvirt/uml/
>>>>>> 00:14:36.017 ovirt-guest-agent/
>>>>>> 00:14:36.017 ovirt-guest-agent/ovirt-guest-agent.log
>>>>>> 00:14:36.017 README
>>>>>> 00:14:36.018 samba/
>>>>>> 00:14:36.018 samba/old/
>>>>>> 00:14:36.018 sssd/
>>>>>> 00:14:36.018 tallylog
>>>>>> 00:14:36.018 wtmp
>>>>>> 00:14:36.018 Took 678 seconds
>>>>>> 00:14:36.018 ===================================
>>>>>> 00:14:36.019 ##!
>>>>>> 00:14:36.019 ##! ERROR
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>> 00:14:36.019
##!########################################################
>>>>>>
>>>>>>
>>>>>> This looks like an issue with vdsm check-patch.sh:
>>>>>>
>>>>>> function collect_logs {
>>>>>> res=$?
>>>>>> [ "$res" -ne 0 ] && echo "*** err:
$res"
>>>>>> cd /var/log
>>>>>> tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" *
>>>>>> cd /var/host_log
>>>>>> tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" *
>>>>>> }
>>>>>>
>>>>>> trap collect_logs EXIT
>>>>>>
>>>>>> Seems that tar fail to collect log if the log is modified while
copied, which makes sense.
>>>>>>
>>>>>> We can ignore errors in tar, since log collection should not fail
the build, but I think
>>>>>> a better solution is to avoid collecting any logs since vdsm
writes its own logs during
>>>>>> tests - all the info must be in vdsm log.
>>>>>>
>>>>>> Here is the list of collected logs:
>>>>>>
>>>>>> 00:13:47.280 + tar -cvzf
/home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz
btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog
libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log
>>>>>> 00:13:47.285 btmp
>>>>>> 00:13:47.285 dnf.librepo.log
>>>>>> 00:13:47.299 dnf.log
>>>>>> 00:13:47.309 dnf.rpm.log
>>>>>> 00:13:47.310 faillog
>>>>>> 00:13:47.311 glusterfs/
>>>>>> 00:13:47.312 hawkey.log
>>>>>> 00:13:47.313 journal/
>>>>>> 00:13:47.313 lastlog
>>>>>> 00:13:47.315 libvirt/
>>>>>> 00:13:47.315 libvirt/qemu/
>>>>>> 00:13:47.316 openvswitch/
>>>>>> 00:13:47.317 openvswitch/ovs-vswitchd.log
>>>>>> 00:13:47.318 openvswitch/ovsdb-server.log
>>>>>> 00:13:47.319 README
>>>>>> 00:13:47.320 tallylog
>>>>>> 00:13:47.321 vdsm_tests.log
>>>>>> 00:13:47.342 wtmp
>>>>>> 00:13:47.343 yum.log
>>>>>> 00:13:47.349 + cd /var/host_log
>>>>>> 00:13:47.350 + tar -cvzf
/home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz
anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron
dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt
ovirt-guest-agent README samba sssd tallylog wtmp
>>>>>> 00:13:47.356 anaconda/
>>>>>> 00:13:47.356 anaconda/ifcfg.log
>>>>>> 00:13:47.357 anaconda/ks-script-l5qnynnj.log
>>>>>> 00:13:47.358 anaconda/storage.log
>>>>>> 00:13:47.359 anaconda/program.log
>>>>>> 00:13:47.395 anaconda/ks-script-b5_08tmo.log
>>>>>> 00:13:47.396 anaconda/ks-script-6uks8bp3.log
>>>>>> 00:13:47.397 anaconda/hawkey.log
>>>>>> 00:13:47.398 anaconda/syslog
>>>>>> 00:13:47.406 anaconda/journal.log
>>>>>> 00:13:47.449 anaconda/dnf.librepo.log
>>>>>> 00:13:47.458 anaconda/packaging.log
>>>>>> 00:13:47.465 anaconda/dbus.log
>>>>>> 00:13:47.466 anaconda/anaconda.log
>>>>>> 00:13:47.467 anaconda/ks-script-slrcz39_.log
>>>>>> 00:13:47.503 audit/
>>>>>> 00:13:47.504 audit/audit.log.3
>>>>>> 00:13:47.657 audit/audit.log.2
>>>>>> 00:13:47.814 audit/audit.log.1
>>>>>> 00:13:47.981 audit/audit.log
>>>>>> 00:13:48.008 audit/audit.log.4
>>>>>> 00:13:48.155 boot.log
>>>>>> 00:13:48.156 btmp
>>>>>> 00:13:48.157 chrony/
>>>>>> 00:13:48.159 cloud-init.log
>>>>>> 00:13:48.159 cloud-init-output.log
>>>>>> 00:13:48.161 cron
>>>>>> 00:13:48.162 dnf.librepo.log
>>>>>> 00:13:49.930 dnf.log
>>>>>> 00:13:51.335 dnf.rpm.log
>>>>>> 00:13:51.421 firewalld
>>>>>> 00:13:51.423 glusterfs/
>>>>>> 00:13:51.424 hawkey.log
>>>>>> 00:13:51.704 journal/
>>>>>> 00:13:51.708 journal/b087148aba6d49b9bbef488e52a48752/
>>>>>> 00:13:51.709
journal/b087148aba6d49b9bbef488e52a48752/system.journal
>>>>>> 00:13:55.817 tar:
journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it
>>>>>> 00:13:55.819
journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal
>>>>>> 00:13:55.915 lastlog
>>>>>> 00:13:55.923 libvirt/
>>>>>> 00:13:55.924 libvirt/lxc/
>>>>>> 00:13:55.926 libvirt/libxl/
>>>>>> 00:13:55.927 libvirt/qemu/
>>>>>> 00:13:55.928
libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log
>>>>>> 00:13:55.929 libvirt/uml/
>>>>>> 00:13:55.930 ovirt-guest-agent/
>>>>>> 00:13:55.930 ovirt-guest-agent/ovirt-guest-agent.log
>>>>>> 00:13:55.932 README
>>>>>> 00:13:55.933 samba/
>>>>>> 00:13:55.933 samba/old/
>>>>>> 00:13:55.935 sssd/
>>>>>> 00:13:55.935 tallylog
>>>>>> 00:13:55.935 wtmp
>>>>>>
>>>>>>
>>>>>> Most if not all are lot relevant to vdsm tests, and should not be
collected.
>>>>>>
>>>>>> This was added in:
>>>>>>
>>>>>> commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9
>>>>>> Author: Edward Haas <edwardh(a)redhat.com>
>>>>>> Date: Tue Apr 17 10:53:11 2018 +0300
>>>>>>
>>>>>> automation: Collect logs even when check-patch fails
>>>>>>
>>>>>> Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a
>>>>>> Signed-off-by: Edward Haas <edwardh(a)redhat.com>
>>>>>>
>>>>>> Reviewed in:
>>>>>>
https://gerrit.ovirt.org/c/90370
>>>>>>
>>>>>> Edward, can you explain why do we need to collect logs during
check-patch,
>>>>>> and why do we need to collect all the logs in the system?
>>>>>
>>>>>
>>>>> check-patch are running unit and integrations tests.
>>>>> The integration tests are touching the OS and other packages (like
openvswitch).
>>>>> It was added so we can debug why tests failed.
>>>>>
>>>>> I guess we can now separate the unit and integration tests, but it
will not solve
>>>>> the problem presented here.
>>>>> Failing to collect the logs silently sounds a good enough solution to
me.
>>>>
>>>>
>>>> Barak suggested to just exclude the journal:
https://gerrit.ovirt.org/#/c/95850/
>>>
>>>
>>> This is fixed now, thanks!
>
>
> Not fixed yet, we still fail collecting /var/host_log:
>
> + cd /var/host_log
>
> + tar --exclude 'journal/*' -czf
/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz
btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm tallylog
vdsm_tests.log wtmp yum.log
> + cd /var/host_log
> + tar -czf
/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz
anaconda audit boot.log boot.log-20181210 boot.log-20181211 boot.log-20181212
boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 btmp btmp-20181201
cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld
glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126
maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine
ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126
spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp
yum.log yum.log-20170101 yum.log-20180101
> tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file changed as we read
it
>
>
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_...
>
> Copying the binary logs from journal is not the way to copy logs.
>
> If we need the host journal, it should be collected by CI infra using journalctl.
This should fix the issue:
https://gerrit.ovirt.org/c/96244/
This was not anough, it fails again:
+ tar --exclude 'journal/*' -czf
/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz
anaconda audit boot.log boot.log-20181213 boot.log-20181214
boot.log-20181215 boot.log-20181216 boot.log-20181217
boot.log-20181218 boot.log-20181219 btmp btmp-20181201 cron
cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg
dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal
lastlog libvirt maillog maillog-20181126 maillog-20181202
maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine
ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure
spooler spooler-20181126 spooler-20181202 spooler-20181210
spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log
yum.log-20170101 yum.log-20180101
tar: secure: file changed as we read it
We can exclude also "secure", but this is not the right way.
Can we make a list of logs we are interested in, instead of collecting
everything?
Then we can setup something more reliable to do the collection.
Nir
>
>>
>>
>> Nir
>>
>>>>
>>>>
>>>> Any reason why we exclude journal only for /var/log, and do collect the
binary journal
>>>> from /var/host_log? I guess it can fail in the same way.
>>>
>>>
>>> As far as I know, /var/host_log is collected by the CI and placed (copy)
there.
>>> It does not collect everything, therefore we added the second one.
>>>
>>>>
>>>> If we collect logs for integration tests, the most important log is the
journal, and now
>>>> we skip it.
>>>
>>>
>>> I think it appears in /var/host_log.
>>>
>>>>
>>>>
>>>> We can get the journal in a reliable way like this:
>>>>
>>>> journalctl --since build-date > /tmp/journal.log
>>>>
>>>> Nir