[VDSM] all test passed, build failed with "tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it"

We have this failure that pops randomly: 1. All tests pass *00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :) 2. But we fail to collect logs at the end *00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!######################################################## This looks like an issue with vdsm check-patch.sh: function collect_logs { res=$? [ "$res" -ne 0 ] && echo "*** err: $res" cd /var/log tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * cd /var/host_log tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * } trap collect_logs EXIT Seems that tar fail to collect log if the log is modified while copied, which makes sense. We can ignore errors in tar, since log collection should not fail the build, but I think a better solution is to avoid collecting any logs since vdsm writes its own logs during tests - all the info must be in vdsm log. Here is the list of collected logs: *00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp Most if not all are lot relevant to vdsm tests, and should not be collected. This was added in: commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 Author: Edward Haas <edwardh@redhat.com> Date: Tue Apr 17 10:53:11 2018 +0300 automation: Collect logs even when check-patch fails Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a Signed-off-by: Edward Haas <edwardh@redhat.com> Reviewed in: https://gerrit.ovirt.org/c/90370 Edward, can you explain why do we need to collect logs during check-patch, and why do we need to collect all the logs in the system? Nir

On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> wrote:
We have this failure that pops randomly:
1. All tests pass
*00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :)
2. But we fail to collect logs at the end
*00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!########################################################
This looks like an issue with vdsm check-patch.sh:
function collect_logs { res=$? [ "$res" -ne 0 ] && echo "*** err: $res" cd /var/log tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * cd /var/host_log tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * }
trap collect_logs EXIT
Seems that tar fail to collect log if the log is modified while copied, which makes sense.
We can ignore errors in tar, since log collection should not fail the build, but I think a better solution is to avoid collecting any logs since vdsm writes its own logs during tests - all the info must be in vdsm log.
Here is the list of collected logs:
*00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp
Most if not all are lot relevant to vdsm tests, and should not be collected.
This was added in:
commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 Author: Edward Haas <edwardh@redhat.com> Date: Tue Apr 17 10:53:11 2018 +0300
automation: Collect logs even when check-patch fails
Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a Signed-off-by: Edward Haas <edwardh@redhat.com>
Reviewed in: https://gerrit.ovirt.org/c/90370
Edward, can you explain why do we need to collect logs during check-patch, and why do we need to collect all the logs in the system?
check-patch are running unit and integrations tests. The integration tests are touching the OS and other packages (like openvswitch). It was added so we can debug why tests failed. I guess we can now separate the unit and integration tests, but it will not solve the problem presented here. Failing to collect the logs silently sounds a good enough solution to me.
Nir

On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com> wrote:
On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> wrote:
We have this failure that pops randomly:
1. All tests pass
*00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :)
2. But we fail to collect logs at the end
*00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!########################################################
This looks like an issue with vdsm check-patch.sh:
function collect_logs { res=$? [ "$res" -ne 0 ] && echo "*** err: $res" cd /var/log tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * cd /var/host_log tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * }
trap collect_logs EXIT
Seems that tar fail to collect log if the log is modified while copied, which makes sense.
We can ignore errors in tar, since log collection should not fail the build, but I think a better solution is to avoid collecting any logs since vdsm writes its own logs during tests - all the info must be in vdsm log.
Here is the list of collected logs:
*00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp
Most if not all are lot relevant to vdsm tests, and should not be collected.
This was added in:
commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 Author: Edward Haas <edwardh@redhat.com> Date: Tue Apr 17 10:53:11 2018 +0300
automation: Collect logs even when check-patch fails
Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a Signed-off-by: Edward Haas <edwardh@redhat.com>
Reviewed in: https://gerrit.ovirt.org/c/90370
Edward, can you explain why do we need to collect logs during check-patch, and why do we need to collect all the logs in the system?
check-patch are running unit and integrations tests. The integration tests are touching the OS and other packages (like openvswitch). It was added so we can debug why tests failed.
I guess we can now separate the unit and integration tests, but it will not solve the problem presented here. Failing to collect the logs silently sounds a good enough solution to me.
Barak suggested to just exclude the journal: https://gerrit.ovirt.org/#/c/95850/
Nir

On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh@redhat.com> wrote:
On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com> wrote:
On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> wrote:
We have this failure that pops randomly:
1. All tests pass
*00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :)
2. But we fail to collect logs at the end
*00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!########################################################
This looks like an issue with vdsm check-patch.sh:
function collect_logs { res=$? [ "$res" -ne 0 ] && echo "*** err: $res" cd /var/log tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * cd /var/host_log tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * }
trap collect_logs EXIT
Seems that tar fail to collect log if the log is modified while copied, which makes sense.
We can ignore errors in tar, since log collection should not fail the build, but I think a better solution is to avoid collecting any logs since vdsm writes its own logs during tests - all the info must be in vdsm log.
Here is the list of collected logs:
*00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp
Most if not all are lot relevant to vdsm tests, and should not be collected.
This was added in:
commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 Author: Edward Haas <edwardh@redhat.com> Date: Tue Apr 17 10:53:11 2018 +0300
automation: Collect logs even when check-patch fails
Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a Signed-off-by: Edward Haas <edwardh@redhat.com>
Reviewed in: https://gerrit.ovirt.org/c/90370
Edward, can you explain why do we need to collect logs during check-patch, and why do we need to collect all the logs in the system?
check-patch are running unit and integrations tests. The integration tests are touching the OS and other packages (like openvswitch). It was added so we can debug why tests failed.
I guess we can now separate the unit and integration tests, but it will not solve the problem presented here. Failing to collect the logs silently sounds a good enough solution to me.
Barak suggested to just exclude the journal: https://gerrit.ovirt.org/#/c/95850/
This is fixed now, thanks! Any reason why we exclude journal only for /var/log, and do collect the binary journal from /var/host_log? I guess it can fail in the same way. If we collect logs for integration tests, the most important log is the journal, and now we skip it. We can get the journal in a reliable way like this: journalctl --since build-date > /tmp/journal.log Nir

On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh@redhat.com> wrote:
On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com> wrote:
On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> wrote:
We have this failure that pops randomly:
1. All tests pass
*00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :)
2. But we fail to collect logs at the end
*00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!########################################################
This looks like an issue with vdsm check-patch.sh:
function collect_logs { res=$? [ "$res" -ne 0 ] && echo "*** err: $res" cd /var/log tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * cd /var/host_log tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * }
trap collect_logs EXIT
Seems that tar fail to collect log if the log is modified while copied, which makes sense.
We can ignore errors in tar, since log collection should not fail the build, but I think a better solution is to avoid collecting any logs since vdsm writes its own logs during tests - all the info must be in vdsm log.
Here is the list of collected logs:
*00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp
Most if not all are lot relevant to vdsm tests, and should not be collected.
This was added in:
commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 Author: Edward Haas <edwardh@redhat.com> Date: Tue Apr 17 10:53:11 2018 +0300
automation: Collect logs even when check-patch fails
Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a Signed-off-by: Edward Haas <edwardh@redhat.com>
Reviewed in: https://gerrit.ovirt.org/c/90370
Edward, can you explain why do we need to collect logs during check-patch, and why do we need to collect all the logs in the system?
check-patch are running unit and integrations tests. The integration tests are touching the OS and other packages (like openvswitch). It was added so we can debug why tests failed.
I guess we can now separate the unit and integration tests, but it will not solve the problem presented here. Failing to collect the logs silently sounds a good enough solution to me.
Barak suggested to just exclude the journal: https://gerrit.ovirt.org/#/c/95850/
This is fixed now, thanks!
Any reason why we exclude journal only for /var/log, and do collect the binary journal from /var/host_log? I guess it can fail in the same way.
As far as I know, /var/host_log is collected by the CI and placed (copy) there. It does not collect everything, therefore we added the second one.
If we collect logs for integration tests, the most important log is the journal, and now we skip it.
I think it appears in /var/host_log.
We can get the journal in a reliable way like this:
journalctl --since build-date > /tmp/journal.log
Nir

On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <edwardh@redhat.com> wrote:
On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh@redhat.com> wrote:
On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com> wrote:
On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> wrote:
We have this failure that pops randomly:
1. All tests pass
*00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :)
2. But we fail to collect logs at the end
*00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!########################################################
This looks like an issue with vdsm check-patch.sh:
function collect_logs { res=$? [ "$res" -ne 0 ] && echo "*** err: $res" cd /var/log tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * cd /var/host_log tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * }
trap collect_logs EXIT
Seems that tar fail to collect log if the log is modified while copied, which makes sense.
We can ignore errors in tar, since log collection should not fail the build, but I think a better solution is to avoid collecting any logs since vdsm writes its own logs during tests - all the info must be in vdsm log.
Here is the list of collected logs:
*00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp
Most if not all are lot relevant to vdsm tests, and should not be collected.
This was added in:
commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 Author: Edward Haas <edwardh@redhat.com> Date: Tue Apr 17 10:53:11 2018 +0300
automation: Collect logs even when check-patch fails
Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a Signed-off-by: Edward Haas <edwardh@redhat.com>
Reviewed in: https://gerrit.ovirt.org/c/90370
Edward, can you explain why do we need to collect logs during check-patch, and why do we need to collect all the logs in the system?
check-patch are running unit and integrations tests. The integration tests are touching the OS and other packages (like openvswitch). It was added so we can debug why tests failed.
I guess we can now separate the unit and integration tests, but it will not solve the problem presented here. Failing to collect the logs silently sounds a good enough solution to me.
Barak suggested to just exclude the journal: https://gerrit.ovirt.org/#/c/95850/
This is fixed now, thanks!
Not fixed yet, we still fail collecting /var/host_log: + cd /var/host_log + tar --exclude 'journal/*' -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm tallylog vdsm_tests.log wtmp yum.log + cd /var/host_log + tar -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log boot.log-20181210 boot.log-20181211 boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126 spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log yum.log-20170101 yum.log-20180101 tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file changed as we read it https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_sta... Copying the binary logs from journal is not the way to copy logs. If we need the host journal, it should be collected by CI infra using journalctl. Nir
Any reason why we exclude journal only for /var/log, and do collect the binary journal from /var/host_log? I guess it can fail in the same way.
As far as I know, /var/host_log is collected by the CI and placed (copy) there. It does not collect everything, therefore we added the second one.
If we collect logs for integration tests, the most important log is the journal, and now we skip it.
I think it appears in /var/host_log.
We can get the journal in a reliable way like this:
journalctl --since build-date > /tmp/journal.log
Nir

On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <edwardh@redhat.com> wrote:
On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh@redhat.com> wrote:
On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com> wrote:
On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> wrote:
We have this failure that pops randomly:
1. All tests pass
*00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :)
2. But we fail to collect logs at the end
*00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!########################################################
This looks like an issue with vdsm check-patch.sh:
function collect_logs { res=$? [ "$res" -ne 0 ] && echo "*** err: $res" cd /var/log tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * cd /var/host_log tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * }
trap collect_logs EXIT
Seems that tar fail to collect log if the log is modified while copied, which makes sense.
We can ignore errors in tar, since log collection should not fail the build, but I think a better solution is to avoid collecting any logs since vdsm writes its own logs during tests - all the info must be in vdsm log.
Here is the list of collected logs:
*00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp
Most if not all are lot relevant to vdsm tests, and should not be collected.
This was added in:
commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 Author: Edward Haas <edwardh@redhat.com> Date: Tue Apr 17 10:53:11 2018 +0300
automation: Collect logs even when check-patch fails
Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a Signed-off-by: Edward Haas <edwardh@redhat.com>
Reviewed in: https://gerrit.ovirt.org/c/90370
Edward, can you explain why do we need to collect logs during check-patch, and why do we need to collect all the logs in the system?
check-patch are running unit and integrations tests. The integration tests are touching the OS and other packages (like openvswitch). It was added so we can debug why tests failed.
I guess we can now separate the unit and integration tests, but it will not solve the problem presented here. Failing to collect the logs silently sounds a good enough solution to me.
Barak suggested to just exclude the journal: https://gerrit.ovirt.org/#/c/95850/
This is fixed now, thanks!
Not fixed yet, we still fail collecting /var/host_log:
+ cd /var/host_log
+ tar --exclude 'journal/*' -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm tallylog vdsm_tests.log wtmp yum.log + cd /var/host_log + tar -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log boot.log-20181210 boot.log-20181211 boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126 spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log yum.log-20170101 yum.log-20180101 tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file changed as we read it
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_sta...
Copying the binary logs from journal is not the way to copy logs.
If we need the host journal, it should be collected by CI infra using journalctl.
This should fix the issue: https://gerrit.ovirt.org/c/96244/
Nir
Any reason why we exclude journal only for /var/log, and do collect the binary journal from /var/host_log? I guess it can fail in the same way.
As far as I know, /var/host_log is collected by the CI and placed (copy) there. It does not collect everything, therefore we added the second one.
If we collect logs for integration tests, the most important log is the journal, and now we skip it.
I think it appears in /var/host_log.
We can get the journal in a reliable way like this:
journalctl --since build-date > /tmp/journal.log
Nir

On Sun, Dec 16, 2018 at 2:40 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <edwardh@redhat.com> wrote:
On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh@redhat.com> wrote:
On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com> wrote:
On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> wrote:
> We have this failure that pops randomly: > > 1. All tests pass > > *00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :) > > > 2. But we fail to collect logs at the end > > *00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!######################################################## > > > This looks like an issue with vdsm check-patch.sh: > > function collect_logs { > res=$? > [ "$res" -ne 0 ] && echo "*** err: $res" > cd /var/log > tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * > cd /var/host_log > tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * > } > > trap collect_logs EXIT > > Seems that tar fail to collect log if the log is modified while > copied, which makes sense. > > We can ignore errors in tar, since log collection should not fail > the build, but I think > a better solution is to avoid collecting any logs since vdsm writes > its own logs during > tests - all the info must be in vdsm log. > > Here is the list of collected logs: > > *00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp > > > Most if not all are lot relevant to vdsm tests, and should not be > collected. > > This was added in: > > commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 > Author: Edward Haas <edwardh@redhat.com> > Date: Tue Apr 17 10:53:11 2018 +0300 > > automation: Collect logs even when check-patch fails > > Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a > Signed-off-by: Edward Haas <edwardh@redhat.com> > > Reviewed in: > https://gerrit.ovirt.org/c/90370 > > Edward, can you explain why do we need to collect logs during > check-patch, > and why do we need to collect all the logs in the system? >
check-patch are running unit and integrations tests. The integration tests are touching the OS and other packages (like openvswitch). It was added so we can debug why tests failed.
I guess we can now separate the unit and integration tests, but it will not solve the problem presented here. Failing to collect the logs silently sounds a good enough solution to me.
Barak suggested to just exclude the journal: https://gerrit.ovirt.org/#/c/95850/
This is fixed now, thanks!
Not fixed yet, we still fail collecting /var/host_log:
+ cd /var/host_log
+ tar --exclude 'journal/*' -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm tallylog vdsm_tests.log wtmp yum.log + cd /var/host_log + tar -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log boot.log-20181210 boot.log-20181211 boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126 spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log yum.log-20170101 yum.log-20180101 tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file changed as we read it
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_sta...
Copying the binary logs from journal is not the way to copy logs.
If we need the host journal, it should be collected by CI infra using journalctl.
But these are collected by the CI, not us. We just take what was already collected. Gal, Barak, can you explain this?
This should fix the issue: https://gerrit.ovirt.org/c/96244/
Nir
Any reason why we exclude journal only for /var/log, and do collect the binary journal from /var/host_log? I guess it can fail in the same way.
As far as I know, /vatests/integration/vlan_test.pyr/host_log is collected by the CI and placed (copy) there. It does not collect everything, therefore we added the second one.
If we collect logs for integration tests, the most important log is the journal, and now we skip it.
I think it appears in /var/host_log.
We can get the journal in a reliable way like this:
journalctl --since build-date > /tmp/journal.log
Nir

On Sun, 16 Dec 2018 at 14:44, Edward Haas <edwardh@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:40 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <edwardh@redhat.com> wrote:
On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh@redhat.com> wrote:
On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com> wrote:
> > > On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> > wrote: > >> We have this failure that pops randomly: >> >> 1. All tests pass >> >> *00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :) >> >> >> 2. But we fail to collect logs at the end >> >> *00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!######################################################## >> >> >> This looks like an issue with vdsm check-patch.sh: >> >> function collect_logs { >> res=$? >> [ "$res" -ne 0 ] && echo "*** err: $res" >> cd /var/log >> tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * >> cd /var/host_log >> tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * >> } >> >> trap collect_logs EXIT >> >> Seems that tar fail to collect log if the log is modified while >> copied, which makes sense. >> >> We can ignore errors in tar, since log collection should not fail >> the build, but I think >> a better solution is to avoid collecting any logs since vdsm writes >> its own logs during >> tests - all the info must be in vdsm log. >> >> Here is the list of collected logs: >> >> *00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp >> >> >> Most if not all are lot relevant to vdsm tests, and should not be >> collected. >> >> This was added in: >> >> commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 >> Author: Edward Haas <edwardh@redhat.com> >> Date: Tue Apr 17 10:53:11 2018 +0300 >> >> automation: Collect logs even when check-patch fails >> >> Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a >> Signed-off-by: Edward Haas <edwardh@redhat.com> >> >> Reviewed in: >> https://gerrit.ovirt.org/c/90370 >> >> Edward, can you explain why do we need to collect logs during >> check-patch, >> and why do we need to collect all the logs in the system? >> > > check-patch are running unit and integrations tests. > The integration tests are touching the OS and other packages (like > openvswitch). > It was added so we can debug why tests failed. > > I guess we can now separate the unit and integration tests, but it > will not solve > the problem presented here. > Failing to collect the logs silently sounds a good enough solution > to me. >
Barak suggested to just exclude the journal: https://gerrit.ovirt.org/#/c/95850/
This is fixed now, thanks!
Not fixed yet, we still fail collecting /var/host_log:
+ cd /var/host_log
+ tar --exclude 'journal/*' -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm tallylog vdsm_tests.log wtmp yum.log + cd /var/host_log + tar -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log boot.log-20181210 boot.log-20181211 boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126 spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log yum.log-20170101 yum.log-20180101 tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file changed as we read it
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_sta...
Copying the binary logs from journal is not the way to copy logs.
If we need the host journal, it should be collected by CI infra using journalctl.
But these are collected by the CI, not us. We just take what was already collected. Gal, Barak, can you explain this?
As you can see in the code Nir patched, We don't collect it, you do.
This should fix the issue: https://gerrit.ovirt.org/c/96244/
Nir
Any reason why we exclude journal only for /var/log, and do collect the binary journal from /var/host_log? I guess it can fail in the same way.
As far as I know, /vatests/integration/vlan_test.pyr/host_log is collected by the CI and placed (copy) there. It does not collect everything, therefore we added the second one.
If we collect logs for integration tests, the most important log is the journal, and now we skip it.
I think it appears in /var/host_log.
We can get the journal in a reliable way like this:
journalctl --since build-date > /tmp/journal.log
Nir
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

On Sun, Dec 16, 2018 at 3:31 PM Barak Korren <bkorren@redhat.com> wrote:
On Sun, 16 Dec 2018 at 14:44, Edward Haas <edwardh@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:40 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <edwardh@redhat.com> wrote:
On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh@redhat.com> wrote:
> > > On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com> > wrote: > >> >> >> On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> >> wrote: >> >>> We have this failure that pops randomly: >>> >>> 1. All tests pass >>> >>> *00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :) >>> >>> >>> 2. But we fail to collect logs at the end >>> >>> *00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!######################################################## >>> >>> >>> This looks like an issue with vdsm check-patch.sh: >>> >>> function collect_logs { >>> res=$? >>> [ "$res" -ne 0 ] && echo "*** err: $res" >>> cd /var/log >>> tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * >>> cd /var/host_log >>> tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * >>> } >>> >>> trap collect_logs EXIT >>> >>> Seems that tar fail to collect log if the log is modified while >>> copied, which makes sense. >>> >>> We can ignore errors in tar, since log collection should not fail >>> the build, but I think >>> a better solution is to avoid collecting any logs since vdsm >>> writes its own logs during >>> tests - all the info must be in vdsm log. >>> >>> Here is the list of collected logs: >>> >>> *00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp >>> >>> >>> Most if not all are lot relevant to vdsm tests, and should not be >>> collected. >>> >>> This was added in: >>> >>> commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 >>> Author: Edward Haas <edwardh@redhat.com> >>> Date: Tue Apr 17 10:53:11 2018 +0300 >>> >>> automation: Collect logs even when check-patch fails >>> >>> Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a >>> Signed-off-by: Edward Haas <edwardh@redhat.com> >>> >>> Reviewed in: >>> https://gerrit.ovirt.org/c/90370 >>> >>> Edward, can you explain why do we need to collect logs during >>> check-patch, >>> and why do we need to collect all the logs in the system? >>> >> >> check-patch are running unit and integrations tests. >> The integration tests are touching the OS and other packages (like >> openvswitch). >> It was added so we can debug why tests failed. >> >> I guess we can now separate the unit and integration tests, but it >> will not solve >> the problem presented here. >> Failing to collect the logs silently sounds a good enough solution >> to me. >> > > Barak suggested to just exclude the journal: > https://gerrit.ovirt.org/#/c/95850/ >
This is fixed now, thanks!
Not fixed yet, we still fail collecting /var/host_log:
+ cd /var/host_log
+ tar --exclude 'journal/*' -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm tallylog vdsm_tests.log wtmp yum.log + cd /var/host_log + tar -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log boot.log-20181210 boot.log-20181211 boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126 spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log yum.log-20170101 yum.log-20180101 tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file changed as we read it
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_sta...
Copying the binary logs from journal is not the way to copy logs.
If we need the host journal, it should be collected by CI infra using journalctl.
But these are collected by the CI, not us. We just take what was already collected. Gal, Barak, can you explain this?
As you can see in the code Nir patched, We don't collect it, you do.
We collected what was already collected by CI. Who is collecting those logs into host_log?
This should fix the issue: https://gerrit.ovirt.org/c/96244/
Nir
Any reason why we exclude journal only for /var/log, and do collect the binary journal from /var/host_log? I guess it can fail in the same way.
As far as I know, /vatests/integration/vlan_test.pyr/host_log is collected by the CI and placed (copy) there. It does not collect everything, therefore we added the second one.
If we collect logs for integration tests, the most important log is the journal, and now we skip it.
I think it appears in /var/host_log.
We can get the journal in a reliable way like this:
journalctl --since build-date > /tmp/journal.log
Nir
>
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

On Mon, 17 Dec 2018 at 08:32, Edward Haas <edwardh@redhat.com> wrote:
On Sun, Dec 16, 2018 at 3:31 PM Barak Korren <bkorren@redhat.com> wrote:
On Sun, 16 Dec 2018 at 14:44, Edward Haas <edwardh@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:40 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <edwardh@redhat.com> wrote:
On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer@redhat.com> wrote:
> On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh@redhat.com> > wrote: > >> >> >> On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com> >> wrote: >> >>> >>> >>> On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> >>> wrote: >>> >>>> We have this failure that pops randomly: >>>> >>>> 1. All tests pass >>>> >>>> *00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :) >>>> >>>> >>>> 2. But we fail to collect logs at the end >>>> >>>> *00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!######################################################## >>>> >>>> >>>> This looks like an issue with vdsm check-patch.sh: >>>> >>>> function collect_logs { >>>> res=$? >>>> [ "$res" -ne 0 ] && echo "*** err: $res" >>>> cd /var/log >>>> tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * >>>> cd /var/host_log >>>> tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * >>>> } >>>> >>>> trap collect_logs EXIT >>>> >>>> Seems that tar fail to collect log if the log is modified while >>>> copied, which makes sense. >>>> >>>> We can ignore errors in tar, since log collection should not fail >>>> the build, but I think >>>> a better solution is to avoid collecting any logs since vdsm >>>> writes its own logs during >>>> tests - all the info must be in vdsm log. >>>> >>>> Here is the list of collected logs: >>>> >>>> *00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp >>>> >>>> >>>> Most if not all are lot relevant to vdsm tests, and should not be >>>> collected. >>>> >>>> This was added in: >>>> >>>> commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 >>>> Author: Edward Haas <edwardh@redhat.com> >>>> Date: Tue Apr 17 10:53:11 2018 +0300 >>>> >>>> automation: Collect logs even when check-patch fails >>>> >>>> Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a >>>> Signed-off-by: Edward Haas <edwardh@redhat.com> >>>> >>>> Reviewed in: >>>> https://gerrit.ovirt.org/c/90370 >>>> >>>> Edward, can you explain why do we need to collect logs during >>>> check-patch, >>>> and why do we need to collect all the logs in the system? >>>> >>> >>> check-patch are running unit and integrations tests. >>> The integration tests are touching the OS and other packages (like >>> openvswitch). >>> It was added so we can debug why tests failed. >>> >>> I guess we can now separate the unit and integration tests, but it >>> will not solve >>> the problem presented here. >>> Failing to collect the logs silently sounds a good enough solution >>> to me. >>> >> >> Barak suggested to just exclude the journal: >> https://gerrit.ovirt.org/#/c/95850/ >> > > This is fixed now, thanks! >
Not fixed yet, we still fail collecting /var/host_log:
+ cd /var/host_log
+ tar --exclude 'journal/*' -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm tallylog vdsm_tests.log wtmp yum.log + cd /var/host_log + tar -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log boot.log-20181210 boot.log-20181211 boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126 spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log yum.log-20170101 yum.log-20180101 tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file changed as we read it
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_sta...
Copying the binary logs from journal is not the way to copy logs.
If we need the host journal, it should be collected by CI infra using journalctl.
But these are collected by the CI, not us. We just take what was already collected. Gal, Barak, can you explain this?
As you can see in the code Nir patched, We don't collect it, you do.
We collected what was already collected by CI. Who is collecting those logs into host_log?
its probably just a bind mount of /var/log on the host, so the host itself is doing that...
This should fix the issue: https://gerrit.ovirt.org/c/96244/
Nir
> Any reason why we exclude journal only for /var/log, and do collect > the binary journal > from /var/host_log? I guess it can fail in the same way. >
As far as I know, /vatests/integration/vlan_test.pyr/host_log is collected by the CI and placed (copy) there. It does not collect everything, therefore we added the second one.
> If we collect logs for integration tests, the most important log is > the journal, and now > we skip it. >
I think it appears in /var/host_log.
> > We can get the journal in a reliable way like this: > > journalctl --since build-date > /tmp/journal.log > > Nir > >>
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

On Mon, Dec 17, 2018 at 8:39 AM Barak Korren <bkorren@redhat.com> wrote:
On Mon, 17 Dec 2018 at 08:32, Edward Haas <edwardh@redhat.com> wrote:
On Sun, Dec 16, 2018 at 3:31 PM Barak Korren <bkorren@redhat.com> wrote:
On Sun, 16 Dec 2018 at 14:44, Edward Haas <edwardh@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:40 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <edwardh@redhat.com> wrote:
> > > On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer@redhat.com> > wrote: > >> On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh@redhat.com> >> wrote: >> >>> >>> >>> On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com> >>> wrote: >>> >>>> >>>> >>>> On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> >>>> wrote: >>>> >>>>> We have this failure that pops randomly: >>>>> >>>>> 1. All tests pass >>>>> >>>>> *00:13:13.284* ___________________________________ summary ____________________________________*00:13:13.285* tests: commands succeeded*00:13:13.286* storage-py27: commands succeeded*00:13:13.286* storage-py36: commands succeeded*00:13:13.286* lib-py27: commands succeeded*00:13:13.287* lib-py36: commands succeeded*00:13:13.288* network-py27: commands succeeded*00:13:13.290* network-py36: commands succeeded*00:13:13.291* virt-py27: commands succeeded*00:13:13.292* virt-py36: commands succeeded*00:13:13.293* congratulations :) >>>>> >>>>> >>>>> 2. But we fail to collect logs at the end >>>>> >>>>> *00:14:35.992* ##########################################################*00:14:35.995* ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64*00:14:35.996* ## took 764 seconds*00:14:35.997* ## rc = 1*00:14:35.997* ##########################################################*00:14:36.009* ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* ##!*00:14:36.012* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:14:36.014* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* ===================================*00:14:36.019* ##!*00:14:36.019* ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* ##!######################################################## >>>>> >>>>> >>>>> This looks like an issue with vdsm check-patch.sh: >>>>> >>>>> function collect_logs { >>>>> res=$? >>>>> [ "$res" -ne 0 ] && echo "*** err: $res" >>>>> cd /var/log >>>>> tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * >>>>> cd /var/host_log >>>>> tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * >>>>> } >>>>> >>>>> trap collect_logs EXIT >>>>> >>>>> Seems that tar fail to collect log if the log is modified while >>>>> copied, which makes sense. >>>>> >>>>> We can ignore errors in tar, since log collection should not >>>>> fail the build, but I think >>>>> a better solution is to avoid collecting any logs since vdsm >>>>> writes its own logs during >>>>> tests - all the info must be in vdsm log. >>>>> >>>>> Here is the list of collected logs: >>>>> >>>>> *00:13:47.280* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* anaconda/ks-script-l5qnynnj.log*00:13:47.358* anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* anaconda/ks-script-b5_08tmo.log*00:13:47.396* anaconda/ks-script-6uks8bp3.log*00:13:47.397* anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* btmp*00:13:48.157* chrony/*00:13:48.159* cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* cron*00:13:48.162* dnf.librepo.log*00:13:49.930* dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* firewalld*00:13:51.423* glusterfs/*00:13:51.424* hawkey.log*00:13:51.704* journal/*00:13:51.708* journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it*00:13:55.819* journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* lastlog*00:13:55.923* libvirt/*00:13:55.924* libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* libvirt/qemu/*00:13:55.928* libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp >>>>> >>>>> >>>>> Most if not all are lot relevant to vdsm tests, and should not >>>>> be collected. >>>>> >>>>> This was added in: >>>>> >>>>> commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 >>>>> Author: Edward Haas <edwardh@redhat.com> >>>>> Date: Tue Apr 17 10:53:11 2018 +0300 >>>>> >>>>> automation: Collect logs even when check-patch fails >>>>> >>>>> Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a >>>>> Signed-off-by: Edward Haas <edwardh@redhat.com> >>>>> >>>>> Reviewed in: >>>>> https://gerrit.ovirt.org/c/90370 >>>>> >>>>> Edward, can you explain why do we need to collect logs during >>>>> check-patch, >>>>> and why do we need to collect all the logs in the system? >>>>> >>>> >>>> check-patch are running unit and integrations tests. >>>> The integration tests are touching the OS and other packages >>>> (like openvswitch). >>>> It was added so we can debug why tests failed. >>>> >>>> I guess we can now separate the unit and integration tests, but >>>> it will not solve >>>> the problem presented here. >>>> Failing to collect the logs silently sounds a good enough >>>> solution to me. >>>> >>> >>> Barak suggested to just exclude the journal: >>> https://gerrit.ovirt.org/#/c/95850/ >>> >> >> This is fixed now, thanks! >> > Not fixed yet, we still fail collecting /var/host_log:
+ cd /var/host_log
+ tar --exclude 'journal/*' -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm tallylog vdsm_tests.log wtmp yum.log + cd /var/host_log + tar -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log boot.log-20181210 boot.log-20181211 boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126 spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log yum.log-20170101 yum.log-20180101 tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file changed as we read it
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_sta...
Copying the binary logs from journal is not the way to copy logs.
If we need the host journal, it should be collected by CI infra using journalctl.
But these are collected by the CI, not us. We just take what was already collected. Gal, Barak, can you explain this?
As you can see in the code Nir patched, We don't collect it, you do.
We collected what was already collected by CI. Who is collecting those logs into host_log?
its probably just a bind mount of /var/log on the host, so the host itself is doing that...
But not all logs are there. That was the reason to collect them from /var/log as well.
This should fix the issue: https://gerrit.ovirt.org/c/96244/
Nir
> > >> Any reason why we exclude journal only for /var/log, and do collect >> the binary journal >> from /var/host_log? I guess it can fail in the same way. >> > > As far as I know, /vatests/integration/vlan_test.pyr/host_log is > collected by the CI and placed (copy) there. > It does not collect everything, therefore we added the second one. > > >> If we collect logs for integration tests, the most important log is >> the journal, and now >> we skip it. >> > > I think it appears in /var/host_log. > > >> >> We can get the journal in a reliable way like this: >> >> journalctl --since build-date > /tmp/journal.log >> >> Nir >> >>>
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

On Sun, Dec 16, 2018 at 2:40 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <edwardh@redhat.com> wrote:
On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh@redhat.com> wrote:
On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com> wrote:
On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> wrote: > > We have this failure that pops randomly: > > 1. All tests pass > > 00:13:13.284 ___________________________________ summary ____________________________________ > 00:13:13.285 tests: commands succeeded > 00:13:13.286 storage-py27: commands succeeded > 00:13:13.286 storage-py36: commands succeeded > 00:13:13.286 lib-py27: commands succeeded > 00:13:13.287 lib-py36: commands succeeded > 00:13:13.288 network-py27: commands succeeded > 00:13:13.290 network-py36: commands succeeded > 00:13:13.291 virt-py27: commands succeeded > 00:13:13.292 virt-py36: commands succeeded > 00:13:13.293 congratulations :) > > > 2. But we fail to collect logs at the end > > 00:14:35.992 ########################################################## > 00:14:35.995 ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64 > 00:14:35.996 ## took 764 seconds > 00:14:35.997 ## rc = 1 > 00:14:35.997 ########################################################## > 00:14:36.009 ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv > 00:14:36.010 ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log > 00:14:36.011 ##! > 00:14:36.012 journal/b087148aba6d49b9bbef488e52a48752/system.journal > 00:14:36.013 tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it > 00:14:36.014 journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal > 00:14:36.015 lastlog > 00:14:36.015 libvirt/ > 00:14:36.015 libvirt/lxc/ > 00:14:36.015 libvirt/libxl/ > 00:14:36.016 libvirt/qemu/ > 00:14:36.016 libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log > 00:14:36.017 libvirt/uml/ > 00:14:36.017 ovirt-guest-agent/ > 00:14:36.017 ovirt-guest-agent/ovirt-guest-agent.log > 00:14:36.017 README > 00:14:36.018 samba/ > 00:14:36.018 samba/old/ > 00:14:36.018 sssd/ > 00:14:36.018 tallylog > 00:14:36.018 wtmp > 00:14:36.018 Took 678 seconds > 00:14:36.018 =================================== > 00:14:36.019 ##! > 00:14:36.019 ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > 00:14:36.019 ##!######################################################## > > > This looks like an issue with vdsm check-patch.sh: > > function collect_logs { > res=$? > [ "$res" -ne 0 ] && echo "*** err: $res" > cd /var/log > tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * > cd /var/host_log > tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * > } > > trap collect_logs EXIT > > Seems that tar fail to collect log if the log is modified while copied, which makes sense. > > We can ignore errors in tar, since log collection should not fail the build, but I think > a better solution is to avoid collecting any logs since vdsm writes its own logs during > tests - all the info must be in vdsm log. > > Here is the list of collected logs: > > 00:13:47.280 + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log > 00:13:47.285 btmp > 00:13:47.285 dnf.librepo.log > 00:13:47.299 dnf.log > 00:13:47.309 dnf.rpm.log > 00:13:47.310 faillog > 00:13:47.311 glusterfs/ > 00:13:47.312 hawkey.log > 00:13:47.313 journal/ > 00:13:47.313 lastlog > 00:13:47.315 libvirt/ > 00:13:47.315 libvirt/qemu/ > 00:13:47.316 openvswitch/ > 00:13:47.317 openvswitch/ovs-vswitchd.log > 00:13:47.318 openvswitch/ovsdb-server.log > 00:13:47.319 README > 00:13:47.320 tallylog > 00:13:47.321 vdsm_tests.log > 00:13:47.342 wtmp > 00:13:47.343 yum.log > 00:13:47.349 + cd /var/host_log > 00:13:47.350 + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp > 00:13:47.356 anaconda/ > 00:13:47.356 anaconda/ifcfg.log > 00:13:47.357 anaconda/ks-script-l5qnynnj.log > 00:13:47.358 anaconda/storage.log > 00:13:47.359 anaconda/program.log > 00:13:47.395 anaconda/ks-script-b5_08tmo.log > 00:13:47.396 anaconda/ks-script-6uks8bp3.log > 00:13:47.397 anaconda/hawkey.log > 00:13:47.398 anaconda/syslog > 00:13:47.406 anaconda/journal.log > 00:13:47.449 anaconda/dnf.librepo.log > 00:13:47.458 anaconda/packaging.log > 00:13:47.465 anaconda/dbus.log > 00:13:47.466 anaconda/anaconda.log > 00:13:47.467 anaconda/ks-script-slrcz39_.log > 00:13:47.503 audit/ > 00:13:47.504 audit/audit.log.3 > 00:13:47.657 audit/audit.log.2 > 00:13:47.814 audit/audit.log.1 > 00:13:47.981 audit/audit.log > 00:13:48.008 audit/audit.log.4 > 00:13:48.155 boot.log > 00:13:48.156 btmp > 00:13:48.157 chrony/ > 00:13:48.159 cloud-init.log > 00:13:48.159 cloud-init-output.log > 00:13:48.161 cron > 00:13:48.162 dnf.librepo.log > 00:13:49.930 dnf.log > 00:13:51.335 dnf.rpm.log > 00:13:51.421 firewalld > 00:13:51.423 glusterfs/ > 00:13:51.424 hawkey.log > 00:13:51.704 journal/ > 00:13:51.708 journal/b087148aba6d49b9bbef488e52a48752/ > 00:13:51.709 journal/b087148aba6d49b9bbef488e52a48752/system.journal > 00:13:55.817 tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it > 00:13:55.819 journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal > 00:13:55.915 lastlog > 00:13:55.923 libvirt/ > 00:13:55.924 libvirt/lxc/ > 00:13:55.926 libvirt/libxl/ > 00:13:55.927 libvirt/qemu/ > 00:13:55.928 libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log > 00:13:55.929 libvirt/uml/ > 00:13:55.930 ovirt-guest-agent/ > 00:13:55.930 ovirt-guest-agent/ovirt-guest-agent.log > 00:13:55.932 README > 00:13:55.933 samba/ > 00:13:55.933 samba/old/ > 00:13:55.935 sssd/ > 00:13:55.935 tallylog > 00:13:55.935 wtmp > > > Most if not all are lot relevant to vdsm tests, and should not be collected. > > This was added in: > > commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 > Author: Edward Haas <edwardh@redhat.com> > Date: Tue Apr 17 10:53:11 2018 +0300 > > automation: Collect logs even when check-patch fails > > Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a > Signed-off-by: Edward Haas <edwardh@redhat.com> > > Reviewed in: > https://gerrit.ovirt.org/c/90370 > > Edward, can you explain why do we need to collect logs during check-patch, > and why do we need to collect all the logs in the system?
check-patch are running unit and integrations tests. The integration tests are touching the OS and other packages (like openvswitch). It was added so we can debug why tests failed.
I guess we can now separate the unit and integration tests, but it will not solve the problem presented here. Failing to collect the logs silently sounds a good enough solution to me.
Barak suggested to just exclude the journal: https://gerrit.ovirt.org/#/c/95850/
This is fixed now, thanks!
Not fixed yet, we still fail collecting /var/host_log:
+ cd /var/host_log
+ tar --exclude 'journal/*' -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm tallylog vdsm_tests.log wtmp yum.log + cd /var/host_log + tar -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log boot.log-20181210 boot.log-20181211 boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126 spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log yum.log-20170101 yum.log-20180101 tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file changed as we read it
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_sta...
Copying the binary logs from journal is not the way to copy logs.
If we need the host journal, it should be collected by CI infra using journalctl.
This should fix the issue: https://gerrit.ovirt.org/c/96244/
This was not anough, it fails again: + tar --exclude 'journal/*' -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 boot.log-20181217 boot.log-20181218 boot.log-20181219 btmp btmp-20181201 cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126 spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log yum.log-20170101 yum.log-20180101 tar: secure: file changed as we read it We can exclude also "secure", but this is not the right way. Can we make a list of logs we are interested in, instead of collecting everything? Then we can setup something more reliable to do the collection. Nir
Nir
Any reason why we exclude journal only for /var/log, and do collect the binary journal from /var/host_log? I guess it can fail in the same way.
As far as I know, /var/host_log is collected by the CI and placed (copy) there. It does not collect everything, therefore we added the second one.
If we collect logs for integration tests, the most important log is the journal, and now we skip it.
I think it appears in /var/host_log.
We can get the journal in a reliable way like this:
journalctl --since build-date > /tmp/journal.log
Nir

On Wed, Dec 19, 2018 at 7:07 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:40 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <edwardh@redhat.com> wrote:
On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <edwardh@redhat.com>
wrote:
On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <edwardh@redhat.com>
wrote:
> > > > On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <nsoffer@redhat.com> wrote: >> >> We have this failure that pops randomly: >> >> 1. All tests pass >> >> 00:13:13.284 ___________________________________ summary
>> 00:13:13.285 tests: commands succeeded >> 00:13:13.286 storage-py27: commands succeeded >> 00:13:13.286 storage-py36: commands succeeded >> 00:13:13.286 lib-py27: commands succeeded >> 00:13:13.287 lib-py36: commands succeeded >> 00:13:13.288 network-py27: commands succeeded >> 00:13:13.290 network-py36: commands succeeded >> 00:13:13.291 virt-py27: commands succeeded >> 00:13:13.292 virt-py36: commands succeeded >> 00:13:13.293 congratulations :) >> >> >> 2. But we fail to collect logs at the end >> >> 00:14:35.992 ########################################################## >> 00:14:35.995 ## Wed Nov 28 17:39:50 UTC 2018 Finished env: fc28:fedora-28-x86_64 >> 00:14:35.996 ## took 764 seconds >> 00:14:35.997 ## rc = 1 >> 00:14:35.997 ########################################################## >> 00:14:36.009 ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv >> 00:14:36.010 ##! Last 20 log entries: /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log >> 00:14:36.011 ##! >> 00:14:36.012 journal/b087148aba6d49b9bbef488e52a48752/system.journal >> 00:14:36.013 tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it >> 00:14:36.014 journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal >> 00:14:36.015 lastlog >> 00:14:36.015 libvirt/ >> 00:14:36.015 libvirt/lxc/ >> 00:14:36.015 libvirt/libxl/ >> 00:14:36.016 libvirt/qemu/ >> 00:14:36.016 libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log >> 00:14:36.017 libvirt/uml/ >> 00:14:36.017 ovirt-guest-agent/ >> 00:14:36.017 ovirt-guest-agent/ovirt-guest-agent.log >> 00:14:36.017 README >> 00:14:36.018 samba/ >> 00:14:36.018 samba/old/ >> 00:14:36.018 sssd/ >> 00:14:36.018 tallylog >> 00:14:36.018 wtmp >> 00:14:36.018 Took 678 seconds >> 00:14:36.018 =================================== >> 00:14:36.019 ##! >> 00:14:36.019 ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >> 00:14:36.019 ##!######################################################## >> >> >> This looks like an issue with vdsm check-patch.sh: >> >> function collect_logs { >> res=$? >> [ "$res" -ne 0 ] && echo "*** err: $res" >> cd /var/log >> tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * >> cd /var/host_log >> tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * >> } >> >> trap collect_logs EXIT >> >> Seems that tar fail to collect log if the log is modified while copied, which makes sense. >> >> We can ignore errors in tar, since log collection should not fail the build, but I think >> a better solution is to avoid collecting any logs since vdsm writes its own logs during >> tests - all the info must be in vdsm log. >> >> Here is the list of collected logs: >> >> 00:13:47.280 + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp yum.log >> 00:13:47.285 btmp >> 00:13:47.285 dnf.librepo.log >> 00:13:47.299 dnf.log >> 00:13:47.309 dnf.rpm.log >> 00:13:47.310 faillog >> 00:13:47.311 glusterfs/ >> 00:13:47.312 hawkey.log >> 00:13:47.313 journal/ >> 00:13:47.313 lastlog >> 00:13:47.315 libvirt/ >> 00:13:47.315 libvirt/qemu/ >> 00:13:47.316 openvswitch/ >> 00:13:47.317 openvswitch/ovs-vswitchd.log >> 00:13:47.318 openvswitch/ovsdb-server.log >> 00:13:47.319 README >> 00:13:47.320 tallylog >> 00:13:47.321 vdsm_tests.log >> 00:13:47.342 wtmp >> 00:13:47.343 yum.log >> 00:13:47.349 + cd /var/host_log >> 00:13:47.350 + tar -cvzf /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp >> 00:13:47.356 anaconda/ >> 00:13:47.356 anaconda/ifcfg.log >> 00:13:47.357 anaconda/ks-script-l5qnynnj.log >> 00:13:47.358 anaconda/storage.log >> 00:13:47.359 anaconda/program.log >> 00:13:47.395 anaconda/ks-script-b5_08tmo.log >> 00:13:47.396 anaconda/ks-script-6uks8bp3.log >> 00:13:47.397 anaconda/hawkey.log >> 00:13:47.398 anaconda/syslog >> 00:13:47.406 anaconda/journal.log >> 00:13:47.449 anaconda/dnf.librepo.log >> 00:13:47.458 anaconda/packaging.log >> 00:13:47.465 anaconda/dbus.log >> 00:13:47.466 anaconda/anaconda.log >> 00:13:47.467 anaconda/ks-script-slrcz39_.log >> 00:13:47.503 audit/ >> 00:13:47.504 audit/audit.log.3 >> 00:13:47.657 audit/audit.log.2 >> 00:13:47.814 audit/audit.log.1 >> 00:13:47.981 audit/audit.log >> 00:13:48.008 audit/audit.log.4 >> 00:13:48.155 boot.log >> 00:13:48.156 btmp >> 00:13:48.157 chrony/ >> 00:13:48.159 cloud-init.log >> 00:13:48.159 cloud-init-output.log >> 00:13:48.161 cron >> 00:13:48.162 dnf.librepo.log >> 00:13:49.930 dnf.log >> 00:13:51.335 dnf.rpm.log >> 00:13:51.421 firewalld >> 00:13:51.423 glusterfs/ >> 00:13:51.424 hawkey.log >> 00:13:51.704 journal/ >> 00:13:51.708 journal/b087148aba6d49b9bbef488e52a48752/ >> 00:13:51.709 journal/b087148aba6d49b9bbef488e52a48752/system.journal >> 00:13:55.817 tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it >> 00:13:55.819 journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal >> 00:13:55.915 lastlog >> 00:13:55.923 libvirt/ >> 00:13:55.924 libvirt/lxc/ >> 00:13:55.926 libvirt/libxl/ >> 00:13:55.927 libvirt/qemu/ >> 00:13:55.928 libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log >> 00:13:55.929 libvirt/uml/ >> 00:13:55.930 ovirt-guest-agent/ >> 00:13:55.930 ovirt-guest-agent/ovirt-guest-agent.log >> 00:13:55.932 README >> 00:13:55.933 samba/ >> 00:13:55.933 samba/old/ >> 00:13:55.935 sssd/ >> 00:13:55.935 tallylog >> 00:13:55.935 wtmp >> >> >> Most if not all are lot relevant to vdsm tests, and should not be
collected. >>>>>>> >>>>>>> This was added in: >>>>>>> >>>>>>> commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 >>>>>>> Author: Edward Haas <edwardh@redhat.com> >>>>>>> Date: Tue Apr 17 10:53:11 2018 +0300 >>>>>>> >>>>>>> automation: Collect logs even when check-patch fails >>>>>>> >>>>>>> Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a >>>>>>> Signed-off-by: Edward Haas <edwardh@redhat.com> >>>>>>> >>>>>>> Reviewed in: >>>>>>> https://gerrit.ovirt.org/c/90370 >>>>>>> >>>>>>> Edward, can you explain why do we need to collect logs during check-patch, >>>>>>> and why do we need to collect all the logs in the system? >>>>>> >>>>>> >>>>>> check-patch are running unit and integrations tests. >>>>>> The integration tests are touching the OS and other packages (like openvswitch). >>>>>> It was added so we can debug why tests failed. >>>>>> >>>>>> I guess we can now separate the unit and integration tests, but it will not solve >>>>>> the problem presented here. >>>>>> Failing to collect the logs silently sounds a good enough solution to me. >>>>> >>>>> >>>>> Barak suggested to just exclude the journal: https://gerrit.ovirt.org/#/c/95850/ >>>> >>>> >>>> This is fixed now, thanks! >> >> >> Not fixed yet, we still fail collecting /var/host_log: >> >> + cd /var/host_log >> >> + tar --exclude 'journal/*' -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm tallylog vdsm_tests.log wtmp yum.log >> + cd /var/host_log >> + tar -czf /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log boot.log-20181210 boot.log-20181211 boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126 spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log yum.log-20170101 yum.log-20180101 >> tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file changed as we read it >> >> https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_standard-check-patch/runs/825/nodes/119/steps/209/log/?start=0 >> >> Copying the binary logs from journal is not the way to copy logs. >> >> If we need the host journal, it should be collected by CI infra using journalctl. > > > This should fix the issue: > https://gerrit.ovirt.org/c/96244/
This was not anough, it fails again:
+ tar --exclude 'journal/*' -czf
/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz anaconda audit boot.log boot.log-20181213 boot.log-20181214 boot.log-20181215 boot.log-20181216 boot.log-20181217 boot.log-20181218 boot.log-20181219 btmp btmp-20181201 cron cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure spooler spooler-20181126 spooler-20181202 spooler-20181210 spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log yum.log-20170101 yum.log-20180101
tar: secure: file changed as we read it
We can exclude also "secure", but this is not the right way.
Can we make a list of logs we are interested in, instead of collecting everything?
Then we can setup something more reliable to do the collection.
This should finally fix the issue: https://gerrit.ovirt.org/c/96412/
Nir
Nir
Any reason why we exclude journal only for /var/log, and do collect
from /var/host_log? I guess it can fail in the same way.
As far as I know, /var/host_log is collected by the CI and placed (copy) there. It does not collect everything, therefore we added the second one.
If we collect logs for integration tests, the most important log is
the binary journal the journal, and now
we skip it.
I think it appears in /var/host_log.
We can get the journal in a reliable way like this:
journalctl --since build-date > /tmp/journal.log
Nir
participants (3)
-
Barak Korren
-
Edward Haas
-
Nir Soffer