[OVIRT CI] Tests succeeded, build failed

Hi all,

While looking into another, unrelated vdsm test failure, I found this:

18:36:52 ##########################################################
18:36:52 ## Tue Dec 13 18:36:52 UTC 2016 Finished env: fc24:fedora-24-x86_64
18:36:52 ## took 417 seconds
18:36:52 ## rc = 0
18:36:52 ##########################################################
18:36:52 ## FINISHED SUCCESSFULLY
18:36:52 ##########################################################
18:36:52 ##########################################################

The tests succeeded, but during cleanup:

18:36:55 + sudo umount --lazy /var/lib/mock/epel-7-x86_64-95d9ead9d725499a15a9021ba2fe9831-54661/root/proc/filesystems
18:36:55 umount: /var/lib/mock/epel-7-x86_64-95d9ead9d725499a15a9021ba2fe9831-54661/root/proc/filesystems: mountpoint not found
18:36:55 + echo 'ERROR: Failed to umount /var/lib/mock/epel-7-x86_64-95d9ead9d725499a15a9021ba2fe9831-54661/root/proc/filesystems.'
18:36:55 ERROR: Failed to umount /var/lib/mock/epel-7-x86_64-95d9ead9d725499a15a9021ba2fe9831-54661/root/proc/filesystems.
18:36:55 + failed=true
18:36:55 + this_chroot_failed=true
18:36:55 + true
18:36:55 + find /var/cache/mock/ -mindepth 1 -maxdepth 1 -type d -mtime +2 -print0
18:36:55 + xargs -0 -tr sudo rm -rf
18:36:55 ++ virsh list --all --uuid
18:36:55 + true
18:36:55 + echo 'Cleanup script failed, propegating failure to job'
18:36:55 Cleanup script failed, propegating failure to job
18:36:55 + exit 1
18:36:55 POST BUILD TASK : FAILURE
18:36:55 END OF POST BUILD TASK : 0
18:36:55 ESCALATE FAILED POST BUILD TASK TO JOB STATUS
18:36:55 Build step 'Post build task' changed build result to FAILURE
18:36:55 Archiving artifacts
18:37:00 Build step 'Groovy Postbuild' marked build as failure
18:37:00 Started calculate disk usage of build
18:37:00 Finished Calculation of disk usage of build in 0 seconds
18:37:00 Started calculate disk usage of workspace
18:37:01 Finished Calculation of disk usage of workspace in 0 seconds
18:37:01 Finished: FAILURE

The build failed because the cleanup script failed.

I have discussed this many times with David Caro, trying to convince him that there are 3 possible results for a build:

1. Tests ran and passed
2. Tests ran and failed
3. The system could not run the tests or had another error

I know that this adds 33% more work to the CI team, having to handle 3 results instead of two, but we really need this distinction.

Barak, do you think we can change the script so that setup and cleanup failures are treated as build errors rather than build failures?

In Travis, such failures seem to start another build automatically, making developers' lives much nicer.

I kept this build forever so people can inspect it:
http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc24-x86_64/6291/

Nir
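The three-result distinction proposed above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `classify_result` and the labels `passed`, `failed` and `error` are hypothetical and are not taken from the actual oVirt CI scripts, which track failure differently (see the `failed=true` variable in the log).

```shell
#!/bin/bash
# Hypothetical sketch: map per-stage exit codes to one of three build results.
# Names and labels are illustrative, not part of the real CI scripts.

# classify_result SETUP_RC TEST_RC CLEANUP_RC
#   error  - setup failed, or tests passed but cleanup failed (system problem)
#   failed - the tests ran and failed (takes precedence over a cleanup error)
#   passed - every stage returned 0
classify_result() {
    local setup_rc=$1
    local test_rc=$2
    local cleanup_rc=$3

    if [ "$setup_rc" -ne 0 ]; then
        echo "error"      # the tests could not run at all
    elif [ "$test_rc" -ne 0 ]; then
        echo "failed"     # the tests ran and failed
    elif [ "$cleanup_rc" -ne 0 ]; then
        echo "error"      # the tests passed, cleanup did not
    else
        echo "passed"
    fi
}
```

With this mapping, the build shown above (tests `rc = 0`, cleanup exiting with 1) would be reported as `error` rather than as a test failure.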

On 13 December 2016 at 22:56, Nir Soffer <nsoffer@redhat.com> wrote:
Barak, do you think we can change the script so setup and cleanup failures are not treated as build failures but build errors?
Doing that means assuming that the cleanup script, which runs flawlessly hundreds of times a day for all patches in all projects, suddenly failed for reasons that have nothing to do with the patch that was just tested. I think we err on the right side of caution now...
In Travis, such failures seem to start another build automatically, making developers' lives much nicer.
And in VMware there is no SPM, you can mix local and remote storage on the same nodes, and you can upload images to the storage domain from the GUI, making the admin's life much nicer. What is your point again?

On a more serious note, we may do auto-rerunning at some point, but we need to re-engineer most of the standard-CI system to make that happen, so it'll take a while.

--
Barak Korren
bkorren@redhat.com
RHCE, RHCi, RHV-DevOps Team
https://ifireball.wordpress.com/

I kept this build forever so people can inspect it: http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc24-x86_64/6291/
Looking closer at this, there are two things here we may be able to address:

1. The cleanup script fails when it tries to un-mount a filesystem that is already not mounted. We can probably fix that easily [1].
2. The reason the cleanup script was trying to clean up stuff in the first place was that a major mess was left on the node by the vdsm check_merged job that ran on it prior to this job [2]. The check_merged job failed in such a way that the cleanup script did not run at all. I'm still not sure what the root cause of that is, so we'll need to investigate further.

I will make a quick fix for #1 so that #2 failures do not cascade into other jobs until we can figure out why it is happening.

[1]: https://ovirt-jira.atlassian.net/browse/OVIRT-937
[2]: https://ovirt-jira.atlassian.net/browse/OVIRT-938

--
Barak Korren
bkorren@redhat.com
RHCE, RHCi, RHV-DevOps Team
https://ifireball.wordpress.com/
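For illustration, the kind of fix described in #1 (not treating an already-unmounted path as a failure) might look roughly like the sketch below. It is not the actual change tracked in OVIRT-937. The function names `is_mounted` and `safe_umount` are hypothetical, as is the optional second argument that points at an alternative mounts table for testing. Only the `sudo umount --lazy` call is taken from the log.

```shell
#!/bin/bash
# Hypothetical sketch of an idempotent un-mount helper; not the real CI code.

# is_mounted PATH [MOUNTS_FILE]
# Succeeds if PATH is listed as a mount point (second field) in MOUNTS_FILE,
# which defaults to /proc/self/mounts. Paths with spaces are stored escaped
# there (e.g. \040) and are not handled by this sketch.
is_mounted() {
    local target=$1
    local mounts_file=${2:-/proc/self/mounts}
    MOUNT_TARGET="$target" awk \
        '$2 == ENVIRON["MOUNT_TARGET"] { found = 1 } END { exit !found }' \
        "$mounts_file"
}

# safe_umount PATH [MOUNTS_FILE]
# Un-mounts PATH only if it is currently mounted; otherwise reports that
# there is nothing to do and returns success.
safe_umount() {
    local target=$1
    local mounts_file=${2:-/proc/self/mounts}

    if ! is_mounted "$target" "$mounts_file"; then
        echo "Not mounted, skipping: $target"
        return 0
    fi
    sudo umount --lazy "$target"
}
```

A cleanup loop could then call `safe_umount "$mount_path"` and set its failure flag only when that call returns non-zero, so a path that some earlier step already un-mounted no longer fails the job.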

On Wed, Dec 14, 2016 at 10:20 AM, Barak Korren <bkorren@redhat.com> wrote:
I kept this build forever so people can inspect it: http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc24-x86_64/6291/
Looking closer at this, there are two things here we may be able to address:
1. The cleanup script fails when it tries to un-mount a filesystem that is already not mounted. We can probably fix that easily [1].
2. The reason the cleanup script was trying to clean up stuff in the first place was that a major mess was left on the node by the vdsm check_merged job that ran on it prior to this job [2]. The check_merged job failed in such a way that the cleanup script did not run at all. I'm still not sure what the root cause of that is, so we'll need to investigate further.
I will make a quick fix for #1 so that #2 failures do not cascade into other jobs until we can figure out why it is happening.
[1]: https://ovirt-jira.atlassian.net/browse/OVIRT-937 [2]: https://ovirt-jira.atlassian.net/browse/OVIRT-938
Thanks Barak!

Here is another example:

17:06:48 umount: /var/lib/mock/epel-7-x86_64-95d9ead9d725499a15a9021ba2fe9831-12777/root/proc/filesystems: mountpoint not found
17:06:48 + echo 'ERROR: Failed to umount /var/lib/mock/epel-7-x86_64-95d9ead9d725499a15a9021ba2fe9831-12777/root/proc/filesystems.'
17:06:48 ERROR: Failed to umount /var/lib/mock/epel-7-x86_64-95d9ead9d725499a15a9021ba2fe9831-12777/root/proc/filesystems.
17:06:48 + failed=true

The build:
http://jenkins.ovirt.org/job/vdsm_master_check-patch-el7-x86_64/4766/

Nir
Participants (2): Barak Korren, Nir Soffer