
[ https://ovirt-jira.atlassian.net/browse/OVIRT-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=39797#comment-39797 ]

Barak Korren commented on OVIRT-2794:
-------------------------------------

This was a bit puzzling. We've seen issues between {{docker_cleanup.py}} and Docker appear sporadically in the past, and have therefore made the job code generally not fail when {{docker_cleanup.py}} fails, and instead send an email to the infra list. It turns out that was only true for the V2 code; for the V1 code (which is still used in the manual job and the nightly jobs) those failures could still arise.

We did verify that {{docker_cleanup.py}} works on CentOS 7 with the Python 3 Docker API client before merging the patch, so it's strange we did not see the issue then.

[~accountid:557058:5ca52a09-2675-4285-a044-12ad20f6166a] some of your statements above seem to include some wrong assumptions about how the system is built. We're not actually exposing the host's Docker daemon to the CI code; instead we have our own Docker instance running inside the container that is used to run the CI code. That way we can ensure there can be no cross-talk when running multiple CI containers on the same hosts.

[~accountid:557058:cc1e0e66-9881-45e2-b0b7-ccaa3e60f26e] as far as using Podman, I think doing that at this point will be quite a challenge for a number of reasons:
# We're currently using OpenShift 3.7 to manage our containers. This implies that we must run Docker on our hosts, since AFAIK OpenShift only started supporting CRI-O in 4.0 or 4.1.
# To allow CI scripts and test suites to use Docker we run nested Docker instances inside the CI containers. We know that Docker in Docker works well for our use cases. Running Podman in Docker will probably be more challenging.
# Since we're still using {{mock}} to encapsulate the CI script inside the CI container, we're bind-mounting the Docker socket from the container into mock. We know there are issues when running Podman in mock, so solving those will take some work.
# People who write CI scripts and suites tend to expect things to "just work" in CI like they do on their laptops, and hence tend to use Docker commands. Removing Docker will force everyone to learn Podman, and we'll need to make changes everywhere.

Our current suspicion is that this issue may have to do with the particular version of Docker that is installed inside the CI container. While our {{global_setup.sh}} script generally keeps Docker up to date on the CI slaves, we've intentionally skipped that update code when running in a container. I suspect that the version of Docker in the CI containers is older than the one running on the CI slaves. That would explain why we did not see this issue when working on the {{docker_cleanup.py}} patch, since that was tested on the normal slaves and not the containers.

Here is what I think we should do now:
# Verify again that {{docker_cleanup.py}} works well on CentOS with the Python 3 Docker client API.
# If so, inspect the version of Docker we have in the containers, and finally
# Build an updated container image with a newer version of Docker as needed.

Note that updating the container image will require us to test it thoroughly and ensure it can properly run both OST and {{kubevirt-ci}}.

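As an illustration of step 2 above (inspecting the Docker version in the containers), here is a minimal sketch. It assumes the Docker SDK for Python, whose {{DockerClient.version()}} call returns a dict with a {{Version}} key. The helper names {{daemon_version}}, {{parse_version}} and {{is_older}} are hypothetical and not part of any existing script, and the version strings in the usage note are made-up examples, not values observed on our systems.

```python
def daemon_version(client):
    """Return the daemon's version string as reported by the Docker SDK.

    `client` is expected to be a docker.DockerClient, for example the
    object returned by docker.from_env().
    """
    return client.version()["Version"]


def parse_version(text):
    """Turn a string such as '18.09.7' or '1.13.1-el7' into a tuple of ints.

    Only the leading digits of each dot-separated piece are used; parsing
    stops at the first piece that does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(container_version, slave_version):
    """True if the container's Docker is older than the one on the slave."""
    return parse_version(container_version) < parse_version(slave_version)
```

Running {{daemon_version(docker.from_env())}} once inside a CI container and once on a normal slave, then passing both strings to {{is_older}}, would confirm or rule out the suspicion above.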
OST is broken since this morning - looks like infra issue
---------------------------------------------------------

Key: OVIRT-2794
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2794
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Reporter: Nir Soffer
Assignee: infra

The last successful build was today at 08:10. Since then all builds fail very early with the error below - which is not related to oVirt.

{code}
Removing image: sha256:f8e5aa8e979155e074411bfef9adade6cdcdf3a5a2eb1d5ad2dbf0288d585ffa, force=True
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/docker/api/client.py", line 222, in _raise_for_status
    response.raise_for_status()
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 893, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localunixsocket/v1.30/images/sha256:f8e5aa8e979155e074411bfef9adade6cdcdf3a5a2eb1d5ad2dbf0288d585ffa?force=True&noprune=False

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jenkins/workspace/ovirt-system-tests_manual/jenkins/scripts/docker_cleanup.py", line 349, in <module>
    main()
  File "/home/jenkins/workspace/ovirt-system-tests_manual/jenkins/scripts/docker_cleanup.py", line 37, in main
    safe_image_cleanup(client, whitelisted_repos)
  File "/home/jenkins/workspace/ovirt-system-tests_manual/jenkins/scripts/docker_cleanup.py", line 107, in safe_image_cleanup
    _safe_rm(client, parent)
  File "/home/jenkins/workspace/ovirt-system-tests_manual/jenkins/scripts/docker_cleanup.py", line 329, in _safe_rm
    client.images.remove(image_id, force=force)
  File "/usr/lib/python3.6/site-packages/docker/models/images.py", line 288, in remove
    self.client.api.remove_image(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped
    return f(self, resource_id, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/docker/api/image.py", line 481, in remove_image
    return self._result(res, True)
  File "/usr/lib/python3.6/site-packages/docker/api/client.py", line 228, in _result
    self._raise_for_status(response)
  File "/usr/lib/python3.6/site-packages/docker/api/client.py", line 224, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/usr/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.NotFound: 404 Client Error: Not Found ("reference does not exist")
Aborting.
Build step 'Execute shell' marked build as failure
{code}

Failed #5542 Sep 5, 2019 3:02 PM <https://jenkins.ovirt.org/job/ovirt-system-tests_manual/5542/>
Failed #5541 Sep 5, 2019 3:02 PM <https://jenkins.ovirt.org/job/ovirt-system-tests_manual/5541/>
Failed #5540 Sep 5, 2019 3:01 PM <https://jenkins.ovirt.org/job/ovirt-system-tests_manual/5540/>
Failed #5539 Sep 5, 2019 2:13 PM <https://jenkins.ovirt.org/job/ovirt-system-tests_manual/5539/>
Failed #5538 Sep 5, 2019 1:58 PM <https://jenkins.ovirt.org/job/ovirt-system-tests_manual/5538/>
Failed #5537 Sep 5, 2019 1:50 PM <https://jenkins.ovirt.org/job/ovirt-system-tests_manual/5537/>
Failed #5536 Sep 5, 2019 10:21 AM <https://jenkins.ovirt.org/job/ovirt-system-tests_manual/5536/>
Job config change: <http://jenkins.ovirt.org/job/ovirt-system-tests_manual/jobConfigHistory/showDiffFiles?timestamp1=2019-08-27_12-38-35&timestamp2=2019-09-05_08-22-23>
Success #5535 Sep 5, 2019 8:10 AM <https://jenkins.ovirt.org/job/ovirt-system-tests_manual/5535/>

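The traceback above ends in {{docker.errors.NotFound}} raised from {{client.images.remove(image_id, force=force)}}. One possible way to keep such a failure from aborting the job is to treat an image that is already gone as a non-error. The following is only a sketch under that assumption, not the actual {{docker_cleanup.py}} code; the function name {{remove_image_tolerant}} is hypothetical, and the fallback exception class exists only so the snippet runs where the Docker SDK is not installed.

```python
try:
    # Real exception class from the Docker SDK for Python.
    from docker.errors import NotFound
except ImportError:
    # Stand-in (assumption) so the sketch can run without the SDK installed.
    class NotFound(Exception):
        pass


def remove_image_tolerant(client, image_id, force=True):
    """Remove an image, treating "already gone" as a non-fatal outcome.

    Returns True if the daemon removed the image, and False if it
    answered 404 (NotFound), for example because the image was already
    deleted together with another image earlier in the cleanup.
    Any other error is still propagated to the caller.
    """
    try:
        client.images.remove(image_id, force=force)
    except NotFound as error:
        print("Image %s not found, skipping: %s" % (image_id, error))
        return False
    return True
```

Whether skipping is the right behaviour depends on why the daemon reports the reference as missing, which is still under investigation in this ticket.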
-- This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100109)