[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker

Evgheni Dereveanchin (oVirt JIRA) jira at ovirt-jira.atlassian.net
Thu Jan 11 09:19:00 UTC 2018


    [ https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=35638#comment-35638 ] 

Evgheni Dereveanchin commented on OVIRT-1840:
---------------------------------------------

Looking at the timestamps of when docker was installed, it seems that the slave broke during this build:
http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/3039/console

07:49:56 Running transaction
07:49:57   Updating   : container-storage-setup-0.8.0-3.git1d27ecf.el7.noarch       1/12 
07:49:57   Updating   : 2:oci-umount-2.3.0-1.git51e7c50.el7.x86_64                  2/12 
07:50:35   Updating   : 2:container-selinux-2.33-1.git86f33cd.el7.noarch            3/12 
07:50:36   Updating   : 2:docker-common-1.12.6-68.gitec8512b.el7.centos.x86_64      4/12 
07:50:38   Updating   : 2:docker-client-1.12.6-68.gitec8512b.el7.centos.x86_64      5/12 
07:50:39   Updating   : 2:docker-1.12.6-68.gitec8512b.el7.centos.x86_64             6/12 
13:49:32   Cleanup    : 2:docker-1.12.6-48.git0fdc778.el7.centos.x86_64             7/12
Build timed out (after 360 minutes). Marking the build as failed.

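The yum transaction stalled for about six hours right after updating the docker package, until the build timed out. This explains the leftover yum processes on the system, which can block further yum installs or, in theory, even corrupt the RPMDB if docker unfreezes for some reason.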
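
For reference, the stall in the console output above can be located mechanically by looking for the largest gap between consecutive timestamped lines. The snippet below is only a minimal sketch: the helper name `largest_gap` and the assumption that every relevant line starts with an `HH:MM:SS` timestamp are mine, not part of the Jenkins job or any existing tooling.

```python
from datetime import datetime, timedelta

# Excerpt of the console log quoted above (package columns shortened).
LOG = """\
07:49:56 Running transaction
07:49:57   Updating   : container-storage-setup-0.8.0-3.git1d27ecf.el7.noarch  1/12
07:50:35   Updating   : 2:container-selinux-2.33-1.git86f33cd.el7.noarch       3/12
07:50:39   Updating   : 2:docker-1.12.6-68.gitec8512b.el7.centos.x86_64        6/12
13:49:32   Cleanup    : 2:docker-1.12.6-48.git0fdc778.el7.centos.x86_64        7/12
Build timed out (after 360 minutes). Marking the build as failed.
"""


def largest_gap(log_text):
    """Return (gap, line_before, line_after) for the longest pause between
    consecutive timestamped lines, or None if fewer than two lines have a
    timestamp. Hypothetical helper, assumes an HH:MM:SS prefix per line."""
    stamped = []
    for line in log_text.splitlines():
        try:
            ts = datetime.strptime(line[:8], "%H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        stamped.append((ts, line))

    best = None
    for (t1, l1), (t2, l2) in zip(stamped, stamped[1:]):
        gap = t2 - t1
        if gap < timedelta(0):
            gap += timedelta(days=1)  # assumption: the log crossed midnight once
        if best is None or gap > best[0]:
            best = (gap, l1, l2)
    return best


if __name__ == "__main__":
    result = largest_gap(LOG)
    if result is not None:
        gap, before, after = result
        print(f"Longest pause: {gap}")
        print(f"  last line before: {before.strip()}")
        print(f"  first line after: {after.strip()}")
```

Run against the excerpt, this points at the pause between the docker update and the cleanup of the old docker package, which is where the slave appears to have hung.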

> jobs freeze due to unresponsive docker
> --------------------------------------
>
>                 Key: OVIRT-1840
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1840
>             Project: oVirt - virtualization made easy
>          Issue Type: Task
>            Reporter: Evgheni Dereveanchin
>            Assignee: infra
>
> Quite often I see jobs stuck at various stages for hours, and the hangs seem related to docker.
> Example:
> http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-fc26-x86_64/610/console
> There are multiple docker commands stuck on the slave (will post in the next comment), so it seems to be deadlocked. Opening this ticket to investigate exactly which step is causing this and possible ways of resolving it. The job in question doesn't even use docker, so it shouldn't suffer if this happens.


