[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker

Evgheni Dereveanchin (oVirt JIRA) jira at ovirt-jira.atlassian.net
Thu Jan 11 08:54:15 UTC 2018


    [ https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=35637#comment-35637 ] 

Evgheni Dereveanchin edited comment on OVIRT-1840 at 1/11/18 8:54 AM:
----------------------------------------------------------------------

Looking at the slave, here's the stuck part:
{{       `-bash -c cd "/home/jenkins" && java  -jar slave.jar}}
{{           `-java -jar slave.jar}}
{{               |-bash -ex /tmp/jenkins6289914333579712645.sh}}
{{               |   `-bash -ex /tmp/jenkins6289914333579712645.sh}}
{{               |       |-grep -oP .+?(?=:exported-artifacts)}}
{{               |       `-sudo -n docker images --format={{.Repository}}:{{.Tag}} }}
{{               |           `-docker-current images --format={{.Repository}}:{{.Tag}} }}
{{               |               `-5*[{docker-current}]}}


At the same time, I see the following in pstree output of the same node:
{{-sudo -n docker images --format={{.Repository}}:{{.Tag}} }}
{{   `-docker-current images --format={{.Repository}}:{{.Tag}} }}
{{       `-5*[{docker-current}]}}
{{-sudo systemctl start docker}}
{{   `-systemctl start docker}}
{{-sudo -n /bin/yum install -y docker}}
{{   `-yum /bin/yum install -y docker}}
{{       `-sh /var/tmp/rpm-tmp.gO7ceb 1}}
{{           `-systemctl try-restart docker.service}}
{{-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0}}
{{   `-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0}}
{{       `-docker-current ps -aq -f status=dead}}
{{           `-6*[{docker-current}]}}

All of these commands are stuck, and they come from various stages of the job, even though the job itself never used docker.
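One possible mitigation, sketched here as an assumption and not as what the job scripts currently do, is to put a time limit on every docker client call so that an unresponsive daemon fails the step instead of freezing the job for hours. The helper name `run_with_timeout` and the timeout values are hypothetical. The sketch does not cover commands run through sudo, which the calling user may not be allowed to kill.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (exit_code, stdout).

    Returns (None, "") if the command did not finish within timeout_s seconds.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising the timeout,
        # so the hung client is not left behind in the process tree.
        return None, ""
    return done.returncode, done.stdout


# Harmless stand-in command; sys.executable is the running Python interpreter.
code, out = run_with_timeout([sys.executable, "-c", "print('alive')"], 10)
print(code, out.strip())
```

For the dead-container cleanup seen in the tree above, the call would look like `run_with_timeout(["docker", "ps", "-aq", "-f", "status=dead"], 30)`, and a `None` exit code would mean the daemon did not answer in time.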


> jobs freeze due to unresponsive docker
> --------------------------------------
>
>                 Key: OVIRT-1840
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1840
>             Project: oVirt - virtualization made easy
>          Issue Type: Task
>            Reporter: Evgheni Dereveanchin
>            Assignee: infra
>
> Quite often I see jobs stuck at various stages for hours, and the hangs seem related to docker.
> Example:
> http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-fc26-x86_64/610/console
> There are multiple docker commands stuck on the slave (will post in the next comment), so docker seems to be deadlocked. Opening this ticket to investigate exactly which step is causing this and possible ways of resolving it. The job in question doesn't even use docker, so it shouldn't suffer when this happens.
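To help narrow down which step hangs first, one option is to list docker-related processes on the slave by age. The sketch below parses output in the shape produced by `ps -eo pid,etimes,args` (procps-ng, where `etimes` is elapsed time in seconds). The function name `find_stale`, the one-hour threshold, and all PIDs and ages in the sample are hypothetical. Only the command lines are taken from the pstree output in the comment.

```python
# Hypothetical sample in the shape of `ps -eo pid,etimes,args` output.
SAMPLE = """\
  PID ELAPSED COMMAND
 1201    7300 docker-current images --format={{.Repository}}:{{.Tag}}
 1345    7100 systemctl start docker
 1502      12 grep -oP .+?(?=:exported-artifacts)
 1610    6900 docker-current ps -aq -f status=dead
"""


def find_stale(ps_output, needle, min_age_s):
    """Return (pid, age_seconds, command) for matching processes older than min_age_s."""
    stale = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, age, args = parts
        # The header row and any malformed rows fail this check and are skipped.
        if not (pid.isdigit() and age.isdigit()):
            continue
        if needle in args and int(age) >= min_age_s:
            stale.append((int(pid), int(age), args))
    return stale


for pid, age, args in find_stale(SAMPLE, "docker", 3600):
    print(pid, age, args)
```

Sorting the result by age, oldest first, would point at the command that got stuck earliest, which is a reasonable starting point for the investigation.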



--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100075)