[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker
by Evgheni Dereveanchin (oVirt JIRA)
This is a multi-part message in MIME format...
------------=_1515660696-28855-361
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jir... ]
Evgheni Dereveanchin edited comment on OVIRT-1840 at 1/11/18 8:51 AM:
----------------------------------------------------------------------
Looking at the slave, here's the stuck part:
`-bash -c cd "/home/jenkins" && java -jar slave.jar
`-java -jar slave.jar
|-bash -ex /tmp/jenkins6289914333579712645.sh
| `-bash -ex /tmp/jenkins6289914333579712645.sh
| |-grep -oP .+?(?=:exported-artifacts)
| `-sudo -n docker images --format={{.Repository}}:{{.Tag}}
| `-docker-current images --format={{.Repository}}:{{.Tag}}
| `-5*[{docker-current}]
At the same time, I see the following in pstree output of the same node:
-sudo -n docker images --format={{.Repository}}:{{.Tag}}
`-docker-current images --format={{.Repository}}:{{.Tag}}
`-5*[{docker-current}]
-sudo systemctl start docker
`-systemctl start docker
-sudo -n /bin/yum install -y docker
`-yum /bin/yum install -y docker
`-sh /var/tmp/rpm-tmp.gO7ceb 1
`-systemctl try-restart docker.service
-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
`-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
`-docker-current ps -aq -f status=dead
`-6*[{docker-current}]
All of these commands, launched at various stages of the job, are stuck, even though the job itself never used docker.
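One possible mitigation, offered as an untested sketch rather than a confirmed fix: wrap the docker calls made by job scripts in `timeout`, so that an unresponsive daemon fails the step instead of freezing the job. This assumes GNU coreutils `timeout` is installed on the slave; the `run_bounded` helper name and the 60-second limit are hypothetical. It also would not help if the client is blocked in uninterruptible I/O, since such a process cannot be killed.

```shell
# Hypothetical helper: run a command with a time limit.
# Usage: run_bounded SECONDS COMMAND [ARGS...]
# Assumes GNU coreutils `timeout` (not something confirmed for these slaves).
run_bounded() {
    rb_limit=$1
    shift
    rb_rc=0
    # Send TERM after the limit, then KILL 10 seconds later if still alive.
    timeout --kill-after=10 "$rb_limit" "$@" || rb_rc=$?
    # timeout exits with 124 on expiry, or 137 (128+9) if KILL was needed.
    if [ "$rb_rc" -eq 124 ] || [ "$rb_rc" -eq 137 ]; then
        echo "timed out after ${rb_limit}s: $*" >&2
    fi
    return "$rb_rc"
}

# The cleanup step seen in the pstree output could then be bounded like this:
#   DEAD=$(run_bounded 60 docker ps -aq -f status=dead) \
#       && [ -n "$DEAD" ] && run_bounded 60 docker rm $DEAD
#   exit 0
```

With this wrapper a hung `docker ps` would return a non-zero status after the limit, and the existing `&&` chain would skip the `docker rm` step rather than wait on it.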
> jobs freeze due to unresponsive docker
> --------------------------------------
>
> Key: OVIRT-1840
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1840
> Project: oVirt - virtualization made easy
> Issue Type: Task
> Reporter: Evgheni Dereveanchin
> Assignee: infra
>
> Quite often I see jobs stuck at various stages for hours, and the hangs seem related to docker.
> Example:
> http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-fc26-x86...
> There are multiple docker commands stuck on the slave (will post in the next comment), so it seems to be deadlocked. Opening this ticket to investigate which step exactly is causing this and possible ways of resolving it. The job in question doesn't even use docker, so it shouldn't suffer if this happens.
--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100075)
------------=_1515660696-28855-361--