[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker

[ https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=35637#comment-35637 ]

Evgheni Dereveanchin edited comment on OVIRT-1840 at 1/11/18 8:54 AM:
----------------------------------------------------------------------

Looking at the slave, here's the stuck part:

    `-bash -c cd "/home/jenkins" && java -jar slave.jar
        `-java -jar slave.jar
            |-bash -ex /tmp/jenkins6289914333579712645.sh
            |   `-bash -ex /tmp/jenkins6289914333579712645.sh
            |       |-grep -oP .+?(?=:exported-artifacts)
            |       `-sudo -n docker images --format={{.Repository}}:{{.Tag}}
            |           `-docker-current images --format={{.Repository}}:{{.Tag}}
            |               `-5*[{docker-current}]

At the same time, pstree shows the following on the same node:

    -sudo -n docker images --format={{.Repository}}:{{.Tag}}
      `-docker-current images --format={{.Repository}}:{{.Tag}}
          `-5*[{docker-current}]
    -sudo systemctl start docker
      `-systemctl start docker
    -sudo -n /bin/yum install -y docker
      `-yum /bin/yum install -y docker
          `-sh /var/tmp/rpm-tmp.gO7ceb 1
              `-systemctl try-restart docker.service
    -sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
      `-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
          `-docker-current ps -aq -f status=dead
              `-6*[{docker-current}]

All of these commands are stuck, and they come from various stages of the job, even though the job itself never used docker.
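A sketch of how such hung docker clients could be spotted on a slave without reading the whole pstree output: list the processes whose command line contains a pattern (docker by default) and that have been running for at least a given number of seconds. It assumes procps-ng `ps` (for the `etimes` field) and a POSIX awk; the function name `find_stuck`, its arguments and the `FIND_STUCK_PATTERN` variable are hypothetical, not part of the oVirt CI tooling.

```shell
#!/usr/bin/env bash
# Sketch only: list long-running processes whose command line contains a
# pattern, e.g. docker client calls that never returned.
# Assumes procps-ng ps (etimes field) and POSIX awk. find_stuck and
# FIND_STUCK_PATTERN are hypothetical names used for illustration.

# find_stuck MIN_AGE [PATTERN]
# Prints "PID AGE_SECONDS COMMAND..." for every process that has been running
# for at least MIN_AGE seconds and whose command line contains PATTERN.
find_stuck() {
    local min_age="$1"
    local pattern="${2:-docker}"
    # pid=, etimes=, args= suppress the header line; etimes is the elapsed
    # time in seconds since the process started.
    # The pattern is handed to awk through the environment rather than argv,
    # so awk's own command line never matches it.
    ps -eo pid=,etimes=,args= |
        FIND_STUCK_PATTERN="$pattern" awk -v min="$min_age" '
            $2 + 0 >= min + 0 {
                cmd = $0
                # strip the leading pid and age columns, keep the full command line
                sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "", cmd)
                if (index(cmd, ENVIRON["FIND_STUCK_PATTERN"]) > 0)
                    print $1, $2, cmd
            }'
}
```

For example, `find_stuck 3600` would list docker-related processes, such as the `sudo -n docker images` and `docker-current ps` calls above, that have been running for an hour or more. Note that `ps -e` lists processes, not threads, so the `{docker-current}` threads shown by pstree are not listed separately.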
jobs freeze due to unresponsive docker
--------------------------------------
Key: OVIRT-1840
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1840
Project: oVirt - virtualization made easy
Issue Type: Task
Reporter: Evgheni Dereveanchin
Assignee: infra
I often see jobs stuck for hours at various stages, and the hangs seem to be related to docker. Example: http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-fc26-x86_64/610/console

There are multiple docker commands stuck on the slave (I will post them in the next comment), so docker seems to be deadlocked. I'm opening this ticket to investigate which step exactly causes this and how it could be resolved. The job in question doesn't even use docker, so it shouldn't be affected when this happens.
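As one possible direction for a resolution (an assumption, not something established in this ticket), every docker CLI call in the job scripts could be bounded with GNU coreutils `timeout`, so that a hung daemon makes the step give up instead of freezing the job for hours. The sketch below applies this to the dead-container cleanup seen in the pstree output; `run_bounded`, `cleanup_dead_containers` and the 30-second default are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: bound docker CLI calls with GNU coreutils `timeout` so that a
# hung daemon makes the step fail fast instead of freezing the job.
# run_bounded, cleanup_dead_containers and the 30 s default are hypothetical.

# run_bounded LIMIT CMD [ARGS...]
# Runs CMD with a LIMIT-second deadline. On expiry `timeout` sends SIGTERM,
# then SIGKILL 5 s later if the command is still alive. Exit status 124 means
# the deadline was hit (137 if SIGKILL was needed); 127 means CMD not found.
run_bounded() {
    local limit="$1"
    shift
    timeout --kill-after=5 "$limit" "$@"
}

# Bounded variant of the job's dead-container cleanup:
#   DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
# Like the original, it always returns 0 so it never fails the job itself.
cleanup_dead_containers() {
    local limit="${1:-30}"
    local dead rc=0
    dead="$(run_bounded "$limit" docker ps -aq -f status=dead)" || rc=$?
    if [ "$rc" -ne 0 ]; then
        if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
            echo "docker ps timed out after ${limit}s; skipping cleanup" >&2
        else
            echo "docker ps failed (exit $rc); skipping cleanup" >&2
        fi
        return 0
    fi
    if [ -n "$dead" ]; then
        # $dead is intentionally unquoted: one container ID per word
        run_bounded "$limit" docker rm $dead ||
            echo "docker rm failed or timed out" >&2
    fi
    return 0
}
```

This only keeps the job moving; the hung daemon itself would still need to be investigated, and the jobs that call docker through `sudo -n` would pass that prefix inside the bounded command.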
-- This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100075)
participants (1)
-
Evgheni Dereveanchin (oVirt JIRA)