[JIRA] (OVIRT-1842) Ensure mock is installed and up-to-date from global_setup.sh
by Barak Korren (oVirt JIRA)
Barak Korren created OVIRT-1842:
-----------------------------------
Summary: Ensure mock is installed and up-to-date from global_setup.sh
Key: OVIRT-1842
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1842
Project: oVirt - virtualization made easy
Issue Type: Outage
Components: Jenkins Slaves, mock_runner
Reporter: Barak Korren
Assignee: infra
`{{mock}}` is probably the most important package for CI at this point. This
makes `{{global_setup.sh}}` ensure it's installed and up to date.
Another benefit to this is that newer mock versions will be tested automatically when newer repo snapshots are configured for slaves.
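A minimal sketch of what such a check could look like, assuming an RPM-based slave where `rpm -q` and `yum` are available. The function name `ensure_package_current` and the `PKG_QUERY` / `PKG_MANAGER` overrides are hypothetical and only exist to make the logic easy to dry-run; the actual `global_setup.sh` may be organized quite differently.

```shell
# Hypothetical helper, not taken from the real global_setup.sh.
# PKG_QUERY and PKG_MANAGER are illustrative overrides for dry runs.
ensure_package_current() {
    local pkg="$1"
    local query="${PKG_QUERY:-rpm -q}"
    local manager="${PKG_MANAGER:-sudo -n yum}"

    if $query "$pkg" > /dev/null 2>&1; then
        # Already installed: pick up whatever the configured repos offer.
        $manager update -y "$pkg"
    else
        # Not installed yet: install the newest available version.
        $manager install -y "$pkg"
    fi
}
```

Calling `ensure_package_current mock` early in the setup script would then cover both the "missing" and the "outdated" case with one entry point.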
--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100075)
6 years, 10 months
[JIRA] (OVIRT-1841) mock now sets up chroots without networking
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1841?page=com.atlassian.jir... ]
Barak Korren reassigned OVIRT-1841:
-----------------------------------
Assignee: Barak Korren (was: infra)
> mock now sets up chroots without networking
> --------------------------------------------
>
> Key: OVIRT-1841
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1841
> Project: oVirt - virtualization made easy
> Issue Type: Outage
> Components: mock_runner
> Reporter: Barak Korren
> Assignee: Barak Korren
> Priority: Highest
>
> '{{mock}}' has been changed so that network setup is optional and turned off by default.
> We need to:
> # Make '{{mock_runner.sh}}' turn networking on by default
> # Upgrade '{{mock}}' on all slaves so that a newer version with the network setting is installed.
6 years, 10 months
[JIRA] (OVIRT-1841) mock now sets up chroots without networking
by Barak Korren (oVirt JIRA)
Barak Korren created OVIRT-1841:
-----------------------------------
Summary: mock now sets up chroots without networking
Key: OVIRT-1841
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1841
Project: oVirt - virtualization made easy
Issue Type: Outage
Components: mock_runner
Reporter: Barak Korren
Assignee: infra
Priority: Highest
'{{mock}}' has been changed so that network setup is optional and turned off by default.
We need to:
# Make '{{mock_runner.sh}}' turn networking on by default
# Upgrade '{{mock}}' on all slaves so that a newer version with the network setting is installed.
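
The first step could be sketched roughly as follows. This assumes the installed mock accepts the `--enable-network` command-line option; older releases that predate the change would reject it, so the sketch only emits the flag when the help text mentions it. The function name `mock_network_args` and the `MOCK_BIN` override are hypothetical and not part of `mock_runner.sh`.

```shell
# Hypothetical helper for a wrapper script such as mock_runner.sh.
# MOCK_BIN is an illustrative override, not a real setting of the script.
mock_network_args() {
    local mock_bin="${MOCK_BIN:-mock}"
    # Only emit the flag when the installed mock advertises it,
    # so that older versions are not handed an unknown option.
    if "$mock_bin" --help 2>&1 | grep -q -- '--enable-network'; then
        echo "--enable-network"
    fi
}
```

A wrapper could then splice the result into its existing invocation, for example `mock $(mock_network_args) ...`, leaving the remaining arguments unchanged.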
6 years, 10 months
[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker
by Evgheni Dereveanchin (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jir... ]
Evgheni Dereveanchin commented on OVIRT-1840:
---------------------------------------------
Looking at the timestamps of when docker was installed, it seems that the slave was broken during this build:
http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/3039/con...
07:49:56 Running transaction
07:49:57 Updating : container-storage-setup-0.8.0-3.git1d27ecf.el7.noarch 1/12
07:49:57 Updating : 2:oci-umount-2.3.0-1.git51e7c50.el7.x86_64 2/12
07:50:35 Updating : 2:container-selinux-2.33-1.git86f33cd.el7.noarch 3/12
07:50:36 Updating : 2:docker-common-1.12.6-68.gitec8512b.el7.centos.x86_64 4/12
07:50:38 Updating : 2:docker-client-1.12.6-68.gitec8512b.el7.centos.x86_64 5/12
07:50:39 Updating : 2:docker-1.12.6-68.gitec8512b.el7.centos.x86_64 6/12
13:49:32 Cleanup : 2:docker-1.12.6-48.git0fdc778.el7.centos.x86_64 7/12
Build timed out (after 360 minutes). Marking the build as failed.
This explains the leftover yum processes on the system, which can block further yum installs or even theoretically corrupt RPMDB if docker decides to unfreeze for some reason.
> jobs freeze due to unresponsive docker
> --------------------------------------
>
> Key: OVIRT-1840
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1840
> Project: oVirt - virtualization made easy
> Issue Type: Task
> Reporter: Evgheni Dereveanchin
> Assignee: infra
>
> Quite often I see jobs stuck at various stages for hours, and this seems related to docker.
> Example:
> http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-fc26-x86...
> There are multiple docker commands stuck on the slave (will post in the next comment), so it seems to be deadlocked. Opening this ticket to investigate exactly which step is causing this and possible ways of resolving it. The job in question doesn't even use docker, so it shouldn't suffer if this happens.
6 years, 10 months
[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker
by Evgheni Dereveanchin (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jir... ]
Evgheni Dereveanchin edited comment on OVIRT-1840 at 1/11/18 8:54 AM:
----------------------------------------------------------------------
Looking at the slave, here's the stuck part:
{{ `-bash -c cd "/home/jenkins" && java -jar slave.jar}}
{{ `-java -jar slave.jar}}
{{ |-bash -ex /tmp/jenkins6289914333579712645.sh}}
{{ | `-bash -ex /tmp/jenkins6289914333579712645.sh}}
{{ | |-grep -oP .+?(?=:exported-artifacts)}}
{{ | `-sudo -n docker images --format={{.Repository}}:{{.Tag}} }}
{{ | `-docker-current images --format={{.Repository}}:{{.Tag}} }}
{{ | `-5*[{docker-current}]}}
At the same time, I see the following in pstree output of the same node:
{{-sudo -n docker images --format={{.Repository}}:{{.Tag}} }}
{{ `-docker-current images --format={{.Repository}}:{{.Tag}} }}
{{ `-5*[{docker-current}]}}
{{-sudo systemctl start docker}}
{{ `-systemctl start docker}}
{{-sudo -n /bin/yum install -y docker}}
{{ `-yum /bin/yum install -y docker}}
{{ `-sh /var/tmp/rpm-tmp.gO7ceb 1}}
{{ `-systemctl try-restart docker.service}}
{{-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0}}
{{ `-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0}}
{{ `-docker-current ps -aq -f status=dead}}
{{ `-6*[{docker-current}]}}
All of these commands are stuck at various stages of the job, even though docker wasn't used anywhere in it.
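
One possible mitigation, sketched here purely as an illustration, is to bound such housekeeping calls with `timeout` from GNU coreutils so that an unresponsive daemon fails the step quickly instead of holding the job for hours. The function name `run_with_timeout` and the `DOCKER_TIMEOUT` variable are hypothetical; nothing in this ticket says the job scripts work this way.

```shell
# Hypothetical wrapper; DOCKER_TIMEOUT is an illustrative name.
run_with_timeout() {
    local limit="${DOCKER_TIMEOUT:-60}"
    local rc=0
    # timeout(1) exits with status 124 when the time limit expires.
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "command timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}
```

For example, `run_with_timeout sudo -n docker images` would give up after the limit rather than wait indefinitely. Note that `timeout` sends SIGTERM by default; a process that ignores it may need the `-k` option to be followed up with SIGKILL.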
> jobs freeze due to unresponsive docker
> --------------------------------------
>
> Key: OVIRT-1840
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1840
> Project: oVirt - virtualization made easy
> Issue Type: Task
> Reporter: Evgheni Dereveanchin
> Assignee: infra
>
> Quite often I see jobs stuck at various stages for hours, and this seems related to docker.
> Example:
> http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-fc26-x86...
> There are multiple docker commands stuck on the slave (will post in the next comment), so it seems to be deadlocked. Opening this ticket to investigate exactly which step is causing this and possible ways of resolving it. The job in question doesn't even use docker, so it shouldn't suffer if this happens.
6 years, 10 months