CI fails


Yevgeny Zaspitsky

19 Apr 2016

7:13 p.m.

http://jenkins.ovirt.org/job/ovirt-engine_master_find-bugs_gerrit/44920/ : There was an infra issue, please contact infra@ovirt.org

From looking at the job log, it appears that git failed to fetch the updates from the server. This isn't the first time a git problem has appeared on the Jenkins CI nodes: a similar failure happened on another patch of mine today. Is there a way to improve the stability of git communication on the Jenkins CI nodes?

Regards, Yevgeny


David Caro

19 Apr

7:25 p.m.

On 04/19 20:13, Yevgeny Zaspitsky wrote:

> http://jenkins.ovirt.org/job/ovirt-engine_master_find-bugs_gerrit/44920/ : There was an infra issue, please contact infra@ovirt.org
>
> From looking into the job log it appears that git failed fetching the updates from the server. That isn't the first time a git problem appears on the Jenkins CI nodes - similar failure happened on another my patch today. Is there a way to improve git communication stability on the Jenkins CI nodes?

Yep, it timed out:

17:48:28 ERROR: Timeout after 10 minutes

Currently our gerrit server is on amazon, and the jenkins slaves are at phoenix, which sometimes has network issues. It might be possible to try to add a gerrit mirror locally at phoenix, though it's not trivial.

I see that the speed is <50K/s, that's actually really slow, probably a network issue on phx, worth taking a look there too... we still have to add proper monitoring to the network, but we are on our way.

I'd try that approach first, though the mirror is a good idea that will probably have to be implemented anyhow once we start adding slaves. Having real info on the network usage/errors will give us insight to actually determine what the issue is, and thus, what the best solution is.

@infra what do you think?
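To put the two figures from the log together, here is a rough back-of-the-envelope calculation. The function name is made up for illustration, and it assumes that "K" means kilobytes and that 1 KB = 1000 bytes.

```python
def max_bytes_before_timeout(rate_bytes_per_s: float, timeout_s: float) -> float:
    """Upper bound on data moved at a constant rate before the timeout fires."""
    return rate_bytes_per_s * timeout_s


# Figures from the failed job: speed below 50 KB/s, timeout after 10 minutes.
RATE = 50 * 1000   # bytes per second (assumption: 1 KB = 1000 bytes)
TIMEOUT = 10 * 60  # seconds

limit = max_bytes_before_timeout(RATE, TIMEOUT)
print(f"at most {limit / 1e6:.0f} MB per fetch")  # at most 30 MB per fetch
```

So at that speed a fetch that has to move more than roughly 30 MB cannot complete within the timeout, no matter how often it is retried.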


_______________________________________________
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra

-- 
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: dcaro@redhat.com
IRC: dcaro|dcaroest@{freenode|oftc|redhat}
Web: www.redhat.com
RHT Global #: 82-62605

Barak Korren

20 Apr

8:28 a.m.

> I'd try that approach first, though the mirror is a good idea that will probably have to be implemented anyhow once we start adding slaves, having real info on the network usage/errors will give us insight to actually determine what's the issue, and thus, what's the best solution.
>
> @infra what do you think?

The issue with mirroring is how you can make sure that you mirror fast enough to enable CI. Even if Gerrit can push to the mirror on patch submission, there will still be some time delta between the submission happening (and the patch event showing up in Jenkins) and the mirror being synced. This looks like a nasty race condition.

What the mirror essentially does is make sure that bits are copied from Amazon to PHX just once. I wonder if we can get the same benefit with a simple HTTP proxy. How proxy-able is the Git HTTP protocol?

-- 
Barak Korren
bkorren@redhat.com
RHEV-CI Team
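One way to picture the race described above: a job started by the Gerrit event would have to confirm that the mirror already holds the new commit before fetching from it. The sketch below is only an illustration of such a guard, not something proposed in this thread. The function names, the retry counts and the idea of passing the expected SHA in from the event are all assumptions; the only real tool used is `git ls-remote`.

```python
import subprocess
import time
from typing import Callable, Optional


def ls_remote_sha(url: str, ref: str) -> Optional[str]:
    """Return the commit SHA that `ref` points to on `url`, or None if absent."""
    out = subprocess.run(
        ["git", "ls-remote", url, ref],
        check=True, capture_output=True, text=True,
    ).stdout
    for line in out.splitlines():
        sha, _, name = line.partition("\t")
        if name == ref:
            return sha
    return None


def wait_for_mirror(
    url: str,
    ref: str,
    expected_sha: str,
    attempts: int = 10,          # hypothetical default
    delay_s: float = 5.0,        # hypothetical default
    lookup: Callable[[str, str], Optional[str]] = ls_remote_sha,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the mirror until `ref` resolves to `expected_sha`.

    Returns False if the mirror never catches up within `attempts` polls.
    """
    for attempt in range(attempts):
        if lookup(url, ref) == expected_sha:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Without a check of this kind (or replication events that serve the same purpose), a build can start while the mirror still lacks the patch set, which is the race in question.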

Eyal Edri

9:16 a.m.

On Wed, Apr 20, 2016 at 9:28 AM, Barak Korren <bkorren@redhat.com> wrote:

> The issue with mirroring is how can you make sure that you mirror fast enough to enable CI. Even if Gerrit can push to the mirror on patch submission, there will still be some time delta between the submission happening (and the patch event showing up in Jenkins) and the mirror being synced. This looks like a nasty race condition.
>
> What the mirror essentially does is make sure that bits are copied from Amazon to PHX just once. I wonder if we can get the same benefit with a simple HTTP proxy, how proxy-able is the Git HTTP protocol?

I think we should prioritize mirroring the GIT (not gerrit) repos to PHX. This will help:

1. Speed up all post-merge jobs and reduce the potential for errors from git clone (they will be in the same network).
2. Reduce load (?) on the gerrit server and perhaps reduce errors in the per-patch jobs that will still run from gerrit.ovirt.org (AMAZON).
3. A longer-term goal will be either to migrate the gerrit server to PHX or to find a way to properly mirror the gerrit server (but then I fear there might be a race/problem as mentioned).

We have an open ticket on that, which was blocked because we did not want to clone any private git repos as well. We need to look into that again.
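As a minimal sketch of what plain git mirroring with an exclusion list for private repos could look like: the helper names, the mirror directory layout and the project names in the usage note are hypothetical. Only `git clone --mirror` and `git remote update --prune` are real git commands, and the helpers merely build the command lines rather than run them.

```python
from typing import Iterable, List


def mirror_commands(
    origin_base: str,
    mirror_root: str,
    projects: Iterable[str],
    private: Iterable[str] = (),
) -> List[List[str]]:
    """Build `git clone --mirror` commands for every project not marked private."""
    skip = set(private)
    return [
        ["git", "clone", "--mirror",
         f"{origin_base}/{name}", f"{mirror_root}/{name}.git"]
        for name in projects
        if name not in skip
    ]


def refresh_command(mirror_root: str, name: str) -> List[str]:
    """Build the command that updates an existing mirror clone in place."""
    return ["git", f"--git-dir={mirror_root}/{name}.git",
            "remote", "update", "--prune"]
```

For example, `mirror_commands("https://gerrit.ovirt.org", "/srv/git", ["ovirt-engine", "secret-project"], private=["secret-project"])` would yield a single clone command for `ovirt-engine` and skip the private project.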


-- 
Eyal Edri
Associate Manager
RHEV DevOps
EMEA ENG Virtualization R&D
Red Hat Israel

phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)

Barak Korren

9:20 a.m.

On 20 April 2016 at 10:16, Eyal Edri <eedri@redhat.com> wrote:

> I think we should prioritize mirroring the GIT (not gerrit) repos to PHX, this will help:
>
> 1. Speed up all post merge jobs and reduce potential of errors from git clone (they will be in the same network)
> 2. Reduce load (?) from the gerrit server and perhaps reduce errors of the per patch jobs that will still run from gerrit.ovirt.org (AMAZON)
> 3. A longer goal will be either to migrate the gerrit server to PHX or to find a way to properly mirror the gerrit server (but then I fear there might be race/problem as mentioned)

Please look at my comment about possible race conditions caused by mirroring. Simple mirroring may cause more trouble than it's worth. We need to consider proxying instead.

Eyal Edri

9:25 a.m.

On Wed, Apr 20, 2016 at 10:20 AM, Barak Korren <bkorren@redhat.com> wrote:

> Please look at my comment about possible race conditions caused by mirroring. Simple mirroring may cause more trouble than it's worth. We need to consider proxying instead.

I don't see how a race condition can occur with a merge commit. Can you elaborate?

David Caro

9:40 a.m.

On 04/20 10:25, Eyal Edri wrote:

> I don't see how a race condition can occur with a merge commit. Can you elaborate?

From the gerrit config on jenkins:

Replication cache expiration time in minutes

If one of the server supports replication events, these events are cached in memory because they can be received before the build is triggered and this plugin gets called to evaluate if the build can run. Cache allows the plugin to look if the replication events were already received when it gets called to evaluate if the build can run. If the time elapsed between this plugin gets called and the time the build entered the queue is greater than the cache expiration time, the plugin will assume that replication events were received and will let the build run.

Changing this value will only take effect when Jenkins is restarted


David Caro

9:41 a.m.


And from the specific server options:

Block builds in the queue until the replication events for the configured Gerrit slave(s) are received.


Barak Korren

9:42 a.m.

> Replication cache expiration time in minutes
>
> If one of the server supports replication events, these events are cached in memory because they can be received before the build is triggered and this plugin gets called to evaluate if the build can run. Cache allows the plugin to look if the replication events were already received when it gets called to evaluate if the build can run. If the time elapsed between this plugin gets called and the time the build entered the queue is greater than the cache expiration time, the plugin will assume that replication events were received and will let the build run.
>
> Changing this value will only take effect when Jenkins is restarted
>
> And from the specific server options:
>
> Block builds in the queue until the replication events for the configured Gerrit slave(s) are received.

Hmm, this means we must use Gerrit replication rather than simple 'git pull' based mirroring.
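The rule in the quoted help text can be paraphrased as a small decision function. This is only a reading of the help text, not the plugin's actual code, and the function and parameter names are invented for the illustration.

```python
def build_may_run(
    replication_event_received: bool,
    minutes_in_queue: float,
    cache_expiration_minutes: float,
) -> bool:
    """Paraphrase of the quoted help text, not the plugin's source code."""
    if replication_event_received:
        # The slave reported that it has the change, so the build can start.
        return True
    # The event may have arrived and already expired from the cache, so once
    # the build has queued longer than the expiration time, it is let through.
    return minutes_in_queue > cache_expiration_minutes
```

Read this way, a mirror that is updated by a periodic pull never sends replication events, so a blocked build would simply wait out the expiration time and then run whether or not the mirror is in sync.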

Eyal Edri

9:44 a.m.

OK, but if we move to using a git server instead of the gerrit server (only for post merge), we won't use the Gerrit Trigger plugin, so I don't see how it's still relevant. Instead of using the Gerrit trigger to run post-merge jobs, we'll use the SCM plugin, like with a normal git repo.

Will this approach have any issues?

On Wed, Apr 20, 2016 at 10:41 AM, David Caro <dcaro@redhat.com> wrote:

> And from the specific server options:
>
> Block builds in the queue until the replication events for the configured Gerrit slave(s) are received.

David Caro

9:47 a.m.

On 04/20 10:44, Eyal Edri wrote:

> OK, But if we move to use git server instead of gerrit server (only for post merge), we won't use the Gerrit trigger plugin, so I don't see how it's still relevant. Instead of using the gerrit trigger to run post merge jobs, we'll use the SCM plugin instead like a normal git.
>
> Will this approach have any issues?

Well, no feedback on gerrit, no link to the gerrit change that caused it, and only triggering on merges.

Though we already do that on some projects due to the high time they take to run.


Eyal Edri

1:11 p.m.

On Wed, Apr 20, 2016 at 10:47 AM, David Caro <dcaro@redhat.com> wrote:

> Well, no feedback on gerrit, no link to the gerrit change that caused it, and only triggering on merges.

On second thought, can't we keep using the Gerrit Trigger plugin for events, but use the mirror URL for cloning instead of gerrit.ovirt.org? IMO it will work and is worth a try.

Also, about the cloning: what if we don't create the target git repos for the 'private' repos we want to avoid cloning? Replication will then fail for them, so the repos we want to keep private won't be cloned.
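A rough sketch of the "events from Gerrit, bits from the mirror" idea: keep the change ref that arrives with the trigger event (the Gerrit Trigger plugin normally exposes it to the job as `GERRIT_REFSPEC`), but aim the fetch at a mirror host. The mirror host name, the helper names and the change number in the usage note are hypothetical, and the sketch assumes the mirror also carries the `refs/changes/*` refs.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def to_mirror_url(origin_url: str, mirror_host: str) -> str:
    """Swap the host of a clone URL for the mirror host, keeping scheme and path."""
    parts = urlsplit(origin_url)
    return urlunsplit(
        (parts.scheme, mirror_host, parts.path, parts.query, parts.fragment)
    )


def fetch_change_command(origin_url: str, mirror_host: str, refspec: str) -> List[str]:
    """Build a `git fetch` for the ref named in the trigger event, aimed at the mirror."""
    return ["git", "fetch", to_mirror_url(origin_url, mirror_host), refspec]
```

For instance, `fetch_change_command("https://gerrit.ovirt.org/ovirt-engine", "git.phx.example", "refs/changes/20/56020/3")` builds a fetch of that patch set from `https://git.phx.example/ovirt-engine`. The race discussed earlier in the thread still applies: the fetch fails if the mirror has not received the ref yet.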

> Though we already do that on some projects due to the high time they take to run.
