Ok,
it's clear now that the Bing robot is being too aggressive: it downloaded around 6GB in
the last 3 days. I think we should ban it using robots.txt, at least until we have
better infra.
Today it's crawling again and the machine load is back up to 20. Yesterday evening it
stopped crawling and the load dropped to 1-3...
I can do it myself but I need green light :)
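To back up a figure like the 6GB above, one option is to total the response bytes per user agent in the web server's access log. This is a minimal sketch, assuming each log line has the shape of the access-log entry quoted further down in this thread (client IP, timestamp, quoted request, status, byte count, referrer, quoted user agent); the function names and the log path in the usage note are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape, modelled on the access-log entry quoted in this thread:
# IP ident user [timestamp] "REQUEST" STATUS BYTES REFERRER "USER-AGENT"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) (?:"[^"]*"|\S+) "(?P<ua>[^"]*)"'
)


def bytes_per_agent(lines):
    """Sum response sizes per user-agent string; lines that don't parse are skipped."""
    totals = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or m.group("bytes") == "-":
            continue
        totals[m.group("ua")] += int(m.group("bytes"))
    return totals


def human(n):
    """Format a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TiB"


def report(path, top=10):
    """Print the `top` user agents by downloaded bytes for one log file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for agent, total in bytes_per_agent(f).most_common(top):
            print(f"{human(total):>12}  {agent}")
```

For example, `report("/var/log/httpd/access_log")` would list the heaviest user agents; limiting the input to the last three days depends on how the logs are rotated.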
----- Original Message -----
From: "David Caro Estevez" <dcaroest(a)redhat.com>
To: "Itamar Heim" <iheim(a)redhat.com>
Cc: "infra" <infra(a)ovirt.org>
Sent: Wednesday, June 12, 2013 5:40:43 PM
Subject: Re: gerrit not working
Hi, I've been monitoring it this evening and found some interesting things:
1) We are getting a lot of requests from bots (Bing, Ezooms, Majestic), and
some of them request malformed URLs that make gerrit crash, for
example:
94.249.193.61 - - [12/Jun/2013:08:41:38 -0400] "GET
/gitweb?p=ovirt-engine.git;a=blob;f=backend/manager/modules/c511backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/VDS.java;h=bdd4414a843498ddacd548ee3dce158dc1b9a28f;hb=f122a88ef=
HTTP/1.0" 200 261640 - "Mozilla/5.0 (compatible; MJ12bot/v1.4.3;
http://www.majestic12.co.uk/bot.php?+)"
That request triggers these errors:
[2013-06-12 08:41:36,409] ERROR com.google.gerrit.httpd.gitweb.GitWebServlet
: CGI: fatal: bad revision 'f122a88ef='
[2013-06-12 08:41:36,410] ERROR com.google.gerrit.httpd.gitweb.GitWebServlet
: CGI: [Wed Jun 12 08:41:35 2013] gitweb.cgi: Argument
"package org.ovirt.engine.core.common.businessentiti..." isn't
numeric
in printf at /var/www/git/gitweb.cgi line 5412, <$fd> line 1.
... and many more similar errors
There are also other requests that ask for bad revisions, and you can see
the error trace for those too:
[2013-06-12 08:46:27,356] ERROR com.google.gerrit.httpd.gitweb.GitWebServlet
: CGI: fatal: bad revision 'f122a881lass517'
But I'm not sure whether those are actually breaking anything; we could put
a robots.txt in the root anyway.
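A candidate robots.txt can be checked offline before it goes live. The sketch below uses Python's standard `urllib.robotparser`; the rules themselves (banning the three crawlers named above outright and keeping every other crawler out of `/gitweb`) are an assumed policy for illustration, not something agreed in this thread. Keep in mind that robots.txt is advisory: it only affects crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban the crawlers named in this thread outright
# and keep all other crawlers away from the expensive /gitweb CGI pages.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /

User-agent: Ezooms
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow: /gitweb
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
checks = [
    ("bingbot", "/"),
    ("MJ12bot", "/gitweb?p=ovirt-engine.git;a=summary"),
    ("Googlebot", "/gitweb?p=ovirt-engine.git"),
    ("Googlebot", "/r/"),
]
for agent, path in checks:
    # can_fetch() applies the first matching User-agent group, else the "*" group.
    print(f"{agent:10} {path:45} allowed={rules.can_fetch(agent, path)}")
```

The blanket `Disallow: /gitweb` for everyone else is a judgment call: it also hides the repository browser from search engines that behave well.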
2) The problems start when gerrit is unable to process a request in less
than 120s (2min); then this happens:
[2013-06-12 08:50:27,311] WARN
com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor
worker killed after 120587ms
java.util.concurrent.ExecutionException:
java.util.concurrent.CancellationException
... full stack trace
But I'm not sure what causes the slowness. We can put a stricter
robots.txt file in place to keep the bots out.
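To look for a pattern in the slowness, one option is to pull every "worker killed after ...ms" warning out of Gerrit's error log and bucket the kills by hour, so they can be lined up with crawler bursts in the access log. This is a sketch that assumes each warning sits on a single log line in the format of the excerpt above (the excerpt here is wrapped by the mail client); the function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed shape: "[2013-06-12 08:50:27,311] WARN  <logger> : ... worker killed after 120587ms"
KILL_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\]\s+WARN\s.*?"
    r"worker killed after (?P<ms>\d+)ms"
)


def killed_workers(lines):
    """Yield (timestamp, milliseconds) for every killed-worker warning."""
    for line in lines:
        m = KILL_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            yield ts, int(m.group("ms"))


def kills_per_hour(lines):
    """Count killed workers per hour, keyed like '2013-06-12 08:00'."""
    return Counter(ts.strftime("%Y-%m-%d %H:00") for ts, _ in killed_workers(lines))
```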
The load is around 18-20. We can try to lower the number of concurrent
threads (in xinetd); that will make gerrit respond better to the clients that
can connect and refuse the others, instead of hanging for everyone.
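The trade-off described above (serve a bounded number of clients promptly and turn the rest away immediately, rather than letting everyone queue and time out) is what a per-service limit such as xinetd's `instances` attribute provides. The sketch below is a generic Python illustration of that admission policy, not xinetd or Gerrit code; the class name and the limit are hypothetical.

```python
import threading


class AdmissionLimiter:
    """Admit at most `limit` concurrent requests; reject the rest immediately."""

    def __init__(self, limit: int):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.rejected = 0

    def try_handle(self, handler, *args):
        # Non-blocking acquire: if every slot is busy, refuse instead of queueing,
        # so admitted clients are not slowed down by an ever-growing backlog.
        if not self._slots.acquire(blocking=False):
            with self._lock:
                self.rejected += 1
            return None
        try:
            return handler(*args)
        finally:
            self._slots.release()
```

A rejected client sees an immediate failure it can retry, instead of a request that hangs until the 120s worker timeout kills it.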
On the other hand, it's not swapping: it has between 300 and 700MB of RAM
free, open files peak at around 2000 (ulimit is set to 655350, and memory is
ok, so no problem there), the number of gerrit threads is around 100, the
number of connections (in any state) is 50 tops, and the number of
git-daemon processes is around 50 (xinetd has a limit of 200...).
I'll try to get some statistics about git and http connections and see
if I can find any common denominator (it's very likely that most of the
traffic comes from the Red Hat office IP, but let's take a look anyway).
Some things I think can be done in the meantime:
- Modify robots.txt to keep crawlers out and get rid of some of the error
entries in the log
- Create a mirror for jenkins, so it does not have to clone the repo from
gerrit each time (maybe just changing the job configs to not clean up the
repo is enough, taking into account that the repos can end up dirty)
- Set up graphing for the gerrit machine, so we have more accurate data
(which also means setting up the graphs server); right now I'm using a small
script (monitor_gerrit.sh) that shows raw numbers...
- Lower the number of simultaneous threads to something below 50, with the
drawback that it will refuse some clients.
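Until proper graphing exists, raw numbers like the ones above (load, number of git-daemon processes) could be sampled periodically and appended to a CSV that a graphing tool can ingest later. This is a Linux-only sketch reading `/proc`; it is not the monitor_gerrit.sh script, whose contents aren't shown here. The process names and CSV layout are assumptions: in particular, it assumes the daemons show up as `git-daemon` in `/proc/<pid>/comm`.

```python
import csv
import os
import time
from pathlib import Path


def count_processes(name: str, proc: Path = Path("/proc")) -> int:
    """Count processes whose command name (/proc/<pid>/comm) equals `name`."""
    count = 0
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            if (entry / "comm").read_text().strip() == name:
                count += 1
        except OSError:
            continue  # the process exited (or is unreadable) while scanning
    return count


def snapshot(names=("git-daemon", "java")):
    """One CSV row: unix time, 1/5/15-minute load averages, then a count per name."""
    load1, load5, load15 = os.getloadavg()
    return [int(time.time()), load1, load5, load15] + [count_processes(n) for n in names]


def append_row(path, row):
    """Append a row to a CSV file, creating the file if needed."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)
```

Run from cron every minute, for example, `append_row("/tmp/gerrit_stats.csv", snapshot())` (hypothetical path) would give a time series of the same raw numbers.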
I'll keep trying to figure out what is happening.
----- Original Message -----
> From: "Itamar Heim" <iheim(a)redhat.com>
> To: "Omer Frenkel" <ofrenkel(a)redhat.com>, "David Caro Estevez"
> <dcaroest(a)redhat.com>
> Cc: "infra" <infra(a)ovirt.org>
> Sent: Wednesday, June 12, 2013 4:22:32 PM
> Subject: Re: gerrit not working
>
> On 06/12/2013 04:28 PM, Omer Frenkel wrote:
> > I can't fetch; I asked around and it seems it's not only me.
> > Can anyone take a look?
> >
> > $ git fetch
> > Write failed: Broken pipe
> > fatal: Could not read from remote repository.
> >
> > Please make sure you have the correct access rights
> > and the repository exists.
> >
>
> I restarted the service (more people can do this via the jenkins job
> as a first mitigation).
>
> David, care to look at the logs and try to understand which of the new
> errors that appeared after the upgrade are interesting?
>
> thanks,
> Itamar
>
_______________________________________________
Infra mailing list
Infra(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra