Ok,
it's clear now that the Bing robot is being too aggressive: it downloaded around 6GB in
the last 3 days. I think we should ban it through robots.txt, at least until we have
better infra.
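A minimal sketch of what such a ban could look like, checked with Python's standard robots.txt parser before deploying. The rules themselves are an assumption (block the `bingbot` user-agent everywhere, keep all other crawlers out of `/gitweb`), not the file that was actually installed, and they only help against crawlers that honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not the deployed file):
# ban bingbot from the whole site and keep every other crawler out of /gitweb.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /

User-agent: *
Disallow: /gitweb
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the rules above let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    checks = [
        ("bingbot", "/"),
        ("MJ12bot", "/gitweb?p=ovirt-engine.git"),
        ("MJ12bot", "/"),
    ]
    for agent, path in checks:
        print(agent, path, is_allowed(agent, path))
```

Testing the rules offline like this avoids finding out from the load graphs that a pattern did not match.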
Today it's crawling again and the machine load is up to 20 again. Yesterday evening
it stopped crawling and the load dropped to 1-3...
I can do it myself but I need a green light :)
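For reference, a rough way to get per-crawler traffic figures is to sum the response sizes per user-agent from the web server access log. This is only a sketch: it assumes log lines shaped like the MJ12bot entry quoted further down (request, status, bytes, referer, user-agent), and the sample lines and the `bytes_per_agent` helper are made up for illustration.

```python
import re
from collections import Counter

# Matches: "REQUEST" STATUS BYTES <referer> "USER-AGENT" at the end of a line.
# The byte count may be "-" when no body was sent.
LINE_RE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) .*"(?P<agent>[^"]*)"\s*$'
)

# Illustrative sample lines only (not taken from the real log).
SAMPLE = [
    '94.249.193.61 - - [12/Jun/2013:08:41:38 -0400] "GET /gitweb?p=ovirt-engine.git HTTP/1.0" '
    '200 261640 - "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)"',
    '192.0.2.10 - - [12/Jun/2013:08:42:01 -0400] "GET /gitweb HTTP/1.1" 200 1000 - "ExampleBot/1.0"',
    '192.0.2.10 - - [12/Jun/2013:08:42:05 -0400] "GET /gitweb HTTP/1.1" 304 - - "ExampleBot/1.0"',
]


def bytes_per_agent(lines):
    """Sum response bytes per user-agent string, skipping unparsable lines."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        size = match.group("bytes")
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals


if __name__ == "__main__":
    for agent, total in bytes_per_agent(SAMPLE).most_common():
        print(f"{total:>10}  {agent}")
```

To run it against a real log, pass an open file object instead of `SAMPLE`; the function only iterates over lines.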
ack, but I'd prefer that changes wait until the beginning of the week (Sunday or
Monday), rather than right before the weekend.
next step is to enable the mirroring plugin to github[1], please look at
this as well while at it. once it's activated, we can move all jenkins jobs
that poll for commits (vs. patch trigger) to monitor the github mirror
rather than gerrit.
[1] the config is already in place, as this worked until the upgrade;
gerrit 2.5 moved this previously built-in logic to a plugin.
----- Original Message -----
> From: "David Caro Estevez" <dcaroest(a)redhat.com>
> To: "Itamar Heim" <iheim(a)redhat.com>
> Cc: "infra" <infra(a)ovirt.org>
> Sent: Wednesday, June 12, 2013 5:40:43 PM
> Subject: Re: gerrit not working
>
> Hi, I've been monitoring it this evening, and some interesting things I've
> found:
>
>
>
> 1) We are getting a lot of requests from bots (bing, ezooms, majestic) and
> some of them are requesting malformed urls that make gerrit crash, for
> example:
>
> 94.249.193.61 - - [12/Jun/2013:08:41:38 -0400] "GET /gitweb?p=ovirt-engine.git;a=blob;f=backend/manager/modules/c511backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/VDS.java;h=bdd4414a843498ddacd548ee3dce158dc1b9a28f;hb=f122a88ef= HTTP/1.0" 200 261640 - "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)"
>
> that triggers the errors:
>
> [2013-06-12 08:41:36,409] ERROR com.google.gerrit.httpd.gitweb.GitWebServlet : CGI: fatal: bad revision 'f122a88ef='
> [2013-06-12 08:41:36,410] ERROR com.google.gerrit.httpd.gitweb.GitWebServlet : CGI: [Wed Jun 12 08:41:35 2013] gitweb.cgi: Argument "package org.ovirt.engine.core.common.businessentiti..." isn't numeric in printf at /var/www/git/gitweb.cgi line 5412, <$fd> line 1.
> .... A lot more similar errors
>
> Then there are other requests that ask for bad revisions and you can see the
> error trace too:
> [2013-06-12 08:46:27,356] ERROR com.google.gerrit.httpd.gitweb.GitWebServlet : CGI: fatal: bad revision 'f122a881lass517'
>
> But I'm not sure if those are really breaking something, we could put some
> robots.txt in the root or something anyhow.
>
>
>
> 2) The problems come when gerrit is unable to process a request in less than
> 120s (2min), then this happens:
>
> [2013-06-12 08:50:27,311] WARN
> com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor
> worker killed after 120587ms
> java.util.concurrent.ExecutionException:
> java.util.concurrent.CancellationException
> ... full tracedump
>
> But I'm not sure what causes the slowness. We can put a stricter
> robots.txt file in place to keep the bots away.
>
> The load is around 18-20, we can try to lower the number of concurrent
> threads (from xinetd), that will make gerrit respond better to those that
> can connect, but refuse the others, instead of hanging on everyone.
>
> On the other hand, I see that it's not swapping: it has between 300 and 700MB
> of RAM free, open files peak at 2000 (ulimit is set to 655350, and
> memory is ok, so no problem), the number of gerrit threads is around 100,
> the number of connections (in any state) is 50 tops, and the number of
> git-daemon processes is around 50 (xinetd has a limit of 200...).
>
>
> I'll try to get some statistics about git connections and http connections
> and see if I can find any common denominator (it's very likely that most of
> the traffic comes from redhat office ip, but let's take a look anyhow).
>
>
> Some things that I see can be done meanwhile:
> - Modify robots.txt to avoid crawlers and get rid of some of the error
> entries of the log
> - Create a mirror for jenkins, so it does not have to clone the repo from
> gerrit each time (maybe just changing the jobs config to not cleanup the
> repo is enough, taking into account that they can end up being dirty)
> - Set up graphing for gerrit machine, so we can have more accurate data
> (that means setting up also the graphs server), right now I'm using a small
> script (monitor_gerrit.sh) that shows raw numbers...
> - Lower the number of simultaneous threads to something below 50, with the
> inconvenience that it will refuse some clients.
>
> I'll keep trying to figure out what is happening.
>
> ----- Original Message -----
>> From: "Itamar Heim" <iheim(a)redhat.com>
>> To: "Omer Frenkel" <ofrenkel(a)redhat.com>, "David Caro Estevez"
>> <dcaroest(a)redhat.com>
>> Cc: "infra" <infra(a)ovirt.org>
>> Sent: Wednesday, June 12, 2013 4:22:32 PM
>> Subject: Re: gerrit not working
>>
>> On 06/12/2013 04:28 PM, Omer Frenkel wrote:
>>> i can't fetch, asked around and it seems it's not only me,
>>> can anyone take a look?
>>>
>>> $ git fetch
>>> Write failed: Broken pipe
>>> fatal: Could not read from remote repository.
>>>
>>> Please make sure you have the correct access rights
>>> and the repository exists.
>>>
>>
>> I restarted the service (which more people can try via the jenkins job
>> as first mitigation).
>>
>> david - care to look at the logs and try to understand which of the new
>> errors in the logs since the upgrade are interesting?
>>
>> thanks,
>> Itamar
>>
> _______________________________________________
> Infra mailing list
> Infra(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
>