On 05/31/2013 12:16 PM, Dave Neary wrote:
Hi guys,
We really need to get a handle on this; having conversations like this
is kind of ridiculous. First, by making it clear who has access to the
various resources we depend on; second, by setting up some kind of
"weekend & TZ watch" system so that if something happens on a Friday or
Saturday, or in the middle of our admins' night, we know how to get
services running again; and finally, by having some kind of monitoring
solution up and running at last.
I know that some of these things are on our agenda:
https://fedorahosted.org/ovirt/ticket/48
https://fedorahosted.org/ovirt/ticket/20
Just for the immediate parts of this:
Gerrit has monitoring: a Jenkins job sends an email to the infra mailing
list when it is down.
A basic "get it to work if hung/down" operator job is available in Jenkins
to a much wider group of users, who can use it to restart Gerrit.
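
For illustration only, a check of this kind could look roughly like the sketch below. It is not the actual Jenkins job; the URL scheme, the mail addresses, and all function names are assumptions made up for the example. It probes the service over HTTP rather than with ping, since (as the IRC extract shows) the host can answer ping while Gerrit itself is down.

```python
import urllib.error
import urllib.request
from email.message import EmailMessage

# Hypothetical values, not taken from the real job configuration.
GERRIT_URL = "http://gerrit.ovirt.org/"
ALERT_FROM = "jenkins@example.org"
ALERT_TO = "infra@example.org"


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # No usable answer at all: DNS failure, refused connection, timeout.
        return None


def is_up(url, fetch=http_status):
    """Treat any 2xx or 3xx response as 'up'; fetch is injectable for tests."""
    status = fetch(url)
    return status is not None and 200 <= status < 400


def build_alert(url, sender=ALERT_FROM, recipient=ALERT_TO):
    """Build (but do not send) the notification mail for a failed check."""
    msg = EmailMessage()
    msg["Subject"] = f"[monitoring] {url} is down"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The health check for {url} failed.")
    return msg


def run_check(url, fetch=http_status):
    """Return 0 if the service is up, 1 otherwise (usable as an exit code)."""
    if is_up(url, fetch=fetch):
        return 0
    print(build_alert(url))
    return 1
```

A job step could call `run_check(GERRIT_URL)` and exit with the returned value; a non-zero exit fails the build, and the job's own mail notification can then alert the list. Sending the message directly (for example with `smtplib`) is left out because it depends on the local mail setup.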
But this really feels like it should be a top priority for us.
IRC extract, 10:20am CEST Friday May 31 2013:
<kanagaraj> gerrit.ovirt.org is down. anyone facing this issue?
<tdosek> kanagaraj: same here. it's down. :(
<fabiand> yep, it's down ...
awh ...
<fabiand> dneary, gerrit is down ... have you got control over it?
<dneary> fabiand, I don't
dcaro, Ping?
fabiand, dcaro is the man
<dcaro> dneary: me neither :S
<dneary> Ah
Isn't Gerrit on Alterway servers now?
<fabiand> ewoud, do you maybe know?
<ewoud> dneary: not yet
<dcaro> dneary: let me see if I can access...
<mskrivanek> dcaro: use the force!
<ewoud> itamar: can you check gerrit?
<dcaro> dneary: it's in amazon :S
<dcaro> no, no access :,(
<dneary> The wiki page
http://www.ovirt.org/Infrastructure_team_administrators lists itamar and
Rydekull
<dcaro> none of them are around...
<lhornyak> gerrit is down?
<lhornyak> ping is ok though
anyone working on the gerrit fubar? :)
* dcaro looks the other way
* dcaro starts whistling
Cheers,
Dave.