
On 05/31/2013 12:16 PM, Dave Neary wrote:
Hi guys,
We really need to get a handle on this; having conversations like this is kind of ridiculous. First, by making it clear who has access to the various resources we depend on; second, by setting up some kind of "weekend & TZ watch" system so that if something happens on a Friday or Saturday, or in the middle of our admins' night, we know how to get services running again; and finally, by having some kind of monitoring solution up and running at last.
I know that some of these things are on our agenda: https://fedorahosted.org/ovirt/ticket/48 https://fedorahosted.org/ovirt/ticket/20
Just for the immediate parts of this: gerrit has monitoring, a jenkins job that sends an email to the infra mailing list when it is down. A basic "get it to work if hung/down" operator job is also available in jenkins, open to a much wider set of users who can restart it.
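For illustration only, here is a minimal sketch of the kind of check such a jenkins job could run. It is not the actual job: the function names `service_is_up` and `main` and the "below 500 means up" rule are assumptions made for this example. It checks at the HTTP level rather than with ping, since a host can answer ping while the service itself is hung.

```python
import urllib.error
import urllib.request


def service_is_up(url, timeout=10, fetch=urllib.request.urlopen):
    """Return True if `url` answers with an HTTP status below 500.

    `fetch` is injectable so the check can be exercised without a network.
    """
    try:
        with fetch(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but urlopen raised because of a 4xx/5xx status.
        return err.code < 500
    except OSError:
        # Connection refused, DNS failure, or timeout: treat the service as down.
        return False


def main(url):
    """Return a process exit code: 0 if the service is up, 1 otherwise."""
    return 0 if service_is_up(url) else 1
```

A job could pass the value returned by `main("http://gerrit.ovirt.org/")` to `sys.exit`, so that a non-zero exit marks the build as failed and the job's normal failure notification sends the email.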
But this really feels like it should be a top priority for us.
IRC extract, 10:20am CEST Friday May 31 2013:
<kanagaraj> gerrit.ovirt.org is down. anyone facing this issue?
<tdosek> kanagaraj: same here. it's down. :(
<fabiand> yep, it's down ... awh ...
<fabiand> dneary, gerrit is down ... have you got control over it?
<dneary> fabiand, I don't dcaro, Ping? fabiand, dcaro is the man
<dcaro> dneary: me neither :S
<dneary> Ah Isn't Gerrit on Alterway servers now?
<fabiand> ewoud, do you maybe know?
<ewoud> dneary: not yet
<dcaro> dneary: let me see if I can access...
<mskrivanek> dcaro: use the force!
<ewoud> itamar: can you check gerrit?
<dcaro> dneary: it's in amazon :S
<dcaro> no, no access :,(
<dneary> The wiki page http://www.ovirt.org/Infrastructure_team_administrators lists itamar and Rydekull
<dcaro> none of them are around...
<lhornyak> gerrit is down?
<lhornyak> ping is ok though anyone working on the gerrit fubar? :)
* dcaro looks the other way
* dcaro starts whistling
Cheers, Dave.