[DRAFT] Outage :: No disk space :: 2012-08-30
Mike Burns
mburns at redhat.com
Sat Sep 1 20:43:18 UTC 2012
----- Original Message -----
> Hi:
>
> I didn't really participate in this outage, so I thought others could
> help us draft up notes about it. I put some barebones below.
>
> One outcome we need to look at is: what do people do when they
> perceive services are out?
>
> Of course, if the service is the wiki, they can't check the wiki for
> what to do ...
>
> How do we communicate when the major communication services of
> ovirt.org are down? IRC is great but not enough ... If we can arrange
> for a reliable third-party mail relay to alias a page to the Infra
> team, great, but how do we keep it from getting spam?
>
> Another angle to resolve is service monitoring, so we know when things
> go down rather than waiting for service users to tell us. I got some
> direct emails from people (since the infra@ list wasn't working), but
> I was unavailable and unaware of the problem until Robert called me
> while he was working on fixing it. I don't mind getting pager alerts,
> as long as we can tune things so they are not crazy often. :)
I think we should post outage notices in multiple places:
1. IRC
2. wiki page
3. infra list
4. somewhere on WordPress (preferably the main page).
This should be sufficient long term (i.e., once we have a better hosting solution than the single kitchen-sink box).
I agree with getting service monitoring set up as well. We can even do some of this with Jenkins (plus a separate, non-Jenkins cron job to monitor Jenkins itself).
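For the separate non-Jenkins cron job, a minimal sketch in Python could look like the following. It assumes that a service counts as "up" when an HTTP GET returns a 2xx/3xx status, and that cron mails any output to the MAILTO address (if a mail transport is configured). The SERVICES entries and the check_url, failing_services and main names are illustrative only, not existing tooling.

```python
import sys
import urllib.error
import urllib.request

# Hypothetical endpoints to watch; add the Jenkins URL and others as needed.
SERVICES = {
    "wiki": "http://wiki.ovirt.org/",
    "lists": "http://lists.ovirt.org/mailman/listinfo/infra",
}


def check_url(url, timeout=10.0):
    """Return None if the URL answers with a 2xx/3xx status, else an error string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError (must precede OSError).
        return f"HTTP {exc.code}"
    except OSError as exc:
        # URLError, refused connections and timeouts are all OSError subclasses.
        return f"unreachable: {exc}"
    return None if 200 <= status < 400 else f"HTTP {status}"


def failing_services(services, checker=check_url):
    """Map service name -> error string for every service whose check failed."""
    failures = {}
    for name, url in services.items():
        error = checker(url)
        if error is not None:
            failures[name] = error
    return failures


def main(services=SERVICES):
    """Report failures on stderr; return a non-zero exit status if any."""
    failures = failing_services(services)
    for name, error in sorted(failures.items()):
        print(f"{name} ({services[name]}): {error}", file=sys.stderr)
    return 1 if failures else 0
```

To run it from cron, the file would end with `raise SystemExit(main())` and be called from a crontab entry such as `*/5 * * * * python3 /usr/local/bin/check_services.py` (path and interval are hypothetical). Because it prints nothing when every check passes, cron only sends mail on failures, which helps keep alerts from being crazy often.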
Mike
>
> == What occurred ==
>
> Even the doubled disk space on linode01.ovirt.org (to 25 GB) wasn't
> enough to last long.
>
>
> == When ==
>
> XXXX?
>
> date -d "2012-08-30 XXXX UTC"
>
> == Affected services ==
>
> lists.ovirt.org
> wiki.ovirt.org
> ovirt.org/.*
> ovirtbot
> Gerrit backup
> Jenkins backup
> [[What else?]]
>
> == Responses to take ==
>
> * Get new hosting solution in place.
> * Double current disk space before the move to new hosting, to give
>   us room to breathe.
> * Work up a response place that is posted in the IRC topic or
>   somewhere good, so people know how to contact all of the Infra team
>   when something is happening.
> * New service need: monitoring server
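Since this outage was a full disk, a monitoring server could start with a simple free-space check. The sketch below assumes a single threshold on the fraction of free space; the 15% value and the below_threshold and disk_warning names are hypothetical, and it relies only on Python's standard shutil.disk_usage.

```python
import shutil

# Hypothetical threshold: warn when less than 15% of the filesystem is free.
MIN_FREE_FRACTION = 0.15


def below_threshold(total, free, min_free_fraction=MIN_FREE_FRACTION):
    """True if the free share of the filesystem is under the threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return free / total < min_free_fraction


def disk_warning(path="/", min_free_fraction=MIN_FREE_FRACTION):
    """Return a warning string if path's filesystem is low on space, else None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if not below_threshold(usage.total, usage.free, min_free_fraction):
        return None
    free_gib = usage.free / 2**30   # bytes -> GiB
    total_gib = usage.total / 2**30
    return (f"{path}: only {free_gib:.1f} GiB of {total_gib:.1f} GiB free "
            f"({usage.free / usage.total:.0%})")
```

On a 25 GB disk, a 15% threshold would start warning at roughly 3.75 GB free, leaving some time to act before services fail. Called from the same cron job as an HTTP check, printing the result only when it is not None keeps cron mail limited to real warnings.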
>
>
> --
> Karsten 'quaid' Wade, Sr. Analyst - Community Growth
> http://TheOpenSourceWay.org .^\ http://community.redhat.com
> @quaid (identi.ca/twitter/IRC) \v' gpg: AD0E0C41
>
>
> _______________________________________________
> Infra mailing list
> Infra at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
>