documenting ovirt.org outages downtime

Eyal Edri eedri at redhat.com
Tue Oct 22 09:02:40 UTC 2013



----- Original Message -----
> From: "Dave Neary" <dneary at redhat.com>
> To: "Eyal Edri" <eedri at redhat.com>
> Cc: "infra" <infra at ovirt.org>
> Sent: Sunday, October 20, 2013 12:10:05 AM
> Subject: Re: documenting ovirt.org outages downtime
> 
> Hi,
> 
> Not only me, I hope!
> 
> ovirt.org runs on OpenShift, and is constantly in a race against a 5GB
> disk quota. Most of the disk quota is taken up with an ibdata1 file from
> mysql (3.5GB and counting), the rest is user uploads, access & error
> logs, git repo & mediawiki source code.
> 
> I'm working on putting some kind of early warning system in place to let
> us know when we hit 90% of our disk quota - but we need to figure out
> how to stop that index file growing so big, how to process our httpd
> logs and get them off the host, and/or how to increase our disk quota on
> OpenShift.
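
A minimal sketch of the kind of 90% early warning described above, in Python. It sums file sizes under a directory and compares the total with 90% of a 5GB quota. The function names, the choice of the home directory as the tree to scan, and treating 5GB as 5 * 1024**3 bytes are assumptions for illustration; OpenShift's own quota accounting may count usage differently.

```python
import os

QUOTA_BYTES = 5 * 1024**3   # assumption: the 5GB quota, taken as 5 GiB
WARN_FRACTION = 0.90        # warn at 90% of quota, as proposed above


def dir_usage_bytes(root):
    """Sum the sizes of regular files under root, skipping unreadable ones."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Symlinks are skipped so their targets are not counted twice.
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; ignore it.
                continue
    return total


def over_threshold(used_bytes, quota_bytes=QUOTA_BYTES, fraction=WARN_FRACTION):
    """Return True when usage has reached the warning fraction of the quota."""
    return used_bytes >= quota_bytes * fraction


if __name__ == "__main__":
    # Hypothetical choice: scan the current user's home directory.
    used = dir_usage_bytes(os.path.expanduser("~"))
    if over_threshold(used):
        print("WARNING: %.2f GiB used, over 90%% of quota" % (used / 1024**3))
```

Run from cron, the printed warning could be mailed to the list; how the alert is delivered is left open here.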
> 
> The disk filled up at the same time as another application on the same
> node had very high disk I/O and load, which caused the restart of our
> application to fail: the database didn't come back up correctly. A
> second restart was what fixed it, but I didn't do that immediately
> because I did not know what had caused the failure and didn't want to
> risk data corruption.
> 
> The access I have is SSH access (I shared my SSH key with Karsten,
> and could then git clone, push, and ssh to the application node). As of
> this week, OpenShift apps support teams, so Karsten will be able to add
> a number of us as maintainers of the app, and we will also be able to
> use the rhc command line tools.

Can you please add the current infra team members to the group?
Or do you want each of us to send a public key?

> 
> The contacts on the OpenShift team are #openshift on freenode, or the
> OpenShift forum.
> 
> Karsten, does that look complete?

I'm still missing the actual commands/actions to run during an outage.
Examples:
 1. how to log in, and the server name
 2. where our app is located (for those who are not familiar with OpenShift)
 3. how to check if the disk is full, and ideas on fixing it (e.g. which logs to delete)
 4. email/IRC contact for support from OpenShift

If you can share this info by email, I can copy it to a txt file and put it on linode
for times when you're not available and ovirt.org is down.
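
For item 3, something along these lines might be a starting point: a small Python sketch that lists the largest files under a directory, to help decide which logs to remove. The function name and the default count are made up for illustration, and the directory to scan on the OpenShift node is not known here, so it is passed in as an argument.

```python
import heapq
import os


def largest_files(root, count=10):
    """Return up to `count` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks so only real files under root are reported.
                if not os.path.islink(path):
                    sizes.append((os.path.getsize(path), path))
            except OSError:
                # File vanished or is unreadable; ignore it.
                continue
    return heapq.nlargest(count, sizes)


if __name__ == "__main__":
    import sys

    # Usage: python largest_files.py <directory>
    for size, path in largest_files(sys.argv[1]):
        print("%10.1f MiB  %s" % (size / 1024**2, path))
```

This only reports sizes; deciding what is safe to delete (old access logs versus the database file, for example) still needs someone who knows the app.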

thanks,

Eyal.

> 
> Thanks,
> Dave.
> 
> On 10/17/2013 01:26 PM, Eyal Edri wrote:
> > hi,
> > 
> > Today there was another outage on ovirt.org (not sure yet how it was
> > fixed); afaik only dneary knows how to handle such outages.
> > Can we have a txt file in the usual place on resources.ovirt.org with info
> > on:
> > 
> >  - who to contact for openshift support/outage
> >  - commands needed for fixing/debugging issues
> >  - known issues/previous issues that were fixed + solution.
> > 
> > thoughts/ideas?
> > 
> > Eyal.
> > _______________________________________________
> > Infra mailing list
> > Infra at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/infra
> > 
> 
> --
> Dave Neary - Community Action and Impact
> Open Source and Standards, Red Hat - http://community.redhat.com
> Ph: +33 9 50 71 55 62 / Cell: +33 6 77 01 92 13
> 


