----- Original Message -----
From: "Dave Neary" <dneary(a)redhat.com>
To: "Eyal Edri" <eedri(a)redhat.com>
Cc: "infra" <infra(a)ovirt.org>
Sent: Sunday, October 20, 2013 12:10:05 AM
Subject: Re: documenting ovirt.org outages downtime
Hi,
Not only me, I hope!
ovirt.org runs on OpenShift, and is constantly in a race against a 5GB
disk quota. Most of the disk quota is taken up with an ibdata1 file from
mysql (3.5GB and counting), the rest is user uploads, access & error
logs, git repo & mediawiki source code.
I'm working on putting some kind of early warning system in place to let
us know when we hit 90% of our disk quota - but we need to figure out
how to stop that index file growing so big, how to process our httpd
logs and get them off the host, and/or how to increase our disk quota on
OpenShift.
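
A minimal sketch of what such an early warning check could look like, in Python. It is only an illustration, not the system actually being put in place: it assumes the quota is counted against the files under one directory tree, and the names dir_usage_bytes and quota_status are made up for this example. The 5GB quota and the 90% threshold come from the text above.

```python
import os

QUOTA_BYTES = 5 * 1024**3   # 5GB disk quota mentioned above
WARN_FRACTION = 0.90        # warn at 90% of the quota


def dir_usage_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def quota_status(used_bytes, quota_bytes=QUOTA_BYTES, warn_fraction=WARN_FRACTION):
    """Return (over_threshold, fraction_used) for a given usage in bytes."""
    fraction = used_bytes / quota_bytes
    return fraction >= warn_fraction, fraction
```

Run from cron, something like quota_status(dir_usage_bytes(os.path.expanduser("~"))) would give a yes/no answer plus the fraction used, which could then be mailed to the list. Whether the home directory is the right root to measure on the application node is an assumption.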
The disk filled up at the same time as another application on the same
node had very high disk I/O and load, which caused the restart of our
application to fail: the database didn't come back up correctly. A
second restart was what fixed it, but I didn't do that immediately
because I did not know what had caused the failure and didn't want to
risk data corruption.
The access I have is SSH access (I shared my SSH key with Karsten,
and could then git clone, push, and ssh to the application node). As of
this week, OpenShift apps support teams, so Karsten will be able to add
a number of us as maintainers of the app, and we will also be able to
use the rhc command line tools.
Can you please add the current infra team members to the group?
Or do you want each of us to send a public key?
The contacts on the OpenShift team are #openshift on freenode, or the
OpenShift forum.
Karsten, does that look complete?
I'm still missing the actual commands/actions to take during an outage.
Examples:
1. how to log in, and the server name
2. where our app is located (for those who are not familiar with OpenShift)
3. how to check if the disk is full, and ideas on fixing it (e.g. which logs to delete)
4. email/IRC contact for support from OpenShift
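
For item 3, one possible starting point is a small script that lists the largest files under a directory, so whoever is handling the outage can see what is taking up the space before deleting anything. This is a sketch only; the function name largest_files is made up for this example, and which directory to scan on the application node is left open.

```python
import heapq
import os


def largest_files(root, count=10):
    """Return up to `count` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return heapq.nlargest(count, sizes)
```

Calling largest_files on the application's data and log directories should show quickly whether the database file or the httpd logs are the problem.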
If you can share this info by email, I can copy it to a txt file and put it on Linode
for times when you're not available and ovirt.org is down.
thanks,
Eyal.
Thanks,
Dave.
On 10/17/2013 01:26 PM, Eyal Edri wrote:
> hi,
>
> today there was another outage on ovirt.org (not sure how was it fixed
> yet), afaik only dneary knows how to handle such outages?
> can we have a txt file on the normal place on resources.ovirt.org with info
> on:
>
> - who to contact for openshift support/outage
> - commands needed for fixing/debugging issues
> - known issues/previous issues that were fixed + solution.
>
> thoughts/ideas?
>
> Eyal.
> _______________________________________________
> Infra mailing list
> Infra(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
>
--
Dave Neary - Community Action and Impact
Open Source and Standards, Red Hat - http://community.redhat.com
Ph: +33 9 50 71 55 62 / Cell: +33 6 77 01 92 13