
Hi,

Today there was another outage on ovirt.org (I'm not sure yet how it was fixed). AFAIK only dneary knows how to handle such outages. Can we have a txt file in the usual place on resources.ovirt.org with info on:

- who to contact for OpenShift support during an outage
- commands needed for fixing/debugging issues
- known or previous issues that were fixed, plus their solutions

Thoughts/ideas?

Eyal.

Hi,

Not only me, I hope!

ovirt.org runs on OpenShift, and is constantly in a race against a 5GB disk quota. Most of the disk quota is taken up with an ibdata1 file from mysql (3.5GB and counting), the rest is user uploads, access & error logs, git repo & mediawiki source code.

I'm working on putting some kind of early warning system in place to let us know when we hit 90% of our disk quota - but we need to figure out how to stop that index file growing so big, how to process our httpd logs and get them off the host, and/or how to increase our disk quota on OpenShift.

The disk filled up at the same time as another application on the same node had very high disk I/O and load, which caused the restart of our application to fail: the database didn't come back up correctly. A second restart was what fixed it, but I didn't do that immediately because I did not know what had caused the failure and didn't want to risk data corruption.

The access I have is SSH access (I shared my SSH key with Karsten, and could then git clone, push, and ssh to the application node). As of this week, OpenShift apps support teams, so Karsten will be able to add a number of us to be maintainers of the app, and we will also be able to use the rhc command line tools.

The contacts on the OpenShift team are #openshift on freenode, or the OpenShift forum.

Karsten, does that look complete?

Thanks,
Dave.

On 10/17/2013 01:26 PM, Eyal Edri wrote:
hi,
today there was another outage on ovirt.org (not sure how was it fixed yet), afaik only dneary knows how to handle such outages? can we have a txt file on the normal place on resources.ovirt.org with info on:
- who to contact for openshift support/outage
- commands needed for fixing/debugging issues
- known issues/previous issues that were fixed + solution.
thoughts/ideas?
Eyal.
_______________________________________________
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra

--
Dave Neary - Community Action and Impact
Open Source and Standards, Red Hat - http://community.redhat.com
Ph: +33 9 50 71 55 62 / Cell: +33 6 77 01 92 13
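
The early warning at 90% of the disk quota that Dave mentions could start from something like the sketch below. It is only an illustration: the function names `usage_percent` and `check_quota` are hypothetical, and the used/limit figures are passed in by hand rather than parsed from `quota` output, since that parsing is not described anywhere in this thread.

```shell
#!/bin/sh
# Sketch of a disk-quota early warning. All function names are hypothetical.

# Print the integer percentage of the quota that is in use.
# $1 = space used, $2 = hard limit, both in the same unit (e.g. MB).
usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Print a warning and return 1 when usage reaches the threshold.
# $3 = threshold in percent (assumed default: 90, as in Dave's mail).
check_quota() {
    pct=$(usage_percent "$1" "$2")
    if [ "$pct" -ge "${3:-90}" ]; then
        echo "WARNING: disk quota ${pct}% used (${1} of ${2})"
        return 1
    fi
    echo "OK: disk quota ${pct}% used"
}

# Figures reported later in this thread: 4911M used of a 5120M limit.
check_quota 4911 5120 || true
```

With the figures reported in this thread, the check already falls in the warning range, which matches how close to the limit the application is running.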

----- Original Message -----
From: "Dave Neary" <dneary@redhat.com>
To: "Eyal Edri" <eedri@redhat.com>
Cc: "infra" <infra@ovirt.org>
Sent: Sunday, October 20, 2013 12:10:05 AM
Subject: Re: documenting ovirt.org outages downtime
Hi,
Not only me, I hope!
ovirt.org runs on OpenShift, and is constantly in a race against a 5GB disk quota. Most of the disk quota is taken up with an ibdata1 file from mysql (3.5GB and counting), the rest is user uploads, access & error logs, git repo & mediawiki source code.
I'm working on putting some kind of early warning system in place to let us know when we hit 90% of our disk quota - but we need to figure out how to stop that index file growing so big, how to process our httpd logs and get them off the host, and/or how to increase our disk quota on OpenShift.
The disk filled up at the same time as another application on the same node had very high disk I/O and load, which caused the restart of our application to fail: the database didn't come back up correctly. A second restart was what fixed it, but I didn't do that immediately because I did not know what had caused the failure and didn't want to risk data corruption.
The access I have is an SSH access (I shared my SSH key with Karsten, and could then git clone, push, and ssh to the application node). As of this week, OpenShift apps support teams, so Karsten will be able to add a number of us to be maintainers of the app, and we will also have usage of the rhc command line tools.
Can you please add the current infra team members to the group? Or do you want each of us to send a public key?
The contacts on the OpenShift team are #openshift on freenode, or the OpenShift forum.
Karsten, does that look complete?
I'm still missing real commands/actions on what to do in an outage. Examples:

1. how to log in, and the server name
2. where our app is located (for those who are not familiar with OpenShift)
3. how to check if the disk is full, and ideas on fixing it (e.g. which logs to delete)
4. email/IRC contact for support from OpenShift

If you can share this info by email, I can copy it to a txt file and put it on Linode for times when you're not available and ovirt.org is down.

Thanks,
Eyal.

On Tue, Oct 22, 2013 at 05:02:40AM -0400, Eyal Edri wrote:
Dave Neary wrote:
The access I have is an SSH access (I shared my SSH key with Karsten, and could then git clone, push, and ssh to the application node). As of this week, OpenShift apps support teams, so Karsten will be able to add a number of us to be maintainers of the app, and we will also have usage of the rhc command line tools.
Can you please add current infra team members to the group? or do you want each to send his pub key?
IIRC Rydekull and I already have our keys there. Last time I used it, there were some actions that need the password though (such as using the RHC tool to easily restart).
The contacts on the OpenShift team are #openshift on freenode, or the OpenShift forum.
Karsten, does that look complete?
I'm still missing real commands/actions on what to do in an outage. examples: 1. how to login and server name
It's easiest to just use www.ovirt.org, which is a CNAME to wiki-ovirt.rhcloud.com. I'm sure the username is now in the usual place.
2. where is our app is located (to those who are not familiar with openshift)
Not sure what you exactly mean. Physically? Which directory? You should know openshift does a lot through environment variables. For example, there's $OPENSHIFT_PHP_LOG_DIR which points to the apache logs, $OPENSHIFT_DATA_DIR which points to the persistent data, such as our mediawiki image uploads or $OPENSHIFT_MYSQL_DB_LOG_DIR for the mysql logs. env is a very useful command on openshift.
3. how to check if disk is full, and ideas on fixing it - (e.g which logs to delete)
Disk is limited by standard quotas:

[wiki-ovirt.rhcloud.com 847edb45aea84198838f915be6faa066]\> quota -s
Disk quotas for user 847edb45aea84198838f915be6faa066 (uid 3689):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/EBSStore01-user_home01
                  4911M       0   5120M            9571       0    200k

It seems most of the space is currently used by our database (3.5G) and, as you might know, space allocated by MySQL InnoDB is never returned to the filesystem. Even if you purge all tables, the ibdata file can't shrink. That doesn't have to be bad, but it is something to be aware of.
4. email/irc contact for support from openshift
On what kind of plan are we? If free, I think the support is mostly best effort on #openshift and https://www.openshift.com/forums/openshift.
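
To see what is actually filling the quota, a small helper around `du` can rank the entries under a directory. This is a sketch under assumptions: the function name `largest_entries` is made up for illustration, and the directory passed in (for example `$OPENSHIFT_DATA_DIR` or `$OPENSHIFT_PHP_LOG_DIR`, the variables named above) is up to whoever runs it.

```shell
#!/bin/sh
# List the N largest entries directly under a directory, sizes in KB.
# $1 = directory to inspect, $2 = number of entries to show (default 10).
# Hidden entries are skipped because the shell glob does not match them.
largest_entries() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Example on the application node (not run here):
#   largest_entries "$OPENSHIFT_DATA_DIR" 5
```

Running it against the data and log directories in turn should show quickly whether the database file, the uploads or the logs are the ones growing.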

----- Original Message -----
From: "Ewoud Kohl van Wijngaarden" <ewoud+ovirt@kohlvanwijngaarden.nl>
To: infra@ovirt.org
Sent: Tuesday, October 22, 2013 1:03:07 PM
Subject: Re: documenting ovirt.org outages downtime
I think that info is great for a start. I added it as 'troubleshooting_infra_issues.txt', under the HOWTOS dir in the usual place on Linode. This is important because once the wiki is down, we need an alternate place for DRP procedures. So if you think of other issues we might need to put there, please update the file.

Eyal.
_______________________________________________ Infra mailing list Infra@ovirt.org http://lists.ovirt.org/mailman/listinfo/infra

Hi,

On 10/22/2013 12:03 PM, Ewoud Kohl van Wijngaarden wrote:
On Tue, Oct 22, 2013 at 05:02:40AM -0400, Eyal Edri wrote:
I'm still missing real commands/actions on what to do in an outage. examples: 1. how to login and server name
It's easiest to just use www.ovirt.org, which is a CNAME to wiki-ovirt.rhcloud.com. I'm sure the username is now in the usual place.
The username is something long and complicated, and you'll need to get it from the git remote for the wiki sources.
2. where is our app is located (to those who are not familiar with openshift)
Not sure what you exactly mean. Physically? Which directory?
The git repo, and the standard OpenShift workflow, is what I meant.
3. how to check if disk is full, and ideas on fixing it - (e.g which logs to delete)
Disk is limited by standard quotas:
[wiki-ovirt.rhcloud.com 847edb45aea84198838f915be6faa066]\> quota -s
Disk quotas for user 847edb45aea84198838f915be6faa066 (uid 3689):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/EBSStore01-user_home01
                  4911M       0   5120M            9571       0    200k
It seems currently most space is used by our database (3.5G) and as you might know, mysql innodb storage is never returned. Even if you purge all tables, the ibdata file can't shrink. That doesn't have to be bad, but something to be aware of.
Yes - painfully aware of it :-( No real solution either.
4. email/irc contact for support from openshift
On what kind of plan are we? If free, I think the support is mostly best effort on #openshift and https://www.openshift.com/forums/openshift.
I'm not sure, but I think we were bumped to Silver.

Cheers,
Dave.

Hi,

On 10/22/2013 10:02 AM, Eyal Edri wrote:
Can you please add current infra team members to the group? or do you want each to send his pub key?
I don't have access - and yes, Karsten will need public SSH keys for everyone. I guess they're in the wiki already. To add people to a group, people will also need OpenShift usernames.
I'm still missing real commands/actions on what to do in an outage. examples: 1. how to login and server name
In the app, the git repo to use is generated and used.

[dneary@leitrim wiki]$ git remote -v
origin ssh://847edb45aea84198838f915be6faa066@wiki-ovirt.rhcloud.com/~/git/wiki.git (fetch)
origin ssh://847edb45aea84198838f915be6faa066@wiki-ovirt.rhcloud.com/~/git/wiki.git (push)

To connect, once your public SSH key is there, you can just ssh to uuid@wiki-ovirt.rhcloud.com
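
Since the long username only appears in the git remote, the SSH login can be derived from the remote URL instead of being copied by hand. The helper below is a sketch; the name `ssh_target_from_remote` is hypothetical, and it assumes the remote uses the `ssh://user@host/path` form shown above.

```shell
#!/bin/sh
# Derive the SSH login (user@host) from a git remote URL of the form
# ssh://user@host/path. The function name is hypothetical.
ssh_target_from_remote() {
    rest=${1#ssh://}               # drop the scheme
    printf '%s\n' "${rest%%/*}"    # drop everything from the first slash on
}

# Example from inside a clone of the wiki repo (not run here):
#   ssh "$(ssh_target_from_remote "$(git config --get remote.origin.url)")"
```

This only rewrites the string; logging in still requires that your public SSH key has been added to the application.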
2. where is our app is located (to those who are not familiar with openshift)
It's on OpenShift :-) It's at wiki-ovirt.rhcloud.com - that means the app name is wiki, the username is ovirt. I think the password for it may be in the passwords file and any of us can go in & add SSH keys & add users to the team.
3. how to check if disk is full, and ideas on fixing it - (e.g which logs to delete)
Right now, I don't think we should delete logs. The access logs are the best source of web analytics we have. I'm looking at setting up Piwik for the wiki, it's pretty straightforward, which will reduce the need from now on, but I don't want to lose our access logs just yet.

The main issue is that we have a 5GB quota and a 3.5GB database index file. Deleting logs is not going to win us much time. We need to automate logfile compression, which addresses things short term.
4. email/irc contact for support from openshift
I have been asking on #openshift on FreeNode, and/or #libra-ops internally on Red Hat IRC. The "proper" support channel is the OpenShift forum or the IRC channel.
if you can share this info on the email, i can copy it to a txt file and locate on linode for time you're not available and ovirt.org is down.
Hope that helps!

Dave.
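
The logfile compression Dave asks for could be automated along the lines of the sketch below, which keeps the logs (so the access data is not lost) but gzips any file that has not been modified for a number of days. The function name `compress_old_logs` and the 7-day default are assumptions for illustration, and how the httpd logs on the application are actually named and rotated is not stated in this thread.

```shell
#!/bin/sh
# Compress log files that have not been modified for more than N days.
# $1 = log directory, $2 = age in days (assumed default: 7).
# Files already ending in .gz are skipped; recently written files are
# left alone so that a log still in use is not compressed.
compress_old_logs() {
    find "$1" -type f ! -name '*.gz' -mtime +"${2:-7}" -exec gzip {} +
}

# Example on the application node, run periodically (not run here):
#   compress_old_logs "$OPENSHIFT_PHP_LOG_DIR" 7
```

As Dave notes, this only buys time in the short term, because the 3.5GB database file is untouched by it.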