
On Tue, 2012-03-20 at 08:28 -0700, Karsten 'quaid' Wade wrote:
On 03/20/2012 07:18 AM, Ofer Schreiber wrote:
www.ovirt.org and gerrit.ovirt.org are now up and running.
We experienced two issues:
1. DB corruption on www.ovirt.org, caused by a full file system.
2. Faulty gerrit service, probably caused by #1.
Both issues were handled by the oVirt infra team (mburns, quaid and myself).
I'm in a meeting all day today so I won't pause for a single root-cause analysis, but instead just dump pieces as I go.
As a first step, I'm fixing the easy mistakes, which includes that we didn't have a straightforward backup of the MediaWiki and WordPress databases. (We do have a daily Linode backup, but that's the painful way.)
As a stop-gap, I set up a few bash scripts to run every day to grab the databases.
crontab -e
# Give root word about the backup
MAILTO=root
#
# Run five minutes after Midnight Eastern at quietest time, every day
5 0 * * * /root/bin/wordpress-backup.sh
# Run ten minutes after Midnight Eastern at quietest time, every day
10 0 * * * /root/bin/mediawiki-backup.sh
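The contents of the two backup scripts are not shown in this thread. As a rough sketch only, a script such as /root/bin/wordpress-backup.sh might look something like the following. The database name, the /root/backups directory, and the use of mysqldump with credentials in /root/.my.cnf are all assumptions for illustration, not details from the thread.

```shell
#!/bin/bash
# Hypothetical sketch; not the actual script referenced in the thread.
set -eu

# Assumed names: adjust to the real database and backup location.
DB_NAME="${DB_NAME:-wordpress}"
BACKUP_DIR="${BACKUP_DIR:-/root/backups}"
# Dump command; assumes MySQL credentials are readable from /root/.my.cnf.
DUMP_CMD="${DUMP_CMD:-mysqldump --single-transaction}"

backup_db() {
    stamp="$(date +%Y-%m-%d)"
    outfile="${BACKUP_DIR}/${DB_NAME}-${stamp}.sql"
    mkdir -p "$BACKUP_DIR"
    # Write the dump first, then compress it in place. DUMP_CMD is left
    # unquoted on purpose so that its options split into separate words.
    $DUMP_CMD "$DB_NAME" > "$outfile"
    gzip -f "$outfile"
    # Anything printed here is mailed to root by cron (MAILTO=root).
    echo "${outfile}.gz"
}

# When installed as the cron script, finish with a call such as:
#   backup_db
```

Because each dump gets a dated file name, these files accumulate too, so the same kind of cleanup discussed for the gerrit backups would apply here.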
The root cause today was /home/gerrit-backup/gerrit.ovirt.org-gerrit2-home-backup/ filling up; it holds a daily snapshot of everything-that-is-gerrit.
The problem is, I didn't build a clean-up for those backups, so they went back to January when I did the last manual clean-up.
Gerrit probably fell over when trying to do the rsync of its backup to linode01.ovirt.org. That's the only way these two servers interact that I recall.
So we need a cleanup script to run in cron.weekly or cron.daily to erase the old backups.
See the other reply on this thread from Eyal. There is a handy find script that will work well for this.
Note: we need it on the gerrit server as well if we use that as the backup server for the www site backup.
Mike
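Eyal's script is not reproduced in this message. As a general illustration of a find-based cleanup, a minimal sketch for cron.daily or cron.weekly could look like this. It assumes each snapshot is a top-level entry in the backup directory and that keeping about 14 days is acceptable; both the layout and the retention period are assumptions, not something stated in the thread.

```shell
#!/bin/bash
# Hypothetical cleanup sketch; not the script from Eyal's reply.
set -eu

# Directory named in the thread; the retention period is an assumed value.
BACKUP_DIR="${BACKUP_DIR:-/home/gerrit-backup/gerrit.ovirt.org-gerrit2-home-backup}"
KEEP_DAYS="${KEEP_DAYS:-14}"

prune_old_backups() {
    # -mindepth 1 -maxdepth 1: consider only top-level entries (assumed to be
    #   one per snapshot) and never the backup directory itself.
    # -mtime +N: entries last modified more than N days ago.
    # -print: list what is removed, so cron can mail a record of it.
    find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -mtime "+${KEEP_DAYS}" \
        -print -exec rm -rf -- {} +
}

# When installed as the cron script, finish with a call such as:
#   prune_old_backups
```

Running the find command once with only -print (no -exec) is a cheap way to check what would be deleted before trusting it in cron.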
We also need a script to rsync out the daily backup of the databases (and maybe other useful bits such as /var/www/html/w and /usr/share/wordpress.) We could copy this back over to gerrit.ovirt.org.
Umm, hacky, but would work. And be better than the current situation.
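A sketch of what that rsync step might look like follows. The source paths /var/www/html/w and /usr/share/wordpress and the destination host come from this thread; the local dump directory, the remote user, and the remote path are hypothetical, and the sketch assumes key-based SSH access is already set up.

```shell
#!/bin/bash
# Hypothetical sketch of copying the www backups over to the gerrit host.
set -eu

# Two paths from the thread plus an assumed local dump directory.
SOURCES="${SOURCES:-/root/backups /var/www/html/w /usr/share/wordpress}"
# Host is from the thread; the user and remote path are placeholders.
DEST="${DEST:-root@gerrit.ovirt.org:/home/www-backup/}"
# Set RSYNC="echo rsync" to print the command instead of running it.
RSYNC="${RSYNC:-rsync}"

sync_backups() {
    # -a: archive mode (recursion, permissions, timestamps)
    # -z: compress data in transit
    # SOURCES is left unquoted on purpose so it splits into separate paths.
    $RSYNC -az $SOURCES "$DEST"
}

# When installed as the cron script, finish with a call such as:
#   sync_backups
```

The sketch deliberately leaves out --delete, so pruning dumps locally does not remove the remote copies. That means the receiving side needs its own cleanup job, as noted earlier in the thread.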
There is so little disk space on linode01.ovirt.org because I never intended to use that host this long. I've been working to find a better solution, preferably one running on KVM. :) and ideally provided by e.g. one of the sponsors.
- Karsten
_______________________________________________
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra