Outage Update - www.ovirt.org and gerrit.ovirt.org
Mike Burns
mburns at redhat.com
Tue Mar 20 16:50:35 UTC 2012
On Tue, 2012-03-20 at 08:28 -0700, Karsten 'quaid' Wade wrote:
> On 03/20/2012 07:18 AM, Ofer Schreiber wrote:
> > www.ovirt.org and gerrit.ovirt.org are now up and running.
> >
> > We experienced two issues: 1. DB corruption on www.ovirt.org,
> > caused by a full file system. 2. Faulty gerrit service, probably
> > caused by #1.
> >
> > Both issues were handled by oVirt infra team (mburns, quaid and
> > myself)
>
> I'm in a meeting all day today so I won't pause for a single
> root-cause analysis, but instead just dump pieces as I go.
>
> As a first step, I'm fixing the easy mistakes, one of which is that we
> didn't have a straightforward backup of the MediaWiki and WordPress
> databases. (We do have a daily Linode backup, but that's the painful way.)
>
> As a stop-gap, I set up a few bash scripts to run every day to grab the
> databases.
>
> crontab -e
> # Mail cron output about the backups to root
> MAILTO=root
> #
> # Run five minutes after Midnight Eastern at quietest time, every day
> 5 0 * * * /root/bin/wordpress-backup.sh
> # Run ten minutes after Midnight Eastern at quietest time, every day
> 10 0 * * * /root/bin/mediawiki-backup.sh
>
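The contents of those two scripts aren't shown in this thread. As a rough sketch only, assuming both databases live in MySQL and that root's credentials come from ~/.my.cnf, a dump helper could look something like the following. The function name dump_database, the directory /root/backups and the database names are all hypothetical, not what is actually on the server.

```shell
#!/bin/bash
# Hypothetical sketch of a daily database dump; not the actual
# /root/bin/*-backup.sh scripts, whose contents are not in this thread.

# dump_database DB DEST_DIR
# Writes DEST_DIR/DB-YYYY-MM-DD.sql.gz, or returns non-zero and leaves
# no partial file behind if the dump fails.
dump_database() {
    local db="$1" dest="$2"
    local out="$dest/$db-$(date +%F).sql"

    mkdir -p "$dest" || return 1

    # --single-transaction gives a consistent snapshot of InnoDB tables
    # without locking them for the duration of the dump.
    if ! mysqldump --single-transaction "$db" > "$out"; then
        rm -f "$out"
        return 1
    fi

    # Compress in place; -f overwrites an earlier dump from the same day.
    gzip -f "$out"
}

# Example calls (database names are assumptions):
# dump_database wikidb    /root/backups
# dump_database wordpress /root/backups
```

Because a failed dump makes the function return non-zero, a wrapper script that exits with that status will show up in the MAILTO=root mail rather than failing silently.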
> The root cause today was that
> /home/gerrit-backup/gerrit.ovirt.org-gerrit2-home-backup/, which holds a
> daily snapshot of everything-that-is-gerrit, filled up the file system.
>
> The problem is, I didn't build a clean-up for those backups, so they
> went back to January when I did the last manual clean-up.
>
> Gerrit probably fell over when trying to do the rsync of its backup to
> linode01.ovirt.org. That's the only way these two servers interact
> that I recall.
>
> So we need a cleanup script to run in cron.weekly or cron.daily to
> erase the old backups.
See the other reply on this thread from Eyal. There is a handy find
script that will work well for this.
Note: we need it on the gerrit server as well if we use that as the
backup server for the www site backup.
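For reference, a minimal sketch of that kind of find-based cleanup follows. This is not Eyal's script; the function name prune_old_backups and the 14-day retention are assumptions, and it assumes each snapshot is a top-level entry in the backup directory whose modification time reflects when it was taken.

```shell
#!/bin/bash
# Hypothetical cleanup sketch for /etc/cron.daily or /etc/cron.weekly.

# prune_old_backups DIR DAYS
# Removes top-level entries in DIR last modified more than DAYS days ago.
prune_old_backups() {
    local dir="$1" days="$2"

    # Refuse to run on an empty or missing path so a typo can never
    # turn into a find over the wrong place.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "prune_old_backups: no such directory: $dir" >&2
        return 1
    fi

    # -mindepth 1 protects DIR itself; -maxdepth 1 judges each snapshot
    # as a whole rather than the individual files inside it.
    find "$dir" -mindepth 1 -maxdepth 1 -mtime +"$days" -exec rm -rf -- {} +
}

# Example call (the retention period is an assumption):
# prune_old_backups /home/gerrit-backup/gerrit.ovirt.org-gerrit2-home-backup 14
```

Swapping the -exec part for -print first is a cheap way to check what would be deleted before letting cron run it.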
Mike
>
> We also need a script to rsync out the daily backup of the databases
> (and maybe other useful bits such as /var/www/html/w and
> /usr/share/wordpress.) We could copy this back over to gerrit.ovirt.org.
>
> Umm, hacky, but would work. And be better than the current situation.
>
> There is so little disk space on linode01.ovirt.org because I never
> intended to use that host this long. I've been working to find a
> better solution, preferably one running on KVM :) and ideally
> provided by, e.g., one of the sponsors.
>
> - Karsten
> _______________________________________________
> Infra mailing list
> Infra at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra