Outage Update - www.ovirt.org and gerrit.ovirt.org

Karsten 'quaid' Wade kwade at redhat.com
Tue Mar 20 15:28:59 UTC 2012


On 03/20/2012 07:18 AM, Ofer Schreiber wrote:
> www.ovirt.org and gerrit.ovirt.org are now up and running.
> 
> We experienced two issues: 1. DB corruption on www.ovirt.org,
> caused by a full file system. 2. Faulty gerrit service, probably
> caused by #1.
> 
> Both issues were handled by the oVirt infra team (mburns, quaid and
> myself).

I'm in a meeting all day today, so rather than pausing to write a single
root-cause analysis, I'll post the pieces as I go.

As a first step, I'm fixing the easy mistakes. One of them is that we
didn't have a straightforward backup of the MediaWiki and WordPress
databases. (We do have a daily Linode backup, but that's the painful way.)

As a stop-gap, I set up a few bash scripts that cron runs every day to
dump the databases:

crontab -e
# Mail any output from the backup jobs to root
MAILTO=root
#
# Every day at 00:05 Eastern (just after midnight, the quietest time)
5 0 * * * /root/bin/wordpress-backup.sh
# Every day at 00:10 Eastern (just after midnight, the quietest time)
10 0 * * * /root/bin/mediawiki-backup.sh
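For anyone who wants to replicate this, here is a minimal sketch of what a script like /root/bin/mediawiki-backup.sh could contain. It is not the actual script: it assumes MySQL, a database named wikidb (hypothetical; the real name is $wgDBname in LocalSettings.php), credentials in root's ~/.my.cnf, and a hypothetical /var/backups/db target directory.

```shell
#!/bin/bash
# Hypothetical sketch of /root/bin/mediawiki-backup.sh; the database name
# and target directory below are assumptions, not the real configuration.
set -euo pipefail

# backup_db DB DEST_DIR
# Dumps one MySQL database into DEST_DIR/DB-YYYY-MM-DD.sql.gz.
backup_db() {
    local db="$1" dest="$2"
    local out="$dest/$db-$(date +%F).sql.gz"
    mkdir -p "$dest"
    # --single-transaction takes a consistent dump of InnoDB tables
    # without locking them; credentials are read from ~/.my.cnf.
    mysqldump --single-transaction "$db" | gzip > "$out.tmp"
    # With pipefail, a failed mysqldump aborts the script here, so only
    # a complete dump is ever renamed to the final file name.
    mv "$out.tmp" "$out"
}

DB_NAME="wikidb"              # assumed database name
DEST_DIR="/var/backups/db"    # assumed target directory

if command -v mysqldump > /dev/null; then
    backup_db "$DB_NAME" "$DEST_DIR"
else
    echo "mysqldump not found; no backup taken" >&2
fi
```

The WordPress variant would differ only in DB_NAME (the DB_NAME constant in wp-config.php). Because MAILTO=root is set, anything mysqldump prints on failure ends up in root's mailbox.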

The root cause today was that
/home/gerrit-backup/gerrit.ovirt.org-gerrit2-home-backup/, which holds a
daily snapshot of everything-that-is-gerrit, filled up the file system.

The problem is that I never built a cleanup job for those backups, so
they had piled up since January, when I last cleaned them up by hand.

Gerrit probably fell over while rsyncing its backup to
linode01.ovirt.org. As far as I recall, that's the only way these two
servers interact.

So we need a cleanup script to run in cron.weekly or cron.daily to
erase the old backups.
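A minimal sketch of such a cleanup job, assuming each daily snapshot is a top-level entry (file or directory) in the backup directory. The script name and the 14-day retention are assumptions, not decisions we've made.

```shell
#!/bin/bash
# Hypothetical /etc/cron.daily/gerrit-backup-cleanup: prune old Gerrit
# snapshots before they fill the disk again.
set -euo pipefail

# prune_backups DIR DAYS
# Removes each top-level entry of DIR (assumed to be one daily snapshot)
# whose modification time is more than DAYS whole days ago.
prune_backups() {
    local dir="$1" days="$2"
    # -mindepth 1 protects DIR itself; -maxdepth 1 removes each snapshot
    # as a unit instead of descending into it.
    find "$dir" -mindepth 1 -maxdepth 1 -mtime +"$days" \
        -exec rm -rf -- {} +
}

# Path from the outage report; the retention period is an assumption.
BACKUP_DIR="/home/gerrit-backup/gerrit.ovirt.org-gerrit2-home-backup"
KEEP_DAYS=14

if [ -d "$BACKUP_DIR" ]; then
    prune_backups "$BACKUP_DIR" "$KEEP_DAYS"
else
    # cron normally mails this output to root.
    echo "backup directory $BACKUP_DIR not found; nothing pruned" >&2
fi
```

Running the same find with -print in place of the -exec clause is a cheap way to preview what would be deleted before letting cron loose on it.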

We also need a script to rsync the daily database dumps off this host
(and maybe other useful bits, such as /var/www/html/w and
/usr/share/wordpress). We could copy them back over to gerrit.ovirt.org.

Umm, hacky, but it would work, and it would be better than the current
situation.
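A rough sketch of that push, assuming the dumps land in /var/backups/db (hypothetical), that root on this host has an ssh key accepted by gerrit.ovirt.org, and a hypothetical destination path /home/linode01-backup/ on that side:

```shell
#!/bin/bash
# Hypothetical sketch: push the database dumps and web roots off this
# host. Paths marked "assumed" are illustrations, not the real config.
set -euo pipefail

# push_backups DEST SRC...
# Copies each SRC directory (as a subdirectory) into DEST with rsync.
# DEST may be a local path or a remote "host:/path" reached over ssh.
push_backups() {
    local dest="$1"
    shift
    # -a (archive mode) preserves permissions, times and symlinks.
    # No --delete: pruning old dumps here does not prune the copy there.
    rsync -a "$@" "$dest"
}

DUMP_DIR="/var/backups/db"                        # assumed
DEST="gerrit.ovirt.org:/home/linode01-backup/"    # assumed path
SOURCES=("$DUMP_DIR" /var/www/html/w /usr/share/wordpress)

if [ -d "$DUMP_DIR" ]; then
    push_backups "$DEST" "${SOURCES[@]}"
else
    echo "dump directory $DUMP_DIR not found; nothing pushed" >&2
fi
```

Leaving out --delete means the copy on gerrit.ovirt.org also grows without bound, so that side would need a cleanup job of its own, or we'd just be moving today's problem to the other host.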

There is so little disk space on linode01.ovirt.org because I never
intended to use that host for this long. I've been working on finding a
better solution, preferably one running on KVM :) and ideally provided
by one of the sponsors.

- Karsten
-- 
name:  Karsten 'quaid' Wade, Sr. Community Architect
team:    Red Hat Community Architecture & Leadership
uri:              http://communityleadershipteam.org
                         http://TheOpenSourceWay.org
gpg:                                        AD0E0C41

More information about the Infra mailing list