It runs a script [1]. I guess we can clone this to check the other infra servers as well.
[1]
#!/bin/sh
# Report every filesystem's usage and fail if any is at 90% or more.
df -H | grep -vE '^Filesystem|tmpfs|cdrom|file.tlv|loop' | awk '{ print $5 " " $1 }' | while read -r output;
do
echo "$output"
# Split "<use%> <filesystem>" into a bare percentage and the device name.
usep=$(echo "$output" | awk '{ print $1 }' | cut -d'%' -f1)
partition=$(echo "$output" | awk '{ print $2 }')
if [ "$usep" -ge 90 ]; then
echo "Running out of space \"$partition ($usep%)\" on $(hostname) as on $(date)"
exit 1
fi
done
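
If we do clone it to the other servers, a variant with the threshold as a parameter might be easier to reuse. This is only a rough sketch and not what is deployed today: the function name check_usage and its threshold argument are my own assumptions, and it filters on the device name only (tmpfs, cdrom, loop), which is simpler than the grep in [1]. It reads "df -P" style output on stdin, so it can be tried against canned input before it goes into cron.

```shell
#!/bin/sh
# check_usage THRESHOLD (hypothetical helper, not part of the script in [1])
# Reads "df -P"-style lines on stdin and prints "<use>% <filesystem>" for
# every filesystem at or above THRESHOLD percent.
# Returns 1 if at least one filesystem is over the limit, 0 otherwise.
check_usage() {
    awk -v limit="$1" '
        NR == 1 { next }                    # skip the header line
        $1 ~ /tmpfs|cdrom|loop/ { next }    # ignore pseudo filesystems
        {
            use = $5
            sub(/%/, "", use)               # "95%" -> "95"
            if (use + 0 >= limit + 0) {
                print use "% " $1
                over = 1
            }
        }
        END { exit (over ? 1 : 0) }
    '
}

# Possible use on a host (left commented out so that sourcing this file
# does nothing by itself):
#   df -P | check_usage 90 || echo "Running out of space on $(hostname)"
```

Because cron mails any output to the job owner by default, staying silent when nothing is over the limit should mean a mail only arrives when there is a problem.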
----- Original Message -----
From: "Mike Burns" <mburns(a)redhat.com>
To: "Doron Fediuck" <dfediuck(a)redhat.com>
Cc: "infra" <infra(a)ovirt.org>, "users" <users(a)ovirt.org>,
"board" <board(a)ovirt.org>
Sent: Wednesday, November 14, 2012 3:58:17 PM
Subject: Re: Wiki and Mailing Lists Outage -- 2012-11-14
On Wed, 2012-11-14 at 08:45 -0500, Doron Fediuck wrote:
> Thanks Mike!
> I suggest having a cron job that alerts on no-space issues.
We run logwatch, which is supposed to highlight these issues, but I
suspect that no one is actually reading the logwatch report. A separate
cron job or monitoring service is also a possibility.
Mike
>
> ----- Original Message -----
> > From: "Mike Burns" <mburns(a)redhat.com>
> > To: "board" <board(a)ovirt.org>, "infra" <infra(a)ovirt.org>,
> > "users" <users(a)ovirt.org>
> > Sent: Wednesday, November 14, 2012 3:31:11 PM
> > Subject: Wiki and Mailing Lists Outage -- 2012-11-14
> >
> > We experienced an outage today in both the wiki and the mailing lists.
> >
> > * Wiki content was available throughout the outage, but attempts to
> >   log in or edit received an error message about requiring cookies
> >   to be enabled.
> > * All mails to the mailing list failed to show up on the lists, but
> >   also did not return rejection messages.
> >
> > Cause:
> >
> > This was caused by an "Out of Space" error on the host running both of
> > these services. A temporary workaround was put in place to get both
> > services up and running again.
> >
> >
> > Action Taken:
> >
> > Remove the oldest gerrit backup (600MB)
> > Remove some older non-functional ovirt-node-iso images and rpms from
> > the releases (source remains there)
> >
> > Long term solution:
> >
> > Migrating these services away from a single host onto hosted solutions
> > (OpenShift, AlterWay).
> >
> > Current Situation:
> >
> > Wiki is back up and running, login works as expected
> > Lists are processing the backlog of emails since the outage began.
> > At this time, it does not appear that any mail was lost due to the
> > outage.
> >
> >
> > Thanks for the patience and understanding
> >
> > Mike
> >
> > _______________________________________________
> > Infra mailing list
> > Infra(a)ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/infra
> >
> _______________________________________________
> Board mailing list
> Board(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/board