OST Network suite is failing on "OSError: [Errno 28] No space left on device"

Barak Korren bkorren at redhat.com
Tue Mar 20 08:11:09 UTC 2018


On 20 March 2018 at 09:17, Yedidyah Bar David <didi at redhat.com> wrote:
> On Mon, Mar 19, 2018 at 6:56 PM, Dominik Holler <dholler at redhat.com> wrote:
>> Thanks Gal, I expect the problem is fixed until something eats
>> all space in /dev/shm.
>> But the usage of /dev/shm is logged in the output, so we would be able
>> to detect the problem next time instantly.
>>
>> From my point of view it would be good to know why /dev/shm was full,
>> to prevent this situation in future.
>
> Gal already wrote below - it was because some build failed to clean up
> after itself.
>
> I don't know about this specific case, but I was told that I am
> personally causing such issues by using the 'cancel' button, so I
> sadly stopped. Sadly, because our CI system is quite loaded and when I
> know that some build is useless, I wish to kill it and save some
> load...
>
> Back to your point, perhaps we should make jobs check /dev/shm when
> they _start_, and either alert/fail/whatever if it's not almost empty,
> or, if we know what we are doing, just remove stuff there? That might
> be much easier than fixing things to clean up at the end, and/or
> debugging why that cleanup failed.

Sure thing, patches to:

    [jenkins repo]/jobs/confs/shell-scripts/cleanup_slave.sh

are welcome; we often find interesting stuff to add there...
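
For illustration only, a start-of-job check along the lines suggested above
might look roughly like the sketch below. It is not the current content of
cleanup_slave.sh; the function names, the 50% threshold and the return codes
are assumptions made up for this example.

```shell
#!/bin/bash
# Hypothetical sketch, NOT the actual cleanup_slave.sh.
# Function names, threshold and return codes are illustrative assumptions.

# Print the used percentage (integer, no '%' sign) of the filesystem
# that holds the path given as $1. Uses the portable 'df -P' layout,
# where the fifth column of the second line is the capacity.
shm_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Check the usage of a directory against a maximum percentage.
#   $1: directory to check (default: /dev/shm)
#   $2: maximum allowed usage in percent (default: 50, an assumed value)
# Returns 0 if usage is within the limit, 1 if it is above the limit,
# and 2 if the usage could not be determined.
check_shm_usage() {
    local dir="${1:-/dev/shm}"
    local max_percent="${2:-50}"
    local used
    used="$(shm_used_percent "$dir")"
    case "$used" in
        ''|*[!0-9]*)
            echo "Could not determine usage of ${dir}" >&2
            return 2
            ;;
    esac
    echo "Usage of ${dir}: ${used}%"
    if [ "$used" -gt "$max_percent" ]; then
        echo "WARNING: ${dir} is more than ${max_percent}% full" >&2
        return 1
    fi
    return 0
}
```

A job could then call something like `check_shm_usage /dev/shm 50 || exit 1`
when it starts. Whether to fail, only warn, or actually remove leftovers at
that point is the open question from the thread, so the sketch only reports.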

If you are short on time, please turn this suggestion into a proper RFE in Jira...

-- 
Barak Korren
RHV DevOps team, RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

