repoman errors in OST

Nadav Goldin ngoldin at redhat.com
Wed Feb 22 18:36:18 UTC 2017


> Since we cannot reproduce this, and cannot easily stop using
> repoman in OST at this point, we implemented a work-around for the
> time being where we directed the master flow to run on a fixed set of
> nodes that have A LOT of RAM [3].

Take into account that this will make the suites run significantly
slower (+10 minutes), as IIRC all those servers are multi-NUMA. Also,
something must really be exploding, because the basic suite does not
take more than 10 GB of RAM, and most of the low-memory servers have
around 48 GB.


> filling up with files; instead, repoman's memory usage was exploding
> (20G+) to the point where there was no more memory available for use
> by /dev/shm.

I have a wild guess that this happens because repoman does
post-filtering: it first downloads all the packages and only then
filters them.
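If that guess is right, the cost difference is easy to see. A minimal
sketch of the idea (the `fetch` step here is a hypothetical stand-in for
repoman's download path, which I haven't actually checked):

```python
# Illustration of the post- vs pre-filtering guess above; 'fetch' is a
# stand-in for repoman's download step, NOT its real API.

def is_wanted(name):
    # e.g. drop the huge node image and appliance artifacts
    return not name.startswith(('ovirt-node-ng-image',
                                'ovirt-engine-appliance'))

def post_filter(names, fetch):
    # Download everything first, filter afterwards: peak disk/memory
    # use scales with the whole repo, big images included.
    downloaded = [fetch(n) for n in names]
    return [d for d in downloaded if is_wanted(d)]

def pre_filter(names, fetch):
    # Filter by name before downloading: excluded packages never
    # cost anything.
    return [fetch(n) for n in names if is_wanted(n)]
```

Both return the same package list, but the post-filtering variant pays
the download cost for the node image and the appliance anyway.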

About node and appliance, I think we should avoid downloading them;
they are not used anywhere as far as I know. This filter should
work (in extra_sources), last I checked:
rec:http://plain.resources.ovirt.org/repos/ovirt/tested/4.1/rpm/el7/:name~^(?!ovirt-node-ng-image|ovirt-engine-appliance).*
If it goes into the Groovy it will need some regex escaping love.
Though if my previous assumption (post-filtering) is correct, it
probably wouldn't matter.
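For reference, the name filter in that line is a negative lookahead,
and it can be sanity-checked outside repoman with plain Python (the
package names below are just examples):

```python
import re

# Same name filter as in the extra_sources line above: keep any
# package whose name does NOT start with the node image or the
# appliance prefix.
NAME_FILTER = re.compile(r'^(?!ovirt-node-ng-image|ovirt-engine-appliance).*')

def kept(names):
    """Return only the names the filter would let through."""
    return [n for n in names if NAME_FILTER.match(n)]
```

Note the lookahead anchors at the start of the name, so something like
'ovirt-node-ng-image-update' is filtered out as well.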

This raises the question (again) of how we filter things out of
repoman efficiently, without hiding them in 'extra_sources'.



Nadav.



On Wed, Feb 22, 2017 at 8:07 PM, Barak Korren <bkorren at redhat.com> wrote:
> Hi everyone,
>
> We've recently seen repeating errors where the OST 'master upgrade
> from release' suite failed with a repoman exception.
> Close analysis revealed that repoman was failing because it ran out of
> space in /dev/shm (OST suites are configured to run from /dev/shm if
> the slave has more than 16 GB available in it).
>
> The thing is, there is nothing that seems special about this suite and
> the packages it downloads, but since we suspected package sizes we
> opened OST-49 [1].
>
> Trying to get more information, we monitored a slave while it was
> running the suite. We found out that it wasn't /dev/shm that was
> filling up with files; instead, repoman's memory usage was exploding
> (20G+) to the point where there was no more memory available for use
> by /dev/shm.
> As a result we reported REP-3 [2].
>
> This is not happening all the time. The same suite sometimes succeeds
> on the exact same slaves. We haven't yet managed to manually reproduce
> this.
>
> Since we cannot reproduce this, and cannot easily stop using
> repoman in OST at this point, we implemented a work-around for the
> time being where we directed the master flow to run on a fixed set of
> nodes that have A LOT of RAM [3].
>
> Needless to say, this is not a long-term solution. We need to somehow
> manage to reproduce the problem or gain insight into it. Alternatively,
> we can consider reworking the OST suites to not use repoman for
> downloading, but still use it for local repo building (where its
> unique properties are crucial).
>
> [1]: https://ovirt-jira.atlassian.net/browse/OST-49
> [2]: https://ovirt-jira.atlassian.net/browse/REP-3
> [3]: http://jenkins.ovirt.org/label/integ-tests-big/
>
> --
> Barak Korren
> bkorren at redhat.com
> RHCE, RHCi, RHV-DevOps Team
> https://ifireball.wordpress.com/
> _______________________________________________
> Infra mailing list
> Infra at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra

