> This is IMO misdiagnosing the issue - the problem is not a failed
> mirror - the problem is a 404 on getting a metadata file from a mirror
> that was updated, because you have a stale repomd.xml file in your
> local cache. Another mirror will not help there because it would
> probably be updated as well.

I do not think there is anything wrong with mirrors occasionally failing. Nobody promised that they would always work. That is exactly why repositories provide a large network of distributed mirrors, and why yum has failover functionality that lets it find a working mirror in the list.
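To illustrate the failover idea, here is a minimal Python sketch. It is not yum's actual implementation. The `fetch` callable is a hypothetical downloader that takes a full URL and raises on failure, and `MirrorError` is an illustrative exception. The loop tries each mirror in turn and gives up only when the whole list is exhausted.

```python
from typing import Callable, Iterable


class MirrorError(Exception):
    """Raised when a mirror (or every mirror) cannot serve a file."""


def fetch_with_failover(
    mirrors: Iterable[str],
    path: str,
    fetch: Callable[[str], bytes],
) -> bytes:
    """Return `path` from the first mirror that serves it successfully.

    `fetch` is a hypothetical downloader that takes a full URL and raises
    MirrorError or OSError when that mirror fails.
    """
    errors = []
    for base in mirrors:
        url = f"{base.rstrip('/')}/{path.lstrip('/')}"
        try:
            return fetch(url)
        except (MirrorError, OSError) as exc:
            # This mirror is broken right now: note why and try the next one.
            errors.append(f"{url}: {exc}")
    # Only give up once the whole mirror list has been exhausted.
    raise MirrorError("all mirrors failed: " + "; ".join(errors))
```

With this design, a single broken mirror costs one extra request instead of a failed job.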

If we do have a stale repomd.xml somewhere, that should not happen. It is a bug in our system, and we need to invalidate it.
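repomd.xml lists the locations of the other metadata files. If a cached repomd.xml leads to a 404 for one of those files, the cached index is probably stale. A minimal Python sketch of invalidating it under that assumption follows. Here repomd.xml is modelled as a dict from metadata type to relative path. `fetch_repomd`, `fetch_file`, `NotFound` and `RepoMetadataCache` are hypothetical names, not yum APIs.

```python
from typing import Callable, Dict, Optional


class NotFound(Exception):
    """Raised by the hypothetical fetch_file for an HTTP 404."""


class RepoMetadataCache:
    """Sketch of a local cache that drops a stale repomd.xml on a 404.

    repomd.xml is modelled as a dict mapping a metadata type
    (e.g. "primary") to the relative path of that file on the mirror.
    """

    def __init__(
        self,
        fetch_repomd: Callable[[], Dict[str, str]],
        fetch_file: Callable[[str], bytes],
    ) -> None:
        self._fetch_repomd = fetch_repomd  # hypothetical: download + parse repomd.xml
        self._fetch_file = fetch_file      # hypothetical: download one metadata file
        self._repomd: Optional[Dict[str, str]] = None

    def _index(self) -> Dict[str, str]:
        # Reuse the cached repomd.xml if we have one, otherwise fetch it.
        if self._repomd is None:
            self._repomd = self._fetch_repomd()
        return self._repomd

    def get(self, md_type: str) -> bytes:
        try:
            return self._fetch_file(self._index()[md_type])
        except NotFound:
            # The cached index points at a file the mirror no longer has:
            # treat it as stale, refetch repomd.xml once and retry.
            self._repomd = None
            return self._fetch_file(self._index()[md_type])
```

Compared with running `yum clean` before every job, this refetches repomd.xml only when the cache has actually gone stale.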

Also, Fedora and most other mirrors sync by pull, so they are not all updated at once. The only exceptions I am aware of are the Debian and Ubuntu security mirrors, which use push for speed reasons, but their updates are atomic. If needed, we can raise this on a relevant mailing list to confirm or refute my assumptions.
 
> You could also solve it by running 'yum clean' all the time but that
> would severely slow things down.

I am not very good with yum/rpm stuff, but then how did it work on my normal Fedora system without me ever running yum clean?

> The best solution is IMO to have our own "stable" mirror that _never_
> changes while jobs are running.

Why would our mirror be any more stable? There are already over 100 mirrors on the internet, so why do we think ours will be more stable than those? We already developed repoproxy, which is exactly the "stable" mirror you are looking for, and as we have seen, it occasionally fails. That is normal: any single point you rely on will eventually fail. The only way to provide resiliency is to get rid of the central point.

-- 
Anton Marchukov
Senior Software Engineer - RHEV CI - Red Hat