Heads up! Influence of our recent Mock/Proxy changes on Lago jobs

23 Dec 2016

      Hi infra team members!

As you may know, we've recently changed out proxied Mock configuration
so that the 'http_proxy' environment variable gets defined inside the
Mock environment. This was in an effort to make 'pip', 'curl' and
'wget' commands go through our PHX proxy. As it turns out, this also
have unforeseen influence on yum tools.

Now, when it come to yum, as it is used inside the mock environmet, we
long has the proxied configuration hard-wiring it to use the proxy by
setting it in "yum.conf". However, so far, yum tools (Such as
reposync) that brought their own configuration, essentially bypassed
the "yum.conf" file and hence were not using the proxy.

Well, now it turns out that 'yum' and the derived tools also respect
the 'http_proxy' environment variable [1]:

    10.2. Configuring Proxy Server Access for a Single User

    To enable proxy access for a specific user, add the lines in the example box
    below to the user's shell profile. For the default bash shell, the
profile is
    the file ~/.bash_profile. The settings below enable yum to use the proxy
    server mycache.mydomain.com, connecting to port 3128.

        # The Web proxy server used by this account
        http_proxy="http://mycache.mydomain.com:3128"
        export http_proxy

This is generally a good thing, but it can lead to formerly unexpected
consequences.

Case-to-point: The Lago job reposync failures of last Thursday (Dec 22nd, 2016).

The root-cause behind the failures was that the
"ovirt-web-ui-0.1.0-4.el7.centos.x86_64.rpm" file was changed in the
"ovirt-master-snapshot-static" repo. Updating an RPM file without
changing the version or revision numbers breaks YUM`s rules and makes
reposync choke. We already knew about this and actually had a
work-around in the Lago code [2].

We I came in Thursday morning, and saw reposync failing in all the
Lago jobs, I just assumed that our work-around simply failed to work.
My assumption was enforced by the fact that I was able to reproduce
the issue by running 'reposync' manually on the Lago hosts, and also
managed to rectify it by removing the offending from file the reposync
cache. I spent the next few hours chasing down failing jobs and
cleaning up the caches on the hosts they ran on. I took me a while to
figure out that I was seeing the problem (Essentially, the older
version of the package file) reappear on the same hosts over and over
again!
Wondering how could that be, and after ensuring the older package file
was nowhere to be found on any of the repos the jobs were using, Me
and Gal took a look at the Lago code to see if it could be causing the
issue. Imagine our puzzlement when we realized the work-around code
was doing _exactly_ what I was doing manually, and still somehow
managed to make the very issue it was designed to solve reappear!
Eventually the problem seemed to disappear on its own. Now, armed with
the knowledge above I can provide a plausible explanation to what we
were seeing.
The difference between my manual executions of 'reposync' and the way
Lago was running it was that Lago was running within Mock, where
'http_proxy' was defined. What was probably happening is that reposync
kept getting the old RPM file from the proxy while still getting a
newer yum metadate file.

Conclusion - The next time such an issue arises, we must make sure to
clear the PHX proxy cache, there is actually no need to clear the
cache on the Lago hosts themselves, because our work-around will
resolve the issue there. Longer term we may configure the proxy to not
cache files coming from resources.ovirt.org.

[1]: https://www.centos.org/docs/5/html/yum/sn-yum-proxy-server.html
[2]: https://github.com/lago-project/lago/blob/master/ovirtlago/reposetup.py#L141...

-- 
Barak Korren
bkorren@redhat.com
RHCE, RHCi, RHV-DevOps Team
https://ifireball.wordpress.com/

Barak Korren

Anton Marchukov

Barak Korren

Anton Marchukov

tags

participants (2)