[oVirt Jenkins] test-repo_ovirt_experimental_master - Build #3800 - FAILURE!

Build: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3800/, Build Number: 3800, Build Status: FAILURE

Build: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3801/, Build Number: 3801, Build Status: SUCCESS

Will this error also get solved by the patch for replacing the proxies? Or do we need to mirror EPEL to oVirt to avoid such errors?
ovirt-master-epel-el7/primary_ FAILED
05:28:44 (10/12): ovirt-master-epel 2% [ ] 0.0 B/s | 482 kB --:-- ETA ovirt-master-epel-el7/primary_ FAILED
05:28:44 ovirt-master-epel-el7/prim 0% [ ] 0.0 B/s | 0 B --:-- ETA
05:28:44
and the following error:
Error setting up repositories: failure: repodata/486c936a72b1d31db8b5892cb0c0372ba3c171509f168c1c24b5e32d5bf11861-primary.sqlite.xz from ovirt-master-epel-el7: [Errno 256] No more mirrors to try.
05:28:44 http://download.fedoraproject.org/pub/epel/7/x86_64/repodata/486c936a72b1d31...: [Errno 14] HTTPS Error 404 - Not Found
05:28:44
05:28:44 # Syncing remote repos locally (this might take some time): ERROR (in 0:00:30)
05:28:44 @ Create prefix internal repo: ERROR (in 0:00:30)
05:28:44 Error occured, aborting
05:28:44 Traceback (most recent call last):
05:28:44   File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 264, in do_run
05:28:44     self.cli_plugins[args.ovirtverb].do_run(args)
05:28:44   File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line 184, in do_run
05:28:44     self._do_run(**vars(args))
05:28:44   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 489, in wrapper
05:28:44     return func(*args, **kwargs)
05:28:44   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 500, in wrapper
05:28:44     return func(*args, prefix=prefix, **kwargs)
05:28:44   File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 179, in do_ovirt_reposetup
05:28:44     custom_sources=custom_sources,
05:28:44   File "/usr/lib/python2.7/site-packages/lago/log_utils.py", line 621, in wrapper
05:28:44     return func(*args, **kwargs)
05:28:44 Fi
On Thu, Dec 1, 2016 at 9:04 AM, <jenkins@jenkins.phx.ovirt.org> wrote:
Build: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3801/, Build Number: 3801, Build Status: SUCCESS
-- Eyal Edri Associate Manager RHV DevOps EMEA ENG Virtualization R&D Red Hat Israel phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)

On 1 December 2016 at 09:26, Eyal Edri <eedri@redhat.com> wrote:
Will this error also get solved by the patch for replacing the proxies? Or do we need to mirror EPEL to oVirt to avoid such errors?
05:28:44
and the following error:
Error setting up repositories: failure: repodata/486c936a72b1d31db8b5892cb0c0372ba3c171509f168c1c24b5e32d5bf11861-primary.sqlite.xz from ovirt-master-epel-el7: [Errno 256] No more mirrors to try.
05:28:44 http://download.fedoraproject.org/pub/epel/7/x86_64/repodata/486c936a72b1d31...: [Errno 14] HTTPS Error 404 - Not Found
Looks like the sqlite index file got replaced while our test was running. We'll probably have to mirror to be resilient to this (a proxy cannot help you with something it has not proxied yet).
One thing to note is that a simple rsync mirror will not be enough; we will need a mirror system that makes _atomic_ updates to the mirror. Rsync will just make it behave like DS globalsync behaves.
-- Barak Korren bkorren@redhat.com RHCE, RHCi, RHV-DevOps Team https://ifireball.wordpress.com/
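To illustrate what an _atomic_ mirror update could look like, here is a minimal sketch. It assumes a hypothetical layout that is not taken from this thread: each sync lands in its own snapshot directory, and clients only ever read through a `current` symlink. The function name and directory names are made up for the example.

```python
import os


def publish_snapshot(mirror_root, snapshot_name, link_name="current"):
    """Atomically repoint <mirror_root>/<link_name> at a finished snapshot.

    Hypothetical layout: every sync is written to its own directory under
    mirror_root, and clients only use the <link_name> symlink.
    """
    snapshot_dir = os.path.join(mirror_root, snapshot_name)
    if not os.path.isdir(snapshot_dir):
        raise ValueError("snapshot does not exist: %s" % snapshot_dir)

    link_path = os.path.join(mirror_root, link_name)
    tmp_link = "%s.tmp.%d" % (link_path, os.getpid())

    # Create the new symlink under a temporary name first ...
    os.symlink(snapshot_name, tmp_link)
    # ... then rename it over the live link. rename() swaps the symlink in
    # a single step on POSIX, so a reader resolves either the old snapshot
    # or the new one, never a half-synced tree.
    os.replace(tmp_link, link_path)
    return link_path
```

A sync job would first copy the upstream repo into a fresh directory and call `publish_snapshot()` only once the copy is complete. Note that this only makes the switch itself atomic; a job that must not see any change mid-run would still have to resolve the symlink once and keep using that snapshot.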

Hello All.
Let's try not to over-complicate this. The error we should care about here is "no more mirrors to try". Just use more than one mirror in the yum configuration in the oVirt system tests. That's what we enabled in standard CI and it works fine there (although we have not been running this new config for long). Yum will just fail over to the next mirror in the list automatically. This is how we can get resiliency. The proxy is there only to avoid overloading the mirrors with extra traffic when downloading RPMs and to speed up the downloads.
Anton.
-- Anton Marchukov Senior Software Engineer - RHEV CI - Red Hat
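The failover behaviour described above can be sketched in a few lines. This is only an illustration of the idea, not yum's actual implementation; the function names and the injectable `fetch` parameter are assumptions made for the example.

```python
import urllib.request


def http_fetch(url, timeout=30):
    """Download a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_failover(mirrors, relative_path, fetch=http_fetch):
    """Try each mirror in order and return (url, body) from the first success.

    mirrors is a list of base URLs; relative_path is the file wanted
    below each of them.
    """
    errors = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + relative_path.lstrip("/")
        try:
            return url, fetch(url)
        except IOError as exc:  # urllib's URLError/HTTPError derive from this
            errors.append((url, exc))
    # Every mirror failed: report all attempts, much like yum's Errno 256.
    raise IOError("No more mirrors to try: %r" % (errors,))
```

Passing `fetch` as a parameter keeps the loop testable without network access; a real client would also want retries and timeouts tuned per mirror.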

On 1 December 2016 at 09:43, Anton Marchukov <amarchuk@redhat.com> wrote:
Hello All.
Let's try not to over-complicate this. The error we should care about here is "no more mirrors to try". Just use more than one mirror in the yum configuration in the oVirt system tests. That's what we enabled in standard CI and it works fine there (although we have not been running this new config for long).
This is IMO misdiagnosing the issue. The problem is not a failed mirror; the problem is a 404 on getting a metadata file from a mirror that was updated, because you have a stale repomd.xml file in your local cache. Another mirror will not help there because it would probably have been updated as well.
You could also solve it by running 'yum clean' all the time, but that would severely slow things down.
The best solution is IMO to have our own "stable" mirror that _never_ changes while jobs are running.
-- Barak Korren bkorren@redhat.com RHCE, RHCi, RHV-DevOps Team https://ifireball.wordpress.com/
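The stale-repomd.xml failure mode can be made concrete with a small sketch. repomd.xml lists the other metadata files by checksum-prefixed name, so a cached copy can point at files that an updated mirror no longer serves. The helper names and the `exists` callback below are assumptions for illustration; only the repomd XML namespace and the `data/location` layout come from the standard repository metadata format.

```python
import xml.etree.ElementTree as ET

# XML namespace used by yum/createrepo repomd.xml files.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def referenced_files(repomd_xml):
    """Return the href of every metadata file listed in a repomd.xml string."""
    root = ET.fromstring(repomd_xml)
    return [
        location.get("href")
        for location in root.findall("repo:data/repo:location", REPO_NS)
    ]


def missing_files(repomd_xml, exists):
    """Return the listed files for which exists(href) is false.

    exists is any callable that reports whether the mirror still serves a
    given path, for example a wrapper around an HTTP HEAD request.
    """
    return [href for href in referenced_files(repomd_xml) if not exists(href)]
```

If `missing_files()` returns anything for a cached repomd.xml, that cache is stale relative to the mirror and would have to be refreshed before the download could succeed.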

On Thu, Dec 1, 2016 at 10:17 AM, Barak Korren <bkorren@redhat.com> wrote:
The best solution is IMO to have our own "stable" mirror that _never_ changes while jobs are running.
I also support this approach. It will probably speed up tests as well, and it's not too much work/overhead to just add a new VM in PHX called mirrors.phx.ovirt.org and mirror there any repo we use.
-- Eyal Edri Associate Manager RHV DevOps EMEA ENG Virtualization R&D Red Hat Israel phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)

This is IMO misdiagnosing the issue. The problem is not a failed mirror; the problem is a 404 on getting a metadata file from a mirror that was updated, because you have a stale repomd.xml file in your local cache. Another mirror will not help there because it would probably have been updated as well.
I do not think there is anything wrong with mirrors occasionally failing. Nobody promised that they will always be working. That is exactly why all repos provide a large network of distributed mirrors, and why yum has this failover functionality, so that it is able to find a working mirror in the list.
If we do have a stale repomd.xml somewhere, then that should not happen and it is a bug in our system. We need to invalidate it. Also, Fedora and most other mirrors use pull (the only exceptions I am aware of are the Debian and Ubuntu security mirrors, which use push for speed reasons, but they are atomic), so they are not all updated at once. If needed, we can open this thread on some relevant mailing list to confirm or refute my assumptions.
You could also solve it by running 'yum clean' all the time but that would severely slow things down.
I am not very good at yum/rpm stuff, but how did it work on my normal Fedora system without me executing any yum cleans?
The best solution is IMO to have our own "stable" mirror that _never_ changes while jobs are running.
Why would our mirror be any more stable? There are already over 100 mirrors on the internet, so why do we think that our mirror will be any more stable than those? We already developed repoproxy, which is that "stable" mirror you are looking for, and as we see, it occasionally fails. And that is normal. Every single point that you use will fail. The only way to provide resiliency is to be able to get rid of a central point.
-- Anton Marchukov Senior Software Engineer - RHEV CI - Red Hat

Also, an interesting discussion to have is the same one we had for maven cache files.
We use reposync in Lago, which is supposed to sync repos locally and can then update the existing cache to match the remote mirror when it is invoked. But as I understand it, the cache gets deleted each time because we use mock.
Now I am not sure: do we really need mock in the oVirt system tests? As I understand it, they use Lago, and Lago runs everything inside VMs, so it is somehow isolated already?
Anton.
-- Anton Marchukov Senior Software Engineer - RHEV CI - Red Hat

On 1 December 2016 at 10:14, Anton Marchukov <amarchuk@redhat.com> wrote:
Also, an interesting discussion to have is the same one we had for maven cache files.
We use reposync in Lago, which is supposed to sync repos locally and can then update the existing cache to match the remote mirror when it is invoked. But as I understand it, the cache gets deleted each time because we use mock.
You understand wrong. The cache dir (/var/lib/lago) is bind-mounted into mock, so it persists across runs.
Now I am not sure: do we really need mock in the oVirt system tests? As I understand it, they use Lago, and Lago runs everything inside VMs, so it is somehow isolated already?
You still need an isolated environment to run Lago itself and its dependencies. Also, the test code itself is not running in a VM.
-- Barak Korren bkorren@redhat.com RHCE, RHCi, RHV-DevOps Team https://ifireball.wordpress.com/

You understand wrong. The cache dir (/var/lib/lago) is bind-mounted into mock, so it persists across runs.
This is interesting, because I see it constantly downloading some RPMs on each run. Maybe the repos are updated that fast, in which case it makes sense. I will take a look during debugging.
Now I am not sure: do we really need mock in the oVirt system tests? As I understand it, they use Lago, and Lago runs everything inside VMs, so it is somehow isolated already?
You still need an isolated environment to run Lago itself and its dependencies. Also, the test code itself is not running in a VM.
OK. That makes sense if the tests run outside Lago. And since we bind-mount the cache, that should be good enough.
-- Anton Marchukov Senior Software Engineer - RHEV CI - Red Hat
participants (4)
- Anton Marchukov
- Barak Korren
- Eyal Edri
- jenkins@jenkins.phx.ovirt.org