[ovirt-infra] jenkins is misbehaving

Nadav Goldin ngoldin at redhat.com
Thu Jun 30 01:11:26 UTC 2016


I was more or less able to reproduce the problem. I ran git clone on
ovirt-engine.git 200 times from one of the VMs in the Jenkins_CentOS
cluster, with a timeout of 90 seconds and 15 seconds between clones.
13 of the 200 clones failed, i.e. 6.5%. This explains why we don't see
it often. It might be more severe in practice, as this test was done
during the night, when Jenkins/Gerrit aren't busy. During that time
there were a few, but not 13, exceptions in Gerrit's error_log:
[2016-06-29 19:18:26,933] [NioProcessor-1] WARN com.google.gerrit.sshd.GerritServerSession : Exception caught
org.apache.sshd.common.SshException: Received 97 on unknown channel 0
        at org.apache.sshd.common.session.AbstractConnectionService.getChannel(AbstractConnectionService.java:301)
.....
Since the log doesn't include the client IP, it's hard to tell whether
the exceptions are correlated with the failed clones; even if they
are, not all failed attempts show up in the error log. So there is a
problem, independent of Jenkins itself. I will need to dig deeper to
find out what is causing it.
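
For anyone who wants to repeat the measurement, the loop described above
could be sketched roughly as follows. This is not the script that was
actually used, only a minimal sketch under assumptions: the git:// URL is
taken from the quoted messages below, and the names clone_once and
measure_failures are made up for illustration.

```python
import subprocess
import tempfile
import time

# Values from the test described above. The URL is an assumption,
# copied from the quoted messages in this thread.
REPO_URL = "git://gerrit.ovirt.org/ovirt-engine.git"
ATTEMPTS = 200
TIMEOUT_SECONDS = 90
PAUSE_SECONDS = 15


def clone_once(url, timeout, run=subprocess.run):
    """Return True if one `git clone` succeeds within `timeout` seconds."""
    # Clone into a throwaway directory that is removed afterwards.
    with tempfile.TemporaryDirectory() as workdir:
        try:
            result = run(
                ["git", "clone", "--quiet", url, workdir],
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # The clone did not finish in time; count it as a failure.
            return False
        return result.returncode == 0


def measure_failures(url, attempts, timeout, pause,
                     clone=clone_once, sleep=time.sleep):
    """Clone `attempts` times, pausing between clones; return failure count."""
    failures = 0
    for attempt in range(attempts):
        if not clone(url, timeout):
            failures += 1
        # No need to wait after the last attempt.
        if attempt < attempts - 1:
            sleep(pause)
    return failures
```

Calling measure_failures(REPO_URL, ATTEMPTS, TIMEOUT_SECONDS,
PAUSE_SECONDS) and dividing the result by ATTEMPTS gives the failure
rate. A timed-out clone and a clone that exits with a non-zero status are
both counted as failures here.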

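Without client IPs, one rough way to check for a correlation would be to
compare timestamps: take the times of the failed clones and see whether a
WARN entry in error_log falls close to each of them. The sketch below
assumes the timestamp format shown in the excerpt above, assumes both
clocks are in the same timezone, and uses a hypothetical 90-second window;
the function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix of error_log entries, e.g.
# "[2016-06-29 19:18:26,933] [NioProcessor-1] WARN ..."
TIMESTAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def warning_times(log_lines):
    """Return the timestamps of all WARN entries in the given log lines."""
    times = []
    for line in log_lines:
        match = TIMESTAMP_RE.match(line)
        # Stack-trace lines have no timestamp prefix and are skipped.
        if match and " WARN" in line:
            times.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            )
    return times


def failures_near_warnings(failure_times, warnings,
                           window=timedelta(seconds=90)):
    """Return the failures that have at least one warning within `window`."""
    return [
        failed_at
        for failed_at in failure_times
        if any(abs(failed_at - warned_at) <= window for warned_at in warnings)
    ]
```

A match only means the two events were close in time. It would not prove
that the exception came from the failed clone.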



On Wed, Jun 29, 2016 at 8:22 PM, Nadav Goldin <ngoldin at redhat.com> wrote:
> 1. It's the second time it happens this week[1]
> 2. Around a month ago, I did a log analysis of how often this happens,
> and it was more than 10 times a week.
> 3. After Shlomi resolved a few issues on Gerrit, it seems to have gone away.
> My guess is that this is network related or overload on Gerrit - it
> either fails when trying to connect to Gerrit, or while cloning (like
> in [1]). I didn't find any consistency in the error, which makes it
> hard to reproduce. The current re-trigger Anton did was on a bare metal
> slave, so I doubt it's related to overload on the Jenkins slave itself.
>
> [1] http://jenkins.ovirt.org/job/ovirt-engine_master_check-patch-fc23-x86_64/3281/console
>
> On Wed, Jun 29, 2016 at 7:47 PM, Eyal Edri <eedri at redhat.com> wrote:
>> Shlomi, are you running anything on Gerrit now?
>> If you're copying the content please stop as it might affect gerrit
>> performance.
>>
>> On Jun 29, 2016 7:37 PM, "Anton Marchukov" <amarchuk at redhat.com> wrote:
>>>
>>> Hello All.
>>>
>>> I tried to clone manually and this works:
>>>
>>> [amarchuk@ovirt-srv22 ~]$ git clone git://gerrit.ovirt.org/ovirt-engine.git
>>> Cloning into 'ovirt-engine'...
>>> remote: Counting objects: 784726, done.
>>> remote: Compressing objects: 100% (204209/204209), done.
>>> remote: Total 784726 (delta 360293), reused 777805 (delta 358840)
>>> Receiving objects: 100% (784726/784726), 136.26 MiB | 28.66 MiB/s, done.
>>> Resolving deltas: 100% (360293/360293), done.
>>>
>>>
>>> But it failed in the job
>>> http://jenkins.ovirt.org/job/ovirt-engine_master_check-patch-fc23-x86_64/3379/console
>>>
>>> So it is either not 100% reproducible or some Jenkins issue.
>>>
>>> Did anybody do anything on Jenkins recently that can be correlated with this?
>>>
>>> Also, the Gerrit plugin started to lose events...
>>>
>>> Anton.
>>>
>>> On Wed, Jun 29, 2016 at 6:14 PM, Piotr Kliczewski
>>> <piotr.kliczewski at gmail.com> wrote:
>>>>
>>>> Some time ago jenkins did not update the patches with the score. Now I
>>>> see that builds are not triggered. One of the builds that I triggered
>>>> manually [1] failed with:
>>>>
>>>> 16:04:47 ERROR: Timeout after 10 minutes
>>>> 16:04:47 ERROR: Error cloning remote repo 'origin'
>>>> 16:04:47 hudson.plugins.git.GitException: Command "git fetch --tags
>>>> --progress git://gerrit.ovirt.org/ovirt-engine.git
>>>> +refs/heads/*:refs/remotes/origin/*" returned status code 143:
>>>> 16:04:47 stdout:
>>>> 16:04:47 stderr:
>>>> 16:04:47 at
>>>> org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1640)
>>>>
>>>> Thanks,
>>>> Piotr
>>>>
>>>> [1]
>>>> http://jenkins.ovirt.org/job/ovirt-engine_master_check-patch-el7-x86_64/3381/console
>>>> _______________________________________________
>>>> Infra mailing list
>>>> Infra at ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/infra
>>>
>>>
>>>
>>>
>>> --
>>> Anton Marchukov
>>> Senior Software Engineer - RHEV CI - Red Hat