[ovirt-infra] jenkins is misbehaving

Nadav Goldin ngoldin at redhat.com
Tue Jul 5 08:45:36 UTC 2016


Opened a ticket for tracking [1]; I still haven't had a chance to run more tests.
Also, this never happens in VDSM, probably because ovirt-engine.git is
244MB while vdsm.git is only 52MB.

Another failure:
http://jenkins.ovirt.org/job/ovirt-engine_3.6_upgrade-db-from-3.6_el6_created/44/





[1] https://ovirt-jira.atlassian.net/browse/OVIRT-403

On Thu, Jun 30, 2016 at 4:11 AM, Nadav Goldin <ngoldin at redhat.com> wrote:
> I was more or less able to reproduce the problem. I ran git clone on
> ovirt-engine.git 200 times from one of the VMs in the Jenkins_CentOS
> cluster, with the timeout set to 90 seconds and 15 seconds between
> clones. 13 of the 200 clones failed, i.e. 6.5%. This explains why we
> don't see it often, though it might be more severe in practice, since
> this test ran at night when Jenkins and Gerrit aren't busy.
> During that time there were a few exceptions in Gerrit's error_log,
> but fewer than 13:
> [2016-06-29 19:18:26,933] [NioProcessor-1] WARN
> com.google.gerrit.sshd.GerritServerSession : Exception caught
> org.apache.sshd.common.SshException: Received 97 on unknown channel 0
>         at org.apache.sshd.common.session.AbstractConnectionService.getChannel(AbstractConnectionService.java:301)
> .....
> Since the log doesn't include the client IP, it's hard to tell whether
> these exceptions are correlated with the failed clones; even if they
> are, not every failed attempt left an entry in the exception log. So
> there is a problem independent of Jenkins itself. I will need to dig
> deeper to find out what is causing it.
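A minimal Python sketch of the reproduction loop described above, assuming `git` is on PATH. The URL, attempt count, 90-second timeout and 15-second pause are taken from the message; the helper names `clone_once` and `run_trials` are hypothetical, and this is not necessarily how the original test was run.

```python
import os
import subprocess
import tempfile
import time
from typing import Callable

# Parameters taken from the experiment described above.
REPO_URL = "git://gerrit.ovirt.org/ovirt-engine.git"
ATTEMPTS = 200
TIMEOUT_S = 90
PAUSE_S = 15


def clone_once(url: str = REPO_URL, timeout: float = TIMEOUT_S) -> bool:
    """Hypothetical helper: clone into a throwaway directory.

    Returns True on success, False if git fails or the timeout expires.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            result = subprocess.run(
                ["git", "clone", "--quiet", url, os.path.join(workdir, "repo")],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the git process before re-raising.
            return False
        return result.returncode == 0


def run_trials(attempt: Callable[[], bool], n: int, pause: float) -> int:
    """Hypothetical helper: call `attempt` n times, sleeping `pause`
    seconds between calls, and return how many attempts failed."""
    failures = 0
    for i in range(n):
        if not attempt():
            failures += 1
        if i < n - 1:
            time.sleep(pause)
    return failures
```

Running `run_trials(clone_once, ATTEMPTS, PAUSE_S)` spends roughly 50 minutes in pauses alone (199 gaps of 15 seconds), plus the clone time. Passing the attempt function in as a parameter keeps the counting logic testable without network access.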
>
> On Wed, Jun 29, 2016 at 8:22 PM, Nadav Goldin <ngoldin at redhat.com> wrote:
>> 1. It's the second time this has happened this week [1].
>> 2. About a month ago, I did a log analysis of how often this happens,
>> and it was more than 10 times a week.
>> 3. After Shlomi resolved a few issues on Gerrit, it seemed to have gone
>> away.
>> My guess is that this is network-related or caused by overload on
>> Gerrit: it fails either when trying to connect to Gerrit or while
>> cloning (as in [1]). I didn't find any consistent pattern in the
>> errors, which makes it hard to reproduce. Anton's latest re-trigger
>> ran on a bare-metal slave, so I doubt it's related to overload on the
>> Jenkins slave itself.
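The log analysis mentioned in point 2 isn't shown in the thread. A rough sketch of one way to do it, assuming build dates and console logs have already been downloaded from Jenkins (fetching them is out of scope here) and that a clone failure is identified by the `ERROR: Error cloning remote repo` line seen in the console output quoted further down. The function name `failures_per_week` is hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

# Assumed failure marker, copied from the Jenkins console output in this thread.
FAILURE_MARKER = "ERROR: Error cloning remote repo"


def failures_per_week(builds: Iterable[Tuple[date, str]]) -> Counter:
    """Hypothetical helper: count builds whose console log contains the
    clone-failure marker, keyed by (ISO year, ISO week number)."""
    counts: Counter = Counter()
    for build_date, console_text in builds:
        if FAILURE_MARKER in console_text:
            iso = build_date.isocalendar()  # (ISO year, ISO week, weekday)
            counts[(iso[0], iso[1])] += 1
    return counts
```

Keying on the ISO year as well as the week avoids merging week 1 of one year with week 1 of the next.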
>>
>> [1] http://jenkins.ovirt.org/job/ovirt-engine_master_check-patch-fc23-x86_64/3281/console
>>
>> On Wed, Jun 29, 2016 at 7:47 PM, Eyal Edri <eedri at redhat.com> wrote:
>>> Shlomi, are you running anything on Gerrit now?
>>> If you're copying the content, please stop, as it might affect Gerrit
>>> performance.
>>>
>>> On Jun 29, 2016 7:37 PM, "Anton Marchukov" <amarchuk at redhat.com> wrote:
>>>>
>>>> Hello All.
>>>>
>>>> I tried to clone manually, and it works:
>>>>
>>>> [amarchuk@ovirt-srv22 ~]$ git clone git://gerrit.ovirt.org/ovirt-engine.git
>>>> Cloning into 'ovirt-engine'...
>>>> remote: Counting objects: 784726, done.
>>>> remote: Compressing objects: 100% (204209/204209), done.
>>>> remote: Total 784726 (delta 360293), reused 777805 (delta 358840)
>>>> Receiving objects: 100% (784726/784726), 136.26 MiB | 28.66 MiB/s, done.
>>>> Resolving deltas: 100% (360293/360293), done.
>>>>
>>>>
>>>> But it failed in this job:
>>>> http://jenkins.ovirt.org/job/ovirt-engine_master_check-patch-fc23-x86_64/3379/console
>>>>
>>>> So it is either intermittent or a Jenkins issue.
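A single successful manual clone does not rule out an intermittent fault. Assuming clone attempts fail independently at the 6.5% rate (13/200) measured later in this thread, which is an assumption since that test ran at night, one manual clone succeeds about 93.5% of the time, and even ten clones in a row all succeed roughly half the time. A tiny sketch of that arithmetic (`prob_all_succeed` is a hypothetical name):

```python
# Failure rate from the 200-clone experiment later in this thread.
P_FAIL = 13 / 200


def prob_all_succeed(p_fail: float, k: int) -> float:
    """Probability that k independent clone attempts all succeed,
    assuming each fails independently with probability p_fail."""
    return (1 - p_fail) ** k
```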
>>>>
>>>> Did anybody change anything on Jenkins recently that could be correlated with this?
>>>>
>>>> Also, the Gerrit plugin has started to lose events...
>>>>
>>>> Anton.
>>>>
>>>> On Wed, Jun 29, 2016 at 6:14 PM, Piotr Kliczewski
>>>> <piotr.kliczewski at gmail.com> wrote:
>>>>>
>>>>> Some time ago, Jenkins was not updating the patches with the score.
>>>>> Now I see that builds are not being triggered. One of the builds that
>>>>> I triggered manually [1] failed with:
>>>>>
>>>>> 16:04:47 ERROR: Timeout after 10 minutes
>>>>> 16:04:47 ERROR: Error cloning remote repo 'origin'
>>>>> 16:04:47 hudson.plugins.git.GitException: Command "git fetch --tags --progress git://gerrit.ovirt.org/ovirt-engine.git +refs/heads/*:refs/remotes/origin/*" returned status code 143:
>>>>> 16:04:47 stdout:
>>>>> 16:04:47 stderr:
>>>>> 16:04:47 at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1640)
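A note on `status code 143`: by shell convention, a process killed by a signal reports an exit status of 128 plus the signal number, and SIGTERM is signal 15, so 143 is consistent with `git fetch` being terminated (presumably by Jenkins after the 10-minute timeout in the log) rather than failing with a git error of its own. A small sketch decoding such a status; `describe_exit_status` is a hypothetical helper:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status: values above 128
    conventionally mean 'killed by signal (status - 128)'."""
    if status > 128:
        try:
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exited with code {status}"
```

Note that Python's own `subprocess` reports signal deaths as a negative `returncode` instead; the 128+n form is what shells (and wrappers that mimic them) report.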
>>>>>
>>>>> Thanks,
>>>>> Piotr
>>>>>
>>>>> [1] http://jenkins.ovirt.org/job/ovirt-engine_master_check-patch-el7-x86_64/3381/console
>>>>> _______________________________________________
>>>>> Infra mailing list
>>>>> Infra at ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/infra
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Anton Marchukov
>>>> Senior Software Engineer - RHEV CI - Red Hat
>>>>
>>>


