[ovirt-infra] jenkins is misbehaving

Nadav Goldin ngoldin at redhat.com
Tue Jul 5 08:46:22 UTC 2016


Fixing the wrong ticket link:
[1] https://ovirt-jira.atlassian.net/browse/OVIRT-619


On Tue, Jul 5, 2016 at 11:45 AM, Nadav Goldin <ngoldin at redhat.com> wrote:
> I opened a ticket for tracking [1], but haven't had a chance to run
> more tests yet. Also, this never happens in VDSM, probably because
> ovirt-engine.git is 244MB while vdsm.git is only 52MB.
>
> Another failure:
> http://jenkins.ovirt.org/job/ovirt-engine_3.6_upgrade-db-from-3.6_el6_created/44/
>
>
>
>
>
> [1] https://ovirt-jira.atlassian.net/browse/OVIRT-403
>
> On Thu, Jun 30, 2016 at 4:11 AM, Nadav Goldin <ngoldin at redhat.com> wrote:
>> I was more or less able to reproduce the problem. I ran git clone on
>> ovirt-engine.git from one of the VMs in the Jenkins_CentOS cluster
>> 200 times, with a timeout of 90 seconds and a 15-second pause between
>> clones. 13 of the 200 attempts failed, which is 6.5%. This explains
>> why we don't see it often. It might be more severe in practice, as
>> this test was run during the night, when Jenkins/Gerrit aren't busy.
>> During that time there were a few exceptions (but not 13) in gerrit's
>> error_log:
>> [2016-06-29 19:18:26,933] [NioProcessor-1] WARN
>> com.google.gerrit.sshd.GerritServerSession : Exception caught
>> org.apache.sshd.common.SshException: Received 97 on unknown channel 0
>>         at org.apache.sshd.common.session.AbstractConnectionService.getChannel(AbstractConnectionService.java:301)
>> .....
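
One way to check entries like these against a test run is to count the
SshException entries logged within a given time window. A minimal
sketch, assuming every entry starts with a bracketed timestamp as in the
excerpt above and that the exception class name begins its own line. The
function name is made up for illustration, and the window has to be
given in the same time zone the log uses.

```python
import re
from datetime import datetime

# Matches the leading "[2016-06-29 19:18:26,933]" timestamp of a log entry.
TIMESTAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]")

# Assumption: the exception is reported on its own line, as in the excerpt.
EXCEPTION_PREFIX = "org.apache.sshd.common.SshException"


def count_ssh_exceptions(lines, start, end):
    """Count SshException entries whose timestamp lies within [start, end]."""
    count = 0
    current = None  # timestamp of the entry currently being read
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        # Stack-trace lines ("at ...") do not start with the class name,
        # so each exception is counted once.
        if line.lstrip().startswith(EXCEPTION_PREFIX):
            if current is not None and start <= current <= end:
                count += 1
    return count
```

Comparing that count with the number of failed clones in the same window
only shows whether the two are roughly in step. It cannot tie a specific
exception to a specific client.
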
>> Since the log doesn't record the client IP, it's hard to tell whether
>> the exceptions are correlated with the failed clones. Even if they
>> are, not every failed attempt shows up in the exception log. So there
>> is a problem, independent of Jenkins itself. I will need to dig deeper
>> to find out what is causing it.
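
For reference, the test described above could be scripted roughly as
follows. This is only a sketch of that loop, not the script that was
actually used: the function names and the runner parameter are made up
for illustration, and the 200/90/15 defaults simply mirror the numbers
quoted above.

```python
import subprocess
import tempfile
import time


def run_clone_trials(repo_url, attempts=200, timeout_s=90, pause_s=15,
                     runner=subprocess.run):
    """Clone repo_url `attempts` times and return the number of failures.

    An attempt counts as failed when git exits non-zero or does not
    finish within timeout_s seconds.
    """
    failures = 0
    for attempt in range(attempts):
        # Use a fresh scratch directory so attempts do not affect each other.
        with tempfile.TemporaryDirectory() as workdir:
            try:
                result = runner(
                    ["git", "clone", "--quiet", repo_url, workdir],
                    timeout=timeout_s,
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                )
                ok = result.returncode == 0
            except subprocess.TimeoutExpired:
                ok = False
        if not ok:
            failures += 1
        # Pause between clones, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(pause_s)
    return failures


def failure_rate_percent(failures, attempts):
    """Return the failure rate as a percentage."""
    return 100.0 * failures / attempts
```

Calling run_clone_trials("git://gerrit.ovirt.org/ovirt-engine.git")
would repeat the run against the URL from Anton's manual clone (an
assumption, since the URL used in the original test is not stated), and
failure_rate_percent(13, 200) gives the 6.5 quoted above.
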
>>
>>
>>
>>
>> On Wed, Jun 29, 2016 at 8:22 PM, Nadav Goldin <ngoldin at redhat.com> wrote:
>>> 1. It's the second time this has happened this week [1].
>>> 2. Around a month ago, I analysed the logs to see how often this
>>> happens, and it was more than 10 times a week.
>>> 3. After Shlomi resolved a few issues on Gerrit, it seemed to go away.
>>> My guess is that this is network related or caused by overload on
>>> Gerrit: it either fails when trying to connect to Gerrit, or while
>>> cloning (like in [1]). I didn't find any consistency in the errors,
>>> which makes it hard to reproduce. The current re-trigger Anton did was
>>> on a bare-metal slave, so I doubt it's related to overload on the
>>> Jenkins slave itself.
>>>
>>> [1] http://jenkins.ovirt.org/job/ovirt-engine_master_check-patch-fc23-x86_64/3281/console
>>>
>>> On Wed, Jun 29, 2016 at 7:47 PM, Eyal Edri <eedri at redhat.com> wrote:
>>>> Shlomi, are you running anything on gerrit now?
>>>> If you're copying the content please stop as it might affect gerrit
>>>> performance.
>>>>
>>>> On Jun 29, 2016 7:37 PM, "Anton Marchukov" <amarchuk at redhat.com> wrote:
>>>>>
>>>>> Hello All.
>>>>>
>>>>> I tried to clone manually and this works:
>>>>>
>>>>> [amarchuk at ovirt-srv22 ~]$ git clone
>>>>> git://gerrit.ovirt.org/ovirt-engine.git
>>>>> Cloning into 'ovirt-engine'...
>>>>> remote: Counting objects: 784726, done.
>>>>> remote: Compressing objects: 100% (204209/204209), done.
>>>>> remote: Total 784726 (delta 360293), reused 777805 (delta 358840)
>>>>> Receiving objects: 100% (784726/784726), 136.26 MiB | 28.66 MiB/s, done.
>>>>> Resolving deltas: 100% (360293/360293), done.
>>>>>
>>>>>
>>>>> But it failed in the job
>>>>> http://jenkins.ovirt.org/job/ovirt-engine_master_check-patch-fc23-x86_64/3379/console
>>>>>
>>>>> So it is either not 100% reproducible or some Jenkins issue.
>>>>>
>>>>> Did anybody do anything on Jenkins recently that could be
>>>>> correlated with this?
>>>>>
>>>>> Also, the gerrit plugin started to lose events...
>>>>>
>>>>> Anton.
>>>>>
>>>>> On Wed, Jun 29, 2016 at 6:14 PM, Piotr Kliczewski
>>>>> <piotr.kliczewski at gmail.com> wrote:
>>>>>>
>>>>>> Some time ago jenkins did not update the patches with the score. Now I
>>>>>> see that builds are not triggered. One of the builds that I triggered
>>>>>> manually [1] failed with:
>>>>>>
>>>>>> 16:04:47 ERROR: Timeout after 10 minutes
>>>>>> 16:04:47 ERROR: Error cloning remote repo 'origin'
>>>>>> 16:04:47 hudson.plugins.git.GitException: Command "git fetch --tags
>>>>>> --progress git://gerrit.ovirt.org/ovirt-engine.git
>>>>>> +refs/heads/*:refs/remotes/origin/*" returned status code 143:
>>>>>> 16:04:47 stdout:
>>>>>> 16:04:47 stderr:
>>>>>> 16:04:47 at
>>>>>> org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1640)
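
A side note on the status code: by shell convention an exit status of
128 + N means the process was terminated by signal N, so 143 corresponds
to signal 15 (SIGTERM). That is consistent with the git process being
killed once the 10 minute timeout expired, rather than git failing on
its own, although the log alone does not prove it. A small sketch of the
decoding (the helper name is made up):

```python
import signal


def signal_from_exit_status(status):
    """Return the signal name implied by a shell-style exit status, or None.

    By shell convention, a status of 128 + N reports that the process
    was terminated by signal number N.
    """
    if status <= 128:
        return None  # ordinary exit code, no signal involved
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None  # no signal with that number on this platform
```

Here signal_from_exit_status(143) returns "SIGTERM".
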
>>>>>>
>>>>>> Thanks,
>>>>>> Piotr
>>>>>>
>>>>>> [1]
>>>>>> http://jenkins.ovirt.org/job/ovirt-engine_master_check-patch-el7-x86_64/3381/console
>>>>>> _______________________________________________
>>>>>> Infra mailing list
>>>>>> Infra at ovirt.org
>>>>>> http://lists.ovirt.org/mailman/listinfo/infra
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Anton Marchukov
>>>>> Senior Software Engineer - RHEV CI - Red Hat
>>>>>
>>>>


