On Sun, May 04, 2014 at 09:40:23AM +0300, ybronhei wrote:
> On 05/02/2014 11:39 AM, Francesco Romani wrote:
>> ----- Original Message -----
>>> From: "David Caro" <dcaroest(a)redhat.com>
>>> To: "Francesco Romani" <fromani(a)redhat.com>
>>> Cc: infra(a)ovirt.org
>>> Sent: Wednesday, April 23, 2014 7:50:58 PM
>>> Subject: Re: Intermittent Jenkins crashes
>>
>>> Let's try to see if it's a problem that only affects one slave, one
>>> python version, one distribution or fails anywhere. If it only affects
>>> one slave, we might just reprovision it (the one you pointed out is a
>>> vm). If it's related to a package version we can try to upgrade it, or
>>> downgrade it, or fix it (in the best case).
>>>
>>> If it's anything else, it will be more complicated to fix, and we will
>>> have to look deeper (try to reproduce manually, add traces; maybe, as
>>> you say, it's an issue with the RAM, but since it's a VM, we would expect
>>> it to fail on the host as well).
>>>
>>>
>>> I started running it only on f19 slaves to see if it happens; I'll
>>> check f20 slaves afterwards.
>>
>> Looks like I was wrong: it happens on other VMs as well, as
>> http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit/8609/console
>> shows.
>>
>> So, I think we need to get at least one of those coredumps.
>> I began enabling them temporarily, hoping to catch one, but it
>> looks like we need to cast a wider net (everything is reverted, as I wrote).
>>
>> Let's talk about this again next week (starting 2014/02/05); I'm
>> available basically anytime (UTC+1).
>>
>> Bests and thanks,
>>
> We must have a coredump here. It was introduced in patch
> http://gerrit.ovirt.org/25263, which might load the libvirt lib somehow.
> I couldn't put my finger on the exact reason. One coredump of this
> crash can give us the reason for it.
>
> IMO it isn't worth too much investigation, due to our refactoring
> in that area, which will hopefully be merged soon and replace this
> code; but if it's not hard to produce this coredump, I'll be glad to
> fix it.
A reproducible segfault in Python is worth a lot of investigation. We do
not know whether it is limited to Jenkins slaves; it may indicate a serious
bug that could bite us badly in the future.
Does Mooli's refactoring make the segfault go away? (I did not check the
tests myself.)

It's still in progress, and more flows need to be checked with the new
implementation (reviews would be much appreciated if you can -
)
I agree that investigating this segfault can be fruitful. A coredump is the
only thing that might help us learn more here.
Francesco, any news on getting it on the Jenkins VM?
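To cast the wider net without touching each slave by hand, the test run itself could prepare for the crash. What follows is a minimal sketch only, assuming Linux slaves and Python 3.3+ (on Python 2.7, faulthandler is a separate PyPI package of the same name). `enable_crash_diagnostics` is a hypothetical helper, not part of vdsm. It raises the core-file soft limit as far as the hard limit allows and installs faulthandler, so a SIGSEGV at least prints the Python stack of every thread even if no core file survives. Where a core actually lands depends on each slave's kernel.core_pattern; on Fedora a crash handler such as abrt may intercept it.

```python
import faulthandler
import resource
import sys


def enable_crash_diagnostics(stream=None):
    """Hypothetical helper: prepare the current process for a segfault.

    Raises the RLIMIT_CORE soft limit up to the hard limit (allowed for
    unprivileged processes) and enables faulthandler so that a fatal
    signal dumps the Python traceback of all threads to `stream`.
    Returns the resulting (soft, hard) core-size limits.
    """
    if stream is None:
        stream = sys.stderr  # must be a real file with a fileno()
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Setting soft == hard never needs extra privileges.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    faulthandler.enable(file=stream, all_threads=True)
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling it once, early in the test runner's entry point, covers every test in that process; if the slave's hard limit is 0, no core is written, but the faulthandler traceback still points at the crashing Python frame.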
--
Yaniv Bronhaim.