Intermittent Jenkins crashes

Dan Kenigsberg danken at redhat.com
Tue May 6 09:48:37 UTC 2014


On Sun, May 04, 2014 at 09:40:23AM +0300, ybronhei wrote:
> On 05/02/2014 11:39 AM, Francesco Romani wrote:
> >----- Original Message -----
> >>From: "David Caro" <dcaroest at redhat.com>
> >>To: "Francesco Romani" <fromani at redhat.com>
> >>Cc: infra at ovirt.org
> >>Sent: Wednesday, April 23, 2014 7:50:58 PM
> >>Subject: Re: Intermittent Jenkins crashes
> >
> >>Let's try to see if it's a problem that only affects one slave, one
> >>python version, one distribution or fails anywhere. If it only affects
> >>one slave, we might just reprovision it (the one you pointed out is a
> >>vm). If it's related to a package version we can try to upgrade it, or
> >>downgrade it, or fix it (in the best case).
> >>
> >>If it's anything else it will be more complicated to fix, and we will
> >>have to look deeper (try to reproduce manually, add traces, maybe as
> >>you say it's an issue on the RAM, but being a vm, we might expect it
> >>failing also on the host).
> >>
> >>
> >>I started running it only on f19 slaves to see if it happens there;
> >>I'll check f20 slaves afterwards.
> >
> >Looks like I was wrong, it happens on other VMs as well, as
> >http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit/8609/console
> >shows.
> >
> >So, I think we need to get at least one of those coredumps.
> >I began enabling them temporarily, hoping to catch one, but it
> >looks like we need to cast a wider net (everything reverted, as I wrote).
> >
> >Please let's talk about this again next week (starting 2014/02/05); I'm
> >available basically anytime (UTC+1).
> >
> >Bests and thanks,
> >
> We must have a coredump here. It was introduced in patch
> http://gerrit.ovirt.org/25263, which might load the libvirt lib somehow.
> I couldn't put my finger on the exact reason. One coredump of this
> crash can give us the reason for it.
> 
> IMO it isn't worth too much investigation, given our refactoring
> in that area, which will hopefully be merged soon and replace this
> code, but if it's not hard to produce this coredump I'll be glad to
> fix it.

A reproducible segfault in Python is worth a lot of investigation. We do
not know if it is limited to Jenkins slaves; it may indicate a serious
bug that may bite us badly in the future.

Does Mooli's refactoring make the segfault go away? (I did not check the
tests myself)
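
For catching one of these crashes, a minimal sketch of what a test
runner could do at startup is shown here. It assumes a Linux slave and an
interpreter that ships the faulthandler module (standard library since
Python 3.3; on Python 2 it was only available as a separate backport
package). The function name enable_crash_diagnostics is hypothetical and
is not part of vdsm or of the Jenkins job.

```python
import faulthandler
import resource
import sys


def enable_crash_diagnostics(stream=sys.stderr):
    """Allow core dumps for this process and print tracebacks on fatal signals.

    Hypothetical helper for illustration only. Returns the (soft, hard)
    core-size limit in effect after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit only up to the
    # hard limit, so the hard limit is the best we can ask for here.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    # On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python
    # traceback of every thread to `stream` before the process dies.
    # `stream` must be a real file with a file descriptor and must stay open.
    faulthandler.enable(file=stream, all_threads=True)
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling enable_crash_diagnostics() once, before the tests start, would
be enough for this sketch. Whether a core file is actually written still
depends on the slave's configuration (a hard limit of 0, or a
kernel.core_pattern that pipes dumps to a collector), so this only covers
the per-process side.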


