[ovirt-devel] local vdsm build fails

Francesco Romani fromani at redhat.com
Fri Jun 6 08:03:04 UTC 2014


----- Original Message -----
> From: "David Caro" <dcaroest at redhat.com>
> To: "Michal Skrivanek" <michal.skrivanek at redhat.com>
> Cc: devel at ovirt.org
> Sent: Friday, June 6, 2014 9:53:23 AM
> Subject: Re: [ovirt-devel] local vdsm build fails
> 
> On Fri 06 Jun 2014 09:23:33 AM CEST, Michal Skrivanek wrote:
> >
> > On Jun 6, 2014, at 09:19 , Piotr Kliczewski <piotr.kliczewski at gmail.com>
> > wrote:
> >
> >> All,
> >>
> >> I pulled the latest vdsm from master and noticed that build is failing.
> >>
> >> Here is the patch that causes the failure:
> >>
> >> http://gerrit.ovirt.org/#/c/28226
> >>
> >> and looking at jenkins comments I can see that jenkins was failing
> >> with the same reason:
> >>
> >> http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_gerrit/1064/console
> >
> > btw at least yesterday again there were so many false errors with jenkins
> > not being able to run the tests properly that it's unusable….
> > was that the reason the result was ignored? (though the err is clear about
> > relevance to that patch)
> 
> Can you point out which jobs were false positives? Also, can you
> specify for each one how to determine if it's a test failure from the
> logs? As specific as possible? We can filter the logs for those
> failures and set a different message so you'll know from the gerrit
> comments if it was a real issue or infra failure.

A good deal of the noise comes from Python segfaulting. I can reproduce the segfault locally,
but I'm having a hard time pinpointing the issue. I reported the bug upstream:

https://github.com/nose-devs/nose/issues/817
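Since the crash is hard to pinpoint from the C stack alone, one option is to have Python print its own traceback for every thread when the fatal signal arrives. That output shows the Python file, line and function that were executing. Below is a minimal sketch using the standard-library `faulthandler` module, written for Python 3 (under Python 2 a separately installed backport would be needed, an assumption to check against the vdsm environment). `crash_report` is a hypothetical helper, and the child sending itself SIGSEGV is only a stand-in for the real crash:

```python
import subprocess
import sys

# Stand-in for the real crash: the child sends itself SIGSEGV.
CRASHER = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def crash_report(code):
    """Run code in a child with faulthandler on; return (status, stderr)."""
    proc = subprocess.run(
        # -X faulthandler enables the handler at startup; on a fatal
        # signal it dumps the Python traceback of all threads to stderr,
        # then lets the signal kill the process as usual.
        [sys.executable, "-X", "faulthandler", "-c", code],
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr
```

For the real suite, the same `-X faulthandler` switch (or `faulthandler.enable()` early in the test runner) would be applied to the interpreter that runs nose.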

Let me summarize what I (we) know about Python segfaulting:

* the segfault should be reproducible on any box running nose >= 1.3.0, simply by running:
$ cd vdsm
$ ./configure && make
$ NOSE_WITH_XUNIT=1 make check
or at least, I can reproduce the issue on every box I tried locally (vanilla F20 and F19)

* if we run each test module separately, we do NOT observe the failure.
This triggers the segfault:
$ cd tests
$ ./run_tests_local.sh ./*.py

This does not:
$ cd tests
$ for TEST in ./*.py; do ./run_tests_local.sh "$TEST"; done

* the stack traces I observed are huge, more than 750 levels deep.
This suggests the stack was exhausted, which in turn was probably triggered by some kind of
runaway recursion. Note that the offending stack trace is on just one thread; all the others are quiet.

* I tried to reproduce the issue with a simpler use case with no luck so far.
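The batch-versus-per-module comparison in the second point above can be scripted so that a crash is told apart from an ordinary test failure by the exit status. This is a minimal Python 3 sketch, assuming a POSIX system where `subprocess` reports a child killed by signal N as return code -N. The helpers `classify` and `run_each` are hypothetical and not part of vdsm:

```python
import signal
import subprocess


def classify(returncode):
    """Describe a child's exit status, assuming POSIX semantics."""
    if returncode == 0:
        return "passed"
    if returncode < 0:
        # subprocess reports "terminated by signal N" as -N on POSIX.
        return "killed by " + signal.Signals(-returncode).name
    return "failed (exit %d)" % returncode


def run_each(runner, modules):
    """Run every test module in its own process; map module -> outcome."""
    results = {}
    for module in modules:
        proc = subprocess.run([runner, module])
        results[module] = classify(proc.returncode)
    return results


# Hypothetical usage from vdsm/tests, mirroring the per-module shell loop:
#   import glob
#   run_each("./run_tests_local.sh", sorted(glob.glob("./*.py")))
```

A module reported as `killed by SIGSEGV` points at the interpreter crash rather than at a failing assertion, which is the distinction the thread is after.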

At the moment I don't have a better suggestion than to bite the bullet
and dig into the huge stack trace, looking for repetitive patterns or some other hint.
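One way to look for repetitive patterns is to reduce the backtrace to function names and search for the shortest run of frames that repeats back to back. The first few innermost frames (for example an allocator) may be skipped. This is a minimal sketch, assuming gdb-style `bt` lines such as `#3  0x00007f... in PyEval_EvalFrameEx (...) at ...`. The helpers `frame_names` and `find_cycle` and the `min_repeats`/`max_offset` thresholds are hypothetical:

```python
import re

# gdb prints "#N  0xADDR in func (...)" or "#N  func (...)"; assumed format.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def frame_names(backtrace):
    """Extract function names, innermost frame first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def find_cycle(names, min_repeats=3, max_offset=20):
    """Find the shortest run of frames repeated at least min_repeats times
    back to back, after skipping up to max_offset innermost frames.
    Returns (offset, cycle, repeats) or None."""
    for offset in range(min(max_offset, len(names)) + 1):
        rest = names[offset:]
        for period in range(1, len(rest) // min_repeats + 1):
            cycle = rest[:period]
            repeats = 0
            # Count consecutive copies of the candidate cycle.
            while rest[repeats * period:(repeats + 1) * period] == cycle:
                repeats += 1
            if repeats >= min_repeats:
                return offset, cycle, repeats
    return None
```

On the real core, the backtrace text could come from running `gdb -batch -ex bt` against the python binary and the uncompressed core.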

A core dump is available for post-mortem analysis. It comes from my laptop, an F20 box with a few updates
(mostly from virt-preview and a few other places; the full list, if relevant, is provided as pkgs.txt.gz in the
folder below):

here [link will be provided off-list]

core.20626.1000.gz is the freshly generated core, and core.20626.1000.md5 is its checksum.
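Before loading the core into a debugger, the download can be checked against the published checksum. From the shell, `md5sum -c core.20626.1000.md5` does this, assuming the .md5 file uses the usual `md5sum` output format. The Python sketch below does the same under that assumption. `md5_of` and `verify_md5` are hypothetical helpers:

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so a large core need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(data_path, md5_path):
    """Compare with the first field of an md5sum-style "<hex>  <name>" line."""
    with open(md5_path) as handle:
        expected = handle.read().split()[0].lower()
    return md5_of(data_path) == expected
```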

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
