----- Original Message -----
From: "David Caro" <dcaroest(a)redhat.com>
To: "Michal Skrivanek" <michal.skrivanek(a)redhat.com>
Cc: devel(a)ovirt.org
Sent: Friday, June 6, 2014 9:53:23 AM
Subject: Re: [ovirt-devel] local vdsm build fails
On Fri 06 Jun 2014 09:23:33 AM CEST, Michal Skrivanek wrote:
>
> On Jun 6, 2014, at 09:19 , Piotr Kliczewski <piotr.kliczewski(a)gmail.com>
> wrote:
>
>> All,
>>
>> I pulled the latest vdsm from master and noticed that the build is failing.
>>
>> Here is the patch that causes the failure:
>>
>>
http://gerrit.ovirt.org/#/c/28226
>>
>> and looking at the jenkins comments I can see that jenkins was failing
>> for the same reason:
>>
>>
http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs...
>
> btw, at least yesterday there were again so many false errors, with jenkins
> not being able to run the tests properly, that it's unusable…
> was that the reason the result was ignored? (though the error is clearly
> relevant to that patch)
Can you point out which jobs were false positives? Also, can you
specify for each one, as specifically as possible, how to tell from the
logs whether it's a test failure? We can then filter the logs for those
failures and set a different message, so you'll know from the gerrit
comments whether it was a real issue or an infra failure.
Quite a lot of the noise is from python segfaulting. I can reproduce the segfault
locally, but I'm having a hard time pinpointing the issue. I reported the bug upstream:
https://github.com/nose-devs/nose/issues/817
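For reference, a quick way to check which nose version a given box is actually
running (plain Fedora/python commands, nothing vdsm-specific assumed here):
$ rpm -q python-nose
$ python -c 'import nose; print(nose.__version__)'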
Let me summarize what I (we) know about python segfaulting:
* the segfault should be reproducible on any box running nose >= 1.3.0, just using
$ cd vdsm
$ ./configure && make
$ NOSE_WITH_XUNIT=1 make check
or at least I can reproduce the issue on all the boxes I tried locally (vanilla F20, F19)
* if we run each test module separately, we do NOT observe the failure.
This triggers the segfault:
$ cd tests
$ ./run_tests_local.sh ./*.py
This does not:
$ cd tests
$ for TEST in ./*.py; do ./run_tests_local.sh "$TEST"; done
* the stack traces I observed are huge, more than 750 levels deep.
This suggests the stack was exhausted, which in turn was probably triggered by some
kind of recursion gone wild. Note that the offending stack trace is on just one thread;
all the other threads are quiet.
* I tried to reproduce the issue with a simpler use case, with no luck so far.
At the moment I don't have a better suggestion than to bite the bullet and dig through
the huge stack trace looking for repetitive patterns or some other hint.
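If it helps, here is a rough sketch of how the repeating frames could be spotted once a
backtrace is saved to a file (bt.txt is just a hypothetical name, and the regex assumes
the standard Python traceback format, so it may need tweaking for gdb's py-bt output):
$ grep -oE 'File "[^"]+", line [0-9]+, in [^ ]+' bt.txt | sort | uniq -c | sort -rn | head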
A core dump is available for post-mortem analysis, from my laptop, which is an F20 box
with a few updates (mostly from virt-preview and a few other places - the full list, if
relevant, is provided as pkgs.txt.gz in the folder below):
here [link will be provided off-list]
core.20626.1000.gz is the fresh new core, core.20626.1000.md5 is its checksum.
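In case it's useful, a rough post-mortem sketch (it assumes the core was produced by the
system /usr/bin/python and that the python debuginfo packages are installed, which on
Fedora is also what provides gdb's py-bt command):
$ debuginfo-install python
$ gunzip core.20626.1000.gz
$ gdb /usr/bin/python core.20626.1000
(gdb) bt                     # C-level backtrace of the crashing thread
(gdb) py-bt                  # Python-level frames, to look for the runaway recursion
(gdb) thread apply all bt    # confirm the other threads really are quiet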
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani