
Sorry, forgot to add. By "main, if not only lead" I mean:

* /var/crash is empty
* abrt-cli list yields nothing relevant

Bests,

----- Original Message -----
From: "Francesco Romani" <fromani@redhat.com>
To: infra@ovirt.org
Sent: Wednesday, April 23, 2014 5:48:46 PM
Subject: Intermittent Jenkins crashes
Hi infra
Recently, tests started to fail seemingly at random because the Python interpreter crashes.
For example (many other failures look like this one): http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit/8370/console
LibvirtModuleConfigureTests
    testLibvirtConfigureToSSLFalse
../tests/run_tests_local.sh: line 10: 31835 Segmentation fault PYTHONDONTWRITEBYTECODE=1 LC_ALL=C PYTHONPATH="../lib:../vdsm:../client:../vdsm_api:$PYTHONPATH" "$PYTHON_EXE" ../tests/testrunner.py --local-modules $@
Quite often, re-running the same tests via the Jenkins manual trigger, or uploading a new version of the affected patch, seems to somehow fix the crash.
I have ssh access to the affected box, so I investigated further.
The main, if not only, lead those crashes leave behind is a laconic line in the kernel log:
[8855948.327687] python[10418]: segfault at 1 ip 00000036f2c88637 sp 00007fffda3c3a60 error 4 in libpython2.7.so.1.0[36f2c00000+178000]
The error code sometimes varies; the faulting address and the instruction pointer do not. So I followed http://enki-tech.blogspot.it/2012/08/debugging-c-part-3-dmesg.html
and found the following:
[root@jenkins-slave-vm02 ~]# ./getcrash.sh '[9141800.034517] python[11612]: segfault at 1 ip 00000036f2c88637 sp 00007fffe1127c50 error 4 in libpython2.7.so.1.0[36f2c00000+178000]'
Segmentation fault in libpython2.7.so.1.0 at: 0x88637.
[root@jenkins-slave-vm02 ~]# gdb /usr/lib64/libpython2.7.so.1.0
GNU gdb (GDB) Fedora 7.6.50.20130731-19.fc20
[...]
Reading symbols from /usr/lib64/libpython2.7.so.1.0...Reading symbols from /usr/lib/debug/usr/lib64/libpython2.7.so.1.0.debug...done.
done.
(gdb) disass 0x88637
No function contains specified address.
(gdb)
(getcrash.sh is a copy of the script presented in the page linked above)
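
For reference, the arithmetic behind that output can be sketched in Python. This is not the script from the linked page, only a minimal sketch of the same idea: subtract the mapping base printed in brackets from the instruction pointer to get the offset inside the library. The names `parse_segfault` and `describe_error` are hypothetical, the regular expression assumes the log line format shown above, and the error-code decoding assumes the x86 page-fault error bits.

```python
import re

# Assumed format: the kernel segfault line as it appears in this mail.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)

SAMPLE = ("[8855948.327687] python[10418]: segfault at 1 ip 00000036f2c88637 "
          "sp 00007fffda3c3a60 error 4 in libpython2.7.so.1.0[36f2c00000+178000]")


def parse_segfault(line):
    """Split a segfault log line into its fields (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line: %r" % line)
    info = {"comm": m.group("comm"), "pid": int(m.group("pid")),
            "obj": m.group("obj")}
    # All numeric fields in this line are printed in hexadecimal.
    for key in ("addr", "ip", "sp", "error", "base", "size"):
        value = m.group(key)
        info[key] = int(value, 16) if value is not None else None
    # Offset of the faulting instruction inside the mapped object.
    info["offset"] = (info["ip"] - info["base"]
                      if info["base"] is not None else None)
    return info


def describe_error(code):
    """Decode the x86 page-fault error code bits (hypothetical helper)."""
    parts = [
        "protection violation" if code & 0x1 else "page not present",
        "write" if code & 0x2 else "read",
        "user mode" if code & 0x4 else "kernel mode",
    ]
    if code & 0x8:
        parts.append("reserved bit set")
    if code & 0x10:
        parts.append("instruction fetch")
    return parts


if __name__ == "__main__":
    info = parse_segfault(SAMPLE)
    print("Segmentation fault in %s at: %#x." % (info["obj"], info["offset"]))
    print("error %x: %s" % (info["error"], ", ".join(describe_error(info["error"]))))
```

Under those assumptions, "error 4" decodes to a user-mode read of a page that is not present, which fits a read from address 1.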
The only explanation I can come up with from all of the above is a faulty RAM bank, but this is little more than a wild guess.
Any suggestions on how to go further?
Thanks,
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
_______________________________________________
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra