
Hi infra,

Recently, tests have started to fail quite randomly because the Python interpreter crashes. For example (but many others look like this):

http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit/8370/console

    LibvirtModuleConfigureTests
        testLibvirtConfigureToSSLFalse
    ../tests/run_tests_local.sh: line 10: 31835 Segmentation fault PYTHONDONTWRITEBYTECODE=1 LC_ALL=C PYTHONPATH="../lib:../vdsm:../client:../vdsm_api:$PYTHONPATH" "$PYTHON_EXE" ../tests/testrunner.py --local-modules $@

Quite often, re-running the same tests with a Jenkins manual trigger, or uploading a new version of the affected patch, somehow makes the crash go away.

I have ssh access to the affected box, so I investigated further. The main, if not the only, trace these crashes leave behind is a laconic kernel message:

    [8855948.327687] python[10418]: segfault at 1 ip 00000036f2c88637 sp 00007fffda3c3a60 error 4 in libpython2.7.so.1.0[36f2c00000+178000]

The error code sometimes varies; the addresses never do.

So I followed http://enki-tech.blogspot.it/2012/08/debugging-c-part-3-dmesg.html and found the following (getcrash.sh is a copy of the script presented in that page):

    [root@jenkins-slave-vm02 ~]# ./getcrash.sh '[9141800.034517] python[11612]: segfault at 1 ip 00000036f2c88637 sp 00007fffe1127c50 error 4 in libpython2.7.so.1.0[36f2c00000+178000]'
    Segmentation fault in libpython2.7.so.1.0 at: 0x88637.
    [root@jenkins-slave-vm02 ~]# gdb /usr/lib64/libpython2.7.so.1.0
    GNU gdb (GDB) Fedora 7.6.50.20130731-19.fc20
    [...]
    Reading symbols from /usr/lib64/libpython2.7.so.1.0...Reading symbols from /usr/lib/debug/usr/lib64/libpython2.7.so.1.0.debug...done.
    done.
    (gdb) disass 0x88637
    No function contains specified address.
    (gdb)

The only explanation I can come up with for all of the above is a faulty RAM bank, but this is little more than a wild guess.

Any suggestions on how to proceed?

Thanks,

--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
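For reference, the 0x88637 offset that getcrash.sh prints is simply the faulting ip minus the library load base shown in brackets (0x36f2c88637 - 0x36f2c00000). The sketch below reproduces that arithmetic and also decodes the other fields, assuming the dmesg format matches the lines quoted above. `parse_segfault` and `describe_error` are hypothetical helpers written for illustration, not part of getcrash.sh. The error-code bits follow the x86 page-fault error code the Linux kernel prints (bit 0 protection violation vs. page not present, bit 1 write vs. read, bit 2 user vs. kernel mode, bit 4 instruction fetch). Read this way, "segfault at 1 ... error 4" means a user-mode read from address 0x1, i.e. a near-NULL pointer dereference inside libpython.

```python
import re

# Pattern for kernel "segfault" lines; format assumed from the lines quoted above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def describe_error(code):
    """Decode the x86 page-fault error code bits printed after 'error'."""
    parts = [
        "protection violation" if code & 0x1 else "page not present",
        "write" if code & 0x2 else "read",
        "user mode" if code & 0x4 else "kernel mode",
    ]
    if code & 0x10:
        parts.append("instruction fetch")
    return ", ".join(parts)


def parse_segfault(line):
    """Return the fault details, including ip's offset inside the mapped object."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    offset = ip - base  # what getcrash.sh reports as "at: 0x..."
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    code = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "process": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "object": m["obj"],
        "offset": offset,
        "error": describe_error(code),
    }


if __name__ == "__main__":
    LINE = ("[9141800.034517] python[11612]: segfault at 1 ip 00000036f2c88637 "
            "sp 00007fffe1127c50 error 4 in libpython2.7.so.1.0[36f2c00000+178000]")
    info = parse_segfault(LINE)
    print(f"{info['object']} +{info['offset']:#x} "
          f"(fault addr {info['fault_addr']:#x}: {info['error']})")
```

Because the ip offset is identical across runs while only the error code varies, comparing several parsed lines side by side may help tell a deterministic bug from random memory corruption.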