Intermittent Jenkins crashes
Francesco Romani
fromani at redhat.com
Wed Apr 23 15:51:23 UTC 2014
Sorry, I forgot to add this.
By "the main, if not only, lead" I mean that nothing else was left behind:
* /var/crash is empty
* abrt-cli list yields nothing relevant
Best,
----- Original Message -----
> From: "Francesco Romani" <fromani at redhat.com>
> To: infra at ovirt.org
> Sent: Wednesday, April 23, 2014 5:48:46 PM
> Subject: Intermittent Jenkins crashes
>
> Hi infra
>
> Recently, tests started failing at random because the Python interpreter
> crashes.
>
> For example (many other runs fail the same way):
> http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit/8370/console
>
> LibvirtModuleConfigureTests
> testLibvirtConfigureToSSLFalse
> ../tests/run_tests_local.sh: line 10: 31835
> Segmentation fault PYTHONDONTWRITEBYTECODE=1 LC_ALL=C
> PYTHONPATH="../lib:../vdsm:../client:../vdsm_api:$PYTHONPATH"
> "$PYTHON_EXE" ../tests/testrunner.py --local-modules $@
>
> Quite often, re-running the same tests with the Jenkins manual trigger,
> or uploading a new version of the affected patch, seems to somehow make the
> crash go away.
>
> I have ssh access to the affected box, so I investigated further.
>
> The main, if not only, lead those crashes leave behind is a laconic
>
> [8855948.327687] python[10418]: segfault at 1 ip 00000036f2c88637 sp
> 00007fffda3c3a60 error 4 in libpython2.7.so.1.0[36f2c00000+178000]
>
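A note on reading that kernel line: the number after "error" is the x86 page-fault error code, which is a bit mask. Below is a minimal sketch of a decoder, assuming the usual x86 bit layout (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The helper name decode_pf_error is made up for illustration and is not part of any tool mentioned in this thread.

```python
# Decode the "error N" field of an x86 segfault line from dmesg.
# Each entry: (bit mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code):
    """Return human-readable flags for a page-fault error code."""
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


# The kernel prints the code in hex; 4 is the value seen in this thread.
print(decode_pf_error(0x4))
# prints ['page not present', 'read access', 'user mode']
```

Read this way, "segfault at 1 ... error 4" is a user-mode read from address 1, a page that is not mapped.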
> The error code sometimes varies; the fault address and the instruction
> pointer do not.
> So, I followed
> http://enki-tech.blogspot.it/2012/08/debugging-c-part-3-dmesg.html
>
> and found the following:
>
> [root@jenkins-slave-vm02 ~]# ./getcrash.sh '[9141800.034517] python[11612]:
> segfault at 1 ip 00000036f2c88637 sp 00007fffe1127c50 error 4 in
> libpython2.7.so.1.0[36f2c00000+178000]'
> Segmentation fault in libpython2.7.so.1.0 at: 0x88637.
> [root@jenkins-slave-vm02 ~]# gdb /usr/lib64/libpython2.7.so.1.0
> GNU gdb (GDB) Fedora 7.6.50.20130731-19.fc20
> [...]
> Reading symbols from /usr/lib64/libpython2.7.so.1.0...Reading symbols from
> /usr/lib/debug/usr/lib64/libpython2.7.so.1.0.debug...done.
> done.
> (gdb) disass 0x88637
> No function contains specified address.
> (gdb)
>
> (getcrash.sh is a copy of the script presented in the page linked above)
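For readers without the script at hand, the arithmetic behind the "at: 0x88637" result can be sketched as follows. This is not getcrash.sh itself; it only assumes that the reported value is the instruction pointer minus the mapping base printed in square brackets, which matches the numbers above. The function name parse_segfault and the regular expression are illustrative, and the dmesg message has to be joined back into a single line first.

```python
import re

# Matches: comm[pid]: segfault at ADDR ip IP sp SP error ERR in OBJ[BASE+SIZE]
# All numeric fields except the pid are printed by the kernel in hex.
SEGV_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]:\s+segfault at (?P<addr>[0-9a-f]+)"
    r"\s+ip (?P<ip>[0-9a-f]+)\s+sp (?P<sp>[0-9a-f]+)"
    r"\s+error (?P<err>[0-9a-f]+)"
    r"\s+in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Extract the faulting object and the ip offset inside its mapping."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line: %r" % line)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "error": int(m.group("err"), 16),
        "object": m.group("obj"),
        "offset": offset,
    }


line = ("[9141800.034517] python[11612]: segfault at 1 ip 00000036f2c88637 "
        "sp 00007fffe1127c50 error 4 in libpython2.7.so.1.0[36f2c00000+178000]")
info = parse_segfault(line)
print(info["object"], hex(info["offset"]))
# prints libpython2.7.so.1.0 0x88637
```

Whether gdb expects this offset or the absolute ip value depends on the address the library was linked at; the sketch does not check that.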
>
> The only explanation I can come up with from all of the above is a faulty
> RAM bank, but this is little more than a wild guess.
>
> Any suggestions on how to investigate further?
>
> Thanks,
>
> --
> Francesco Romani
> RedHat Engineering Virtualization R & D
> Phone: 8261328
> IRC: fromani
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani