Intermittent Jenkins crashes
David Caro
dcaroest at redhat.com
Wed Apr 23 17:50:58 UTC 2014
On Wed 23 Apr 2014 05:51:23 PM CEST, Francesco Romani wrote:
> Sorry, forgot to add.
>
> By "main, if not only lead" I mean:
> * /var/crash is empty
> * abrt-cli list yields nothing relevant
>
> Bests,
>
> ----- Original Message -----
>> From: "Francesco Romani" <fromani at redhat.com>
>> To: infra at ovirt.org
>> Sent: Wednesday, April 23, 2014 5:48:46 PM
>> Subject: Intermittent Jenkins crashes
>>
>> Hi infra
>>
>> Recently tests started to fail quite randomly due to the python interpreter
>> crashing.
>>
>> For example (many others look like this):
>> http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit/8370/console
>>
>> LibvirtModuleConfigureTests
>> testLibvirtConfigureToSSLFalse
>> ../tests/run_tests_local.sh: line 10: 31835
>> Segmentation fault PYTHONDONTWRITEBYTECODE=1 LC_ALL=C
>> PYTHONPATH="../lib:../vdsm:../client:../vdsm_api:$PYTHONPATH"
>> "$PYTHON_EXE" ../tests/testrunner.py --local-modules $@
>>
>> Quite often, re-running the same tests via the Jenkins manual trigger
>> or uploading a new version of the affected patch seems to somehow fix the
>> crash.
>>
>> I have ssh access to the affected box, so I did more investigation
>>
>> the main, if not only, lead those crashes leave behind is a laconic
>>
>> [8855948.327687] python[10418]: segfault at 1 ip 00000036f2c88637 sp
>> 00007fffda3c3a60 error 4 in libpython2.7.so.1.0[36f2c00000+178000]
>>
>> the error code sometimes varies, the addresses do not.
>> So, I followed
>> http://enki-tech.blogspot.it/2012/08/debugging-c-part-3-dmesg.html
>>
>> and found the following:
>>
>> [root@jenkins-slave-vm02 ~]# ./getcrash.sh '[9141800.034517] python[11612]:
>> segfault at 1 ip 00000036f2c88637 sp 00007fffe1127c50 error 4 in
>> libpython2.7.so.1.0[36f2c00000+178000]'
>> Segmentation fault in libpython2.7.so.1.0 at: 0x88637.
>> [root@jenkins-slave-vm02 ~]# gdb /usr/lib64/libpython2.7.so.1.0
>> GNU gdb (GDB) Fedora 7.6.50.20130731-19.fc20
>> [...]
>> Reading symbols from /usr/lib64/libpython2.7.so.1.0...Reading symbols from
>> /usr/lib/debug/usr/lib64/libpython2.7.so.1.0.debug...done.
>> done.
>> (gdb) disass 0x88637
>> No function contains specified address.
>> (gdb)
>>
>> (getcrash.sh is a copy of the script presented in the page linked above)
>>
>> The only explanation I can come up with for all of the above is a faulty RAM
>> bank, but this is little more than a wild guess.
>>
>> Any suggestion on how to go further?
>>
>> Thanks,
>>
>> --
>> Francesco Romani
>> RedHat Engineering Virtualization R & D
>> Phone: 8261328
>> IRC: fromani
>
Let's first check whether the problem affects only one slave, one
python version, or one distribution, or whether it fails everywhere. If
it only affects one slave, we can just reprovision it (the one you
pointed out is a VM). If it's tied to a package version, we can try to
upgrade it, downgrade it, or (in the best case) fix it.

If it's anything else, it will be harder to fix and we will have to
dig deeper: try to reproduce it manually, add traces, and so on. It
could be a RAM issue as you say, but since the slave is a VM, I would
expect failures on the host as well.

I've started running the job on f19 slaves only, to see whether it
happens there; I'll check the f20 slaves afterwards.
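
To compare crashes across slaves, the dmesg line can be picked apart
mechanically. Below is a minimal Python sketch. It is not the
getcrash.sh script from the linked page, and the function names
(parse_segfault, decode_error, tally) are made up for illustration.
The regex assumes the x86 message format quoted above, joined back
onto a single line. It computes the offset of the faulting instruction
inside the library (ip minus the mapping base), decodes the low bits
of the page-fault error code, and counts crashes per (library, offset).

```python
import re
from collections import Counter

# Kernel message for a user-space segfault, as printed in dmesg on x86.
# All numeric fields except the pid are hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),   # address that was accessed
        "ip": ip,                           # faulting instruction pointer
        "error": int(m.group("error"), 16),
        "lib": m.group("lib"),
        "base": base,                       # start of the library mapping
        "offset": ip - base,                # position inside the mapping
    }


def decode_error(code):
    """Decode the three low bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # False: no page mapped at addr
        "write": bool(code & 2),         # False: the access was a read
        "user_mode": bool(code & 4),     # True: fault came from user space
    }


def tally(lines):
    """Count segfaults per (library, offset), ignoring non-matching lines."""
    counts = Counter()
    for line in lines:
        info = parse_segfault(line)
        if info is not None:
            counts[(info["lib"], info["offset"])] += 1
    return counts


if __name__ == "__main__":
    sample = ("[8855948.327687] python[10418]: segfault at 1 "
              "ip 00000036f2c88637 sp 00007fffda3c3a60 error 4 "
              "in libpython2.7.so.1.0[36f2c00000+178000]")
    info = parse_segfault(sample)
    print(info["lib"], hex(info["offset"]), decode_error(info["error"]))
```

For the line quoted above this gives offset 0x88637 in
libpython2.7.so.1.0, and error 4 decodes to a user-mode read of an
unmapped page. If the other slaves report the same library and offset,
that would point more towards a software bug than towards bad RAM,
though this is only a heuristic.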
--
David Caro
Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D
Email: dcaro at redhat.com
Web: www.redhat.com
RHT Global #: 82-62605