Debugging stuck vdsm jobs

Nir Soffer nsoffer at redhat.com
Sat May 28 23:07:18 UTC 2016


On Thu, May 26, 2016 at 11:08 PM, Nir Soffer <nsoffer at redhat.com> wrote:
> Hi all,
>
> We had 2 issues causing vdsm check-patch and check-merge jobs to get stuck.
>
> I fixed the one that caused most trouble:
> https://gerrit.ovirt.org/57993
>
> The other issue may be related to ioprocess; I fixed a related issue:
> https://gerrit.ovirt.org/57473
>
> But I have seen stuck jobs after this change, so the issue may not
> be fixed yet.
>
> If you see a stuck vdsm job (a job that runs for more than 15 minutes),
> please get me a backtrace:
>
> 1. locate the testrunner.py process pid:
>
>     $ ps aux | grep testrunner.py | grep -v grep
>     nsoffer  26297 82.6  0.9 389592 111144 pts/3   R+   22:52   0:02 /usr/bin/python ../tests/testrunner.py ...
>
> 2. save a backtrace:
>
>     gdb -p 26297 --batch -ex "thread apply all py-bt" > py-bt.out

This requires the python-debuginfo package, typically installed using:

    dnf debuginfo-install python

I sent a patch that detects stuck vdsm tests, prints a backtrace, and kills the
stuck process:
https://gerrit.ovirt.org/58212
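
To illustrate the idea (this is not the code from the patch), here is a minimal
sketch of such a watchdog, assuming Python 3 and gdb in PATH. The names
run_with_watchdog and gdb_backtrace_cmd are hypothetical; the gdb command is
the same py-bt invocation as in the manual steps, using gdb's -p option to
attach by pid.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    # Hypothetical helper: build the gdb command that dumps the Python
    # backtrace of all threads of a running process, then exits.
    return ["gdb", "-p", str(pid), "--batch",
            "-ex", "thread apply all py-bt"]


def run_with_watchdog(cmd, timeout, backtrace_cmd=gdb_backtrace_cmd):
    """
    Run cmd. If it does not finish within timeout seconds, try to collect
    a backtrace of the process and then kill it.

    Returns a (returncode, stuck, backtrace_output) tuple.
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the process finished in time.
        return proc.wait(timeout=timeout), False, ""
    except subprocess.TimeoutExpired:
        pass

    # The process is stuck: dump a backtrace first, kill it in any case.
    try:
        dump = subprocess.run(
            backtrace_cmd(proc.pid),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=60)
        output = dump.stdout
    except (OSError, subprocess.TimeoutExpired) as e:
        # gdb is missing or got stuck itself; report it instead of failing.
        output = "backtrace failed: %s" % e
    finally:
        proc.kill()
        proc.wait()

    return proc.returncode, True, output
```

With the 15 minute threshold mentioned above, timeout would be 900 seconds.
The backtrace is collected before the kill, since gdb needs a live process to
attach to. The output is only useful if python-debuginfo is installed,
otherwise gdb does not know the py-bt command and the output contains an error
instead of a backtrace.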

The patch works, but we don't get a backtrace, because python-debuginfo is not
installed although I require it. We probably need to add the fedora-debug
repository to check-patch.repos. I tried the URLs from
/etc/yum.repos.d/fedora.repo, but none of them worked.

I will need help from infra to get it working.

Nir


