Debugging stuck vdsm jobs

David Caro Estevez dcaro at redhat.com
Mon May 30 07:46:47 UTC 2016


On 05/29 02:24, Nir Soffer wrote:
> On Sun, May 29, 2016 at 2:10 AM, Nir Soffer <nsoffer at redhat.com> wrote:
> > It looks like this when the tests time out:
> >
> > 23:04:44 miscTests.EventTests
> > 23:04:44     testEmit                                                    OK
> > 23:04:44     testEmitCallbackException                                   OK
> > 23:04:49     testEmitStale                                               OK
> > 23:04:49     testInstanceMethod                                          OK
> > 23:04:50     testInstanceMethodDead                                      OK
> > 23:04:55     testOneShot
> > 23:04:55 ========================================================================
> > 23:04:55 =           Timeout completing tests - extracting stacktrace           =
> > 23:04:55 ========================================================================
> > 23:04:55
> > 23:04:55 attach: No such file or directory.
> > 23:04:55 [New LWP 7887]
> > 23:04:55 [New LWP 7880]
> > 23:04:55 [New LWP 7873]
> > 23:04:55 [New LWP 7866]
> > 23:04:55 [New LWP 7859]
> > 23:04:55 [New LWP 7852]
> > 23:04:55 [New LWP 7845]
> > 23:04:55 [Thread debugging using libthread_db enabled]
> > 23:04:55 Using host libthread_db library "/lib64/libthread_db.so.1".
> > 23:04:56 0x00007f17f0a1fa82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> > 23:04:56
> > 23:04:56 Thread 8 (Thread 0x7f17df860700 (LWP 7845)):
> > 23:04:56 Undefined command: "py-bt".  Try "help".
> > 23:04:56 OK
> > 23:04:56     testUnregister
> > 23:04:56 ========================================================================
> > 23:04:56 =                        Aborting tests                                =
> > 23:04:56 ========================================================================
> > 23:04:56 ../tests/run_tests_local.sh: line 35:  7743 Killed "$PYTHON_EXE" ../tests/testrunner.py --local-modules $@
> >
> >
> >
> > On Sun, May 29, 2016 at 2:07 AM, Nir Soffer <nsoffer at redhat.com> wrote:
> >> On Thu, May 26, 2016 at 11:08 PM, Nir Soffer <nsoffer at redhat.com> wrote:
> >>> Hi all,
> >>>
> >>> We had 2 issues causing vdsm check-patch and check-merge jobs to get stuck.
> >>>
> >>> I fixed the one that caused most trouble:
> >>> https://gerrit.ovirt.org/57993
> >>>
> >>> The other issue may be related to ioprocess; I fixed a related issue:
> >>> https://gerrit.ovirt.org/57473
> >>>
> >>> But I have seen stuck jobs after this change, so the issue may not
> >>> be fixed yet.
> >>>
> >>> If you see a stuck vdsm job (a job that runs for more than 15 minutes),
> >>> please get me a backtrace:
> >>>
> >>> 1. locate the testrunner.py process pid:
> >>>
> >>>     $ ps aux | grep testrunner.py | grep -v grep
> >>>     nsoffer  26297 82.6  0.9 389592 111144 pts/3   R+   22:52   0:02 /usr/bin/python ../tests/testrunner.py ...
> >>>
> >>> 2. save a backtrace:
> >>>
> >>>     gdb -p 26297 --batch -ex "thread apply all py-bt" > py-bt.out
> >>
> >> This requires the python-debuginfo package, typically installed using:
> >>
> >>     dnf debuginfo-install python
> >>
> >> I sent this patch, which detects stuck vdsm tests, prints a backtrace, and
> >> kills the stuck process:
> >> https://gerrit.ovirt.org/58212
> >>
> >> It works, but we don't get a backtrace, because python-debuginfo is not
> >> installed although I require it. We probably need to add the fedora-debug
> >> repository to check-patch.repos. I tried the URLs from
> >> /etc/yum.repos.d/fedora.repo, but none of them work.
> >>
> >> I will need help from infra to get it working.
> 
> I also sent this patch, which should fix the issue on Jenkins, but I
> cannot test it there:
> https://gerrit.ovirt.org/58213

Instead of forcing the repo onto all the projects, you should add any extra
repos you need when running/installing to the *.repos files that vdsm has in
its automation directory.
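
For reference, the manual steps quoted above (find the stuck process, attach
gdb, save the py-bt output, kill it) could be wrapped in a small watchdog. This
is only a rough sketch and not what the linked patches do: the function names
run_with_watchdog and dump_backtrace and the GDB variable are hypothetical,
and py-bt still only works when python-debuginfo is installed.

```shell
#!/bin/bash
# Sketch only. run_with_watchdog, dump_backtrace and the GDB variable are
# hypothetical names, not taken from the vdsm tree or the linked patches.

# Save a Python-level backtrace of all threads of PID into OUT.
dump_backtrace() {
    local pid=$1 out=$2
    local gdb_cmd=${GDB:-gdb}
    if command -v "$gdb_cmd" >/dev/null 2>&1; then
        # "gdb -p PID" attaches to a running process; py-bt is only defined
        # when the python gdb extensions (python-debuginfo) are available.
        "$gdb_cmd" -p "$pid" --batch -ex "thread apply all py-bt" > "$out" 2>&1
    else
        echo "gdb not found; no backtrace for pid $pid" > "$out"
    fi
}

# Run a command; if it is still alive after TIMEOUT seconds, dump a
# backtrace into OUT, kill it, and return 124. Otherwise return the
# command's own exit status.
run_with_watchdog() {
    local timeout=$1 out=$2
    shift 2
    "$@" &
    local pid=$!
    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            dump_backtrace "$pid" "$out"
            kill -KILL "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"
}
```

With the 15 minute limit mentioned above, a call might look like
"run_with_watchdog 900 py-bt.out ./my-test-command", where my-test-command
stands in for whatever actually starts the tests.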

> 
> Nir

-- 
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: dcaro at redhat.com
IRC: dcaro|dcaroest@{freenode|oftc|redhat}
Web: www.redhat.com
RHT Global #: 82-62605