On Tue, Mar 16, 2021 at 10:47 PM Greg King <greg.king(a)oracle.com> wrote:
I am new to vdsm and trying to understand the architecture/internals much
better
Welcome to vdsm Greg!
The ovirt documentation for architecture I have found so far seems to be
relatively high level
And it is mostly outdated, but we don't have anything better.
My effort to understand the architecture by walking through the vdsm code
using pdb/rpdb is slow and probably not all that efficient
Does anyone have pointers to documentation that might explain the vdsm
modules, classes and internals a little more in depth?
I don't think we have more detailed documentation, but there are a lot of
talks and slide decks that give more info on specific topics, and they are
usually more up to date:
https://www.ovirt.org/community/archived_conferences_presentations.html
There is also a lot of content on YouTube; here are some examples that I
could find easily:
- [oVirt 3.6 deep dive] - live storage migration between mixed domains
https://www.youtube.com/watch?v=BPy29Q__VV4
- oVirt 4.1 deep dive - VM leases
https://www.youtube.com/watch?v=MVa-4fQo2V8
- Back to the future – incremental backup in oVirt
https://www.youtube.com/watch?v=X-xHD9ddN6s
- oVirt 4k - teaching an old dog new tricks
https://www.youtube.com/watch?v=Q1VQxjYEzDY
I’d also like to understand where I might be able to add rpdb.set_trace()
so I can step through functions being called in libvirt.py
I don't think using a debugger is very helpful with vdsm, since vdsm is not
designed for stopping a thread for an unlimited time. In some cases the
system will log a warning and a traceback every 60 seconds about a blocked
worker. In other cases monitoring code may fail to update stats, which may
cause engine to deactivate a host, migrate VMs, or cause other trouble.
The best way to debug and understand vdsm is to follow the logs, and add
more logs when needed. The main advantage compared with a debugger is
that the time spent with the logs will pay off when you have to debug real
issues in a user setup, where logs are the only available resource.
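As a minimal sketch of what "adding more logs" can look like, here is a
plain Python logging example. The logger name and the attach_disk function
are hypothetical, used only for illustration; they are not taken from vdsm.

```python
import logging

# Hypothetical logger name; in real code, reuse the logger of the module
# you are modifying.
log = logging.getLogger("virt.example")


def attach_disk(vm_id, path):
    # Hypothetical function standing in for any flow you want to follow.
    # Log the inputs on entry and the result on exit, so the flow can be
    # reconstructed later from the log file alone.
    log.debug("Attaching disk vm_id=%s path=%s", vm_id, path)
    result = {"vm_id": vm_id, "path": path, "attached": True}
    log.debug("Disk attached result=%s", result)
    return result
```

Passing the values as arguments instead of formatting the string yourself
means the message is built only when the debug level is enabled.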
Having said that, being able to follow the entire flow by printing a
traceback is a great way to understand how the system works.
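The idea can be sketched with the standard library alone. This example uses
traceback.format_stack() as a stand-in, not vdsm's own helper; log_stack,
inner and outer are hypothetical names used only for illustration.

```python
import logging
import traceback

log = logging.getLogger("trace.example")  # hypothetical logger name


def log_stack(msg):
    # format_stack() returns the current call stack as a list of strings,
    # oldest frame first. Drop the last entry, which is log_stack itself.
    stack = "".join(traceback.format_stack()[:-1])
    log.warning("%s\nCall stack (most recent call last):\n%s", msg, stack)
    return stack


def inner():
    # The "interesting point": log how we got here, without stopping.
    return log_stack("reached inner()")


def outer():
    return inner()
```

Calling outer() logs one warning that lists outer and then inner, so you can
see the whole call path without blocking the thread.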
You can use vdsm.common.concurrent.format_traceback:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f4...
This lets you print a traceback at interesting points. For tracing
functions from the libvirt Python bindings, you can modify
libvirtconnection.py:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f4...
This module creates a connection, and wraps libvirt.virDomain with a wrapper
that panics on fatal errors. You can modify the wrapper to log a traceback
for all or some of libvirt.virDomain functions.
Another option is to modify the virDomain wrapper to log a traceback:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f4...
For example here:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f4...
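To illustrate the wrapper idea in general terms, here is a small
self-contained sketch. It is not the code from libvirtconnection.py:
TracingProxy and FakeDomain are hypothetical names, and FakeDomain only
stands in for libvirt.virDomain so the example runs without libvirt.

```python
import logging
import traceback

log = logging.getLogger("trace.wrapper")  # hypothetical logger name


class TracingProxy:
    """Wrap an object and log the call stack for selected method calls."""

    def __init__(self, target, names=None):
        self._target = target
        self._names = names  # None means trace every method

    def __getattr__(self, name):
        # Called only for attributes not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr
        if self._names is not None and name not in self._names:
            return attr

        def traced(*args, **kwargs):
            # Drop the last frame, which is this wrapper function.
            stack = "".join(traceback.format_stack()[:-1])
            log.warning("Calling %s args=%s kwargs=%s\n%s",
                        name, args, kwargs, stack)
            return attr(*args, **kwargs)

        return traced


class FakeDomain:
    # Hypothetical stand-in for libvirt.virDomain.
    def name(self):
        return "vm-1"

    def state(self, flags=0):
        return [1, flags]
```

With dom = TracingProxy(FakeDomain(), names={"state"}), calling dom.state()
logs the caller's stack, while dom.name() passes through silently. Limiting
the traced names keeps the log readable when a method is called very often.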
Good luck with your vdsm ride!
Nir