On Tue, Mar 16, 2021 at 10:47 PM Greg King <greg.king@oracle.com> wrote:

I am new to vdsm and trying to understand the architecture/internals much better


Welcome to vdsm Greg!

The ovirt documentation for architecture I have found so far seems to be relatively high level


And it is mostly outdated, but we don't have anything better.
 

My effort to understand the architecture by walking through the vdsm code using pdb/rpdb is slow and probably not all that efficient

 

Does anyone have pointers to documentation that might explain the vdsm modules, classes and internals a little more in depth?


I don't think we have more detailed documentation, but there are a lot of
talks and slide decks that give more info on specific topics, and they are
usually more up to date:
https://www.ovirt.org/community/archived_conferences_presentations.html

There is also a lot of content on YouTube. Here are some examples that I
could find easily:
- [oVirt 3.6 deep dive] - live storage migration between mixed domains
  https://www.youtube.com/watch?v=BPy29Q__VV4
- oVirt 4.1 deep dive - VM leases
  https://www.youtube.com/watch?v=MVa-4fQo2V8
- Back to the future – incremental backup in oVirt
  https://www.youtube.com/watch?v=X-xHD9ddN6s
- oVirt 4k - teaching an old dog new tricks
  https://www.youtube.com/watch?v=Q1VQxjYEzDY

 

I’d also like to understand where I might be able to add rpdb.set_trace() so I can step through functions being called in libvirt.py


I don't think using a debugger is very helpful with vdsm, since vdsm is not
designed for stopping a thread for an unlimited time. In some cases the system
will log a warning and a traceback every 60 seconds about a blocked worker.
In other cases monitoring code may fail to update stats, which may cause
engine to deactivate the host, migrate vms, or cause other trouble.

The best way to debug and understand vdsm is to follow the logs, and add
more logs when needed. The main advantage compared with a debugger is
that the time spent with the logs will pay back when you have to debug real
issues in a user setup, when logs are the only available resource.

Having said that, being able to follow the entire flow by printing a traceback
is a great way to understand how the system works.
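
As a minimal sketch of the idea, using only the Python standard library
(this is not vdsm code; log_stack, start_vm and prepare_volumes are
hypothetical names used for illustration), a helper can log the call stack
of the current thread at the point you are interested in, without stopping
the thread:

```python
import logging
import traceback

log = logging.getLogger("trace")


def log_stack(msg):
    """Log msg followed by the call stack of the calling thread."""
    # format_stack() returns one string per frame, oldest frame first.
    # The last entry is this helper itself, so drop it.
    stack = "".join(traceback.format_stack()[:-1])
    log.info("%s\n%s", msg, stack)
    return stack


# Hypothetical call chain, only here to show what ends up in the log.
def start_vm():
    return prepare_volumes()


def prepare_volumes():
    return log_stack("preparing volumes")
```

Calling start_vm() logs the message followed by one entry per frame leading
to prepare_volumes(), which shows who called the function and from where.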

You can use vdsm.common.concurrent.format_traceback to print a traceback
at interesting points:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f42b/lib/vdsm/common/concurrent.py#L367

For tracing functions from the libvirt python bindings, you can modify
libvirtconnection.py:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f42b/lib/vdsm/common/libvirtconnection.py#L127

This module creates a connection, and wraps libvirt.virDomain with a wrapper
that panics on fatal errors. You can modify the wrapper to log a traceback
for all or some of libvirt.virDomain functions.
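
The general shape of such a change could look like the sketch below. It is
an assumption-laden illustration, not the actual vdsm wrapper, which may be
structured differently: TracingProxy and FakeDomain are hypothetical names,
and FakeDomain stands in for libvirt.virDomain so the example runs without
libvirt installed.

```python
import functools
import logging
import traceback

log = logging.getLogger("trace")


class TracingProxy:
    """Forward attribute access to target, logging the call stack
    whenever one of the traced methods is called."""

    def __init__(self, target, traced=None):
        self._target = target
        self._traced = traced  # None means trace every method

    def __getattr__(self, name):
        # Called only for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr
        if self._traced is not None and name not in self._traced:
            return attr

        @functools.wraps(attr)
        def wrapper(*args, **kwargs):
            # Drop the last frame, which is this wrapper.
            stack = "".join(traceback.format_stack()[:-1])
            log.info("calling %s\n%s", name, stack)
            return attr(*args, **kwargs)

        return wrapper


class FakeDomain:
    """Stand-in for libvirt.virDomain (assumption for this sketch)."""

    def XMLDesc(self, flags=0):
        return "<domain/>"

    def state(self, flags=0):
        return [1, 1]
```

With dom = TracingProxy(FakeDomain(), traced={"XMLDesc"}), calling
dom.XMLDesc(0) logs the caller's stack and returns the normal result, while
dom.state() is passed through without logging. Tracing only a few methods
keeps the log readable, since some calls are made very frequently.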

Another option is to modify the virDomain wrapper to log a traceback:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f42b/lib/vdsm/virt/virdomain.py#L82

For example here:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f42b/lib/vdsm/virt/virdomain.py#L99

Good luck with your vdsm ride!

Nir