On Tue, Mar 16, 2021 at 10:47 PM Greg King <greg.king@oracle.com> wrote:

I am new to vdsm and trying to understand the architecture/internals much better

Welcome to vdsm Greg!

The ovirt documentation for architecture I have found so far seems to be relatively high level

And it is mostly outdated, but we don't have anything better.

My effort to understand the architecture by walking through the vdsm code using pdb/rpdb is slow and probably not all that efficient

Does anyone have pointers to documentation that might explain the vdsm modules, classes and internals a little more in depth?

I don't think we have more detailed documentation, but there are a lot of

talks and slide decks that give more info on specific topics, and are usually

more up to date:

https://www.ovirt.org/community/archived_conferences_presentations.html

There is also a lot of content on YouTube; here are some examples that I could

find easily:

- [oVirt 3.6 deep dive] - live storage migration between mixed domains

https://www.youtube.com/watch?v=BPy29Q__VV4

- oVirt 4.1 deep dive - VM leases

https://www.youtube.com/watch?v=MVa-4fQo2V8

- Back to the future – incremental backup in oVirt

https://www.youtube.com/watch?v=X-xHD9ddN6s

- oVirt 4k - teaching an old dog new tricks

https://www.youtube.com/watch?v=Q1VQxjYEzDY

I’d also like to understand where I might be able to add rpdb.set_trace() so I can step through functions being called in libvirt.py

I don't think using a debugger is very helpful with vdsm, since vdsm is not

designed for stopping a thread for an unlimited time. In some cases the system

will log a warning and a traceback every 60 seconds about a blocked worker.

In other cases monitoring code may fail to update stats, which may cause

engine to deactivate a host or migrate VMs, or lead to other trouble.

The best way to debug and understand vdsm is to follow the logs, and add

more logs when needed. The main advantage compared with a debugger is

that the time spent with the logs will pay back when you have to debug real

issues in a user's setup, when logs are the only available resource.

Having said that, being able to follow the entire flow by printing a traceback

is a great way to understand how the system works.
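As a rough illustration of the idea, here is a minimal sketch that uses only
the Python standard library rather than vdsm's own helper. The names
log_stack, create_volume and handle_request are made up for the example; in
vdsm you would call the logger from the real function you are interested in.

```python
import logging
import traceback

log = logging.getLogger("trace")


def log_stack(msg):
    """Log msg together with the call stack of the current thread."""
    # Drop the last entry, which is log_stack() itself.
    stack = "".join(traceback.format_stack()[:-1])
    log.info("%s\nTraceback (most recent call last):\n%s", msg, stack)
    return stack


def create_volume(name):
    # Hypothetical "interesting point" in a flow we want to understand.
    log_stack("create_volume(%r) called" % name)
    return name


def handle_request():
    # Hypothetical caller; it shows up in the logged stack.
    return create_volume("vol-1")
```

Calling handle_request() logs one message whose stack lists handle_request
and then create_volume, which is usually enough to see how a flow reaches
the point you care about without stopping any thread.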

You can use vdsm.common.concurrent.format_traceback:

https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f42b/lib/vdsm/common/concurrent.py#L367

to print a traceback at interesting points. For tracing functions from the libvirt

Python bindings, you can modify libvirtconnection.py:

https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f42b/lib/vdsm/common/libvirtconnection.py#L127

This module creates a connection, and wraps libvirt.virDomain with a wrapper

that panics on fatal errors. You can modify the wrapper to log a traceback

for all or some of libvirt.virDomain functions.
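The kind of change meant here could look roughly like the sketch below. It
is not the actual vdsm wrapper: TracingDomain, traced, TRACED and FakeDomain
are hypothetical names, and FakeDomain only stands in for libvirt.virDomain
so the example runs without libvirt. The real wrapper in libvirtconnection.py
is structured differently, so treat this only as the general pattern.

```python
import functools
import logging
import traceback

log = logging.getLogger("trace")

# Names of the methods to trace; an empty set means trace every method.
TRACED = {"XMLDesc"}


def traced(name, func):
    """Return func wrapped so that every call first logs the caller's stack."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Drop the last entry, which is this wrapper itself.
        stack = "".join(traceback.format_stack()[:-1])
        log.info("Calling %s\n%s", name, stack)
        return func(*args, **kwargs)
    return wrapper


class TracingDomain:
    """Proxy forwarding attribute access to the wrapped domain object."""

    def __init__(self, dom):
        self._dom = dom

    def __getattr__(self, name):
        attr = getattr(self._dom, name)
        if callable(attr) and (not TRACED or name in TRACED):
            return traced(name, attr)
        return attr


class FakeDomain:
    """Hypothetical stand-in for libvirt.virDomain."""

    def XMLDesc(self, flags=0):
        return "<domain/>"

    def name(self):
        return "vm1"
```

With this, TracingDomain(FakeDomain()).XMLDesc() returns the same value as
the wrapped object and logs who called it, while name() is forwarded without
logging. Limiting the traced set keeps the log readable, since some libvirt
calls are made very frequently by monitoring code.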

Another option is to modify the virDomain wrapper to log a traceback:

https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f42b/lib/vdsm/virt/virdomain.py#L82

For example here:

https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f42b/lib/vdsm/virt/virdomain.py#L99

Good luck with your vdsm ride!

Nir