----- Original Message -----
From: "Nir Soffer" <nsoffer(a)redhat.com>
To: "Francesco Romani" <fromani(a)redhat.com>
Cc: devel(a)ovirt.org, "Federico Simoncelli" <fsimonce(a)redhat.com>,
"Michal Skrivanek" <mskrivan(a)redhat.com>, "Adam Litke" <alitke(a)redhat.com>
Sent: Monday, July 7, 2014 4:53:29 PM
Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
> > This will also allow the task to change the sampling policy depending
> > on the results.
> > If some calls always fail, maybe we can run it less often.
>
> This looks like a feature worth having.
But let's take a note and forget about this - we should focus on a simple
solution.
Agreed. Stuff for future work.
> Yes. In all the reported cases it was directly or indirectly storage
> being unavailable.
> Directly: libvirt does some access (stat(), for example) and gets stuck.
> NFS soft mount doesn't always solve this.
> Indirectly: qemu stuck in D state because it was doing I/O and the
> storage disappeared underneath its feet.
When this happens - can all calls related to a specific vm block,
or is it always the dangerous calls you listed below?
As Michal said, the danger comes mostly from the listed calls and in
general from storage-related calls. To make things even more complex:
https://www.redhat.com/archives/libvir-list/2014-July/msg00478.html
Relevant quote for the lazy people (:))
If there is a prior call to libvirt that involves that guest domain
which has blocked on storage, then this can prevent subsequent calls
from completing since the prior call may hold a lock.
However, this is more or less what we are looking for - don't spam libvirt
with related calls if one gets stuck - so it seems it is not that bad for us,
I believe.
> > If one vm may stop responding, causing all libvirt calls for this vm
> > to block, then a thread pool with one connection per worker thread can
> > lead to a failure when all connections happen to run a request that
> > blocks the thread. If this is the case, then each task related to one
> > vm must depend on other tasks and should not be started until the
> > previous task has returned, simulating the current threading model
> > without creating 100's of threads.
>
> Agreed, we should introduce this concept and this is lacking in my
> threadpool proposal.
So basically the current threading model is the behavior we want?
If some call gets stuck, stop sampling this vm. Continue when the
call returns.
Michal? Federico?
Yep - but with fewer threads, and surely with a constant number of them.
Your schedule library (review in my queue at very high priority) is indeed
a nice step in this direction.
Waiting for Federico's ack.
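For the record, the behavior we are converging on - a constant-size worker
pool where a stuck call makes us skip further samples for that vm until the
call returns - can be sketched roughly like this. This is a toy illustration,
not VDSM code; the class and method names (`VmSampler`, `sample`) are made up:

```python
import queue
import threading

class VmSampler:
    """Sketch: a fixed-size worker pool where each VM has at most one
    sampling call in flight. If a call gets stuck, further samples for
    that VM are skipped until it returns; other VMs are unaffected."""

    def __init__(self, workers=4):
        self._tasks = queue.Queue()
        self._busy = set()            # VM ids with a call in flight
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def sample(self, vm_id, call):
        """Schedule one sampling call for vm_id; skip it (return False)
        if the previous call for this VM has not returned yet."""
        with self._lock:
            if vm_id in self._busy:
                return False          # previous call stuck: skip this cycle
            self._busy.add(vm_id)
        self._tasks.put((vm_id, call))
        return True

    def _worker(self):
        while True:
            vm_id, call = self._tasks.get()
            try:
                call()                # may block for a long time
            finally:
                with self._lock:
                    self._busy.discard(vm_id)
```

The point of the `_busy` set is exactly the "simulate the current threading
model" requirement: a stuck call consumes one worker and suppresses further
calls for that vm, instead of piling up threads.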
> > You can block access to storage using iptables, which may cause the
> > block-related calls to get stuck, and try to close a connection after
> > a few seconds from another thread.
>
> Sure, it is near the top of my TODO list.
Let's wait with this. If we can continue to send other requests on the same
connection while one call never returns, I don't see a need to close it.
Work in progress...
> - block-based devices: libvirt enters the QEMU monitor to ask for the
> last block extent
This is our only usage.
Yep, but a very important one indeed
>
> * _sampleCpu uses virDomainGetCPUStats, which uses cgroups. Should be safe.
Is "should be safe" really safe? Can you check with libvirt developers
about that?
Done:
https://www.redhat.com/archives/libvir-list/2014-July/msg00477.html
Waiting for (more) answers.
> * _sampleNet uses virDomainInterfaceStats, which uses /proc/net/dev.
> Should be safe.
>
> * _sampleBalloon uses virDomainGetInfo, which uses /proc, but also
> needs to enter the domain monitor.
And the domain monitor may be stuck when storage is not available?
Just to be clear: here I mean the QEMU domain (aka VM) monitor.
This is a very good question. I think yes, it can get stuck, but I don't
have hard evidence here yet.
> * storage/VM health (prevent the VM misbehaving on storage exhaustion):
> _highWrite
> * storage status: _updateVolume, _sampleDisk, _sampleDiskLatency.
> We can optimize the latter two; it looks wasteful to ask for data,
> discard some, then ask again for the missing data. Need to check
> carefully if the API allows doing both in one go, but it looks feasible.
> * balloon stats: _sampleBalloon: needs to enter the QEMU monitor
> * everything else: does not touch disk or qemu, so no special treatment
> here.
If we separate storage calls so vm sampling continues even if storage
causes qemu to block - do we gain anything? I guess this vm is
not useful if its storage is not available, and it will soon
pause anyway.
Looks like the simplest solution is the best, and no fancy separation
of sampling calls is needed.
You are right about a VM being useless without storage.
Probably separation is not the solution here; however, we must consider
some kind of isolation.
If a VM gets stuck on storage (D-state etc.) for whatever reason, it's fine
to block all the other samplings of that VM; however, the other VMs should
continue to be sampled until they block individually.
We shouldn't assume that if the storage is not responsive for one VM, then
it will be for all the other VMs on a host.
Please note that I'm *not* implying we are assuming the above; it is just
to highlight a requirement for sampling in general, and to preserve the
current (good) behaviour specifically.
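The isolation requirement can be made concrete with a small toy example
(all names hypothetical, nothing here is VDSM code): keep one non-blocking
lock per VM, so a stuck call only suppresses later cycles of that same VM
while every other VM keeps being sampled:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)
vm_locks = {}                      # one lock per VM, created on demand
results = []                       # (vm_id, value) pairs that completed

def sample(vm_id, call):
    """Skip this VM's cycle if its previous call is still in flight;
    other VMs are sampled independently."""
    lock = vm_locks.setdefault(vm_id, threading.Lock())
    if not lock.acquire(blocking=False):
        return False               # this VM is stuck: don't pile up calls
    def run():
        try:
            results.append((vm_id, call()))
        finally:
            lock.release()
    pool.submit(run)
    return True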
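placeholder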
Bests,
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani