Re: [ovirt-devel] short recap of last vdsm call (15.4.2014)

1 May 2014

      On Wed, Apr 30, 2014 at 01:26:18PM -0400, Adam Litke wrote:
...
On 30/04/14 14:22 +0100, Dan Kenigsberg wrote:
...
On Tue, Apr 22, 2014 at 02:54:29PM +0300, ybronhei wrote:
...
hey,
somehow we missed the summary of this call, and a few "big" issues
were raised there. so i would like to share it with all and hear
more comments
- task id in http header - allows engine to initiate calls with id
instead of following vdsm response - federico already started this
work, and this is mandatory for live merge feature afaiu.
Adam, Federico, may I revisit this question from another angle?
Why does Vdsm need to know live-merge's task id?
As far as I understand, (vmid, disk id) are enough to identify a live
merge process.
A vmId + diskId can uniquely identify a block job at a single moment
in time since qemu guarantees that only a single block job can run at
any given point in time.  But this gives us no way to differentiate
two sequential jobs that run on the same disk.  Therefore, without
having an engine-supplied jobID, we can never be sure if one job
finished and another started since the last time we polled stats.
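
A minimal sketch of the ambiguity described above. All names here are
hypothetical and are not taken from vdsm or libvirt; the polls are modelled
as plain dicts keyed by (vmId, diskId).

```python
# Hypothetical illustration only; not vdsm or libvirt code.

def job_changed_without_id(prev_poll, curr_poll, key):
    """Only 'is a job active on this disk?' is known at each poll."""
    return (key in prev_poll) != (key in curr_poll)


def job_changed_with_id(prev_poll, curr_poll, key):
    """Each poll also carries the engine-supplied job ID for the disk."""
    return prev_poll.get(key) != curr_poll.get(key)


key = ("vm-1", "disk-a")

# Poll 1 sees job A. Between polls, A finishes and B starts. Poll 2 sees job B.
poll1 = {key: "job-A"}
poll2 = {key: "job-B"}

# Without IDs the poller only knows that some job is active on the disk.
active1 = {k: True for k in poll1}
active2 = {k: True for k in poll2}

undetected = not job_changed_without_id(active1, active2, key)
detected = job_changed_with_id(poll1, poll2, key)
```

Under these assumptions the ID-less comparison reports no change, while the
comparison by job ID notices that a different job is now running.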
Why would Engine ever want to initiate a new live merge of a
(vmId,diskId) before it has a conclusive result (success/failure) of
the previous attempt? As far as I understand, this should never
happen, and it's actually good for the API to force avoidance of
such a case.
...
Additionally, engine-supplied UUIDs are part of a developing framework
for next-generation async tasks.  Engine prefers to use a single
identifier to represent any kind of task (rather than some problem
domain specific combination of UUIDs).  Adhering to this rule will
help us to converge on a single implementation of ng async tasks
moving forward.
I do not think that having a (virtual) table of
    task_id -> vmId,diskId
in Vdsm is much simpler than having it on the Engine machine.
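
For concreteness, a sketch of what such a table could look like, wherever it
ends up living. The class and method names are hypothetical, not part of
Vdsm or Engine. It also refuses a second job on a (vmId, diskId) that still
has one registered, matching the "one merge per disk at a time" assumption
discussed earlier.

```python
class TaskTable:
    """Hypothetical task_id -> (vm_id, disk_id) mapping, kept in memory."""

    def __init__(self):
        self._by_task = {}   # task_id -> (vm_id, disk_id)
        self._by_disk = {}   # (vm_id, disk_id) -> task_id

    def register(self, task_id, vm_id, disk_id):
        key = (vm_id, disk_id)
        if key in self._by_disk:
            # A previous job on this disk has no conclusive result yet.
            raise ValueError("job %s still registered for %s"
                             % (self._by_disk[key], key))
        self._by_task[task_id] = key
        self._by_disk[key] = task_id

    def lookup(self, task_id):
        """Return (vm_id, disk_id) for a task, or None if unknown."""
        return self._by_task.get(task_id)

    def finish(self, task_id):
        """Forget a task once its result is conclusive."""
        key = self._by_task.pop(task_id, None)
        if key is not None:
            del self._by_disk[key]
```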

I still find the notion of a new framework for async tasks quite
useful. But as I requested before, I think we should design it first,
so it fits all conceivable users. In particular, we should not tie it
to the existence of a running VM. We'd better settle on persistence
semantics that work for everybody (such as network tasks).

Last time, the idea was struck down by Saggi and others from infra, who
are afraid to repeat mistakes from the current task framework.
...
...
If we do not have a task id, we do not need to worry about how to pass it,
and where to persist it.
There are at least 3 reasons to persist a block job ID:
* To associate a specific block job operation with a specific
 engine-initiated flow.
* So that you can clean up after a job that completed when vdsm could
 not receive the completion event.
But if Vdsm dies before it managed to clean up, Engine would have to
perform the cleanup via another host. So having this short-loop cleanup
is redundant.
...
* Since we must ask libvirt about block job events on a per VM, per
 disk basis, tracking the devices on which we expect block jobs
 enables us to eliminate wasteful calls to libvirt.
This can be done by an in-memory cache.
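
A rough sketch of such an in-memory cache, assuming a caller-supplied
query_job(vm_id, disk_id) function that stands in for the per-disk libvirt
query and returns None when no job is active. The class and its methods are
hypothetical, not existing vdsm code.

```python
class BlockJobTracker:
    """Hypothetical in-memory record of disks on which a block job is expected."""

    def __init__(self, query_job):
        # query_job(vm_id, disk_id) -> dict or None; placeholder for the
        # real per-disk libvirt call.
        self._query_job = query_job
        self._expected = set()

    def expect(self, vm_id, disk_id):
        """Remember that a block job was started on this disk."""
        self._expected.add((vm_id, disk_id))

    def poll(self):
        """Query only the tracked disks; stop tracking those with no job."""
        results = {}
        for key in sorted(self._expected):
            info = self._query_job(*key)
            if info is None:
                self._expected.discard(key)
            else:
                results[key] = info
        return results
```

Being in memory only, this state would not survive a vdsm restart, which is
the trade-off under discussion here.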
...
Hope this makes the rationale a bit clearer...
Yes, but I am not yet convinced...