----- Original Message -----
From: "Dan Kenigsberg" <danken(a)redhat.com>
To: "Francesco Romani" <fromani(a)redhat.com>
Cc: "vdsm-devel" <vdsm-devel(a)lists.fedorahosted.org>, devel(a)ovirt.org
Sent: Tuesday, April 8, 2014 6:57:15 PM
Subject: Re: cleaning statistics retrieval
On Tue, Apr 08, 2014 at 06:52:50AM -0400, Francesco Romani wrote:
> Hello VDSM developers,
Please use devel@ovirt, and mention "vdsm" at the subject. This thread
in particular involves Engine as well.
Right, I forgot. Sorry about that.
> I'd like to discuss and explain the plans for cleaning up
Vm.getStats()
> in vdsm/virt/vm.py, and how it affects a bug we have:
>
https://bugzilla.redhat.com/show_bug.cgi?id=1073478
>
> Let's start from the bug.
>
> To make a long story short, there is a (small) race in VDSM, probably
> introduced by commit
> 8fedf8bde3c28edb07add23c3e9b72681cb48e49
> The race can actually be triggered only if the VM is shutting down, so it
> is not that bad.
>
> Fixing the reported issue in the specific case can be done with a trivial
> if, and that it what I did
> in my initial
http://gerrit.ovirt.org/#/c/25803/1/vdsm/vm.py,cm
Could you explain how an AttributeError there moved the VM to Down?
This should actually be this bug of engine
https://bugzilla.redhat.com/show_bug.cgi?id=1072282
if GetVmStats fails for whatever reason, engine thinks the VM is down.
> And this is the core of the issue.
> My initial idea is to either return success and a complete, well formed
> statistics set, or return an error.
> However current engine seems to not cope properly with this, and we cannot
> break backward compatibility.
Would you be more precise? If getAllVmStats returns an errCode for one
of the Vms, what happens?
Of course.
For GetAllVmStats, AFAIK, but please correct me if I am wrong, because I'm not really
expert on engine side,
engine simply does not expects anything different from a list
of either a RunningVmStats or an ExitedVmStats.
So not sure (will verify just after this mail) if engine can cope with mixed content,
meaning [ stats, errCode, stats, stats ... ]
For GetVmStats, like Michal said, the engine expects the call to succeed otherwise
it puts the VM into the Unknown state.
>
> Looks like the only way to go is to always return success and to add a
> field to describe the content of the
> statistics (partial, complete...). THis is admittedly a far cry from the
> ideal solution, but it is dictated
> by the need to preserve the compatibility with current/old engines.
I don't think that I understand your suggestion, but it does not sound
right to send a partial dictionary and to "appologize" about its
paritiality elsewhere. The dictionary may be "partial" for engine-4.0
yet "complete" for engine-3.5. It's not for Vdsm to grade its own
output.
I see your point (that's one of the reasons I'm not happy about this solution).
Please see below for the detauls.
> please note that I'm not really happy about this solution,
but, given the
> constraint, I don't see better alternatives.
Please explain the benefits of describing the partial content, as I do
not see them.
The root issue here is the getStats must always succeed, because the engine doesn't
expect otherwise and thus we guarantee this to cope with old engines;
but inside VDSM, getStats can actually fail in a lot of places
(like in this case getBalloonInfo)
So, in VDSM we can end up with a partial response, and now the question
is: what to do with this partial response? And if there are optional fields
in the response, how can the engine distinguish between
* full response, but with an optional field missing
and
* partial response (because of an exception inside VDSM),
without some field, incidentally including an optional one
?
The VDSM 'grading' was an hint from VDSM to help engine to distinguish
between those cases.
Even if we agree this grading idea is bad, the core issue remains open:
what to do if we end up with a partial response?
For example, let's say we handle the getBalloonInfo exception
(
http://gerrit.ovirt.org/#/c/26539/),
the stats object to be returned will lack
* the (mandatory, expected) balloon stats
* the (optional) migration progress - ok, bad example because this makes sense only during
migrations,
but other optional fields may be added later and the issue remains
Again, anyone feel free to correct me if I misunderstood something about engine
(or VDSM <=> engine communication) and to suggest better alternatives :\
Thanks and bests,
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani