[ovirt-devel] [VDSM] cleaning statistics retrieval
Francesco Romani
fromani at redhat.com
Wed Apr 9 12:40:24 UTC 2014
----- Original Message -----
> From: "Dan Kenigsberg" <danken at redhat.com>
> To: "Francesco Romani" <fromani at redhat.com>
> Cc: "vdsm-devel" <vdsm-devel at lists.fedorahosted.org>, devel at ovirt.org
> Sent: Tuesday, April 8, 2014 6:57:15 PM
> Subject: Re: cleaning statistics retrieval
>
> On Tue, Apr 08, 2014 at 06:52:50AM -0400, Francesco Romani wrote:
> > Hello VDSM developers,
>
> Please use devel at ovirt, and mention "vdsm" at the subject. This thread
> in particular involves Engine as well.
Right, I forgot. Sorry about that.
> > I'd like to discuss and explain the plans for cleaning up Vm.getStats()
> > in vdsm/virt/vm.py, and how it affects a bug we have:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1073478
> >
> > Let's start from the bug.
> >
> > To make a long story short, there is a (small) race in VDSM, probably
> > introduced by commit
> > 8fedf8bde3c28edb07add23c3e9b72681cb48e49
> > The race can actually be triggered only if the VM is shutting down, so it
> > is not that bad.
> >
> > Fixing the reported issue in the specific case can be done with a trivial
> > if, and that is what I did
> > in my initial http://gerrit.ovirt.org/#/c/25803/1/vdsm/vm.py,cm
>
> Could you explain how an AttributeError there moved the VM to Down?
This should actually be due to this engine bug: https://bugzilla.redhat.com/show_bug.cgi?id=1072282
If GetVmStats fails for whatever reason, engine thinks the VM is down.
> > And this is the core of the issue.
> > My initial idea is to either return success and a complete, well formed
> > statistics set, or return an error.
> > However current engine seems to not cope properly with this, and we cannot
> > break backward compatibility.
> Would you be more precise? If getAllVmStats returns an errCode for one
> of the Vms, what happens?
Of course.
For GetAllVmStats, AFAIK (please correct me if I am wrong, because I'm not really an expert on the engine side),
engine simply does not expect anything other than a list
of either RunningVmStats or ExitedVmStats.
So I'm not sure (will verify just after this mail) whether engine can cope with mixed content,
meaning [ stats, errCode, stats, stats ... ]
For GetVmStats, like Michal said, the engine expects the call to succeed; otherwise
it puts the VM into the Unknown state.
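To make the "mixed content" case concrete, here is a minimal sketch in Python. The dictionary shapes, the field names other than errCode, and the helper split_mixed_reply are all hypothetical and used only for illustration; this is not the real VDSM/engine wire format.

```python
# Hypothetical reply shapes, for illustration only.
def split_mixed_reply(items):
    """Separate per-VM stats entries from per-VM error entries."""
    stats, errors = [], []
    for item in items:
        if "errCode" in item:
            errors.append(item)
        else:
            stats.append(item)
    return stats, errors


# A mixed list of the form [ stats, errCode, stats ].
reply = [
    {"vmId": "a", "status": "Up"},
    {"vmId": "b", "errCode": 1, "errMsg": "stats unavailable"},
    {"vmId": "c", "status": "Down"},
]
stats, errors = split_mixed_reply(reply)
```

A consumer that assumes every entry is a stats dictionary would have to grow a check like this one before it could accept such a list.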
>
> >
> > Looks like the only way to go is to always return success and to add a
> > field to describe the content of the
> > statistics (partial, complete...). This is admittedly a far cry from the
> > ideal solution, but it is dictated
> > by the need to preserve the compatibility with current/old engines.
>
> I don't think that I understand your suggestion, but it does not sound
> right to send a partial dictionary and to "apologize" about its
> partiality elsewhere. The dictionary may be "partial" for engine-4.0
> yet "complete" for engine-3.5. It's not for Vdsm to grade its own
> output.
I see your point (that's one of the reasons I'm not happy about this solution).
Please see below for the details.
> > please note that I'm not really happy about this solution, but, given the
> > constraint, I don't see better alternatives.
>
> Please explain the benefits of describing the partial content, as I do
> not see them.
The root issue here is that getStats must always succeed, because the engine doesn't
expect otherwise, and we guarantee this to cope with old engines;
but inside VDSM, getStats can actually fail in a lot of places
(like getBalloonInfo in this case).
So, in VDSM we can end up with a partial response, and now the question
is: what to do with this partial response? And if there are optional fields
in the response, how can the engine distinguish between
* full response, but with an optional field missing
and
* partial response (because of an exception inside VDSM),
without some field, incidentally including an optional one
?
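A small sketch of why the two cases look the same from the receiving side. The field names and the helper absent_optional are hypothetical and chosen only to mirror the discussion; they are not the actual stats schema.

```python
# Hypothetical field names, for illustration only.
OPTIONAL_FIELDS = {"migrationProgress"}

# Case 1: everything was collected; the optional field simply does not apply.
full_reply = {"status": "Up", "balloonInfo": {"balloon_cur": 1024}}

# Case 2: collection failed part-way (e.g. in getBalloonInfo), so whatever
# would have been gathered from that point on is absent as well.
partial_reply = {"status": "Up"}


def absent_optional(reply):
    """Return the optional fields that are not present in a reply."""
    return OPTIONAL_FIELDS - set(reply)
```

Both replies report the same optional field as absent, so the absence of an optional field alone says nothing about whether the reply is complete.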
The VDSM 'grading' was a hint from VDSM to help engine to distinguish
between those cases.
Even if we agree this grading idea is bad, the core issue remains open:
what to do if we end up with a partial response?
For example, let's say we handle the getBalloonInfo exception (http://gerrit.ovirt.org/#/c/26539/),
the stats object to be returned will lack
* the (mandatory, expected) balloon stats
* the (optional) migration progress - ok, bad example because this makes sense only during migrations,
but other optional fields may be added later and the issue remains
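As a rough sketch of what "handle the exception and still succeed" means in practice: the functions below are hypothetical stand-ins, not the code in the gerrit change, and the AttributeError is assumed to come from the VM going away while stats are being collected.

```python
# Hypothetical stand-ins for the real collectors; illustration only.
def get_balloon_info(vm_running):
    """Pretend balloon query that fails once the VM is shutting down."""
    if not vm_running:
        raise AttributeError("underlying domain is gone")
    return {"balloon_cur": 1024}


def collect_stats(vm_running):
    """Always return a dictionary, even if a sub-collector fails."""
    stats = {"status": "Up" if vm_running else "Powering down"}
    try:
        stats["balloonInfo"] = get_balloon_info(vm_running)
    except AttributeError:
        # Swallow the failure: the call succeeds, but the field is missing.
        pass
    return stats
```

With this shape the call never fails, but collect_stats(False) returns a dictionary without the balloon field, which is exactly the partial response in question.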
Again, anyone should feel free to correct me if I misunderstood something about engine
(or VDSM <=> engine communication) and to suggest better alternatives :\
Thanks and bests,
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani