[ovirt-devel] [VDSM] cleaning statistics retrieval

Francesco Romani fromani at redhat.com
Wed Apr 9 12:40:24 UTC 2014


----- Original Message -----
> From: "Dan Kenigsberg" <danken at redhat.com>
> To: "Francesco Romani" <fromani at redhat.com>
> Cc: "vdsm-devel" <vdsm-devel at lists.fedorahosted.org>, devel at ovirt.org
> Sent: Tuesday, April 8, 2014 6:57:15 PM
> Subject: Re: cleaning statistics retrieval
> 
> On Tue, Apr 08, 2014 at 06:52:50AM -0400, Francesco Romani wrote:
> > Hello VDSM developers,
> 
> Please use devel at ovirt, and mention "vdsm" at the subject. This thread
> in particular involves Engine as well.

Right, I forgot. Sorry about that.

> > I'd like to discuss and explain the plans for cleaning up Vm.getStats()
> > in vdsm/virt/vm.py, and how it affects a bug we have:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1073478
> > 
> > Let's start from the bug.
> > 
> > To make a long story short, there is a (small) race in VDSM, probably
> > introduced by commit
> > 8fedf8bde3c28edb07add23c3e9b72681cb48e49
> > The race can actually be triggered only if the VM is shutting down, so it
> > is not that bad.
> > 
> > Fixing the reported issue in the specific case can be done with a trivial
> > if, and that it what I did
> > in my initial http://gerrit.ovirt.org/#/c/25803/1/vdsm/vm.py,cm
> 
> Could you explain how an AttributeError there moved the VM to Down?

This should actually be this bug of engine https://bugzilla.redhat.com/show_bug.cgi?id=1072282
if GetVmStats fails for whatever reason, engine thinks the VM is down.

> > And this is the core of the issue.
> > My initial idea is to either return success and a complete, well formed
> > statistics set, or return an error.
> > However current engine seems to not cope properly with this, and we cannot
> > break backward compatibility.
> Would you be more precise? If getAllVmStats returns an errCode for one
> of the Vms, what happens?

Of course.

For GetAllVmStats, AFAIK, but please correct me if I am wrong, because I'm not really expert on engine side,
engine simply does not expects anything different from a list
of either a RunningVmStats or an ExitedVmStats.

So not sure (will verify just after this mail) if engine can cope with mixed content,
meaning [ stats, errCode, stats, stats ... ]

For GetVmStats, like Michal said, the engine expects the call to succeed otherwise
it puts the VM into the Unknown state.

> 
> > 
> > Looks like the only way to go is to always return success and to add a
> > field to describe the content of the
> > statistics (partial, complete...). THis is admittedly a far cry from the
> > ideal solution, but it is dictated
> > by the need to preserve the compatibility with current/old engines.
> 
> I don't think that I understand your suggestion, but it does not sound
> right to send a partial dictionary and to "appologize" about its
> paritiality elsewhere. The dictionary may be "partial" for engine-4.0
> yet "complete" for engine-3.5. It's not for Vdsm to grade its own
> output.

I see your point (that's one of the reasons I'm not happy about this solution).
Please see below for the detauls.

> > please note that I'm not really happy about this solution, but, given the
> > constraint, I don't see better alternatives.
> 
> Please explain the benefits of describing the partial content, as I do
> not see them.

The root issue here is the getStats must always succeed, because the engine doesn't
expect otherwise and thus we guarantee this to cope with old engines;
but inside VDSM, getStats can actually fail in a lot of places
(like in this case getBalloonInfo)

So, in VDSM we can end up with a partial response, and now the question
is: what to do with this partial response? And if there are optional fields
in the response, how can the engine distinguish between

* full response, but with an optional field missing

and

* partial response (because of an exception inside VDSM),
  without some field, incidentally including an optional one

?

The VDSM 'grading' was an hint from VDSM to help engine to distinguish
between those cases.

Even if we agree this grading idea is bad, the core issue remains open:
what to do if we end up with a partial response?
For example, let's say we handle the getBalloonInfo exception (http://gerrit.ovirt.org/#/c/26539/),
the stats object to be returned will lack

* the (mandatory, expected) balloon stats
* the (optional) migration progress - ok, bad example because this makes sense only during migrations,
  but other optional fields may be added later and the issue remains

Again, anyone feel free to correct me if I misunderstood something about engine
(or VDSM <=> engine communication) and to suggest better alternatives :\

Thanks and bests,

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani



More information about the Devel mailing list