[ovirt-devel] contEIOVMs regression?

Adam Litke alitke at redhat.com
Fri Nov 21 14:46:13 UTC 2014


On 20/11/14 17:37 -0500, Nir Soffer wrote:
>----- Original Message -----
>> From: "Adam Litke" <alitke at redhat.com>
>> To: devel at ovirt.org
>> Cc: "Nir Soffer" <nsoffer at redhat.com>, "Francesco Romani" <fromani at redhat.com>, "Federico Simoncelli"
>> <fsimonce at redhat.com>, "Dan Kenigsberg" <danken at redhat.com>
>> Sent: Thursday, November 20, 2014 9:15:33 PM
>> Subject: contEIOVMs regression?
>>
>> Hi list,
>>
>> I am taking a look at Bug 1157421 [1] which describes a situation
>> where VM's that are paused with an -EIO error are not automatically
>> resumed after the problem with storage has been corrected.  I have
>> some patches [2] on gerrit that resolve the problem.  Since this
>> appears to be a regression I am looking at a non-intrusive way to fix
>> it in the 3.5 branch.  There is some disagreement on the proper way to
>> fix this so I am hoping we can arrive at a solution through an open
>> discussion.
>>
>> The main issue at hand is with the Event/Callback mechanism we use to
>> call clientIF.contEIOVMs.  According to my experiments and this online
>> discussion [3] weakref does not work for instance methods such as
>> clientIF.contEIOVMs.  Our Event class uses weakref to prevent it from
>> holding references to registered callback functions.
>
>Why making event system more correct is required to tix [1]?
>
>>
>> I see two easy ways to fix the regression:
>
>I don't follow, what is the regression?

Assuming that at some point contEIOVMs actually worked and was able to
automatically resume VMs, then we have a regression because given the
weakref problems I am describing herein there is no way that it is
working now.  The only way we don't have a regression is if this code
has never worked to begin with.

>> 1) Treat clientIF as a singleton class (which it is) and make
>> contEIOVMs a module-level method which gets the clientIF instance
>> and calls it's bound contEIOVMs method.  See my patches [2] for the
>> code behind this idea.
>
>This is the wrong direction. There is only one place using that horrible
>getInstance(), and it also could just create the single instance that we
>need. We should remove getInstance() instead of using it in new code.
>
>>
>> 2) Allow Event to maintain a strong reference on the bound
>> clientIF.contEIOVMs method.  This will allow the current code to work
>> as designed but will change the Event implementation to accomodate
>> this specific use case.  Since no one else appears to be using this
>> code, it should have no functional impact.
>
>The code is already holding a strong reference now, no change is
>needed :-)

I disagree.  From vdsm/storage/misc.py:
class Event(object):
...
    def register(self, func, oneshot=False):
        with self._syncRoot:
            self._registrar[id(func)] = (weakref.ref(func), oneshot)
...
            # ^^^ "He's dead Jim"

The function is converted into a weak reference.  Since, in this case,
the function is an instance method, the reference is immediately dead
on arrival.  I have verified this with debugging statements in my
environment.

>> Are there any other ideas I'm missing?  I am aware of plans to
>> refactor this code for 3.6 but I am more interested in a short-term,
>> practical solution to address the current regression.
>
>For the short term - fixing [1] - http://gerrit.ovirt.org/35369
>should be enough, registering self.contEIOVMs.
>
>If you want to make the code more correct, maybe use the same
>method used by StoragePool._upgradeCallback?

Thanks, this looks like it might work.

-- 
Adam Litke



More information about the Devel mailing list