[ovirt-devel] contEIOVMs regression?

Nir Soffer nsoffer at redhat.com
Fri Nov 21 18:48:16 UTC 2014


----- Original Message -----
> From: "Adam Litke" <alitke at redhat.com>
> To: "Nir Soffer" <nsoffer at redhat.com>
> Cc: devel at ovirt.org, "Francesco Romani" <fromani at redhat.com>, "Federico Simoncelli" <fsimonce at redhat.com>, "Dan
> Kenigsberg" <danken at redhat.com>
> Sent: Friday, November 21, 2014 5:28:07 PM
> Subject: Re: contEIOVMs regression?
> 
> On 21/11/14 10:03 -0500, Nir Soffer wrote:
> >----- Original Message -----
> >> From: "Adam Litke" <alitke at redhat.com>
> >> To: "Nir Soffer" <nsoffer at redhat.com>
> >> Cc: devel at ovirt.org, "Francesco Romani" <fromani at redhat.com>, "Federico
> >> Simoncelli" <fsimonce at redhat.com>, "Dan
> >> Kenigsberg" <danken at redhat.com>
> >> Sent: Friday, November 21, 2014 4:46:13 PM
> >> Subject: Re: contEIOVMs regression?
> >>
> >> On 20/11/14 17:37 -0500, Nir Soffer wrote:
> >> >----- Original Message -----
> >> >> From: "Adam Litke" <alitke at redhat.com>
> >> >> To: devel at ovirt.org
> >> >> Cc: "Nir Soffer" <nsoffer at redhat.com>, "Francesco Romani"
> >> >> <fromani at redhat.com>, "Federico Simoncelli"
> >> >> <fsimonce at redhat.com>, "Dan Kenigsberg" <danken at redhat.com>
> >> >> Sent: Thursday, November 20, 2014 9:15:33 PM
> >> >> Subject: contEIOVMs regression?
> >> >>
> >> >> Hi list,
> >> >>
> >> >> I am taking a look at Bug 1157421 [1] which describes a situation
> >> >> where VM's that are paused with an -EIO error are not automatically
> >> >> resumed after the problem with storage has been corrected.  I have
> >> >> some patches [2] on gerrit that resolve the problem.  Since this
> >> >> appears to be a regression I am looking at a non-intrusive way to fix
> >> >> it in the 3.5 branch.  There is some disagreement on the proper way to
> >> >> fix this so I am hoping we can arrive at a solution through an open
> >> >> discussion.
> >> >>
> >> >> The main issue at hand is with the Event/Callback mechanism we use to
> >> >> call clientIF.contEIOVMs.  According to my experiments and this online
> >> >> discussion [3] weakref does not work for instance methods such as
> >> >> clientIF.contEIOVMs.  Our Event class uses weakref to prevent it from
> >> >> holding references to registered callback functions.
> >> >
> >> >Why making event system more correct is required to tix [1]?
> >> >
> >> >>
> >> >> I see two easy ways to fix the regression:
> >> >
> >> >I don't follow, what is the regression?
> >>
> >> Assuming that at some point contEIOVMs actually worked and was able to
> >> automatically resume VMs, then we have a regression because given the
> >> weakref problems I am describing herein there is no way that it is
> >> working now.  The only way we don't have a regression is if this code
> >> has never worked to begin with.
> >
> >The current code is master do work - when I fixed this last time, the
> >problem
> >was that we did not register the callback before starting the monitors, and
> >that the monitors did not issue a state change on the first time a monitor
> >check the domain state.
> >
> >I verified that contEIOVMs is called and that it does try to continue vms.
> 
> Very curious.  I am working with 3.5.0.  The main difference (other
> than branch) is that I am working in an environment with no connected
> storage pool.  Though I still can't see how the weakref stuff could be
> working in master.
> 
> >If this does not break now (with current code), please open a bug.
> >
> >>
> >> >> 1) Treat clientIF as a singleton class (which it is) and make
> >> >> contEIOVMs a module-level method which gets the clientIF instance
> >> >> and calls it's bound contEIOVMs method.  See my patches [2] for the
> >> >> code behind this idea.
> >> >
> >> >This is the wrong direction. There is only one place using that horrible
> >> >getInstance(), and it also could just create the single instance that we
> >> >need. We should remove getInstance() instead of using it in new code.
> >> >
> >> >>
> >> >> 2) Allow Event to maintain a strong reference on the bound
> >> >> clientIF.contEIOVMs method.  This will allow the current code to work
> >> >> as designed but will change the Event implementation to accomodate
> >> >> this specific use case.  Since no one else appears to be using this
> >> >> code, it should have no functional impact.
> >> >
> >> >The code is already holding a strong reference now, no change is
> >> >needed :-)
> >>
> >> I disagree.  From vdsm/storage/misc.py:
> >> class Event(object):
> >> ...
> >>     def register(self, func, oneshot=False):
> >>         with self._syncRoot:
> >>             self._registrar[id(func)] = (weakref.ref(func), oneshot)
> >> ...
> >>             # ^^^ "He's dead Jim"
> >>
> >> The function is converted into a weak reference.  Since, in this case,
> >> the function is an instance method, the reference is immediately dead
> >> on arrival.  I have verified this with debugging statements in my
> >> environment.
> >
> >So you suggest that taking a weakref to an instance method returns
> >a dead reference?
> >
> >I thought that the problem is instance method keep hard reference to
> >the instance, so the weakref is useless.
> 
> Yeah, try out this test program to see what I mean:
> 
> 
> #!/usr/bin/env python
> 
> import weakref
> from functools import partial
> 
> class A(object):
>     def __init__(self):
>         self.r1 = weakref.ref(self.a)
>         self.r2 = partial(A.a, weakref.proxy(self))
> 
> 
>     def a(self):
>         print "Hello from a"
> 
> def main():
>     obj = A()
>     print obj.r1
>     obj.r2()
> 
> if __name__ == '__main__':
>     main()

I tested using this program, which simulate the current code in master:

# weakref_test.py
#!/usr/bin/env python                                                                                                                                                           

import weakref
from functools import partial

store = set()
callbacks = {}

class A(object):
    def __init__(self):
        store.add(self.callback)
    def callback(self):
        print "Hello from callback"

def main():
    obj = A() 
    for cb in store:
        callbacks[id(cb)] = weakref.ref(cb)
    print callbacks
    for ref in callbacks.values():
        cb = ref()
        cb()

if __name__ == '__main__':
    main()

And it works:

$ python weakref_test.py 
{139780423201568: <weakref at 0x7f21241b5100; to 'instancemethod' at 0x7f212a78b320 (callback)>}
Hello from callback

So your patch http://gerrit.ovirt.org/35436 is not required for current code.

But if you try to register directly in init, without having a reference to the bound method
in the callbacks set, it breaks:

#!/usr/bin/env python

import weakref
from functools import partial

callbacks = {}

class A(object):                                                                                                                                                                
    def __init__(self):
        callbacks[id(self.callback)] = weakref.ref(self.callback)
    def callback(self):
        print "Hello from callback"

def main():
    obj = A() 
    print callbacks
    for ref in callbacks.values():
        cb = ref()
        cb()

if __name__ == '__main__':
    main()

# weakref_test2.py
$ python weakref_test2.py 
{139707674977056: <weakref at 0x7f1033f98100; dead>}
Traceback (most recent call last):
  File "weakref_test2.py", line 22, in <module>
    main()
  File "weakref_test2.py", line 19, in main
    cb()
TypeError: 'NoneType' object is not callable


I suggest to change the commit message to explain that it is needed for the next
patch and is not a fix.

Nir



More information about the Devel mailing list