On Thu, Oct 11, 2012 at 3:20 AM, Roy Golan <rgolan(a)redhat.com> wrote:
since the VM run_on_vds was empty, the "confirm host..." didn't clear its
status because it's not selected from the DB as one of the host VMs.
I'll try to dig in to see at what point this value was cleared - probably
around the failed migration.
Cool. Let me know if you need anything else from me. Again, I have the
USB stick from cloudhost02 when it failed, if there is a log for vdsmd or
something that might have something useful in it, just send me the path.
Are transactions in use in the system anywhere, either in the DB or the app
layer? If not, have they been considered? I ask because this seems like
the kind of thing they would address nicely. Specifically, if the migration
recipient is up and healthy but does not confirm that the VM is up or that a
migration is in progress, and the sender node is no longer responsive, roll
back to assuming the VM is still running on the sender. This would avoid the
inconsistent state I had, where the VM was not down but also not running on
any responding host, and would allow "confirm host..." to clear the unknown
state of the VM.
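
To make the rollback idea concrete, here is a rough sketch in Python. It is
only an illustration of the commit-or-roll-back rule described above, not how
the engine actually behaves: the Vm class, settle_migration, and
confirm_host_rebooted are hypothetical names, and only the run_on_vds field
comes from this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vm:
    name: str
    # Host the engine believes the VM runs on (None = unassigned).
    run_on_vds: Optional[str]
    status: str = "Up"


def settle_migration(vm: Vm, source: str, destination: str,
                     destination_confirms_vm: bool,
                     source_responsive: bool) -> None:
    """Commit or roll back the host assignment; never leave it empty."""
    if destination_confirms_vm:
        # Commit: the recipient reports the VM as up or migrating.
        vm.run_on_vds = destination
    else:
        # Roll back: assume the VM never left the sender.
        vm.run_on_vds = source
        if not source_responsive:
            # The sender cannot be asked, so the real state is unknown.
            vm.status = "Unknown"


def confirm_host_rebooted(host: str, vms: List[Vm]) -> List[str]:
    """Mark every VM assigned to `host` as Down and return their names."""
    cleared = []
    for vm in vms:
        if vm.run_on_vds == host:
            vm.status = "Down"
            vm.run_on_vds = None
            cleared.append(vm.name)
    return cleared
```

With this rule, a VM whose migration was never confirmed stays assigned to the
unresponsive sender, so a later confirm_host_rebooted call on that host finds
it and can clear the unknown state.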
Sorry if I am stating the obvious or oversimplifying. It has been a long
time since I wrote any significant code. =)