On Jul 9, 2014, at 15:38 , Nir Soffer <nsoffer(a)redhat.com> wrote:
----- Original Message -----
> From: "Adam Litke" <alitke(a)redhat.com>
> To: "Michal Skrivanek" <michal.skrivanek(a)redhat.com>
> Cc: devel(a)ovirt.org
> Sent: Wednesday, July 9, 2014 4:19:09 PM
> Subject: Re: [ovirt-devel] [vdsm] VM recovery now depends on HSM
>
> On 09/07/14 13:11 +0200, Michal Skrivanek wrote:
>>
>> On Jul 8, 2014, at 22:36 , Adam Litke <alitke(a)redhat.com> wrote:
>>
>>> Hi all,
>>>
>>> As part of the new live merge feature, when vdsm starts and has to
>>> recover existing VMs, it calls VM._syncVolumeChain to ensure that
>>> vdsm's view of the volume chain matches libvirt's. This involves two
>>> kinds of operations: 1) sync VM object, 2) sync underlying storage
>>> metadata via HSM.
>>>
>>> This means that HSM must be up (and the storage domain(s) that the VM
>>> is using must be accessible). When testing some rather eccentric error
>>> flows, I am finding this to not always be the case.
>>>
>>> Is there a way to have VM recovery wait on HSM to come up? How should
>>> we respond if a required storage domain cannot be accessed? Is there
>>> a mechanism in vdsm to schedule an operation to be retried at a later
>>> time? Perhaps I could just schedule the sync and it could be retried
>>> until the required resources are available.
>>
>> I briefly discussed with Federico some time ago that IMHO
>> syncVolumeChain needs to be changed. It must not be part of the VM's create
>> flow, as I expect it to be quite a bottleneck in big-scale environments (it is
>> now in fact executing not only on recovery but on all 4 create flows!).
>> I don't know how yet, but we need to find a different way. Now you just
>> added yet another reason.
>>
>> So…I too ask for more insights:-)
>
> Sure, so... We switched to running syncVolumeChain at all times to
> cover a very rare scenario:
>
> 1. VM is running on host A
> 2. User initiates Live Merge on VM
> 3. Host A experiences a catastrophic hardware failure before engine
> can determine if the merge succeeded or failed
> 4. VM is restarted on Host B
>
> Since (in this case) the host cannot know if a live merge was in
> progress on the previous host, it needs to always check.
>
>
> Some ideas to mitigate:
> 1. When engine recreates a VM on a new host and a Live Merge was in
> progress, engine could call a verb to ask the host to synchronize the
> volume chain. This way, it only happens when engine knows it's needed
> and engine can be sure that the required resources (storage
> connections and domains) are present.
This seems like the right approach.
+1
I like the "only when needed" part, since we can indeed assume the scenario is
unlikely to happen most of the time (but it is very real)
>
> 2. The syncVolumeChain call runs in the recovery case to ensure that
> we clean up after any missed block job events from libvirt while vdsm
> was stopped/restarting.
Can we clean up later on, or does it need to be on recovery? Can it be delayed - requested by
engine a little bit later?
We need this since vdsm recovers running VMs when it starts, before
engine is connected. Actually, engine cannot talk with vdsm until
it has finished the recovery process.
> In this case, the block job info is saved in
> the vm conf so the recovery flow could be changed to query libvirt for
> block job status on only those disks where we know about a previous
> operation. For those found gone, we'd call syncVolumeChain. In this
> scenario, we still have to deal with the race with HSM initialization
> and storage connectivity issues. Perhaps engine should drive this
> case as well?
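
The narrowed recovery check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the layout of the saved block job info (`saved_jobs`, with a `drive` key) and the two injected callables `query_block_job` and `sync_volume_chain` are hypothetical stand-ins, not vdsm's or libvirt's actual API.

```python
def disks_needing_sync(saved_jobs, query_block_job):
    """Return the drives whose tracked block job is no longer reported.

    saved_jobs: hypothetical mapping of job id -> {"drive": name}, standing
        in for the block job info saved in the vm conf.
    query_block_job: hypothetical callable taking a drive name and returning
        a truthy value while a block job is still active on that drive.
    """
    gone = []
    for job_id, job in sorted(saved_jobs.items()):
        if not query_block_job(job["drive"]):
            # The job finished or failed while vdsm was not watching.
            gone.append(job["drive"])
    return gone


def recover_block_jobs(saved_jobs, query_block_job, sync_volume_chain):
    """Sync the volume chain only for drives whose job has disappeared."""
    for drive in disks_needing_sync(saved_jobs, query_block_job):
        sync_volume_chain(drive)
```

Drives with no recorded job are never touched, and drives whose job is still running are left to the normal event handling. The sketch does not address the HSM and storage availability problem raised in the same paragraph.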
We don't have a race at this stage, because even if HSM is up, we do
not connect to the storage domains until engine asks us to do so, and
engine cannot talk to vdsm until the recovery process and HSM
initialization end.
So we can check with libvirt and have correct info about the VM
when vdsm starts, but we cannot fix volume metadata at this stage.
I think we should fix volume metadata when engine asks us to do so,
based on the state of the live merge.
If we want to do this update without engine control, we can use
the domain monitor state event to detect when the domain
becomes available, and then modify the volume metadata.
Currently we use this event to unpause VMs that were paused because
of an EIO error. See clientIF.py:126
Nir
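
A minimal sketch of the event-driven alternative mentioned above: metadata syncs that cannot run during recovery are queued per storage domain and executed once that domain is reported valid. The class name `PendingSyncs`, its methods, and the `(sd_uuid, is_valid)` callback signature are assumptions made for illustration; they are not taken from vdsm's code.

```python
import threading
from collections import defaultdict


class PendingSyncs:
    """Queue of deferred sync callables, keyed by storage domain UUID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = defaultdict(list)

    def defer(self, sd_uuid, sync):
        """Remember a sync to run once the given domain becomes valid."""
        with self._lock:
            self._pending[sd_uuid].append(sync)

    def on_domain_state_change(self, sd_uuid, is_valid):
        """Hypothetical handler wired to the domain monitor state event."""
        if not is_valid:
            return
        with self._lock:
            todo = self._pending.pop(sd_uuid, [])
        failed = []
        for sync in todo:
            try:
                sync()
            except Exception:
                # Keep the sync queued; it is retried on the next valid event.
                failed.append(sync)
        if failed:
            with self._lock:
                self._pending[sd_uuid] = failed + self._pending[sd_uuid]

    def pending_count(self, sd_uuid):
        with self._lock:
            return len(self._pending.get(sd_uuid, []))
```

The syncs run outside the lock so that a slow storage operation does not block other callers, and a failed sync is simply retried the next time the domain reports a valid state.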