[ovirt-devel] [vdsm] VM recovery now depends on HSM

Michal Skrivanek michal.skrivanek at redhat.com
Thu Jul 10 06:40:58 UTC 2014


On Jul 9, 2014, at 15:38 , Nir Soffer <nsoffer at redhat.com> wrote:

> ----- Original Message -----
>> From: "Adam Litke" <alitke at redhat.com>
>> To: "Michal Skrivanek" <michal.skrivanek at redhat.com>
>> Cc: devel at ovirt.org
>> Sent: Wednesday, July 9, 2014 4:19:09 PM
>> Subject: Re: [ovirt-devel] [vdsm] VM recovery now depends on HSM
>> 
>> On 09/07/14 13:11 +0200, Michal Skrivanek wrote:
>>> 
>>> On Jul 8, 2014, at 22:36 , Adam Litke <alitke at redhat.com> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> As part of the new live merge feature, when vdsm starts and has to
>>>> recover existing VMs, it calls VM._syncVolumeChain to ensure that
>>>> vdsm's view of the volume chain matches libvirt's.  This involves two
>>>> kinds of operations: 1) sync VM object, 2) sync underlying storage
>>>> metadata via HSM.
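>>>>
>>>> Conceptually, the per-disk sync is something like this (very
>>>> simplified sketch; the helper names here are stand-ins, not the
>>>> real vdsm internals):
>>>>
>>>>   def sync_volume_chain(vm, drive, hsm):
>>>>       # The chain libvirt actually uses, read from the backing
>>>>       # chain in the domain XML.
>>>>       actual = vm.query_backing_chain(drive)
>>>>       if actual == drive.volume_chain:
>>>>           return
>>>>       # 1) Sync the in-memory VM object.
>>>>       drive.volume_chain = actual
>>>>       # 2) Sync the on-disk volume metadata via HSM -- this is
>>>>       #    the step that needs HSM up and the storage domain
>>>>       #    accessible.
>>>>       hsm.imageSyncVolumeChain(drive.domain_id, drive.image_id,
>>>>                                drive.volume_id, actual)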
>>>> 
>>>> This means that HSM must be up (and the storage domain(s) that the VM
>>>> is using must be accessible).  When testing some rather eccentric error
>>>> flows, I am finding that this is not always the case.
>>>> 
>>>> Is there a way to have VM recovery wait on HSM to come up?  How should
>>>> we respond if a required storage domain cannot be accessed?  Is there
>>>> a mechanism in vdsm to schedule an operation to be retried at a later
>>>> time?  Perhaps I could just schedule the sync and it could be retried
>>>> until the required resources are available.
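>>>>
>>>> E.g. something along these lines -- purely illustrative, since as
>>>> far as I know vdsm has no such helper today:
>>>>
>>>>   import threading
>>>>
>>>>   def retry_until_ready(ready, operation, interval=10.0):
>>>>       # Run operation() once ready() returns True; until then,
>>>>       # re-check every `interval` seconds.
>>>>       if ready():
>>>>           operation()
>>>>       else:
>>>>           threading.Timer(interval, retry_until_ready,
>>>>                           args=(ready, operation, interval)).start()
>>>>
>>>>   # e.g. retry_until_ready(storage_is_up, vm.syncVolumeChain)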
>>> 
>>> I briefly discussed this with Federico some time ago: IMHO
>>> syncVolumeChain needs to be changed. It must not be part of the VM's
>>> create flow, as I expect it to be quite a bottleneck in a big-scale
>>> environment (it now in fact executes not only on recovery but on all
>>> 4 create flows!). I don't know how yet, but we need to find a
>>> different way. Now you've just added yet another reason.
>>> 
>>> So… I too ask for more insights :-)
>> 
>> Sure, so... We switched to running syncVolumeChain at all times to
>> cover a very rare scenario:
>> 
>> 1. VM is running on host A
>> 2. User initiates Live Merge on VM
>> 3. Host A experiences a catastrophic hardware failure before engine
>> can determine if the merge succeeded or failed
>> 4. VM is restarted on Host B
>> 
>> Since (in this case) the host cannot know if a live merge was in
>> progress on the previous host, it needs to always check.
>> 
>> 
>> Some ideas to mitigate:
>> 1. When engine recreates a VM on a new host and a Live Merge was in
>> progress, engine could call a verb to ask the host to synchronize the
>> volume chain.  This way, it only happens when engine knows it's needed
>> and engine can be sure that the required resources (storage
>> connections and domains) are present.
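>>
>> A strawman for such a verb, in the usual API.py style (the verb
>> name and wiring are hypothetical; errCode/doneCode are the usual
>> vdsm response helpers):
>>
>>   def vmSyncVolumeChain(self, vmId):
>>       vm = self._cif.vmContainer.get(vmId)
>>       if vm is None:
>>           return errCode['noVM']
>>       # Engine guarantees storage is connected before calling,
>>       # so the HSM part of the sync is safe here.
>>       vm.syncVolumeChain()
>>       return {'status': doneCode}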
> 
> This seems like the right approach.

+1
I like the "only when needed", since we can assume the scenario is unlikely to happen most of the time (but it is very real when it does).

> 
>> 
>> 2. The syncVolumeChain call runs in the recovery case to ensure that
>> we clean up after any missed block job events from libvirt while vdsm
>> was stopped/restarting.

Can we clean up later on, or does it need to happen during recovery? Can it be delayed and requested by engine a little bit later?

> 
> We need this since vdsm recovers running VMs when it starts, before
> engine is connected. Actually, engine cannot talk to vdsm until it
> has finished the recovery process.
> 
>> In this case, the block job info is saved in
>> the VM conf, so the recovery flow could be changed to query libvirt for
>> block job status only on those disks where we know about a previous
>> operation.  For any jobs found to be gone, we'd call syncVolumeChain.  In this
>> scenario, we still have to deal with the race with HSM initialization
>> and storage connectivity issues.  Perhaps engine should drive this
>> case as well?
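>>
>> Roughly like this (sketch only; the conf key and helper names are
>> illustrative):
>>
>>   def recheck_block_jobs(vm, dom):
>>       # Only look at disks that had a block job recorded in the
>>       # saved VM conf.
>>       for job in vm.conf.get('_blockJobs', {}).values():
>>           drive = vm.get_drive(job['drive'])
>>           # libvirt's python binding returns an empty dict when no
>>           # block job is active on the disk.
>>           if not dom.blockJobInfo(drive.name, 0):
>>               # The job ended while vdsm was down; the chain may
>>               # have changed, so we need to sync it.
>>               vm.syncVolumeChain(drive)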
> 
> We don't have a race at this stage, because even if HSM is up, we do
> not connect to the storage domains until engine asks us to, and
> engine cannot talk to vdsm until the recovery process and HSM
> initialization have finished.
> 
> So we can check with libvirt and have correct info about the VM
> when vdsm starts, but we cannot fix the volume metadata at this stage.
> 
> I think we should fix the volume metadata when engine asks us to,
> based on the state of the live merge.
> 
> If we want to do this update without engine control, we can use
> the domain monitor state event to detect when the domain monitor
> becomes available, and modify the volume metadata then.
> 
> Currently we use this event to unpause VMs that were paused because
> of an EIO error. See clientIF.py:126
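>
> Hanging the metadata fixup off the same callback would look roughly
> like this (sketch; the method names are illustrative):
>
>   def _onDomainStateChange(self, sdUUID, valid):
>       if not valid:
>           return
>       # The domain monitor reports the domain is usable again, so
>       # it is now safe to touch volume metadata for VMs that have
>       # a pending sync on this domain.
>       for vm in self.vmContainer.values():
>           vm.syncVolumeChainIfNeeded(sdUUID)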
> 
> Nir 



