Few questions on Backup Restore API implementation

Wed Jun 5 11:48:13 UTC 2013

On 06/05/2013 02:14 PM, Deepak C Shetty wrote:
> On 06/05/2013 03:58 PM, Michael Pasternak wrote:
>> Hi Deepak,
>>
>> On 05/30/2013 06:07 PM, Deepak C Shetty wrote:
>>> Hi All,
>>>     Per the previous mail from Ayal, I am referring to the flow below...
>>>
>>> The backup flow should be (all api calls are engine REST API):
>> I disagree with this. By following this pattern you move internal
>> logic (and, even worse, implementation-specific logic) to the api
>> users instead of abstracting it and creating a solid api for
>> backups,
>>
>> +++++++++ several api calls for single backup ++++++++++++++
>>
>>
>> cons:
>> ====
>>
>> 1. users have to be aware of the internal (implementation-specific) flow of each provider in order to create a backup
>>    1.1 call add(a)->do(a)->del(a)->do(b)->etc.
>> 2. no flexibility for backup provider to change the backup flow (as it would be considered an api break)
>> 3. no atomic resolution for the /backup failures
>>    3.1. a failure at do(b) of add(a)->do(a)->del(a)->do(b)->etc. means
>>         that the user has to del(a) manually, and if the failure is deeper
>>         in the flow every api user has to implement their own rollback of
>>         the steps that took place before the failure.
>> 4. backward compatibility becomes an issue when exposing internals at such a low level.
>> 5. exposing new features becomes a much more complex task, because they may require
>>    performing extra step/s in the middle of the flow.
>>
>> pros:
>> ====
>>
>> can't think of any; it doesn't give you flexibility, it only creates complexity
>> that is unnatural for public APIs.
> 
> I was able to dig up the text below from one of the older mails. I had exactly this question on why we need separate APIs vs. one, and this is what Ayal had to say:
> 
> Deepak asked:
> 
> /Wouldn't it be better if there was just 1 REST API for starting backup
> which would take all the necessary inputs (VM uuid, hostid, disk(s)) and
> internally cause the above individual APIs to get called ?
> Why do we have to expose each of the steps in backup process as
> individual REST APIs ?
> /
> 
> Ayal said:
> 
> /1. Because a single API assumes that I always want to back up everything, which isn't necessarily true (normally the backup policies for system disks and data disks are different)

modelling backup under /api/vms/xxx/disks/yyy/backups solves this.

> 2. Going forward we will have disk level snapshot in which case the backup API would have to change.

same

> 3. You could always return the vm config per disk when calling "prepare backup disk" which would be a bit redundant now but once we have disk level snapshots it would be more relevant and this way remove 1 API call now.
> Later on when we do have disk level snaps it could be an option to the command to take the snap or something I guess.

it only proves my point: we can/should hide this under an api abstraction
rather than forcing users to use different flows now and then.
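
To illustrate what hiding this under an abstraction could look like, here is
a minimal sketch in Python. All names in it (BackupProvider, SnapshotProvider,
DiskSnapshotProvider, create_backup) are hypothetical and are not existing
engine or VDSM code; the step strings only stand in for real work.

```python
from abc import ABC, abstractmethod
from typing import List


class BackupProvider(ABC):
    """Hypothetical driver interface; not an existing engine class."""

    @abstractmethod
    def backup(self, vm_id: str, disk_id: str, host_id: str,
               backup_vm_id: str) -> List[str]:
        """Run the provider's whole flow and return the steps it performed."""


class SnapshotProvider(BackupProvider):
    """Flow based on a VM-level snapshot, as described in this thread."""

    def backup(self, vm_id, disk_id, host_id, backup_vm_id):
        return [
            f"snapshot vm {vm_id}",
            f"prepare disk {disk_id} on host {host_id}",
            f"attach disk {disk_id} to vm {backup_vm_id}",
            "copy data",
        ]


class DiskSnapshotProvider(BackupProvider):
    """Imagined later flow that uses disk-level snapshots instead."""

    def backup(self, vm_id, disk_id, host_id, backup_vm_id):
        return [
            f"snapshot disk {disk_id}",
            f"attach disk {disk_id} to vm {backup_vm_id}",
            "copy data",
        ]


def create_backup(provider: BackupProvider, vm_id: str, disk_id: str,
                  host_id: str, backup_vm_id: str) -> List[str]:
    """The single entry point an api user would see."""
    return provider.backup(vm_id, disk_id, host_id, backup_vm_id)
```

The caller's code is identical for both providers; only the driver changes
when the internal flow changes.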

> /
> So I kind of agreed with what Ayal said, from the perspective that it provides flexibility to the end user using the API, as to how he/she wants to script/use it.
> What do you think ?  I just wanted to put the above here, so that we are sure we are considering all aspects before making the decision on 1 vs. many APIs.

i've mentioned this before: i don't think this is about flexibility. if you want to allow
users to do the same thing in several different ways, expose several apis for it (like method
overloads), but it doesn't mean that we should be exposing implementation internals to the end user.
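
One consequence of keeping the steps internal is that rollback on failure can
live in one place instead of in every client. A minimal sketch of that idea
in Python, assuming each internal step is given as a (name, do, undo) triple;
run_atomically is a hypothetical helper, not engine code:

```python
def run_atomically(steps):
    """Run (name, do, undo) steps in order.

    If a step raises, undo the steps that already completed, in reverse
    order, and report failure. Returns (ok, log).
    """
    done = []  # (name, undo) pairs for steps that completed
    log = []
    try:
        for name, do, undo in steps:
            do()
            log.append(f"do {name}")
            done.append((name, undo))
    except Exception as exc:
        # Roll back only what actually completed, newest first.
        for name, undo in reversed(done):
            undo()
            log.append(f"undo {name}")
        log.append(f"failed: {exc}")
        return False, log
    return True, log
```

With the multi-call flow, every api user would have to write this loop
themselves and keep it in sync with each provider's steps.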

> 
> 1 API provides simplicity, but getting flexibility using 1 API would mean passing additional params. For example, if I want to back up a particular vmdisk only, then the xml below would change, right ?

no, you misunderstood, see url [1] from my previous email: you are already doing this for a specific disk - yyy

[1] /api/vms/xxx/disks/yyy/backups
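
As a rough sketch of how a client could address one specific disk this way,
the Python below builds the URL and request body for the proposed call. The
backups collection and the element names (host, backup_vm, delete_snapshot)
are the proposal from this thread, not an existing engine endpoint, and
build_backup_request is a hypothetical helper; nothing is sent over the wire.

```python
import xml.etree.ElementTree as ET


def build_backup_request(vm_id, disk_id, host_id, backup_vm_id,
                         delete_snapshot=False):
    """Return (url, xml_body) for the proposed per-disk backup call."""
    # The disk is selected by the URL itself, so no extra parameter is needed.
    url = f"/api/vms/{vm_id}/disks/{disk_id}/backups"

    root = ET.Element("backup")
    ET.SubElement(root, "host", {"id": host_id})
    ET.SubElement(root, "backup_vm", {"id": backup_vm_id})
    flag = ET.SubElement(root, "delete_snapshot")
    flag.text = "true" if delete_snapshot else "false"

    body = ET.tostring(root, encoding="unicode")
    return url, body
```

Backing up a different disk of the same vm only changes the yyy part of the
URL; the body stays the same.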

> But it looks like either way, it's possible to get things done...
> I personally don't have much experience in the area of REST APIs to take the call on whether we need 1 or multiple APIs... so I am still inconclusive here (sorry about that!)
> All i can say is that with 1 API approach too, we can get flexibility.
> 
> 
>>
>>
>> +++++++++ single api call for single backup ++++++++++++++
>>
>> cons:
>> ====
>>
>> nothing comes to my mind, are there any?
>>
>> pros:
>> ====
>>
>> 1. user is not aware of any backup internals and can consume different /backup
>>    providers using very same api and supplying parameters relevant for the provider
>>    he'd like to use.
>> 2. backup providers can change/extend internal flow as much as they want/need, only
>>    by modifying internal engine driver, user using abstraction over the backup api
>>    won't feel the difference.
>> 3. backup providers can execute backup as single atomic operation and take care
>>    for rollback cleanups in the transaction.
>> 4. backup providers can easily maintain backward compatibility over abstraction
>> 5. adding new features that require flow extension can be easily hidden under
>>    the abstraction layer.
>>
>> also, it is common practice to expose an abstraction layer to the public, while
>> vendors only have to implement the driver in order to be supported.
>>
>>
>>> 1. create vm snapshot (the api call exists today)
>>> 2. get VM Config (new API)
>>> 3. prepare backup disk (new api, should probably accept: hostid, disk;
>>> return: paths to device on host as well as map of changed blocks) -
>>> this should be called for every disk that needs to be backed up. Note
>>> that VM snapshot takes a snap of *all* disks of the VM
>>> 4. attach disk to backup vm (the api call exists today.  This assumes
>>> virt app) - also has to be called per disk to back up
>>> 5. At this point backup app would do the backup
>>>
>> this can be easily done under the single api call:
>> =================================================
>>
>> POST /api/vms/xxx/disks/yyy/backups
>>
>> <backup>
>>   <host id="aaa"/> #3
>>   <backup_vm id="bbb"/> #1, #2
>>   <delete_snapshot>true|false</delete_snapshot> #7
>> </backup>
>>
>> 1. create vm snapshot => you are already in the context of the vm, trigger a snapshot on it
>> 2. get VM Config => you are already in the context of the vm, collect the relevant metadata
>> 3. prepare backup disk => you are already in the context of the disk, do whatever is needed
>> 4. attach disk to backup vm => you have the backup_vm id in the request body
>> 5. do the actual backup
>>
>> does this make sense?
>>
>>> 5) detach disk (symmetric to 4)
>>> 6. teardown backup disk (symmetric to 3)
>>> 7. delete snap - This can only be called if VM is down today and is
>>> not mandatory in any event.  Once we have live merge, it should be
>>> policy driven (user should be able to choose whether to keep or delete)
>>>
>>> Questions...
>>>
>>> 1) My current implementation of prepareBackupDisk in VDSM (please see http://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:backup-restore,n,z)
>>> returns drive and cbtinfo as dicts back to engine. The understanding here is that the drive returned will be a dict which has all the necessary info
>>> to represent an NBD drive to libvirt, hence this drive will be passed as-is in the "attach disk to VM" REST API. Is this correct ?
>>>
>>> Note that I cannot return a disk UUID (like VDSM does for the create volume case), because in the prepareBackupDisk case the image is exported using qemu-nbd
>>> as a block device over the network, and hence it's not a regular vdsm-type disk, hence no UUID. Agree ?
>>>
>>> 2) In the "attach disk to VM" REST API the user will pass, as-is, the drive dict which he/she got from the prepareBackupDisk api,
>>> and VDSM will need to add support for accepting an NBD disk as a valid disk type in the VM.hotplugDisk() code of VDSM.
> 
>>>
>>> The ability to add a network disk is present in libvirtvm.py as part of my GlusterSD work, since gluster is represented as an NBD to QEMU,
>>> but it doesn't work at the API.py level; it only works from prepareImage onwards, hence the need to modify API.py to accept the NBD disk type and connect
>>> to libvirtvm.py appropriately
>>>
>>> Is this acceptable? Otherwise there is no good way to pass the NBD drive's info back to VDSM as part of the existing "attach disk to VM" API.
>>> Also, does the existing "attach disk to VM" API work if a pre-loaded drive dict/hashmap is provided by the user? This will have drive->type = network
>>> and not file/block as is currently supported
> 
> Mike,
>     Could you provide your views on this pls ? How do we represent the NBD disk and pass it back to VDSM as part of 'hotplugDisk' API
> Currently hotplugDisk doesn't support drive of type = network!
> 
>>>
>>> 3) After the "attach disk to backup VM" REST API step, the understanding is that some backup-vendor-specific API will be called to tell the
>>> backup application:
>>>     -- Block device (attached to the VM) to be used as src for backup
>>>     -- CBT info of this block device (which was received as part of the prepareBackupDisk API)
>>> Is this correct ?
>>>
>>> thanx,
>>> deepak
>>>
>>
> 

-- 

Michael Pasternak
RedHat, ENG-Virtualization R&D