hi,
On Tue, Jun 30, 2020 at 04:49:01PM +0300, Nir Soffer wrote:
> On Tue, Jun 30, 2020 at 10:32 AM Michael Ablassmeier
> <abi(a)grinser.de> wrote:
> >
> > https://tranfer_node:54322/images/d471c659-889f-4e7f-b55a-a475649c48a6/ex...
> >
> > As I failed to find them, are there any existing functions/API calls
> > that could be used to download only the used extents to a file/FIFO
> > pipe?
> To use _internal.io.copy to copy the image to tape, we need to solve
> several issues:
> 1. how do you write the extents to tape so that you can extract them later?
> 2. provide a backend that knows how to stream data to tape in the right format
> 3. fix client.download() to consider the number of writers allowed by
>    the backend, since streaming to tape using multiple writers will not
>    be possible.
So, speaking as someone who works for a backup vendor: issues 1 and 2 are
already solved by our software, the backend is there, we just need a
way to extract the data from the API without storing it in a file
first. Something like:

    backup_vm.py full <vm_uuid> pipe

is already sufficient, as our backup client software would simply read
the data from the pipe (see the sketch below), sending it to our backend,
which does all the work regarding tape communication and format.
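Conceptually, the consuming side on our end is not more than this
(untested sketch; the pipe path and send_to_backend() are placeholders
for our software):

    import os

    PIPE = "/run/backup/vm.fifo"              # hypothetical path
    os.mkfifo(PIPE)

    # backup_vm.py (or a similar tool) would open the FIFO for writing;
    # we read the image data in chunks and hand it to our backend.
    with open(PIPE, "rb") as pipe:
        while True:
            chunk = pipe.read(8 * 1024**2)    # 8 MiB chunks
            if not chunk:
                break
            send_to_backend(chunk)            # placeholder: tape/dedup backend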
The old implementation used the snapshot/attach feature, where our
backup client reads directly from the attached storage device,
sending the data to the backend, which takes care of multiplexing to
tape, possible deduplication, etc.
Tape is not the only use case here; most of the time our customers want
to write data to storage devices which do not expose a regular file
system (such as dedup services, StoreOnce, virtual tape solutions, etc.).
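Just to illustrate what our backend roughly does with the stream, here
is a minimal sketch of one possible format, assuming a tar stream as in
your restore steps below; the member names and backup_info layout are
made up here:

    import io
    import json
    import tarfile

    def add_member(tar, name, data):
        # add an in-memory blob as a tar member
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    def write_backup(fileobj, backup_info, extents):
        # mode "w|" writes a sequential stream, suitable for tape or a pipe
        with tarfile.open(fileobj=fileobj, mode="w|") as tar:
            add_member(tar, "backup_info.json",
                       json.dumps(backup_info).encode("utf-8"))
            for start, length, data in extents:
                # encode offset and length in the member name, so the
                # restore side knows where each extent belongs
                add_member(tar, "extent-%016x-%08x" % (start, length), data)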
> To restore this backup, you need to:
> 1. find the tar on the tape (I have no idea how you would do this)
> 2. extract backup info from the tar
> 3. extract extents from the tar
1-3 are not an issue here and are handled by our backend.
> 4. start an upload transfer
> 5. for each data extent:
>    read data from the tar member, and send to imageio using the right
>    offset and size
That is some good information. So it is possible to create an empty disk
with the same size using the API and then directly send the extents with
their proper offset, as sketched below. How does it look with an
incremental backup on top of a just-restored full backup? Does the
imageio backend automatically rebase and commit the data from the
incremental backup during upload?
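For steps 4 and 5, writing the extents back would look roughly like this
(untested sketch; the URL, CA file and read_extents_from_tar() are
placeholders, and the Content-Range usage reflects my reading of the
imageio HTTP API):

    import http.client
    import ssl
    from urllib.parse import urlparse

    # transfer URL as returned for a started upload transfer (placeholder)
    url = urlparse("https://transfer_node:54322/images/<ticket-uuid>")
    ctx = ssl.create_default_context(cafile="ca.pem")   # engine CA certificate
    con = http.client.HTTPSConnection(url.netloc, context=ctx)

    def write_extent(offset, data):
        # write data at the given offset using a Content-Range header
        con.request("PUT", url.path, body=data, headers={
            "Content-Range": "bytes %d-%d/*" % (offset, offset + len(data) - 1),
        })
        res = con.getresponse()
        res.read()
        assert res.status == 200, res.status

    # read_extents_from_tar() stands in for steps 1-3 in our backend
    for offset, length, data in read_extents_from_tar():
        write_extent(offset, data)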
As I understand it, requesting the extents directly and writing them to
a file leaves you with an image in raw format, which then needs to be
properly re-aligned with zeros and converted to qcow2 to be able to
commit any of the incremental backups I have stored somewhere. If a
convert is possible during upload, that means we don't have to rebuild
the full/inc chain using a temporary file which we then upload?
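If not, the offline reconstruction I have in mind would be something
like this (sketch with hypothetical file names; the full backup restored
to full.raw, the incremental as a qcow2 overlay):

    import subprocess

    def run(*cmd):
        subprocess.run(cmd, check=True)

    # convert the restored raw full backup to qcow2
    run("qemu-img", "convert", "-f", "raw", "-O", "qcow2",
        "full.raw", "full.qcow2")

    # rebase the incremental onto the chain and commit it down
    run("qemu-img", "rebase", "-u", "-F", "qcow2", "-b", "full.qcow2",
        "inc-1.qcow2")
    run("qemu-img", "commit", "inc-1.qcow2")

    # full.qcow2 now contains full + inc-1 and could be uploaded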
> So the missing part is to create a connection to imageio and read
> the data.
> The easiest way is to use imageio._internal.backends.http, but note that this
> is internal now, so you should not use it outside of imageio. It is fine for
> writing a proof of concept, and if you can show a good use case we can work
> on a public API.
Yes, that is what I noticed. My current solution would be to use the
internal functions to query the extent information and then continue
extracting the extents, to be able to pipe the data into our backend.
> You can write this using http.client.HTTPSConnection without using
> the http backend, but it would be a lot of code.
Thanks for your example, I will give it a try during the POC implementation.
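For the record, my current understanding of how this would look with
plain http.client (untested sketch; URL and CA file are placeholders,
error handling omitted):

    import http.client
    import json
    import ssl
    import sys
    from urllib.parse import urlparse

    url = urlparse("https://transfer_node:54322/images/<ticket-uuid>")
    ctx = ssl.create_default_context(cafile="ca.pem")   # engine CA certificate
    con = http.client.HTTPSConnection(url.netloc, context=ctx)

    # ask imageio which ranges contain data and which are zero
    con.request("GET", url.path + "/extents")
    extents = json.loads(con.getresponse().read())

    for extent in extents:
        if extent["zero"]:
            continue                      # nothing to read for zero extents
        end = extent["start"] + extent["length"] - 1
        con.request("GET", url.path,
                    headers={"Range": "bytes=%d-%d" % (extent["start"], end)})
        res = con.getresponse()
        sys.stdout.buffer.write(res.read())   # stream to the consuming pipe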
> We probably need to expose the backends or a simplified interface
> in the client public API to make it easier to write such applications.
> Maybe something like:
>
>     client.copy(src, dst)
>
> where src and dst are objects implementing the imageio backend interface.
> But before we do this we need to see some examples of real programs
> using imageio, to understand the requirements better.
The main feature for us would be the ability to read the data and
pipe it somewhere, which works by using the _internal API
functions, but having a stable interface for it would be really
good for any kind of backup vendor wanting to implement a client for
the new API in their software.
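From our side, something as simple as this would cover the use case
(purely hypothetical names; none of this exists in the current client
API):

    from ovirt_imageio import client

    # open the transfer as a read backend and our pipe as a write backend
    src = client.open(transfer_url, "r")                  # hypothetical
    dst = client.open("pipe:///run/backup/vm.fifo", "w")  # hypothetical

    client.copy(src, dst)    # proposed: copies only the used extents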
If anyone, also from Red Hat, is interested in hearing more thoughts
about this, don't hesitate to contact me directly to set up a call.
bye,
- michael