On Tue, Jun 30, 2020 at 10:22 PM Michael Ablassmeier <abi(a)grinser.de> wrote:
hi,
On Tue, Jun 30, 2020 at 04:49:01PM +0300, Nir Soffer wrote:
> On Tue, Jun 30, 2020 at 10:32 AM Michael Ablassmeier <abi(a)grinser.de> wrote:
> >
https://tranfer_node:54322/images/d471c659-889f-4e7f-b55a-a475649c48a6/ex...
> >
> > As I failed to find them, are there any existing functions/API calls
> > that could be used to download only the used extents to a file/fifo
> > pipe?
>
> To use _internal.io.copy to copy the image to tape, we need to solve
> several issues:
>
> 1. how do you write the extents to tape so that you can extract them later?
> 2. provide a backend that knows how to stream data to tape in the right format
> 3. fix client.download() to consider the number of writers allowed by
> the backend,
> since streaming to tape using multiple writers will not be possible.
so, speaking as someone who works for a backup vendor, issues 1 and 2 are
already solved by our software, the backend is there, we just need a
way to extract the data from the API without storing it in a file
first. Something like:
backup_vm.py full <vm_uuid> pipe
is already sufficient, as our backup client software would simply read
the data from the pipe, sending it to our backend, which does all the
stuff regarding tape communication and format.
Great, but piping the data is not so simple, see below.
The old implementation used the snapshot/attach feature, where our
backup client reads directly from the attached storage device,
sending the data to the backend, which takes care of multiplexing to tape,
possible deduplication, etc.
In this case you read a complete disk, including the unallocated areas which
read as zeroes. This is not efficient, creating lots of I/O and network
bandwidth on the way to the backup software, where you do deduplication etc.
Tape is not the only use case here, most of the time our customers want
to write data to storage devices which do not expose a regular file
system (such as dedup services, StoreOnce, Virtual Tape solutions etc.).
> To restore this backup, you need to:
> 1. find the tar in the tape (I have no idea how you would do this)
> 2. extract backup info from the tar
> 3. extract extents from the tar
1-3 are not an issue here and are handled by our backend
> 4. start an upload transfer
> 5. for each data extent:
> read data from the tar member, and send to imageio using the right
> offset and size
That is some good information, so it is possible to create an empty disk
with the same size using the API and then directly send the extents at
their proper offset. How does it look with an incremental backup on top
of a just-restored full backup? Does the imageio backend automatically
rebase and commit the data from the incremental backup during upload?
No, during upload you get a similar interface - you can write to any offset
or zero a byte range.
The imageio API is mostly like a remote file descriptor. Instead of an integer
(fd=42) you get a random URL
(https://host:port/images/efb761c6-2b06-4b46-bf50-2c40677ea419).
Using the URL you can read, write or zero a byte range.
During restore, you need to write back the data that should be on the disk
at a specific point in time.
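For example, with plain http.client this looks roughly like the following
(untested sketch; TICKET-UUID stands for the ticket id from the transfer URL,
and the exact headers and fields should be checked against the imageio docs):

# Untested sketch: read, write and zero a byte range on the imageio URL.
import json
import ssl
from http import client

ctx = ssl.create_default_context(cafile="ca.pem")
con = client.HTTPSConnection("host", 54322, context=ctx)

# Read 64 KiB from offset 0.
con.request("GET", "/images/TICKET-UUID",
            headers={"Range": "bytes=0-65535"})
data = con.getresponse().read()

# Write it back to offset 0.
con.request("PUT", "/images/TICKET-UUID", body=data,
            headers={"Content-Range": "bytes 0-65535/*"})
con.getresponse().read()

# Zero 1 GiB starting at offset 65536.
body = json.dumps({"op": "zero", "offset": 65536, "size": 1073741824})
con.request("PATCH", "/images/TICKET-UUID", body=body.encode("utf-8"),
            headers={"Content-Type": "application/json"})
con.getresponse().read()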
Ideally your backup software can provide a similar interface to pull data for
a specific point in time, so you can push it to storage. If your backup software
can only return data from a specific backup, you can restore the disk state
using this flow:
1. Copy data from the last full backup before the restore point to storage
2. For each incremental backup since that full backup:
copy data from the backup to storage
3. Zero all the areas that were not written in the previous steps.
This is not the most efficient way since you may copy the same area
several times, so this should ideally be handled by the backup software.
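A rough sketch of that flow - open_backup(), data_extents(), read_data(),
gaps() and disk_size are made-up placeholders for whatever the backup software
provides, only seek/write/zero stand for the imageio side:

# Sketch of the 3 steps above. Everything except seek/write/zero is a
# placeholder for the backup software's own API.
written = []                                      # ranges restored in steps 1-2

def restore_backup(backend, backup):
    for extent in data_extents(backup):           # data extents in this backup
        backend.seek(extent.start)
        backend.write(read_data(backup, extent))
        written.append((extent.start, extent.length))

restore_backup(backend, open_backup("full"))      # 1. last full backup
for name in ("incr-1", "incr-2"):                 # 2. incrementals, oldest first
    restore_backup(backend, open_backup(name))
for start, length in gaps(written, disk_size):    # 3. zero what was never written
    backend.seek(start)
    backend.zero(length)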
As I understand it, requesting the extents directly and writing them to
a file leaves you with an image in raw format, which then needs to be
properly re-aligned with zeros and converted to qcow2, to be able to
commit any of the incremental backups I have stored somewhere.
If you write the extents to a file in raw format, you will have holes
in the image.
If you want to pipe the data you cannot have holes, unless you want to generate
zeroes for the holes and pipe the zeroes, which is not efficient.
Example:
[
    {"start": 0, "length": 65536, "zero": False},
    {"start": 65536, "length": 1073741824, "zero": True},
]
If you pipe the zeroes you are going to push 1 GiB of zeros to your pipe.
This can not work for incremental backup since in this case you get only the
extents that were modified since the last backup, and you cannot fill the space
between these extents with zeros.
[
    {"start": 0, "length": 65536, "dirty": True},
    {"start": 65536, "length": 1073741824, "dirty": False},
]
You must preserve the hole, so when you restore you can skip this extent.
If you want to pipe the data, you must encode the data in some way so you
can push the data and the holes to your pipe.
One way that we considered in the past is to support a chunked-like format,
a stream of data extents and hole extents.
For example:
data 0000000040000000\r\n
<1 GiB of data>\r\n
hole 0000000000100000\r\n
\r\n
This is similar to the incremental backup provided by ceph:
https://docs.ceph.com/docs/master/dev/rbd-diff/
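Producing such a stream from the extents list would be something like this
(sketch only - the framing is just the idea above, not an implemented imageio
format, and read_range() is a placeholder for however you read from imageio):

# Sketch: turn an extents list into the data/hole stream sketched above.
import sys

def stream_extents(extents, out=sys.stdout.buffer):
    for extent in extents:
        if extent.zero:
            out.write(b"hole %016x\r\n\r\n" % extent.length)
        else:
            out.write(b"data %016x\r\n" % extent.length)
            # read_range() is a placeholder for reading from imageio;
            # big extents would need to be read and written in chunks.
            out.write(read_range(extent.start, extent.length))
            out.write(b"\r\n")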
We did not implement it since providing a list of extents and a way to
read the extents seems a more generic solution that can make it easier
to integrate with many backup vendors that may use different solutions
to store and manage the data.
So you can read data from imageio and push it to your pipe in a similar format.
If you do this with the http backend, a better way to pipe the data would be:
backend.write_to(writer, length, buf)
which accepts an object implementing write(buf), and pushes length bytes from
the server to this object. Your writer can be sys.stdout if you want to pipe the
backup to some other process.
In this case your backup loop may be:
for extent in extents:
    write_extent_header(writer, extent)
    if not extent.zero:
        backend.write_to(writer, extent.length, buf)
And your restore loop would be something like:
for extent in extents:
    backend.seek(extent.start)
    if extent.zero:
        backend.zero(extent.length)
    else:
        backend.read_from(reader, extent.length, buf)
read_from() is like write_to(), but works in the other direction.
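write_extent_header() above is not an existing helper, just a stand-in for
however you mark data and holes in your stream; one possible encoding could be:

# Hypothetical framing for the loops above - not an imageio format.
import struct

HEADER = struct.Struct("<QQ?")                    # start, length, zero flag

def write_extent_header(writer, extent):
    writer.write(HEADER.pack(extent.start, extent.length, extent.zero))

def read_extent_header(reader):
    start, length, zero = HEADER.unpack(reader.read(HEADER.size))
    return start, length, zero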
As during upload a conversion is possible, that means we don't have to rebuild
the full/inc chain using a temporary file which we then upload?
If your backup backend can stream the data for a specific point in time,
considering all the backups since the last full backup, you don't need any
temporary files.
The convert step in upload is done on the server side. The upload pipeline is:
backup storage -> restore program -> imageio server -> qemu-nbd -> volume
The imageio server accepts write and zero requests and converts them to
NBD_CMD_WRITE and NBD_CMD_WRITE_ZEROES requests to qemu-nbd, and qemu-nbd
writes the data and the zeros to the image using the qcow2 or raw drivers.
The backup pipeline is similar:
volume -> qemu -> imageio server -> backup program -> backup storage
The imageio server accepts an extents request and converts it to
NBD_CMD_BLOCK_STATUS requests to qemu. Then it accepts read requests,
converts them to NBD_CMD_READ requests to qemu, and returns the data
qemu returns.
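On the client side of that pipeline this is plain HTTP, roughly as follows
(untested sketch; TICKET-UUID stands for the transfer's imageio ticket, and the
exact paths and fields should be checked against the imageio docs):

# Untested sketch: query the extents, then read only the data extents.
import json
import ssl
from http import client

ctx = ssl.create_default_context(cafile="ca.pem")
con = client.HTTPSConnection("host", 54322, context=ctx)

con.request("GET", "/images/TICKET-UUID/extents")
extents = json.loads(con.getresponse().read())

for extent in extents:
    if extent["zero"]:
        continue                                  # hole, nothing to read
    end = extent["start"] + extent["length"] - 1
    con.request("GET", "/images/TICKET-UUID",
                headers={"Range": "bytes=%d-%d" % (extent["start"], end)})
    data = con.getresponse().read()
    # hand data (and extent["start"]) to the backup backend / pipe here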
> So the missing part is to create a connection to imageio and
> reading the data.
>
> The easiest way is to use imageio._internal.backends.http, but note that this
> is internal now, so you should not use it outside of imageio. It is fine for
> writing proof of concept, and if you can show a good use case we can work
> on public API.
Yes, that is what I noticed. My current solution would be to use the
internal functions to query the extent information and then continue
extracting them, to be able to pipe the data into our backend.
> You can write this using http.client.HTTPSConnection without using
> the http backend, but it would be a lot of code.
Thanks for your example, I will give it a try during the POC implementation.
> We probably need to expose the backends or a simplified interface
> in the client public API to make it easier to write such applications.
>
> Maybe something like:
>
> client.copy(src, dst)
>
> Where src and dst are objects implementing imageio backend interface.
>
> But before we do this we need to see some examples of real programs
> using imageio, to understand the requirements better.
The main feature for us would be to be able to read the data and
pipe it somewhere, which works by using the _internal API
functions, but having a stable interface for it would be really
good for any backup vendor wanting to implement a client for
the new API in their software.
The main challenge is to find a generic format supporting streaming
that most vendors can use. If we have such a format we can support it
in the client and in the server.
For example we can provide:
GET /images/xxx-yyy?format=sparse&context=zero
This can return a stream of data/zero extents that can be piped using
standard tools like curl.
And we can support restore using:
PUT /images/xxx-yyy?format=sparse
So you can push the same stream back - using one request.
The disadvantage is that your system must understand this sparse format: parse it
during backup, and maybe construct it during restore.
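If it existed, backup and restore could then be simple pipes, something like
(hypothetical - format=sparse is only the proposal above, and backup-tool is a
placeholder for the vendor's client):

curl -k "https://host:54322/images/xxx-yyy?format=sparse&context=zero" | backup-tool store
backup-tool retrieve | curl -k -X PUT --data-binary @- "https://host:54322/images/xxx-yyy?format=sparse"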
If this looks like a useful way, please file an RFE to implement it.
If anyone is interested in hearing more thoughts about that, also from
Red Hat, don't hesitate to contact me directly to set up a call.
Good idea.
Cheers,
Nir