Michael,
Continuing the discussion, I posted an RFC patch adding a public
ImageioClient class:
This is an early version for discussion; the API may change based on the
feedback we get from users.
I posted an example showing how the client can be used:
Because the streaming use case seems to be what you want, I used
the stream format mentioned in the previous mail for this example.
You can review the patches here or in
(I
think you need to create a user).
If you want to test this code, you can use git:
$ git clone
refs/changes/69/110069/1 && git checkout FETCH_HEAD
Then build and install imageio:
$ make rpm
$ dnf upgrade daemon/dist/*.rpm
Or you can install this build:
by adding this repo file:
$ cat /etc/yum.repos.d/imageio-testing.repo
[ovirt-imageio-testing]
name=ovirt-imageio testing repo
enabled=1
gpgcheck=0
If you want to test the latest commits before they are released, you can
enable the ovirt-imageio-preview repo:
$ dnf copr enable nsoffer/ovirt-imageio-preview
Looking forward to your feedback.
Nir
On Wed, Jul 1, 2020 at 8:43 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
On Tue, Jun 30, 2020 at 10:22 PM Michael Ablassmeier <abi(a)grinser.de> wrote:
>
> hi,
>
> On Tue, Jun 30, 2020 at 04:49:01PM +0300, Nir Soffer wrote:
> > On Tue, Jun 30, 2020 at 10:32 AM Michael Ablassmeier <abi(a)grinser.de>
wrote:
> > >
https://tranfer_node:54322/images/d471c659-889f-4e7f-b55a-a475649c48a6/ex...
> > >
> > > As I failed to find them, are there any existing functions/API calls
> > > that could be used to download only the used extents to a file/fifo
> > > pipe?
> >
> > To use _internal.io.copy to copy the image to tape, we need to solve
> > several issues:
> >
> > 1. how do you write the extents to tape so that you can extract them later?
> > 2. provide a backend that knows how to stream data to tape in the right format
> > 3. fix client.download() to consider the number of writers allowed by
> > the backend,
> > since streaming to tape using multiple writers will not be possible.
>
> so, speaking as someone who works for a backup vendor, issues 1 and 2 are
> already solved by our software, the backend is there, we just need a
> way to extract the data from the API without storing it in a file
> first. Something like:
>
> backup_vm.py full <vm_uuid> pipe
>
> is already sufficient, as our backup client software would simply read
> the data from the pipe, sending it to our backend which does all the
> stuff regarding tape communication and format.
Great, but piping the data is not so simple, see below.
> The old implementation used the snapshot/attach feature, where our
> backup client is reading directly from the attached storage device,
> sending the data to the backend, which cares about multiplexing to tape,
> possible deduplication, etc.
In this case you read a complete disk, including the unallocated areas which
read as zeroes. This is not efficient, creating lots of I/O and consuming
network bandwidth on the way to the backup software, where you do
deduplication etc.
> Tape is not the only use case here, most of the times our customers want
> to write data to storage devices which do not expose a regular file
> system (such as dedup services, StoreOnce, Virtual Tape solutions etc).
>
> > To restore this backup, you need to:
> > 1. find the tar in the tape (I have no idea how you would do this)
> > 2. extract backup info from the tar
> > 3. extract extents from the tar
>
> 1-3 are not an issue here and handled by our backend
>
> > 4. start an upload transfer
> > 5. for each data extent:
> > read data from the tar member, and send to imageio using the right
> > offset and size
>
> that is some good information, so it is possible to create an empty disk
> with the same size using the API and then directly send the extents with
> their proper offset. How does it look with an incremental backup on top
> of a just-restored full backup? Does the imageio backend automatically
> rebase and commit the data from the incremental backup during upload?
No, during upload you get a similar interface - you can write to any offset
or zero a byte range.
The imageio API is mostly like a remote file descriptor. Instead of an integer
(fd=42) you get a random URL
(https://host:port/images/efb761c6-2b06-4b46-bf50-2c40677ea419).
Using the URL you can read, write, or zero a byte range.
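For example, here is a rough, untested sketch of talking to such a URL
directly with http.client (the exact paths, headers and JSON bodies should be
checked against the imageio HTTP API docs, they may differ):

    import http.client
    import json
    import ssl

    # Sketch only - host, port and image ticket id are placeholders.
    con = http.client.HTTPSConnection(
        "host", 54322, context=ssl._create_unverified_context())

    # Write 64 KiB at offset 0.
    con.request(
        "PUT",
        "/images/efb761c6-2b06-4b46-bf50-2c40677ea419",
        body=b"x" * 65536,
        headers={"Content-Range": "bytes 0-65535/*"})
    con.getresponse().read()

    # Zero 1 GiB at offset 65536 without sending any data.
    con.request(
        "PATCH",
        "/images/efb761c6-2b06-4b46-bf50-2c40677ea419",
        body=json.dumps({"op": "zero", "offset": 65536, "size": 1073741824}),
        headers={"Content-Type": "application/json"})
    con.getresponse().read()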
During restore, you need to write back the data that should be on the disk
at a specific point in time.
Ideally your backup software can provide a similar interface to pull data for
a specific point in time, so you can push it to storage. If your backup software
can only return data from a specific backup, you can restore the disk state
using this flow:
1. Copy data from the last full backup before the restore point to storage
2. For each incremental backup since that full backup:
copy data from the backup to storage
3. Zero all the areas that were not written in the previous steps.
This is not the most efficient way since you may copy the same area
several times, so this should ideally be handled by the backup software.
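Here is a minimal sketch of that flow; all the names (backup_chain(),
backup.extents(), backup.read(), client.write(), client.zero(),
unwritten_ranges()) are hypothetical and only illustrate the idea:

    # Sketch only - every helper used here is hypothetical.
    written = []

    # Steps 1-2: copy the full backup, then each incremental on top of it.
    for backup in backup_chain(last_full, restore_point):
        for extent in backup.extents():
            data = backup.read(extent.start, extent.length)
            client.write(extent.start, data)
            written.append((extent.start, extent.length))

    # Step 3: zero every area that no backup in the chain touched.
    for start, length in unwritten_ranges(written, disk_size):
        client.zero(start, length)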
> As I understand it, requesting the extents directly and writing them to
> a file, leaves you with an image in raw format, which then needs to be
> properly re-aligned with zeros and converted to qcow2, being able to
> commit any of the incremental backups I have stored somewhere.
If you write the extents to a file in raw format, you will have holes
in the image.
If you want to pipe the data you cannot have holes, unless you want to generate
zeroes for the holes, and pipe the zeroes, which is not efficient.
Example:
[
    {"start": 0, "length": 65536, "zero": False},
    {"start": 65536, "length": 1073741824, "zero": True},
]
If you pipe the zeros you are going to push 1 GiB of zeros into your pipe.
This can not work for incremental backup since in this case you get only the
extents that were modified since the last backup, and you cannot fill the space
between these extents with zeros.
[
    {"start": 0, "length": 65536, "dirty": True},
    {"start": 65536, "length": 1073741824, "dirty": False},
]
You must preserve the hole, so when you restore you can skip this extent.
If you want to pipe the data, you must encode the data in some way so you
can push the data and the holes to your pipe.
One way that we considered in the past is to support a chunked-like format,
a stream of data extents and hole extents.
For example:
data 0000000040000000\r\n
<1 GiB of data>\r\n
hole 0000000000100000\r\n
\r\n
This is similar to the incremental backup provided by ceph:
https://docs.ceph.com/docs/master/dev/rbd-diff/
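To make the idea concrete, a writer for such a stream might look like this
(sketch only; read_data() is a hypothetical callback returning the bytes of a
data extent, and the framing above is just an example):

    def write_stream(writer, extents, read_data):
        # Every extent starts with a one line header: type and hex length.
        for extent in extents:
            if extent.zero:
                writer.write(b"hole %016x\r\n" % extent.length)
            else:
                writer.write(b"data %016x\r\n" % extent.length)
                writer.write(read_data(extent.start, extent.length))
                writer.write(b"\r\n")
        # An empty line terminates the stream.
        writer.write(b"\r\n")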
We did not implement it since providing a list of extents and a way to read
the extents seems a more generic solution that can make it easier to integrate
with many backup vendors that may use different solutions to store and manage
the data.
So you can read data from imageio and push it to your pipe in a similar format.
If you do this with the http backend, a better way to pipe the data would be:
backend.write_to(writer, length, buf)
which accepts an object implementing write(buf), and pushes length bytes from
the server to this object. Your writer can be sys.stdout if you want to pipe the
backup to some other process.
In this case your backup loop may be:
    for extent in extents:
        write_extent_header(writer, extent)
        if not extent.zero:
            backend.write_to(writer, extent.length, buf)
And your restore loop would be something like:
    for extent in extents:
        backend.seek(extent.start)
        if extent.zero:
            backend.zero(extent.length)
        else:
            backend.read_from(reader, extent.length, buf)
read_from() is like write_to(), but works in the other direction.
> As during
> upload, a conversion is possible, that means we don't have to rebuild the
> full/inc chain using a temporary file which we then upload?
If your backup backend can stream the data for a specific point in time,
considering all the backups since the last full backup, you don't need any
temporary files.
The convert step in upload is done on the server side. The upload pipeline is:
backup storage -> restore program -> imageio server -> qemu-nbd -> volume
The imageio server accepts write and zero requests and converts them to
NBD_CMD_WRITE and NBD_CMD_WRITE_ZEROES for qemu-nbd, and qemu-nbd writes the
data and the zeros to the image using the qcow2 or raw drivers.
The backup pipeline is similar:
volume -> qemu -> imageio server -> backup program -> backup storage
The imageio server accepts extents requests and converts them to
NBD_CMD_BLOCK_STATUS requests to qemu. Then it accepts read requests,
converts them to NBD_CMD_READ requests to qemu, and returns the data
qemu returns.
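Roughly, the NBD side of this can be sketched with libnbd (illustration only,
not imageio's actual code):

    import nbd

    h = nbd.NBD()
    h.connect_uri("nbd+unix:///?socket=/tmp/disk.sock")  # placeholder socket

    h.pwrite(b"x" * 65536, 0)     # sent as NBD_CMD_WRITE
    h.zero(1073741824, 65536)     # sent as NBD_CMD_WRITE_ZEROES
    buf = h.pread(65536, 0)       # sent as NBD_CMD_READ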
> > So the missing part is to create a connection to imageio and reading the data.
> >
> > The easiest way is to use imageio._internal.backends.http, but note that this
> > is internal now, so you should not use it outside of imageio. It is fine for
> > writing proof of concept, and if you can show a good use case we can work
> > on public API.
>
> yes, that is what I noticed. My current solution would be to use the
> internal functions to query the extent information and then continue
> extracting them, to be able to pipe the data into our backend.
>
> > You can write this using http.client.HTTPSConnection without using
> > the http backend, but it would be a lot of code.
>
> thanks for your example, I will give it a try during the POC implementation.
>
> > We probably need to expose the backends or a simplified interface
> > in the client public API to make it easier to write such applications.
> >
> > Maybe something like:
> >
> > client.copy(src, dst)
> >
> > Where src and dst are objects implementing imageio backend interface.
> >
> > But before we do this we need to see some examples of real programs
> > using imageio, to understand the requirements better.
>
> the main feature for us would be to be able to read the data and
> pipe it somewhere, which works by using the _internal api
> functions, but having a stable interface for it would be really
> good for any kind of backup vendor to implement a client for
> the new api into their software.
The main challenge is to find a generic format supporting streaming that most
vendors can use. If we have such a format we can support it in the client and
in the server.
For example we can provide:
GET /images/xxx-yyy?format=sparse&context=zero
This can return a stream of data/zero extents that can be piped using
standard tools like curl.
And we can support restore using:
PUT /images/xxx-yyy?format=sparse
So you can push the same stream back - using one request.
The disadvantage is that your system must understand this sparse format: parse
it during backup, and maybe construct it during restore.
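For example, backup and restore with such a format could be as simple as
(hypothetical commands, the format=sparse parameter does not exist yet):

    $ curl -k -o disk.sparse \
        "https://host:port/images/xxx-yyy?format=sparse&context=zero"
    $ curl -k --upload-file disk.sparse \
        "https://host:port/images/xxx-yyy?format=sparse"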
If this looks like a useful way, please file an RFE to implement it.
> If anyone is interested in hearing more thoughts about that, also from
> Red Hat, don't hesitate to contact me directly to have a call.
Good idea.
Cheers,
Nir