On Wed, May 30, 2018 at 6:33 PM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 30.05.2018 um 17:05 hat Eric Blake geschrieben:
> If I understood the question, we start with a local:
>
> T (any format) <- S (qcow2) <- V (qcow2)
>
> and want to create a remote tar file:
>
> dest.tar == | header ... | qcow2 image |
>
> where we write a single collapsed view of the T<-S<-V chain as a qcow2 image
> in the subset of the remote tar file.

I think the problem is that we're talking about two different things in
one thread. If I understand correctly, what oVirt does today is:

1. qemu-img convert to create a temporary qcow2 image that merges the
   whole backing chain in a single file

2. tar to create an temporary OVA archive that contains, amongst others,
   the temporary qcow2 image. This is a second temporary file.

3. Stream this temporary OVA archive over HTTP

Well, today we suggest users to mount a shared storage to multiple hosts that reside in different oVirt/RHV deployments so they could export VMs/templates as OVAs to that shared storage and import these OVAs from the shared storage to a destination deployment. This process involves only #1 and #2.

The technique you proposed earlier for writing disks directly into an OVA, assuming that the target size can be retrieved with 'qemu-img measure', sounds like a nice approach to accelerate this process. I think we should really consider doing that if that's as easy as it sounds.

But #3 is definitely something we are interested in because we expect the next step to be exporting the OVAs to a remote instance of Glance that serves as a shared repository for the different deployments. Being able to stream the collapsed form of a volume chain without writing anything to the storage device would be fantastic. I think that even at the expense of iterating the chain twice - once to map the structure of the jump tables (right?) and once to stream the whole data.
 

Your proposal is about getting rid of the temporary file from step 1,
but keeping the temporary file from step 2. I was kind of ignoring
step 2 and answering how you can avoid a temporary file by creating and
streaming a qcow2 file in a single step, but if you already have the
code to create a qcow2 image as a stream, adding a tar header as well
shouldn't be that hard...

I think Nir was talking about both.

Ideally, we'd somehow get rid of HTTP, which introduces the requirement
of a non-seekable stream.

> So, first use qemu-img to learn how big to size the collapsed qcow2 image,
> and by extension, the overall tar image
> $ qemu-img measure -f qcow2 -O qcow2 V
>
> then pre-create a large enough tar file on the destination
> $ create header
> $ truncate --size=XXX dest.qcow2
> $ tar cf dest.tar header dest.qcow2
>
> (note that I explicitly did NOT use tar --sparse; dest.qcow2 is sparse and
> occupies practically no disk space, but dest.tar must NOT be sparse because
> neither tar nor NBD work well with after-the-fact resizing)
>
> then set up an NBD server on the destination that can write to the subset of
> the tar file:
>
> $ learn the offset of dest.qcow2 within dest.tar (probably a multiple of
> 10240, given default GNU tar options)
> $ qemu-nbd --image-opts
> driver=raw,offset=YYY,size=XXX,file.driver=file,file.filename=dest.tar
>
> (I'm not sure if I got the --image-opts syntax exactly correct.  nbdkit has
> more examples of learning offsets within a tar file, and may be a better
> option as a server than qemu-nbd - but the point remains: serve up the
> subset of the dest.tar file as raw bytes)
>
> finally set up qemu as an NBD client on the source:
> $ qemu-img convert -f qcow2 V -O qcow2 nbd://remote
>
> (now the client collapses the qcow2 chain onto the source, and writes that
> into a qcow2 subset of the tar file on the destination, where the
> destination was already sized large enough to hold the qcow2 image, and
> where no other temporary storage was needed other than the sparse dest.qcow2
> used in creating a large enough tar file)

You added another host into the mix, which just receives the image
content via NBD and then re-exports it as HTTP. Does this host actually
exist or is it the same host where the original images are located?

Because if you stay local for this step, there is no need to use NBD at
all:

$ ./qemu-img measure -O qcow2 ~/images/hd.img
required size: 67436544
fully allocated size: 67436544
$ ./qemu-img create -f file /tmp/test.qcow2 67436544
Formatting '/tmp/test.qcow2', fmt=file size=67436544
$ ./qemu-img convert -n --target-image-opts ~/images/hd.img driver=raw,file.driver=file,file.filename=/tmp/test.qcow2,offset=65536

hexdump verifies that this does the expected thing.

> > Exporting to a stream is possible if we're allowed to make two passes
> > over the source, but the existing QEMU code is useless for that because
> > it inherently requires seeking. I think if I had to get something like
> > this, I'd probably implement such an exporter as a script external to
> > QEMU.
>
> Wait. What are we trying to stream?  A qcow2 file, or what the guest would
> see?  If you stream just what the guest sees, then 'qemu-img map' tells you
> which portions of which source files to read in order to reconstruct data in
> the order it would be seen by the guest.

I think the requirement was that the HTTP client downloads a qcow2
image. Did I get this wrong?

> But yeah, an external exporter that takes a raw file, learns its size
> and where the holes are, and then writes a trivial qcow2 header and
> appends L1/L2/refcount tables on the end to convert the raw file into
> a slightly-larger qcow2 file, might be a valid way to create a qcow2
> file from a two-pass read.

Right. It may have to calculate the size of the L1 and refcount table
first so it can write the right offsets into the header, so maybe it's
easiest to precreate the whole metadata. But that's an implementation
detail.

Anyway, I don't think the existing QEMU code helps you with this.

Kevin