[Kimchi-devel] [PATCH 3/4] Upload storage volume
Aline Manera
alinefm at linux.vnet.ibm.com
Tue May 12 19:41:37 UTC 2015
On 12/05/2015 16:31, Crístian Deives wrote:
> On 12-05-2015 13:59, Aline Manera wrote:
>>> Is there a specific reason to why "chunk_size" is not a number? It
>>> should be a number.
>>>
>>
>> It is because the data is sent in the request body. So all data is
>> encoded as unicode.
>> As you can see in the tests, we chunk_size is actually an integer,
>> but it will be encoded as unicode for the request.
>
> Still, this doesn't look good to me :-(
Sorry! But as it is on request body we need to deal with it.
>
>>> "4M"? Why? I'd say the volume capacity there is the size of
>>> COPYING.LGPL, which is about ~7 KiB, not 4 MiB. I understand that
>>> the test below will fail anyway, but the comment doesn't seem right.
>>
>> The max body size is a server configuration. It represents how much
>> data the request body can have.
>>
>> The test below is not related to the COPYING.LGPL file. It creates a
>> new file '/tmp/5m-file' which has 5M of size. (it is an empty file).
>> And then tries to upload all its content using the upload API.
>> The test will fail as the max body size is 4M and the content has 5M.
>
> OK, thanks for the explanation! I thought this test was trying to
> write more than what the storage volume can hold (KCHVOL0028E), but
> this doesn't seem to have nothing to do with the upload operation
> itself, this is a generic limitation for all REST commands.
You are correct when say it is a generic limitation for all REST
commands. But the upload API is the only one which accepts the request
body, so the test is made using it.
>
>>
>>>
>>>
>>>> + newfile = '/tmp/5m-file'
>>>> + with open(newfile, 'wb') as fd:
>>>> + fd.seek(5*1024*1024-1)
>>>> + fd.write("\0")
>>>> + rollback.prependDefer(os.remove, newfile)
>>>> +
>>>> + with open(newfile, 'rb') as fd:
>>>> + with open(newfile + '.tmp', 'wb') as tmp_fd:
>>>> + data = fd.read()
>>>> + tmp_fd.write(data)
>>>> +
>>>> + with open(newfile + '.tmp', 'rb') as tmp_fd:
>>>> + r = requests.put(url, data={'chunk_size':
>>>> len(data)},
>>>> + files={'chunk': tmp_fd},
>>>> + verify=False,
>>>> + headers=fake_auth_header())
>>>> + self.assertEquals(r.status_code, 413)
>>>
>>> Why going all the trouble of creating one file with \0s, then
>>> copying it to another file? Why not using only one?
>>
>> It is a requirement on how to build the request. The file descriptor
>> used for the request must have the data write otherwise, it will
>> assume it is empty.
>>
>>>
>>>> + # Do upload
>>>> + index = 0
>>>> + chunk_size = 2 * 1024
>>>> +
>>>> + with open(filepath, 'rb') as fd:
>>>> + while True:
>>>> + with open(filepath + '.tmp', 'wb') as tmp_fd:
>>>> + fd.seek(index*chunk_size)
>>>> + data = fd.read(chunk_size)
>>>> + tmp_fd.write(data)
>>>
>>> Again, why do you need to create a new file here with a copy of the
>>> original data?
>>>
>>
>> Same as I said before. And also, the first file descriptor will point
>> to the entire file and the second one will point to a piece of
>> content to be uploaded.
>
> So you mean the browser will also need to create one file for each
> data chunk and copy its contents locally before sending it to the server?
On UI we will use the FormData object.
Which means we got the file, reads what we need to insert in the
FormData object and send the request. And then read the next chunk and
do everything again.
> I.e. will uploading a 4 GiB ISO file copy 4 GiB of local data + send 4
> GiB of remote data?
>
> Isn't it possible to pass the original file handler seek'd to the
> desired offset, and the server only reads "chunk_size" bytes?
>
Passing the original file handler means passing a large amount of data
which will hang the browser.
More information about the Kimchi-devel
mailing list