Re: [Kimchi-devel] [PATCH 3/4] Upload storage volume

12 May 2015


      On 12/05/2015 13:38, Crístian Deives wrote:
...
On 11-05-2015 15:34, Aline Manera wrote:
...
+                "chunk_size": {
+                    "description": "Chunk size of uploaded storage 
volume",
+                    "type": "string",
+                    "error": "KCHVOL0024E",
+                    "required": true
+                }
Is there a specific reason to why "chunk_size" is not a number? It 
should be a number.
It is because the data is sent in the request body. So all data is 
encoded as unicode.
As you can see in the tests, we chunk_size is actually an integer, but 
it will be encoded as unicode for the request.
...
...
+
+        lock = vol_data['lock']
+        offset = vol_data['offset']
+        if (offset + chunk_size) > vol_capacity:
+            raise OperationFailed("KCHVOL0028E")
+
+        with lock:
+            self.doUpload(vol, offset, chunk_data, chunk_size)
+
+        vol_data['offset'] += chunk_size
+        if vol_data['offset'] == vol_capacity:
+            del upload_volumes[vol_path]
+
I think everything from "lock = vol_data['lock']" down to "del 
upload_volumes[vol_path]" should be enclosed by the lock, not just the 
call to "doUpload". Suppose two PUT commands are sent one right after 
the other: both of them will use "offset" with the value it had before 
both calls, so the second command will actually write over what the 
first command has written. You're locking the access to "doUpload", 
but "offset" (which controls where "doUpload" will write) isn't locked 
from concurrent access.
Agree. I will change it and send V2.
...
...
+        # Refresh to make sure volume can be found in following lookup
+        StoragePoolModel.get_storagepool(pool, self.conn).refresh(0)
There's no need to call "refresh" here. The volume already existed 
before (and after...) the call to "update". That function is only 
needed when you add a file manually to a storage pool directory and 
you want libvirt to see it. This is not the case here.
ACK.
...
...
+
+            # Create a file with 5M to upload
+            # Max body size is set to 4M so the upload will fail 
with 413
"4M"? Why? I'd say the volume capacity there is the size of 
COPYING.LGPL, which is about ~7 KiB, not 4 MiB. I understand that the 
test below will fail anyway, but the comment doesn't seem right.
The max body size is a server configuration. It represents how much data 
the request body can have.

The test below is not related to the COPYING.LGPL file. It creates a new 
file '/tmp/5m-file' which has 5M of size. (it is an empty file).
And then tries to upload all its content using the upload API.
The test will fail as the max body size is 4M and the content has 5M.
...
...
+            newfile = '/tmp/5m-file'
+            with open(newfile, 'wb') as fd:
+                fd.seek(5*1024*1024-1)
+                fd.write("\0")
+            rollback.prependDefer(os.remove, newfile)
+
+            with open(newfile, 'rb') as fd:
+                with open(newfile + '.tmp', 'wb') as tmp_fd:
+                    data = fd.read()
+                    tmp_fd.write(data)
+
+                with open(newfile + '.tmp', 'rb') as tmp_fd:
+                    r = requests.put(url, data={'chunk_size': 
len(data)},
+                                     files={'chunk': tmp_fd},
+                                     verify=False,
+ headers=fake_auth_header())
+                    self.assertEquals(r.status_code, 413)
Why going all the trouble of creating one file with \0s, then copying 
it to another file? Why not using only one?
It is a requirement on how to build the request. The file descriptor 
used for the request must have the data write otherwise, it will assume 
it is empty.
...
...
+            # Do upload
+            index = 0
+            chunk_size = 2 * 1024
+
+            with open(filepath, 'rb') as fd:
+                while True:
+                    with open(filepath + '.tmp', 'wb') as tmp_fd:
+                        fd.seek(index*chunk_size)
+                        data = fd.read(chunk_size)
+                        tmp_fd.write(data)
Again, why do you need to create a new file here with a copy of the 
original data?
Same as I said before. And also, the first file descriptor will point to 
the entire file and the second one will point to a piece of content to 
be uploaded.
...
...
+            with open(filepath) as fd:
+                content = fd.read()
Why don't you use the existing loop above to set the variable 
"content", one chunk at a time? The code has already read the file 
once, its contents haven't changed. I feel that there's too much 
unnecessary I/O going on here...
ACK.