Re: [Kimchi-devel] [kimchi-devel][PATCHv3 3/6] Add lock facility for storage volume upload

4 Mar 2015

On 03/03/2015 12:07 PM, Aline Manera wrote:
...
On 03/03/2015 04:50, Royce Lv wrote:
...
OK, let me explain this lock scheme.
1. Every uploading volume needs a lock (thread locking) to make sure 
no one else is manipulating this volume at the same time to avoid 
write competition.
2. Consider about we cannot spend too much bandwidth on upload, we 
need to control concurrent uploading volumes to a reasonable number 
(here it is assigned to be 5).
If as 1. requires each volume created with a lock, but 2. we just 
have 5 active uploading volumes, this will be a resource waste.
so we have a lock pool called "upload_lock_pool" to store active 
uploading locks. This is to say, a volume upload request comes in and 
allocate a lock, store it in this pool, if a lock for volume "A" is 
not used any more(all blocking PUT requests are finished and the lock 
"ref_cnt" is 0), it will be deleted to make room in 
"upload_lock_pool" for other volumes "PUT" requests.
So scenarios are below:
a. An uploading volume request comes in, concurrent thread cnt is 
less than 5, lock pool does not has this lock for this volume yet. 
Kimchi will allocate a volume lock for it to use.
b. An uploading volume request comes in, concurrent thread cnt is 
less than 5, lock pool already has this lock for this volume already. 
This means others is using this lock now, then this thread just add 
lock ref_cnt, and wait on this lock until it can get it.
c. An uploading volume request comes in, concurrent thread cnt 
exceeds 5, request blocks on semaphore for the pool.
The reason lock pool need a lock is we need to allocate lock in and 
delete lock from pool, this is a write operation, we need a lock 
protect pool r/w competition.
Ok - Thanks for explaining!
I think I have a better idea on it now.
See my comments below.
...
On 02/09/2015 09:49 AM, Aline Manera wrote:
...
On 28/01/2015 11:20, lvroyce@linux.vnet.ibm.com wrote:
...
From: Royce Lv <lvroyce@linux.vnet.ibm.com>
This lock facility guarentees 5 concurrent volume upload
and make sure locks are created and reclaimed when needed.
Signed-off-by: Royce Lv <lvroyce@linux.vnet.ibm.com>
---
  src/kimchi/i18n.py        |  1 +
  src/kimchi/model/utils.py | 46 
++++++++++++++++++++++++++++++++++++++++++++++
  2 files changed, 47 insertions(+)

diff --git a/src/kimchi/i18n.py b/src/kimchi/i18n.py
index 0a31cd5..af912bd 100644
--- a/src/kimchi/i18n.py
+++ b/src/kimchi/i18n.py
@@ -211,6 +211,7 @@ messages = {
      "KCHVOL0022E": _("Unable to access file %(url)s. Please, 
check it."),
      "KCHVOL0023E": _("Unable to clone storage volume '%(name)s' 
in pool '%(pool)s'. Details: %(err)s"),
      "KCHVOL0024E": _("Upload volume chunk index, size and total 
size must be integer"),
...
+    "KCHVOL0026E": _("Inconsistent upload count"),
What does it mean for the user?
If this message will be shown to user we need to detail more it.
It is an internal error, see comments below.
...
...
...
...
"KCHIFACE0001E": _("Interface %(name)s does not exist"),
diff --git a/src/kimchi/model/utils.py b/src/kimchi/model/utils.py
index 9896289..010966a 100644
--- a/src/kimchi/model/utils.py
+++ b/src/kimchi/model/utils.py
@@ -19,6 +19,7 @@
import libvirt
  import socket
+import threading
  import urlparse
  from lxml import etree, objectify
  from lxml.builder import E, ElementMaker
@@ -28,6 +29,11 @@ from kimchi.model.featuretests import FeatureTests
KIMCHI_META_URL = "https://github.com/kimchi-project/kimchi"
  KIMCHI_NAMESPACE = "kimchi"
+UPLOAD_THREADS = 5  # Concurrent upload volume counts at the same 
time
+
+upload_semaphore = threading.BoundedSemaphore(value=UPLOAD_THREADS)
+upload_lock_pool = dict()
+pool_lock = threading.Lock()
def get_vm_name(vm_name, t_name, name_list):
@@ -161,3 +167,43 @@ def get_metadata_node(dom, tag, 
metadata_support, mode="current"):
          if node is not None:
              return etree.tostring(node)
      return ""
+
+
+class UpdateLock(object):
+    def __init__(self):
+        self.ref_cnt = 0
+        self.lock = threading.Lock()
+
...
+    def get_lock(self):
+        self.ref_cnt += 1
+        return self.lock
+
It always return the lock without checking it has the maximum number
So what is it for?
The max number verification must be done here and don't return the 
lock when it is reached.
The upper limit is checked by upload_semaphore.acquire() before calling 
get_lock, this ref_cnt is to decide if the volume lock need to be 
reclaim or not.
...
...
...
...
+    def release_lock(self):
+        delete = False
+        self.ref_cnt -= 1
+        if (self.ref_cnt == 0):
+            delete = True
+        return delete
+
So first time I will use release_lock(), self.ref_cnt will be zero 
so self.ret_cnt -= 1 turns to -1 and it returns False.
...
+
+def get_vol_update_lock(vol_path):
+    # upload_semaphore controls the max upload count
+    upload_semaphore.acquire()
+
+    # pool lock make sure lock list get/store action is atomic
+    with pool_lock:
+        vol_lock = upload_lock_pool.get(vol_path)
...
+        if vol_lock:
+            return vol_lock.get_lock()
...
...
...
+        if len(upload_lock_pool.keys()) > (UPLOAD_THREADS - 1):
+            raise OperationFailed("KCHVOL0026E")
This max number verification must be the first thing to do when 
returning the lock.
As I suggested above we should do that in the get_lock() function.
This to check if for some reason, the cnt in upload_semaphore is less 
than upper limit but the lock cnt is more than it. It means some one 
called "get_vol_update_lock", but forget to call release_vol_update_lock 
or un-handled exception raised before release_vol_update_lock.
...
...
...
When the exception will be raised?
From my view, it will always return the lock in the "if" right above
Is the lock for the pool or the volume?
It is confused to to review.
...
+        lock = upload_lock_pool[vol_path] = UpdateLock()
+        return lock.get_lock()
+
+
...
...
...
+def release_vol_update_lock(vol_path):
+    with pool_lock:
+        vol_lock = upload_lock_pool.get(vol_path)
If, for any reason, I call this function prior to get the lock it will 
fail as vol_lock will be None.
I suggest to do this kind of verification to avoid problems.
ACK.
...
...
...
...
+        if vol_lock.release_lock():
+            upload_lock_pool.pop(vol_path, None)
+    upload_semaphore.release()