[Kimchi-devel] [kimchi-devel][PATCHv3 3/6] Add lock facility for storage volume upload

Tue Mar 3 17:07:18 UTC 2015

On 03/03/2015 04:50, Royce Lv wrote:
> OK, let me explain this lock scheme.
>
> 1. Every uploading volume needs a lock (thread locking) to make sure 
> no one else is manipulating this volume at the same time to avoid 
> write competition.
> 2. Consider about we cannot spend too much bandwidth on upload, we 
> need to control concurrent uploading volumes to a reasonable number 
> (here it is assigned to be 5).
>
> If as 1. requires each volume created with a lock, but 2. we just have 
> 5 active uploading volumes, this will be a resource waste.
>
> so we have a lock pool called "upload_lock_pool" to store active 
> uploading locks. This is to say, a volume upload request comes in and 
> allocate a lock, store it in this pool, if a lock for volume "A" is 
> not used any more(all blocking PUT requests are finished and the lock 
> "ref_cnt" is 0), it will be deleted to make room in "upload_lock_pool" 
> for other volumes "PUT" requests.
>
> So scenarios are below:
> a. An uploading volume request comes in, concurrent thread cnt is less 
> than 5, lock pool does not has this lock for this volume yet. Kimchi 
> will allocate a volume lock for it to use.
> b. An uploading volume request comes in, concurrent thread cnt is less 
> than 5, lock pool already has this lock for this volume already. This 
> means others is using this lock now, then this thread just add lock 
> ref_cnt, and wait on this lock until it can get it.
> c. An uploading volume request comes in, concurrent thread cnt exceeds 
> 5, request blocks on semaphore for the pool.
>
> The reason lock pool need a lock is we need to allocate lock in and 
> delete lock from pool, this is a write operation, we need a lock 
> protect pool r/w competition.
>

Ok - Thanks for explaining!
I think I have a better idea on it now.

See my comments below.

> On 02/09/2015 09:49 AM, Aline Manera wrote:
>>
>> On 28/01/2015 11:20, lvroyce at linux.vnet.ibm.com wrote:
>>> From: Royce Lv <lvroyce at linux.vnet.ibm.com>
>>>
>>> This lock facility guarentees 5 concurrent volume upload
>>> and make sure locks are created and reclaimed when needed.
>>>
>>> Signed-off-by: Royce Lv <lvroyce at linux.vnet.ibm.com>
>>> ---
>>>   src/kimchi/i18n.py        |  1 +
>>>   src/kimchi/model/utils.py | 46 
>>> ++++++++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 47 insertions(+)
>>>
>>> diff --git a/src/kimchi/i18n.py b/src/kimchi/i18n.py
>>> index 0a31cd5..af912bd 100644
>>> --- a/src/kimchi/i18n.py
>>> +++ b/src/kimchi/i18n.py
>>> @@ -211,6 +211,7 @@ messages = {
>>>       "KCHVOL0022E": _("Unable to access file %(url)s. Please, check 
>>> it."),
>>>       "KCHVOL0023E": _("Unable to clone storage volume '%(name)s' in 
>>> pool '%(pool)s'. Details: %(err)s"),
>>>       "KCHVOL0024E": _("Upload volume chunk index, size and total 
>>> size must be integer"),
>>
>>> +    "KCHVOL0026E": _("Inconsistent upload count"),
>>
>> What does it mean for the user?

If this message will be shown to user we need to detail more it.

>>
>>>
>>>       "KCHIFACE0001E": _("Interface %(name)s does not exist"),
>>>
>>> diff --git a/src/kimchi/model/utils.py b/src/kimchi/model/utils.py
>>> index 9896289..010966a 100644
>>> --- a/src/kimchi/model/utils.py
>>> +++ b/src/kimchi/model/utils.py
>>> @@ -19,6 +19,7 @@
>>>
>>>   import libvirt
>>>   import socket
>>> +import threading
>>>   import urlparse
>>>   from lxml import etree, objectify
>>>   from lxml.builder import E, ElementMaker
>>> @@ -28,6 +29,11 @@ from kimchi.model.featuretests import FeatureTests
>>>
>>>   KIMCHI_META_URL = "https://github.com/kimchi-project/kimchi"
>>>   KIMCHI_NAMESPACE = "kimchi"
>>> +UPLOAD_THREADS = 5  # Concurrent upload volume counts at the same time
>>> +
>>> +upload_semaphore = threading.BoundedSemaphore(value=UPLOAD_THREADS)
>>> +upload_lock_pool = dict()
>>> +pool_lock = threading.Lock()
>>>
>>>
>>>   def get_vm_name(vm_name, t_name, name_list):
>>> @@ -161,3 +167,43 @@ def get_metadata_node(dom, tag, 
>>> metadata_support, mode="current"):
>>>           if node is not None:
>>>               return etree.tostring(node)
>>>       return ""
>>> +
>>> +
>>> +class UpdateLock(object):
>>> +    def __init__(self):
>>> +        self.ref_cnt = 0
>>> +        self.lock = threading.Lock()
>>> +
>>
>>
>>> +    def get_lock(self):
>>> +        self.ref_cnt += 1
>>> +        return self.lock
>>> +
>>
>> It always return the lock without checking it has the maximum number
>> So what is it for?

The max number verification must be done here and don't return the lock 
when it is reached.

>>
>>> +    def release_lock(self):
>>> +        delete = False
>>> +        self.ref_cnt -= 1
>>> +        if (self.ref_cnt == 0):
>>> +            delete = True
>>> +        return delete
>>> +
>>
>> So first time I will use release_lock(), self.ref_cnt will be zero so 
>> self.ret_cnt -= 1 turns to -1 and it returns False.
>>
>>> +
>>> +def get_vol_update_lock(vol_path):
>>> +    # upload_semaphore controls the max upload count
>>> +    upload_semaphore.acquire()
>>> +
>>> +    # pool lock make sure lock list get/store action is atomic
>>> +    with pool_lock:
>>> +        vol_lock = upload_lock_pool.get(vol_path)
>>
>>> +        if vol_lock:
>>> +            return vol_lock.get_lock()

>>> +        if len(upload_lock_pool.keys()) > (UPLOAD_THREADS - 1):
>>> +            raise OperationFailed("KCHVOL0026E")

This max number verification must be the first thing to do when 
returning the lock.
As I suggested above we should do that in the get_lock() function.

>>
>> When the exception will be raised?
>> From my view, it will always return the lock in the "if" right above
>>
>> Is the lock for the pool or the volume?
>> It is confused to to review.
>>
>>> +        lock = upload_lock_pool[vol_path] = UpdateLock()
>>> +        return lock.get_lock()
>>> +
>>> +

>>> +def release_vol_update_lock(vol_path):
>>> +    with pool_lock:
>>> +        vol_lock = upload_lock_pool.get(vol_path)

If, for any reason, I call this function prior to get the lock it will 
fail as vol_lock will be None.
I suggest to do this kind of verification to avoid problems.

>>> +        if vol_lock.release_lock():
>>> +            upload_lock_pool.pop(vol_path, None)
>>> +    upload_semaphore.release()
>>
>