[Kimchi-devel] [kimchi-devel][PATCHv3 3/6] Add lock facility for storage volume upload

Royce Lv lvroyce at linux.vnet.ibm.com
Wed Mar 4 07:02:06 UTC 2015


On 03/03/2015 12:07 PM, Aline Manera wrote:
>
> On 03/03/2015 04:50, Royce Lv wrote:
>> OK, let me explain this lock scheme.
>>
>> 1. Every uploading volume needs a lock (thread locking) to make sure 
>> no one else is manipulating this volume at the same time to avoid 
>> write competition.
>> 2. Consider about we cannot spend too much bandwidth on upload, we 
>> need to control concurrent uploading volumes to a reasonable number 
>> (here it is assigned to be 5).
>>
>> If as 1. requires each volume created with a lock, but 2. we just 
>> have 5 active uploading volumes, this will be a resource waste.
>>
>> so we have a lock pool called "upload_lock_pool" to store active 
>> uploading locks. This is to say, a volume upload request comes in and 
>> allocate a lock, store it in this pool, if a lock for volume "A" is 
>> not used any more(all blocking PUT requests are finished and the lock 
>> "ref_cnt" is 0), it will be deleted to make room in 
>> "upload_lock_pool" for other volumes "PUT" requests.
>>
>> So scenarios are below:
>> a. An uploading volume request comes in, concurrent thread cnt is 
>> less than 5, lock pool does not has this lock for this volume yet. 
>> Kimchi will allocate a volume lock for it to use.
>> b. An uploading volume request comes in, concurrent thread cnt is 
>> less than 5, lock pool already has this lock for this volume already. 
>> This means others is using this lock now, then this thread just add 
>> lock ref_cnt, and wait on this lock until it can get it.
>> c. An uploading volume request comes in, concurrent thread cnt 
>> exceeds 5, request blocks on semaphore for the pool.
>>
>> The reason lock pool need a lock is we need to allocate lock in and 
>> delete lock from pool, this is a write operation, we need a lock 
>> protect pool r/w competition.
>>
>
> Ok - Thanks for explaining!
> I think I have a better idea on it now.
>
> See my comments below.
>
>> On 02/09/2015 09:49 AM, Aline Manera wrote:
>>>
>>> On 28/01/2015 11:20, lvroyce at linux.vnet.ibm.com wrote:
>>>> From: Royce Lv <lvroyce at linux.vnet.ibm.com>
>>>>
>>>> This lock facility guarentees 5 concurrent volume upload
>>>> and make sure locks are created and reclaimed when needed.
>>>>
>>>> Signed-off-by: Royce Lv <lvroyce at linux.vnet.ibm.com>
>>>> ---
>>>>   src/kimchi/i18n.py        |  1 +
>>>>   src/kimchi/model/utils.py | 46 
>>>> ++++++++++++++++++++++++++++++++++++++++++++++
>>>>   2 files changed, 47 insertions(+)
>>>>
>>>> diff --git a/src/kimchi/i18n.py b/src/kimchi/i18n.py
>>>> index 0a31cd5..af912bd 100644
>>>> --- a/src/kimchi/i18n.py
>>>> +++ b/src/kimchi/i18n.py
>>>> @@ -211,6 +211,7 @@ messages = {
>>>>       "KCHVOL0022E": _("Unable to access file %(url)s. Please, 
>>>> check it."),
>>>>       "KCHVOL0023E": _("Unable to clone storage volume '%(name)s' 
>>>> in pool '%(pool)s'. Details: %(err)s"),
>>>>       "KCHVOL0024E": _("Upload volume chunk index, size and total 
>>>> size must be integer"),
>>>
>>>> +    "KCHVOL0026E": _("Inconsistent upload count"),
>>>
>>> What does it mean for the user?
>
> If this message will be shown to user we need to detail more it.
It is an internal error, see comments below.
>
>>>
>>>>
>>>>       "KCHIFACE0001E": _("Interface %(name)s does not exist"),
>>>>
>>>> diff --git a/src/kimchi/model/utils.py b/src/kimchi/model/utils.py
>>>> index 9896289..010966a 100644
>>>> --- a/src/kimchi/model/utils.py
>>>> +++ b/src/kimchi/model/utils.py
>>>> @@ -19,6 +19,7 @@
>>>>
>>>>   import libvirt
>>>>   import socket
>>>> +import threading
>>>>   import urlparse
>>>>   from lxml import etree, objectify
>>>>   from lxml.builder import E, ElementMaker
>>>> @@ -28,6 +29,11 @@ from kimchi.model.featuretests import FeatureTests
>>>>
>>>>   KIMCHI_META_URL = "https://github.com/kimchi-project/kimchi"
>>>>   KIMCHI_NAMESPACE = "kimchi"
>>>> +UPLOAD_THREADS = 5  # Concurrent upload volume counts at the same 
>>>> time
>>>> +
>>>> +upload_semaphore = threading.BoundedSemaphore(value=UPLOAD_THREADS)
>>>> +upload_lock_pool = dict()
>>>> +pool_lock = threading.Lock()
>>>>
>>>>
>>>>   def get_vm_name(vm_name, t_name, name_list):
>>>> @@ -161,3 +167,43 @@ def get_metadata_node(dom, tag, 
>>>> metadata_support, mode="current"):
>>>>           if node is not None:
>>>>               return etree.tostring(node)
>>>>       return ""
>>>> +
>>>> +
>>>> +class UpdateLock(object):
>>>> +    def __init__(self):
>>>> +        self.ref_cnt = 0
>>>> +        self.lock = threading.Lock()
>>>> +
>>>
>>>
>>>> +    def get_lock(self):
>>>> +        self.ref_cnt += 1
>>>> +        return self.lock
>>>> +
>>>
>>> It always return the lock without checking it has the maximum number
>>> So what is it for?
>
> The max number verification must be done here and don't return the 
> lock when it is reached.
The upper limit is checked by upload_semaphore.acquire() before calling 
get_lock, this ref_cnt is to decide if the volume lock need to be 
reclaim or not.

>
>>>
>>>> +    def release_lock(self):
>>>> +        delete = False
>>>> +        self.ref_cnt -= 1
>>>> +        if (self.ref_cnt == 0):
>>>> +            delete = True
>>>> +        return delete
>>>> +
>>>
>>> So first time I will use release_lock(), self.ref_cnt will be zero 
>>> so self.ret_cnt -= 1 turns to -1 and it returns False.
>>>
>>>> +
>>>> +def get_vol_update_lock(vol_path):
>>>> +    # upload_semaphore controls the max upload count
>>>> +    upload_semaphore.acquire()
>>>> +
>>>> +    # pool lock make sure lock list get/store action is atomic
>>>> +    with pool_lock:
>>>> +        vol_lock = upload_lock_pool.get(vol_path)
>>>
>>>> +        if vol_lock:
>>>> +            return vol_lock.get_lock()
>
>>>> +        if len(upload_lock_pool.keys()) > (UPLOAD_THREADS - 1):
>>>> +            raise OperationFailed("KCHVOL0026E")
>
> This max number verification must be the first thing to do when 
> returning the lock.
> As I suggested above we should do that in the get_lock() function.
This to check if for some reason, the cnt in upload_semaphore is less 
than upper limit but the lock cnt is more than it. It means some one 
called "get_vol_update_lock", but forget to call release_vol_update_lock 
or un-handled exception raised before release_vol_update_lock.
>
>>>
>>> When the exception will be raised?
>>> From my view, it will always return the lock in the "if" right above
>>>
>>> Is the lock for the pool or the volume?
>>> It is confused to to review.
>>>
>>>> +        lock = upload_lock_pool[vol_path] = UpdateLock()
>>>> +        return lock.get_lock()
>>>> +
>>>> +
>
>>>> +def release_vol_update_lock(vol_path):
>>>> +    with pool_lock:
>>>> +        vol_lock = upload_lock_pool.get(vol_path)
>
> If, for any reason, I call this function prior to get the lock it will 
> fail as vol_lock will be None.
> I suggest to do this kind of verification to avoid problems.
ACK.
>
>>>> +        if vol_lock.release_lock():
>>>> +            upload_lock_pool.pop(vol_path, None)
>>>> +    upload_semaphore.release()
>>>
>>
>




More information about the Kimchi-devel mailing list