
On 04/10/2014 03:47 PM, Zhou Zheng Sheng wrote:
on 2014/04/08 21:18, shaohef@linux.vnet.ibm.com wrote:
+PATTERN = re.compile("%\([^\)]*\)[#0\-\+ ]?(\d+|\*)?(\.(\d+|\*))?"
+                     "[lLh]?[cdeEfFgGioursxX%]")
+BAD_PATTERN = re.compile("%\([^\)]*\)")

There are some problems in the regular expressions.
The first problem is that "\" is a special character in Python string literals. For example, "\" is what gives "\n" and "\t" their special meaning. If we want a literal "\" in a regular expression, we have to escape it, for example re.compile("%\\(\\)"). This is tedious and error-prone. Usually we just put an "r" prefix before the string literal to stop Python from translating "\", for example re.compile(r"%\(\)").
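To make the difference concrete, here is a small sketch comparing the two spellings. The variable names are only for illustration and are not part of the patch.

```python
import re

# The doubled backslashes in a normal literal and the single ones in a raw
# literal produce exactly the same text for the regex engine.
escaped = "%\\(\\)"
raw = r"%\(\)"
same_text = escaped == raw

# Escapes that Python recognises are translated in normal literals only.
newline_len = len("\n")       # one character: a line feed
raw_newline_len = len(r"\n")  # two characters: a backslash and "n"

# The raw pattern matches a literal "%()" in the input.
matched = re.compile(raw).search("value: %()") is not None
```

The raw form is the one that stays readable as the pattern grows.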
The second problem is that "+", "*", "(" and ")" automatically lose their special meaning inside "[ ]", so we can use them there directly without escaping.
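A quick check of this, as a sketch with made-up sample text: the escaped and unescaped character classes find the same characters. Note that "-" is different, because it still forms a range between two characters, which is why "\-" stays escaped in the flags class.

```python
import re

sample = "a+b*(c)"

# Escaped and unescaped forms of the same character class.
escaped_class = re.compile(r"[\+\*\(\)]")
plain_class = re.compile(r"[+*()]")

escaped_hits = escaped_class.findall(sample)
plain_hits = plain_class.findall(sample)

# "-" keeps its meaning inside "[ ]": "0-+" is read as a range from "0"
# down to "+", which is invalid, so the pattern fails to compile.
try:
    re.compile(r"[#0-+]")
    range_ok = True
except re.error:
    range_ok = False
```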
The third problem is that we should comment each part of a complicated regular expression.
As a result, I suggest the following regular expressions.
# Match all conversion specifiers with mapping key
PATTERN = re.compile(r'''%\([^)]+\)  # Mapping key
                        [#0\-+]?  # Conversion flags (optional)
                        (\d+|\*)?  # Minimum field width (optional)
                        (\.(\d+|\*))?  # Precision (optional)
                        [lLh]?  # Length modifier (optional)
                        [cdeEfFgGioursxX%]  # Conversion type''',
                     re.VERBOSE)
BAD_PATTERN = re.compile(r"%\([^)]*?\)")
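Here is a rough sketch of how the two patterns behave on a few messages. The helper unconverted_keys() and the sample strings are hypothetical, written only to exercise the patterns, and are not how the patch itself uses them.

```python
import re

PATTERN = re.compile(r'''%\([^)]+\)  # Mapping key
                        [#0\-+]?  # Conversion flags (optional)
                        (\d+|\*)?  # Minimum field width (optional)
                        (\.(\d+|\*))?  # Precision (optional)
                        [lLh]?  # Length modifier (optional)
                        [cdeEfFgGioursxX%]  # Conversion type''',
                     re.VERBOSE)
BAD_PATTERN = re.compile(r"%\([^)]*?\)")


def unconverted_keys(message):
    """Hypothetical helper: mapping keys not followed by a valid specifier."""
    good_starts = {m.start() for m in PATTERN.finditer(message)}
    return [m.group(0) for m in BAD_PATTERN.finditer(message)
            if m.start() not in good_starts]


# Full specifiers are matched as a whole.
good = [m.group(0) for m in PATTERN.finditer("%(name)s has %(count)d items")]
with_precision = PATTERN.search("%(pct)5.2f").group(0)

# With the space flag dropped, "%(k) done" is no longer accepted.
space_case = PATTERN.search("%(k) done")
leftovers = unconverted_keys("%(name)s and %(k) done")
```

The "#" inside "[#0\-+]" is safe in verbose mode, because comments are not recognised inside a character class.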
Notice that I dropped the space from the conversion flags, leaving only "#", "0", "-" and "+". It is fully legal for someone to write the following:

    "%(k) done" % {"k": 100}

and the result is ' 100one'.
This is because Python interprets the space between "%(k)" and "done" as a conversion flag, then eats the "d" from "done" as the conversion type. Though it's legal, it's error-prone. So I think it's better not to support this rare use of the space conversion flag, and to report it as an error.
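For comparison, a short sketch of the accidental form next to what the author of such a message most likely meant (the variable names are only illustrative):

```python
values = {"k": 100}

# Accidental: the space is taken as a flag and "d" as the conversion type.
accidental = "%(k) done" % values

# Probably intended: an explicit "d" conversion followed by the word "done".
intended = "%(k)d done" % values
```

The first expression gives ' 100one' and the second gives '100 done', so rejecting the first form catches a missing conversion type rather than a deliberate use of the flag.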
ACK.

--
Thanks and best regards!

Sheldon Feng(冯少合)<shaohef@linux.vnet.ibm.com>
IBM Linux Technology Center