
On 2014/04/08 21:18, shaohef@linux.vnet.ibm.com wrote:
+PATTERN = re.compile("%\([^\)]*\)[#0\-\+ ]?(\d+|\*)?(\.(\d+|\*))?"
+                     "[lLh]?[cdeEfFgGioursxX%]")
+BAD_PATTERN = re.compile("%\([^\)]*\)")
There are some problems in these regular expressions.

The first problem is that "\" is a special character in Python string literals. For example, we all know that "\" gives a special meaning to "\n" and "\t". If we want to pass a backslash through to a regular expression, we have to escape the backslash itself, for example re.compile("%\\(\\)"). This is tedious and error-prone. The patterns above only work because Python happens to leave unrecognized escapes such as "\(" and "\d" untouched. Usually we just put the prefix "r" before the string literal to stop Python from translating "\", for example re.compile(r"%\(\)").

The second problem is that "+", "*", "(" and ")" automatically lose their special meaning inside "[ ]", so we can use them there directly without escaping. Of the characters used here, only "-" still needs escaping inside "[ ]" (or has to be placed first or last), which is why "\-" is kept below.

The third problem is that each part of a complicated regular expression should be commented.

As a result, I suggest the following regular expressions:

    # Match all conversion specifiers with a mapping key
    PATTERN = re.compile(r'''%\([^)]+\)        # Mapping key
                             [#0\-+]?           # Conversion flags (optional)
                             (\d+|\*)?          # Minimum field width (optional)
                             (\.(\d+|\*))?      # Precision (optional)
                             [lLh]?             # Length modifier (optional)
                             [cdeEfFgGioursxX%] # Conversion type''',
                         re.VERBOSE)
    BAD_PATTERN = re.compile(r"%\([^)]*?\)")

Notice that I dropped the space from the conversion flags, leaving only "#", "0", "-" and "+". It is perfectly legal for someone to write

    "%(k) done" % {"k": 100}

and the result is

    ' 100one'

This is because Python interprets the space between "%(k)" and "done" as a conversion flag, and then eats the "d" of "done" as the conversion type. Though it's legal, it's error-prone. So I think it's better not to support this rare use of the space in the conversion flags, and to report it as an error.

--
Thanks and best regards!

Zhou Zheng Sheng / 周征晟
E-mail: zhshzhou@linux.vnet.ibm.com
Telephone: 86-10-82454397
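A minimal sketch for trying the suggested PATTERN and BAD_PATTERN, assuming Python 3. The helper bad_specifiers is hypothetical: it assumes a specifier should be reported when BAD_PATTERN finds a "%(key)" that PATTERN does not cover at the same position. That is one reading of how the two patterns are meant to work together, not something the patch states.

```python
import re

# The two patterns suggested in the review, copied verbatim.
PATTERN = re.compile(r'''%\([^)]+\)        # Mapping key
                         [#0\-+]?           # Conversion flags (optional)
                         (\d+|\*)?          # Minimum field width (optional)
                         (\.(\d+|\*))?      # Precision (optional)
                         [lLh]?             # Length modifier (optional)
                         [cdeEfFgGioursxX%] # Conversion type''',
                     re.VERBOSE)
BAD_PATTERN = re.compile(r"%\([^)]*?\)")


def bad_specifiers(text):
    """Hypothetical helper: "%(key)" occurrences that PATTERN rejects.

    Assumption: a mapping key is bad when BAD_PATTERN finds it but no
    PATTERN match starts at the same offset.
    """
    good_starts = {m.start() for m in PATTERN.finditer(text)}
    return [m.group() for m in BAD_PATTERN.finditer(text)
            if m.start() not in good_starts]


if __name__ == "__main__":
    # The space-flag pitfall: " d" is parsed as flag + conversion type.
    print(repr("%(k) done" % {"k": 100}))    # ' 100one'
    print(bad_specifiers("%(k) done"))       # ['%(k)']
    print(bad_specifiers("%(name)-10s ok"))  # []
```

With the space removed from the flags, "%(k) done" no longer matches PATTERN, so the mapping key is reported instead of being silently formatted as ' 100one'.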