On 04/10/2014 03:47 PM, Zhou Zheng Sheng wrote:
on 2014/04/08 21:18, shaohef(a)linux.vnet.ibm.com wrote:
> +PATTERN = re.compile("%\([^\)]*\)[#0\-\+ ]?(\d+|\*)?(\.(\d+|\*))?"
> + "[lLh]?[cdeEfFgGioursxX%]")
> +BAD_PATTERN = re.compile("%\([^\)]*\)")
There are some problems in the regular expressions.
The first problem is that "\" is a special character in Python string
literals. For example, we all know that "\" gives a special meaning to
"\n" and "\t".
If we want to use it in a regular expression, we have to escape the
backslash itself, for example re.compile("%\\(\\)"). This is tedious and
error-prone. Usually we just put an "r" prefix before the string literal
to stop Python from translating "\", for example re.compile(r"%\(\)").
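To illustrate the point about raw strings, here is a minimal sketch. The
sample text and variable names are made up for the illustration and are
not part of the patch. It shows that the escaped and raw spellings give
the same pattern text, and that a recognized escape such as "\b" silently
changes meaning when the "r" prefix is missing.

```python
import re

# Escaped and raw spellings produce the same pattern text: %\(\)
escaped = "%\\(\\)"
raw = r"%\(\)"
same_text = escaped == raw

# "\b" is a recognized escape (backspace), so the non-raw spelling
# never reaches the regex engine as a word-boundary assertion.
raw_hit = re.search(r"\bkey\b", "a key here") is not None
plain_hit = re.search("\bkey\b", "a key here") is not None

print(same_text, raw_hit, plain_hit)  # True True False
```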
The second problem is that "+", "*", "(" and ")" automatically lose
their special meaning inside "[ ]", so we can use them directly there
without escaping.
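As a quick check of this, the following sketch compares a character class
written with and without escapes. The sample string is invented for the
illustration. Note that "-" is still special between two characters in a
class, so it is either escaped or placed last.

```python
import re

# Unescaped inside the class; "-" is placed last so it is not read as a range
plain = re.compile(r"[#0+*()-]")
# Same class with every metacharacter escaped
escaped = re.compile(r"[#0\+\*\(\)\-]")

sample = "a+b*(c)-d#0"
plain_hits = plain.findall(sample)
escaped_hits = escaped.findall(sample)
print(plain_hits == escaped_hits)  # True
```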
The third problem is that we should comment each part of a complicated
regular expression.
As a result, I suggest the following regular expressions.
# Match all conversion specifiers with a mapping key
PATTERN = re.compile(r'''%\([^)]+\)          # Mapping key
                         [#0\-+]?            # Conversion flags (optional)
                         (\d+|\*)?           # Minimum field width (optional)
                         (\.(\d+|\*))?       # Precision (optional)
                         [lLh]?              # Length modifier (optional)
                         [cdeEfFgGioursxX%]  # Conversion type''',
                     re.VERBOSE)
BAD_PATTERN = re.compile(r"%\([^)]*?\)")
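As a sanity check on the commented form, the sketch below compares it with
the same pattern written on one line. The sample specifiers and the names
"verbose", "compact" and "samples" are assumptions made for the
illustration. With re.VERBOSE, whitespace and "#" comments outside a
character class are ignored, while the "#" inside "[#0\-+]" stays literal.

```python
import re

verbose = re.compile(r'''%\([^)]+\)          # Mapping key
                         [#0\-+]?            # Conversion flags (optional)
                         (\d+|\*)?           # Minimum field width (optional)
                         (\.(\d+|\*))?       # Precision (optional)
                         [lLh]?              # Length modifier (optional)
                         [cdeEfFgGioursxX%]  # Conversion type''',
                     re.VERBOSE)
compact = re.compile(
    r"%\([^)]+\)[#0\-+]?(\d+|\*)?(\.(\d+|\*))?[lLh]?[cdeEfFgGioursxX%]")

# Hypothetical sample specifiers; the last two are expected to be rejected
samples = ["%(name)s", "%(count)05d", "%(ratio)+.2f", "%(k) d", "%()s"]
verbose_results = [verbose.fullmatch(s) is not None for s in samples]
compact_results = [compact.fullmatch(s) is not None for s in samples]
print(verbose_results == compact_results)  # True
```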
Notice that I dropped the space from the conversion flags, leaving only
"#", "0", "-" and "+". It is fully legal for someone to write
the following,
"%(k) done" % {"k": 100}
and the result is
' 100one'
This is because Python interprets the space between "%(k)" and "done"
as the conversion flag, then Python eats "d" from "done" as the
conversion type. Though it's legal, it's error-prone. So I think it's
better not to support this rare usage of space in the conversion flags,
and to report it as an error.
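The behaviour described above can be reproduced with the sketch below. The
helper has_bad_specifier is hypothetical: it only guesses at how the two
patterns might be combined (strip every valid specifier, then look for a
leftover "%(...)"), and the patch itself may use them differently.

```python
import re

# Patterns as suggested above, written on one line for brevity
PATTERN = re.compile(
    r"%\([^)]+\)[#0\-+]?(\d+|\*)?(\.(\d+|\*))?[lLh]?[cdeEfFgGioursxX%]")
BAD_PATTERN = re.compile(r"%\([^)]*?\)")


def has_bad_specifier(text):
    """Hypothetical check: True if a "%(...)" remains after valid ones are removed."""
    remainder = PATTERN.sub("", text)
    return BAD_PATTERN.search(remainder) is not None


# Python reads the space as a flag and the "d" of "done" as the type
formatted = "%(k) done" % {"k": 100}
print(repr(formatted))  # ' 100one'

space_flag_reported = has_bad_specifier("%(k) done")
normal_reported = has_bad_specifier("%(k)d done")
print(space_flag_reported, normal_reported)  # True False
```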
ACK.
--
Thanks and best regards!
Sheldon Feng(冯少合)<shaohef(a)linux.vnet.ibm.com>
IBM Linux Technology Center