With Python's re module, why do the following patterns behave differently?

>>> import re
>>> r = re.compile(r'[][]')
>>> r.findall(r'[]')
['[', ']']
>>> r = re.compile(r'[[]]')
>>> r.findall(r'[]')
['[]']
>>> r.findall(r'][')
[]
+16  A: 

The regular expression "[[]]" matches the substring "[]". The first [ in the expression begins a character class, and the first ] ends it. The class contains only one character, [, and it must be followed by the second ], which is a plain literal. So the expression means: any one of the characters in the class (just [), followed by a ].
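As a quick check of that reading, here is a minimal sketch comparing r'[[]]' with the escaped form r'\[\]', which should behave the same way if the explanation above is right. The sample string and variable names are made up for illustration, and the warning filter is only there because some Python 3 versions may emit a FutureWarning about a possible nested set when a class starts with [.

```python
import re
import warnings

# Some Python 3 releases may warn about a "possible nested set" for '[['.
# Silencing it here is an assumption about the reader's interpreter.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    pattern = re.compile(r'[[]]')   # class containing only '[', then a literal ']'

explicit = re.compile(r'\[\]')      # the same match written with escapes

sample = '[] ][ []'                 # hypothetical test string
matches = pattern.findall(sample)
explicit_matches = explicit.findall(sample)

print(matches)           # ['[]', '[]']
print(explicit_matches)  # ['[]', '[]']
```

Both patterns find only the two "[]" substrings and skip the "][" in the middle, which is consistent with the r.findall(r'][') result in the question.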

Kieron
thanks, it really helps.
gray
A: 

And r'[][]' forms a single character class {'[', ']'}, so it matches either '[' or ']'.

gray
Nope, that defines two empty character classes. r'[a][b]', for example, won't match a][b; it'll match a, then b.
dbr
You’re right. r'[][]' forms a single character class containing ']' and '[', because character classes must not be empty.
Gumbo
+4  A: 

A character class begins with a [ and ends with the first ] that follows at least one character of the class. A ] placed first in the class is taken as a literal.

So the expression [][] is a single character class containing the characters ] and [, because a character class must not be empty.
And the expression [[]] is a character class containing just [, followed by the single literal character ].
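A minimal sketch to try this out, assuming nothing beyond the standard re module: it compares r'[][]' with an explicitly escaped class and also shows that an empty class on its own is rejected. The variable names and test strings are made up for illustration.

```python
import re

bracket_class = re.compile(r'[][]')    # one class containing ']' and '['
escaped_class = re.compile(r'[\]\[]')  # the same class with explicit escapes

# Each bracket is matched on its own, in whatever order it appears.
results = {text: bracket_class.findall(text) for text in ('[]', '][', 'a[b]c')}
print(results)

# A class with nothing in it does not compile, which is why a leading ']'
# has to be read as a literal member of the class.
try:
    re.compile(r'[]')
    empty_class_error = None
except re.error as exc:
    empty_class_error = exc
print(empty_class_error)
```

Writing the class as r'[\]\[]' is probably the clearer choice in real code, since it does not depend on the reader knowing the leading-] rule.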

Gumbo