With Python's re module, why do the following act differently:
>>> r = re.compile(r'[][]')
>>> r.findall(r'[]')
['[', ']']
>>> r = re.compile(r'[[]]')
>>> r.findall(r'[]')
['[]']
>>> r.findall(r'][')
[]
The regular expression "[[]]" matches the substring "[]". The first [ in the expression begins a character class, and the first ] ends it. The class contains only one character, [, and it has to be followed by the second ]. So the expression means: any one of the characters in the class (here only "["), followed by a "]".
In contrast, r'[][]' forms a single character class {'[', ']'}, which matches either '[' or ']'.
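To check this reading, here is a minimal sketch that compares each pattern with an escaped spelling that states the same intent explicitly. The escaped patterns and the variable names are my own additions, not from the question. Recent Python versions may emit a FutureWarning ("Possible nested set") for a class that starts with a literal [, so the sketch silences it for that one compile.

```python
import re
import warnings

# A class that starts with a literal '[' may trigger a FutureWarning
# in recent Python versions, so ignore it for this demonstration.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    bracket_then_close = re.compile(r'[[]]')   # class {'['}, then a literal ']'

either_bracket = re.compile(r'[][]')           # class {']', '['}

# Escaped spellings with the same meaning (illustrative names).
bracket_then_close_explicit = re.compile(r'\[\]')
either_bracket_explicit = re.compile(r'[\]\[]')

for text in ['[]', '][', 'a[b]c']:
    # Each unescaped pattern should agree with its escaped counterpart.
    assert either_bracket.findall(text) == either_bracket_explicit.findall(text)
    assert bracket_then_close.findall(text) == bracket_then_close_explicit.findall(text)
    print(repr(text), either_bracket.findall(text), bracket_then_close.findall(text))
```

If you want to avoid the ambiguity (and the warning) altogether, the escaped forms are probably the safer way to write these patterns.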
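Character classes begin with a [ and end with the first ].
So the expression [][] is a character class with the characters ] and [, as character classes must not be empty: [ opens the class, ][ are its two members, and the final ] closes it.
And the expression [[]] is a character class with just [ and the single character ] after that: [ opens the class, [ is its only member, ] closes it, and the last ] is a literal character outside the class.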
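The "must not be empty" rule can be observed directly. The sketch below uses a small helper, compiles, which is a hypothetical name introduced only for this illustration. It shows that a ] placed first in a class is taken as a literal member, so a bare [] never gets closed.

```python
import re

def compiles(pattern):
    """Return True if the pattern is accepted by re.compile."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# '[]' alone: the ']' is read as a member, so the class is never terminated.
print(compiles(r'[]'))                # False

# '[]]': the first ']' is a member, the second one closes the class.
print(compiles(r'[]]'))               # True
print(re.findall(r'[]]', 'a]b'))      # [']']

# The same applies right after a negating '^'.
print(re.findall(r'[^]]', 'a]b'))     # ['a', 'b']
```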