With Python's re module, why do the following act differently:
>>> r = re.compile(r'[][]')
>>> r.findall(r'[]')
['[', ']']
>>> r = re.compile(r'[[]]')
>>> r.findall(r'[]')
['[]']
>>> r.findall(r'][')
[]
The regular expression "[[]]" matches the substring "[]". The first [ in the expression begins a character class, and the first ] ends it. The class contains only one character ([), and it must then be followed by the second ]. So the expression means: any one of the characters in "[", followed by a "]".
And r'[][]' forms the character class {'[', ']'}, so it matches either '[' or ']'.
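One way to check this reading is to compare each pattern with a spelling that escapes every bracket explicitly. The sketch below is only illustrative: the escaped patterns and the variable names are assumptions of this example, not part of the question, and the warning filter is there because some Python versions emit a FutureWarning about a possible nested set when a class starts with "[[".

```python
import re
import warnings

# Unescaped patterns from the question.
with warnings.catch_warnings():
    # Some Python versions warn about a "possible nested set" for '[[';
    # silence that so the comparison stays quiet.
    warnings.simplefilter("ignore", FutureWarning)
    class_then_bracket = re.compile(r'[[]]')
either_bracket = re.compile(r'[][]')

# Hypothetical escaped spellings that should behave the same way.
class_then_bracket_escaped = re.compile(r'\[\]')  # literal "[" then literal "]"
either_bracket_escaped = re.compile(r'[\[\]]')    # one character: "[" or "]"

samples = ['[]', '][', 'a[b]c']
for text in samples:
    # Each unescaped pattern agrees with its escaped counterpart.
    assert either_bracket.findall(text) == either_bracket_escaped.findall(text)
    assert class_then_bracket.findall(text) == class_then_bracket_escaped.findall(text)
```

Writing the brackets with backslashes avoids relying on the position rule at all, which is usually easier to read.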
Character classes begin with a [ and end with the first ] that is not the first character of the class.
So the expression [][] is a single character class containing the characters ] and [: a character class must not be empty, so the ] right after the opening [ is taken literally.
And the expression [[]] is a character class containing just [, followed by the single literal character ].
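The "must not be empty" rule can be seen in isolation with a few smaller patterns. This is a minimal sketch; the patterns and names below are chosen for illustration and do not appear in the question.

```python
import re

# "]" directly after the opening "[" is a literal member of the class.
only_close = re.compile(r'[]]')
# The same holds directly after a negating "^".
not_close = re.compile(r'[^]]')

close_hits = only_close.findall('a]b')
other_hits = not_close.findall('a]b')

# "[]" on its own is not an empty class: the "]" is read as a member,
# so the class is never closed and compilation fails.
try:
    re.compile(r'[]')
    empty_class_rejected = False
except re.error:
    empty_class_rejected = True
```

Here close_hits holds only the "]" and other_hits holds the two letters around it.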