>>> import re
>>> foo = re.compile( r"(?<=\(K\()[^\)]*" )
>>> foo.findall( r"http://sampleurl.com/(K(ThinkCode))/profile/view.aspx" )
['ThinkCode']
Explanation
In regex-world, a lookbehind is a way of saying "I want to match ham, but only if it's preceded by spam". We write this as (?<=spam)ham. So in this case, we want to match [^\)]*, but only if it's preceded by \(K\(.
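To see the lookbehind on its own, here is a minimal sketch using the toy spam/ham pattern from above. The two test strings are made up for illustration.

```python
import re

# Toy pattern from the explanation: match "ham" only when "spam" sits right before it.
pattern = re.compile(r"(?<=spam)ham")

preceded = pattern.findall("spamham")      # "ham" comes right after "spam"
not_preceded = pattern.findall("eggsham")  # "ham" comes right after "eggs"

print(preceded)      # ['ham']
print(not_preceded)  # []
```

Note that the match is just ham: the lookbehind checks what comes before, but doesn't include it in the result.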
Now \(K\( is a nice, easy regex, because it's plain text! It means: match exactly the string (K(. Notice that we have to escape the brackets (by putting \ in front of them), since otherwise the regex parser would treat them as group syntax instead of as characters to match!
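A quick sketch of why the escaping matters, using the URL from the example. The variable names are just for illustration, and re.escape is shown only as one convenient way to produce the escaped form.

```python
import re

text = "http://sampleurl.com/(K(ThinkCode))/profile/view.aspx"

# Escaped: \( matches a literal opening bracket.
literal = re.search(r"\(K\(", text)
print(literal.group(0))  # (K(

# Unescaped: "(K(" opens two groups that are never closed, so compiling fails.
try:
    re.compile(r"(K(")
    compiled = True
except re.error:
    compiled = False
print(compiled)  # False

# re.escape adds the backslashes for you.
escaped = re.escape("(K(")
print(escaped)  # \(K\(
```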
Finally, when you put something in square brackets in regex-world, it means "any one of the characters in here is OK". If the first character inside the square brackets is ^, it means "any character not in here is OK". So [^\)] means "any character that isn't a right-bracket", and [^\)]* means "as many characters as possible (possibly none) that aren't right-brackets".
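Here is a small sketch of the negated character class by itself. The input strings are made up for illustration.

```python
import re

# [^\)]* eats characters until it hits a right-bracket (or the end of the string).
stopped_at = re.match(r"[^\)]*", "ThinkCode))/profile").group(0)
print(stopped_at)  # ThinkCode

# * also allows zero characters, so the pattern still matches when ")" comes first.
empty = re.match(r"[^\)]*", ")abc").group(0)
print(repr(empty))  # ''
```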
Putting it all together, (?<=\(K\()[^\)]* means "match as many characters as you can that aren't right-brackets, preceded by the string (K(". The (K( itself is not part of the match, which is why findall returns just ThinkCode.
Oh, one last thing. Because \ means something inside strings in Python as well as inside regexes, we use raw strings -- r"spam" instead of just "spam". That tells Python to leave the \'s alone instead of treating them as string escapes, so they reach the regex engine intact.
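To make the raw-string point concrete, here is a sketch using \b, which Python reads as a backspace character in a normal string but which regex reads as a word boundary. The sample text is made up for illustration.

```python
import re

plain = "\b"  # Python turns this into one backspace character
raw = r"\b"   # two characters: a backslash and a "b"

print(len(plain), len(raw))  # 1 2

# Without the r, the regex engine is asked to find literal backspaces around "ham".
without_raw = re.findall("\bham\b", "spam ham")
# With the r, the regex engine sees \b and treats it as a word boundary.
with_raw = re.findall(r"\bham\b", "spam ham")

print(without_raw)  # []
print(with_raw)     # ['ham']
```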
Another way
If lookbehind is a bit complicated for you, you can also use capturing groups. The idea behind those is that the regex matches patterns, but can also remember subpatterns. That means that you don't have to worry about lookaround, because you can match the entire pattern and then just extract the subpattern inside it!
To capture a group, simply put it inside brackets: (foo) will capture foo as the first group. Then call .groups() on the match object to spit out all the groups that you matched! This is the way the other answer works.
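As a rough sketch of the capturing-group approach on the same URL: the pattern below is my own guess at how you might write it, not necessarily the exact one from the other answer.

```python
import re

url = "http://sampleurl.com/(K(ThinkCode))/profile/view.aspx"

# Match the whole "(K(..." prefix, but remember only the part inside the unescaped brackets.
match = re.search(r"\(K\(([^\)]*)", url)

whole = match.group(0)   # everything the pattern matched
groups = match.groups()  # tuple of all captured groups
first = match.group(1)   # just the first captured group

print(whole)   # (K(ThinkCode
print(groups)  # ('ThinkCode',)
print(first)   # ThinkCode
```

Unlike the lookbehind version, the full match here includes (K(, so you pull out the group rather than using the whole match.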