ansaurus

Question

Answer 1

A:

To match a special character literally in a regex, escape it with a backslash, as in \( or \\. (Writing \/ is harmless but unnecessary in Python, where / is not special.)

Matching things inside nested parentheses is quite a pain in regex. If the format is always the same, you could use this:

\(.*?\((.*?)\).*?\)

Basically: find an open paren, match characters until you find another open paren, capture characters until you see a close paren, then make sure there is one more close paren somewhere after that.
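
As a rough sketch, the pattern can be tried in Python like this. The sample URL is borrowed from the other answers and the variable names are made up for illustration:

```python
import re

# Sample URL taken from the other answers; assumed here for illustration.
url = "http://sampleurl.com/(K(ThinkCode))/profile/view.aspx"

# Open paren, lazily skip to the next open paren, capture up to the first
# close paren, then require one more close paren somewhere after it.
pattern = re.compile(r"\(.*?\((.*?)\).*?\)")

match = pattern.search(url)
inner = match.group(1) if match else None
print(inner)  # ThinkCode
```

Because every quantifier is lazy, the group stops at the first close paren it meets, which is why this only works when the nesting always has this shape.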

orangeoctopus 2010-07-28 18:53:15

Answer 2

+2 A:

>>> import re
>>> foo = re.compile( r"(?<=\(K\()[^\)]*" )
>>> foo.findall( r"http://sampleurl.com/(K(ThinkCode))/profile/view.aspx" )
['ThinkCode']

Explanation

In regex-world, a lookbehind is a way of saying "I want to match ham, but only if it's preceded by spam". We write this as (?<=spam)ham. So in this case, we want to match [^\)]*, but only if it's preceded by \(K\(.
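
A small sketch of the spam/ham case, using made-up test strings, shows that the lookbehind text has to be there but is not part of what gets matched:

```python
import re

# Lookbehind: match "ham" only when "spam" comes immediately before it.
pattern = re.compile(r"(?<=spam)ham")

with_spam = pattern.findall("spamham")
without_spam = pattern.findall("eggsham")
print(with_spam)     # ['ham']
print(without_spam)  # []

# The lookbehind text is checked but not included in the match span.
m = pattern.search("spamham")
span = (m.start(), m.end())
print(span)  # (4, 7)
```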

Now \(K\( is a nice, easy regex, because it's plain text! It means, match exactly the string (K(. Notice that we have to escape the brackets (by putting \ in front of them), since otherwise the regex parser would think they were part of the regex instead of a character to match!

Finally, when you put something in square brackets in regex-world, it means "any of the characters in here is OK". If you put something inside square brackets where the first character is ^, it means "any character not in here is OK". So [^\)] means "any character that isn't a right-bracket", and [^\)]* means "as many characters as possible that aren't right-brackets".
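
To see the two kinds of square-bracket class in action, here is a minimal sketch with illustrative input strings:

```python
import re

# [^\)] matches any single character except ")"; * repeats it greedily.
not_close = re.compile(r"[^\)]*")

# match() starts at the beginning, so this grabs everything before the first ")".
before_paren = not_close.match("ThinkCode))/profile").group()
print(before_paren)  # ThinkCode

# An ordinary (non-negated) class: any one of the listed characters.
listed = re.findall(r"[abc]", "a-b-c-d")
print(listed)  # ['a', 'b', 'c']
```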

Putting it all together, (?<=\(K\()[^\)]* means "match as many characters as you can that aren't right-brackets, preceded by the string (K(".

Oh, one last thing. Because \ means something inside strings in Python as well as inside regexes, we use raw strings -- r"spam" instead of just "spam". That tells Python to leave the \'s alone rather than treating them as string escapes, so they reach the regex engine intact.
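
The difference is easy to check for yourself. This is only a sketch, and the variable names are illustrative:

```python
import re

# In a normal string, each backslash has to be doubled to survive;
# a raw string lets you write the regex exactly as you mean it.
plain = "\\(K\\("
raw = r"\(K\("
same = plain == raw
print(same)  # True

# "\n" is one newline character; r"\n" is a backslash followed by "n".
lengths = (len("\n"), len(r"\n"))
print(lengths)  # (1, 2)

found = re.search(raw, "/(K(ThinkCode))/") is not None
print(found)  # True
```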

Another way

If lookbehind is a bit complicated for you, you can also use capturing groups. The idea behind those is that the regex matches patterns, but can also remember subpatterns. That means that you don't have to worry about lookaround, because you can match the entire pattern and then just extract the subpattern inside it!

To capture a group, simply put it inside brackets: (foo) will capture foo as the first group. Then, call .groups() on the match object to spit out all the groups that you matched! This is the way the other answer works.
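
For the URL in question, a capturing-group version might look like the sketch below. The pattern is one possible way to write it, not the only one:

```python
import re

url = "http://sampleurl.com/(K(ThinkCode))/profile/view.aspx"

# Match the literal "(K(", then capture everything up to the next ")".
m = re.search(r"\(K\(([^\)]*)\)", url)

if m:
    print(m.group(0))  # (K(ThinkCode)   <- the whole match
    print(m.groups())  # ('ThinkCode',)  <- all captured groups
    print(m.group(1))  # ThinkCode       <- just the first group
```

Here the whole pattern has to match, but only the part inside the unescaped brackets is remembered, so no lookbehind is needed.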

katrielalex 2010-07-28 18:54:29

Could you please explain the regex, this works great. I want to learn how and what the regex does. Thanks for a quick reply (:

ThinkCode 2010-07-28 18:59:32

Certainly. Two secs.

katrielalex 2010-07-28 19:02:46

OK, so it was about ten minutes. Enjoy! =p

katrielalex 2010-07-28 19:12:02

Awesome my friend! Thanks much!

ThinkCode 2010-07-28 19:16:07

Answer 3

+1 A:

It's not too hard, especially since / isn't actually a special character in Python regular expressions. You just backslash the literal parens you want. How about this:

import re

s = "http://sampleurl.com/(K(ThinkCode))/profile/view.aspx"
# The group ([^)]+) captures the text between "(K(" and "))".
mo = re.match(r"http://sampleurl\.com/\(K\(([^)]+)\)\)/profile/view\.aspx", s)
print(mo.group(1))

Note the use of r"" raw strings to preserve the backslashes in the regular expression pattern string.

Walter Mundt 2010-07-28 18:55:10

Answer 4

A:

import re

mystr = "http://sampleurl.com/(K(ThinkCode))/profile/view.aspx"
# Replace the whole string with the word characters found inside the innermost parens.
print(re.sub(r'^.*\((\w+)\).*', r'\1', mystr))

sleepynate 2010-07-28 18:58:49


Please help with Python Regex
