How do I split on all nonalphanumeric characters, EXCEPT the apostrophe?

re.split(r'\W+', text)

works, but will also split on apostrophes. How do I add an exception to this rule?

Thanks!

+3  A: 

Try this:

re.split(r"[^\w']+",text)

Note that the w is now lowercase: \w matches any alphanumeric character (which also includes the underscore). The character class [^\w'] matches any character that is not (^) alphanumeric (\w) or an apostrophe, so the split only happens on runs of such characters.
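As a quick check, here is a minimal sketch of that pattern on a sample sentence. The sample text and the variable names are made up for illustration. Note that a separator at the very end of the string leaves an empty string in the result, which you may want to filter out.

```python
import re

# Hypothetical sample text; any string will do.
text = "Don't panic, it's only a test_case!"

# Split on runs of characters that are neither word characters nor apostrophes.
tokens = re.split(r"[^\w']+", text)
print(tokens)  # ["Don't", 'panic', "it's", 'only', 'a', 'test_case', '']

# The trailing "!" is a separator, so the last element is empty; drop empties.
words = [t for t in tokens if t]
print(words)   # ["Don't", 'panic', "it's", 'only', 'a', 'test_case']
```

The underscore in test_case is kept because \w includes it.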

David Zaslavsky
+2  A: 
re.split(r"[^\w']+",text)

Starting a character class with ^ inverts its definition, so [^\w'] is the inverse of [\w'], which would match an alphanumeric character, an underscore, or an apostrophe.
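To illustrate the inverse relationship, here is a small sketch (the sample text and variable names are assumptions, not part of the question): splitting on the negated class and matching the non-negated class with re.findall should give the same words, once empty strings from the split are dropped.

```python
import re

# Hypothetical sample text with a leading apostrophe and trailing punctuation.
text = "'Tis the season; don't stop!"

# Split on everything that is NOT a word character or an apostrophe...
split_words = [t for t in re.split(r"[^\w']+", text) if t]

# ...or directly match runs of word characters and apostrophes.
matched_words = re.findall(r"[\w']+", text)

print(split_words)    # ["'Tis", 'the', 'season', "don't", 'stop']
print(matched_words)  # ["'Tis", 'the', 'season', "don't", 'stop']
```

The findall form avoids the empty strings that re.split produces when the text starts or ends with a separator.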

Amber