How do I split on all non-alphanumeric characters, EXCEPT the apostrophe?
re.split(r'\W+', text)
works, but will also split on apostrophes. How do I add an exception to this rule?
Thanks!
Try this:
re.split(r"[^\w']+",text)
Note that the W is now lowercase. \w represents all alphanumeric characters (including the underscore), whereas \W is its complement. The character class [^\w'] matches any character that is not (^) either alphanumeric (\w) or an apostrophe, and the trailing + makes a whole run of such characters count as a single separator.
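A minimal runnable sketch of the split above, assuming Python 3 and str input. The helper name split_keep_apostrophes and the sample strings are hypothetical, chosen only for illustration. It also shows that re.split leaves an empty string at the end when the text ends with a separator, and that the underscore counts as a word character.

```python
import re

# Matches one or more characters that are neither word characters
# (letters, digits, underscore) nor apostrophes.
SEPARATORS = re.compile(r"[^\w']+")

def split_keep_apostrophes(text):
    """Hypothetical helper: split text on separators, dropping empty strings."""
    # re.split yields '' at either end when text starts or ends with a separator.
    return [token for token in SEPARATORS.split(text) if token]

sample = "Don't stop, it's fine!"
print(SEPARATORS.split(sample))        # ["Don't", 'stop', "it's", 'fine', '']
print(split_keep_apostrophes(sample))  # ["Don't", 'stop', "it's", 'fine']
print(split_keep_apostrophes("snake_case-word"))  # ['snake_case', 'word']
```

Filtering out empty strings is optional; keep them if you need to know that the text started or ended with punctuation.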
Starting a character class with ^ inverts it, so [^\w'] is the complement of [\w'], which matches a single alphanumeric character, underscore, or apostrophe.
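To see the inversion concretely, here is a small sketch (the sample text is made up) that checks every character against both classes with re.findall. Each character lands in exactly one of them. The last line shows an alternative to splitting: matching the tokens directly with the non-negated class, which does not produce the trailing empty strings that re.split can.

```python
import re

text = "rock'n'roll_2 & co."

# [\w'] matches a single word character or apostrophe...
inside = re.findall(r"[\w']", text)
# ...and the leading ^ in [^\w'] matches every other single character.
outside = re.findall(r"[^\w']", text)

print("".join(inside))   # rock'n'roll_2co
print(outside)           # [' ', '&', ' ', '.']

# Every character falls into exactly one of the two classes.
assert len(inside) + len(outside) == len(text)

# Matching tokens with the non-negated class avoids the empty strings
# that re.split produces when text starts or ends with a separator.
print(re.findall(r"[\w']+", text))  # ["rock'n'roll_2", 'co']
```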