I want it to match only at the end of every word.
Example:
"i am test-ing., i am test.ing-, i am_, test_ing,"
output should be:
"i am test-ing i am test.ing i am test_ing"
>>> import re
>>> test = "i am test-ing., i am test.ing-, i am_, test_ing,"
>>> re.sub(r'([^\w\s]|_)+(?=\s|$)', '', test)
'i am test-ing i am test.ing i am test_ing'
The pattern matches one or more characters that are neither alphanumeric nor whitespace, plus underscores (([^\w\s]|_)+), followed by either a whitespace character (\s) or the end of the string ($). The (?= ) construct is a lookahead assertion: it checks that whitespace or the end of the string comes next without including it in the match, so the space doesn't get replaced; only the ([^\w\s]|_)+ part gets replaced.
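To see what the lookahead buys you, here is a small sketch comparing the pattern above with a variant that consumes the trailing whitespace instead of only asserting it. The variable names are just for illustration.

```python
import re

test = "i am test-ing., i am test.ing-, i am_, test_ing,"

# Lookahead: the space (or end of string) is checked but not consumed,
# so it stays in the result.
with_lookahead = re.sub(r'([^\w\s]|_)+(?=\s|$)', '', test)

# Plain group: the space becomes part of the match and is removed too,
# which glues neighbouring words together.
without_lookahead = re.sub(r'([^\w\s]|_)+(\s|$)', '', test)

print(with_lookahead)     # i am test-ing i am test.ing i am test_ing
print(without_lookahead)  # i am test-ingi am test.ingi amtest_ing
```

Without the lookahead you would have to put the space back yourself, for example with a backreference in the replacement string.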
Okay, but why [^\w\s]|_, you ask? The character class [^\w\s] matches any character that is neither a word character (\w, i.e. alphanumeric or underscore) nor whitespace (\s), which leaves punctuation characters. Since \w includes the underscore, that class alone would skip underscores. We do want to eliminate those too, so we add them back with |_.
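A quick way to check the role of |_ is to run the substitution with and without it. This is only a sketch on the question's sample string; the variable names are made up for the comparison.

```python
import re

text = "i am test-ing., i am test.ing-, i am_, test_ing,"

# Without |_ : the underscore counts as a word character, so the
# trailing "_" in "am_" survives.
without_underscore = re.sub(r'[^\w\s]+(?=\s|$)', '', text)

# With |_ : trailing underscores are stripped as well. The underscore
# inside "test_ing" is kept because it is not followed by whitespace
# or the end of the string.
with_underscore = re.sub(r'([^\w\s]|_)+(?=\s|$)', '', text)

print(without_underscore)  # i am test-ing i am test.ing i am_ test_ing
print(with_underscore)     # i am test-ing i am test.ing i am test_ing
```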