I want to match all punctuation except the apostrophe ("'"), as in "I'm". For example, in the sentence below:

I'm a student, but I'm also working. 
 ^not match  ^match ^not           ^match

I can use "[[:punct:]]+" to match all punctuation, but I'm having a hard time excluding "'" from the pattern.

Of course, I could enumerate the characters with something like "[,.?!]", but that is tedious, especially once Chinese punctuation has to be covered as well.

Please suggest a more elegant solution.

Thanks in advance,

Yu

+2  A: 

If your regex flavor supports look-arounds, you could do this:

(?!')[[:punct:]]

In plain English: if there's no single quote when looking ahead, match any punctuation mark.

Bart Kiers
Emacs Lisp has its own, ahem, *unique* regex syntax, and I doubt it supports lookarounds. :-)
Ken
I scanned the question to find the regex implementation Yu was using (it doesn't mention one), but forgot to look at the title of the post... :)
Bart Kiers
Yeah, actually I'm confused, too: the Emacs manual at gnu.org says "Character classes are not supported, so for example you would need to use ‘[0-9]’ instead of ‘[[:digit:]]’.", yet Yu says "[[:punct:]]+" works.
Ken
+1  A: 

Thanks for Bart's answer and for all of your comments. Prompted by Bart's answer, I checked, and Emacs still does not seem to support look-ahead. In the same spirit, though, I wrote the following:

(defun string-match-but-exclude (regexp string exclusion &optional start)
  "Return index of start of first match for REGEXP in STRING, or nil.
Also return nil if that match contains any character in EXCLUSION,
a string of characters that is spliced into a bracket expression.
Matching ignores case if `case-fold-search' is non-nil.
If START is non-nil, start the search at that index in STRING.
For the index of the first char beyond the match, do (match-end 0).
`match-end' and `match-beginning' also give indices of substrings
matched by parenthesis constructs in the pattern.

You can use the function `match-string' to extract the substrings
matched by the parenthesis constructs in REGEXP."
  (let ((data nil))
    (and (string-match regexp string start)
         ;; Keep the match data so it can be restored below.
         (setq data (match-data))
         (not (string-match (concat "[" exclusion "]")
                            (match-string 0 string)))
         ;; Restore the match data; `progn' returns t so `and' continues.
         (progn (set-match-data data) t)
         (match-beginning 0))))

So the equivalent of the expression (?!')[[:punct:]], applied to string, would be:

(string-match-but-exclude "[[:punct:]]" string "'")

This would do the job, though not as elegantly. It should be a minor addition to Emacs to support this natively.
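One caveat with the function above: it tests only the first match, so when that match contains an excluded character it returns nil rather than looking further along the string. The sketch below is one possible way to keep searching instead. The names my-string-match-skipping and my--has-excluded-char-p are hypothetical helpers, not part of Emacs, and EXCLUDED is assumed to be a plain string of characters.

```elisp
;; Hypothetical helpers, not part of Emacs.

(defun my--has-excluded-char-p (text excluded)
  "Return t if TEXT contains any character of the string EXCLUDED."
  (let ((chars (string-to-list excluded))
        (hit nil))
    (dolist (ch (string-to-list text) hit)
      (when (memq ch chars)
        (setq hit t)))))

(defun my-string-match-skipping (regexp string excluded &optional start)
  "Return index of the first match for REGEXP in STRING free of EXCLUDED chars.
The search begins at START (default 0).  Matches whose text contains a
character from EXCLUDED are skipped.  Return nil if nothing qualifies.
On success the match data describes the accepted match."
  (let ((pos (or start 0))
        (result nil)
        (done nil))
    (while (and (not done) (<= pos (length string)))
      (let ((beg (string-match regexp string pos)))
        (cond
         ;; No further match at all: give up.
         ((null beg) (setq done t))
         ;; Rejected match: resume one character past where it began.
         ((my--has-excluded-char-p (match-string 0 string) excluded)
          (setq pos (1+ beg)))
         ;; Accepted match: the helper above did no regexp matching,
         ;; so the match data is still the one set by `string-match'.
         (t (setq result beg
                  done t)))))
    result))

(let ((s "I'm a student, but I'm also working."))
  (list (my-string-match-skipping "[[:punct:]]" s "'")       ; 13, the comma
        (my-string-match-skipping "[[:punct:]]" s "'" 14)))  ; 35, the period
```

Resuming one character after a rejected match means that with a pattern such as "[[:punct:]]+" the tail of a rejected run can still be accepted; whether that is desirable depends on the use case.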

Emacs does support character classes now.
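A quick way to check this in a given installation, assuming a GNU Emacs recent enough to accept POSIX-style classes inside bracket expressions, is a small test like the following. The function name my-first-punct-run is a hypothetical helper used only for illustration.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-first-punct-run (string)
  "Return (INDEX TEXT) for the first run of punctuation in STRING, or nil."
  (when (string-match "[[:punct:]]+" string)
    (list (match-beginning 0) (match-string 0 string))))

(my-first-punct-run "Hello, world")   ; (5 ",")
```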

Thanks again.

Yu

Yu Shen