Can someone please explain the following regexp, which I found in ediff-trees.el as a specification of which files and directories to exclude from its comparison process?

"\\`\\(\\.?#.*\\|.*,v\\|.*~\\|\\.svn\\|CVS\\|_darcs\\)\\'"

Although I am somewhat familiar with regular expressions, encountering this elisp string-based variant has thrown me off.

+1  A: 

In elisp regexes a bare ( or ) matches a literal parenthesis, so grouping parentheses need to be written \( and \). Backslashes in strings need to be escaped too, so you end up with \\( and \\) where most other regex dialects would just use ( and ). Don't get me wrong, I love Emacs, but having to escape parentheses in a regex was a really bad idea. The alternation pipes (\|), the literal periods (\.) and the \` and \' anchors (start and end of the string) go through the same string escaping, and that's why you've got this hell of double backslashes. Strip out the extra backslashes, translate to a more conventional (PCRE-style) syntax, and you get:

\A(\.?#.*|.*,v|.*~|\.svn|CVS|_darcs)\z

See this question for more discussion on the subject of escaped parens in elisp.
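To make the two layers of escaping concrete, here is a minimal sketch you could evaluate in the *scratch* buffer. The sample strings "foo" and "bar" are made up for illustration and do not come from ediff-trees.el.

```elisp
;; The Lisp reader turns the four characters "\\(" (quotes excluded)
;; into a two-character string: a backslash followed by a paren.
(length "\\(")                               ; => 2

;; \( ... \) groups and \| alternates, so this matches "bar" at index 0.
(string-match-p "\\(foo\\|bar\\)" "bar")     ; => 0

;; Unescaped ( | ) are ordinary characters in Emacs regexps,
;; so this pattern only matches the literal text "(foo|bar)".
(string-match-p "(foo|bar)" "bar")           ; => nil
(string-match-p "(foo|bar)" "(foo|bar)")     ; => 0
```

The first form shows the string layer on its own, and the remaining three show the regexp layer.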

Skilldrick
+4  A: 

First thing, remember that elisp's regexes have to be string-escaped, which creates a lot of extra backslashes. Removing them, we get

\`\(\.?#.*\|.*,v\|.*~\|\.svn\|CVS\|_darcs\)\'

Then, \( and \) mean grouping, and "foo\|bar" means "either foo or bar".

So, piece by piece, this regexp matches: either an emacs temporary file (something starting with #, possibly preceded by a period: \.?#.*), or an RCS file (ending in ,v: .*,v), or an emacs backup file (ending in ~: .*~), or an svn directory (\.svn), or a cvs directory (CVS), or a darcs directory (_darcs).

Edit to correct: as andre-r correctly points out, the backtick \` and single quote \' basically mean "beginning and end of the string" (respectively). So this means that the regexp finds strings which match exactly one of the choices I've outlined above (i.e., the string starts, then comes one of those choices, then the string ends). I had previously said they meant quoting, I don't know what I was thinking :). Thanks andre-r!
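As a rough check of the breakdown above, the pattern can be tried against a few names. This is only a sketch: the variable `my-exclude-regexp`, the function `my-excluded-p` and the sample file names are all hypothetical, and it does not reproduce how ediff-trees.el itself applies the pattern.

```elisp
(defvar my-exclude-regexp
  "\\`\\(\\.?#.*\\|.*,v\\|.*~\\|\\.svn\\|CVS\\|_darcs\\)\\'"
  "Copy of the pattern from the question (hypothetical variable name).")

(defun my-excluded-p (name)
  "Return t if the whole string NAME matches `my-exclude-regexp'."
  (and (string-match-p my-exclude-regexp name) t))

(mapcar #'my-excluded-p
        '("#notes.txt#" ".#notes.txt" "main.c,v" "main.c~"
          ".svn" "CVS" "_darcs"
          "main.c" ".svnignore" "myCVS"))
;; => (t t t t t t t nil nil nil)
```

The last three names fail because of the \` and \' anchors: ".svnignore" and "myCVS" each contain one of the alternatives, but not as the entire string.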

rbp
Just a correction: \` and \' each match "the empty string, but only at the beginning (and end respectively) of the buffer or string being matched against."
andre-r
andre-r: of course! Thank you very much, I've been using too much LaTeX :P I'm editing my answer to reflect that.
rbp
+1  A: 

Sorry, this isn't really an answer; it's merely a comment to rbp's answer. But I can't figure out how to get the code sample to render nicely inside a comment, whereas it looks fine here in this answer.

Anyway:

I dunno about you, but I find

(rx bos
    (group (or (and (zero-or-one ".") "#" (zero-or-more nonl))
               (and (zero-or-more nonl) ",v")
               (and (zero-or-more nonl) "~")
               ".svn"
               "CVS"
               "_darcs"))
    eos)

a lot easier to read -- and it's exactly equivalent.
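If you want to convince yourself of the equivalence, a small sketch like the following compares the two patterns on a handful of names. The variable names and sample names are made up for illustration. Note that it compares match results only, since the string that rx generates need not be character-for-character identical to the hand-written one.

```elisp
(require 'rx)

(defvar my-by-hand-regexp
  "\\`\\(\\.?#.*\\|.*,v\\|.*~\\|\\.svn\\|CVS\\|_darcs\\)\\'"
  "The hand-written pattern from the question (hypothetical variable name).")

(defvar my-rx-regexp
  (rx bos
      (group (or (and (zero-or-one ".") "#" (zero-or-more nonl))
                 (and (zero-or-more nonl) ",v")
                 (and (zero-or-more nonl) "~")
                 ".svn"
                 "CVS"
                 "_darcs"))
      eos)
  "The same pattern built with `rx' (hypothetical variable name).")

;; For each sample, check that both patterns give the same verdict.
(mapcar (lambda (name)
          (eq (null (string-match-p my-by-hand-regexp name))
              (null (string-match-p my-rx-regexp name))))
        '("#a#" ".#a" "a,v" "a~" ".svn" "CVS" "_darcs" "a.c"))
;; => (t t t t t t t t)
```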

offby1
Wow, I didn't know about rx.el. It looks very useful indeed and far more readable. Thank you very much for sharing this.
landstatic