I am confused by the docs:

    \%(\)   A pattern enclosed by escaped parentheses.   */\%(\)* */\%(* *E53*
            Just like \(\), but without counting it as a sub-expression. This
            allows using more groups and it's a little bit faster.

Can someone explain why \%(\) is faster than \(\)? Is it because of backtracking or something else?

A:

The 'a little bit faster' comment is accurate in that there is a little less bookkeeping to do, but the emphasis is on 'little bit' rather than on 'faster'. Normally, the text matched by \(pattern\) has to be kept so that you can refer to it in the replacement with \1 through \9 (whichever number corresponds to that group). The % notation tells Vim that it does not have to keep track of what the group matched, so it does a little less work.
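Python's re module spells a non-capturing group with the Perl-style (?:...) syntax, so it can stand in for illustrating the bookkeeping difference. This is a sketch in Python, not Vim: the Pattern.groups attribute counts only capturing groups, and only a capturing group leaves a recorded submatch behind.

```python
import re

# Vim's \(ab\)\+ and \%(ab\)\+ written in Python's syntax.
capturing = re.compile(r"(ab)+")
non_capturing = re.compile(r"(?:ab)+")

print(capturing.groups)      # 1: one group is numbered and recorded
print(non_capturing.groups)  # 0: the group only groups

m1 = capturing.fullmatch("ababab")
m2 = non_capturing.fullmatch("ababab")
print(m1.group(1))  # ab (the last repetition the group matched)
print(m2.groups())  # () since nothing was recorded
```

Both patterns match the same text; the only difference is whether the engine has to remember what the group matched.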


@SimpleQuestions asks:

What do you mean by "keep track of the match"? How does it affect speed?

You can use escaped parentheses to 'capture' parts of the matched pattern. For example, suppose we're working with simple C function declarations (no pointers to functions or other sources of parentheses). Then we might have a substitute command such as the following:

s@\<\([a-zA-Z_][a-zA-Z_0-9]*\)(\([^)]*\))@xyz_\1(int nargs) /* \2 */@

Given an input line such as:

int simple_function(int a, char *b, double c)

The output will be:

int xyz_simple_function(int nargs) /* int a, char *b, double c */

(Why might you want to do that? I'm imagining that I need to wrap the C function simple_function so that it can be called from a language, compiled to C, that uses a different interface convention; the language is based on Informix 4GL, to be precise. I'm using it as an example, not because you really need to know why it was a good change to make.)

Now, in the example, the \1 and \2 in the replacement text refer to the captured parts of the regular expression: the function name (a sequence of letters, digits, and underscores that starts with a letter or an underscore) and the function argument list (everything between the parentheses, but not the parentheses themselves).
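For comparison, here is the same substitution sketched with Python's re module. The translation is approximate: Vim's \< becomes \b (close enough here because the next item must be a letter or underscore), \( \) become ( ), and the literal parentheses are escaped. count=1 mirrors a :s command without the g flag.

```python
import re

# Vim: s@\<\([a-zA-Z_][a-zA-Z_0-9]*\)(\([^)]*\))@xyz_\1(int nargs) /* \2 */@
# Python: \< -> \b, \( \) -> ( ), and the literal ( ) must be escaped.
pattern = re.compile(r"\b([a-zA-Z_][a-zA-Z_0-9]*)\(([^)]*)\)")
print(pattern.groups)  # 2: the function name and the argument list

line = "int simple_function(int a, char *b, double c)"
# count=1 replaces only the first match, like :s without the g flag.
result = pattern.sub(r"xyz_\1(int nargs) /* \2 */", line, count=1)
print(result)  # int xyz_simple_function(int nargs) /* int a, char *b, double c */
```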

If I'd used the \%(...\) notation around the function identifier, then \1 would refer to the argument list and there would be no \2. Because Vim would not have to keep track of one of the two captured parts of the regular expression, it would have marginally less bookkeeping to do than if it had to keep track of both. But, as I said, the difference is tiny; you could probably never measure it in practice.

That's why the manual says 'it allows more groups': Vim supports at most nine captured groups (\1 to \9), and \%(...\) groups do not count toward that limit. So if you need to group parts of your regular expression but don't need to refer to them again, you can write longer regular expressions. However, by the time you have more than 9 remembered (captured) parts in a regular expression, your brain is usually doing gyrations and your fingers will make mistakes anyway, so the effort is not usually worth it. But that is, I think, the argument for using the \%(...\) notation. It corresponds to the Perl (PCRE) notation '(?:...)' for a non-capturing group.
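Here is the non-capturing variant described above, sketched with Python's re module (an approximation, not Vim itself). The (?:...) group around the name is pointless in a pattern this small and only mirrors the hypothetical; what matters is that the argument list becomes group 1 and there is no group 2. The replacement name wrapped is made up for illustration, since the original name can no longer be referenced.

```python
import re

# The function name is grouped but not captured, like Vim's \%(...\),
# so the argument list becomes group 1 and there is no group 2.
pattern = re.compile(r"\b(?:[a-zA-Z_][a-zA-Z_0-9]*)\(([^)]*)\)")
print(pattern.groups)  # 1

line = "int simple_function(int a, char *b, double c)"
# "wrapped" is a made-up name: the original name was not captured.
result = pattern.sub(r"wrapped(int nargs) /* \1 */", line, count=1)
print(result)  # int wrapped(int nargs) /* int a, char *b, double c */

# Python's re rejects a reference to a group that does not exist.
try:
    pattern.sub(r"\2", line)
    has_group_2 = True
except re.error:
    has_group_2 = False
print(has_group_2)  # False
```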

Jonathan Leffler
+1 for the very cool regex! It will take me some time to delve into your answer and godlygeek's answer. I love it :)
Masi
I've actually checked, and it works as stated. I also checked the \%(\) version (not shown above), and that worked too. Phew! Not everything works correctly every time. I was confident of the concept... but it is still a good idea to check reality.
Jonathan Leffler
A:

I asked in #Vim whether \%(\) is faster because of backtracking. The user godlygeek answered:

No, it's faster because the thing that's matched doesn't need to be strdup'ed -- any unnecessary work is a bad thing for a syntax file.

He continued:

[The speed] depends on how big the string is. For 3 characters, it doesn't matter much, for 3000 it probably does. And keep in mind that it needs to be strdup'ed every time it matches.... including during backtracking... which means that even the 3 characters could be strdup'ed 1000 times over the course of matching a single regex. -- the syntax files are in $VIMRUNTIME/syntax
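To get a feel for godlygeek's point about backtracking, here is a toy model in Python. It is not Vim's regexp engine; it simply assumes, as the quote describes, that a capturing group's text is copied (as with strdup) on every attempt, including attempts that are later abandoned. The function match_star_then is hypothetical and only handles the pattern (c*)follow anchored at the start of the string.

```python
def match_star_then(text: str, ch: str, follow: str, copy_capture: bool):
    """Toy backtracking match of the pattern (ch*)follow at index 0.

    Returns (matched, captured_text, chars_copied). If copy_capture is True,
    the group's text is copied on every attempt (modelling a strdup per
    attempt, as assumed above); if False, nothing is copied, like a
    non-capturing group.
    """
    # Greedy star: first measure the longest run of `ch` at the start.
    run = 0
    while run < len(text) and text[run] == ch:
        run += 1

    chars_copied = 0
    # Try the longest run first, then backtrack one character at a time.
    for n in range(run, -1, -1):
        captured = None
        if copy_capture:
            captured = text[:n]  # copy the submatch for this attempt
            chars_copied += len(captured)
        if text.startswith(follow, n):
            return True, captured, chars_copied
    return False, None, chars_copied


print(match_star_then("aaab", "a", "b", copy_capture=True))   # (True, 'aaa', 3)
print(match_star_then("aaaa", "a", "b", copy_capture=True))   # (False, None, 10)
print(match_star_then("a" * 3000, "a", "b", copy_capture=True)[2])   # 4501500
print(match_star_then("a" * 3000, "a", "b", copy_capture=False)[2])  # 0
```

Under this model, a run of 3,000 'a's with no 'b' costs 4,501,500 copied characters before the match fails at the first starting position, while the non-copying version copies nothing. A real search would also retry at later starting positions, so the gap would only grow.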

Masi