ansaurus

Question

Answer 1

+1 A:

Without doing detailed analysis, I'd guess that it's faster because of the question marks. These allow the regular expression to be "lazy," and stop as soon as they have enough to match, rather than checking if the rest of the input matches.

I'm not entirely happy with this answer though, because this mostly applies to question marks after * or +. If I were more familiar with the input, it might make more sense to me.
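To make the greedy/lazy distinction concrete, here is a minimal C# sketch. The two patterns and the sample input are illustrative assumptions, not the expressions from the question. The point is only that `.*` takes as much as it can and then backtracks, while `.*?` stops at the first place where the rest of the pattern matches.

```csharp
using System.Text.RegularExpressions;

public static class LazyVersusGreedy
{
    // Illustrative input (an assumption): two ANSI colour codes around one word.
    public const string Sample = "\u001b[1;32mgreen\u001b[0m";

    // Greedy: .* runs to the end of the line, then backtracks to the LAST 'm'.
    public static string StripGreedy(string s) =>
        Regex.Replace(s, @"\e\[.*m", string.Empty);

    // Lazy: .*? grows one character at a time and stops at the FIRST 'm'.
    public static string StripLazy(string s) =>
        Regex.Replace(s, @"\e\[.*?m", string.Empty);
}
```

With this input, StripGreedy removes everything from the first escape to the final m, text included, while StripLazy removes only the two codes and leaves the word green. Replace substitutes whatever the match turned out to be; it does not go on to look for a longer alternative.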

(Also, for the code formatting, you can select all of your code and press Ctrl-K to have it add the four spaces required.)

Ryan Fox 2008-08-07 15:57:16

Answer 2

A:

@Ryan Fox: The input is plain text, with codes indicating when to switch modes for the text that follows, e.g.:

<ESC>[1;32mThis is bright green<ESC>[0mThis is the default colour

I'm also a bit confused myself: even if the expression is 'lazy', I'm doing a find-and-replace operation, so doesn't it have to continue until it gets the 'best' match regardless, so that it knows what to replace?

Nidonocu 2008-08-07 16:05:24

Answer 3

A:

Nope, I'm pretty sure a lazy expression will stop as soon as it can.

Ryan Fox 2008-08-07 16:07:57

Answer 4

+3 A:

The reason why #1 is slower is that the + in [\d;]+ is a greedy quantifier. Using +? or *? makes the quantifier lazy instead. See MSDN - Quantifiers for more info.

You may want to try:

"(\e\[(\d{1,2};)*?[mz]?)?"

That may be faster for you.

Jon Works 2008-08-07 16:24:02

Answer 5

A:

@Jon Works: I'm afraid that regular expression doesn't seem to work with find and replace: it stops too early, so the #;#m part is not matched and gets left behind.
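The early stop can be reproduced with a short C# sketch. The pattern is the suggested one, copied verbatim; the class and method names are made up for illustration. Everything after `\e\[` in that pattern is optional, and the lazy `*?` prefers zero repetitions, so the match is already complete after `<ESC>[` and the digits and the final m are never consumed.

```csharp
using System.Text.RegularExpressions;

public static class EarlyStopSketch
{
    // The suggested pattern, unchanged.
    static readonly Regex Suggested = new Regex(@"(\e\[(\d{1,2};)*?[mz]?)?");

    // Removes whatever the pattern matches; empty matches change nothing.
    public static string Strip(string s) => Suggested.Replace(s, string.Empty);
}
```

Applied to the sample line from earlier in the thread, Strip removes only the two `<ESC>[` prefixes, leaving `1;32m` and `0m` in the text.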

@modesty: I'll try that in future posts.

@Justin Standard: Thanks for fixing my post. Not sure what you did different to me, but it looks fine now at least.

Nidonocu 2008-08-09 09:39:50

Answer 6

+2 A:

Do you really want to run the regexp twice? Without having checked (bad me), I would have thought that this would work well:

public static string StripStringFormating(string formattedString) {
    return rTest.Replace(formattedString, string.Empty);
}

If it does, you should see it run ~twice as fast...
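A minimal sketch of the two variants side by side, to show that they give the same output. The pattern assigned to rTest here is a stand-in assumption, since the original expression is not shown in this thread, and StripWithCheck is a hypothetical name for the check-then-replace version.

```csharp
using System.Text.RegularExpressions;

public static class StripSketch
{
    // Stand-in pattern (an assumption): ESC, '[', digits/semicolons, one letter.
    static readonly Regex rTest =
        new Regex(@"\e\[[\d;]*[A-Za-z]", RegexOptions.Compiled);

    // Check first, then replace: the input is scanned twice when it matches.
    public static string StripWithCheck(string formattedString) =>
        rTest.IsMatch(formattedString)
            ? rTest.Replace(formattedString, string.Empty)
            : formattedString;

    // Replace only: a line without matches comes back unchanged anyway.
    public static string StripStringFormating(string formattedString) =>
        rTest.Replace(formattedString, string.Empty);
}
```

Both methods return the same string for any input, so the extra IsMatch call only adds work. How much time it saves in practice would need to be measured.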

Oskar 2008-09-09 21:36:28

Thinking about it now, that does make sense: running a regexp on a line with no matches is the same as running a check first to see if it matches at all. You get the same result!

Nidonocu 2008-09-13 07:17:10

Answer 7

+1 A:

I'm not sure if this will help with what you are working on, but long ago I wrote a regular expression to parse ANSI graphic files.

(?s)(?:\e\[(?:(\d+);?)*([A-Za-z])(.*?))(?=\e\[|\z)

It will return each code and the text associated with it. I will try to provide an example in a few minutes.

Input string:

<ESC>[1;32mThis is bright green.<ESC>[0m This is the default color.

Results:

[[1, 32], m, This is bright green.]
[[0], m, This is the default color.]
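One way to get results of this shape in C# is sketched below. The pattern is the one above, unchanged; the class name, method name, and output formatting are assumptions made for illustration. It relies on .NET keeping every repetition of a capturing group in Group.Captures, so all numeric codes are available even though the group repeats.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class AnsiParseSketch
{
    // The expression from this answer, copied verbatim.
    static readonly Regex AnsiChunk = new Regex(
        @"(?s)(?:\e\[(?:(\d+);?)*([A-Za-z])(.*?))(?=\e\[|\z)");

    // Returns one line per match: [[codes], command letter, text].
    public static string Describe(string input)
    {
        var lines = AnsiChunk.Matches(input).Cast<Match>().Select(m =>
        {
            // Group 1 repeats, so read all of its captures, not just the last.
            var codes = m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value);
            return "[[" + string.Join(", ", codes) + "], "
                 + m.Groups[2].Value + ", " + m.Groups[3].Value + "]";
        });
        return string.Join("\n", lines);
    }
}
```

The text group is returned exactly as matched, so the space after `<ESC>[0m` in the input string stays at the start of the second text.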

lordscarlet 2008-09-17 15:31:46

Thanks for this reply, I'll keep this expression on hand when I no doubt go back and review the code later for possible improvements. :) As I've discovered, 'larger' regexps tend to be faster than smaller ones.

Nidonocu 2008-09-17 21:22:00

I am also interested in anything you're doing with ANSI codes in .NET. I am currently redoing my site in Rails rather than .NET, but I am always curious to see how people are able to leverage .NET for interpreting ANSI.

lordscarlet 2008-09-17 21:25:02

Why is this regular expression faster?
