Let's say I have the following string:

this is a test for the sake of testing. this is only a test. The end.

and I want to match "this is a test" and "this is only a test" as two separate matches. What in the world do I need to do?

The following regex I tried yields a goofy result:

this(.*)test (I also want to capture the text in between)

It returns a single match: this is a test for the sake of testing. this is only a test

It seems like this is probably something easy I'm forgetting.

+6  A: 

The regex is greedy, meaning .* will match as many characters as it can while still letting the overall pattern succeed. That is why it runs all the way to the last "test". To make it non-greedy, try:

this(.*?)test

The ? modifier after the * makes the quantifier match as few characters as possible.
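
To see the difference side by side, here is a minimal sketch in Python. The choice of Python's re module is an assumption made for illustration; the question does not name a language, and other regex flavors may behave differently (or lack the lazy form altogether).

```python
import re

text = "this is a test for the sake of testing. this is only a test. The end."

# Greedy: .* first runs to the end of the string, then backtracks
# until the last "test" in the text can be matched.
greedy = re.findall(r"this(.*)test", text)

# Lazy: .*? grows one character at a time and stops at the first "test".
lazy = re.findall(r"this(.*?)test", text)

print(greedy)  # [' is a test for the sake of testing. this is only a ']
print(lazy)    # [' is a ', ' is only a ']
```

Because the pattern has one capture group, findall returns only the captured text between "this" and "test" for each match.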

Andy E
Thanks... that's what I thought. I tested that out on a regex tester and it works, so apparently the app I'm using to do some find and replace magic (EditPlus) doesn't recognize the ? modifier.
blesh
As per my answer, you might not get perfect results if "this" and "test" are embedded in other words. Do consider looking into it, if that might be an issue.
Platinum Azure
+2  A: 

* is a greedy quantifier. That means it matches as much as possible, which is what you are seeing. Depending on the regex support in your language or tool, you will need to find a non-greedy quantifier. Usually this is written with a trailing question mark, like this: *?. It stops consuming characters as soon as the rest of the regex can be satisfied.

There is a good explanation of greediness here.
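
As a rough illustration, the trailing question mark applies to other quantifiers too. The sketch below assumes Python's re module (not something the question specifies), and the sample strings are made up for the demonstration.

```python
import re

# Each pair compares a greedy quantifier with its lazy form (trailing "?").
plus_greedy = re.search(r"a+", "aaaa").group()        # 'aaaa'
plus_lazy = re.search(r"a+?", "aaaa").group()         # 'a'
range_greedy = re.search(r"a{2,3}", "aaaa").group()   # 'aaa'
range_lazy = re.search(r"a{2,3}?", "aaaa").group()    # 'aa'

# Lazy does not mean "always minimal": a+? still has to consume every
# "a" here, because only then can the rest of the pattern ("b") match.
lazy_until_rest_fits = re.search(r"a+?b", "aaab").group()  # 'aaab'
```

The last line shows the "as soon as the rest of the regex can be satisfied" part: the lazy quantifier keeps extending until the remainder of the pattern matches.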

Ipsquiggle
+5  A: 

Andy E and Ipsquiggle have the right idea, but I want to point out that you might want to add word boundary assertions, so that "this" and "test" match only as whole words and not as parts of longer words. In Perl and similar flavors, that's done with the \b assertion.

As it is, this(.*?)test would match "thistles are the greatest", which you probably don't want.

The pattern you want is something like this: \bthis\b(.*?)\btest\b
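
A quick way to check both patterns is sketched below, assuming Python's re module (which supports \b in the same way as Perl); the variable names are hypothetical.

```python
import re

loose = re.compile(r"this(.*?)test")
strict = re.compile(r"\bthis\b(.*?)\btest\b")

sample = "thistles are the greatest"
text = "this is a test for the sake of testing. this is only a test. The end."

# Without boundaries, "this" inside "thistles" and "test" inside
# "greatest" are enough to produce a match.
loose_match = loose.search(sample)
print(loose_match.group(1))  # 'tles are the grea'

# With \b on both sides, neither embedded word qualifies.
strict_match = strict.search(sample)
print(strict_match)          # None

# The original string still yields the two intended captures.
strict_captures = strict.findall(text)
print(strict_captures)       # [' is a ', ' is only a ']
```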

Platinum Azure
+1, definitely something worth thinking about
Andy E