I have a simple regex:

"\".*\""

As I read it, it says: select everything between " and ". But it also matches

"text") != -1 || file.indexOf(".exe"

To me those are two strings, but to the regex they are one. How can I make the regex see them as two strings?

P.S. I'm using Java.
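A minimal reproduction of the problem, assuming the snippet above is the whole input string. The class name and the `findAll` helper are hypothetical, used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyRepro {
    // Hypothetical helper: collects every non-overlapping match of regex in input
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The snippet from the question, written as a Java string literal
        String input = "\"text\") != -1 || file.indexOf(\".exe\"";
        // The original greedy pattern ".*" matches from the first quote to the last
        for (String match : findAll("\".*\"", input)) {
            System.out.println(match);
        }
    }
}
```

It prints a single match that runs from the first quote all the way to the last one.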

A: 

Look into Lazy Quantification

RB
A: 

Find out how to specify non-greedy behaviour in Java regular expressions.

Keltia
+6  A: 

Regular expression quantifiers are "greedy" by default. What you want to do is exclude quotes from the middle of the match, like this:

"\"[^\"]*\""
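A sketch of this approach in Java, using `Matcher.results()` (available since Java 9). The class name `NegatedClassQuotes` and the `quoted` method are illustrative, not part of the original answer:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class NegatedClassQuotes {
    // A quote, then any run of non-quote characters, then a quote.
    // The middle part can never consume a quote, so a match cannot span two strings.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static List<String> quoted(String input) {
        // Matcher.results() (Java 9+) streams each successive match
        return QUOTED.matcher(input).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "\"text\") != -1 || file.indexOf(\".exe\"";
        quoted(input).forEach(System.out::println);
    }
}
```

On the question's input this yields `"text"` and `".exe"` as two separate matches.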
Paul Tomblin
+2  A: 

Instead of ., use [^\"] so that the part between the quotes can't match a ".

Douglas Leeder
A: 

Do you know how many characters there will be, or at least a maximum? If so, you can use \".{0,n}\" where n is the maximum, or \".{n}\" if you know the exact length.
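A sketch of the bounded forms, assuming a hypothetical maximum (or exact length) of 4, which happens to fit both strings in the question. The class and helper names are illustrative:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class BoundedQuotes {
    // Hypothetical helper: collects every non-overlapping match of regex in input
    static List<String> findAll(String regex, String input) {
        return Pattern.compile(regex).matcher(input).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "\"text\") != -1 || file.indexOf(\".exe\"";
        // Hypothetical maximum: at most 4 characters between the quotes
        System.out.println(findAll("\".{0,4}\"", input));
        // Hypothetical exact length: exactly 4 characters between the quotes
        System.out.println(findAll("\".{4}\"", input));
        // .{0,4} is still greedy, so a short gap lets it run past a closing quote
        System.out.println(findAll("\".{0,4}\"", "\"ab\" \"cd\""));
    }
}
```

Because `.{0,n}` is still greedy, the bound only limits how far a match can run: on input like `"ab" "cd"` with n = 4 it still swallows a closing quote and returns `"ab" "`.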

Knoth23
+12  A: 

That's the non-greedy form:

".*?"

The *? means: "Match as little as possible", while the * alone means "Match as much as possible".

The latter basically goes on until the end of the string, giving characters back one by one so the final " can match. That's why you get everything between the first and the very last quote in your string.

// for the sake of completeness: Java would need this pattern string
"\".*?\""
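To see the difference in Java, here is a sketch that runs both patterns over the question's input (the class and helper names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantQuotes {
    // Hypothetical helper: collects every non-overlapping match of regex in input
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "\"text\") != -1 || file.indexOf(\".exe\"";
        // Greedy: .* runs to the end of the input, then backs off to the LAST quote
        System.out.println("greedy:    " + findAll("\".*\"", input));
        // Reluctant: .*? grows one character at a time and stops at the FIRST closing quote
        System.out.println("reluctant: " + findAll("\".*?\"", input));
    }
}
```

The greedy pattern reports one match; the reluctant one reports two, `"text"` and `".exe"`.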
Tomalak
Actually, "[^"\r\n]*" is more efficient, and better practice. See my article about using the lazy dot sparingly at http://www.regular-expressions.info/dot.html
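A sketch of why the line breaks are excluded, using a hypothetical input in which the first quote is never closed on its own line. The class and helper names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoLineBreakQuotes {
    // Hypothetical helper: collects every non-overlapping match of regex in input
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical input: the first quote is left unclosed on its own line
        String input = "a \"broken\nb \"ok\"";
        // [^"]* may cross the line break and pair quotes from different lines
        System.out.println(findAll("\"[^\"]*\"", input));
        // [^"\r\n]* stops at the line break, so only the well-formed string matches
        System.out.println(findAll("\"[^\"\\r\\n]*\"", input));
    }
}
```

Without `\r\n` in the class, the unclosed quote pairs with the opening quote on the next line; with them, only `"ok"` is matched.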
Jan Goyvaerts
+2  A: 

You are using a greedy quantifier. You want a reluctant quantifier instead.

The Javadocs for Pattern should help: http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html

On that page you'll find this:

Greedy quantifiers
X?       X, once or not at all
X*       X, zero or more times
X+       X, one or more times
X{n}     X, exactly n times
X{n,}    X, at least n times
X{n,m}   X, at least n but not more than m times

Reluctant quantifiers
X??      X, once or not at all
X*?      X, zero or more times
X+?      X, one or more times
X{n}?    X, exactly n times
X{n,}?   X, at least n times
X{n,m}?  X, at least n but not more than m times
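A short sketch that exercises a few rows of the table. The inputs are made up for illustration, and each pair of patterns differs only in the trailing `?` that makes the quantifier reluctant:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierTable {
    // Hypothetical helper: collects every non-overlapping match of regex in input
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // {greedy pattern, reluctant pattern, sample input}
        String[][] rows = {
            {"a+", "a+?", "aaa"},
            {"a{2,3}", "a{2,3}?", "aaaaa"},
            {"ab?", "ab??", "ab"},
        };
        for (String[] r : rows) {
            System.out.printf("%-8s on %-6s -> %s%n", r[0], r[2], findAll(r[0], r[2]));
            System.out.printf("%-8s on %-6s -> %s%n", r[1], r[2], findAll(r[1], r[2]));
        }
    }
}
```

For example, `a+` takes all of `aaa` in one match, while `a+?` settles for one `a` at a time and so finds three matches.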
Steve McLeod
+1  A: 

As the other answers point out, the quantifier (*) is greedy and tries to match as many characters as possible. One workaround is "\"[^\"]*\"", so that no " is matched in the middle. But what you really want is a reluctant quantifier, which matches as few characters as possible. In your case that is "\".*?\"". The reluctant quantifier is *?.

Read more about this here. The section 'Differences Among Greedy, Reluctant, and Possessive Quantifiers' may be especially interesting.
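That section also covers possessive quantifiers (such as *+), which never give characters back. A sketch comparing all three on the question's input, with illustrative class and helper names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveQuotes {
    // Hypothetical helper: collects every non-overlapping match of regex in input
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "\"text\") != -1 || file.indexOf(\".exe\"";
        // Greedy: one match, from the first quote to the last
        System.out.println("greedy:     " + findAll("\".*\"", input));
        // Reluctant: two matches, each ending at the nearest closing quote
        System.out.println("reluctant:  " + findAll("\".*?\"", input));
        // Possessive: .*+ also eats the final quote and never gives it back, so nothing matches
        System.out.println("possessive: " + findAll("\".*+\"", input));
    }
}
```

The possessive form is not a fix here; it only shows that refusing to backtrack leaves no quote for the end of the pattern to match.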

Mnementh