views:

450

answers:

6
+1  Q: 

Regex greedy issue

I'm sure this one is easy, but I've tried a ton of variations and still can't match what I need. The pattern is too greedy and I can't get it to stop being greedy.

Given the text:

test=this=that=more text follows

I want to just select:

test=

I've tried the following regexes:

(\S+)=(\S.*)
(\S+)?=
[^=]{1}
...

Thanks all.

+9  A: 

here:

// matches "test=", group 1 is "test"
(\S+?)=

or

// also matches "test=", group 1 is "test"
(\S[^=]+)=

you should consider using the second version over the first. given your string "test=this=that=more text follows", version 1 matches "t", checks whether the next character is =, fails, extends the match by one character, and repeats until it arrives at test=. a lazy quantifier pays for that extra check at every step. (it is the greedy form, (\S+)=, that runs ahead to "test=this=that=more" and then backtracks, which is why it settles on test=this=that= as its final answer.)

version 2 will match test= then stop. you can see the efficiency gains in larger searches like multi-line or whole document matches.
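
A quick way to check what each version returns is to run them side by side. This is a minimal sketch using Python's re module; Python is an assumption here, since the question does not name a regex flavor, and the variable names are only for illustration.

```python
import re

text = "test=this=that=more text follows"

# The asker's greedy attempt: \S+ grabs as much as it can, then backs up to the last "=".
greedy = re.search(r"(\S+)=", text)

# Version 1: the lazy quantifier stops at the first "=".
lazy = re.search(r"(\S+?)=", text)

# Version 2: the negated class cannot run past an "=" in the first place.
negated = re.search(r"(\S[^=]+)=", text)

print(greedy.group(0))                      # test=this=that=
print(lazy.group(0), lazy.group(1))         # test= test
print(negated.group(0), negated.group(1))   # test= test
```

Note that version 2 needs at least two characters before the =, because \S and [^=]+ each consume at least one.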

Owen
+1  A: 

You should be able to use this:

(\S+?)=(\S.*)
chills42
I coulda swore I tried all these variants. I switched to a gui regex editor for testing and it does not seem to "work right". I added the ? into my code and all is well. Thanks all!
Matt P
This regex does not work for me. Owen's does: ((\S+?)=)
Joe Lencioni
This will actually get "test" in the first group and "this=that=more text follows" in the second. Owen's would get "test=" and "test" in the two groups. I assumed that he wanted the = stripped out based on his previous tries.
chills42
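
The difference between the two patterns discussed in these comments is only in what the groups capture. A small sketch in Python's re module (an assumed flavor; the names split and nested are made up for the example) shows the groups for each:

```python
import re

text = "test=this=that=more text follows"

# Key in group 1, everything after the first "=" in group 2.
split = re.search(r"(\S+?)=(\S.*)", text)
print(split.groups())    # ('test', 'this=that=more text follows')

# Outer group keeps the "=", inner group drops it.
nested = re.search(r"((\S+?)=)", text)
print(nested.groups())   # ('test=', 'test')
```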
+1  A: 

You probably want something like

^(\S+?=)

The caret ^ anchors the regex to the beginning of the string. The ? after the + makes the + non-greedy.
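
To see what the anchor changes, here is a short sketch in Python's re module (an assumption, since the thread does not name a flavor). The second input string is invented for the example.

```python
import re

anchored = re.compile(r"^(\S+?=)")
unanchored = re.compile(r"(\S+?=)")

# Both agree when the key is at the start of the string.
first = anchored.search("test=this=that=more text follows")
print(first.group(1))                  # test=

# With the caret, a key later in the string is not picked up.
other = "some words then key=value"
print(anchored.search(other))          # None
print(unanchored.search(other).group(1))   # key=
```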

Keith Twombley
+1  A: 

You might be looking for lazy quantifiers: *?, +?, ??, and {n,m}?
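
As a rough illustration, each greedy quantifier can be compared with its lazy counterpart on the same input. The sketch below uses Python's re module (assumed; most Perl-style flavors behave the same way) and a made-up sample string.

```python
import re

sample = "aaaa"

# Greedy form first, then the lazy form of the same quantifier.
patterns = [r"a*", r"a*?", r"a+", r"a+?", r"a?", r"a??", r"a{2,4}", r"a{2,4}?"]

results = {p: re.match(p, sample).group() for p in patterns}
for pattern, matched in results.items():
    print(pattern, repr(matched))
```

The lazy forms take the fewest repetitions that still let the overall match succeed, so here a*? and a?? match the empty string, a+? matches "a", and a{2,4}? matches "aa".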

Glenn
+1  A: 

Lazy quantifiers work, but they also can be a performance hit because of backtracking.

Consider that what you really want is "a bunch of non-equals, an equals, and a bunch more non-equals."

([^=]+)=([^=]+)

Your example [^=]{1} matches only a single non-equals character.
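
Here is how that plays out on the question's text, sketched with Python's re module (the flavor is an assumption; the variable names are illustrative only).

```python
import re

text = "test=this=that=more text follows"

# Non-equals run, an equals, another non-equals run: no lazy quantifier needed.
pair = re.search(r"([^=]+)=([^=]+)", text)
print(pair.groups())     # ('test', 'this')

# The asker's attempt matches exactly one non-equals character.
single = re.search(r"[^=]{1}", text)
print(single.group())    # t
```

Keep in mind that [^=] also matches whitespace, unlike \S, so on other inputs the groups may include spaces.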

Andy Lester
A: 

if you want only "test=", I think a simple:

^(\w+=)

should be fine if you are sure that the string "test=" will always start the line.

the real problem is when the string is like this:

this=that= more test= text follows

if you use the regex above the result is "this=", and if you add a repetition quantifier at the end, like this:

^(\w+=)*

you get a much longer "this=that=", so the only thing I can come up with is the trivial:

[th\w+=]*test=
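
The three patterns in this answer can be checked against the second example string. This is a minimal sketch in Python's re module (an assumed flavor; variable names are illustrative).

```python
import re

text = "this=that= more test= text follows"

# Anchored, single key: stops after the first "word=".
one = re.search(r"^(\w+=)", text)
print(one.group(0))      # this=

# Anchored, repeated: swallows every consecutive "word=" from the start.
many = re.search(r"^(\w+=)*", text)
print(many.group(0))     # this=that=

# The last pattern: the bracketed part is a character class, and the match
# it ends up with on this input is just the literal "test=".
literal = re.search(r"[th\w+=]*test=", text)
print(literal.group(0))  # test=
```

On this input the character class ends up matching nothing, so searching for the literal test= would return the same result.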

Bye.