I have a RegEx that is working for me but I don't know WHY it is working for me. I'll explain.

RegEx: \s*<in.*="(<?.*?>)"\s*/>\s*


Text it finds (it finds the white-space before and after the input tag):

<td class="style9">
      <input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value="<?php echo $data[guarantor4]; ?>"  />    </td>
</tr>


The part I don't understand:

<in.*=" <--- As I understand it, this should only find up to the first =" as in it should only find <input name="

It actually finds: <input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value=" which happened to be what I was trying to do.

What am I not understanding about this RegEx?

+6  A: 

.* is greedy. You want .*? to find up to only the first =.

eyelidlessness
.*=" will match everything between the previous match and the last =", yes.
eyelidlessness
+3  A: 

.* is greedy, so it'll find up to the last =. If you want it non-greedy, add a question mark, like so: .*?
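
The difference can be checked quickly. This is a minimal sketch in Python; the question does not say which language or tool is in use, so the `re` module here is an assumption, and the input string is the tag from the question with the surrounding whitespace dropped.

```python
import re

# The input tag from the question (surrounding whitespace omitted).
html = ('<input name="guarantor4" id="guarantor4" size="50" type="text" '
        'tabindex="10" value="<?php echo $data[guarantor4]; ?>"  />')

# Greedy: .* grabs as much as it can, then backs off to the LAST ="
greedy = re.match(r'<in.*="', html)

# Lazy: .*? grabs as little as it can, so it stops at the FIRST ="
lazy = re.match(r'<in.*?="', html)

print(greedy.group(0))  # ends with: tabindex="10" value="
print(lazy.group(0))    # <input name="
```

The remaining part of the original pattern, `(<?.*?>)"\s*/>`, can only succeed right after `value="`, which is why the greedy version happened to do what was wanted.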

Stavros Korokithakis
+7  A: 

You appear to be using 'greedy' matching.

Greedy matching says "eat as much as possible to make this work"

Try this:

<in[^=]*=

For starters, that will stop ".*" from swallowing the "=" characters, because [^=]* cannot match an "=".

In the future, though, you might want to read up on the

.*?

and

.+?

notation, which stops at the first possible position that lets the rest of the pattern match, instead of the last.

The 'non-greedy' syntax is the better choice when the thing you want to stop at is a sequence of TWO or more characters rather than a single character,

i.e.:

<in.*?=id

which would stop on the first '=id' regardless of whether or not there are '=' in between.
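
Both approaches can be compared side by side. This is a rough sketch in Python (an assumption, since the question names no language). Because the sample input contains no literal `=id`, the two-character-plus terminator used below is `id="`, picked purely for illustration.

```python
import re

# The input tag from the question (surrounding whitespace omitted).
html = ('<input name="guarantor4" id="guarantor4" size="50" type="text" '
        'tabindex="10" value="<?php echo $data[guarantor4]; ?>"  />')

# Negated character class: [^=]* can never cross an "=",
# so the match ends at the first "=" in the tag.
first_equals = re.match(r'<in[^=]*=', html).group(0)

# Lazy quantifier with a multi-character terminator (id=" is illustrative):
# .*? may cross earlier "=" characters, but stops at the first id="
up_to_id = re.match(r'<in.*?id="', html).group(0)

print(first_equals)  # <input name=
print(up_to_id)      # <input name="guarantor4" id="
```

A negated class works only when a single character marks the stopping point; the lazy quantifier handles longer terminators.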

Kent Fredric
+2  A: 

As I understand it, this should only find up to the first =" as in it should only find <input name="

You don't say what language you're writing in, but almost all regular expression systems are "greedy matchers" by default - that is, a quantifier such as .* matches the longest possible substring of the input that still lets the rest of the pattern match. In your case, that means everything from the start of the input tag to the last equal-quote sequence.

Most regex systems have a way to specify that the pattern should match only the shortest possible substring, not the longest - "non-greedy matching".

As an aside, don't assume the first parameter will be name= unless you have full control over the construction of the input. Both HTML and XML allow attributes to be specified in any order.
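
To illustrate the attribute-order point, here is a small sketch in Python (an assumed language). Both sample tags and both patterns are hypothetical, made up for this example, and a real HTML parser is usually more robust than either pattern.

```python
import re

# Two hypothetical tags with the same attributes in a different order.
name_first = '<input name="guarantor4" value="x" />'
value_first = '<input value="x" name="guarantor4" />'

# Assumes name= comes first and value= follows immediately.
order_dependent = re.compile(r'<input name="[^"]*" value="([^"]*)"')

# Allows any attributes (anything but ">") before value=.
order_independent = re.compile(r'<input\b[^>]*\bvalue="([^"]*)"')

print(order_dependent.search(name_first) is not None)   # True
print(order_dependent.search(value_first) is not None)  # False
print(order_independent.search(name_first).group(1))    # x
print(order_independent.search(value_first).group(1))   # x
```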

Ross Patterson