Hi folks,

I use

(?<!value=\")##(.*)##

to match a string like ##MyString## that is not in the form of:

<input type="text" value="##MyString##">

This works for the above form, but not for this one, which still matches when it should not:

<input type="text" value="Here is my ##MyString## coming..">

I tried:

(?<!value=\").*##(.*)##

with no luck. Any suggestions will be deeply appreciated.

Edit: I am using PHP's preg_match() function.

A: 

Here is a starting point at least; it works for the given examples.

(?<!<[^>]*value="[^>"]*)##(.*)##
Paul Creasey
Warning: preg_match(): Compilation failed: lookbehind assertion is not fixed length
Mark Byers
It fails with "Compilation failed: lookbehind assertion is not fixed length at offset 23". I am using PHP's preg_match() function.
Dali
@mark, I think .NET is the only engine that supports this kind of lookbehind, now that you mention it! I concede that this problem is actually pretty challenging in any other language. My point above wasn't aimed specifically at you, and you are in fact probably right in this case, but I still say that a lot of people jump on the bandwagon without understanding.
Paul Creasey
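
Since PCRE rejects the variable-length lookbehind above, one possible workaround is to swap it for a lookahead. This is a different technique from the answer's pattern, sketched here only as an assumption-laden starting point: it treats a placeholder as "inside a tag" when the next angle bracket after it is `>`, and it assumes attribute values never contain angle brackets. The function name `findPlaceholderOutsideTag` is made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): returns the text between the
// ## markers when the placeholder sits outside any tag, or null otherwise.
function findPlaceholderOutsideTag(string $html): ?string
{
    // (?![^<>]*>) rejects a match when the next angle bracket after it is '>',
    // which would mean the placeholder is inside a tag such as <input ...>.
    $pattern = '/##([^#]*)##(?![^<>]*>)/';
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}
```

A lookahead has no fixed-length restriction in PCRE, so this compiles under preg_match(), at the cost of checking "inside any tag" rather than "inside a value attribute" specifically.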
A: 

@OP, you can do it simply without regex.

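$text = '<input type="text" value="   ##MyString##">';
// Remove every space so that value="   ## collapses to value="##
$text = str_replace(" ", "", $text);
if (strpos($text, 'value="##') !== false) {
    // Take what follows value="## and cut it at the closing ##
    $s = explode('value="##', $text);
    $t = explode("##", $s[1]);
    print "$t[0]\n";
}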
ghostdog74
I believe there's too much overhead in this. When it comes to replacing, say, 50 strings, it will consume too many resources. And it is not always whitespace before ##MyString##; it may be anything.
Dali
If it's anything but spaces before `##MyString##`, then it shouldn't match, as per your criteria, correct? As for overhead, there's no way to tell unless you run some benchmarks.
ghostdog74
+1  A: 

This is not perfect (that's what HTML parsers are for), but it will work for the vast majority of HTML files:

(^|>)[^<>]*##[^#]*##[^<>]*(<|$)

The idea is simple. You're looking for a string that is outside of tags. To be outside of tags, the closest preceding angle bracket must be a closing one (or there is no bracket at all), and the closest following one must be an opening one (or there is none). This assumes that angle brackets are not used in attribute values.
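
As a rough sketch of how that pattern might be used from PHP: the helper below wraps it in preg_match() and adds a capture group around the placeholder name. The function name `placeholderInTextNode` and the extra capture group are assumptions made for illustration; they are not part of the pattern as posted.

```php
<?php
// Hypothetical helper: returns the placeholder name when ##...## sits in
// text outside of tags, or null when no such placeholder is found.
function placeholderInTextNode(string $html): ?string
{
    // Same pattern as above, with a capture group added around the name
    // and the bracket alternatives made non-capturing.
    $pattern = '/(?:^|>)[^<>]*##([^#]*)##[^<>]*(?:<|$)/';
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}
```

Both input examples from the question come back as null here, because the only text that follows a `>` or the start of the string contains no `##`.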

If you actually care that the attribute name be "value", then you can match for:

value\s*=\s*"([^\"]|\\\")*##[^#]*##([^\"]|\\\")*\"

... and then simply negate the match (!preg_match(...)).
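
A minimal sketch of that negation, assuming double-quoted attribute values and using a simplified form of the pattern (it drops the backslash-escaped quote handling, which HTML attributes do not normally use). The function name `placeholderInValueAttribute` is hypothetical.

```php
<?php
// Hypothetical helper: true when a ##...## placeholder appears inside a
// double-quoted value="..." attribute.
function placeholderInValueAttribute(string $html): bool
{
    // Simplified form of the pattern above: [^"]* keeps the match inside
    // one attribute value; escaped quotes are not handled.
    return preg_match('/value\s*=\s*"[^"]*##[^#]*##[^"]*"/', $html) === 1;
}

$tag = '<input type="text" value="Here is my ##MyString## coming..">';

// Negate the result, as suggested, to accept only strings whose
// placeholder is not inside a value attribute.
$accept = !placeholderInValueAttribute($tag);
```

Note that the negated test is also true for input that contains no placeholder at all, so it may need to be combined with a plain check for `##...##`.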

Max Shawabkeh
Thank you, this is very close.
Dali