ansaurus

Question

regex (regular expressions) pattern NOT containing a string

Answer 1

+2 A:

Please take a look at this question.

S.Mark 2010-02-25 09:09:53

Yeah, I didn't find that. It started with matching a line, and I must have skipped reading the rest of it ;)

naugtur 2010-02-25 09:32:43

Answer 2

+3 A:

Did you read my answer to that question? It gives a more general solution. In your case it would look like this:

(?s)<script>(?:(?!</?script>).)*</script>

In other words: match the opening sequence; then match one character at a time, after ensuring that it's not the beginning of the closing sequence; then match the closing sequence.

Alan Moore 2010-02-25 09:22:27
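For anyone trying this in JavaScript, here is a minimal sketch of the same pattern. It assumes an engine with ES2018 regex support: JavaScript does not accept a leading `(?s)` modifier, so the `s` (dotAll) flag is used instead to let `.` match line breaks. `findInnermostScript` is a hypothetical helper name used only for illustration.

```javascript
// Tempered greedy token: after "<script>", consume one character at a time,
// but only when that character does not start "<script>" or "</script>".
// The `s` (dotAll) flag stands in for (?s), which JavaScript does not support.
const innermostScript = /<script>(?:(?!<\/?script>).)*<\/script>/s;

// Hypothetical helper: returns the first innermost script block, or null.
function findInnermostScript(text) {
  const m = text.match(innermostScript);
  return m ? m[0] : null;
}

console.log(findInnermostScript("<script>a<script>b</script>c</script>"));
// <script>b</script>
console.log(findInnermostScript("<script>\nvar x = 1;\n</script>") !== null);
// true
```

The match attempt starting at the outer `<script>` fails because the lookahead stops the loop at the inner opening tag, where no `</script>` follows; the engine then retries from the inner tag and succeeds.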

I still don't understand what is going on in the parentheses and why they don't match, but I'll figure it out. Thanks.

naugtur 2010-02-25 09:31:56

This regex has unbalanced parentheses. When I fix the expression, it doesn't match either of the strings.

Otto Allmendinger 2010-02-25 09:32:26

@naugtur, I fixed the missing parenthesis. It might still not work, in which case your start and end tags are probably on separate lines. Try prepending `(?s)` to the proposed regex, which lets the dot metacharacter also match line breaks: `(?s)<script>(?:(?!</script>).)*</script>`

Bart Kiers 2010-02-25 10:30:13
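A small JavaScript sketch of the line-break issue described in the comment above. It assumes ES2018 or later for the `s` (dotAll) flag, which is JavaScript's counterpart of `(?s)`; the `[\s\S]` variant is a common workaround for engines without that flag.

```javascript
// A script block whose tags sit on separate lines.
const text = "<script>\nalert(1);\n</script>";

const withoutDotAll = /<script>(?:(?!<\/script>).)*<\/script>/;
const withDotAll = /<script>(?:(?!<\/script>).)*<\/script>/s;
// Without the `s` flag, [\s\S] can stand in for "any character, including newlines".
const anyCharClass = /<script>(?:(?!<\/script>)[\s\S])*<\/script>/;

console.log(withoutDotAll.test(text)); // false: `.` cannot consume "\n"
console.log(withDotAll.test(text));    // true
console.log(anyCharClass.test(text));  // true
```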

Mea culpa! I should have tested it, even if I *have* posted it a dozen times before. Thanks, Bart.

Alan Moore 2010-02-25 10:39:41

No problem, Alan, it's comforting to see guys like you also make these (little) mistakes! ;)

Bart Kiers 2010-02-25 10:49:54

The negative lookahead should be for `<script>` not `</script>`

Otto Allmendinger 2010-02-25 11:55:16

@Otto: actually, it should be for both: `(?!</?script>)`; that matches the innermost set of possibly nested tags. Of course, `<script>` tags shouldn't *be* nested, but apparently the OP isn't really matching those. I should have read the question more closely. Fixing it now.

Alan Moore 2010-02-25 12:38:03

In practice it's `</script>`, if you assume the tags make any sense. It's my example that is rather silly ;) The first thing I changed when using it was to look for `!</script` instead of `!<script`. If somebody nested a script, it's better to remove all the heading tags.

naugtur 2010-02-25 22:37:47
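To make the two lookahead choices in this thread concrete, here is a JavaScript sketch comparing them on an input where an opening tag appears before the first closing tag. The input string is made up for illustration, and the `s` flag stands in for `(?s)`.

```javascript
// An opening tag that is never closed, followed by a complete block.
const input = "<script>a<script>b</script>";

// Forbids only the closing tag (naugtur's variant).
const closingOnly = /<script>(?:(?!<\/script>).)*<\/script>/s;
// Forbids both opening and closing tags (Alan's `</?script>` variant).
const openOrClose = /<script>(?:(?!<\/?script>).)*<\/script>/s;

console.log(input.match(closingOnly)[0]); // <script>a<script>b</script>
console.log(input.match(openOrClose)[0]); // <script>b</script>
```

The closing-only version starts at the first opening tag and swallows the inner one, leaving extra leading tags to strip afterwards; the `</?script>` version returns only the innermost block.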

Answer 3

+1 A:

The correct expression for your problem is

"^<script>((?!<script>).)*</script>$"

This shouldn't be used for HTML manipulation. It doesn't address cases like

<script> foo <script type="javascript"> bar </script>

and many others. A parser is the correct solution here.

The more general expression for matching strings beginning with START, ending with END without the specific character sequence foobar in-between is:

"^START((?!foobar).)*END$"

Otto Allmendinger 2010-02-25 09:53:11
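A JavaScript sketch of the general `^START((?!foobar).)*END$` form, built from literal strings. `escapeRegExp` and `startEndWithout` are hypothetical helper names, the group is made non-capturing since the captured last character is rarely useful, and the `s` flag is added so `.` also crosses line breaks. The last line reproduces the limitation mentioned above.

```javascript
// Hypothetical helper: escape regex metacharacters so literal strings
// such as "START" or "<script>" can be embedded safely.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds ^START(?:(?!FORBIDDEN).)*END$ from literals.
function startEndWithout(start, end, forbidden) {
  return new RegExp(
    "^" + escapeRegExp(start) +
      "(?:(?!" + escapeRegExp(forbidden) + ").)*" +
      escapeRegExp(end) + "$",
    "s"
  );
}

const re = startEndWithout("START", "END", "foobar");
console.log(re.test("START foo bar END"));    // true: "foobar" never appears contiguously
console.log(re.test("START x foobar y END")); // false

// "<script type=...>" is not the literal "<script>", so the lookahead lets it through.
const scriptRe = startEndWithout("<script>", "</script>", "<script>");
console.log(scriptRe.test('<script> foo <script type="javascript"> bar </script>')); // true
```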

I tuned it up, and the input is a bit different, so there is no need to worry about HTML content.

naugtur 2010-02-25 13:54:10

Answer 4

+1 A:

Use negative lookahead. Lookarounds are zero-width assertions, meaning that they don't consume any characters in the source string.

var s1 = "some long string with the CENSORED word";
var s2 = "some long string without that word";
console.log(s1.match(/^(?!.*CENSORED).*$/)); // null: no match
console.log(s2.match(/^(?!.*CENSORED).*$/)); // matches the whole string

The syntax for negative lookahead is (?!REGEX). It tries to match REGEX at the current position and fails if that match succeeds. Positive lookahead (?=REGEX) does the opposite: it succeeds only if REGEX matches.

Amarghosh 2010-02-25 10:24:39
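A short sketch contrasting negative and positive lookahead on the same two strings from the answer above, plus a check that a lookahead-only match is zero-width. The variable names are illustrative only.

```javascript
// (?!...) succeeds only when its sub-pattern cannot match here;
// (?=...) succeeds only when it can. Neither consumes any characters.
const lacksCensored = /^(?!.*CENSORED)/;
const hasCensored = /^(?=.*CENSORED)/;

const samples = [
  "some long string with the CENSORED word",
  "some long string without that word",
];

for (const s of samples) {
  console.log(lacksCensored.test(s), hasCensored.test(s));
}
// false true
// true false

// Zero width: a match made only of an anchor and a lookahead is empty.
console.log(JSON.stringify("abc".match(/^(?!x)/)[0])); // ""
```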
