ansaurus

Question

Answer 1

+10 A:

I don't think anything else needs to be said other than http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454.

JSBangs 2009-11-17 17:49:39

That is probably the best answer I've seen to any question!

Andy 2009-11-17 17:54:55

That’s rather a comment than an answer.

Gumbo 2009-11-17 17:58:23

Should be wiki.

Michael Myers 2009-11-17 18:01:10

This is a terrible answer. Look, I'm not trying to use regex to parse *XHTML*. I'm trying to match the *string* `<script ...></script>`. That is perfectly within the capabilities of regex.

JamesBrownIsDead 2009-11-17 18:03:41

JamesBrownIsDead, except that you need to care for case, whitespaces, HTML comments, strings inside embedded scripts, `<pre>` regions... Parsing HTML is a solved problem.

Svante 2009-11-17 18:19:48

Again, I'm not parsing HTML.

JamesBrownIsDead 2009-11-17 18:28:51

You *are* parsing HTML. If you weren't, there wouldn't be <script> tags in it.

Carl Smotricz 2009-11-17 19:41:22

Answer 2

+8 A:

You really would be better off using the DOM to process HTML for this reason and all sorts of others.
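
As a rough illustration of the parser route, here is a minimal sketch using Python's standard-library `html.parser`. The asker's language is not stated in the thread, so Python is an assumption, and the names `ScriptCollector` and `extract_scripts` are made up for this example.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the attributes and body text of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []    # list of (attrs_dict, body_text) tuples
        self._attrs = None   # attributes of the script currently open, if any
        self._chunks = []    # text pieces seen inside the open script

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <SCRIPT> arrives as "script".
        if tag == "script":
            self._attrs = dict(attrs)
            self._chunks = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._attrs is not None:
            self.scripts.append((self._attrs, "".join(self._chunks)))
            self._attrs = None


def extract_scripts(html):
    """Return a list of (attributes, body) pairs, one per script element."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts
```

Calling `extract_scripts(page_source)` returns one `(attributes, body)` pair per script element, and tag case or a `<` inside the script body need no special handling in the calling code.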

Andy 2009-11-17 17:50:02

Why did this get a downvote? +1

Daniel 2009-11-17 17:55:51

I'm not processing HTML.

JamesBrownIsDead 2009-11-17 18:13:01

If you're not processing HTML, why did you tag your question as HTML-related?

TrueWill 2009-11-17 18:21:30

Because it's HTML-[i]related[/i].

JamesBrownIsDead 2009-11-17 18:29:27

Answer 3

+4 A:

Change your first `*` to `*?`.

This is the non-greedy 'match all', so it will match the smallest set of characters before the next '>'.
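
The difference is easy to see on a small input. The asker's original pattern is not shown in this thread, so the two patterns below are assumed stand-ins that differ only in the first quantifier; Python's `re` module is used for the sketch.

```python
import re

html = '<script src="a.js"></script><p>text</p><script>run();</script>'

# Greedy first quantifier: [\s\S]* runs to the end of the input and only
# backtracks as far as it must, so one match swallows both script elements.
greedy = re.compile(r"<script[\s\S]*>[\s\S]*?</script>")

# Lazy first quantifier: [\s\S]*? stops at the nearest '>', so each
# script element is matched on its own.
lazy = re.compile(r"<script[\s\S]*?>[\s\S]*?</script>")

greedy_matches = greedy.findall(html)
lazy_matches = lazy.findall(html)

print(len(greedy_matches))  # 1
print(len(lazy_matches))    # 2
```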

TheSean 2009-11-17 17:50:45

While I agree with JS Bangs' link, I'm pretty sure this will fix his problem.

Galen 2009-11-17 17:57:15

If someone comes to a gunfight with a dull knife, will sharpening it fix his problem?

Svante 2009-11-17 18:05:19

@Svante: yes, as long as there are no bullets :)

TheSean 2009-11-17 19:19:40

@TheSean: And I guess with "bullets", you mean things like javascript strings containing '</script>'? Basically, you are *assuming* there are no bullets. But if you value your life: Run if you see a gun pointed at you!

soulmerge 2009-11-18 08:12:27

Answer 4

A:

Try to exclude any '<' from the content:

 <script (.|\n)*>(.|\n|[^<])*?</script>

Pierre 2009-11-17 17:50:45

Even if it's technically not valid HTML, people often write code like: `<script>if(a < b) { /* code */ }</script>`

intgr 2009-11-17 17:52:50

Good thing I'm not parsing code.

JamesBrownIsDead 2009-11-17 18:15:50

You're not excluding `<` from the content with `(.|\n|[^<])*?`. The negated character class is never reached when a `<` is encountered, since the `.` meta character already matches it. In fact, the only character that `[^<]` is going to match is `\r` (carriage return).
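
A quick check of this point, sketched with Python's `re` module (an assumption, since the thread does not name a regex flavor). In Python, `.` matches every character except `\n`, so the `[^<]` branch is never needed at all. The `excluding` pattern is a hypothetical variant that really does forbid `<` in the body, shown only to contrast the behavior.

```python
import re

# The pattern from the answer above, unchanged.
pattern = re.compile(r"<script (.|\n)*>(.|\n|[^<])*?</script>")

# Hypothetical variant that genuinely excludes '<' from the body.
excluding = re.compile(r"<script [^>]*>[^<]*</script>")

sample = '<script type="text/javascript">if (a < b) { go(); }</script>'

# The '<' in "a < b" is consumed by the '.' branch, so the match succeeds.
original_match = pattern.search(sample)

# With '<' truly excluded, the same input no longer matches.
excluding_match = excluding.search(sample)

print(original_match is not None)  # True
print(excluding_match is None)     # True
```

The second result is the situation intgr describes: a body containing a bare `<` defeats a pattern that actually excludes it.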

Bart Kiers 2009-11-17 18:20:14

Answer 5

A:

<script[\s\S]*?>[\s\S]*?</script>

This matches most common situations, but it's very important to consider JS Bangs' answer.
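
To make "most common situations" concrete, here is a small sketch in Python (an assumed language; the sample strings are invented) that runs the pattern against a typical tag and against two of the variations mentioned elsewhere in this thread: tag case and whitespace inside the closing tag.

```python
import re

PATTERN = r"<script[\s\S]*?>[\s\S]*?</script>"
script_re = re.compile(PATTERN)
script_re_icase = re.compile(PATTERN, re.IGNORECASE)

plain = '<script type="text/javascript">\nrun();\n</script>'
upper = "<SCRIPT>run();</SCRIPT>"
spaced = "<script>run();</script >"

# The common case: lowercase tags, body spanning several lines.
plain_ok = script_re.search(plain) is not None

# Uppercase tags need a case-insensitive flag.
upper_ok = script_re.search(upper) is not None
upper_icase_ok = script_re_icase.search(upper) is not None

# Whitespace before the closing '>' is not covered by either version.
spaced_ok = script_re_icase.search(spaced) is not None

print(plain_ok, upper_ok, upper_icase_ok, spaced_ok)  # True False True False
```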

Rubens Farias 2009-11-17 17:53:09

Answer 6

+2 A:

I'll keep posting links to my previous answers until this question type has been wiped from this planet's surface (hopefully in 10 years or so): Don't use regular expressions for irregular languages like HTML or XML. Use a parser instead.

soulmerge 2009-11-17 17:55:04

I'm not parsing a language.

JamesBrownIsDead 2009-11-17 18:12:28

Any regular expression you create will match a closing script tag in your javascript, for example, so: Yes, you *are* parsing a language.
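
A minimal sketch of that case, using Python's `re` module and the lazy pattern posted in another answer here (both are choices made for this example; the input string is invented):

```python
import re

script_re = re.compile(r"<script[\s\S]*?>[\s\S]*?</script>")

# A script whose JavaScript contains the text "</script>" in a string literal.
html = '<script>var s = "</script>"; run();</script>'

matched = script_re.search(html).group(0)

# The match ends at the first "</script>", which sits inside the string
# literal, so the rest of the script body is cut off.
print(matched)  # <script>var s = "</script>
```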

soulmerge 2009-11-18 04:49:54

Another approach: You are parsing XML, which *is* a language. (or a sub-set of XML - XML documents must have a single root node, which your string doesn't)

soulmerge 2009-11-18 04:51:06

Answer 7

+7 A:

Also see this week's Coding Horror: Parsing Html The Cthulhu Way, inspired by the epic answer by @bobince that @JS Bangs links to.

Bill Karwin 2009-11-17 17:56:34

+1: you beat me to it!

Steve Folly 2009-11-17 18:10:58

RegEx to match <script> tag?
