Finding start and end tags in string using regex

+2 Q:

I am trying to parse an HTML file (a non-strict one) using JavaScript.

My output should be the same HTML file, but I need to process the internal content of every <script></script> tag. I have a method processScript(script) that does that.

I can assume that there will be no <script/> tags.

I have a pretty clear idea how to do it using just split(), but I wonder if I can do it better using regex?
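For reference, the regex variant being asked about could be sketched roughly as below. This is only an illustration under the assumptions stated in the question (no <script/> tags); the name rewriteScripts is hypothetical, and processScript is assumed to take the script text and return the processed text. It does not handle every case a real HTML parser would, for example a ">" inside an attribute value of the opening tag, or script tags that appear inside HTML comments.

```javascript
// Hypothetical helper (not from the original post): rewrites the body of
// every <script ...>...</script> pair and leaves the rest of the HTML as is.
function rewriteScripts(html, processScript) {
  // group 1: opening tag, group 2: script body (non-greedy), group 3: closing tag
  // "i" makes the tag name case-insensitive, "g" replaces every occurrence
  const scriptPair = /(<script\b[^>]*>)([\s\S]*?)(<\/script\s*>)/gi;
  return html.replace(scriptPair, (match, open, body, close) =>
    open + processScript(body) + close
  );
}
```

Using replace() with a callback keeps the original tags and surrounding markup untouched, which is harder to get right with split().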

+2 A:

Parsing HTML with regex is generally not the best way to do it. Look into DOM parsing instead, using methods like getElementsByTagName('script') and such. I'd also suggest looking at the w3schools examples on HTML DOM Objects to get you started in the right direction.

There are a lot of reasons why this is a better approach. Two of them: 1) JavaScript already has DOM support, and it is much easier to use than regex, and 2) the language of matched open/close tags (like matched parens, brackets, etc.) is not a regular language, so a regular expression cannot recognize arbitrarily nested tags.
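A minimal sketch of the DOM approach, assuming a DOM is available (for example the global document in a browser). The function name processAllScripts is hypothetical, and processScript is assumed to take the script text and return the processed text.

```javascript
// Hypothetical helper: runs processScript over the text of every <script>
// element in the given document-like object and writes the result back.
function processAllScripts(doc, processScript) {
  const scripts = doc.getElementsByTagName('script');
  for (let i = 0; i < scripts.length; i++) {
    // textContent is the inline script source; it is empty for scripts
    // that only reference an external file through a src attribute
    scripts[i].textContent = processScript(scripts[i].textContent);
  }
}
```

In a browser this would be called as processAllScripts(document, processScript). Outside a browser there is no built-in document object, so a separate HTML parsing library would have to supply one.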

eldarerathis 2010-08-04 22:37:40

w3schools is nothing to do with the W3C.

Tim Down 2010-08-04 23:01:44

Bleh, that was what I meant. Thanks for catching that.

eldarerathis 2010-08-04 23:05:21

What would I do with HTML pages that don't exactly follow the XML rules? Would it still work? I am running my JS script outside a browser.

2010-08-05 05:35:49

How are they not "following the rules"? Do you mean that they are not valid XML/HTML? If you search SO for your question, you'll find lots of posts that explain ways to parse HTML without using regex, and possibly one that fits your specific situation.

eldarerathis 2010-08-05 16:50:48
