I am trying to parse an HTML file (not a strict one) using JavaScript.

My output should be the same HTML file, but I need to process the internal content of every <script></script> tag. I have a method processScript(script) that does that.

I can assume that there will be no <script/> tags.

I have a pretty clear idea of how to do it using just split(), but I wonder if I can do it better using regex?

+2  A: 

Parsing HTML with regex is generally not the best way to do it. Look into DOM parsing instead, using methods like getElementsByTagName('script') and such. I'd also suggest looking at the w3schools examples on HTML DOM Objects to get you started in the right direction.

There are a lot of reasons why this is a better approach, a few of them being that 1) JavaScript already has DOM object support, which is much easier to use than regex, and 2) the language of matched open/close tags (similar to matched parens/brackets/etc.) is not a regular language.
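
As a rough sketch of the DOM approach described above (not a definitive implementation), the helper below rewrites the body of every script element. The name processAllScripts is hypothetical, doc is assumed to be any object that implements the DOM Document interface (for example, document in a browser), and processScript is assumed to be the asker's function, taking and returning a string.

```javascript
// Hypothetical helper: apply processScript to the text of every <script>
// element found in `doc`.
// Assumptions: `doc` implements the DOM Document interface and
// `processScript` maps a string to a string.
function processAllScripts(doc, processScript) {
  // getElementsByTagName returns a live collection, so copy it into an
  // array before modifying the elements.
  const scripts = Array.from(doc.getElementsByTagName('script'));
  for (const script of scripts) {
    // textContent is the raw text between <script> and </script>.
    script.textContent = processScript(script.textContent);
  }
  // Return how many script elements were processed.
  return scripts.length;
}
```

In a browser this could be called as processAllScripts(document, processScript), after which document.documentElement.outerHTML gives the modified markup. Note that the serialized result reflects the parser's normalized tree, so it may not be byte-for-byte identical to the original file outside the script bodies. Outside a browser, a separate DOM implementation would have to supply doc.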

eldarerathis
w3schools is nothing to do with the W3C.
Tim Down
Bleh, that was what I meant. Thanks for catching that.
eldarerathis
What would I do with HTML pages that don't exactly follow the XML rules? Would it still work? I am running my JS script outside a browser.
How are they not "following the rules"? Do you mean that they are not valid XML/HTML? If you search SO for your question, you'll find lots of posts that explain ways to parse HTML without using regex, and possibly one that fits your specific situation.
eldarerathis