I'm using this regex to find <script> tags:
<script (.|\n)*>(.|\n)*?</script>
The problem is, it matches the ENTIRE string below, not just each tag separately:
<script src="crap2.js"></script><script src="crap2.js"></script>
I don't think anything else needs to be said other than http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454.
You really would be better off using the DOM to process HTML for this reason and all sorts of others.
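To illustrate the parser route, here is a minimal sketch using Python's standard-library `html.parser`. The question doesn't say which language is in use, so Python is an assumption, and the names `ScriptCollector` and `find_scripts` are made up for this example.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the attributes of every <script> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag names in lowercase.
        if tag == "script":
            self.scripts.append(dict(attrs))


def find_scripts(html):
    """Return one attribute dict per <script> tag found in the markup."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.scripts


html = '<script src="crap2.js"></script><script src="crap2.js"></script>'
print(find_scripts(html))
```

Each tag is reported separately because the parser tracks where a tag ends, rather than guessing from the characters around it.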
Change your first * to *?
This makes the quantifier non-greedy (lazy), so it matches the smallest possible run of characters before the next '>'.
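A quick way to see the difference, sketched with Python's `re` module (the language is an assumption; the two patterns are the one from the question and the same one with the first `*` changed to `*?`):

```python
import re

html = '<script src="crap2.js"></script><script src="crap2.js"></script>'

# Pattern from the question: the first quantifier is greedy.
greedy = re.compile(r'<script (.|\n)*>(.|\n)*?</script>')
# Same pattern with the first * changed to *? (lazy).
lazy = re.compile(r'<script (.|\n)*?>(.|\n)*?</script>')

greedy_matches = [m.group(0) for m in greedy.finditer(html)]
lazy_matches = [m.group(0) for m in lazy.finditer(html)]

print(len(greedy_matches))  # one match spanning the whole string
print(len(lazy_matches))    # one match per tag
```

The greedy version runs to the end of the input and backtracks only as far as the last '>' that still lets the rest of the pattern match, which is why it swallows both tags.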
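Try excluding '>' from the tag's attributes and '<' from the content, so neither part can run past its own tag:
<script [^>]*>[^<]*</script>
Note that this will not match a script whose body contains a '<' character.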
<script[\s\S]*?>[\s\S]*?</script>
This matches most common situations, but it's very important to consider JS Bangs's answer.
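A small sketch of both sides of that, again assuming Python's `re` module. The commented-out `old.js` snippet is a made-up input, included only to show one case a regex can't tell apart from a real tag:

```python
import re

SCRIPT_RE = re.compile(r'<script[\s\S]*?>[\s\S]*?</script>')

# The input from the question: each tag is matched separately.
html = '<script src="crap2.js"></script><script src="crap2.js"></script>'
matches = SCRIPT_RE.findall(html)
print(matches)

# Hypothetical input: the script is inside an HTML comment, so a browser
# would ignore it, but the regex still reports it.
commented = '<!-- <script src="old.js"></script> -->'
false_positive = SCRIPT_RE.findall(commented)
print(false_positive)
```

The pattern has no notion of comments, CDATA, or string literals inside the script body, which is the kind of thing a real parser handles for you.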
I'll keep posting links to my previous answers until this question type has been wiped from this planet's surface (hopefully in 10 years or so): Don't use regular expressions for irregular languages like HTML or XML. Use a parser instead.
Also see this week's Coding Horror: Parsing Html The Cthulhu Way, inspired by the epic answer by @bobince that @JS Bangs links to.