
Hi,

I'm looking for a regular expression that will filter out:

  • JavaScript: <script></script> tags and everything between them
  • the JavaScript in between also contains an iframe and a hostads.cn URL

Thanks. I plan to use that regexp in a simple bash script that will remove this part of the code from the files in the directory.
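For reference, here is a naive regex sketch of what the question asks for. It is written in Python rather than bash because sed works line by line by default, which makes multi-line matches awkward. It assumes the injected code is a <script>...</script> block whose body mentions hostads.cn; INJECTED_SCRIPT and strip_injected_scripts are hypothetical names. The pattern will miss variants such as "</script >" with a space, which is exactly the fragility the answer warns about.

```python
import re

# Naive pattern (assumption: the injected block's body mentions "hostads.cn").
# (?:(?!</script>).)*? advances one character at a time but never past a
# closing </script>, so one match cannot swallow neighbouring script blocks.
# DOTALL lets "." cross newlines; IGNORECASE also catches <SCRIPT>.
INJECTED_SCRIPT = re.compile(
    r"<script\b[^>]*>"
    r"(?:(?!</script>).)*?hostads\.cn(?:(?!</script>).)*?"
    r"</script>",
    re.IGNORECASE | re.DOTALL,
)


def strip_injected_scripts(html: str) -> str:
    """Hypothetical helper: drop every matching <script> block from html."""
    return INJECTED_SCRIPT.sub("", html)


if __name__ == "__main__":
    page = (
        "<p>ok</p><script>var a = 1;</script>"
        "<script>document.write('<iframe src=\"http://hostads.cn/x\"></iframe>');</script>"
        "<p>end</p>"
    )
    # prints <p>ok</p><script>var a = 1;</script><p>end</p>
    print(strip_injected_scripts(page))
```

The unrelated script survives because the match is not allowed to cross its closing tag.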

A: 

Regexes aren't well suited to parsing HTML: they are hard to get right, easy to break, and generally neither efficient nor accurate. Instead, consider parsing each file as HTML and then explicitly searching for and removing the elements you want to filter. Also be aware that if you are doing this filtering for security reasons, malicious scripts can still sneak past this type of filter.
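A minimal sketch of the parse-then-remove approach, using Python's standard-library html.parser tokenizer. Assumptions: the files are HTML, the injected blocks are <script> elements whose tag or body mentions hostads.cn, and ScriptFilter, filter_html and BLOCKED_HOST are illustrative names. Re-serialization is approximate: end tags come back lowercased, and constructs the class does not handle (processing instructions, <![CDATA[...]]> sections) are dropped.

```python
from html.parser import HTMLParser

# Assumption: the injected code references this host (taken from the question).
BLOCKED_HOST = "hostads.cn"


class ScriptFilter(HTMLParser):
    """Re-emit HTML while dropping <script> elements that mention BLOCKED_HOST.

    Illustrative sketch built on the standard-library tokenizer, not a DOM.
    """

    def __init__(self):
        # Keep entity references such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.script = None  # pieces of the currently open <script>, or None

    def _emit(self, text):
        # Text inside an open <script> is buffered until its end tag is seen.
        target = self.script if self.script is not None else self.out
        target.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script = [self.get_starttag_text()]
        else:
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are copied through unchanged.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script" and self.script is not None:
            block = "".join(self.script) + "</script>"
            self.script = None
            # Keep the element only if neither its start tag nor its body mentions the host.
            if BLOCKED_HOST not in block:
                self.out.append(block)
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def filter_html(html: str) -> str:
    """Hypothetical helper: return html with the blocked <script> elements removed."""
    parser = ScriptFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

To mirror the bash workflow from the question, one could read each file in the directory, pass its text through filter_html, and write the result back. Keeping a backup first is prudent, since the output is not byte-identical to the input.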

SimpleCoder