Best way to find illegal characters in a bunch of ISO-8859-1 web pages?

I have a bunch of HTML files in a site that were created in the year 2000 and have been maintained to this day. We've recently begun an effort to replace illegal characters with their HTML entities. Going page by page looking for copyright symbols and trademark signs seems like quite a chore. Do any of you know of an app that will take a bunch of HTML files and tell me where I need to replace illegal characters with HTML entities?

Any good text editor will do a file contents search for you and return a list of matches.

I do this with EditPlus, but several other editors, such as Notepad++ and TextPad, can do it just as easily.

You do not have to open the files. Specify the path where the files are stored, the file mask (*.html), and the text to search for (for example "©"). The editor comes back with a list of matches, and double-clicking a match opens the file at the matching line.

Raj More 2009-11-04 15:47:40

This is true, but I want a way to do it without opening 200+ files. Thanks for replying though.

wwilkins 2009-11-04 15:54:48

@wwilkins: answer edited

Raj More 2009-11-04 16:23:22

You could write a PHP script (if you can; if not, I'd be happy to help), but I assume you already converted some of the "special characters", so that does make the task a little harder (although I still think it's possible)...

Franz 2009-11-04 15:52:49

**EDIT**: Fixed something not making sense ;)

Franz 2009-11-04 15:57:41

Thanks for responding. Yes, I could write a program to solve this, but I have a gut feeling that somebody else already has. One line of thought was to send all the files through the w3.org validation utility and catch all of the encoding errors, but if a solution already exists, even _that_ is too much code.

wwilkins 2009-11-04 16:05:16
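
For what it's worth, the "write a script" idea from this thread does not need much code. Below is a minimal sketch in Python rather than PHP, assuming the files really are ISO-8859-1 and that "illegal" means any character outside plain ASCII. The function names `find_non_ascii` and `scan_tree` are made up for this example. It reports each hit as file, line, and column, and suggests a named entity where the standard library knows one, falling back to a numeric reference otherwise.

```python
import html.entities
import sys
from pathlib import Path


def find_non_ascii(text):
    """Yield (line_no, col_no, char, entity) for each non-ASCII character."""
    # Split on "\n" only: str.splitlines() would also break lines on
    # characters such as U+0085, which can appear in mislabeled files.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # Use the HTML 4 named entity if one exists, else a numeric one.
                name = html.entities.codepoint2name.get(ord(ch))
                entity = f"&{name};" if name else f"&#{ord(ch)};"
                yield line_no, col_no, ch, entity


def scan_tree(root, pattern="*.html"):
    """Print every non-ASCII character found in matching files under root."""
    for path in sorted(Path(root).rglob(pattern)):
        # Assumption: files are ISO-8859-1. Decoding with this codec never
        # fails, because every byte value maps to a code point.
        text = path.read_bytes().decode("iso-8859-1")
        for line_no, col_no, ch, entity in find_non_ascii(text):
            print(f"{path}:{line_no}:{col_no}: U+{ord(ch):04X} -> {entity}")


if __name__ == "__main__":
    # Usage: python scan.py /path/to/site   (defaults to the current directory)
    scan_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
```

One caveat: hits in the range U+0080 to U+009F usually mean the file was actually saved as Windows-1252 (that is where the trademark sign lives in that encoding), so the suggested numeric reference for those should be checked by hand rather than pasted in.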
