I have a web application that is developed and ready to be deployed. The web part of it was designed using Microsoft FrontPage. None of the developers cared about the weird proprietary tags that FrontPage inserts into the HTML. I don't remember the tags off the top of my head, but I remember seeing tags such as <webbot>, among others. Now my client doesn't want a bunch of useless tags obscuring the HTML when they view the source. This is not good from an application maintenance perspective either.
I tried googling for tools that would remove these tags from the HTML without unknown side effects, and I haven't found anything useful. Has anyone dealt with this kind of problem before? If so, did you use a tool for it, or did you write your own regex-based replace utility?
Please share your thoughts on this.
The final solution to this problem is:
Do not use FrontPage!
I think the reason you aren't finding any conversion tools is that almost every developer who cared enough to filter out the MS-specific tags has moved on to another editor.
If it is important enough for your client that the source looks reasonably clean, it should definitely be important enough for your fellow developers.
For an online solution, you should check out Webmaster-toolkit's Frontpage Code Cleaner.
You can remove the FP proprietary tags yourself. I used my own regex to remove the opening and closing garbage tags: <\/?xx[^>]*>
Replace 'xx' with the name of the tag you are removing.
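A minimal Python sketch of that regex approach. The function name `strip_tags` and the sample markup are illustrative, not from the original answer. It adds a lookahead so the tag name only matches as a whole name (the bare pattern would also hit something like `<xxfoo>`), and it removes only the tags themselves, leaving their inner content in place. FrontPage also writes some bots as HTML comments (for example `<!--webbot bot="Include" ... -->`), which this pattern does not touch. Like any regex over HTML, it can misfire on attribute values that contain `>`, so diff the output before deploying.

```python
import re

def strip_tags(html: str, tag: str) -> str:
    """Remove opening, closing and self-closing <tag ...> markup, keeping inner content.

    `tag` plays the role of 'xx' in the regex above (hypothetical helper).
    Matching is case-insensitive, and the lookahead requires whitespace, '/'
    or '>' right after the name so 'webbot' does not also match '<webbotx>'.
    """
    pattern = re.compile(
        r"</?" + re.escape(tag) + r"(?=[\s/>])[^>]*>",
        re.IGNORECASE,
    )
    return pattern.sub("", html)
```

To clean several tag names, call it once per name, e.g. `for t in ("webbot", "xml"): html = strip_tags(html, t)`.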
Are you breaking away from FrontPage entirely? If the site is edited in FrontPage's Page view again, FP will put the tags back.
Also, FP likes to control everything and writes a _vti_cnf file for each file it uploads. It gets testy if you upload via FTP with a program other than FP and that file is missing (especially if you are using FrontPage Server Extensions).
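If you are moving off FrontPage and its server extensions entirely, the leftover metadata folders can be cleaned out before deployment. A sketch, assuming every FrontPage folder you want gone is named `_vti_*` (such as `_vti_cnf`); `find_vti_dirs` and `remove_vti_dirs` are hypothetical helpers. It defaults to a dry run so you can review the list first. Do not run it against a site that is still published through FP, for the reason given above.

```python
import shutil
from pathlib import Path

def find_vti_dirs(site_root: str) -> list[Path]:
    """List the outermost folders named _vti_* under site_root (hypothetical helper).

    Assumption: all FrontPage metadata folders you want removed follow the _vti_* naming.
    """
    root = Path(site_root)
    found = []
    for path in sorted(root.rglob("_vti_*")):
        if not path.is_dir():
            continue  # ignore ordinary files whose names happen to start with _vti_
        # Skip folders nested inside another _vti_* folder; removing the outer one covers them.
        if any(part.startswith("_vti_") for part in path.relative_to(root).parts[:-1]):
            continue
        found.append(path)
    return found

def remove_vti_dirs(site_root: str, dry_run: bool = True) -> list[Path]:
    """Delete the folders found by find_vti_dirs; with dry_run=True only report them."""
    targets = find_vti_dirs(site_root)
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets
```

Run `remove_vti_dirs("path/to/site")` first to see what would go, then repeat with `dry_run=False`.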
Make sure you add a DOCTYPE; I don't think FP does that automatically.
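A small sketch for adding a DOCTYPE to pages that lack one. The HTML 4.01 Transitional declaration is an assumption (pick the one your markup actually targets), and `ensure_doctype` is a hypothetical helper. Adding a DOCTYPE can move browsers out of quirks mode, so check the page layout afterwards.

```python
import re

# Assumption: HTML 4.01 Transitional suits legacy FrontPage markup; swap in your own DOCTYPE.
DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
           '"http://www.w3.org/TR/html4/loose.dtd">')

def ensure_doctype(html: str, doctype: str = DOCTYPE) -> str:
    """Prepend `doctype` unless the page already starts with one (hypothetical helper)."""
    # Keep a leading byte-order mark at the very front of the file.
    bom = "\ufeff" if html.startswith("\ufeff") else ""
    body = html[len(bom):]
    # Leading whitespace is allowed before an existing DOCTYPE; the keyword is case-insensitive.
    if re.match(r"\s*<!doctype", body, re.IGNORECASE):
        return html
    return bom + doctype + "\n" + body
```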
HTML Tidy will do a wonderful job of cleaning up just about any mess you can find.
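One way to batch this is to call the `tidy` command-line program from a script. The option set below is only an example (`-q`, `--clean`, `--drop-proprietary-attributes`, `--indent`, `--wrap` and `-m` are standard Tidy options), and `tidy_command` and `run_tidy` are hypothetical helpers. Tidy may treat unknown elements such as `<webbot>` as errors and refuse to write output, so stripping those first with a regex, or adding `--force-output yes`, may be necessary.

```python
import shutil
import subprocess

def tidy_command(path: str, in_place: bool = False) -> list[str]:
    """Build an HTML Tidy command line for one file (example options, hypothetical helper)."""
    cmd = [
        "tidy",
        "-q",                                    # quiet: suppress non-essential messages
        "--clean", "yes",                        # replace legacy presentational tags such as <font> with CSS
        "--drop-proprietary-attributes", "yes",  # strip vendor-specific attributes
        "--indent", "auto",                      # indent block content where it helps readability
        "--wrap", "0",                           # do not hard-wrap long lines
    ]
    if in_place:
        cmd.append("-m")                         # overwrite the input file instead of printing
    cmd.append(path)
    return cmd

def run_tidy(path: str, in_place: bool = False) -> str:
    """Run Tidy on one file and return the tidied HTML (empty when in_place=True)."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("HTML Tidy ('tidy') is not on PATH")
    result = subprocess.run(tidy_command(path, in_place), capture_output=True, text=True)
    # Tidy exits with 0 (no issues), 1 (warnings only) or 2 (errors).
    if result.returncode == 2:
        raise RuntimeError(result.stderr)
    return result.stdout
```

Keeping the command builder separate from the call makes it easy to review or tweak the options before touching any files.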