Hi,

I need a way to strip the incompatible characters out of the .doc/.docx files my clients send me with their content. Right now I've been relying on find/replace, which is really annoying.

How do you take your .doc/.docx files and convert them into HTML-friendly text?

Thank you!


Here are some of the special characters that are messing things up:

” - “ - ’

There are some other characters that I cannot seem to filter out, including some kind of space character (maybe a tab).
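The characters listed above are Word's "smart" punctuation: curly double quotes (U+201C, U+201D) and the curly apostrophe (U+2019). The mystery space is assumed here to be a non-breaking space (U+00A0) or a tab, which is a guess. A minimal sketch, assuming plain ASCII replacements are acceptable for your site, does all the find/replace in one pass and then HTML-escapes the result. `WORD_CHAR_MAP` and `to_html_friendly` are hypothetical names, and the map is not an exhaustive list of what Word can produce.

```python
import html

# Hypothetical, non-exhaustive map of Word's typographic characters to ASCII.
WORD_CHAR_MAP = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space (assumed to be the odd space)
    "\t": " ",        # tab
}
TRANSLATION = str.maketrans(WORD_CHAR_MAP)


def to_html_friendly(text: str) -> str:
    """Replace Word punctuation with ASCII, then escape <, > and &."""
    return html.escape(text.translate(TRANSLATION), quote=False)


print(to_html_friendly("\u201cHello\u201d \u2013 it\u2019s\u00a0<b>"))
```

If you would rather keep the curly quotes, `text.encode("ascii", "xmlcharrefreplace").decode("ascii")` turns every non-ASCII character into a numeric reference such as `&#8220;` instead of replacing it.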


Another big hang-up is lists: converting copy that contains a list is nothing short of tedious.
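Once the text has been pasted as plain text, the list markup can be rebuilt mechanically. The following is a rough sketch. It assumes each bulleted item arrives as its own line starting with a bullet glyph ("•", "·", "-" or "*") followed by whitespace, which is typical of a plain-text paste but not guaranteed. Numbered and nested lists are not handled. `BULLET_RE` and `text_to_html` are hypothetical names.

```python
import html
import re

# Assumption: a list item is a line starting with a bullet glyph plus whitespace.
BULLET_RE = re.compile(r"^\s*[\u2022\u00b7\-*]\s+(.*)$")


def text_to_html(text: str) -> str:
    """Wrap runs of bullet lines in <ul>/<li>; other non-blank lines become <p>."""
    out = []
    in_list = False
    for line in text.splitlines():
        match = BULLET_RE.match(line)
        if match:
            if not in_list:  # first item of a new list
                out.append("<ul>")
                in_list = True
            out.append(f"  <li>{html.escape(match.group(1).strip())}</li>")
            continue
        if in_list:  # any non-bullet line closes the current list
            out.append("</ul>")
            in_list = False
        if line.strip():
            out.append(f"<p>{html.escape(line.strip())}</p>")
    if in_list:  # close a list that runs to the end of the text
        out.append("</ul>")
    return "\n".join(out)


print(text_to_html("Intro\n\u2022\tFirst\n\u2022\tSecond\nOutro"))
```

A blank line between bullets ends the list, so a list with blank lines between its items comes out as several one-item lists. You can adjust that rule if your clients' documents are formatted that way.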

A: 

If you only need to do this once or twice, you could send a copy of the documents to a Gmail account, which can display Word documents as HTML. When you have an attachment in Gmail, click "view," and then at the top of the new page, click "plain HTML." This is a bit cumbersome if you need to do it often, but it works well in a pinch.

Wade Tandy
A: 

Are you talking about the tags you get when you export a document as a web page from Word, or about what happens when you open the document in Word, copy it, and paste it into a web form field (say, a textarea)?

If it is copy and paste, try pasting into an interim plain-text editor (Notepad, etc.) first; that will usually get rid of the extraneous garbage. If it is an export, Dreamweaver (though you may not have it) used to have a "Clean up Word markup" utility.
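If you have neither tool, the same idea can be approximated with Python's standard `html.parser`. This is a minimal sketch, not a reproduction of Dreamweaver's utility. It rebuilds the exported HTML, keeping only a whitelist of tags and dropping every attribute except `href` on links. That removes Word's `class="Mso..."` and `style="mso-..."` attributes, Office-namespaced tags such as `<o:p>`, comments (including Word's `<!--[if gte mso 9]>` blocks), and the contents of `<head>`/`<style>`. The tag sets and the names `WordHtmlCleaner` and `clean_word_html` are assumptions for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist; extend it to whatever markup your site allows.
ALLOWED_TAGS = {"p", "br", "ul", "ol", "li", "b", "strong", "i", "em",
                "a", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED_TAGS = {"head", "style", "script", "xml"}  # drop these and their contents
VOID_TAGS = {"br"}                                 # never emit a closing tag


class WordHtmlCleaner(HTMLParser):
    """Re-emit only whitelisted tags, without Word's attributes or comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIPPED_TAGS element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            # Keep only href on links; class, style, lang, etc. are discarded.
            href = dict(attrs).get("href") if tag == "a" else None
            self.out.append(f'<a href="{escape(href)}">' if href else f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))

    # handle_comment is left as the default no-op, so comments are dropped.


def clean_word_html(markup: str) -> str:
    parser = WordHtmlCleaner()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(clean_word_html('<p class=MsoNormal style="mso-line-height:normal">'
                      'Hello <b>world</b><o:p></o:p></p>'))
```

A whitelist is used instead of a list of "bad" Word tags because it is easier to be sure about what comes out than to enumerate everything Word might emit.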

There is also a free utility from Microsoft for this: "The Office HTML Filter is a tool you can use to remove Office-specific markup tags embedded in Office 2000 documents saved as HTML." OK, Office 2000 is a bit out of date, but a) it might still work, and b) it's a starting point for finding something else.

PurplePilot