I want to copy and paste text from an Office 2007 document (docx) into a textarea. On Windows, using Firefox 3, additional gibberish gets put into the field:

...Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Normal 
0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Normal 0 false 
false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 <!--[if gte mso 9]>...

This seems to be the style information and conditional comments from the newer document structure. Any ideas on how to parse this out, or prevent it from happening? Possibilities are JavaScript on the front end, or Java on the back end.

A: 

I find the easiest way to eliminate this random gibberish is to copy the text you want, paste it into Notepad or a similar plain-text editor, copy it from Notepad, then paste it into the field.

Also, running it through a script or application that strips out the "smart" quotes and em/en dashes isn't a bad idea either.
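
If you go the script route on the Java side, something like the following could be a starting point. It is only a rough sketch: the class name WordPasteCleaner and the method clean are made up for illustration, and it assumes the conditional comments arrive intact in the form `<!--[if ...]> ... <![endif]-->`. It does not try to catch the bare "Normal 0 false false false ..." run if that shows up outside a comment.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class WordPasteCleaner {

    // Assumption: Office conditional comments look like
    // <!--[if gte mso 9]> ... <![endif]--> and may span several lines.
    private static final Pattern CONDITIONAL_COMMENT =
            Pattern.compile("<!--\\[if[^\\]]*\\]>.*?<!\\[endif\\]-->", Pattern.DOTALL);

    private WordPasteCleaner() {
    }

    static String clean(String input) {
        // Drop each conditional comment together with everything inside it.
        String s = CONDITIONAL_COMMENT.matcher(input).replaceAll("");

        // Replace "smart" punctuation with plain ASCII equivalents.
        s = s.replace('\u2018', '\'')   // left single quote
             .replace('\u2019', '\'')   // right single quote / apostrophe
             .replace('\u201C', '"')    // left double quote
             .replace('\u201D', '"')    // right double quote
             .replace("\u2013", "-")    // en dash
             .replace("\u2014", "--");  // em dash

        return s.trim();
    }
}
```

You would call WordPasteCleaner.clean(...) on the submitted textarea value before storing it. A regex is used here only because the input is plain text with comment-like debris in it, not a full HTML document; if you accept real HTML, a proper sanitizer is probably the safer choice.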

Lincoln Johnson
This is not what the question asks. You can't expect the user of the web app to do the copy-pasting through Notepad.
Alex
A: 

There are third-party tools that will strip out the extraneous 'Microsoft creep' stuff. You can even register some on the server and use them in your own code ('clean the crap' button, anyone?)

Kolten
Which third party tool(s) are you referring to?
edosoft
A: 

Similar to Lincoln's idea, you can use PureText to automate the process. Basically, you press its hotkey instead of Ctrl+V (I have mine set to Win+V), and it pastes the plain text version of whatever is on your clipboard. I'm not sure if that will remove the extra data that Office has added, but it's worth a try.

Andy