As already mentioned, it would be best to export the Word contents to a parsable format (either RTF or XML would do).
There might be a specific reason for using copy-and-paste to add the material to your CMS. With copy-and-paste, though, you will probably always end up needing a round of visual checking and fixing, unless you create a tool that monitors the clipboard.
When you copy from a recent version of Word, the clipboard holds the content in several different formats, and one of them is XML-based.
It would be possible to create a tool that cleans up the Word XML on the clipboard and then "sets" the text version (the one you probably paste into the CMS) to the cleaned-up result.
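To give an idea of what the clean-up step might look like, here is a minimal sketch in Python. It only covers the XML-to-text part, not the clipboard handling. It assumes the payload is WordprocessingML using the 2006 namespace found in .docx files; the XML that Word actually puts on the clipboard may use a different schema, so the namespace would have to be adjusted. The names word_xml_to_text and SAMPLE are made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the payload uses the WordprocessingML 2006 namespace (as in .docx).
# The format Word places on the clipboard may differ; adjust W_NS if so.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def word_xml_to_text(xml_payload: str) -> str:
    """Return the plain text of each non-empty paragraph, one per line."""
    root = ET.fromstring(xml_payload)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Text lives in w:t elements; formatting (w:rPr etc.) is simply ignored.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        # Join the runs, then collapse repeated whitespace.
        text = " ".join("".join(runs).split())
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


# Hypothetical sample: a bold run, a plain run, a padded paragraph, an empty one.
SAMPLE = (
    f'<w:body xmlns:w="{W_NS}">'
    "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Hello </w:t></w:r>"
    "<w:r><w:t>world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>  Second   paragraph </w:t></w:r></w:p>"
    "<w:p/>"
    "</w:body>"
)

if __name__ == "__main__":
    print(word_xml_to_text(SAMPLE))
```

The string this returns is what the tool would put back on the clipboard as the plain-text format. This sketch drops all formatting; if the CMS needs headings or emphasis preserved, the run properties would have to be mapped to whatever markup the CMS accepts.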
To build such a tool, you could use the Word interop assemblies that come with Office (Microsoft.Office.Interop.Word) together with the standard .NET clipboard functions available from C#. The tool could run in the background alongside Word while you add content to the CMS.