Word is very "dirty" with its own markup. It can produce nested bold tags, empty bold tags and all kinds of other nastiness, depending on whether the user applied the built-in styles (Heading 1, Heading 2, etc.) or simply changed font sizes. Any tool that takes the Word document and tries to "convert" it to HTML will inherit the same markup problems.
The best approach is to record a macro in Word that performs several search-and-replace actions on the obvious things, such as em dashes, tabs, ellipses, etc.
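If you'd rather script this step outside Word, the same idea can be sketched in Python. This is not a Word macro: it assumes the document text has already been exported as a plain Python string, and the replacement targets (HTML entities for dashes, ellipses and curly quotes, a single space for tabs) are my own assumptions, so adjust them to taste. Ampersands are escaped first so the entities added afterwards don't get double-escaped.

```python
# Assumed replacement table: Word characters -> HTML-friendly equivalents.
# Order matters: "&" must come first, so the entities inserted later are
# not turned into "&amp;mdash;" etc. (dicts keep insertion order in 3.7+).
REPLACEMENTS = {
    "&": "&amp;",
    "\u2014": "&mdash;",   # em dash
    "\u2013": "&ndash;",   # en dash
    "\u2026": "&hellip;",  # ellipsis
    "\t": " ",             # tab -> single space (assumed target)
    "\u201c": "&ldquo;",   # left double quote
    "\u201d": "&rdquo;",   # right double quote
    "\u2018": "&lsquo;",   # left single quote
    "\u2019": "&rsquo;",   # right single quote / apostrophe
}


def replace_word_characters(text: str) -> str:
    """Apply each search-and-replace in turn, like a recorded macro would."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text


print(replace_word_characters("Q&A\u2026 a\u2014b\tc"))
```

Running each replacement over the whole text in sequence mirrors what the recorded macro does, one Find & Replace pass per character.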
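Next, replace double paragraph breaks (^p^p) with a placeholder (such as ~), then replace all remaining single breaks (^p) with a space, and finally replace ~ with </p>^p<p> to generate HTML paragraphs. Add an opening <p> at the very start of the document and a closing </p> at the very end, since the replacement only inserts tags between paragraphs.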
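Here is the same three-step paragraph trick sketched in Python, again assuming the text is already a plain string rather than a live Word document. Depending on how the text was exported, Word's ^p paragraph mark may show up as "\r" or "\r\n", so line endings are normalized to "\n" first. The "~" placeholder is assumed never to appear in the real text; pick another character if it does.

```python
PLACEHOLDER = "~"  # assumption: "~" does not occur anywhere in the document


def breaks_to_paragraphs(text: str) -> str:
    """Turn blank-line-separated text into <p> paragraphs."""
    # Normalize line endings so every paragraph mark is a single "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace("\n\n", PLACEHOLDER)       # ^p^p -> ~
    text = text.replace("\n", " ")                 # ^p   -> space
    text = text.replace(PLACEHOLDER, "</p>\n<p>")  # ~    -> </p>^p<p>
    # Open the first paragraph and close the last one.
    return "<p>" + text + "</p>"


print(breaks_to_paragraphs("Line one\nstill one\n\nPara two"))
```

The placeholder step matters: if you replaced single breaks with spaces first, the double breaks would be gone and you'd lose the paragraph boundaries.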
Then copy the entire document and paste it into Notepad to strip out Word's formatting and leave plain text, copy that into your HTML editor, and manually mark up the 10% that's left over, such as bold and italics, mismatched paragraph tags, etc.
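To help find that leftover 10%, a small checker can point you at likely trouble spots. This is a hypothetical helper (report_leftovers is my own name, not an existing tool): it flags any non-ASCII characters that survived, such as curly quotes you didn't replace, and compares the number of opening and closing paragraph tags. It only counts tags and does not check nesting, so treat its output as a list of places to look, not a validation.

```python
import re


def report_leftovers(html: str) -> list[str]:
    """List non-ASCII characters and a <p>/</p> count mismatch, if any."""
    problems = []
    for lineno, line in enumerate(html.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 127:
                problems.append(
                    f"line {lineno}: non-ASCII character {ch!r} (U+{ord(ch):04X})"
                )
    # "\b" stops "<p" from matching tags like <pre> or <param>.
    opens = len(re.findall(r"<p\b[^>]*>", html, flags=re.IGNORECASE))
    closes = len(re.findall(r"</p\s*>", html, flags=re.IGNORECASE))
    if opens != closes:
        problems.append(f"{opens} <p> tags but {closes} </p> tags")
    return problems


for problem in report_leftovers("<p>caf\u00e9</p>\n<p>open"):
    print(problem)
```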
Nothing will ever be as good as hand-coding, but this technique does most of the grunt work and leaves you with clean text to start from.