Whenever I see a problem that would be shared by others, with a solution that would be fun to implement, it usually turns out to have been solved already. I think it's best to stop myself and do a search before I dive into the coding.

Here's the situation: you can copy and paste sections of an Office document into the Visual Studio HTML editor. The problem is that it creates HTML that looks like this:

<tr style="mso-yfti-irow:0;mso-yfti-firstrow:yes">
                <td style="border:solid windowtext 1.0pt;mso-border-alt:solid windowtext .5pt;
   padding:0cm 5.4pt 0cm 5.4pt" valign="top">
                    <p align="left" class="MsoNormal" 
                        style="text-align:left;tab-stops:center 216.0pt right 432.0pt">
                        <b style="mso-bidi-font-weight:normal"><span lang="EN-US">ID<o:p></o:p></span></b></p>
                </td>
                <td style="border:solid windowtext 1.0pt;border-left:none;
   mso-border-left-alt:solid windowtext .5pt;mso-border-alt:solid windowtext .5pt;
   padding:0cm 5.4pt 0cm 5.4pt" valign="top">

That's fine for a machine, but it's not really human-readable. I bet it could be cleaned up by finding the repeated inline styles and creating CSS classes out of them. A computer program could do that really easily.
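To make the idea concrete, here is a minimal sketch in Python of the "find repeated styles and turn them into classes" step. It is an illustration, not an existing tool: the function names (normalize, extract_classes) and the generated class names (s1, s2, ...) are made up for this example. It uses a regular expression rather than a real HTML parser, and it does not merge the new class with a class attribute the element already has (such as class="MsoNormal"), so treat it as a starting point only.

```python
import re
from collections import Counter

# Matches a double-quoted style attribute, including ones that span lines.
STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"')

def normalize(style):
    """Collapse whitespace so equivalent declaration lists compare equal."""
    parts = [" ".join(p.split()) for p in style.split(";")]
    return ";".join(p for p in parts if p)

def extract_classes(html, min_count=2):
    """Return (css, new_html), replacing repeated inline styles with classes.

    Styles that occur fewer than min_count times are left inline.
    """
    counts = Counter(normalize(m.group(1)) for m in STYLE_ATTR.finditer(html))

    # Hypothetical naming scheme: s1, s2, ... in order of frequency.
    names = {}
    for style, n in counts.most_common():
        if n >= min_count:
            names[style] = f"s{len(names) + 1}"

    def swap(match):
        name = names.get(normalize(match.group(1)))
        return f' class="{name}"' if name else match.group(0)

    css = "\n".join(f".{name} {{ {style} }}" for style, name in names.items())
    return css, STYLE_ATTR.sub(swap, html)
```

Calling extract_classes(pasted_html) returns the generated CSS rules and the rewritten markup as two strings; the CSS would go into a style block or a separate stylesheet.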

I could run this program, and then I would have nice-looking, easy to maintain HTML that looks just like my Word document.

(Yes, I know I can just edit my Word document and then copy and paste it into HTML again, or just save it as an HTML file. But that wouldn't be the same as hand-editing the HTML after the fact.)

Anyway, does anyone know of a program that does this?


(later edit) I discovered the question I asked is a duplicate of this one.

+5  A: 

HTML Tidy does this! It also integrates with common text editors (such as Notepad++ or UltraEdit) and provides an option to clean up Office web markup. You will need to set the word-2000 boolean option to yes (true).
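As a rough illustration of setting that option, the sketch below builds a Tidy command line from Python and runs it only if a tidy executable is found on the PATH. The helper names (tidy_command, run_tidy) and the file names are hypothetical, and it assumes your Tidy build still accepts the word-2000 option on the command line as --word-2000 yes.

```python
import shutil
import subprocess

def tidy_command(src, dst):
    """Build an argv list asking Tidy to clean Word markup from src into dst."""
    return ["tidy", "--word-2000", "yes", "-o", dst, src]

def run_tidy(src, dst):
    """Run Tidy if it is installed; return its exit code, or None if missing."""
    if shutil.which("tidy") is None:
        return None
    # Tidy conventionally exits with 0 (clean), 1 (warnings) or 2 (errors).
    return subprocess.run(tidy_command(src, dst)).returncode
```

For example, run_tidy("pasted.html", "clean.html") would write the cleaned markup to clean.html. The same option can instead be placed in a Tidy configuration file as the line word-2000: yes.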

Additionally, Jeff Atwood has blogged about this problem and presented his own C# 2.0 solution in this article.

Cerebrus
+3  A: 

I would try using HTML Tidy: http://tidy.sourceforge.net/. Another option is to paste your Word document into TinyMCE and then save the resulting HTML.

Michal Rogozinski
A: 

You might want to seriously consider Paste as Plain Text as your simplifier tool. Weigh up how long it would take you to reapply the markup by hand; you might find it's less painful than you think.

CurtainDog