I need an automated process for creating docx files from xhtml source. The xhtml files contain images (<img>
elements) whose "src" attributes point to an external reference. But the docx files need to be readable without a network connection, so I need to find a way to embed the images directly into the docx package (namely, in the /media folder).
So far I've used the altChunk method (as described by Eric White) to create the .docx file. I had hoped to use the OpenXML SDK to insert the image parts into the package. But to do that I need to insert paragraphs (<p>
nodes) into the document. Unfortunately the document part contains nothing but a reference to the altChunk (stored separately in the docx package). Of course, once the docx is opened, edited and saved, the altChunk part is removed and it’s contents are embedded properly in the document.xml. But I don’t know of any way to do that programatically, so that doesn't help.
Other options I’ve considered:
- Partitioning the xhtml into segments, separated between each image, then adding each altChunk one at a time, with the appropriate image reference between each one. (Tedious but seems possible)
- Inserting the images into the media folder, and then find way to embed WordProcessingML directly into the xhtml so that the
<img>
references the packaged image file. (Questionable at best) Can anyone think of a better approach?