I want a user to be able to upload a Word document and have my program split it into separate Word documents. The problem is that the split points have to be marked manually, because the documents are not all formatted the same way. My initial thought is that, before uploading, the user tags each section with a beginning and an end tag (of some sort, maybe a comment) that my program can then parse to split the document into separate documents. (This also needs to work for both .doc and .docx, so a common solution is desirable.)

Ex. Input:

Doc1

Chapter 1

Blah Blah Blah

Chapter 2

Blah blah

/end Doc1

Ex. Output:

Doc1

Chapter 1

Blah Blah Blah

/end Doc1

Doc2

Chapter 2

Blah blah

/end Doc2
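To make the tagging idea concrete, here is a rough sketch of the splitting step on its own, assuming the paragraph texts have already been extracted from the document somehow. The marker format (`Doc1` ... `/end Doc1`) and the function name `split_by_markers` are only assumptions for illustration, not an existing API.

```python
import re

# Hypothetical marker format: a paragraph "Doc1" opens a section,
# and a paragraph "/end Doc1" closes it.
START = re.compile(r"^Doc\s*(\d+)$")
END = re.compile(r"^/end\s+Doc\s*(\d+)$")


def split_by_markers(paragraphs):
    """Group paragraph texts into sections keyed by the marker number."""
    sections = {}
    current = None  # number of the section currently open, if any
    for text in paragraphs:
        line = text.strip()
        start = START.match(line)
        end = END.match(line)
        if start and current is None:
            # Opening marker: begin collecting a new section.
            current = start.group(1)
            sections[current] = []
        elif end and current is not None and end.group(1) == current:
            # Matching closing marker: stop collecting.
            current = None
        elif current is not None:
            # Ordinary paragraph inside an open section.
            sections[current].append(text)
        # Paragraphs outside any marker pair are ignored.
    return sections
```

Each value in the returned dictionary would then be written out as its own document. This only deals with plain text; carrying formatting over to the new documents is the hard part.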

Any ideas? I have been struggling with this for a while.

A: 

I'd say your best bet is to investigate VSTO or VBA macros to accomplish this. Both will give you full access to the object model, whatever version the document is.

No Refunds No Returns
+3  A: 

What you want to do is non-trivial! I have done my fair share of document manipulation. That said, if you are working with DOCX these days, it is not too bad thanks to the supporting libraries; see:

http://openxmldeveloper.org/
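As a rough illustration of why DOCX is the easier case: a .docx file is a ZIP package, and the main body is conventionally stored as XML in `word/document.xml`. The sketch below pulls out paragraph text using only the Python standard library (Python is used here just for illustration; `read_paragraphs` is a made-up helper name). It assumes the main part sits at that conventional path and it ignores formatting entirely.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by <w:p> (paragraph) and <w:t> (text run).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_paragraphs(docx_file):
    """Return the plain text of each paragraph in a .docx path or file-like object."""
    with zipfile.ZipFile(docx_file) as archive:
        # Assumption: the main document part is at the conventional path.
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # A paragraph's text may be spread over several runs; join them.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return paragraphs
```

This does not help with the binary .doc format, and a real splitter would have to copy styles, numbering, and images along with the text, which is where a proper library earns its keep.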

Older versions are more difficult: you would need to source a library for them or, as suggested, use macros.

Is the "program" a web site? If so, make sure you do not use COM interop!

Paul Kohler
Yes, it's a website.
Holograham
A: 

Something that may help is HTML Transit. It's incredibly old and incredibly expensive software, and from an initial search it may not be supported anymore. But it did have the ability to take one Word document and split it up into smaller pieces (of course, it converted it to HTML as well). Something to look into, maybe. Google "HTML Transit" for more research and a free demo.

bryanjonker
A: 

I've had great success with Aspose.Words for document manipulation and generation.

John L.