How can I replace a string/word in a Word Document via ASP.NET? I just need to replace a couple words in the document, so I would like to stay AWAY from 3rd party plugins & interop. I would like to do this by opening the file and replacing the text.

The following attempts were made:

I created a StreamReader and a StreamWriter to read and rewrite the file, but I think I am reading and writing in the wrong format. I believe Word documents are stored as binary. If so, how would I read and write the file in binary?

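    ' Requires: Imports System.IO
    Dim template As String = Request.MapPath("documentName.doc")
    If File.Exists(template) Then
        ' Read the whole file as text (this is the approach being questioned for a binary .doc)
        Dim content As String
        Using sr As New StreamReader(template)
            content = sr.ReadToEnd()
        End Using
        content = content.Replace("@ T O D A Y S D A T E", Date.Now.ToString("MM/dd/yyyy"))
        ' Write the modified text back over the original file
        Using sw As New StreamWriter(template)
            sw.Write(content)
        End Using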
    Else
A: 

If word documents are binary, how would I read and write the file in binary?

They are, and that's why you should use a third-party library to program against them.

I would like to stay AWAY from 3rd party plugins & interop

This requirement makes the task extremely hard. If your documents are in the old binary Word format (.doc), I would almost say that you are out of luck. If you can use Word 2007 documents (.docx) instead, you should be able to solve the problem by unzipping the file (it is essentially a ZIP archive), doing a search/replace in the contained XML files, and zipping the document up again.
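
The unzip / replace / re-zip idea for .docx can be sketched roughly as follows. This is only a minimal sketch under several assumptions that are not in the question: it targets .NET 4.5 or later (the `ZipFile` and `ZipArchive` classes, with references to System.IO.Compression and System.IO.Compression.FileSystem), the text lives in the main part `word/document.xml`, and the placeholder was typed in one go so Word has not split it across several XML runs. `ReplaceInDocx` and `DocxReplaceSketch` are hypothetical names.

```vbnet
Imports System.IO
Imports System.IO.Compression
Imports System.Security
Imports System.Text

Module DocxReplaceSketch
    ' Hypothetical helper: replaces a plain-text placeholder in the main
    ' document part of a .docx file, modifying the file in place.
    Sub ReplaceInDocx(docxPath As String, placeholder As String, replacement As String)
        Const partName As String = "word/document.xml"

        ' A .docx is a ZIP archive; Update mode allows reading and rewriting entries.
        Using archive As ZipArchive = ZipFile.Open(docxPath, ZipArchiveMode.Update)
            Dim entry As ZipArchiveEntry = archive.GetEntry(partName)
            If entry Is Nothing Then
                Throw New InvalidDataException(partName & " not found in " & docxPath)
            End If

            ' Read the XML of the main document part.
            Dim xml As String
            Using reader As New StreamReader(entry.Open(), Encoding.UTF8)
                xml = reader.ReadToEnd()
            End Using

            ' XML-escape the new text so characters like & or < keep the part well-formed.
            xml = xml.Replace(placeholder, SecurityElement.Escape(replacement))

            ' Replace the old entry with a new one holding the modified XML (no BOM).
            entry.Delete()
            Dim newEntry As ZipArchiveEntry = archive.CreateEntry(partName)
            Using writer As New StreamWriter(newEntry.Open(), New UTF8Encoding(False))
                writer.Write(xml)
            End Using
        End Using
    End Sub
End Module
```

A call might look like `ReplaceInDocx(Request.MapPath("documentName.docx"), "@TODAYSDATE", Date.Now.ToString("MM/dd/yyyy"))`, where the file name and placeholder are made up for illustration. Text in headers, footers, or other parts would need the same treatment on their own entries.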

See also: Generating a Word Document with C#

Jørn Schou-Rode
Thank you for your input. Unfortunately I am working with XP Word documents and cannot upgrade them to go the XML route (the company can't upgrade all of its XP Office installations to a newer version). I know for a fact that this can be done, and I created a program to do something similar back in the day using VB3.
jreedinc
@jReedInc, there exists a plugin to make Word/XP read and write DOCX.
Henk Holterman
@jreedinc: Of course it can be done, but it might just be very very hard. How much do you charge per hour? How much does Aspose Words cost? :)
Jørn Schou-Rode
$899 for Aspose, THEN learning the syntax and dealing with another animal (even more hours). Jørn Schou-Rode, have you done this before? I really do not think that it's hard at all. Like I said before, A LONG TIME AGO I wrote a simple script to edit a string in a doc. Maybe I should just dig through old hard drives...
jreedinc
A: 

You could perform Word automation on the server to do it easily, but that route is fraught with danger. Automation is not designed to run server-side, and you will find it regularly hangs when Word pops up a prompt or confirmation box waiting for input that nobody can see.

You have to make a trade-off: use Word automation and accept that it may hang fairly regularly (anything from daily to weekly), or buy a third-party solution. I use Aspose and it has solved a lot of problems.

Craig
+1  A: 

The Word binary format is proprietary to Microsoft. The specification is complex, and it will take you ages to learn the document structure and its internal bit- and byte-level layout. I really don't think you will save yourself any time going down this path, so consider the options below:

  • Use Open XML
  • Automate Word
  • Use a third-party library like Aspose
  • Use RTF rather than DOC. You can then look for the specific piece of RTF that contains your text and replace it with another block of RTF text. This is probably the simplest option for what you want to do, if RTF is an acceptable format.
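
As a rough illustration of the RTF option: RTF is a text-based format, so a plain string replace can work as long as the replacement is escaped for RTF. The sketch below rests on assumptions not stated above: the placeholder was typed in one go with uniform formatting (so Word did not split it with control words), and the replacement contains no line breaks. `EscapeRtf`, `ReplaceInRtf`, and `RtfReplaceSketch` are hypothetical names.

```vbnet
Imports System.IO
Imports System.Text

Module RtfReplaceSketch
    ' Hypothetical helper: escapes characters that have special meaning in RTF
    ' and writes non-ASCII characters as \uN? Unicode escapes.
    Function EscapeRtf(text As String) As String
        Dim sb As New StringBuilder()
        For Each ch As Char In text
            Dim code As Integer = AscW(ch)
            If ch = "\"c OrElse ch = "{"c OrElse ch = "}"c Then
                ' Backslash and braces are RTF syntax and must be prefixed with a backslash.
                sb.Append("\"c).Append(ch)
            ElseIf code < 128 Then
                sb.Append(ch)
            Else
                ' \u takes a signed 16-bit value; "?" is the fallback for older readers.
                If code > 32767 Then code -= 65536
                sb.Append("\u").Append(code).Append("?"c)
            End If
        Next
        Return sb.ToString()
    End Function

    ' Hypothetical helper: replaces a placeholder in an .rtf file, in place.
    Sub ReplaceInRtf(rtfPath As String, placeholder As String, replacement As String)
        ' ISO-8859-1 maps every byte to one character, so untouched content round-trips unchanged.
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim rtf As String = File.ReadAllText(rtfPath, latin1)
        rtf = rtf.Replace(placeholder, EscapeRtf(replacement))
        File.WriteAllText(rtfPath, rtf, latin1)
    End Sub
End Module
```

A call might look like `ReplaceInRtf(Request.MapPath("documentName.rtf"), "@TODAYSDATE", Date.Now.ToString("MM/dd/yyyy"))`, with the file name and placeholder made up for illustration. If the placeholder is not found, it is worth opening the .rtf in a text editor to check whether Word has inserted control words in the middle of it.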

From personal experience, automating Word isn't as bad as it sounds. It is really not suitable for a high-volume server environment, but for smaller loads it works well, provided you write your code carefully to manage the application object and handle exceptions.

EDITED: Corrected my initial comment about an NDA. That was the case when I worked on this back in 2005/6, and I didn't realize Microsoft had decided to publish the specification in recent years.

Fadrian Sudaman
I will use the RTF format. Thank you all for input.
jreedinc