I have to perform a large number of text replacements in some Word documents, and I would like to automate the task. Several of the documents contain the same strings, so automation would save a lot of work. From what I have read so far, COM could be one way of doing this, but I don't know whether text replacement is supported. Is it possible to do this in Python? Could you post a code snippet showing how to access the document's text?

Thanks!

+2  A: 

Check out this link: http://python.net/crew/pirx/spam7/

The links on the left side point to the documentation.

You can generalize from there using the Word object model, which is documented here:

http://msdn.microsoft.com/en-us/library/kw65a0we(VS.80).aspx

Christopher
Thanks for the answer!
Geo
+2  A: 

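If this mailing list post is right, accessing the document's text is as simple as:

import win32com.client

MSWord = win32com.client.Dispatch("Word.Application")
MSWord.Visible = 0  # keep the Word window hidden
doc = MSWord.Documents.Open(filename)  # Open returns the Document object
docText = doc.Content.Text  # Content is a Range; .Text is its string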

Also see How to: Search for and Replace Text in Documents. The examples use VB and C#, but the basics should apply to Python too.
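For reading the text of several documents in a row, a small helper that always closes the document again may be handy. This is only a sketch: read_document_text is a hypothetical name, and word_app is assumed to be the object returned by win32com.client.Dispatch("Word.Application"). It relies on Documents.Open returning the Document and on the Range's Text property.

```python
def read_document_text(word_app, path):
    """Return the main body text of the Word document at `path`.

    `word_app` is assumed to be a Word.Application COM object
    (hypothetical helper, not part of pywin32 or Word).
    """
    # Open read-only, since we only want to look at the text.
    doc = word_app.Documents.Open(path, ReadOnly=True)
    try:
        # Content is a Range covering the main document body.
        return doc.Content.Text
    finally:
        # Nothing was changed, so close without saving.
        doc.Close(SaveChanges=False)

# Usage on Windows (assumes pywin32 and Word are installed):
#   import win32com.client
#   word = win32com.client.Dispatch("Word.Application")
#   print(read_document_text(word, r"C:\path\to\my\word.doc"))
#   word.Quit()
```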

Matthew Flaschen
Thanks for the answer!
Geo
+7  A: 

See if this gives you a start on Word automation using Python.

Once you have opened a document, you can run the following find-and-replace (VB macro code).
Afterwards, you can close the document and open another.

Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
    .Text = "test"
    .Replacement.Text = "test2"
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchCase = False
    .MatchWholeWord = False
    .MatchKashida = False
    .MatchDiacritics = False
    .MatchAlefHamza = False
    .MatchControl = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll

The above code replaces the text "test" with "test2" and does a "replace all".
You can turn other options true/false depending on what you need.

The simplest way to learn this is to record a macro with the actions you want to perform, look at the generated code, and use it in your own example (with or without modified parameters).

EDIT: After looking at some code by Matthew, you could do the following

MSWord.Documents.Open(filename)
Selection = MSWord.Selection

Then translate the VB code above to Python.
Note: the VB With block below is shorthand for setting several properties on the same object without repeating its name.

(VB)

With Selection.Find
    .Text = "test"
    .Replacement.Text = "test2"
End With

(Python)

find = Selection.Find
find.Text = "test"
find.Replacement.Text = "test2"

Pardon my limited Python knowledge, but I hope this gives you the idea.
Remember to save and close the document once the find/replace operation is done.

At the end, call MSWord.Quit() to release the Word object from memory.
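Putting the pieces together, the recorded macro might translate to Python roughly as follows. This is a sketch, not tested against every Word version: replace_all is a hypothetical helper name, selection is assumed to be MSWord.Selection, and the numeric values 1 and 2 for wdFindContinue and wdReplaceAll are the ones used in Adam Bernier's answer in this thread. Note that in Python, methods such as ClearFormatting need parentheses.

```python
# Numeric values of the Word constants used by the macro.
WD_FIND_CONTINUE = 1  # wdFindContinue
WD_REPLACE_ALL = 2    # wdReplaceAll

def replace_all(selection, old_text, new_text):
    """Replace every occurrence of old_text with new_text.

    `selection` is assumed to be a Word Selection COM object
    (hypothetical helper mirroring the recorded VB macro).
    """
    find = selection.Find  # plays the role of "With Selection.Find"
    find.ClearFormatting()
    find.Replacement.ClearFormatting()
    find.Text = old_text
    find.Replacement.Text = new_text
    find.Forward = True
    find.Wrap = WD_FIND_CONTINUE
    find.Format = False
    find.MatchCase = False
    find.MatchWholeWord = False
    find.MatchWildcards = False
    # VB: Selection.Find.Execute Replace:=wdReplaceAll
    return find.Execute(Replace=WD_REPLACE_ALL)

# Usage on Windows (assumes pywin32 and Word are installed):
#   MSWord.Documents.Open(filename)
#   replace_all(MSWord.Selection, "test", "test2")
```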

shahkalpesh
Thanks for the answer!
Geo
+2  A: 

I like the answers so far;
here's a tested example (slightly modified from here)
that replaces all occurrences of a string in a Word document:

import win32com.client

app = win32com.client.Dispatch("Word.Application")
app.Visible = 0
app.DisplayAlerts = 0

def search_replace_all(path, find_str, replace_str):
    ''' replace all occurrences of find_str in the document at path '''
    wdFindContinue = 1
    wdReplaceAll = 2

    app.Documents.Open(path)
    # expression.Execute(FindText, MatchCase, MatchWholeWord,
    #   MatchWildcards, MatchSoundsLike, MatchAllWordForms, Forward,
    #   Wrap, Format, ReplaceWith, Replace)
    app.Selection.Find.Execute(find_str, False, False, False, False, False,
        True, wdFindContinue, False, replace_str, wdReplaceAll)
    app.ActiveDocument.Close(SaveChanges=True)

f = 'c:/path/to/my/word.doc'
search_replace_all(f, 'string_to_be_replaced', 'replacement_str')

app.Quit()
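
Since the question mentions many replacements across several documents, the same call could be wrapped in a loop. The following is a sketch under assumptions: replace_in_documents is a hypothetical name, word_app is assumed to be the Word.Application object created above, and the search runs on doc.Content (the main body only, so headers and footers are likely not covered) instead of the Selection.

```python
WD_FIND_CONTINUE = 1  # wdFindContinue
WD_REPLACE_ALL = 2    # wdReplaceAll

def replace_in_documents(word_app, paths, replacements):
    """Apply each old -> new pair in `replacements` to every document in `paths`.

    `word_app` is assumed to be a Word.Application COM object;
    `replacements` is a dict mapping search strings to replacement strings.
    """
    for path in paths:
        doc = word_app.Documents.Open(path)
        ok = False
        try:
            for old, new in replacements.items():
                # Same positional arguments as in search_replace_all:
                # FindText, MatchCase, MatchWholeWord, MatchWildcards,
                # MatchSoundsLike, MatchAllWordForms, Forward, Wrap,
                # Format, ReplaceWith, Replace
                doc.Content.Find.Execute(old, False, False, False, False,
                                         False, True, WD_FIND_CONTINUE,
                                         False, new, WD_REPLACE_ALL)
            ok = True
        finally:
            # Save only if every replacement ran without an exception.
            doc.Close(SaveChanges=ok)

# Usage on Windows (assumes pywin32 and Word are installed):
#   replace_in_documents(app, ['c:/path/to/my/word.doc'],
#                        {'string_to_be_replaced': 'replacement_str'})
```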
Adam Bernier
Thanks for the answer!
Geo
+1  A: 
Ra