Is it possible to extract all of the VBA code from a Word 2007 "docm" document using the API?

I have found how to insert VBA code at runtime, and how to delete all VBA code, but not pull the actual code out into a stream or string that I can store (and insert into other documents in the future).

Any tips or resources would be appreciated.

Edit: thanks to everyone, Aardvark's answer was exactly what I was looking for. I have converted his code to C#, and was able to call it from a class library using Visual Studio 2008.

using Microsoft.Office.Interop.Word;
using Microsoft.Vbe.Interop;


public List<string> GetMacrosFromDoc()
    Document doc = GetWordDoc(@"C:\Temp\test.docm");

    List<string> macros = new List<string>();

    VBProject prj;
    CodeModule code;
    string composedFile;

    prj = doc.VBProject;
    foreach (VBComponent comp in prj.VBComponents)
        code = comp.CodeModule;

        // Put the name of the code module at the top
        composedFile = comp.Name + Environment.NewLine;

        // Loop through the (1-indexed) lines
        for (int i = 0; i < code.CountOfLines; i++)
            composedFile += code.get_Lines(i + 1, 1) + Environment.NewLine;

        // Add the macro to the list


    return macros;
+5  A: 

You'll have to add a reference to Microsoft Visual Basic for Applications Extensibility 5.3 (or whatever version you have). I have the VBA SDK and such on my box - so this may not be exactly what office ships with.

Also you have to enable access to the VBA Object Model specifically - see the "Trust Center" in Word options. This is in addition to all the other Macro security settings Office provides.

This example will extract code from the current document it lives in - it itself is a VBA macro (and will display itself and any other code as well). There is also a Application.vbe.VBProjects collection to access other documents. While I've never done it, I assume an external application could get to open files using this VBProjects collection as well. Security is funny with this stuff so it may be tricky.

I also wonder what the docm file format is now - XML like the docx? Would that be a better approach?

Sub GetCode()

    Dim prj As VBProject
    Dim comp As VBComponent
    Dim code As CodeModule
    Dim composedFile As String
    Dim i As Integer

    Set prj = ThisDocument.VBProject
        For Each comp In prj.VBComponents
            Set code = comp.CodeModule

            composedFile = comp.Name & vbNewLine

            For i = 1 To code.CountOfLines
                composedFile = composedFile & code.Lines(i, 1) & vbNewLine

            MsgBox composedFile

End Sub
This is awesome, thanks! I converted your code to C#, and am able to call it from a WPF application and print out the macro code in a listbox. I will post the C# version of the code in my edited answer.
Guy Starbuck
If you have forms in your project, this isn't ideal. It doesn't export the control definition part of the form.
+7  A: 

You could export the code to files and then read them back in.

I've been using the code below to help me keep some Excel macros under source control (using Subversion & TortoiseSVN). It basically exports all the code to text files any time I save with the VBA editor open. I put the text files in subversion so that I can do diffs. You should be able to adapt/steal some of this to work in Word.

The registry check in CanAccessVBOM() corresponds to the "Trust access to Visual Basic Project" in the security setting.

Sub ExportCode()

    If Not CanAccessVBOM Then Exit Sub ' Exit if access to VB object model is not allowed
    If (ThisWorkbook.VBProject.VBE.ActiveWindow Is Nothing) Then
        Exit Sub ' Exit if VBA window is not open
    End If
    Dim comp As VBComponent
    Dim codeFolder As String

    codeFolder = CombinePaths(GetWorkbookPath, "Code")
    On Error Resume Next
    MkDir codeFolder
    On Error GoTo 0
    Dim FileName As String

    For Each comp In ThisWorkbook.VBProject.VBComponents
        Select Case comp.Type
            Case vbext_ct_ClassModule
                FileName = CombinePaths(codeFolder, comp.Name & ".cls")
                DeleteFile FileName
                comp.Export FileName
            Case vbext_ct_StdModule
                FileName = CombinePaths(codeFolder, comp.Name & ".bas")
                DeleteFile FileName
                comp.Export FileName
            Case vbext_ct_MSForm
                FileName = CombinePaths(codeFolder, comp.Name & ".frm")
                DeleteFile FileName
                comp.Export FileName
            Case vbext_ct_Document
                FileName = CombinePaths(codeFolder, comp.Name & ".cls")
                DeleteFile FileName
                comp.Export FileName
        End Select

End Sub
Function CanAccessVBOM() As Boolean
    ' Check resgistry to see if we can access the VB object model
    Dim wsh As Object
    Dim str1 As String
    Dim AccessVBOM As Long

    Set wsh = CreateObject("WScript.Shell")
    str1 = "HKEY_CURRENT_USER\Software\Microsoft\Office\" & _
        Application.Version & "\Excel\Security\AccessVBOM"
    On Error Resume Next
    AccessVBOM = wsh.RegRead(str1)
    Set wsh = Nothing
    CanAccessVBOM = (AccessVBOM = 1)
End Function

Sub DeleteFile(FileName As String)
    On Error Resume Next
    Kill FileName
End Sub

Function GetWorkbookPath() As String
    Dim fullName As String
    Dim wrkbookName As String
    Dim pos As Long

    wrkbookName = ThisWorkbook.Name
    fullName = ThisWorkbook.fullName

    pos = InStr(1, fullName, wrkbookName, vbTextCompare)

    GetWorkbookPath = Left$(fullName, pos - 1)
End Function

Function CombinePaths(ByVal Path1 As String, ByVal Path2 As String) As String
    If Not EndsWith(Path1, "\") Then
        Path1 = Path1 & "\"
    End If
    CombinePaths = Path1 & Path2
End Function

Function EndsWith(ByVal InString As String, ByVal TestString As String) As Boolean
    EndsWith = (Right$(InString, Len(TestString)) = TestString)
End Function
Jim Harte
This is nice since you'll get forms and such intact. +1

How do you get all the macros out, and the names of the corresponding feilds, sections.

For e.g. Workbook 1 has the sheet to calculate all the accounts payable. While Workbook 2 has Assets.

How can you extract them into their own subsections in C# or VB, so that you can turn them into classes.