I would like to be able to parse VB.NET code files so that I can examine the collection of Subs and Functions (and their contents, including comments), private variables, etc.

I can open the actual source code files.

So for example, if I have:

Public Function FunctionOne(arg1 As String, arg2 as String) as Integer
   here is some code
   ''//here are some comments
End Function

Public Sub FunctionOne(arg1 As integer, arg2 as integer)
   here is some code
   ''//here are some comments
End Sub

I would like to be able to parse out all of the Subs and Functions, and all of the code between Public Function and End Function. (Actually, it would be nice to have the option of including either just the code within or the entire function definition.)

This would seem to call for some sort of a parsing library, or else fairly decent regex skills.

Any suggestions?

UPDATE: The main thing I'm trying to achieve is parsing the source code. Reflection is perhaps fine for getting the list of functions and so on, and I know how to do that, but what I am trying to figure out is a proper way of parsing the source code itself.

+6  A: 

What about compiling those at runtime from your program, and then using reflection on the compiled library?

Look at this Microsoft thread for details on how to do that!

Brann
I think this is a neat idea. Who's better at parsing the code than the compiler?
Ben S
That might be the way to go to read the function arguments and types, return type, etc., but it doesn't help with obtaining the underlying source code, which is the main thing I'm trying to achieve.
tbone
I don't really understand what you want to achieve. The underlying source code is available, just do a File.Open() on the .vb source files ... so what is it exactly you want to do?
Brann
A: 

I think you're looking for the Microsoft.CSharp.CSharpCodeProvider, it accepts a file and provides direct access to the C# code generator and compiler. I imagine it can accept a string as well.

MSDN: http://msdn.microsoft.com/en-us/library/microsoft.csharp.csharpcodeprovider.aspx

Edit:

After the question was updated I see that this is not relevant, but it may still be possible to use this object to extract the source code of the public methods as you desire. I'll investigate some more...

John Rasch
+1  A: 

I would think you could use the Visual Basic .NET lexical grammar with a parser generator such as Flex and Bison (in C/C++) or something like ANTLR (for .NET).

This is how compilers parse languages to do their job.

Eric Lathrop
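
To make the lexer idea concrete, here is a minimal hand-written tokenizer, sketched in Python for brevity. It is not Flex, Bison or ANTLR output and it does not implement the real VB.NET lexical grammar: the token classes, the KEYWORDS set and the function names tokenize and procedure_names are all assumptions made for illustration. It ignores REM comments, line continuations and date literals.

```python
import re

# Hypothetical, deliberately tiny keyword list; the real language has many more.
KEYWORDS = {"public", "private", "function", "sub", "end", "as", "return", "dim"}

# One alternative per token class; order matters (comments and strings first).
TOKEN_RE = re.compile(
    r"""
      (?P<COMMENT>'[^\r\n]*)
    | (?P<STRING>"(?:[^"\r\n]|"")*")
    | (?P<NUMBER>\d+(?:\.\d+)?)
    | (?P<IDENT>[A-Za-z_]\w*)
    | (?P<NEWLINE>\r?\n)
    | (?P<SKIP>[ \t]+)
    | (?P<OP>.)
    """,
    re.VERBOSE,
)


def tokenize(source):
    """Yield (kind, text) pairs; whitespace is dropped, keywords are reclassified."""
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "IDENT" and text.lower() in KEYWORDS:
            kind = "KEYWORD"
        yield kind, text


def procedure_names(source):
    """Collect identifiers that directly follow a Sub/Function keyword."""
    tokens = list(tokenize(source))
    names = []
    for i, (kind, text) in enumerate(tokens[:-1]):
        is_decl = kind == "KEYWORD" and text.lower() in ("sub", "function")
        # "End Sub" / "End Function" close a block rather than declare one.
        after_end = i > 0 and tokens[i - 1] [0] == "KEYWORD" and tokens[i - 1][1].lower() == "end"
        if is_decl and not after_end and tokens[i + 1][0] == "IDENT":
            names.append(tokens[i + 1][1])
    return names
```

Because comments and string literals become single tokens, text such as "Function Fake" inside a comment or a string is not mistaken for a declaration, which is the main advantage of lexing over plain substring searches.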
A: 

You could compile the source, then use the Reflector tool. We all think of Reflector as primarily a GUI tool, and one of its neat features is that it can decompile a .NET assembly: it can produce source from a DLL or EXE. But Reflector itself can also be controlled programmatically, so your app can:

  • Compile the source into an assembly
  • Call into Reflector and ask it to decompile the assembly
  • Programmatically work with Reflector's output: get a list of functions and the decompiled source associated with each

Example.

This approach may not satisfy, because the source you get from Reflector is not the original source but the decompiled source. Comments will be gone, and the decompilation is not 100% faithful to the original: functionally equivalent, but not textually the same.

Anyway, worth a look.

Cheeso
+3  A: 
madgnome
A: 

This code is crude but more or less accomplishes what I was intending to do:

Private _SourceCode As String = Nothing

' Lazily reads the code-behind file of the current page and caches its text.
Private ReadOnly Property SourceCode() As String
    Get
        If _SourceCode Is Nothing Then
            Dim thisCodeFile As String = Server.MapPath("~").ToString & "\" & Type.GetType(Me.GetType.BaseType.FullName).ToString & ".aspx.vb"
            _SourceCode = My.Computer.FileSystem.ReadAllText(thisCodeFile)
        End If
        Return _SourceCode
    End Get
End Property

Private Function extractProcedureDefinition(ByVal procedureName As String) As String
    ' Only matches parameterless Subs, e.g. "Sub Foo()"
    Return extractStringContents(Me.SourceCode, "Sub " & procedureName & "()", "End Sub", True)
End Function

Private Function extractFunctionDefinition(ByVal procedureName As String) As String
    'TODO: This works now, but wouldn't if we wanted includeTags = False, as it does not properly handle the "As xxxxx" portion
    Return extractStringContents(Me.SourceCode, "Function " & procedureName, "End Function", True)
End Function

Private Function extractStringContents(ByVal body As String, ByVal openTag As String, ByVal closeTag As String, ByVal includeTags As Boolean) As String
    ' Find the first openTag, then the first closeTag after it
    Dim iStart As Integer = body.IndexOf(openTag)
    If iStart < 0 Then Return String.Empty
    Dim iEnd As Integer = body.IndexOf(closeTag, iStart)
    If iEnd < 0 Then Return String.Empty
    If includeTags Then
        iEnd += closeTag.Length
    Else
        iStart += openTag.Length
    End If
    Return body.Substring(iStart, iEnd - iStart)
End Function
tbone
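
For comparison, a regex can pick up the access modifiers, any parameter list and the "As xxxxx" return type in one pass, which is the part the TODO above mentions. The following is a rough sketch in Python rather than VB.NET; the same pattern should carry over to .NET's Regex class, but that is untested here. PROCEDURE_RE and extract_procedures are hypothetical names, and the sketch assumes each header fits on one line (no "_" continuations), procedures are not nested, and no multi-line lambdas appear in a body.

```python
import re

# Matches one Sub or Function, from its header line to the matching End line.
PROCEDURE_RE = re.compile(
    r"""
    ^[ \t]*
    (?P<header>
        (?:(?:Public|Private|Protected|Friend|Shared|Overrides|Overloads)[ \t]+)*
        (?P<kind>Sub|Function)[ \t]+
        (?P<name>\w+)
        [^\r\n]*                      # parameter list and optional return type
    )
    \r?\n
    (?P<body>.*?)                     # everything up to the closing line
    ^[ \t]*End[ \t]+(?P=kind)\b[^\r\n]*
    """,
    re.IGNORECASE | re.MULTILINE | re.DOTALL | re.VERBOSE,
)


def extract_procedures(source):
    """Return one dict per Sub/Function found in the VB.NET source text."""
    results = []
    for m in PROCEDURE_RE.finditer(source):
        results.append({
            "kind": m.group("kind"),
            "name": m.group("name"),
            "header": m.group("header").strip(),
            "body": m.group("body"),        # code between header and End line
            "definition": m.group(0),       # header, body and End line together
        })
    return results


# The example from the question, used as sample input.
SAMPLE = (
    "Public Function FunctionOne(arg1 As String, arg2 as String) as Integer\n"
    "   here is some code\n"
    "   ''//here are some comments\n"
    "End Function\n"
    "\n"
    "Public Sub FunctionOne(arg1 As integer, arg2 as integer)\n"
    "   here is some code\n"
    "   ''//here are some comments\n"
    "End Sub\n"
)

if __name__ == "__main__":
    for proc in extract_procedures(SAMPLE):
        print(proc["kind"], proc["name"])
```

The "body" entry corresponds to includeTags = False and the "definition" entry to includeTags = True. Being text-based, this can still be fooled by an "End Sub" inside a string literal, so it is a convenience rather than a real parser.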
A: 

madgnome was right on the money for me! I wanted to parse C# code and determine the relations between namespaces, classes, members and assemblies. NRefactory and the NRefactoryDemo application were exactly what I needed to solve this, and it was very easy to get started!

Thanks a lot!

Andreas Larsen