I have a requirement to get the contents of every method in a cs file into a string. What I am looking for is when you have an input of a cs file, a dictionary is returned with the method name as the key and the method body as the value.

I have tried Regex and reflection with no success, can anyone help?

Thanks

A: 

After you find the function headers, count opening and closing braces from each header onward; the closing brace that brings the count back to zero marks the end of the function.
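A minimal sketch of that brace counting, written in Python only to keep it short (the same loop ports directly to C#). The function name `find_body_end` is made up for illustration, and the sketch deliberately counts every brace, including any inside comments or string literals.

```python
def find_body_end(source: str, open_index: int) -> int:
    """Return the index of the '}' that matches the '{' at open_index.

    Naive on purpose: every brace is counted, even those inside
    comments or string literals.
    """
    if source[open_index] != "{":
        raise ValueError("open_index must point at an opening brace")
    depth = 0
    for i in range(open_index, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")


code = "void Run() { if (ok) { Go(); } }"
start = code.index("{")
end = find_body_end(code, start)
body = code[start:end + 1]  # the text of the method body, braces included
```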

StingyJack
Okay so long as there are no unbalanced {}s in comments or strings... And relies on being able to identify a method's start.
Blair Conrad
Treb's got the best idea.
StingyJack
A: 

A custom-made parser will be the best option. As Blair stated in the comment to StingyJack, it is VERY difficult to parse code with regex. I've attempted it once, and although it is possible to match balanced braces with .NET regular expressions, skipping over comments and strings is much more complicated.

A parser should make things much simpler. See ANTLR for a good parser generator.

As for reflection, I believe you could attempt to compile the code (as long as you have all the necessary dependencies) and then access its contents Reflector-style. But I'd go with the parser.

Santiago Palladino
+1  A: 

Assuming that the file is valid (i.e. compiles), you can start by reading the whole file into a string.

I gather from your question that you are only interested in method names, not in class names. Then you need a regex that gives you all instances of public|protected|private, optional keywords virtual/override etc, MethodName, (, optional parameters, ). It would help if there were coding conventions, so you could assume that all method definitions were always in one line, not spread over several lines.

Once you have that, it is only a matter of counting { and } to get the function body.
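The two steps above can be sketched roughly as follows, in Python for brevity (the regex syntax used here also works in .NET). The pattern `HEADER` and the function `extract_methods` are hypothetical names, and the sketch makes several assumptions: every method has an explicit access modifier, the return type contains no spaces, the parameter list contains no nested parentheses, and there are no braces inside comments or strings. Constructors are skipped, and overloads overwrite each other because the dictionary is keyed by name only.

```python
import re
from typing import Dict

# Assumed shape: modifier, optional keywords, return type, name, (params), {
HEADER = re.compile(
    r"\b(?:public|protected|private|internal)\s+"                  # access modifier
    r"(?:(?:static|virtual|override|abstract|sealed|async)\s+)*"   # optional keywords
    r"[\w<>\[\],.?]+\s+"                                           # return type (no spaces)
    r"(?P<name>\w+)\s*"                                            # method name
    r"\([^)]*\)\s*"                                                # parameter list
    r"\{"                                                          # start of the body
)


def extract_methods(source: str) -> Dict[str, str]:
    """Map each method name to its body text, braces included."""
    methods: Dict[str, str] = {}
    pos = 0
    while True:
        match = HEADER.search(source, pos)
        if match is None:
            break
        open_index = match.end() - 1  # the '{' that closes the header
        depth = 0
        end = None
        for i in range(open_index, len(source)):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    end = i
                    break
        if end is None:  # unbalanced braces: stop rather than guess
            break
        methods[match.group("name")] = source[open_index:end + 1]
        pos = end + 1
    return methods


sample = """
public class Greeter
{
    public string Hello(string name)
    {
        if (name == null) { return "Hello"; }
        return "Hello, " + name;
    }

    private static int Add(int a, int b) { return a + b; }
}
"""
result = extract_methods(sample)
```

Here `result` has one entry per method, and the class declaration is ignored because it has no parameter list.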

And one final piece of advice: beware of assumptions. They have a nasty habit of biting you in the butt.

EDIT: Ouch, forgot about comments! If you have braces in comments in the method body, your counting can go wrong. So you need to strip all comments from the source as your very first step.
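A rough sketch of that comment-stripping step, again in Python for brevity. The helper name `strip_comments` is invented for illustration. It copies ordinary string and character literals through untouched so that a `//` inside a string is not mistaken for a comment, but it assumes there are no verbatim (`@"..."`), interpolated, or raw string literals, which would need extra handling.

```python
def strip_comments(source: str) -> str:
    """Remove // and /* */ comments, leaving string and char literals intact."""
    out = []
    i, n = 0, len(source)
    while i < n:
        two = source[i:i + 2]
        if two == "//":
            # Line comment: skip to the end of the line, keep the newline.
            while i < n and source[i] != "\n":
                i += 1
        elif two == "/*":
            # Block comment: skip past the closing */ (or to the end of input).
            close = source.find("*/", i + 2)
            i = n if close == -1 else close + 2
        elif source[i] in "\"'":
            # String or char literal: copy it whole, honouring backslash escapes.
            quote = source[i]
            j = i + 1
            while j < n and source[j] != quote:
                j += 2 if source[j] == "\\" else 1
            out.append(source[i:j + 1])
            i = j + 1
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


code = 'int x = 1; // closing } here\nstring s = "// not a comment"; /* { */ int y = 2;'
cleaned = strip_comments(code)
```

Note that braces inside string literals survive this step, so a brace counter run on `cleaned` would still need to skip over literals in the same way.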

Treb
+1  A: 

In general, the problem you are trying to solve is to parse the C# code in the same manner that the compiler would, and then save the contents of the functions rather than generate code. So as background for your solution you should look at C# grammars and how to parse them.

As per StingyJack, a simple method for doing this would be to create a regex that only identifies function definitions. Then you can assume that everything in between is a function body. However, that assumption will not handle things like multiple classes in one file, or even the trailing }'s at the end of a class. To handle things like that you will have to engineer a C# compiler, as processing the complete C# grammar is the only thing that will correctly identify what C# thinks is a function.

Peter M
+5  A: 

I don't know if it's any use to you, but Visual Studio add-ins include an EnvDTE object that gives you full access to the VB and C# language parsers. See "Discovering Code with the Code Model".

I touched on it tangentially years ago. I don't know how difficult it is to use, or how effective it is, but it does look like it will give you what you need.

The code model allows automation clients to avoid implementing a parser for Visual Studio languages in order to discover the high-level definitions in a project, such as classes, interfaces, structures, methods, properties, and so on.

If you read the article in full, it explains how to pull the full text of a function from a file.

Hope this helps :)

Binary Worrier
A: 

NRefactory is the tool for this job. Take a look here: http://laputa.sharpdevelop.net/content/binary/NRefactory.wmv

reshefm