+1  Q: 

Code parsing C#

Dear all,

I am researching ways, tools and techniques to parse code files in order to support syntax highlighting and IntelliSense in an editor written in C#.

Does anyone have any ideas, patterns and practices, tools or techniques for that?

EDIT: A nice source of info for anyone interested:

Parsing beyond Context-free grammars ISBN 978-3-642-14845-3

+3  A: 

My favourite parser toolkit for C# is Irony: http://irony.codeplex.com/. I have used it a couple of times with great success.

Here is a wikipedia page listing many more: http://en.wikipedia.org/wiki/Compiler-compiler

Rob Fonseca-Ensor
Does Irony support multiple language parsing?
sTodorov
Irony is for creating parsers, so yes - it parses anything you can build a grammar for
Rob Fonseca-Ensor
+1  A: 

There are two basic approaches:
1) Parse the entire solution and everything it references, so you understand all the types involved in the code.
2) Parse locally and do your best to guess what the types etc. are.

The trouble with (2) is that you have to guess, and in some circumstances you just can't tell from a code snippet exactly what everything is. But if you're happy with the sort of syntax highlighting shown on (e.g.) Stack Overflow, then this approach is easy and quite effective.
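As a rough illustration of approach (2), here is a minimal Python sketch of purely local completion: it offers identifiers already present in the buffer that start with what the user has typed. The function name `local_completions` is hypothetical, and because no type information is available the suggestions are only guesses (it cannot tell which members actually belong to `customer`).

```python
import re

# Anything that looks like an identifier in a C-like language.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def local_completions(buffer: str, prefix: str) -> list[str]:
    """Return identifiers already present in `buffer` that start with `prefix`.

    No type information is used, so this is a guess: it cannot tell a
    method from a variable, or which members belong to which type.
    """
    seen = set(IDENTIFIER.findall(buffer))  # de-duplicate every identifier in the buffer
    return sorted(name for name in seen if name.startswith(prefix))
```

For example, `local_completions("var customer = new Customer(); customer.Name = customerName;", "cust")` returns `['customer', 'customerName']`; the match is case-sensitive, so `Customer` is not offered.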

To do (1), you need to do one of the following (in decreasing order of difficulty):

  • Parse all the source code. Not possible if you reference 3rd-party assemblies whose source you don't have.
  • Use reflection on the compiled code to garner type information you can use when parsing the source.
  • Use the host IDE's code element interfaces (if available, so not applicable in your case!) to provide the information you need.
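Reflection is runtime-specific (in .NET it is the `System.Reflection` API). As a loose analogy only, and assuming a Python target, the standard-library `inspect` module can list the members of an already-loaded module or class to feed a completion list. The helper `member_completions` and its rough `kind` labels are hypothetical:

```python
import inspect
import types

def member_completions(obj: object, prefix: str = "") -> list[tuple[str, str]]:
    """List public members of a loaded object (module, class, instance) whose
    names start with `prefix`, tagged with a rough kind for a completion popup."""
    results = []
    for name, value in inspect.getmembers(obj):  # already sorted by name
        if name.startswith("_") or not name.startswith(prefix):
            continue  # skip private/dunder members and non-matching names
        if inspect.isclass(value):
            kind = "class"
        elif inspect.isroutine(value):
            kind = "function"
        elif isinstance(value, types.ModuleType):
            kind = "module"
        else:
            kind = "value"
        results.append((name, kind))
    return results
```

For instance, after `import json`, `member_completions(json, "dump")` should yield `[('dump', 'function'), ('dumps', 'function')]`. One design caveat: introspection requires importing the code, which executes it, so an editor may prefer to do this in a separate process.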
Jason Williams
OP wants to parse multiple languages. There's the "small" problem of actually getting working grammars for the languages you want to process. Legacy languages are hard to do this for, because the standards committees have been decorating them with goo; check out IBM Enterprise COBOL or Fortran 2005. Modern languages are a little easier, but even they have pressure to add stuff; try parsing modern VB.NET. I've put 15 years into building parsers using a unified infrastructure for a wide range of languages (including those I mentioned) and I'm hardly done yet :-{
Ira Baxter
@Ira: The OP doesn't make it very clear which languages are required, but most of my answer applies equally well to any language. But you're right, it's a very nontrivial problem. Visual Studio IntelliSense has been developed for many years by an experienced team, and only really works well in .NET languages; beyond basic syntax highlighting, the support is pretty poor in most other languages, which is a good indicator of the difficulty of the problem the OP may be attempting to address.
Jason Williams
@Ira: the feat you are trying to accomplish sounds very serious. I wish you every success with it. However, what I am researching is mostly support for C#, Ruby, Python, VB.NET and Java. I can only imagine the difficulties involved in parsing legacy languages.
sTodorov
@Jason: I think for now I will concentrate on researching parsing C# and Python, because of the difference in their structure, e.g. curly brackets vs. indentation.
sTodorov
@sTodorov: I've done all the languages you've mentioned except for Ruby, and that's in progress. If you want to parse these languages fully, you need pretty much all the machinery I've used in some form or another. If all you want is syntax highlighting, you can do a good-enough job with just regular-expression matching, because syntax highlighting doesn't always have to be right to be useful.
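To illustrate the point that regex matching is good enough for highlighting, here is a minimal Python sketch of a regex-based tokenizer for a C#-like language. The token rules are a simplified, assumed subset (no verbatim `@"..."` or interpolated strings, no contextual keywords), and `highlight_spans` is a hypothetical name; an editor would map each token kind to a colour:

```python
import re

# Rules are tried in order at each position; the first alternative that matches wins.
# This is an illustrative subset of C# lexical rules, not the full specification.
TOKEN_SPEC = [
    ("comment", r"//[^\n]*|/\*.*?\*/"),
    ("string", r'"(?:\\.|[^"\\])*"'),
    ("keyword", r"\b(?:class|public|private|static|void|int|string|return|new|if|else|var)\b"),
    ("number", r"\b\d+(?:\.\d+)?\b"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("whitespace", r"\s+"),
    ("other", r"."),  # punctuation/operators: anything not matched above
]
# One combined pattern with a named group per rule; DOTALL lets /* ... */ span lines.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC), re.DOTALL)

def highlight_spans(source):
    """Yield (kind, start, end) for every non-whitespace token in `source`."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup  # name of the rule that matched
        if kind != "whitespace":
            yield kind, match.start(), match.end()
```

`list(highlight_spans('int x = 42; // answer'))` yields a keyword, an identifier, two `other` punctuation tokens, a number and a comment, each with its character offsets. Because it never builds a syntax tree, half-typed code still gets highlighted.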
Ira Baxter
@sTodorov: I guess what you're looking for is a code model that is flexible enough to represent code elements from the different languages you support, so you can add a parsing "layer" that maps the specific source code to a generic description that can be used for IntelliSense/colouring. My add-in (AtomineerUtils) parses C-like languages (C, C++, C#, Java) in this way, and I was surprised how little work it took to add support for VB: there are surprisingly few differences in the parsing once you look past the superficial syntax, so most internal processing methods didn't need to change.
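A minimal sketch of the generic code model idea, assuming Python as the implementation language: each language gets its own front end, and both produce the same `CodeElement` tree. This is not AtomineerUtils' actual model; `CodeElement`, `python_elements` and `csharp_elements` are hypothetical names. The Python front end relies on the standard-library `ast` module, so indentation is handled for you, while the C#-like front end is a deliberately crude heuristic that takes nesting from curly braces (it would misread braces inside strings or comments; a real front end would use a proper grammar):

```python
import ast
import re
from dataclasses import dataclass, field

@dataclass
class CodeElement:
    """Language-neutral node that colouring/completion code can consume."""
    kind: str   # "class" or "method" (Python functions are reported as "method" too)
    name: str
    line: int   # 1-based line of the declaration
    children: list = field(default_factory=list)

# ---- Python front end: indentation handled by the standard-library parser ----
_PY_DEFS = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)

def python_elements(source):
    def convert(node):
        kind = "class" if isinstance(node, ast.ClassDef) else "method"
        element = CodeElement(kind, node.name, node.lineno)
        element.children = [convert(c) for c in node.body if isinstance(c, _PY_DEFS)]
        return element
    return [convert(n) for n in ast.parse(source).body if isinstance(n, _PY_DEFS)]

# ---- C#-like front end (hypothetical heuristic): line regexes, nesting from braces ----
CLASS_RE = re.compile(r"\bclass\s+(\w+)")
METHOD_RE = re.compile(r"^\s*(?:[\w<>\[\],]+\s+)+(\w+)\s*\([^)]*\)\s*\{?\s*$")
NOT_METHODS = {"if", "while", "for", "foreach", "switch", "catch", "using", "lock"}

def csharp_elements(source):
    roots, stack, pending, depth = [], [], None, 0  # stack holds (element, depth it opened at)
    for lineno, line in enumerate(source.splitlines(), start=1):
        decl = CLASS_RE.search(line)
        if decl:
            pending = CodeElement("class", decl.group(1), lineno)
        else:
            decl = METHOD_RE.match(line)
            if decl and decl.group(1) not in NOT_METHODS:
                pending = CodeElement("method", decl.group(1), lineno)
        for ch in line:  # braces inside strings/comments are not handled
            if ch == "{":
                depth += 1
                if pending is not None:  # this brace opens the pending declaration
                    (stack[-1][0].children if stack else roots).append(pending)
                    stack.append((pending, depth))
                    pending = None
            elif ch == "}":
                if stack and stack[-1][1] == depth:
                    stack.pop()  # closing the innermost declaration
                depth -= 1
    return roots
```

Fed a Python class `Greeter` with a method `hello`, or the equivalent brace-delimited C# class, the two front ends return the same shape of tree (a class element with one method child), so downstream colouring and completion code need not care which syntax it came from.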
Jason Williams
@Ira: It definitely sounds like syntax highlighting is the better choice for now, as it does not involve so much complexity. I will have a look at how regex engines can work for me in that regard. BTW, sorry for prying, but the DMS software toolkit seems very interesting.
sTodorov
@Jason: Yes, I am looking for exactly the kind of code model you suggest. Thanks for the pointers; I will have a look in this direction.
sTodorov
@sTodorov: No need to pry, check out my bio (I assume you have) and the website link there, you can find out plenty about DMS.
Ira Baxter
+1  A: 

You could take a look at how http://www.icsharpcode.net/ did it. They wrote a book about doing just that, Dissecting a C# Application: Inside SharpDevelop, and it even has a chapter called:

Implement a parser to provide syntax highlighting and auto-completion as users type

Jonas Elfström