I am looking for an approach for finding the code between the base class colon and the opening curly brace of a class that has been stored in a string.

By this I mean that I have a class

    public class Class : BaseClass
    {
    }

That's been stored as a string

    string classString = "public class Class : BaseClass{}\r\n";

The real class will almost certainly be more detailed, potentially with a fully qualified base class and interfaces, but I need an approach for extracting the code between the colon and the opening curly brace.

Assuming that the class is not a generic that defines a derivation constraint (the `where` clause below), i.e.

    public class LinkedList<K,T> : BaseClass
      where K : IComparable
    {
    }

then it might be safe to assume that there would only be one colon in the class definition, and it would be fairly easy to find the derivation colon and the opening curly brace.

If that's the case I could do

    string baseClassString = classString.Substring(derivationColonIndex + 1, openCurlyBraceIndex - (derivationColonIndex + 1));

Can anyone think of a better approach that would GUARANTEE that I get a string for the base class and any interfaces that might exist between the colon and the opening curly brace?
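For illustration, here is a minimal sketch of the substring approach that also tolerates a generic `where` clause. It is not a guarantee: it assumes the string starts at the class declaration and has no attributes, comments, or string literals containing `:` or `{` before the opening brace. `BaseListFinder` and `GetBaseList` are hypothetical names made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BaseListFinder
{
    // Returns the text between the derivation colon and the first "where"
    // clause or opening brace, or null when no base list is found.
    public static string GetBaseList(string classString)
    {
        int brace = classString.IndexOf('{');
        if (brace < 0) return null;

        // Only the class header (everything before the first brace) matters.
        string header = classString.Substring(0, brace);

        // Cut the header at the first generic constraint clause, if any,
        // so that the colon inside "where K : IComparable" is ignored.
        Match where = Regex.Match(header, @"\bwhere\b");
        if (where.Success) header = header.Substring(0, where.Index);

        // Assumption: the first remaining colon is the derivation colon.
        int colon = header.IndexOf(':');
        if (colon < 0) return null;

        return header.Substring(colon + 1).Trim();
    }
}
```

Calling `BaseListFinder.GetBaseList(classString)` on the one-line example above should yield `BaseClass`. Anything stronger than this (comments, attributes, nested generics in odd places) really needs a parser.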

Background on why I need this: the classes are generated based on data coming from a database. If certain data in the db changes, I may need to change the inheritance in the class string, so I would replace the existing substring containing the base class and interfaces.
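The replacement step described here could be sketched roughly as follows. Again this is only a sketch under the same assumptions (the string starts at the class declaration, and nothing before the opening brace contains a stray `:` or `{`); `ClassReparenter` and `ReplaceBaseList` are hypothetical names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassReparenter
{
    // Replaces (or adds, or removes) the base list in a class declaration
    // string, leaving any "where" clauses and the class body untouched.
    public static string ReplaceBaseList(string classString, string newBaseList)
    {
        int brace = classString.IndexOf('{');
        if (brace < 0)
            throw new ArgumentException("No opening brace found.", nameof(classString));

        // The base list ends at the first "where" clause, or at the brace.
        Match where = Regex.Match(classString.Substring(0, brace), @"\bwhere\b");
        int end = where.Success ? where.Index : brace;

        // Step back over whitespace so the original spacing and line breaks
        // before "where" or "{" are preserved in the result.
        while (end > 0 && char.IsWhiteSpace(classString[end - 1])) end--;

        // Assumption: the first colon before that point is the derivation colon.
        int colon = classString.IndexOf(':', 0, end);
        int start = colon >= 0 ? colon : end;

        string head = classString.Substring(0, start).TrimEnd();
        string tail = classString.Substring(end);
        string bases = string.IsNullOrWhiteSpace(newBaseList)
            ? ""
            : " : " + newBaseList.Trim();

        return head + bases + tail;
    }
}
```

For example, `ClassReparenter.ReplaceBaseList(classString, "OtherBase, IDisposable")` rewrites only the part between the class name and the brace. Passing an empty string drops the base list entirely.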

+2  A: 

You are going down into a pretty deep rabbit hole. Writing your own C# language parser is a task that can keep you occupied for a long time, with a pay-off that enhances your skill as a programmer but doesn't turn the boss' frown upside-down.

You are re-inventing a wheel. The DataSet designer built into Visual Studio already does what you're trying to do. It could be argued that it is the wrong wheel; the fans of NHibernate will certainly think so. They generate the database schema from the C# class declarations.

Rescue your plan by considering that modifying an existing C# class that models the dbase is not necessary. Just re-generate the class from scratch every time you compile. It normally only takes a fraction of a second. The compiler will dutifully warn you when there's a breaking change. That's how the Settings designer and the Resource designer work.

Hans Passant
Essentially we have a schema importing utility that will import the necessary schema information of any of our databases. We then have a code generator that generates data objects from the information in the database. Without getting too detailed, the code generator has the ability to completely overwrite the data objects at any time with 100% accuracy every time. The issue comes from the fact that an app developer can add "custom/additional" code to the data objects.
BrandonS
The code generator detects this, and knows that it will need to merge the code instead of overwriting it so that the custom code is not lost. That's where the current issue is coming from. This process is far from perfect, but it has been around for a while, and we do not have the option to start from scratch. Apart from the ability to "re-parent" a class, our code generator works exactly as we need it to. Considering this is just an in-house tool, we have more important things to invest our time in. Thank you all for your replies. It is all very useful information.
BrandonS
Devs should extend the auto-generated classes by inheriting from them.
Hans Passant
+4  A: 

Man, you really should be using CodeDOM:

http://msdn.microsoft.com/en-us/library/y2k85ax6.aspx

The CodeDOM provides types that represent many common types of source code elements. You can design a program that builds a source code model using CodeDOM elements to assemble an object graph. This object graph can be rendered as source code using a CodeDOM code generator for a supported programming language. The CodeDOM can also be used to compile source code into a binary assembly.

I'd like to draw your attention to the object graph. Using the object graph, you should be able to do what you need to do.
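As a rough sketch of what that looks like, the snippet below builds a tiny object graph (one class declaration with base types) and renders it as C#. It uses `CodeTypeDeclaration`, `CodeTypeReference` and `CSharpCodeProvider` from `System.CodeDom` and `Microsoft.CSharp`; on .NET Core / .NET 5+ this assumes the `System.CodeDom` NuGet package is referenced. `CodeDomSketch` and `GenerateClass` are hypothetical names for this example only.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class CodeDomSketch
{
    // Builds a class declaration with the given base types and
    // returns the C# source text that CodeDOM generates for it.
    public static string GenerateClass(string className, params string[] baseTypes)
    {
        var declaration = new CodeTypeDeclaration(className) { IsClass = true };

        // The first entry is treated as the base class, the rest as interfaces.
        foreach (string baseType in baseTypes)
            declaration.BaseTypes.Add(new CodeTypeReference(baseType));

        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromType(declaration, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

With this, changing the inheritance means changing the `baseTypes` argument and regenerating, e.g. `CodeDomSketch.GenerateClass("Class", "BaseClass", "IDisposable")`, rather than editing a string in place.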

EDIT: Sorry for the misdirection, actually what you're trying to accomplish is the reverse of what I suggested - my bad! You may want to look at the following projects, which offer the capability to build an object graph from the code, rather than generate code from an object graph:

http://csparser.codeplex.com/

http://wiki.sharpdevelop.net/Default.aspx?Page=NRefactory&NS=&AspxAutoDetectCookieSupport=1

code4life
This is pretty cool. I had never heard of this. It might be difficult to adapt it for the use we need, but it is definitely something that I will keep in mind for future use. Thanks.
BrandonS
How does this help if you're *starting* with the string? The problem as I understand it is that there is a string which must be understood as a graph. CodeDOM goes the other way: it turns the graph into the string.
Eric Lippert
@Eric, you're right... man, I was on a weird tangent yesterday. Thanks for pointing this out.
code4life
+1  A: 

If you're willing to spend some time learning it, SharpDevelop contains a C# parser named NRefactory. It returns an abstract syntax tree from a source string. It could catch errors and handle other language elements like comments, attributes, etc.

Obviously it's not a quick fix, but if you have the time it's an interesting tool.

Corbin March
A: 

There's an ANTLR grammar for C#. Maybe you can use that?

nikie