views:

488

answers:

3

I have a data structure that represents C# code like this:

class Namespace:
    string Name;
    List<Class> Classes;

class Class:
    string Name;
    List<Property> Properties;
    List<Method> Methods;
    List<Method> Constructors;
    List<Field> Fields;
    List<Class> InnerClasses;
    Class Parent;
    List<Interface> Implements;

... which I'm building using a simple lexer/parser combination. I need to traverse the tree and apply a large set of rules (more than 3000). Rules run when encountering different (and quite complex) patterns in the tree. For example, there's a rule that runs when a class only implements interfaces in the same assembly.

My original naïve implementation iterates over each rule and then each rule traverses the tree looking for its specific pattern. Of course, this takes quite a lot of time, even with a small amount of source code.

I suppose this could be likened to how antivirus software works, recognizing complex patterns on a large body of binary code.

How would you suggest one implement this kind of software?

EDT: Just like to add: No, I'm not re-implementing FxCop.

Thanks

+1  A: 

You could try to aggregate your 3000 rules. Some of the 3000, I would guess assume another member of the 3000. Say rule 12 checks 'a class implements an interface'. Rule 85 might be 'a class only implements interfaces in the same assembly'. If rule 12 fails, no need to run rule 85 at all.

This approach (alpha-beta pruning) would either need you to restructure your algorithm to search the class tree, looking for all rules patterns at the same time. Or to stash a record that a previous rule pass has identified that the current rule pass is irrelevant.

COMMENT: I have a nub level account so i can't comment directly. Can you give an example of maybe 2 more rules? I am currently thinking your algorithm is 0(n*n) (following copied from a big 0 notation post)

O(n*log(n)): an algorithm that does some sort of divide and conquer strategy. Hurts for large n. Typical example: merge sort

O(n*n): a nested loop of some sort. Hurts even with small n. Common with naive matrix calculations. You want to avoid this sort of algorithm if you can.

Jimmy McNulty
A: 

I'd consider creating some sort of representation for pattern/context, then creating a hash map from pattern to set of actions. Without knowing more of your requirements, it's hard to be more specific, but as an example, the string "Namespace/Class" could be a key to a set of actions that depend on knowing the namespace and a single class that it contains, "Class/Interface" could a key to the set of actions that deal with a single class and a single interface it implements, etc.

The tree traversal algorithm could keep track of its own context (parent node, current node, etc.), form a key based on where it is in the tree, retrieve the action set for that key, and then fire all of those actions, giving each an argument structure that provided the actual nodes corresponding to the key pattern.

This amounts to creating a special-purpose rules engine which deals with rules of the form "If I have a class C, and it implements an interface I, then do ... with C and I".

joel.neely
A: 

@Jimmy McNulty

That's a great approach. Alpha-beta pruning you say it's called? It's rearranging rules so that if one fails it excludes others. Am I right? I'm going to look into that.

Here are some examples of other rules:

  • Class is final and doesn't implement / extends a class outside of the assembly.
  • Method is virtual but class is private or internal.
  • Class or method has a specific attribute.
  • Method parameter is known at compile time.

I would love to hear about any other technique that would allow me to execute this kind of logic faster/smarter.

Thanks