tags:

views:

894

answers:

5

Working on a project that parses a log of events, and then updates a model based on properties of those events. I've been pretty lazy about "getting it done" and more concerned about upfront optimization, lean code, and proper design patterns. Mostly a self-teaching experiment. I am interested in what patterns more experienced designers think are relevant, or what type of pseudocoded object architecture would be the best, easiest to maintain and so on.

There can be 500,000 events in a single log, and there are about 60 types of events, all of which share about 7 base properties and then have 0 to 15 additional properties depending on the event type. The type of event is the 2nd property in the log file in each line.

So for I've tried a really ugly imperative parser that walks through the log line by line and then processes events line by line. Then I tried a lexical specification that uses a "nextEvent" pattern, which is called in a loop and processed. Then I tried a plain old "parse" method that never returns and just fires events to registered listener callbacks. I've tried both a single callback regardless of event type, and a callback method specific to each event type.

I've tried a base "event" class with a union of all possible properties. I've tried to avoid the "new Event" call (since there can be a huge number of events and the event objects are generally short lived) and having the callback methods per type with primitive property arguments. I've tried having a subclass for each of the 60 event types with an abstract Event parent with the 7 common base properties.

I recently tried taking that further and using a Command pattern to put event handling code per event type. I am not sure I like this and its really similar to the callbacks per type approach, just code is inside an execute function in the type subclasses versus the callback methods per type.

The problem is that alot of the model updating logic is shared, and alot of it is specific to the subclass, and I am just starting to get confused about the whole thing. I am hoping someone can at least point me in a direction to consider!

+1  A: 

Possibly Hashed Adapter Objects (if you can find a good explanation of it on the web - they seem to be lacking.)

finnw
I ended up using something like this along with the property bag approach. I basically just extract out the event type as the 2nd property of the bag and attach handlers to the 60 different types. Thanks for the formal reference.
Josh
+1  A: 

Consider a Flyweight factory of Strategy objects, one per 'class' of event.

For each line of event data, look up the appropriate parsing strategy from the flyweight factory, and then pass the event data to the strategy for parsing. Each of the 60 strategy objects could be of the same class, but configured with a different combination of field parsing objects. Its a bit difficult to be more specific without more details.

Matt Howells
The parser component is done, its a finite state machine and works pretty well for all the events. Sorry if I was unclear. The problem is I don't know how to process the events after parsing.
Josh
+2  A: 

Well... for one thing rather than a single event class with a union of all the properties, or 61 event classes (1 base, 60 subs), in a scenario with that much variation, I'd be tempted to have a single event class that uses a property bag (dictionary, hashtable, w/e floats your boat) to store event information. The type of the event is just one more property value that gets put into the bag. The main reason I'd lean that way is just because I'd be loathe to maintain 60 derived classes of anything.

The big question is... what do you have to do with the events as you process them. Do you format them into a report, organize them into a database table, wake people up if certain events occur... what?

Is this meant to be an after-the-fact parser, or a real-time event handler? I mean, are you monitoring the log as events come in, or just parsing log files the next day?

David Hill
Well, I am pretty sure I want to stick with primitives as event properties, since they are known and constant and I care alot about performance. I want to generate a report of aggregate information over chunks of the events. After the fact, but I want to process seqeuentially (deterministically?).
Josh
A: 

I'm not sure I understand the problem correctly. I assume there is a complex 'model updating logic'. Don't distribute this through 60 classes, keep it in one place, move it out from the event classes (Mediator pattern, sort of).

Your Mediator will work with event classes (I don't see how could you use the Flyweight here), the events can parse themselves.

If the update rules are very complicated you can't really tackle the problem with a general purpose programming language. Consider using a rule based engine or something of the sort.

A: 

Just off the top:

I like the suggestion in the accepted answer about having only one class with a map of properties. I also think the behvavior can be assembled this way as well:

class Event
{
    // maps property name to property value
    private Map<String, String> properties;

    // maps property name to model updater
    private Map<String, ModelUpdater> updaters; 

    public void update(Model modelToUpdate)
    {
     foreach(String key in this.properties.keys)
     {
      ModelUpdater updater = this.updaters[key];
      String propertyValue = this.properties[key];

      updaters.updateModelUsingValue(model, propertyValue);
     }
    }

}

The ModelUpdater class is not pictured. It updates your model based on a property. I made up the loop; this may or may not be what your algorithm actually is. I'd probably make ModelUpdater more of an interface. Each implementer would be per property and would update the model.

Then my "main loop" would be:

Model someModel;

foreach(line in logFile)
{
    Event e = EventFactory.createFrom(line);
    e.update(someModel);
}

EventFactory constructs the events from the file. It populates the two maps based on the properties of the event. This implies that there is some kind of way to match a property with its associated model updater.

I don't have any fancy pattern names for you. If you have some complex rules like if an Event has properties A, B, and C, then ignore the model updater for B, then this approach has to be extended somehow. Most likely, you might need to inject some rules into the EventFactory somehow using the Rule Object Pattern. There you go, there's a pattern name for you!

moffdub