I take care of a critical app in my project. It parses business msgs (a legacy standard), processes them and then stores some results in a DB (another app picks them up). After more than a year of my work (I have other apps to look after as well) the app is finally stable. I've introduced a strict TDD policy and I have 20% unit test coverage (thank you Michael Feathers for your book!), most of it in the critical parts. I have some white-box FitNesse tests as well (whole business scenarios are covered there). I feel that I cannot refactor this app any further, and that I'm safe to play hard with it. It's designed so badly that I want to rewrite it. The app itself is around 20k of challenging legacy C/C++ code. There were other dependencies but I managed to decouple most of them.


All I have is the Sun C++ compiler, cppunitlite, STLPort and Boost. Please do not suggest other technologies (no XML, Java etc.) as this is not an option in my organization. I'd like to do it with modern C++ (perhaps play with metaprogramming...), TDD from start to end.

There are about 30 types of msgs I need to parse. Each of them is composed of 3-10 lines, and most of them are pretty similar. This is the root of all evil -> lots of code duplication. Each msg has a class describing how it should be parsed. Take a look at the main inheritance tree:

                             MSG_A                     MSG_B
                            /     \                   /     \
                    MSG_A_NEW   MSG_A_CNL      MSG_B_NEW   MSG_B_CNL

Both trees go much deeper. There are very small differences between MSG_A_NEW and MSG_B_NEW. They should be handled by a single class that could be injected with some small customization.

My initial plan is to have one generic msg class that will be customized. Some entity (builder... ?) will take a look at the msg and initialize the proper object that will be able to parse it. Another entity will be able to discover what kind of line it is looking at, and this info will be used by the builder. I'm planning to write several parsers, each responsible for parsing just one specific line. This will allow me to reuse them in parsing different msgs.
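A minimal sketch of how that plan could be wired together, written in C++98 style so it stays within the Sun compiler constraint. Everything named here (ParsedMessage, LineParser, TagValueParser, GenericMessage, MessageBuilder, and the "TAG:value" line format) is hypothetical and only illustrates the shape of the idea, not the real legacy standard.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Parsed result: field name -> raw text (hypothetical representation).
struct ParsedMessage {
    std::map<std::string, std::string> fields;
};

// Parses exactly one kind of line; shared by every msg that contains it.
class LineParser {
public:
    virtual ~LineParser() {}
    virtual bool parse(const std::string& line, ParsedMessage& out) const = 0;
};

// Assumed line format for this sketch only: "TAG:value".
class TagValueParser : public LineParser {
public:
    TagValueParser(const std::string& tag, const std::string& field)
        : prefix_(tag + ":"), field_(field) {}

    virtual bool parse(const std::string& line, ParsedMessage& out) const {
        if (line.compare(0, prefix_.size(), prefix_) != 0) return false;
        out.fields[field_] = line.substr(prefix_.size());
        return true;
    }

private:
    std::string prefix_;
    std::string field_;
};

// One generic msg class, customized by the line parsers injected into it.
class GenericMessage {
public:
    // The parser is not owned; it must outlive this object.
    void addLine(const LineParser* parser) {
        assert(parser != 0);
        parsers_.push_back(parser);
    }

    bool parse(const std::vector<std::string>& lines, ParsedMessage& out) const {
        if (lines.size() != parsers_.size()) return false;
        for (std::size_t i = 0; i < lines.size(); ++i)
            if (!parsers_[i]->parse(lines[i], out)) return false;
        return true;
    }

private:
    std::vector<const LineParser*> parsers_;
};

// Looks at the msg and picks the definition able to parse it.
class MessageBuilder {
public:
    void registerType(const std::string& name, const GenericMessage& def) {
        defs_[name] = def;
    }

    // Assumes the first line is "TYPE:<name>"; returns 0 if nothing matches.
    const GenericMessage* select(const std::vector<std::string>& lines) const {
        const std::string prefix("TYPE:");
        if (lines.empty() || lines[0].compare(0, prefix.size(), prefix) != 0)
            return 0;
        std::map<std::string, GenericMessage>::const_iterator it =
            defs_.find(lines[0].substr(prefix.size()));
        return it == defs_.end() ? 0 : &it->second;
    }

private:
    std::map<std::string, GenericMessage> defs_;
};
```

With this shape, MSG_A_NEW and MSG_A_CNL would differ only in which shared line parsers are registered for them, so adding a msg type becomes a registration rather than a new class.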

There are several challenges that I struggle to solve in an elegant and extensible way. Each type of msg:

- has a min and max number of lines
- has some must-have lines
- has some optional lines
- certain lines must be at certain places (e.g. the date cannot come before the msg type); order matters

I need to be able to validate format of the msgs.
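One possible way to express the constraints listed above is as data rather than code: an ordered list of rules per msg type, checked by a single validator. This is only a sketch; LineKind, LineRule, MessageSpec and the error strings are made-up names, and it assumes some other component has already classified each line. Each rule matches at most one line, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical line kinds; the real standard defines its own.
enum LineKind { LINE_TYPE, LINE_DATE, LINE_PARTY, LINE_NOTE };

struct LineRule {
    LineKind kind;
    bool required;
};

class MessageSpec {
public:
    MessageSpec(std::size_t minLines, std::size_t maxLines)
        : minLines_(minLines), maxLines_(maxLines) {
        assert(minLines <= maxLines);
    }

    // Rules are added in the order the lines must appear.
    MessageSpec& line(LineKind kind, bool required) {
        LineRule rule = { kind, required };
        rules_.push_back(rule);
        return *this;
    }

    // Returns an empty string if the msg is valid, otherwise the reason.
    std::string validate(const std::vector<LineKind>& lines) const {
        if (lines.size() < minLines_) return "too few lines";
        if (lines.size() > maxLines_) return "too many lines";

        std::size_t rule = 0;
        for (std::size_t i = 0; i < lines.size(); ++i) {
            // Skip optional rules this line does not match; a skipped
            // required rule means the line is missing or out of order.
            while (rule < rules_.size() && rules_[rule].kind != lines[i]) {
                if (rules_[rule].required)
                    return "missing or misplaced required line";
                ++rule;
            }
            if (rule == rules_.size()) return "unexpected line";
            ++rule;  // this rule is consumed by the current line
        }
        // Any required rule left over was never satisfied.
        for (; rule < rules_.size(); ++rule)
            if (rules_[rule].required) return "missing required line";
        return "";
    }

private:
    std::size_t minLines_;
    std::size_t maxLines_;
    std::vector<LineRule> rules_;
};
```

A spec for one msg type would then read something like `MessageSpec(2, 3).line(LINE_TYPE, true).line(LINE_DATE, true).line(LINE_NOTE, false)`, and near-identical msg types could share most of their rules.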


I'm not sure if I explained the design challenge well enough. My design experience is very limited. I've been bug-fixing for a while now and finally I will have a chance to do some fun coding :)

What high-level advice do you have for that? Which design patterns can you identify in this description? The main design constraint is maintainability and extensibility, with performance at the bottom (we have other bottlenecks anyway...).

A: 

That does sound like a fun challenge. :-)

Your "initial plan" sounds like a good one: factor out all of the similar processing between all of the messages and put the code for them in a base message class. The changing items can become virtual functions (such as CheckForRequiredLines or VerifyLineOrder, perhaps), possibly with default implementations for the most common case. Then derive other classes for the specific message types.
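As a rough sketch of what I mean (C++98 style; CheckForRequiredLines and VerifyLineOrder are the names from above, while the "TYPE:" and "REF:" line prefixes and the CancelMessage class are invented purely for illustration):

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Lines;

// Base class: the processing common to all messages lives here.
class Message {
public:
    virtual ~Message() {}

    // Shared skeleton; only the virtual steps vary per message type.
    bool Parse(const Lines& lines) const {
        if (!CheckForRequiredLines(lines)) return false;
        if (!VerifyLineOrder(lines)) return false;
        return true;
    }

protected:
    // Default for the most common case (assumed): a type line must exist.
    virtual bool CheckForRequiredLines(const Lines& lines) const {
        return HasLine(lines, "TYPE:");
    }

    // Default for the most common case (assumed): the type line is first.
    virtual bool VerifyLineOrder(const Lines& lines) const {
        return !lines.empty() && StartsWith(lines[0], "TYPE:");
    }

    static bool StartsWith(const std::string& line, const std::string& prefix) {
        return line.compare(0, prefix.size(), prefix) == 0;
    }

    static bool HasLine(const Lines& lines, const std::string& prefix) {
        for (std::size_t i = 0; i < lines.size(); ++i)
            if (StartsWith(lines[i], prefix)) return true;
        return false;
    }
};

// A specific message type overrides only what differs.
class CancelMessage : public Message {
protected:
    virtual bool CheckForRequiredLines(const Lines& lines) const {
        return Message::CheckForRequiredLines(lines) && HasLine(lines, "REF:");
    }
};
```

The derived class stays tiny because the skeleton in Parse never changes; that is where most of the duplicated code should disappear.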

It's hard to give generic advice for a design problem like this. It seems to me that your main parser function corresponds to the Factory Method pattern, but that's the only one I can easily identify. (I'm not too familiar with the names of design patterns -- I use many of them, but I only learned that they have names a couple years ago.)

Head Geek
A: 

You probably are already aware of this, but just in case... You should pick up/borrow the Gang of Four design patterns book for help in identifying and applying appropriate patterns. This is the canonical reference, and it contains cross-references and tables to help you decide what patterns might fit your application. It might be difficult for people here to identify specific patterns that might help you, based just on that description.

mbyrne215
ofc I have that book, but it takes a lot of time and experience to be able to fully apply knowledge you can find there. Anyway, thx for tip.
Nazgob
A: 

I would suggest looking at the libraries provided by Boost, for example Tuple or mpl::vector. These libraries allow you to create a list of unrelated types and then operate over them. The very rough idea is that you have sequences of types for each message kind:

Seq1 -> MSG_A_NEW, MSG_A_CNL
Seq2 -> MSG_B_NEW, MSG_B_CNL

Once you know your message kind, you use the appropriate tuple with a function template that applies the first tuple type to the data. Then the next entry in the tuple and so on.

This does assume that the layout of your data streams is known at compile time, but it does have the advantage that you are not paying any runtime overhead for the data structures.
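To make the idea concrete without depending on Boost, here is a sketch that uses a small hand-rolled type list as a stand-in for mpl::vector. It reads "apply each type in turn" as "try each type until one accepts the data", which is only one possible interpretation. TypeList, Nil, FirstMatch, the accepts/name members and the "A_NEW|..." data format are all invented for this example.

```cpp
#include <string>

// Hand-rolled compile-time list of types (stand-in for mpl::vector).
struct Nil {};

template <typename Head, typename Tail = Nil>
struct TypeList {};

// Unrelated message types: no common base class is needed.
struct MSG_A_NEW {
    static bool accepts(const std::string& data) { return data.compare(0, 5, "A_NEW") == 0; }
    static const char* name() { return "MSG_A_NEW"; }
};
struct MSG_A_CNL {
    static bool accepts(const std::string& data) { return data.compare(0, 5, "A_CNL") == 0; }
    static const char* name() { return "MSG_A_CNL"; }
};
struct MSG_B_NEW {
    static bool accepts(const std::string& data) { return data.compare(0, 5, "B_NEW") == 0; }
    static const char* name() { return "MSG_B_NEW"; }
};
struct MSG_B_CNL {
    static bool accepts(const std::string& data) { return data.compare(0, 5, "B_CNL") == 0; }
    static const char* name() { return "MSG_B_CNL"; }
};

typedef TypeList<MSG_A_NEW, TypeList<MSG_A_CNL> > Seq1;
typedef TypeList<MSG_B_NEW, TypeList<MSG_B_CNL> > Seq2;

// Walks the list at compile time: applies the first type to the data,
// then the next entry, and so on, until one of them accepts it.
template <typename List>
struct FirstMatch;

// End of the list: nothing accepted the data.
template <>
struct FirstMatch<Nil> {
    static const char* apply(const std::string&) { return 0; }
};

template <typename Head, typename Tail>
struct FirstMatch<TypeList<Head, Tail> > {
    static const char* apply(const std::string& data) {
        return Head::accepts(data) ? Head::name()
                                   : FirstMatch<Tail>::apply(data);
    }
};
```

Here `FirstMatch<Seq1>::apply(data)` returns the name of the first type in Seq1 that accepts the data, or a null pointer if none does. The recursion is resolved by the compiler, so no container of message objects exists at run time.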

Richard Corden
I'm a big fan of the Boost library, but this shouldn't be necessary for this problem, because the types *aren't* unrelated.
Head Geek
However - maybe the types are related simply because that's the only way the OP believed the problem could be solved!
Richard Corden
A: 
eli
I'm trying sth similar now, thanks for tips. BTW, decent text notation!
Nazgob