I take care of a critical app in my project. It parses business msgs (a legacy standard), processes them and then stores some results in a DB (another app picks them up). After more than a year of my work (I have other apps to look after as well) the app is finally stable. I've introduced a strict TDD policy and I have 20% unit test coverage (thank you Michael Feathers for your book!), most of it in the critical parts. I have some white-box FitNesse tests as well (whole business scenarios are covered there). I feel that I cannot refactor this app any further and that I'm now safe to play hard with it. It's designed so badly that I want to rewrite it. The app itself is around 20k lines of challenging legacy C/C++ code. There were other dependencies but I managed to decouple most of them.
All I have is the Sun C++ compiler, cppunitlite, STLPort and Boost. Please do not suggest other technologies (no XML, Java etc.) as this is not an option in my organization. I'd like to do it with modern C++ (perhaps play with metaprogramming...) and TDD from start to finish.
There are about 30 types of msgs I need to parse. Each of them is composed of 3-10 lines, and most of them are pretty similar. This is the root of all evil -> lots of code duplication. Each msg has a class describing how it should be parsed. Take a look at the main inheritance tree:
          MSG_A                       MSG_B
         /     \                     /     \
MSG_A_NEW       MSG_A_CNL   MSG_B_NEW       MSG_B_CNL
Both trees go much deeper. There are very small differences between MSG_A_NEW and MSG_B_NEW. They should be handled by a single class that can be injected with some small customization.
My initial plan is to have one generic msg class that will be customized. Some entity (a builder...?) will take a look at the msg and initialize the proper object that is able to parse it. Another entity will be able to discover which kind of line it is looking at, and this info will be used by the builder. I'm planning to write several parsers, each responsible for parsing just one specific kind of line. This will allow me to reuse them when parsing different msgs.
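To make the plan a bit more concrete, here is a rough sketch of what I have in mind, written in C++98 style so it should build on an older compiler. Everything in it is made up for illustration: the names `LineParser`, `TaggedLineParser`, `GenericMsgParser` and `Fields`, and the `<TAG>:<value>` line format, which is not the real legacy format.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical parse result: field name -> raw value.
typedef std::map<std::string, std::string> Fields;

// One parser per kind of line; shared by every msg type that contains it.
class LineParser {
public:
    virtual ~LineParser() {}
    // True if this parser recognises the line ("which line is it?").
    virtual bool accepts(const std::string& line) const = 0;
    // Extracts fields from a line that accepts() has already approved.
    virtual void parse(const std::string& line, Fields& out) const = 0;
};

// Assumed format, for illustration only: "<TAG>:<value>".
class TaggedLineParser : public LineParser {
public:
    TaggedLineParser(const std::string& tag, const std::string& field)
        : prefix_(tag + ":"), field_(field) {}
    virtual bool accepts(const std::string& line) const {
        return line.compare(0, prefix_.size(), prefix_) == 0;
    }
    virtual void parse(const std::string& line, Fields& out) const {
        out[field_] = line.substr(prefix_.size());
    }
private:
    std::string prefix_;
    std::string field_;
};

// Generic msg parser: customised by the set of line parsers injected into it.
// It stores non-owning pointers, so the parsers must outlive it.
class GenericMsgParser {
public:
    void add(const LineParser* parser) { parsers_.push_back(parser); }

    // Returns false as soon as a line is not recognised by any parser.
    bool parse(const std::vector<std::string>& lines, Fields& out) const {
        for (std::size_t i = 0; i < lines.size(); ++i) {
            const LineParser* parser = find(lines[i]);
            if (parser == 0) return false;
            parser->parse(lines[i], out);
        }
        return true;
    }
private:
    const LineParser* find(const std::string& line) const {
        for (std::size_t i = 0; i < parsers_.size(); ++i)
            if (parsers_[i]->accepts(line)) return parsers_[i];
        return 0;
    }
    std::vector<const LineParser*> parsers_;
};
```

The builder would then be the only place that knows which line parsers make up MSG_A_NEW versus MSG_B_NEW; the two msgs would share one `GenericMsgParser` class and differ only in what gets added to it.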
There are several challenges that I struggle to solve in an elegant and extensible way. Each type of msg:
- has a min and max number of lines
- has some must-have lines
- has some optional lines
- requires certain lines at certain places (e.g. the date cannot come before the msg type), so order matters
I need to be able to validate the format of the msgs.
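One direction I'm considering is describing these constraints as data instead of as inheritance. Below is a minimal sketch under simplifying assumptions: each kind of line appears at most once per msg, and the kind of every line has already been identified. `MsgSpec`, `LineRule` and the line-kind labels are hypothetical names, not existing code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One entry per kind of line a msg may contain, listed in the allowed order.
struct LineRule {
    std::string kind;  // hypothetical label, e.g. "TYPE" or "DATE"
    bool required;     // must-have line vs optional line
    LineRule(const std::string& k, bool r) : kind(k), required(r) {}
};

class MsgSpec {
public:
    MsgSpec(std::size_t minLines, std::size_t maxLines)
        : min_(minLines), max_(maxLines) {}

    // Rules are added in the order the lines must appear in the msg.
    MsgSpec& line(const std::string& kind, bool required) {
        rules_.push_back(LineRule(kind, required));
        return *this;
    }

    // kinds: the kind of each line of the msg, in order of appearance.
    bool validate(const std::vector<std::string>& kinds) const {
        if (kinds.size() < min_ || kinds.size() > max_) return false;
        std::size_t r = 0;
        for (std::size_t i = 0; i < kinds.size(); ++i) {
            // Skip optional rules that this line does not match.
            while (r < rules_.size() && rules_[r].kind != kinds[i]) {
                if (rules_[r].required) return false;  // missing or out of order
                ++r;
            }
            if (r == rules_.size()) return false;  // unknown or misplaced line
            ++r;  // each rule is consumed by at most one line
        }
        // Any rule left unmatched must be optional.
        for (; r < rules_.size(); ++r)
            if (rules_[r].required) return false;
        return true;
    }
private:
    std::size_t min_;
    std::size_t max_;
    std::vector<LineRule> rules_;
};
```

With something like this, a msg that requires TYPE then DATE with an optional REF in between would be declared as `MsgSpec spec(2, 3); spec.line("TYPE", true).line("REF", false).line("DATE", true);`, and MSG_A_NEW and MSG_B_NEW would differ only in their specs.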
I'm not sure if I explained the design challenge well enough. My design experience is very limited. I've been bug-fixing for a while now and finally I will have a chance to do some fun coding :)
What high-level advice do you have for this? Which design patterns can you identify in this description? The main design constraints are maintainability and extensibility, with performance at the bottom (we have other bottlenecks anyway...).