I often have code based on a specific well defined algorithm. This gets well commented and seems proper. For most data sets, the algorithm works great.

But then the edge cases, the special cases, and the heuristics get added to solve particular problems with particular sets of data. As the number of special cases grows, the comments get more and more hazy. I fear going back and looking at this code in a year or so and trying to remember why each particular special case or heuristic was added.

I sometimes wish there was a way to embed or link graphics in the source code, so I could say effectively, "in the graph of this data set, this particular feature here was causing the routine to trigger incorrectly, so that's why this piece of code was added".

What are some best-practices to handle situations like this?

Special cases seem to be always required to handle these unusual/edge cases. How can they be managed to keep the code relatively readable and understandable?

Consider an example dealing with feature recognition from photos (not exactly what I'm working on, but the analogy seems apt). When I find a particular picture for which the general algorithm fails and a special case is needed, I record that information as best I can in a comment (or, as someone suggested below, in a descriptive function name). But what is often missing is a permanent link to the particular data file that exhibits the behavior in question. While my comment should describe the issue, and would probably say "see file foo.jpg for an example of this behavior", this file is never in the source tree, and can easily get lost.

In cases like this, do people add data files to the source tree for reference?

A: 

Without knowing the specific nature of your problem it is not easy to give an answer, but in my own experience, hard-coding the handling of special cases should be avoided. Have you thought about implementing a rules engine or something similar to handle the special cases outside your main processing algorithm?

tekBlues
+6  A: 

Martin Fowler said in his Refactoring book that when you feel the need to add a comment to your code, you should first see if you can encapsulate that code in a method and give the method a name that would replace the comment.

So, as an abstract example, you could create methods named:

private bool ConditionXAndYHaveOccurred(object param)
{
   // code to check for conditions x and y goes here;
   // false is only a placeholder so the sketch compiles
   bool result = false;
   return result;
}

private object ApplySolutionForEdgeCaseWhenXAndYHappen(object param)
{
   //modify param to solve for edge case
   return param;
}

Then you can write code like

if(ConditionXAndYHaveOccurred(myObject))
{
    myObject = ApplySolutionForEdgeCaseWhenXAndYHappen(myObject);
}

Not a hard and fast concrete example, but it would help with readability in a year or two.

Matthew Vines
While I think descriptive function names are great, and try to do them whenever possible, I don't think it's usually possible to replace a paragraph of descriptive text with a function name.
loneRanger
The other side to that is that comments are often not kept up to date with the code. So that paragraph of comments may end up being the most confusing thing about your code when you come back to it later after a few revisions. Ideally you should be commenting on why you wrote the code, not on what it does. In which case, feel free to tag your descriptive functions with the narrative that explains exactly why each one exists.
Matthew Vines
Unfortunately true, about out of date comments.
loneRanger
+3  A: 

Unit testing can help here. Tests that actually simulate the special cases can often serve as documentation of why the code does what it does. This can often be better than just describing the issue in a comment.

Not that this replaces moving the special case handling to their own functions and decent comments...
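As a rough illustration of tests doubling as documentation, here is a minimal sketch. Everything in it is hypothetical and not taken from the question: the PeakCounter class, the CountPeaks method, the flat-topped-peak special case, and the sample data are invented for the example. A real project would more likely use NUnit, xUnit, or MSTest; a plain Check helper is used here only to keep the sketch self-contained.

```csharp
using System;

// Hypothetical algorithm: counts peaks in a series of samples.
public static class PeakCounter
{
    // General rule: a sample is a peak when it is strictly higher than both neighbours.
    // Special case: a flat-topped peak (e.g. 1, 5, 5, 1) has no such sample,
    // so a run of equal values is treated as a single candidate.
    public static int CountPeaks(int[] samples)
    {
        int peaks = 0;
        int i = 1;
        while (i < samples.Length - 1)
        {
            // Extend 'end' over any run of values equal to samples[i] (the plateau).
            int end = i;
            while (end + 1 < samples.Length && samples[end + 1] == samples[i]) end++;

            bool risesBefore = samples[i - 1] < samples[i];
            bool fallsAfter = end + 1 < samples.Length && samples[end + 1] < samples[i];
            if (risesBefore && fallsAfter) peaks++;

            i = end + 1; // continue after the plateau
        }
        return peaks;
    }
}

public static class PeakCounterTests
{
    private static void Check(bool condition, string testName)
    {
        if (!condition) throw new Exception("FAILED: " + testName);
    }

    // The general algorithm on well-behaved data.
    public static void SharpPeaksAreCountedIndividually()
    {
        Check(PeakCounter.CountPeaks(new[] { 1, 3, 1, 4, 1 }) == 2,
              nameof(SharpPeaksAreCountedIndividually));
    }

    // WHY the plateau handling exists: clipped data produces flat-topped peaks,
    // which the strict "higher than both neighbours" rule would miss entirely.
    // The array below is the smallest data set that reproduces the problem.
    public static void FlatToppedPeakIsCountedOnce()
    {
        Check(PeakCounter.CountPeaks(new[] { 1, 5, 5, 1 }) == 1,
              nameof(FlatToppedPeakIsCountedOnce));
    }
}

public static class Program
{
    public static void Main()
    {
        PeakCounterTests.SharpPeaksAreCountedIndividually();
        PeakCounterTests.FlatToppedPeakIsCountedOnce();
        Console.WriteLine("All special-case tests passed.");
    }
}
```

The test name and its comment carry the "why", and the smallest data that triggers the special case lives right next to it in source control, so it cannot get lost the way an external file can.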

Aardvark
+1  A: 

I'm not usually an advocate of test-driven development and similar styles that stress tests too much, but this seems to be a perfect case where a bunch of unit tests can help a lot. Their main purpose here is not even to catch bugs from later changes, but simply to document all the special cases that need to be addressed.

A few good unit tests with comments in them are in themselves the best description of the special cases. Commenting the code itself gets easier too: one can simply point to the unit tests that illustrate the problem being solved at that point in the code.

sth
+2  A: 

If you have a knowledge base or a wiki for the project, you could add the graph in it, linking to it in the method as per Matthew's Fowler quote and also in the source control commit message for the edge case change.

//See description at KB#2312
private object SolveXAndYEdgeCase(object param)
{
   //modify param to solve for edge case
   return param;
}

Commit Message: Solution for X and Y edge case, see description at KB#2312

It is more work, but it is a way to document cases more thoroughly than mere test cases or comments could. Even though one might argue that test cases should be documentation enough, you might not want to store the whole failing data set in them, for instance.

Remember, vague problems lead to vague solutions.

Vinko Vrsalovic
I have a project wiki, and a bug database, but they aren't really tied together. In 5 years, while I can find the source code, will I be able to find the bug database? Maybe that's my problem.
loneRanger
You really should have your KB or issue tracking tool and wiki in the same backup regime as your source code. Not only to solve your particular case, but for many other reasons as well: you really want to keep the history of bugs, dates of resolution, descriptions and so on, to be able to learn from it and to store relevant information. So if you are not sure you'll be able to find the wiki or bug database in a few years, make sure you will.
Vinko Vrsalovic
A: 

It sounds like you need more thorough documentation than just code comments. That way someone could look up the function in question in the documentation and be presented with an example picture that requires a special case.

Niki Yoshiuchi
A: 

About the

I sometimes wish there was a way to embed or link graphics in the source code, so I could say effectively, "in the graph of this data set, this particular feature here was causing the routine to trigger incorrectly, so that's why this piece of code was added".

part:

If the "graphic" that you want to embed is a graph, and if you use Doxygen, you can embed dot commands in your comment to generate a graph in the documentation:

/**
If we have a subgraph looking like this:
\dot
digraph g{
A->B;
A->C;
B->C;
}
\enddot
the usual method does not work well and we use this heuristic instead.
*/
Éric Malenfant
+1  A: 

Don Knuth invented literate programming to make it easy for your program documentation to include plots, graphs, charts, mathematical equations, and whatever else you need to make it understood. A literate program is a great way to explain why something is the way it is and how it got that way over time. There are many, many literate-programming tools; the "noweb" tool is one of the simplest and is shipped with some Linux distributions.

Norman Ramsey