I often have code based on a specific well defined algorithm. This gets well commented and seems proper. For most data sets, the algorithm works great.
But then the edge cases, the special cases, the heuristics get added to solve particular problems with particular sets of data. As number of special cases grow, the comments get more and more hazy. I fear going back and looking at this code in a year or so and trying to remember why each particular special case or heuristic was added.
I sometimes wish there was a way to embed or link graphics in the source code, so I could say effectively, "in the graph of this data set, this particular feature here was causing the routine to trigger incorrectly, so that's why this piece of code was added".
What are some best-practices to handle situations like this?
Special cases seem to be always required to handle these unusual/edge cases. How can they be managed to keep the code relatively readable and understandable?
Consider an example dealing with feature recognition from photos (not exactly what I'm working on, but the analogy seems apt). When I find a particular picture for which the general algorithm fails and a special case is needed, I record as best I can that information in a comment, (or as someone suggested below, a descriptive function name). But what is often missing is a permanent link to the particular data file that exhibits the behavior in question. While my comment should describe the issue, and would probably say "see file foo.jp for an example of this behavior", this file is never in the source tree, and can easily get lost.
In cases like this, do people add data files to the source tree for reference?