It's well known that Bayesian classifiers are an effective way to filter spam. They can be fairly concise (ours is only a few hundred lines of code), but all of the core code needs to be written up-front before you get any results at all.
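For context, the core scoring logic of such a classifier might look something like the sketch below. This is a minimal illustration, not our actual implementation; the word-count dictionaries and totals are assumed to come from hypothetical labelled training messages.

using System;
using System.Collections.Generic;

double SpamProbability(string text,
                       Dictionary<string, int> spamCounts,
                       Dictionary<string, int> hamCounts,
                       int spamTotal, int hamTotal)
{
    // Accumulate per-word evidence as log-odds to avoid underflow,
    // assuming equal prior probabilities of spam and ham.
    double logOdds = 0.0;
    foreach (var word in text.ToLowerInvariant().Split(' '))
    {
        spamCounts.TryGetValue(word, out int s);
        hamCounts.TryGetValue(word, out int h);
        // Laplace smoothing so unseen words don't zero out the estimate.
        double pSpam = (s + 1.0) / (spamTotal + 2.0);
        double pHam = (h + 1.0) / (hamTotal + 2.0);
        logOdds += Math.Log(pSpam / pHam);
    }
    // Convert the accumulated log-odds back to a probability.
    return 1.0 / (1.0 + Math.Exp(-logOdds));
}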
However, the TDD approach mandates that you write only the minimum amount of code needed to pass a test. So given the following method signature:
bool IsSpam(string text)
And the following string of text, which is clearly spam:
"Cheap generic viagra"
The minimum amount of code I could write is:
bool IsSpam(string text)
{
    return text == "Cheap generic viagra";
}
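The test driving that implementation might look something like this (a sketch assuming an xUnit-style framework and a hypothetical SpamFilter class wrapping IsSpam):

using Xunit;

public class SpamFilterTests
{
    [Fact]
    public void FlagsKnownSpamText()
    {
        // SpamFilter is a hypothetical class exposing IsSpam.
        Assert.True(new SpamFilter().IsSpam("Cheap generic viagra"));
    }
}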
Now maybe I add another test message, e.g.
"Online viagra pharmacy"
I could change the code to:
bool IsSpam(string text)
{
return text.Contains("viagra");
}
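...driven by a second test along the same lines (again assuming the hypothetical SpamFilter class):

[Fact]
public void FlagsSpamVariant()
{
    // Forces the implementation to generalise beyond exact matching.
    Assert.True(new SpamFilter().IsSpam("Online viagra pharmacy"));
}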
...and so on, and so on, until at some point the code becomes a mess of string checks, regular expressions, and the like, because we've evolved it step by step instead of stepping back and designing it differently from the start.
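After a few more red-green cycles, you can imagine it degenerating into something like this (a purely hypothetical illustration of the problem):

using System.Text.RegularExpressions;

bool IsSpam(string text)
{
    var lower = text.ToLowerInvariant();
    if (lower.Contains("viagra")) return true;
    if (lower.Contains("pharmacy") && lower.Contains("online")) return true;
    if (Regex.IsMatch(lower, @"c[1i]al[1i]s")) return true;
    // ...dozens more ad hoc checks accumulate here over time...
    return false;
}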
So how is TDD supposed to work in this type of situation, where evolving the simplest possible code to pass each test is not the right approach? (Particularly when it is known in advance that the best implementations cannot be trivially evolved.)