I doubt anyone would recommend I waste the time to implement an unstable sorting algorithm just to test the test case, then reimplement the merge-sort. How often do you come across a similar situation and what do you do?
Let me be the one to recommend it, then. :)
All of this is a trade-off between the time you spend, on the one hand, and the risk you reduce or mitigate, plus the understanding you gain, on the other.
Continuing the hypothetical example...
If "stableness" is an important property/feature, and you don't "test the test" by making it fail, you save the time of doing that work, but incur risk that the test is wrong and will always be green.
If, on the other hand, you do "test the test" by breaking the feature and watching the test fail, you spend that extra time but reduce the risk that the test itself is broken.
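Concretely, "making it fail" here can be as cheap as hand-reversing the tie order once and confirming the assertion goes red before putting the real sort back. A minimal sketch in C++ (the Record type and the data are made up for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record: sorted by key; id exists only so the test can
// observe whether equal-key elements kept their original order.
struct Record {
    int key;
    int id;
};

int main() {
    // Two records share key 2; a stable sort must keep id 1 before id 2.
    std::vector<Record> v{{2, 1}, {1, 3}, {2, 2}};

    std::stable_sort(v.begin(), v.end(),
                     [](const Record& a, const Record& b) { return a.key < b.key; });

    // The stability assertion under test.
    assert(v[1].id == 1 && v[2].id == 2);

    // To "test the test": temporarily swap the ties by hand (or plug in a
    // deliberately unstable sort) and confirm this assertion actually fails,
    // then restore the real implementation.
    return 0;
}
```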
And the wildcard is that you might gain some important bit of knowledge. For example, while trying to code a 'bad' sort and get the test to fail, you might think more deeply about the comparison constraints on the type you're sorting, and discover that you were using "x==y" as the equivalence-class predicate for sorting when in fact "!(x<y) && !(y<x)" is the better predicate for your system (e.g. you might uncover a bug or design flaw).
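To make that last point concrete: under a strict weak ordering, two elements can be equivalent for sorting purposes without being equal. A hedged sketch, using a made-up case-insensitive string ordering as the hypothetical sort predicate:

```cpp
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

// Hypothetical ordering: compare strings case-insensitively.
bool less_ci(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
            return std::tolower(x) < std::tolower(y);
        });
}

int main() {
    std::string x = "Apple", y = "APPLE";

    // Equivalent under the ordering the sort actually uses...
    bool equivalent = !less_ci(x, y) && !less_ci(y, x);  // true
    // ...but not equal according to operator==.
    bool equal = (x == y);                               // false

    std::cout << std::boolalpha << equivalent << ' ' << equal << '\n';
}
```

If a test keyed its expectations on "x==y" while the sort grouped elements by the ordering's equivalence, the test could pass or fail for the wrong reasons; that is exactly the kind of design flaw the detour can surface.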
So I say err on the side of spending the extra time to make it fail, even if that means intentionally breaking the system just to get a red dot on the screen for a moment. Each of these little diversions incurs some time cost, but every once in a while one will save you a huge bundle (e.g. oops, a bug in the test means I was never testing the most important property of my system; or oops, our whole design for inequality predicates is messed up). It's like playing the lottery, except the odds are in your favor in the long run: every week you spend $5 on tickets and usually you lose, but once every three months you win a $1000 jackpot.