views:

229

answers:

3

I often hear around here from test driven development people that having a function get large amounts of information implicitly is a bad thing. I can see were this would be bad from a testing perspective, but isn't it sometimes necessary from an encapsulation perspective? The following question comes to mind:

http://stackoverflow.com/questions/1287567/c-is-using-random-and-orderby-a-good-shuffle-algorithm

Basically, someone wanted to create a function in C# to randomly shuffle an array. Several people told him that the random number generator should be passed in as a parameter. This seems like an egregious violation of encapsulation to me, even if it does make testing easier. Isn't the fact that an array shuffling algorithm requires any state at all other than the array it's shuffling an implementation detail that the caller should not have to care about? Wouldn't the correct place to get this information be implicitly, possibly from a thread-local singleton?

+6  A: 

I don't think it breaks encapsulation. The only state in the array is the data itself - and "a source of randomness" is essentially a service. Why should an array naturally have an associated source of randomness? Why should that have to be a singleton? What about different situations which have different requirements - e.g. speed vs cryptographically secure randomness? There's a reason why java.util.Random has a SecureRandom subclass :) Perhaps it doesn't matter whether the shuffle's results are predictable with a lot of effort and observation - or perhaps it does. That will depend on the context, and that's information that the shuffle algorithm shouldn't care about.

Once you start thinking of it as a service, it makes sense that it's passed in as a dependency.

Yes, you could get it from a thread-local singleton (and indeed I'm going to blog about exactly that in the next few days) but I would generally code it so that the caller gets to make that decision.

One benefit of the "randomness as a service" concept is that it makes for repeatability - if you've got a test which fails, you can pass in a Random with a specific seed and know you'll always get the same results, which makes debugging easier.

Of course, there's always the option of making the Random optional - use a thread-local singleton as a default if the caller doesn't provide their own.

Jon Skeet
+1, saved me answering. The way I see it is that shuffling is an algorithm which applies randomness to (the order of) an array. Therefore the natural inputs are an array, and some randomness. For the function to just produce some randomness of its own makes it useless for certain practical purposes - either it's not securely random, or it is secure but consumes all your system entropy and therefore can't be used frivolously. Quite aside from the value when testing, dependency injection leads to more flexible, generic functions.
Steve Jessop
-1 The question is about a function which randomly shuffles an array; the array itself has nothing to do with the source of randomness, obviously. Unless there is a requirement to allow clients to choose the randomizer, passing it in *would* break encapsulation, IMO.
Rogerio
@Rogerio: "randomly shuffles an array" sounds like it's talking about both an array and randomness to me. You clearly need *some* source of randomness, whether that's created within the object or passed in. If the client can't specify it, that sounds like unnecessary rigidity to me. Of course, it depends on what level of encapsulation you're aiming at. If you're going to implement an `IShuffler` interface, it would make sense for the source of randomness to be supplied on construction, leaving the interface random-free. For a single standalone method it's harder to call IMO.
Jon Skeet
+1  A: 

I don't think this violates encapsulation.

Your Example

I would say that being able to provide an RNG is a feature of the class. I would obviously provide a method that doesn't require it, but I can see times where it may be useful to be able to duplicate the randomization.

What if the array shuffler was part of a game that used the RNG for level generation. If a user wanted to save the level and play it again later, it may be more efficient to store the RNG seed.

General Case

Simple classes that have a single task like this typically don't need to worry about divulging their inner workings. What they encapsulate is the logic of the task, not the elements required by that logic.

Ben S
+4  A: 

Yes, that does break encapsulation. As with most software design decisions, this is a trade-off between two opposing forces. If you encapsulate the RNG then you make it difficult to change for a unit test. If you make it a parameter then you make it easy for a user to change the RNG (and potentially get it wrong).

My personal preference is to make it easy to test, then provide a default implementation (a default constructor that creates its own RNG, in this particular case) and good documentation for the end user. Adding a method with the signature

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)

that creates a Random using the current system time as its seed would take care of most normal use cases of this method. The original method

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)

could be used for testing (pass in a Random object with a known seed) and also in those rare cases where a user decides they need a cryptographically secure RNG. The one-parameter implementation should call this method.

Bill the Lizard