Consider these two data validation scenarios:

Check everything everywhere

Make sure that every method that takes one or more arguments actually checks them to ensure that they're syntactically valid.

Pros

  • Very fine check granularity.
  • If the code being written is for some kind of library, it limits the damage that can be done when the developers using it fail to provide valid data.

Cons

  • It's costly to always perform checks that most of the time shouldn't be needed.
  • It's still possible to forget to add a check every now and then.
  • More code is being written and hence in need of maintenance.
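
A minimal sketch of this first style in Java (the AccountService class and its rules are illustrative assumptions, not from the question): every method defends itself, even when its caller has already validated the same data.

```java
// Illustrative sketch of "check everything everywhere": every method,
// public or private, re-validates its own arguments.
public class AccountService {

    public void deposit(String accountId, long amountCents) {
        if (accountId == null || accountId.isEmpty()) {
            throw new IllegalArgumentException("accountId must be non-empty");
        }
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        applyDeposit(accountId, amountCents);
    }

    // Even this internal helper repeats the exact same checks.
    private void applyDeposit(String accountId, long amountCents) {
        if (accountId == null || accountId.isEmpty()) {
            throw new IllegalArgumentException("accountId must be non-empty");
        }
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        // ... actual persistence elided ...
    }
}
```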

Make use of TDD goodness

Validate data only when it enters your code from the external world. To make sure that data is always syntactically correct internally, create tests that check every method that returns a value, so that if valid data enters, valid data exits.
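
A sketch of this second style under the same caveat (the FeeCalculator names are hypothetical): the internal method carries no runtime checks; a design-time test pins down the "valid data in, valid data out" guarantee.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Illustrative sketch: internal code trusts its input; a test, not a
// runtime guard, guarantees that valid data in yields valid data out.
class FeeCalculatorTest {

    // Internal method: assumes amountCents was validated at the boundary,
    // so it performs no checks of its own.
    static long feeCents(long amountCents) {
        return amountCents / 100; // hypothetical 1% fee
    }

    @Test
    void validInValidOut() {
        assertEquals(100, feeCents(10_000)); // valid input yields valid output
        assertEquals(0, feeCents(99));       // small amounts round down to zero
    }
}
```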

The pros and the cons are practically the reverse of those of the first approach.

As of now I'm using the first approach, but since I'm employing test-driven development I thought that maybe I could go with the second one.

The advantages are clear; still, I wonder whether it's as secure as the first method.

A: 

The advantages are clear; still, I wonder whether it's as secure as the first method.

This completely depends on how well you test it.

This could be just as secure, if the following two criteria are met:

  • Every publicly exposed means of adding data to the system is validated completely
  • Every internal method that translates data is completely and adequately tested

However, I question whether this would be easier or require less code. The amount of code required to check every public entry point is going to be very similar to the amount of code required to validate each method. You're going to need more checks in the entry points, since they'll have to check things that might otherwise be checked internally.

Reed Copsey
A: 

For the second method, you need two good sets of tests. You must not only check that

if valid data enters, valid data exits.

You must also check that if invalid data enters, an exception is thrown. So you still have to validate data and bail out when it's invalid; this is really the only way to avoid pesky ArgumentNullExceptions or other cryptic errors in your production application. However, TDD can really toughen up the quality of all that checking (especially with fuzz testing).
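
A sketch of that second set of tests in Java with JUnit 5 (parseAge is a hypothetical boundary method, and the fuzzing is deliberately crude): invalid data must produce an explicit exception, never a cryptic one.

```java
import static org.junit.jupiter.api.Assertions.assertThrows;

import java.util.Random;
import org.junit.jupiter.api.Test;

// Illustrative sketch of the second set of tests: invalid data in,
// explicit exception out.
class BoundaryTest {

    // Hypothetical boundary method under test.
    static int parseAge(String raw) {
        if (raw == null) throw new IllegalArgumentException("raw is null");
        int age = Integer.parseInt(raw.trim()); // NumberFormatException is an IllegalArgumentException
        if (age < 0 || age > 150) throw new IllegalArgumentException("age out of range: " + age);
        return age;
    }

    @Test
    void invalidInputThrows() {
        assertThrows(IllegalArgumentException.class, () -> parseAge(null));
        assertThrows(IllegalArgumentException.class, () -> parseAge("-3"));
    }

    @Test
    void fuzzedInputNeverEscapesAsCrypticError() {
        Random rnd = new Random(42); // fixed seed keeps the test repeatable
        for (int i = 0; i < 1_000; i++) {
            byte[] junk = new byte[rnd.nextInt(8)];
            rnd.nextBytes(junk);
            try {
                parseAge(new String(junk));
            } catch (IllegalArgumentException expected) {
                // fine: the boundary rejected the junk explicitly
            }
            // Any other exception type (e.g. NullPointerException) fails the test.
        }
    }
}
```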

C. Ross
That's interesting. Still, wouldn't I get pesky exceptions anyway if I miss something in my run-time checks? And wouldn't that be equivalent to writing my tests poorly? I would then only have to make sure that invalid data is properly handled at the entry points, and not throughout the whole application.
RobSullivan
+2  A: 

It sounds like the first method is contract driven, and one aspect of that is that you also need to verify that what you return from any public interface meets the contract.

That said, I think both approaches are valid, but very different.

TDD only partially deals with the public interface, in that it should check that every input is properly validated. Unfortunately, unless you have all your validation in separate functions (see the sketch below), it becomes very difficult to adequately test that a function of 3 or 4 parameters checks each of them properly for validity. The number of tests you have to write is quite high in either approach.
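
A sketch of the "validation in separate functions" idea (the Transfer names are made up): extracting the checks from a 3-parameter method makes every branch directly testable.

```java
// Illustrative sketch: pulling the checks out of a 3-parameter method
// lets a unit test exercise each branch in isolation, instead of
// enumerating argument combinations through the whole method.
public class Transfer {

    public void transfer(String from, String to, long amountCents) {
        validateTransferArgs(from, to, amountCents);
        // ... actual transfer logic elided ...
    }

    // Package-private so a unit test can call it directly.
    static void validateTransferArgs(String from, String to, long amountCents) {
        if (from == null || from.isEmpty()) {
            throw new IllegalArgumentException("from must be non-empty");
        }
        if (to == null || to.isEmpty()) {
            throw new IllegalArgumentException("to must be non-empty");
        }
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
    }
}
```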

If you are writing a library, then every function that can be called directly from the outside (outside being outside the library) will need to check that every input is valid, and that invalid input is handled as per the contract, either returning null or throwing an exception. But it must be in agreement with the documentation.

Once you have verified that, there is no reason to force the verification on private functions, as those can only be called from within the library, where you should already be dealing only with valid data.
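
A sketch of such a contract in Java (the Slugifier library class is an illustrative assumption): the public entry point documents and enforces the contract; the private function trusts its caller.

```java
/** Hypothetical library entry point illustrating a documented contract. */
public final class Slugifier {

    /**
     * Converts a title into a URL slug.
     *
     * @param title the title to convert; must be non-null and non-blank
     * @throws IllegalArgumentException if {@code title} is null or blank,
     *         exactly as this documentation promises
     */
    public static String toSlug(String title) {
        if (title == null || title.trim().isEmpty()) {
            throw new IllegalArgumentException("title must be non-null and non-blank");
        }
        return normalize(title);
    }

    // Private: reachable only through toSlug, so it assumes valid input.
    private static String normalize(String title) {
        return title.trim().toLowerCase().replaceAll("[^a-z0-9]+", "-");
    }

    private Slugifier() {}
}
```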

Lots of tests will be needed regardless, unfortunately. All these tests do is ensure that you don't hit any surprise problems, but that should generally help justify the cost of writing and maintaining them.

As to your question: if your tests are really well written, and you ensure that all validity checks are done completely, then it should be as secure. But the risk is that if you believe it is secure while your tests are poorly written, it will actually be worse than having no tests at all, because of the false assumption that your tests are well written.

I would use both methods until you know your tests are well written; then just go with TDD.

James Black
Agreed. I normally worry about my public interface and assume that arguments to internal methods are valid (more to keep the methods short than anything else). I suppose these methods could bubble up to public in refactoring/reuse, but if you're always testing your public interface...
TrueWill
In actuality I would use AspectJ to test my Java public functions, to ensure that the input parameters and the return data are in agreement with the documentation.
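
A minimal AspectJ sketch of that idea (the package name and the null-only check are illustrative assumptions, not the actual aspect): one aspect guards the arguments of every public method in the library.

```java
// Hypothetical aspect: checks the arguments of every public method in the
// library's packages in one place, instead of hand-written per-method checks.
public aspect PublicContractChecker {

    pointcut publicLibraryCall():
        execution(public * com.example.mylib..*.*(..));

    before(): publicLibraryCall() {
        for (Object arg : thisJoinPoint.getArgs()) {
            if (arg == null) {
                throw new IllegalArgumentException(
                        "null argument to " + thisJoinPoint.getSignature());
            }
        }
    }
}
```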
James Black
+1  A: 

My opinion is that in the first scenario, two of your Cons outweigh everything else:

  • It's costly to always perform checks that most of the time shouldn't be needed.
  • More code is being written and hence in need of maintenance.

Also, technically TDD has no bearing on this question, because it is not a testing technique. More later...

To mitigate the Cons I would strongly advocate (as I think you say) splitting the code into an outside and an inside: the outside is where all the validation occurs. Hopefully this is but a thin wrapper around the inside, to prevent garbage in, garbage out (GIGO). Once inside, data never needs to be validated again.
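
One way to make that outside/inside split concrete, sketched in Java (the Email type is a hypothetical illustration): the outside parses raw input into a type that can only be constructed in a valid state, so the inside never revalidates.

```java
// Hypothetical sketch: a value type that cannot exist in an invalid state.
// The "outside" constructs it once; the "inside" accepts only this type
// and therefore never needs to validate again.
public final class Email {
    private final String value;

    private Email(String value) {
        this.value = value;
    }

    /** Outside: the single place where raw input is validated. */
    public static Email parse(String raw) {
        // Naive format check, for illustration only.
        if (raw == null || !raw.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+")) {
            throw new IllegalArgumentException("not a valid email: " + raw);
        }
        return new Email(raw);
    }

    public String value() {
        return value;
    }
}

// Inside: taking Email rather than String makes invalid data unrepresentable.
// void sendWelcome(Email to) { ... }
```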

As for TDD, I would strongly advocate (as you are now doing) employing it to develop your code, with the added benefit of leaving a trail of tests that become a regression test suite. Now you will naturally develop your outside code to perform robust validation, with the promise of easily adding any checks that you might initially forget. Your inside code can be developed assuming it will only handle valid data, but TDD will still give you the confidence that it will function to spec.

I'm saying that I would go with the second approach, as I've described, independently of whether I'm developing with TDD or not (but TDD is always my first choice).

quamrana
A: 

One item is missing from your list of Pros and Cons, and it is important enough to make unit testing a much safer method than maniacal parameter checking.

You just have to consider the when and the where.

For unit testing the when and the where are:

  • when: at design time
  • where: in a dedicated source file outside of the application code

For overkill data checking they are:

  • when: at runtime
  • where: entangled in the application source code, typically using asserts.

That is the point: code covered by unit tests detects errors at design time, when you run the tests. If you are the paranoid and schizophrenic kind of tester (the best kind), you write tests designed to break whatever can be broken, checking every data boundary and perverse input. You also use code-coverage tools to ensure every branch of every alternative is tested. There is no limit: tests live in their own files and do not clutter the application. It doesn't matter if you end up with ten times as many lines of test code as application code; there is no runtime penalty and no readability penalty.
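
A sketch of that boundary-hunting style with JUnit 5 (clampPercent and its accepted range are illustrative assumptions): the tests live in their own file and cost nothing in the shipped application.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

// Illustrative boundary tests: hammer every edge of the accepted range
// and some perverse inputs, all at design time.
class PercentTest {

    static int clampPercent(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("percent out of range: " + value);
        }
        return value;
    }

    @ParameterizedTest
    @ValueSource(ints = {0, 1, 50, 99, 100}) // in-range boundaries
    void acceptsInRangeValues(int value) {
        assertEquals(value, clampPercent(value));
    }

    @ParameterizedTest
    @ValueSource(ints = {-1, 101, Integer.MIN_VALUE, Integer.MAX_VALUE}) // perverse input
    void rejectsOutOfRangeValues(int value) {
        assertThrows(IllegalArgumentException.class, () -> clampPercent(value));
    }
}
```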

On the other hand, integrated overkill checking detects errors at runtime. In the worst case it will detect errors on the user's system, where you can do nothing about them (if you even hear that the error happened). Also, even if you are the paranoid kind, you will have to limit your checking: assertions just can't be 90 percent of the application code. They raise readability and maintenance issues, and often a heavy performance penalty. And where will you stop: only checking parameters from external input? Checking every possible or impossible input of inner functions? Checking every loop invariant? Also testing behavior when out-of-flow data (globals, system files, etc.) changes? You must also be conscious that assertion code can itself contain bugs. What if the formula in an assertion performs a division? You must ensure it cannot lead to a DIVIDE-BY-ZERO error or the like.
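
A tiny sketch of that last pitfall (Stats.average is made up): the assertion's own formula contains the very bug it was supposed to guard against.

```java
// Hypothetical sketch: the guard itself has a bug.
public class Stats {
    static double average(int total, int count) {
        // Intended as a sanity check, but the assert's own formula divides
        // by count: with assertions enabled (-ea) and count == 0, it throws
        // ArithmeticException before reporting anything useful. With
        // assertions disabled, the check vanishes entirely and the
        // floating-point division below quietly returns Infinity or NaN.
        assert total / count >= 0 : "average must be non-negative";
        return (double) total / count;
    }
}
```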

Another problem is that in many cases you just don't know what to do when an assertion fails. If you are at a real entry point, you can return something understandable to your user or to the library's user... but when you are checking inner functions, there is often nothing sensible to do.

kriss
Understand that my answer is somewhat parodic; I'm exaggerating the respective advantages and defects of each method. But I believe the basic point holds.
kriss