Hello,

If that title did not make sense (which I'm expecting =)), here is what I'm asking:

I have a function called ParseFile(). It takes a string as a parameter and returns a DataTable.

I want to unit-test this function. Is it wrong of me to code the function first, run it, take the output, serialize it to XML, save it as the expected output, and then write my unit test to call the function and assert against that deserialized data?
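
For concreteness, here is roughly what I mean (a sketch with NUnit; the file names are placeholders, ParseFile lives on a hypothetical Parser class, and the expected XML was saved from a run whose output I checked by hand):

    using System.Data;
    using System.IO;
    using NUnit.Framework;

    [TestFixture]
    public class ParseFileRegressionTests
    {
        [Test]
        public void ParseFile_MatchesKnownGoodOutput()
        {
            // Expected output saved earlier (with schema) from a verified run.
            var expected = new DataTable();
            expected.ReadXml(@"TestData\expected-output.xml");

            // Assuming the string parameter is the raw file contents.
            DataTable actual = Parser.ParseFile(
                File.ReadAllText(@"TestData\sample-input.txt"));

            // Compare the serialized forms instead of asserting field by field.
            var expectedXml = new StringWriter();
            expected.WriteXml(expectedXml, XmlWriteMode.WriteSchema);
            var actualXml = new StringWriter();
            actual.WriteXml(actualXml, XmlWriteMode.WriteSchema);
            Assert.AreEqual(expectedXml.ToString(), actualXml.ToString());
        }
    }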

I realize this helps me down the road, in cases where we get new input we might not have seen before and have to change the parsing function to handle it: running my tests will then assert that I have not broken any currently working files. Awesome...

...but in this case, the format is standard and will never change. So is it completely useless to do what I'm saying? And if it is, how should I test this function?

And heck, if what I'm saying is still a good idea: how would you even do that true TDD style and write the test first, without tediously writing Assert() calls for every single expected field in a file? I'm not quite in full TDD 'mode' yet, but I'm trying to get there... and it's cases like these that make me wonder how you could possibly write the test first, when the expected output is, for example, a whole dataset...

Thanks

+2  A: 

It depends on why you are writing tests.

From a Test-Driven Development standpoint, the process you describe is just wrong. From a Quality Assurance standpoint, it makes a lot of sense since it provides you with a regression test suite that works as a safety net as you move forward.

The key to developing such methods with TDD is to divide them into smaller chunks (units) and test each chunk in isolation. Instead of a single method with some input, some output, and a lot going on in between, you will often get a more flexible and reusable API if you manage to split it up into many smaller chunks.

How do you split up such a method?

Start by thinking about how you would organize it into private helper methods. Then think about whether some (or all) of these helper methods could conceivably be represented by objects in their own right. Design Patterns such as Strategy and Abstract Factory can be enormously helpful in that sense.

Instead of a coarse-grained API with a lot of internal logic, you'll end up with a fine-grained API with a lot of public, but composable, logic.
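
As a rough sketch of that direction (all names invented here), the coarse method might decompose into a line-parsing strategy that is injected into, and tested separately from, the file-level parser:

    using System;
    using System.Data;

    // Each piece can be unit-tested in isolation; ParseFile becomes a
    // thin composition of the parts.
    public interface ILineParser
    {
        void ParseLine(string line, DataTable target);
    }

    public class FileParser
    {
        private readonly ILineParser lineParser;

        // Strategy: the per-line behavior is injected, so tests can
        // exercise FileParser with a trivial fake, and each ILineParser
        // implementation gets its own focused tests.
        public FileParser(ILineParser lineParser)
        {
            this.lineParser = lineParser;
        }

        public DataTable ParseFile(string contents)
        {
            var table = new DataTable("Records");
            foreach (var line in contents.Split(
                new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
            {
                lineParser.ParseLine(line.TrimEnd('\r'), table);
            }
            return table;
        }
    }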

Mark Seemann
+1  A: 

You shouldn't use a comparison of the serialized form of your object as your test. It's too implementation-specific. You should only be testing the interface.

You should test specific properties of your DataTable object: for example, that it has four rows and five columns, and that cell [1,2] contains the string "Fish".
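
With NUnit, for example, that could look something like this (the file name and expected values are placeholders, and ParseFile sits on a hypothetical Parser class):

    using System.Data;
    using System.IO;
    using NUnit.Framework;

    [TestFixture]
    public class ParseFileTests
    {
        [Test]
        public void ParseFile_ReturnsExpectedShapeAndValues()
        {
            DataTable table = Parser.ParseFile(File.ReadAllText("sample.txt"));

            Assert.AreEqual(4, table.Rows.Count);
            Assert.AreEqual(5, table.Columns.Count);
            Assert.AreEqual("Fish", table.Rows[1][2]);
        }
    }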

Mark Byers
+2  A: 

Typically I try to stay away from testing functions that are that "broad". I prefer to test more granular functions. I'm going to assume that your ParseFile() method uses several utility methods, which themselves use other utility methods. Those are the kinds of methods that I try to test. Generally, any input like that is made up of several different pieces of data. Rather than trying to test that an entire file was parsed correctly, can you look through the ParseFile() method and the data being parsed and break it into several smaller tests which, taken in aggregate, give you the same confidence?

My personal justification for preferring that approach is that if I ever need to modify any of that parsing code and a test fails, I have a much faster route to finding the source of the failure, rather than just "ParseFile() did not return the expected results". :)
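
For example, assuming ParseFile() delegates to a small helper, say a hypothetical ParseRow() that splits one line into fields, you can test that helper directly:

    using NUnit.Framework;

    [TestFixture]
    public class ParseRowTests
    {
        [Test]
        public void ParseRow_SplitsTabDelimitedFields()
        {
            // ParseRow() is a made-up internal helper of ParseFile().
            string[] fields = Parser.ParseRow("1\tFish\t2.50");

            Assert.AreEqual(3, fields.Length);
            Assert.AreEqual("Fish", fields[1]);
        }
    }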

JMD
+3  A: 

This seems more like black-box testing than unit testing. But I don't see a problem with the way you did this, as long as you are sure the generated data set is correct. You are making sure the data set doesn't change in the future, and I think that's a good test, though maybe not necessarily a unit test.

OTisler
+4  A: 

It's not wrong, but it's not TDD.

That being said, I'd like to warn you about asserting on XML strings: when something goes wrong and the XML is large enough, you end up comparing the two XML strings manually, er, well, visually.

Been there, done that. I remember being in that situation, copying the XML into two files, reformatting it to one attribute per line, and comparing the two files with diff. I told myself I'd try asserting on the XML with XPath and/or XQuery next time.
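
For example, with LINQ to XML you can assert on individual values instead of on the whole string (a sketch; the element names and the GenerateXml() function are made up):

    using System.Xml.Linq;
    using System.Xml.XPath;
    using NUnit.Framework;

    [TestFixture]
    public class XmlOutputTests
    {
        [Test]
        public void Output_ContainsExpectedName()
        {
            XDocument doc = XDocument.Parse(Generator.GenerateXml("sample input"));

            // Pinpoint one value; a failure names the missing or wrong node
            // instead of dumping two huge strings for visual comparison.
            XElement name = doc.XPathSelectElement("/records/record[1]/name");
            Assert.IsNotNull(name);
            Assert.AreEqual("Fish", name.Value);
        }
    }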

Also, isn't your function doing too many things, parsing the string and generating the XML? You might want to consider separating those out.

"How would you even do that true TDD style and write the test first?"

If you really want to use TDD and keep one function, then you can start with a test: what should your output look like for an empty string? That is your first test. Once it passes, continue with a string containing one simple element: write the test, make it pass, and then take a more complex string. Lather, rinse, repeat.
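
A sketch of the first couple of steps in that progression, applied to the DataTable-returning function from the question (names illustrative, and assuming the string parameter is the file contents):

    using System.Data;
    using NUnit.Framework;

    [TestFixture]
    public class ParseFileTddTests
    {
        [Test]
        public void EmptyString_YieldsEmptyTable()
        {
            DataTable table = Parser.ParseFile("");
            Assert.AreEqual(0, table.Rows.Count);
        }

        [Test]
        public void SingleLine_YieldsOneRow()
        {
            DataTable table = Parser.ParseFile("1\tFish");
            Assert.AreEqual(1, table.Rows.Count);
            Assert.AreEqual("Fish", table.Rows[0][1]);
        }
    }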

philippe
+3  A: 

I have used tests like this in the past -- they are often incredibly useful.

They are totally not TDD. I find that I write this kind of test when I have code that is better built without TDD: Glue Code. You especially see this with data extracts (aka, "Query the database for this (existing) data, format it this way, and send it to the client.") You could TDD field formatters or other utility objects, but ultimately all that matters for this kind of code is if you produce a given (large) output for a given (large) input. Turning the code inside-out to introduce testing seams isn't worth the hassle.

You do get the most important virtue of a unit test suite out of this kind of test: If you change a shared piece of underlying functionality, and it breaks the function, your tests will go red.

Some people will say this isn't a unit test, because it isn't isolated. I don't care. That distinction doesn't help me any. This test provides me with reassurance that the system is working as it is supposed to. That lets me make changes to the codebase without fear.

Sean McMillan