
We have a suite of converters that take complex data and transform it. Mostly the input is EDI and the output XML, or vice-versa, although there are other formats.

There are many inter-dependencies in the data. What methods or software are available that can generate complex input data like this?

Right now we use two methods: (1) a suite of sample files that we've built up over the years, mostly from filed bugs and samples in documentation, and (2) generating pseudo-random test data. But the former only covers a fraction of the cases, and the latter has lots of compromises and only tests a subset of the fields.
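For context, our pseudo-random approach in (2) looks roughly like this minimal sketch. It's seeded so that every run produces the same data and failures stay reproducible; the segment tags and field shapes are placeholders, not our real format:

```python
import random
import string

def make_segment(rng, tag, field_count):
    """Build one EDI-like segment with pseudo-random alphanumeric fields."""
    fields = [
        "".join(rng.choices(string.ascii_uppercase + string.digits,
                            k=rng.randint(1, 8)))
        for _ in range(field_count)
    ]
    return tag + "*" + "*".join(fields) + "~"

def generate_message(seed):
    """Same seed -> same message, so a failing case can be re-run exactly."""
    rng = random.Random(seed)
    segments = [make_segment(rng, tag, rng.randint(2, 5))
                for tag in ("ISA", "GS", "ST", "SE", "GE", "IEA")]
    return "\n".join(segments)

# Deterministic between runs: the same seed always yields the same data.
assert generate_message(42) == generate_message(42)
```

The seed is effectively the test-case ID: log it with every failure and you can regenerate the exact input later.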

Before we go further down the path of implementing (reinventing?) a complex table-driven data generator, what options have you found successful?

+2  A: 

Well, the answer is in your question. Unless you implement a complex table-driven data generator, you're doing things right with (1) and (2).

(1) covers the rule of "1 bug verified, 1 new test case". And as long as the structure of the pseudo-random test data in (2) corresponds at all to real-life situations, it is fine.

(2) can always be improved, and it will improve mainly over time, as you think of new edge cases. The problem with random data for tests is that it can only be random up to the point where computing the expected output from the random input becomes so difficult that you basically have to rewrite the tested algorithm inside the test case.

So (2) will always match a fraction of the cases. If one day it matches all the cases, it will be in fact a new version of your algorithm.
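One common way around "rewriting the algorithm in the test" is to assert invariants instead of exact outputs: for a converter suite, a natural invariant is that converting a document forward and back is lossless. The converter functions below are trivial stand-ins, not anyone's real implementation; the point is only the round-trip check at the bottom:

```python
import re

def edi_to_xml(edi: str) -> str:
    """Toy EDI -> XML converter: one <seg> per segment, one <f> per field."""
    rows = [line.rstrip("~").split("*") for line in edi.splitlines()]
    return "".join(
        "<seg tag='%s'>%s</seg>" % (r[0], "".join("<f>%s</f>" % f for f in r[1:]))
        for r in rows
    )

def xml_to_edi(xml: str) -> str:
    """Toy inverse converter: rebuild the star-delimited segments."""
    out = []
    for tag, body in re.findall(r"<seg tag='(\w+)'>(.*?)</seg>", xml):
        fields = re.findall(r"<f>(.*?)</f>", body)
        out.append("*".join([tag] + fields) + "~")
    return "\n".join(out)

# The invariant holds for ANY generated input -- no need to hand-compute
# the expected XML for each random test case.
sample = "ST*850*0001~\nBEG*00*NE*12345~"
assert xml_to_edi(edi_to_xml(sample)) == sample
```

This doesn't prove the output is correct, only that no information is lost, so it complements rather than replaces the curated sample files in (1).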

FWH
You get points for your last paragraph; it made me laugh. I hope someone knows of an existing test data generator...
lavinio
A: 
  1. I'd advise against using random data, as it can make it difficult if not impossible to reproduce the error that was reported (I know you said 'pseudo-random', I'm just not sure what you mean by that exactly).

  2. Operating over entire files of data would likely be considered functional or integration testing. I would suggest taking your set of files with known bugs and translating these into unit tests, or at least doing so for any future bugs you come across. Then you can also extend these unit tests to cover the erroneous conditions for which you don't have any sample data. This will likely be easier than coming up with a whole new data file every time you think of a condition/rule violation you want to check for.

  3. Make sure your parsing of the data format is encapsulated from the interpretation of the data in the format. This will make unit testing as described above much easier.

  4. If you definitely need data to drive your testing, you may want to consider getting a machine-readable description of the file format and writing a test data generator that analyzes the description and generates valid/invalid files from it. This will also allow your test data to evolve as the file formats do.
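A generator driven by a machine-readable description, as point 4 suggests, could be sketched like this. The `FORMAT_SPEC` dictionary is a hypothetical stand-in for a real format description; the generator enumerates every all-valid combination plus variants where exactly one field is invalid, so you know in advance whether each case should pass validation:

```python
import itertools

# Hypothetical machine-readable format description: for each field, the
# values the spec allows plus one deliberately out-of-spec value.
FORMAT_SPEC = {
    "qualifier": {"valid": ["01", "14"], "invalid": "99"},
    "currency":  {"valid": ["USD", "EUR"], "invalid": "XXX"},
}

def generate_cases(spec):
    """Yield (record, expect_ok) pairs: every all-valid combination,
    plus copies where exactly one field carries its invalid value."""
    names = list(spec)
    for combo in itertools.product(*(spec[n]["valid"] for n in names)):
        valid = dict(zip(names, combo))
        yield valid, True
        for n in names:
            bad = dict(valid)
            bad[n] = spec[n]["invalid"]
            yield bad, False

cases = list(generate_cases(FORMAT_SPEC))
```

Because the expected verdict is generated alongside the data, the test harness never has to re-derive it, and updating `FORMAT_SPEC` when the format changes regenerates the whole suite.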

Mark Roddy
1. It's actually pseudo-random; it's randomly generated, but fixed between runs.
2. The problems usually encountered are caused mostly by interactions.
3. The EDI parser is a separate step, as is the XML parser. Both write to a neutral internal format, so the parsers are logically separated.
4. Yeah; of course, if we use the same dictionary for generating the test data as we do for interpreting it, we've got a problem. So using/finding a different algorithm would be valuable.
lavinio