I am trying to use test-driven development for an application that has to read a lot of data from disk. The problem is that the data is organized on the filesystem in a somewhat complex directory structure (not my fault). The methods I'm testing will need to see that a large number of files exist in several different directories in order for the methods to complete.

The solution I'm trying to avoid is just having a known folder on the hard drive with all the data in it. This approach sucks for several reasons, one reason being that if we wanted to run the unit tests on another computer, we'd have to copy a large amount of data to it.

I could also generate dummy files in the setup method and clean them up in the teardown method. The problem with this is that it would be a pain to write the code to replicate the existing directory structure and dump lots of dummy files into those directories.

I understand how to unit test file I/O operations, but how do I unit test this kind of scenario?

Edit: I will not need to actually read the files. The application will need to analyze a directory structure and determine what files exist in it. And this is a large number of subdirectories with a large number of files.

+4  A: 

I would define a set of interfaces that mimic the file system, such as IDirectory and IFile, and then use Test Doubles to create a representation of the directory structure in memory.

This will allow you to unit test (and vary) that structure to your heart's content.

You will also need concrete implementations that implement those interfaces using the real BCL classes for that purpose.

This lets you vary data structure and data access independently of each other.
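As a rough illustration of the idea, here is a minimal sketch in Python rather than the .NET types alluded to above. The names FileSystem, RealFileSystem, InMemoryFileSystem and missing_files are hypothetical, and the interface only covers what the example logic needs.

```python
import os
from typing import Iterable, List, Protocol


class FileSystem(Protocol):
    """Hypothetical interface: only what the code under test needs."""

    def file_exists(self, path: str) -> bool: ...
    def list_files(self, directory: str) -> List[str]: ...


class RealFileSystem:
    """Production implementation backed by the real disk."""

    def file_exists(self, path: str) -> bool:
        return os.path.isfile(path)

    def list_files(self, directory: str) -> List[str]:
        return sorted(
            name for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))
        )


class InMemoryFileSystem:
    """Test double built from a flat collection of '/'-separated file paths."""

    def __init__(self, paths: Iterable[str]) -> None:
        self._paths = set(paths)

    def file_exists(self, path: str) -> bool:
        return path in self._paths

    def list_files(self, directory: str) -> List[str]:
        prefix = directory.rstrip("/") + "/"
        # Keep only files that sit directly inside the directory.
        return sorted(
            p[len(prefix):] for p in self._paths
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )


def missing_files(fs: FileSystem, required: Iterable[str]) -> List[str]:
    """Example business logic: report which required files are absent."""
    return [p for p in required if not fs.file_exists(p)]
```

A test would pass an InMemoryFileSystem to missing_files, while production code would pass a RealFileSystem.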

Mark Seemann
This is still not an easy task, but I think it's the way I'll go. Trying to mimic the file system is helping me more effectively separate data access from business logic, which is better design.
Phil
+1  A: 

Whew, that sounds like a beast. I've been dabbling in testing myself.

It sounds like the main focus of your question is "How do I set up a large number of files so that I can test methods that check that said files exist?"

You mention several possible solutions. You said that you don't want to simply have a folder on the hard drive full of test data because you wouldn't want to have to go through the process of copying the data to another computer, which is understandable.

You also mention that you could write methods to generate dummy files, but it would be a pain to replicate the data structure.

Roy Osherove says in The Art of Unit Testing that it's a great idea to maintain and version your test code as your project is maintained and versioned.

I think that for the sake of consistency, it would make sense to create some dummy data and place it in some kind of source control repository with your test code. That way, you could streamline the process of copying the dummy data onto another computer and not have to worry about keeping track of which dummy data is on which machine. That would be a pain!

My solution: place dummy data in source control.

Rice Flour Cookies
A: 

A possible solution would be to create the dummy file and directory structure from a tar file that your setup method deploys.
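A minimal sketch of what that could look like in Python, assuming the archive lives next to the test code. The helper name deploy_fixture and the archive path are hypothetical.

```python
import os
import shutil
import tarfile
import tempfile
import unittest


def deploy_fixture(archive_path):
    """Unpack the archive into a fresh temporary directory and return its path."""
    root = tempfile.mkdtemp(prefix="fixture_")
    with tarfile.open(archive_path) as archive:
        archive.extractall(root)  # only do this with archives you trust
    return root


class DirectoryScanTest(unittest.TestCase):
    # Hypothetical location of the archive, kept alongside the tests.
    ARCHIVE = os.path.join("tests", "fixture_tree.tar")

    def setUp(self):
        self.root = deploy_fixture(self.ARCHIVE)
        # Remove the unpacked tree even when the test fails.
        self.addCleanup(shutil.rmtree, self.root, ignore_errors=True)
```

Because the archive is a single file, it is easy to copy to another machine or keep with the test code.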

mouviciel
+1  A: 

This is from a Python perspective. You may not be working in Python, but the answer more or less applies to most languages.

When unit testing code that depends on any external resource (e.g. the os module), you have to mock out that resource.

The question is "how do I mock out os.walk?" (or os.listdir or whatever you're using.)

  1. Write a mock version of the function (os.walk, for example). Each mocked-out version returns a list of directories and files so that you can exercise your application.

    How to build this?

    Write a "data grabber" that does os.walk on real data and creates a big-old flat list of responses you can use for testing.

  2. Create a mock directory structure. "it would be a pain to write the code to replicate the existing directory structure" isn't usually true. The mocked directory structure is simply a flat list of names. There's no pain at all.
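The first option above can be sketched roughly as follows, using unittest.mock from the standard library. The names grab_walk, count_files and RECORDED are hypothetical, and count_files merely stands in for the application code under test.

```python
import os
from unittest import mock


def grab_walk(root):
    """Run once against real data; returns a list of (dirpath, dirnames, filenames)."""
    return [(d, list(dirs), list(files)) for d, dirs, files in os.walk(root)]


def count_files(root):
    """Stand-in for the code under test (hypothetical)."""
    return sum(len(files) for _, _, files in os.walk(root))


# Hand-written here; in practice this would be the saved output of grab_walk().
RECORDED = [
    ("/data", ["a", "b"], []),
    ("/data/a", [], ["1.dat", "2.dat"]),
    ("/data/b", [], ["3.dat"]),
]


def test_count_files():
    # Replace os.walk for the duration of the test; no disk access happens.
    with mock.patch("os.walk", return_value=iter(RECORDED)) as fake_walk:
        assert count_files("/data") == 3
    fake_walk.assert_called_once_with("/data")
```

The recorded list can be saved to a file and kept with the tests, so the real data never has to be copied around.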

Consider this

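import os

def setUp(self):
    # Flat list of file paths that describes the directory structure.
    structure = [
        "/path/to/file/file.x",
        "/path/to/another/file/file.y",
        "/some/other/path/file.z",  # ...
    ]
    for p in structure:
        path, file = os.path.split(p)
        # Create the parent directories; ignore the error if they already exist.
        try:
            os.makedirs(path)
        except OSError:
            pass
        # Create the dummy file itself.
        with open(p, "w") as f:
            f.write("Dummy Data")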

That's all that's required for setUp. tearDown is similar.
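For completeness, here is one way the matching tearDown might look. This is a sketch under an added assumption: the tree is built under a temporary root (via tempfile.mkdtemp) rather than at absolute paths, so cleanup is a single shutil.rmtree call. The class name and the relative paths are illustrative.

```python
import os
import shutil
import tempfile
import unittest

STRUCTURE = [
    "path/to/file/file.x",
    "path/to/another/file/file.y",
    "some/other/path/file.z",
]


class DummyTreeTest(unittest.TestCase):
    def setUp(self):
        # Build the tree under a throwaway root instead of "/".
        self.root = tempfile.mkdtemp()
        for relative in STRUCTURE:
            full = os.path.join(self.root, relative)
            os.makedirs(os.path.dirname(full), exist_ok=True)
            with open(full, "w") as f:
                f.write("Dummy Data")

    def tearDown(self):
        # One call removes every directory and file created in setUp.
        shutil.rmtree(self.root)

    def test_files_exist(self):
        for relative in STRUCTURE:
            self.assertTrue(os.path.isfile(os.path.join(self.root, relative)))
```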

S.Lott
I believe that mocking the creation of the data structure would have the same problem the parent posed about actually creating the data structure: the code for creating a mock data structure would be complex.
Rice Flour Cookies
@Rising Star: Not necessarily. You're not mocking everything about the filesystem, just enough to make the application think it's working.
S.Lott
I ended up actually writing a small throw-away program that created a list of all the files in the directory structure (with their full filepaths), saved that list to a text file, and used that text file to populate an "IFilesystem" object in my test class. The object doesn't actually have any complex directory structures; it just searches the list of files at every FileExists() call.
Phil
@Phil: That's a fine "mock" structure. What more could you want for the purposes of unit testing?
S.Lott
A virtual filesystem...
Phil
@Phil: "A virtual filesystem"? What does that mean? In the question you claim that your problem is traversing the directory structure. You have captured the structure for your tests. What more do you want for your tests? Please update the question with the *specific* requirements that you want. You don't need an entire filesystem because your tests only require the directory information which you've already captured. If you want more, please update the question with *specific* requirements.
S.Lott
I jest. I don't need a virtual filesystem. It would just be neat.
Phil
@Phil: Why would all that complexity be "neat" for a unit test? It sounds like a crushing amount of useless featuritis. If you could enumerate the specific features of a virtual file system you're actually going to use for unit tests, I think you'll see that you've already built everything. I don't think there's anything more to wish for than what you've already done.
S.Lott
The reason I say it would be neat is because I want to reduce the amount of code I have to write. The mock filesystem class that I wrote isn't overly complex, but it is complex enough to need its own unit tests. That's not necessarily bad, but if someone else already created a filesystem mocking library, I would just use that and life would be easier. And this is not to mention the fact that I now have to create a production class that also implements IFilesystem, which would just be a wrapper class for already existing libraries. Again, that's not horrible, but I'd like to avoid it if I can.
Phil
@Phil: You wrote a mock file system that required unit tests? And all it has to do is respond with a list of files that you already gathered from production? I'm not getting something. It sounds like you wrote a mock filesystem that does way too much.
S.Lott
Well my interface has only the methods I need, and they are pretty simple methods like "GetDirectories()" and "GetFiles()". I think my problem was that my mock filesystem had to parse the list of filepaths to figure out what directories exist, and after thinking about it, that extra complexity could easily be avoided. So the mock class doesn't do too much, but what it does do could be simplified. I think I get what you're saying. Thanks for taking the time to help me think it out.
Phil
@Phil: "mock class doesn't do too much". That's what we strive for. That's why this question interested me. And why I keep asking what more it has to do. If the answer is "nothing more" then you've already described all the requirements and it looks like you can satisfy them.
S.Lott