
I have a Python function that writes an output file to disk.

I want to write a unit test for it using Python's unittest module.

How should I assert the equality of files? If the file content differs from the expected content, I would like to get an error plus a list of the differences, as in the output of the Unix diff command.

Is there any official/recommended way of doing that?

A:

The simplest approach is to write the output file, read its contents back, read the contents of the gold (expected) file, and compare the two with simple string equality. If they are the same, delete the output file. If they differ, fail the test with an assertion error.

This way, when the tests are done, every failed test will be represented by an output file, and you can use a third-party tool to diff them against the gold files (Beyond Compare is wonderful for this).
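A minimal sketch of that gold-file workflow, assuming text files small enough to read into memory. The base class `GoldFileTestCase`, the method `assertMatchesGold`, and the example test are hypothetical names for illustration; only `unittest`, `os` and `tempfile` are real stdlib modules here.

```python
import os
import tempfile
import unittest


class GoldFileTestCase(unittest.TestCase):
    """Hypothetical base class for comparing an output file with a gold file."""

    def assertMatchesGold(self, output_path, gold_path):
        with open(output_path) as f:
            actual = f.read()
        with open(gold_path) as f:
            expected = f.read()
        if actual != expected:
            # Keep the output file so it can be diffed against the gold file
            # with an external tool after the test run.
            self.fail("%s differs from gold file %s" % (output_path, gold_path))
        # Contents match: the output file is no longer needed.
        os.remove(output_path)


class ExampleGoldTests(GoldFileTestCase):
    def test_output_matches_gold(self):
        workdir = tempfile.mkdtemp()
        gold_path = os.path.join(workdir, "expected.txt")
        output_path = os.path.join(workdir, "actual.txt")
        # In a real suite the gold file would be checked in next to the tests
        # and the output file would be produced by the function under test.
        for path in (gold_path, output_path):
            with open(path, "w") as f:
                f.write("Mary had a little lamb.\n")
        self.assertMatchesGold(output_path, gold_path)
        self.assertFalse(os.path.exists(output_path))
```

Because the output file is only removed on success, whatever is left in the output directory after a run is exactly the set of failures to inspect.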

If you really want to provide your own diff output, remember that the Python stdlib has the difflib module. The unittest module in Python 3.1 (and 2.7) adds an assertMultiLineEqual method that uses it to show diffs; its implementation is similar to this:

    def assertMultiLineEqual(self, first, second, msg=None):
        """Assert that two multi-line strings are equal.

        If they aren't, show a nice diff.

        """
        self.assertTrue(isinstance(first, str),
                'First argument is not a string')
        self.assertTrue(isinstance(second, str),
                'Second argument is not a string')

        if first != second:
            message = ''.join(difflib.ndiff(first.splitlines(True),
                                                second.splitlines(True)))
            if msg:
                message += " : " + msg
            self.fail("Multi-line strings are unequal:\n" + message)
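In practice you rarely need to call it yourself: in Python 3, `assertEqual` delegates to `assertMultiLineEqual` when both arguments are `str`. The sketch below reads the written file back and compares it that way; `write_report` and `ReportTests` are hypothetical names. Setting `maxDiff = None` stops unittest from truncating long diffs (note that current CPython skips the diff entirely for very long strings).

```python
import os
import tempfile
import unittest


def write_report(path):
    # Hypothetical function under test: writes a few lines to `path`.
    with open(path, "w") as f:
        f.write("alpha\nbeta\ngamma\n")


class ReportTests(unittest.TestCase):
    # Show the complete diff instead of truncating long failure messages.
    maxDiff = None

    def test_report_matches_expected(self):
        expected = "alpha\nbeta\ngamma\n"
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "report.txt")
            write_report(path)
            with open(path) as f:
                actual = f.read()
        # On failure the message includes an ndiff-style, line-by-line diff.
        self.assertMultiLineEqual(actual, expected)
```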
Ned Batchelder
A:

You could separate the content generation from the file handling. That way, you can test that the content is correct without having to mess around with temporary files and cleaning them up afterward.

If you write a generator method that yields each line of content, then you can have a file handling method that opens a file and calls file.writelines() with the sequence of lines. The two methods could even be on the same class: test code would call the generator, and production code would call the file handler.
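A minimal sketch of that split, assuming line-oriented text output; `report_lines`, `write_report` and the test class are hypothetical names. The generator yields complete lines (newline included) so they can go straight to `writelines()`.

```python
import unittest


def report_lines(items):
    # Hypothetical content generator: yields one line of output per item,
    # newline included, so the lines can be passed straight to writelines().
    for name, count in items:
        yield "%s: %d\n" % (name, count)


def write_report(path, items):
    # Thin file-handling wrapper: the only code that touches the disk.
    with open(path, "w") as f:
        f.writelines(report_lines(items))


class ReportLinesTests(unittest.TestCase):
    def test_report_lines(self):
        # Test the generator directly; no temporary files to create or clean up.
        lines = list(report_lines([("apples", 3), ("pears", 5)]))
        self.assertEqual(lines, ["apples: 3\n", "pears: 5\n"])
```

Only `write_report` is left untested by this, and it is small enough that a single integration test (or none) covers it.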

Don Kirkby
A:

I prefer to have output functions explicitly accept a file handle (or file-like object) rather than a file name that they open themselves. That way, in my unit test I can pass a StringIO.StringIO (or, more usually, a cStringIO.StringIO) object to the output function, then .read() the contents back from that StringIO object (after a .seek(0) call) and compare them with my expected output.

For example:

    # File: lamb.py
    import sys

    def write_lamb(filename):
        with open(filename, 'w') as outfile:
            outfile.write("Mary had a little lamb.\n")

    if __name__ == '__main__':
        write_lamb(sys.argv[1])


    # File: test_lamb.py
    import os
    import tempfile
    import unittest

    import lamb

    class LambTests(unittest.TestCase):
        def test_lamb_output(self):
            # mkstemp() returns an open OS-level handle plus the path;
            # close the handle so that only the path is used.
            fd, tempfile_path = tempfile.mkstemp()
            os.close(fd)
            lamb.write_lamb(tempfile_path)
            expected = "Mary had a little lamb.\n"
            with open(tempfile_path) as f:
                result = f.read()
            try:
                # NOTE: You could replace this with a string-comparison
                # method like assertMultiLineEqual
                self.assertEqual(result, expected)
            finally:
                # NOTE: To retain the tempfile if the test fails, remove
                # the try-finally clause.
                os.remove(tempfile_path)

becomes this:

    # File: lamb2.py
    import sys

    def write_lamb(outfileh):
        outfileh.write("Mary had a little lamb.\n")

    if __name__ == '__main__':
        # Open for writing; the caller, not write_lamb, owns the file.
        with open(sys.argv[1], 'w') as outfile:
            write_lamb(outfile)


    # File: test_lamb2.py
    import unittest
    import cStringIO  # Python 2 only; use io.StringIO in Python 3

    import lamb2

    class LambTests(unittest.TestCase):
        def test_lamb_output(self):
            outfile = cStringIO.StringIO()
            # NOTE: Alternatively, for Python 2.6+, you can use
            # tempfile.SpooledTemporaryFile (after "import tempfile"), e.g.,
            # outfile = tempfile.SpooledTemporaryFile(10 ** 9)
            lamb2.write_lamb(outfile)
            expected = "Mary had a little lamb.\n"
            outfile.seek(0)
            result = outfile.read()
            self.assertEqual(result, expected)

This approach has the added benefit of making your output function more flexible: because it accepts any file-like object, you can later decide to write to some other buffer instead of a file without changing the function.
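For Python 3, where cStringIO no longer exists, a sketch of the same test with `io.StringIO` might look like this. `write_lamb` is reproduced from lamb2.py so the block runs on its own; using `getvalue()` instead of `seek(0)` plus `read()` is a small swap on my part, not something from the original answer.

```python
import io
import unittest


def write_lamb(outfileh):
    # Same as lamb2.write_lamb: write to whatever file-like object is passed in.
    outfileh.write("Mary had a little lamb.\n")


class LambTests(unittest.TestCase):
    def test_lamb_output(self):
        outfile = io.StringIO()
        write_lamb(outfile)
        # getvalue() returns everything written so far, regardless of the
        # current position, so no seek(0)/read() is needed.
        self.assertEqual(outfile.getvalue(), "Mary had a little lamb.\n")
```

In production code the same function works unchanged with `sys.stdout` or with a file opened in text write mode.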

Note that using StringIO assumes the contents of the test output can fit into main memory. For very large output, use a temporary file approach (e.g., tempfile.SpooledTemporaryFile).
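A sketch of the SpooledTemporaryFile variant in Python 3, assuming text output; the 1 MiB `max_size` is an arbitrary illustrative threshold, and `write_lamb` is again copied from lamb2.py. The test code is the same as with StringIO, but large output spills to disk instead of exhausting memory.

```python
import tempfile
import unittest


def write_lamb(outfileh):
    outfileh.write("Mary had a little lamb.\n")


class LambSpooledTests(unittest.TestCase):
    def test_lamb_output(self):
        # Data stays in memory until it grows past max_size, after which it is
        # moved to a real temporary file on disk. mode="w+" opens it in text
        # mode (the default, "w+b", would require bytes).
        with tempfile.SpooledTemporaryFile(max_size=1024 * 1024, mode="w+") as outfile:
            write_lamb(outfile)
            outfile.seek(0)
            self.assertEqual(outfile.read(), "Mary had a little lamb.\n")
```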

gotgenes
A: 

Based on these suggestions, I did the following:

    import difflib
    import unittest

    class MyTestCase(unittest.TestCase):
        def assertFilesEqual(self, first, second, msg=None):
            # "with" closes both files even if reading fails.
            with open(first) as first_f:
                first_str = first_f.read()
            with open(second) as second_f:
                second_str = second_f.read()

            if first_str != second_str:
                first_lines = first_str.splitlines(True)
                second_lines = second_str.splitlines(True)
                delta = difflib.unified_diff(first_lines, second_lines,
                                             fromfile=first, tofile=second)
                message = ''.join(delta)

                if msg:
                    message += " : " + msg

                self.fail("Files are unequal:\n" + message)

I created the MyTestCase subclass because I have lots of functions that read and write files, so I need a reusable assert method. Now my tests subclass MyTestCase instead of unittest.TestCase.
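One possible variant of this method, sketched under the assumption that most compared files are identical: check the raw bytes first with the stdlib `filecmp.cmp(..., shallow=False)` and only build the unified diff on a mismatch. `FileTestCase` and `ExampleTests` are hypothetical names. `filecmp` caches results keyed on `os.stat()` data, so the sketch calls `filecmp.clear_cache()` (Python 3.4+) in case a file was rewritten within the same timestamp, and it opens files with `newline=""` so that line-ending differences still show up in the diff.

```python
import difflib
import filecmp
import os
import tempfile
import unittest


class FileTestCase(unittest.TestCase):
    """Hypothetical base class with a reusable file-equality assertion."""

    def assertFilesEqual(self, first, second, msg=None):
        # Fast path: byte-for-byte content comparison (shallow=False means the
        # contents are read rather than trusting matching stat signatures).
        filecmp.clear_cache()
        if filecmp.cmp(first, second, shallow=False):
            return
        # Slow path: build a unified diff. newline="" keeps "\r\n" vs "\n"
        # differences visible instead of normalizing them away.
        with open(first, newline="") as f:
            first_lines = f.readlines()
        with open(second, newline="") as f:
            second_lines = f.readlines()
        message = "".join(difflib.unified_diff(
            first_lines, second_lines, fromfile=first, tofile=second))
        if msg:
            message += " : " + msg
        self.fail("Files are unequal:\n" + message)


class ExampleTests(FileTestCase):
    def _write(self, directory, name, text):
        path = os.path.join(directory, name)
        with open(path, "w", newline="") as f:
            f.write(text)
        return path

    def test_equal_and_unequal_files(self):
        with tempfile.TemporaryDirectory() as d:
            a = self._write(d, "a.txt", "one\ntwo\n")
            b = self._write(d, "b.txt", "one\ntwo\n")
            c = self._write(d, "c.txt", "one\nthree\n")
            self.assertFilesEqual(a, b)
            with self.assertRaises(AssertionError):
                self.assertFilesEqual(a, c)
```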

What do you think about it?

jan