views:

2443

answers:

7

I'm writing a JUnit test for some code that produces an Excel file (which is binary). I have another Excel file that contains my expected output. What's the easiest way to compare the actual file to the expected file?

Sure I could write the code myself, but I was wondering if there's an existing method in a trusted third-party library (e.g. Spring or Apache Commons) that already does this.

A: 

Maybe... compare MD5 digests of each file? I'm sure there are a lot of ways to do it. You could just open both files and compare each byte.

EDIT: James stated how the XLS format might have differences in the metadata. Perhaps you should use the same interface you used to generate the xls files to open them and compare the values from cell to cell?

CookieOfFortune
This is a good idea, except you do have to consider if the files will be identical (i.e. that there's not any metadata involved, such as in the XLSX format, for example. This differing metadata will, obviously, produce different hashes).
James Burgess
+1  A: 

You could use javaxdelta to check whether the two files are the same. It's available from here:

http://javaxdelta.sourceforge.net/

Jon
I eventually got javaxdelta working after mucking around with its dependency on the "trove" library, but although it works as advertised, sleske is right that I need a canonical comparison, not a byte-by-byte comparison. Thanks anyway for the suggestion, which I've voted up.
Andrew Swan
A: 

Just found out there's something in commons-io's FileUtils. Thanks for the other answers.

Andrew Swan
This actually doesn't solve my problem, as there seem to be differences between the Excel files that aren't due to geniune content differences. I'll try sleske's suggestion of parsing the files' contents and doing a canonical comparison.
Andrew Swan
+4  A: 

A simple file comparison can easily be done using some checksumming (like MD5) or just reading both files.

However, as Excel files contain loads of metadata, the files will probably never be identical byte-for-byte, as James Burgess pointed out. So you'll need another kind of comparison for your test.

I'd recommend somehow generating a "canonical" form from the Excel file, i.e. reading the generated Excel file and converting it to a simpler format (CSV or something similar), which will only retain the information you want to check. Then you can use the "canonical form" to compare with your expected result (also in canonical form, of course).

Apache POI might be useful for reading the file.

BTW: Reading a whole file to check its correctnes would generally not be considere a Unit test. That's an integration test...

sleske
You're right, I was using the term unit test loosely; in fact it's an integration test run by JUnit. I'll fix that now.
Andrew Swan
Thanks for putting me on the right track; see my DBUnit solution below (or above, depending on votes!)
Andrew Swan
A: 

Please, take a look at the site to compare the binary files, http://www.velocityreviews.com/forums/t123770-re-java-code-for-determining-binary-file-equality.html

Tiger

Tiger
Two issues: that code is just doing a byte-for-byte comparison, which as sleske points out, isn't ideal where Excel is concerned. Also, I was after a third-party utility method, not a block of code on some forum that may or may not work. I'll update the question to make this clear.
Andrew Swan
+2  A: 

Here's what I ended up doing (with the heavy lifting being done by DBUnit):

/**
 * Compares the data in the two Excel files represented by the given input
 * streams, closing them on completion
 * 
 * @param expected can't be <code>null</code>
 * @param actual can't be <code>null</code>
 * @throws Exception
 */
private void compareExcelFiles(InputStream expected, InputStream actual)
  throws Exception
{
  try {
    Assertion.assertEquals(new XlsDataSet(expected), new XlsDataSet(actual));
  }
  finally {
    IOUtils.closeQuietly(expected);
    IOUtils.closeQuietly(actual);
  }
}

This compares the data in the two files, with no risk of false negatives from any irrelevant metadata that might be different. Hope this helps someone.

Andrew Swan
Hey using XlsDataSet from DBUnit is a really clever idea; didn't think of that :-).
sleske
A: 

You may use Beyond Compare 3 which can be started from command-line and supports different ways to compare Excel files, including:

  • Comparing Excel sheets as database tables
  • Checking all textual content
  • Checking textual content with some formating
Wernight