views:

7726

answers:

16

We have a lot of spreadsheets (xls) in our subversion repository. These are usually edited with gnumeric or openoffice.org, and are mostly used to populate databases for unit testing with dbUnit. There are no easy ways of doing diffs on xls files that I know of, and this makes merging extremely tedious and error prone.

I've found Spreadsheet Compare, but it requires Excel 2000 or later. I've also tried to convert the spreadsheets to xml and doing a regular diff, but it really feels as a last resort.

Are there any tools for diffing two spreadsheets (xls or ods)? I am primarily looking for a multi-platform/open source tool.

A: 

Diff Doc may be what you're looking for.

  • Compare documents of MS Word (DOC, DOCX etc), Excel, PDF, Rich Text (RTF), Text, HTML, XML, PowerPoint, or Wordperfect and retain formatting
  • Choose any portion of any document (file) and compare it against any portion of the same or different document (file).
ConroyP
Diff Doc is windows-only and closed source, it really doesn't fit my needs.
neu242
A: 

I don't know of any tools, but there are two roll-your-own solutions that come to mind, both require Excel:

  1. You could write some VBA code that steps through each Worksheet, Row, Column and Cell of the two Workbooks, reporting differences.

  2. If you use Excel 2007, you could save the Workbooks as Open-XML (*.xlsx) format, extract the XML and diff that. The Open-XML file is essentially just a .zip file of .xml files and manifests.

You'll end up with a lot of "noise" in either case if your spreadsheets aren't structurally "close" to begin with.

Wayne
From Excel 2002 onwards you can also save in 'XML Spreadsheet' format which is simpler than dealing with xlsx files.
Sam Warwick
+6  A: 

If you don't want to shell out for the Diff Doc program and your spreadsheets are not very complicated, you could export them both as CSV and use a tool like Beyond Compare

Shabbyrobe
+2  A: 

I haven't tried but I know OpenOffice allows you to compare 2 documents. See the post of Kurt Stam for further details.

What do you think ?

paulgreg
+2  A: 

Do you use TortoiseSVN for doing your commits and updates in subversion? It has a diff tool, however comparing Excel files is still not really user friendly. In my environment (Win XP, Office 2007), it opens up two excel files for side by side comparison.

Right click document > Tortoise SVN > Show Log > select revision > right click for "Compare with working copy".

Casper
+1  A: 

I would use the SYLK file format if performing diffs is important. It is a text-based format, which should make the comparisons easier and more compact than a binary format. It is compatible with Excel, Gnumeric, and OpenOffice.org as well, so all three tools should be able to work well together. SYLK Wikipedia Article

Adam Hawkes
+4  A: 

Quick and easy with no external tools, works well as long as the two sheets you are comparing are similar:

  • Create a third spreadsheet
  • Type =if(Sheet1!A1 <> Sheet2!A1, "X", "") in the top left cell (or equivalent: click on the actual cells to automatically have the references inserted into the formula)
  • Ctrl+C (copy), Ctrl+A (select all), Ctrl+V (paste) to fill the sheet.

If the sheets are similar, this spreadsheet will be empty except for a few cells with X in them, highlighting the differences. Unzoom to 40% to quickly see what is different.

Laurent
I searched stackoverflow for a solution to this, I use mostly numeric data so I do a subtraction to see the differance. And conditional formatting to set a red-color for reduced and blue for increased. What I'm looking for is an easy way to deal with non-numeric codes mixed with the numbers.
tovare
+5  A: 

I've done a lot of comparing of Excel workbooks in the past. My technique works very well for workbooks with many worksheets, but it only compares cell contents, not cell formatting, macros, etc. Also, there's some coding involved but it's well worth it if you have to compare a lot of large files repeatedly. Here's how it works:

A) Write a simple dump program that steps through all worksheets and saves all data to tab-separated files. Create one file per worksheet (use the worksheet name as the filename, e.g. "MyWorksheet.tsv" [4]), and create a new folder for these files each time you run the program. Name the folder after the excel filename and add a timestamp, e.g. "20080922-065412-MyExcelFile". I did this in Java using a library called JExcelAPI [1]. It's really quite easy.

B) Add a Windows shell extension to run your new Java program from step A when right-clicking on an Excel file. This makes it very easy to run this program. You need to Google how to do this, but it's as easy as writing a *.reg file.

C) Get BeyondCompare [2]. It has a very cool feature to compare delimited data by showing it in a nice table, see screenshot at [3].

D) You're now ready to compare Excel files with ease. Right-click on Excel file 1 and run your dump program. It will create a folder with one file per worksheet. Right-click on Excel file 2 and run your dump program. It will create a second folder with one file per worksheet. Now use BeyondCompare (BC) to compare the folders. Each file represents a worksheet, so if there are differences in a worksheet BC will show this and you can drill down and do a file comparison. BC will show the comparison in a nice table layout, and you can hide rows and columns you're not interested in.

Footnotes:

[1] http://jexcelapi.sourceforge.net/

[2] http://www.scootersoftware.com/

[3] http://www.scootersoftware.com/moreinfo.php?zz=screenshot&amp;shot=DataCompare

[4] I name tab-separated files *.tsv

thvo
+2  A: 

Hmmm. From the Excel menu choose Window -> Compare side by side?

A: 

I found an openoffice macro here that will invoke openoffice's compare documents function on two files. Unfortunately, openoffice's spreadsheet compare seems a little flaky; I just had the 'Reject All' button insert a superfluous column in my document.

A: 

Shabbyrobe has the most straight forward approach. Dump both spreadsheets to CSV and file diff them with BeyondCompare or the like.

David Robbins
A: 

I have found xdocdiff for WinMerge(http://freemind.s57.xrea.com/xdocdiffPlugin/en/index.html). It is a plugin for WinMerge (both OPENSOURCE and freeware, you doesn't need to write a VBA nor save an excel to csv or xml). It works just for the celd's contains.

This plug in support also: .rtf Rich Text

.docx/.docm Microsoft WORD 2007(OOXML)

.xlsx/.xlsm Microsoft Excel 2007(OOXML)

.pptx/.pptm Microsoft PowerPoint 2007(OOXML)

.doc Microsoft WORD ver5.0/95/97/2000/XP/2003

.xls Microsoft Excel ver5.0/95/97/2000/XP/2003

.ppt Microsoft PowerPoint 97/2000/XP/2003

.sxw/.sxc/.sxi/.sxd OpenOffice.org

.odt/.ods/.odp/.odg Open Document

.wj2/wj3/wk3/wk4/123 Lotus 123

.wri Windows3.1 Write

.pdf Adobe PDF

.mht Web Archive

.eml Exported files from OutlookExpress

Regard, Andres

Andres
A: 

Convert to cvs then upload to a version control system then diff with an advanced version control diff tool. When I used perforce it had a great diff tool, but I forget the name of it.

iterationx
A: 

xdocdiff plugin for SVN

graveDust
xdocdiff looks nice, but it seems to require TortioseSVN
neu242
How about good old 'diff' command plus xdoc2txt.
graveDust
A: 

I know several responses have suggested exporting the file to csv or some other text format, and then comparing them. I haven't seen it mentioned specifically, but Beyond Compare 3 has a number of additional file formats that it supports. See http://www.scootersoftware.com/support.php?zz=kb_moreformats. Using one of the Microsoft Excel File Formats you can easily compare two Excel files without going through the export to another format option.

mattsmith321
A: 

4TOPS Compare spreadsheets works both with Excel and OpenOffice. You need to have Excel installed for it to work, though.