I'm working on a tool to generate TSV files for import into a database using bcp.exe, and I'd like to validate my output. I can do this by comparing the files I generate against files exported with bcp from an existing database. My problem is that the line order can sometimes differ between the files. I'd like a tool that tells me only whether there are lines with no exact match in the other file of a pair, irregardless of the order of the lines.

+2  A: 

'Irregardless' of whether 'irregardless' is a word...

The reliable way to do that comparison is to sort the two files into the same order, and then do a file comparison. Since you mention 'bcp.exe', that sounds more like Windows and probably MS SQL Server than Unix and Sybase.
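If no Unix-style tools are to hand, the sort-then-compare idea can be sketched in a few lines of Python. This is only an illustration, not a replacement for sort and comm: the function names `read_sorted` and `compare_sorted` are made up for the example, and the UTF-8 encoding is an assumption, since the encoding of a bcp export depends on the options used to produce it.

```python
def read_sorted(path, encoding="utf-8"):
    """Read a file and return its lines, sorted, without line terminators.

    The encoding default is an assumption; adjust it to match the export.
    """
    with open(path, encoding=encoding, newline="") as f:
        return sorted(line.rstrip("\r\n") for line in f)


def compare_sorted(a, b):
    """Walk two sorted lists in step (much as comm does).

    Returns (only_a, only_b): lines with no exact match in the other list.
    Duplicates count, so a line that appears twice in a and once in b
    is reported once in only_a.
    """
    only_a, only_b = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
        elif a[i] < b[j]:
            only_a.append(a[i])
            i += 1
        else:
            only_b.append(b[j])
            j += 1
    only_a.extend(a[i:])  # whatever is left has no partner
    only_b.extend(b[j:])
    return only_a, only_b
```

Calling `compare_sorted(read_sorted("mine.tsv"), read_sorted("bcp.tsv"))` (both file names are placeholders) gives two lists; if both are empty, the files hold the same lines in some order.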

I'd probably use Cygwin, or any equivalent Unix workalike toolset (MKS, ...), with sort to order the files and either diff or comm to compare them. Other people might recommend other tools. The choice depends, in part, on how many differences you expect to find normally, and how you will handle them once you find them. Is GUI output necessary? Also, you face a problem tracking the differences back to specific line numbers in the unsorted data files.
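One possible way around the line-number problem, if a small script is acceptable, is to skip the sort and record where each line occurs in the original files. The Python below is a rough sketch of that idea, not a polished tool; `index_lines` and `unmatched_with_line_numbers` are names invented for the example, and it assumes both files fit in memory.

```python
from collections import defaultdict


def index_lines(lines):
    """Map each distinct line to the 1-based line numbers where it occurs."""
    positions = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        positions[line].append(number)
    return positions


def unmatched_with_line_numbers(lines_a, lines_b):
    """Return (only_a, only_b) as sorted lists of (line_number, line).

    Occurrences are paired off in order of appearance; any surplus
    occurrences on one side are reported with their original line numbers.
    """
    pos_a, pos_b = index_lines(lines_a), index_lines(lines_b)
    only_a, only_b = [], []
    for line in pos_a.keys() | pos_b.keys():
        nums_a = pos_a.get(line, [])
        nums_b = pos_b.get(line, [])
        shared = min(len(nums_a), len(nums_b))
        only_a.extend((n, line) for n in nums_a[shared:])
        only_b.extend((n, line) for n in nums_b[shared:])
    return sorted(only_a), sorted(only_b)
```

Each list of lines would come from reading a file and stripping the line terminators; the result points straight at the offending rows in the unsorted data.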

Jonathan Leffler