ansaurus

Question

How to find difference in csv record

Answer 1

+1 A:

there is Algorith::Diff

dsm 2009-05-29 11:25:53

Answer 2

A:

You can use hashes in Perl. Read each file into a separate hash, something like

my %File1 = ();
my %File2 = ();
# Filehandles FP1 and FP2 is opened for read
while (<FP1>) {
    if (/^([^,]+),(.+)$/) {
        my ($key, $value) = ($1, $2);
        $File1{$key} = $value;
    }
}
# Repeat for FP2

To print out the results, you can loop through the hashes and check if the key/value is identical, different or missing in various ways. Example:

for my $key (keys %File1) {
    if (defined($File1{$key}) && defined($File2{$key}) {
        print("$key exists in both files\n");
    } elsif (defined($File1{$key})) {
        print("$key exists only in file1\n");
    }
}
# Repeat for %File2

sunny256 2009-05-29 11:26:42

Answer 3

+1 A:

You can use the unix 'join' command to do this. It is available in Cygwin for Windows as well.

Example:

$ join -t ',' -v 1 file1 file2
key2,value2
key7,value7
$ join -t ',' -v 2 file1 file2
key4,value4
key6,value6
$ join -t ',' file1 file2
key1,value1,value1
key3,value3,value3
key5,value5,value5

JimG 2009-05-29 11:27:56

There is also `comm`.

Svante 2009-05-29 11:50:25

Both will break when you have a quoted comma in your CSV though.

yangyang 2009-05-29 11:53:48

Answer 4

A:

You could take a look at my FOSS CSV stream editor CSVfix, which does what you want via the join command - no programming needed.

anon 2009-05-29 11:32:15

Answer 5

+4 A:

diff can do what you want..

diff file1.csv file2.csv --old-line-format="< %L" --new-line-format="> %L" --unchanged-line-format="= %L"

Anon 2009-05-29 11:39:24

I think this is really something I could use, in one command I got all the differences.

Hendra Saputra 2009-05-30 06:51:25

I would add that you should sort both of the CSV files first using the sort command. I would also add that is not 100% reliable if your CSV has any multi-line rows in which case you would need to parse the file properly with Text::xSV, record the keys into hashes and compare them with List::Util and/or List::MoreUtil.

jiggy 2009-06-01 15:35:19

Answer 6

A:

How about an example using SQLite?

DROP TABLE 'file1';
DROP TABLE 'file2';

CREATE TABLE 'file1' (
    key_field VARCHAR primary key,
    value_field VARCHAR
);

CREATE TABLE 'file2' (
    key_field VARCHAR primary key,
    value field VARCHAR
);


.bail off
.separator ,
.import file1.csv file1
.import file2.csv file2

.output stdout
.header on

SELECT col1 AS 'In file1.csv, not in file2.csv' FROM (
    SELECT file1.key_field AS col1,
           file2.key_field AS col2 
    FROM file1 LEFT OUTER JOIN file2
    ON file1.key_field == file2.key_field
) 
WHERE col2 IS NULL
;

SELECT col2 AS 'In file2.csv, not in file1.csv'FROM (
    SELECT file1.key_field AS col1,
           file2.key_field AS col2
    FROM file2 LEFT OUTER JOIN file1
    ON file2.key_field == file1.key_field
) WHERE col1 IS NULL
;

SELECT file1.key_field AS 'In both file1.csv and file2.csv'
    FROM file1 INNER JOIN file2
    WHERE file1.key_field == file2.key_field
;

Here is the output:

C:\Temp> sqlite3 test.db < t.sql
In file1.csv, not in file2.csv
key2
key7
In file2.csv, not in file1.csv
key4
key6
In both file1.csv and file2.csv
key1
key3
key5

Sinan Ünür 2009-05-29 12:32:56

Answer 7

A:

take a look at http://sourceforge.net/projects/csvdiff/

Quote: csvdiff is a Perl script to diff/compare two csv files with the possibility to select the separator. Differences will be shown like: "Column XYZ in record 999" is different. After this, the actual and the expected result for this column will be shown.

max muster 2010-07-23 12:05:41

ansaurus

tags:

views:

answers:

How to find difference in csv record

related questions