ansaurus

Question

Comparing records in file and reporting stats - Scenario 2

Answer 1

+1 A:

Why not import all the data to a SQLite database. You need only one table with a single primary key corresponding to the unique identifier common to both systems. Columns should be the union of legacy and new fields.

Import one data set first, say the set generated by the new system. Then, for every item in the legacy set, try UPDATE on the corresponding entry in the table: If UPDATE fails, you know that the new data set is missing those entries that used to exist in the old system.

If any of the columns corresponding to the legacy data have NULLs, then you know the entries in the new system that did not exist in the legacy system.

You can then SELECT rows where any column from the new system does not match the corresponding column from the old system.

IMHO, this more flexible than a hash table based system.

Sinan Ünür 2009-05-20 23:01:03

Hmm.. but would not the on disk hash actually do the very same thing? I am yet to look deeply into it, but it seems I can tell the PERL lib to use any of the light weight DB like Berkley DB, SQL Lite etc...

PoorLuzer 2009-05-20 23:05:14

Well, using a database that understands SQL means you can run various queries without having to modify the program. I would only use Perl (or any other text processing facility) to prepare the data files for import into the database. By the way, Berkeley and SQLite are very different beasts.

Sinan Ünür 2009-05-21 00:36:15

+1! Excellent explanation :-)

PoorLuzer 2009-05-21 03:18:09

Thank you. I have benefited from this approach in the past.

Sinan Ünür 2009-05-21 03:25:43

After a lot of thought and reading, I agree with Sinan. Great analysis!

PoorLuzer 2009-05-23 12:15:24

ansaurus

tags:

views:

answers:

Comparing records in file and reporting stats - Scenario 2

related questions