Hello,
The situation I'm facing is as follows:
There are a large number of 'flat' files from which data is extracted by a C# app in order to create entries, which are in turn written to a database (MS SQL Server). A full release of the database comprises ~97 million entries across 220 GB.
The task is to create a differential update of the data in the database by parsing a new full release and finding out which of the existing entries have been updated. An entry is considered updated if any of its properties has changed.
[UPDATE] Each entry has a unique ID.
The problem is that the data provider does not supply any indication of entry modification (such as a version number or a last modification date), only full releases.
The solution I've come up with so far is to generate a hash sum for each entry and then compare the new hash to the old one.
The other aspect of the issue that makes hash sums undesirable is the combination of the size of the data and the number of entries, which is just staggering.
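To make the hash idea concrete, here is a minimal sketch of what I have in mind. It is written in Python for brevity rather than the C# of the actual app. The names `entry_hash` and `diff_release`, the dictionary layout of an entry, and the `"id"` key are all made up for illustration. It also assumes the stored hashes can be looked up by ID, and that converting each property to a string is a good enough canonical form.

```python
import hashlib


def entry_hash(entry: dict) -> bytes:
    """Hash every property of an entry in a fixed (sorted) key order."""
    h = hashlib.sha256()
    for key in sorted(entry):
        # Length-prefix each piece so ("ab", "c") and ("a", "bc") hash differently.
        # Assumption: str() gives a stable text form for every property value.
        for part in (key, str(entry[key])):
            data = part.encode("utf-8")
            h.update(len(data).to_bytes(4, "big"))
            h.update(data)
    return h.digest()


def diff_release(old_hashes: dict, new_entries):
    """Compare a new full release against the stored {id: hash} mapping."""
    added, updated, seen = [], [], set()
    for entry in new_entries:
        entry_id = entry["id"]  # hypothetical key holding the unique ID
        seen.add(entry_id)
        digest = entry_hash(entry)
        if entry_id not in old_hashes:
            added.append(entry_id)
        elif old_hashes[entry_id] != digest:
            updated.append(entry_id)
    # IDs present in the old release but missing from the new one.
    removed = [i for i in old_hashes if i not in seen]
    return added, updated, removed
```

With this approach only the ID and a 32-byte SHA-256 digest need to be kept per entry, which for ~97 million entries would be roughly 3.1 GB of digests before any index overhead. Every entry of the new release still has to be parsed and hashed in full, though.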
So, is there a better solution than this?
Any help with the case will be greatly appreciated!
All the best, Borislav