For each day we have approximately 50,000 instances of a data structure (this could eventually grow to be much larger) that encapsulate the following:
DateTime AsOfDate;
int key;
List<int> values; // list of distinct integers
This is probably not relevant, but values is a list of distinct integers with the property that, for a given value of AsOfDate, the union of values over all values of key produces a list of distinct integers. That is, no integer appears in two different values lists on the same day.
The lists usually contain very few elements (between one and five), but are sometimes as long as fifty elements.
Given adjacent days, we are trying to find instances of these objects for which the values of key on the two days are different, but the values lists contain the same integers.
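To make the matching criterion concrete, here is a minimal sketch of the test for a single pair of objects, one from each day. The class and method names (MatchCriterion, IsRekeyed) are made up for illustration, and the count shortcut relies on each list holding distinct integers, as described above.

```csharp
using System.Collections.Generic;

public static class MatchCriterion
{
    // True when the keys differ but the two lists hold the same integers,
    // regardless of order. Assumes neither list contains duplicates.
    public static bool IsRekeyed(int keyA, IList<int> valuesA, int keyB, IList<int> valuesB)
    {
        if (keyA == keyB || valuesA.Count != valuesB.Count)
        {
            return false;
        }
        // Set comparison ignores ordering; equal counts plus distinct
        // elements make this an exact equality check.
        return new HashSet<int>(valuesA).SetEquals(valuesB);
    }
}
```

Applying this to every pair across two days would be quadratic in the number of objects, which is why we want something faster.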
We are using the following algorithm. Convert the list values to a string via
string signature = String.Join("|", values.OrderBy(n => n).ToArray());
then hash signature to an integer, order the resulting lists of hash codes (one list for each day), walk through the two lists looking for matches, and then check whether the associated keys differ. (We also check the associated lists to make sure that we didn't have a hash collision.)
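For reference, the steps above can be sketched roughly as follows. This is a simplified illustration rather than our production code: the Entry class and the names SignatureMatcher, FindRekeyed and HashAndSort are placeholders, string.GetHashCode() stands in for whatever hash is used, and the collision check compares the signature strings instead of the lists themselves (equivalent here, since the signature is built from the sorted values).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder container for the three fields listed at the top.
public sealed class Entry
{
    public DateTime AsOfDate;
    public int Key;
    public List<int> Values = new List<int>();
}

public static class SignatureMatcher
{
    private sealed class Hashed
    {
        public Entry Source;
        public string Sig;
        public int Hash;
    }

    // Canonical string form of a list: sorted values joined by '|'.
    public static string Signature(IEnumerable<int> values)
    {
        return string.Join("|", values.OrderBy(n => n).Select(n => n.ToString()).ToArray());
    }

    // Compute signature and hash for each entry of one day, ordered by hash.
    private static List<Hashed> HashAndSort(IEnumerable<Entry> day)
    {
        var list = day.Select(e =>
        {
            string sig = Signature(e.Values);
            return new Hashed { Source = e, Sig = sig, Hash = sig.GetHashCode() };
        }).ToList();
        list.Sort((x, y) => x.Hash.CompareTo(y.Hash));
        return list;
    }

    // Returns (previous, current) pairs whose keys differ but whose values match.
    public static List<Tuple<Entry, Entry>> FindRekeyed(
        IEnumerable<Entry> previousDay, IEnumerable<Entry> currentDay)
    {
        List<Hashed> prev = HashAndSort(previousDay);
        List<Hashed> curr = HashAndSort(currentDay);
        var result = new List<Tuple<Entry, Entry>>();
        int i = 0, j = 0;

        // Walk both hash-ordered lists in step, as in a merge join.
        while (i < prev.Count && j < curr.Count)
        {
            int cmp = prev[i].Hash.CompareTo(curr[j].Hash);
            if (cmp < 0) { i++; continue; }
            if (cmp > 0) { j++; continue; }

            // Equal hashes: locate the run of this hash on each side.
            int hash = prev[i].Hash;
            int iEnd = i;
            while (iEnd < prev.Count && prev[iEnd].Hash == hash) iEnd++;
            int jEnd = j;
            while (jEnd < curr.Count && curr[jEnd].Hash == hash) jEnd++;

            for (int a = i; a < iEnd; a++)
            {
                for (int b = j; b < jEnd; b++)
                {
                    // Keys must differ; comparing signatures rules out hash collisions.
                    if (prev[a].Source.Key != curr[b].Source.Key && prev[a].Sig == curr[b].Sig)
                    {
                        result.Add(Tuple.Create(prev[a].Source, curr[b].Source));
                    }
                }
            }
            i = iEnd;
            j = jEnd;
        }
        return result;
    }
}
```

Calling SignatureMatcher.FindRekeyed(yesterday, today) with the two days' objects returns the pairs we are after. The cost is dominated by the two sorts, so roughly O(n log n) per pair of days.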
Is there a better method?