I'm not sure this is worth attempting to optimize very far. You could implement some kind of binary search with this, but its effectiveness would be rather limited. How many elements are we talking about?
Without unmatched elements in target

Assuming the lists are sorted, and no elements in target exist that cannot be matched against source:
    static List<string>[] FindMatches(string[] source, string[] target)
    {
        // Initialize the result array: one list of matches per source element
        List<string>[] matches = new List<string>[source.Length];
        for (int i = 0; i < matches.Length; i++)
            matches[i] = new List<string>();

        int s = 0;
        for (int t = 0; t < target.Length; t++)
        {
            // Advance through source until the current target element matches
            while (!MatchName(source[s], target[t]))
            {
                s++;
                if (s >= source.Length)
                    return matches;
            }
            matches[s].Add(target[t]);
        }
        return matches;
    }
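MatchName here stands for whatever boolean predicate you are already using. Since it isn't shown, here is a minimal sketch, assuming a target matches when it starts with the source name followed by a space, along with a hypothetical usage example:

    // Minimal sketch of the assumed predicate; substitute your actual
    // matching rule. Here a target matches when it starts with the
    // source name followed by a space.
    static bool MatchName(string source, string target)
    {
        return target.StartsWith(source + " ", StringComparison.Ordinal);
    }

    // Hypothetical usage:
    string[] source = { "Adams", "Baker", "Clark" };
    string[] target = { "Adams J.", "Adams M.", "Baker T.", "Clark P." };
    List<string>[] matches = FindMatches(source, target);
    // matches[0] now contains "Adams J." and "Adams M.", and so on.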
With unmatched elements

If there is a possibility of elements existing in target that have no match in source, the above will break (unless the unmatched elements are at the end of target). To fix that, it would be best to go with a different implementation for the comparison. Instead of a boolean, we need it to return 'less than', 'equal' or 'greater than', like a Comparer when used in sorting:
    static List<string>[] FindMatches(string[] source, string[] target)
    {
        // Initialize the result array: one list of matches per source element
        List<string>[] matches = new List<string>[source.Length];
        for (int i = 0; i < matches.Length; i++)
            matches[i] = new List<string>();

        int s = 0;
        for (int t = 0; t < target.Length; t++)
        {
            int m = CompareName(source[s], target[t]);
            if (m == 0)
            {
                // Match found; stay on the same source element,
                // since several targets may match it
                matches[s].Add(target[t]);
            }
            else if (m > 0)
            {
                // Target sorts after the current source element:
                // advance source and retry the same target
                s++;
                if (s >= source.Length)
                    return matches;
                t--;
            }
            // m < 0: this target has no match in source; skip it
        }
        return matches;
    }
    static int CompareName(string source, string target)
    {
        // Whatever comparison you need here, this one is really basic :)
        return target[0] - source[0];
    }
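A quick hypothetical example of the difference: a target element with no counterpart in source is now silently skipped rather than derailing the loop:

    // Hypothetical data; "Bates" has no counterpart in source
    string[] source = { "Adams", "Clark" };
    string[] target = { "Adams J.", "Bates T.", "Clark P." };
    List<string>[] matches = FindMatches(source, target);
    // matches[0] == { "Adams J." }, matches[1] == { "Clark P." };
    // "Bates T." is skipped.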
The two are otherwise essentially the same. As you can see, you loop through the target elements once, advancing the index into the source array when you no longer find a match.
If the number of target elements is limited, it might be worth doing a slightly smarter search, such as a binary search into source. If the number of target elements is also large, the supposed benefit of that would decrease, since the linear merge already visits each element only once.
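For illustration, such a smarter variant could binary-search source for each target element. A sketch under the same sortedness assumptions, reusing the CompareName above:

    // Hypothetical binary-search variant: instead of walking source
    // linearly, look up each target element in O(log n). Note that if
    // several source entries compare equal, this lands on an arbitrary
    // one of them.
    static List<string>[] FindMatchesBinary(string[] source, string[] target)
    {
        List<string>[] matches = new List<string>[source.Length];
        for (int i = 0; i < matches.Length; i++)
            matches[i] = new List<string>();

        foreach (string name in target)
        {
            int lo = 0, hi = source.Length - 1;
            while (lo <= hi)
            {
                int mid = lo + (hi - lo) / 2;
                int m = CompareName(source[mid], name);
                if (m == 0)
                {
                    matches[mid].Add(name);
                    break;
                }
                if (m > 0)
                    lo = mid + 1;  // target sorts after source[mid]
                else
                    hi = mid - 1;  // target sorts before source[mid]
            }
        }
        return matches;
    }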
Then again, the first algorithm takes 0.18 seconds with 1 million target elements on my machine, in Debug mode. The second one is even faster (0.03 seconds), but that is because of the simpler comparison being done. Possibly you would have to compare everything up to the first whitespace character, making it significantly slower.
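For example, a version of CompareName that compares everything up to the first whitespace character might look like this (a sketch; adapt the delimiter to your actual data):

    // Compare the target's leading name (up to the first space) against
    // the source name; the Substring allocation is part of what makes
    // this noticeably slower than the single-character version.
    static int CompareName(string source, string target)
    {
        int ws = target.IndexOf(' ');
        string name = ws < 0 ? target : target.Substring(0, ws);
        return string.CompareOrdinal(name, source);
    }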