I need to read a large space-separated text file and count the number of instances of each code in the file. Essentially, these are the results of running some experiments hundreds of thousands of times. The system spits out a text file that looks kind of like this:
A7PS A8PN A6PP23 ...
And there are literally hundreds of thousands of these entries, and I need to count the occurrences of each of the codes.
I guess I could just open a StreamReader and go through it line by line, splitting on the space character, checking whether each code has already been encountered and adding 1 to its count. However, that is probably pretty naive, given the size of the data.
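Something like this is what I had in mind (just a rough sketch; "results.txt" is a made-up file name):

using System;
using System.Collections.Generic;
using System.IO;

class CodeCounter
{
    static void Main()
    {
        // Tally of code -> number of times it has been seen.
        var counts = new Dictionary<string, int>();

        // "results.txt" is a placeholder for the real file path.
        using (var reader = new StreamReader("results.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Split each line on spaces and count every code.
                foreach (var code in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                {
                    int count;
                    counts.TryGetValue(code, out count);
                    counts[code] = count + 1;
                }
            }
        }

        foreach (var pair in counts)
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}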
Anyone know of an efficient algorithm to handle this sort of processing?
UPDATE:
OK, so the consensus seems to be that my approach is along the right lines.
What I'd be interested to hear are things like: which is more efficient - StreamReader, TextReader or BinaryReader?
What is the best structure to store my dictionary of results - HashTable, SortedList or HybridDictionary?
If there are no line breaks in the file (I haven't been given a sample yet), will just splitting the whole thing on a space be inefficient?
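For what it's worth, this is the kind of chunked read I was picturing if the whole file turns out to be one long line - again just a sketch, and the 64K buffer size is an arbitrary guess:

// Requires: using System.Collections.Generic; using System.IO; using System.Text;
static Dictionary<string, int> CountCodesChunked(string path)
{
    var counts = new Dictionary<string, int>();
    var buffer = new char[64 * 1024];   // arbitrary chunk size
    var carry = new StringBuilder();    // holds a code that straddles two chunks

    using (var reader = new StreamReader(path))
    {
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (char.IsWhiteSpace(c))
                {
                    // End of a code - count it and reset the carry buffer.
                    if (carry.Length > 0)
                    {
                        Tally(counts, carry.ToString());
                        carry.Length = 0;
                    }
                }
                else
                {
                    carry.Append(c);
                }
            }
        }
    }

    // The file may not end with a space, so flush the last code.
    if (carry.Length > 0)
        Tally(counts, carry.ToString());

    return counts;
}

static void Tally(Dictionary<string, int> counts, string code)
{
    int count;
    counts.TryGetValue(code, out count);
    counts[code] = count + 1;
}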
Essentially, I am looking to make it as performant as possible.
thanks again