I am finishing up a program that does a large number of calculations and am trying to optimize the innermost loop.
The calculation I am currently looking at iterates over a large number of pairs of words and makes a table of the counts of corresponding pairs of characters. For example, one pair of words might be:
voice
louse
and the character pairs would then be (v,l), (o,o), (i,u), (c,s), and (e,e), and these pairs would all then have a count of 1. If the combination (v,l) is ever encountered again in another word, it would increment that count to two.
What data structure should I use for highest performance? Given the two characters, I need to retrieve the count for that pair. Currently I am using a nested hash table whose declaration looks like:
Dim data As New Dictionary(of String, Dictionary(of String, Integer))
Using this data structure, the program must hash two strings for every integer it retrieves. For every character pair, it must first check to see if the pair is in the hash table, and if not add it, requiring two more hashes. I have also considered a one level hash table with the key being the two characters concatenated together, so key = "vl" and value = 1, but I have read that string concatenation is relatively slow in VB.
So then, my questions are:
How speedy are the Dictionaries in VB? Would four hashes be quicker than one hash and a string concatenation (two level vs one level hash table)?
Can you think of a better structure to store this kind of data that allows fast additions and retrieval?