views:

52

answers:

1

I am finishing up a program that does a large number of calculations and am trying to optimize the innermost loop.

The calculation I am currently looking at iterates over a large number of pairs of words and makes a table of the counts of corresponding pairs of characters. For example, one pair of words might be:

voice
louse

and the character pairs would then be (v,l), (o,o), (i,u), (c,s), and (e,e), and these pairs would all then have a count of 1. If the combination (v,l) is ever encountered again in another word, it would increment that count to two.

What data structure should I use for highest performance? Given the two characters, I need to retrieve the count for that pair. Currently I am using a nested hash table whose declaration looks like:

Dim data As New Dictionary(of String, Dictionary(of String, Integer))

Using this data structure, the program must hash two strings for every integer it retrieves. For every character pair, it must first check to see if the pair is in the hash table, and if not add it, requiring two more hashes. I have also considered a one level hash table with the key being the two characters concatenated together, so key = "vl" and value = 1, but I have read that string concatenation is relatively slow in VB.

So then, my questions are:

How speedy are the Dictionaries in VB? Would four hashes be quicker than one hash and a string concatenation (two level vs one level hash table)?

Can you think of a better structure to store this kind of data that allows fast additions and retrieval?

+4  A: 

One option is to use a Dictionary(Of Integer, Integer). You can convert from any .NET Unicode character to an unsigned 16 bit integer, as they're UTF-16 code units.

You can then combine two unsigned 16 bit integers into a 32 bit integer very easily. Alternatively, you can just convert each code unit into a 32 bit unsigned integer to start with, and shift and combine in the same way :)

In C# I'd just use:

int combination = (((int) char1) << 16) | ((int) char2);

EDIT: According to jeroenh's comment, the VB equivalent is:

Dim combination As Integer = (AscW(char1) << 16) Or AscW(char2)
Jon Skeet
`Dim combination As Integer = (AscW(char1) << 16) Or AscW(char2)`
jeroenh
@jeroenh: Thanks, I've edited that in.
Jon Skeet
Excellent! This makes the calculations around three times faster.
dvcolgan