views:

27

answers:

1

I have a table with 4 colums and N rows. At the beginning N will be around 1000 and will have tendency to grow up to 3000.

  • 1st: string unique
  • 2nd: int with N/5 unique values
  • 3rd: int with 5 unique values
  • 4th: data value

The objective is to get to the value of the 4th column with different queries, ex: "get the value, where the 1st column is 17", or: "get all values where the 2nd column is 7", or: "does any row has this data". ~40% of queries will be done againsth the 4th column, ~30% against 3rd, ~20% 2nd and ~10% for 1st.

Since there would be around 100 queries per second, and around 2 changes (add/update/remove) per second against this table, I was wondering, what would be the fastest way (in C#) to manage this data? Memory is not an issue

I'm currently using a SortedDictionary, where the key is the 4th data value; and the dictionary's value is a class containing the first three values. Verifying the "4th column" is now easy by just using ContainsKey; and when querrying by other values I use:

foreach(var object in Objects) if(Objects[Data].2nd==object.Value.2nd) {...}

Any suggestions appreciated.

+1  A: 

This is the equivalent problem of how much indexing to use on a table in a database. If you want fast lookups on all 4 columns, you could created SortedDictionarys of each column type, and use the corresponding dictionary for lookups, but this will increase your add/update/remove time by having to update all 4 dictionaries (not to mention locking as well). It all depends on how fast you want updates and lookups on different columns to be.

However, given that multiple columns can have the same data, and SortedDictionary depends on unique key values, you may want to either write your own datastructure or use one of the MultiSet classes available in C# collection libraries (C5 springs to mind, but there are several others)

thecoop
I was thinking the same thing, but it seemed like a waste of memory :)
avance70
It's exactly what happens when you create an index on a column in a database - a B-tree (or variant) is built up containing copies of the index column data
thecoop