I have several medium-sized data sets in-memory that I need to be able to filter and find information from quickly. The data sets are small enough that I don't want to take the performance hit of going to a database every time I need an entry but large enough that I really need to index the data somehow.

Currently, I'm using POCO objects with one or more dictionaries for indexing. This works very well when I need to find something by a specific key, but sometimes that isn't the case. As an example, I often need to find an entry within a specific date/time range. And sometimes I need the entry with the lowest price. Most often, queries look at a few simple keys and one or two other fields at the same time.

Are there any tools, products, or libraries (targeting the .NET Framework) that can help me with this? Or do I need to pick up that big dusty old Algorithms book and start looking at search trees?

An example:

Trip

  • DepartureCode
  • DestinationCode
  • HotelCode
  • RoomCode
  • Date
  • Price

I need the query to be something like "get me the least expensive Trip between 2010-03-09 and 2010-03-12 where DepartureCode=LAX and DestinationCode=NYC".

A: 

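How about the DataSet.Tables("YourTable").Select() method?

' Assumes myBeginDate and myEndDate are Date values. Date literals in a filter
' expression are wrapped in # and written in invariant MM/dd/yyyy format.
Dim inv = System.Globalization.CultureInfo.InvariantCulture
Dim myRows() As DataRow = myDataSet.Tables("myTable").Select( _
    "[Date] > #" & myBeginDate.ToString("MM/dd/yyyy", inv) & _
    "# AND [Date] < #" & myEndDate.ToString("MM/dd/yyyy", inv) & "#")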

EDIT: From MSDN

DataView Construction

The DataView builds an index for the data in the underlying DataTable when both the DataView is created, and when the Sort, RowFilter or RowStateFilter properties are modified. When creating a DataView object, use the DataView constructor that takes the Sort, RowFilter, and RowStateFilter values as constructor arguments (along with the underlying DataTable). The result is the index is built once. Creating an "empty" DataView and setting the Sort, RowFilter or RowStateFilter properties afterward results in the index being built at least twice.

So if you want to index your DataSet, it looks like a DataView could provide that for you.
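
As a rough sketch of what the MSDN passage describes, the filter and sort can be passed to the DataView constructor in one step so the view is set up once. The table layout, the sample rows, and the names BuildTrips and CheapestFirst are assumptions made up for illustration and only mirror the Trip example from the question.

```vbnet
Imports System
Imports System.Data

Public Module DataViewSketch
    ' Builds a small hypothetical Trips table (sample data is invented).
    Public Function BuildTrips() As DataTable
        Dim trips As New DataTable("Trips")
        trips.Columns.Add("DepartureCode", GetType(String))
        trips.Columns.Add("DestinationCode", GetType(String))
        trips.Columns.Add("Date", GetType(DateTime))
        trips.Columns.Add("Price", GetType(Decimal))
        trips.Rows.Add("LAX", "NYC", New DateTime(2010, 3, 10), 420D)
        trips.Rows.Add("LAX", "NYC", New DateTime(2010, 3, 11), 380D)
        trips.Rows.Add("LAX", "SFO", New DateTime(2010, 3, 10), 90D)
        Return trips
    End Function

    ' Passes RowFilter and Sort to the constructor instead of setting them afterward.
    Public Function CheapestFirst(ByVal trips As DataTable) As DataView
        Dim filter As String = "DepartureCode = 'LAX' AND DestinationCode = 'NYC' " & _
            "AND [Date] >= #03/09/2010# AND [Date] <= #03/12/2010#"
        Return New DataView(trips, filter, "Price ASC", DataViewRowState.CurrentRows)
    End Function

    Sub Main()
        Dim view As DataView = CheapestFirst(BuildTrips())
        ' With an ascending sort on Price, the first row (if any) is the cheapest match.
        If view.Count > 0 Then
            Console.WriteLine(view(0)("Price"))
        End If
    End Sub
End Module
```

Note that the view has a single sort order, so this covers one access pattern per DataView rather than several independent indexes.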

smoore
That is a good way of querying (even though I prefer LINQ), but it won't be indexed, right?
CodingInsomnia
I see what you're saying. I think I misread the question.
smoore
Unfortunately, I don't think it is possible to create more than one index in the DataView, and one index is easy enough for me to create using dictionaries or a sorted list. I would love to be proven wrong on this though; I haven't really researched it that much.
CodingInsomnia
+2  A: 

"Lowest price" and "specific date/time range" can both be handled using just a sorted collection and binary search. SortedList / SortedDictionary (or SortedSet if you're using .NET 4.0) probably do everything you need here, with only a fairly small amount of work.
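
A minimal sketch of the date-range case, assuming the trips are kept in a plain List(Of Trip) that is already sorted by date. It uses a hand-written binary search rather than any particular sorted collection, and the Trip type, LowerBound, and CheapestInRange are hypothetical names introduced only for this example.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module SortedRangeSketch
    ' Hypothetical Trip type with only the fields this sketch needs.
    Public Class Trip
        Public TripDate As DateTime
        Public Price As Decimal
        Public Sub New(ByVal aDate As DateTime, ByVal aPrice As Decimal)
            TripDate = aDate
            Price = aPrice
        End Sub
    End Class

    ' Binary search: index of the first trip with TripDate >= target.
    ' The list must already be sorted by TripDate ascending.
    Public Function LowerBound(ByVal sorted As List(Of Trip), ByVal target As DateTime) As Integer
        Dim lo As Integer = 0
        Dim hi As Integer = sorted.Count
        While lo < hi
            Dim middle As Integer = lo + (hi - lo) \ 2
            If sorted(middle).TripDate < target Then
                lo = middle + 1
            Else
                hi = middle
            End If
        End While
        Return lo
    End Function

    ' Visits only the trips inside [fromDate, toDate]; returns the cheapest, or Nothing.
    Public Function CheapestInRange(ByVal sorted As List(Of Trip), _
            ByVal fromDate As DateTime, ByVal toDate As DateTime) As Trip
        Dim best As Trip = Nothing
        Dim i As Integer = LowerBound(sorted, fromDate)
        While i < sorted.Count AndAlso sorted(i).TripDate <= toDate
            If best Is Nothing OrElse sorted(i).Price < best.Price Then best = sorted(i)
            i += 1
        End While
        Return best
    End Function

    Sub Main()
        Dim trips As New List(Of Trip)()
        trips.Add(New Trip(New DateTime(2010, 3, 8), 100D))
        trips.Add(New Trip(New DateTime(2010, 3, 11), 250D))
        Dim best As Trip = CheapestInRange(trips, New DateTime(2010, 3, 9), New DateTime(2010, 3, 12))
        If best IsNot Nothing Then Console.WriteLine(best.Price)
    End Sub
End Module
```

Finding the start of the range is logarithmic, but picking the cheapest entry still means walking every trip inside the range. A second list sorted by price would serve a pure "lowest price" query.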

Jon Skeet
I'll look into those a bit. I think, however, that they probably won't solve the problem when I need to first look at some keys, and then at a specific range. I've edited the question with an example.
CodingInsomnia
@andlju: I strongly suspect you're not going to optimise that sort of query very easily without a *lot* of work. After you've applied one (or maybe two) keys - e.g. "find all the trips from LAX to NYC" do you still actually have a lot of trips to work through?
Jon Skeet
That is a very good point. In that specific case I would still have a very large number of trips, but the example is somewhat extreme and not very realistic. In most scenarios I'd probably have a more reasonable set of entries left after applying filters on the keys only.
CodingInsomnia
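
The idea from the comments above (apply the exact-match keys first, then work through whatever is left) might look roughly like the following. This is only a sketch under assumptions: the Trip type, the "LAX|NYC" style composite key, and the names RouteKey, BuildRouteIndex, and Cheapest are all invented for illustration.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module KeyThenScanSketch
    ' Hypothetical Trip type based on the fields listed in the question.
    Public Class Trip
        Public DepartureCode As String
        Public DestinationCode As String
        Public TripDate As DateTime
        Public Price As Decimal
    End Class

    ' Hypothetical composite key, e.g. "LAX|NYC".
    Public Function RouteKey(ByVal departure As String, ByVal destination As String) As String
        Return departure & "|" & destination
    End Function

    ' Groups trips into one bucket per departure/destination pair.
    Public Function BuildRouteIndex(ByVal trips As IEnumerable(Of Trip)) _
            As Dictionary(Of String, List(Of Trip))
        Dim index As New Dictionary(Of String, List(Of Trip))()
        For Each t As Trip In trips
            Dim lookupKey As String = RouteKey(t.DepartureCode, t.DestinationCode)
            Dim bucket As List(Of Trip) = Nothing
            If Not index.TryGetValue(lookupKey, bucket) Then
                bucket = New List(Of Trip)()
                index.Add(lookupKey, bucket)
            End If
            bucket.Add(t)
        Next
        Return index
    End Function

    ' Dictionary lookup on the keys, then a linear scan of the remaining bucket.
    Public Function Cheapest(ByVal index As Dictionary(Of String, List(Of Trip)), _
            ByVal departure As String, ByVal destination As String, _
            ByVal fromDate As DateTime, ByVal toDate As DateTime) As Trip
        Dim bucket As List(Of Trip) = Nothing
        If Not index.TryGetValue(RouteKey(departure, destination), bucket) Then Return Nothing
        Dim best As Trip = Nothing
        For Each t As Trip In bucket
            If t.TripDate >= fromDate AndAlso t.TripDate <= toDate Then
                If best Is Nothing OrElse t.Price < best.Price Then best = t
            End If
        Next
        Return best
    End Function
End Module
```

Whether the linear scan is acceptable depends on how many entries are left in a bucket after the key lookup, which is exactly the question raised in the comments.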