views:

489

answers:

4

I'm in need of a datastructure that is basically a list of data points, where each data point has a timestamp and a double[] of data values. I want to be able to retrieve the closest point to a given timestamp or all points within a specified range of timestamps.

I'm using c#. my thinking was using a regular list would be possible, where "datapoint" is a class that contains the timestamp and double[] fields. then to insert, I'd use the built-in binarysearch() to find where to insert the new data, and I could use it again to find the start/end indexes for a range search.

I first tried sortedlists, but it seems like you can't iterate through indexes i=0,1,2,...,n, just through keys, so I wasn't sure how to do the range search without some convoluted function.

but then I learned that list<>'s insert() is o(n)...couldn't I do better than that without sacrificing elsewhere?

alternatively, is there some nice linq query that will do everything I want in a single line?

A: 
Reed Copsey
A: 

You have the choice of cost at insertion, retrieval or removal time. There are various data structures optimized for each of these cases. Before you decide on one, I'd estimate the total size of your structures, how many data points are being generated (and at which frequency) and what will be used more often: insertion or retrieval.

If you insert a lot of new data points at high frequency, I'd suggest looking at a LinkedList<>. If you're retrieving more often, I'd use a List<> even though its insertion time is slower.

Of course you could do that in a LINQ query, but remember this is only sugar coating: The query will execute every time and for every execution search the entire set of data points to find a match. This may be more expensive than using the right collection for the job in the first place.

__grover
LinkedList will be very slow for his range searching, though, which was what he said he was trying to optimize.
Reed Copsey
I agree, however it depends on several factors. Without testing it with a Stopwatch, I wouldn't count on soft facts. I've seen cases where using a LinkedList was actually faster. It depends on the circumstances. Like I said if there's more inserts that the O(n) becomes costly, the LinkedList performs better due to O(1). If there's more retrieval a sorted List<> will perform better than almost anything else.
__grover
A: 

How about using an actual database to store your data and run queries against that? Then, you could use LINQ-to-SQL.

Erich Mirabal
A: 

If you have only static data then any structure implementing IList should be fine. Sort it once and then make queries using BinarySearch. This should also work if your inserted timestamps are always increasing, then you can just do List.Add() in O(1) and it will be still sorted.

    List<int> x = new List<int>();
    x.Add(5);
    x.Add(7);
    x.Add(3);

    x.Sort();

    //want to find all elements between 4 and 6
    int rangeStart = x.BinarySearch(4);

    //since there is no element equal to 4, we'll get the binary complement of an index, where 4 could have possibly been found
    //see MSDN for List<T>.BinarySearch
    if (rangeStart < 0)
        rangeStart = ~rangeStart;

    while (x[rangeStart] < 6)
    {
        //do you business
        rangeStart++;
    }

If you need to insert data at random points in your structure, keep it sorted and be able to query ranges fast, you need a structure called B+ tree. It's not implemented in the framework, you'll need to get it somewhere by your own.

Inserting a record requires O(log n) operations in the worst case

Finding a record requires O(log n) operations in the worst case

Removing a (previously located) record requires O(log n) operations in the worst case

Performing a range query with k elements occurring within the range requires O((log n) + k) operations in the worst case.

P.S. "is there some nice linq query that will do everything I want in a single line"

I wish I knew such a nice linq query that could do everything I want in one line :-)

Yacoder