views:

100

answers:

1

I am using the new WPF toolkit's Chart to plot large data sets. I also have a crosshair tracker that follows the mouse when it's over the chart area to tell exactly what is the value of the nearest data point (see Yahoo! Finance charts).

I use the following code to find the closest data point that is lower (or equal) to where the mouse is currently hovering (the nasty detail about the chart is that it actually interpolates the data to tell you what's the EXACT value where you hove your mouse over, even though the mouse is located between the data points):

TimeDataPoint point = mainSeries.Find(
    new Predicate<TimeDataPoint>(
    delegate(TimeDataPoint p) {
        return xValue > p.Date && !mainSeries.Exists(new Predicate<TimeDataPoint>(
            delegate(TimeDataPoint middlePoint) {
                return middlePoint.Date > p.Date && xValue > middlePoint.Date;
            }));
    }));

[Here, mainSeries is simply a List<TimeDataPoint>]

This works very well for relatively small data sets, but once I go up to 12000+ points (this will increase rapidly), the code above slows down to a standstill (it does a run through data 12000+^2 times).

I am not very good at constructing queries so I am wondering if it is possible to use a better LINQ query to do this.

EDIT: Another idea that was inspired by @Randolpho comment is this: I will search for all points that are lower than given (this will be at most n (here: 12,000+)) and then select a Max<> (which should be also at most O(n)). This should produce the same result but only with order of n operations and thus should be at least a little bit faster...

My other alternative is to actually filter the data set and maintain an upper bound on the number of points depending on the level of details the user wants to see. I would rather not go down that road if there's a possibility of having a more efficient query.

A: 

Pre-compute the closest data points based on the known resolution of display/chart. Then, when you hover over a point, it's a simple lookup of the x/y coordinates against the known pre-computed value.

For performance reasons, do your pre-computation in a separate thread and do not allow the display of those values until the computation is completed. Re-compute every time the size of the chart is changed.

Bottom line: There is no LINQ query that will help you execute every time you do a mouse-over for large data sets. It just can't be done. You're looking at order N^2 no matter what. So pre-compute it and cache it, so you only do your computations once.

This is an intriguing idea but wouldn't I still need to do a look-up of x/y among 12000+ pairs? Could you elaborate on how I should store the pre-computed x/y pairs for a fast look-up? For example, I have data points at (200,300) and (250, 300) and user's mouse is at (225, 300). – Alexandra

Well, I guess that would depend on the graph. Based on your code and your mention of Yahoo Finance Charts, I'm assuming your data only varies by horizontal postion, i.e. for a given X value, you are computing the display data.

In that case, you can a simple Dictionary<int, TimeDataPoint> as your cache. The Key is the transformed X coordinate (i.e. in the coordinate space of your display graph), the Value is the pre-computed TimeDataPoint. The dictionary would have a record for every X coordinate in your display graph, so a 400-pixel-wide graph has 400 pre-computed data points.

If your data varies against both axes, you could instead use Dictionary<System.Windows.Point, TimeDataPoint>, in pretty much the same way, but this would increase the number of items in your Dictionary by an order of magnitude. A 400 by 300 graph would have 120000 entries in the dictionary, so the tradeoff is a higher memory footprint.

Pre-calculating your data is the tricky part; it'd have to be done differently from the way you're currently doing it. I'm going to assume here that xValue in your example is an interpolation of a Date based on the X value, since it's compared to p.Date.

This might work:

private Dictionary<int, TimeDataPoint> BuildCache(List<TimeDataPoint> mainSeries)
{
    int xPrevious = 0;
    int xCurrent = 0;
    Dictionary<int, TimeDataPoint> cache = new Dictionary<int, TimeDataPoint>();
    foreach(var p in mainSeries)
    {
        xCurrent = XFromDate(p.Date);
        for(int val = xPrevious; val < xCurrent; val++)
        {
            cache.Add(val, p);
        }
        xPrevious = xCurrent;
    }
    return cache;
}

XFromDate would extract the X coordinate for a particular date. I'll leave doing that up to you. :)

Randolpho
This is an intriguing idea but wouldn't I still need to do a look-up of x/y among 12000+ pairs? Could you elaborate on how I should store the pre-computed x/y pairs for a fast look-up?For example, I have data points at (200,300) and (250, 300) and user's mouse is at (225, 300).
Alexandra