I am gathering data from a website. I estimate I will get 10,000 data points (time - value), multiplied by seven, over time. That is far too much, both for storing and for plotting in a real-time-like graph (through jQuery Flot). I'm looking for a text dealing with this sort of problem. To be more precise: algorithms, statistical math for finding the least significant points (if that would be a good idea), and general ideas on dealing with this sort of problem. If a text were available on the net, that would be great. A reference to a book would also do.

+1  A: 

Reading the alpha-beta pruning article on Wikipedia, I came up with this idea: the least significant point is the point where the smallest change took place. In the data array, that would be the difference between arr[i-1] and arr[i+1]. Then it's easy to find i:

function prune(arr){
    var smallest = Infinity; // large to start with, so any real difference wins
    var rememberI = 0;
    // skip the first and last points so arr[i-1] and arr[i+1] both exist;
    // a plain counted loop keeps i numeric (for..in yields string keys)
    for(var i = 1; i < arr.length - 1; i++){
        var test = Math.abs(arr[i+1] - arr[i-1]); // size of the change around point i
        if(test < smallest){
            smallest = test;
            rememberI = i;
        }
    }
    return rememberI;
}

I haven't tested it yet, but it looks like a promising idea.
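For illustration, here is a minimal sketch of how prune might be called to cap the series at a fixed size; pruneToLimit and MAX_POINTS are hypothetical names, not part of the original answer. It relies on smallest and rememberI being local to prune, as in the version above, so repeated calls start fresh.

var MAX_POINTS = 1000; // hypothetical cap for storage/plotting

function pruneToLimit(arr){
    // repeatedly drop the least significant point until the series fits
    while(arr.length > MAX_POINTS){
        arr.splice(prune(arr), 1);
    }
    return arr;
}

Note that each removal rescans the whole array, so trimming k points costs O(k·n); that is fine for incrementally pruning a live stream, but slow for bulk reduction of a large existing dataset.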

Afwas
I tested this on another, similar, data stream. It seems to work nicely. It has pruned the timeline, and now it starts to prune the recently added data points, because they are taken close after one another (not much change there).
Afwas