I'm trying to build a system for programmatically filtering timeseries data and wonder if this problem has been solved, or at least hacked at, before. It seems that it's a perfect opportunity to do some Ruby block magic, given the scope and passing abilities; however, I'm still a bit short of fully grokking how to take advantage of blocks.

To wit:

Pulling data from my database, I can create either a hash or an array; let's use an array:

data = [[timestamp0, value0],[timestamp1,value1], … [timestampN, valueN]]

Then I can add a method to array, maybe something like:

class Array
  def filter(&block)
    …
    self.each_with_index do |v, i|
      …
      # Always call with timestamp, value, index
      block.call(v[0], v[1], i)
      …
    end
  end
end
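A minimal sketch of how the elided parts might be filled in, assuming the block returns a replacement value (or nil to keep the original) and that a new array should be returned rather than mutating the receiver. The module `SeriesFilter` and the method name `filter_series` are hypothetical; a separate name is used because Ruby 2.6 and later already define `Array#filter` as an alias for `select`, and reopening it would clobber that.

```ruby
# Hypothetical helper, not part of Ruby's standard library.
module SeriesFilter
  # Yields (timestamp, value, index) for every [timestamp, value] pair and
  # returns a NEW array; the receiver is left untouched.
  # Assumed convention: the block returns the replacement value, or nil
  # to keep the original value.
  def filter_series
    each_with_index.map do |(t, v), i|
      replacement = yield(t, v, i)
      [t, replacement.nil? ? v : replacement]
    end
  end
end

class Array
  include SeriesFilter
end

data = [[0, 1.0], [1, nil], [2, 3.0]]
offset = 10

# The block is a closure: it reads `offset` from the enclosing scope.
shifted = data.filter_series { |_t, v, _i| v + offset unless v.nil? }
```

Here `shifted` holds the adjusted pairs while `data` keeps its original values, which is what makes it safe to run several filters over the same source array.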

I understand that one of the powers of Ruby blocks is that the passed block executes within the scope of its closure. So somehow calling data.filter should allow me to work with that scope. So far, though, I can only figure out how to do that without really taking advantage of the scope. To wit:

# average if we have a single null value, assumes data is correctly ordered
data.filter do |t, v, i|
  # Of course, we do some error checking…
  # data[i] is a [timestamp, value] pair, so average the values at index 1
  (data[i-1][1] + data[i+1][1]) / 2.0 if v.nil?
end

What I actually want to do is (allow the user to) build up mathematical filters programmatically. Taking it one step at a time, we'll start by building some functions:

def average_single_values(args)
  #average over single null values
  #return filterable array
end

def filter_by_std(args)
  #limit results to those within N standard deviations
  #return filterable array
end

def pull_bad_values(args)
  #delete or replace values seen as "bad"
  #return filterable array
end

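# method(:name) gives a callable object; a bare name would invoke the method
my_filters = [method(:average_single_values), method(:filter_by_std), method(:pull_bad_values)]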
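One way to get callable filter objects into such a list, sketched under the assumption that every filter takes the whole array of [timestamp, value] pairs and returns a new array. The names are reused from the stubs above, but the bodies are illustrative guesses, and making `filter_by_std` a factory that takes N is a design assumption.

```ruby
# Each filter is a lambda: array of [timestamp, value] in, new array out.
# The bodies are illustrative assumptions, not finished implementations.

# Replace a single nil with the mean of its two neighbours.
average_single_values = lambda do |series|
  series.each_with_index.map do |(t, v), i|
    prev_v = i > 0 ? series[i - 1][1] : nil
    next_v = series[i + 1] && series[i + 1][1]
    if v.nil? && prev_v && next_v
      [t, (prev_v + next_v) / 2.0]
    else
      [t, v]
    end
  end
end

# Factory: returns a filter that keeps values within n standard
# deviations of the mean (nil values are passed through untouched).
filter_by_std = lambda do |n|
  lambda do |series|
    values = series.map { |_t, v| v }.compact
    return series.dup if values.empty?
    mean = values.sum / values.size.to_f
    std  = Math.sqrt(values.sum { |v| (v - mean)**2 } / values.size)
    series.select { |_t, v| v.nil? || (v - mean).abs <= n * std }
  end
end

# Filters are now plain objects that can be stored, passed, and shared.
my_filters = [average_single_values, filter_by_std.call(1)]

data   = [[0, 1.0], [1, nil], [2, 3.0], [3, 2.0], [4, 50.0]]
filled = my_filters[0].call(data)
```

Because the parameter N is captured by the inner lambda, each configured filter carries its own settings along with it, which is where the closure behaviour of blocks starts to pay off.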

Then, having a list of filters, I figure (somehow) I should be able to do:

data.filter do |t, v, i|
  my_filters.each do |f|
    f.call t, v, i
  end
end

or, assuming a different filter implementation:

filtered_data = data.filter my_filters

This would probably be the better design, as it returns a new array and is non-destructive.
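The second form could be sketched as a fold over the filter list, assuming every filter responds to `call`, takes the whole array, and returns a new one. `apply_filters` and the two toy filters are hypothetical names used only for illustration.

```ruby
# Hypothetical helper: runs `series` through each filter in order.
# Every filter is assumed to respond to #call and return a new array.
def apply_filters(series, filters)
  filters.inject(series) { |current, f| f.call(current) }
end

# Two toy filters, just for illustration.
drop_nils   = ->(series) { series.reject { |_t, v| v.nil? } }
clip_at_ten = ->(series) { series.map { |t, v| [t, [v, 10].min] } }

data = [[0, 4], [1, nil], [2, 25]]

# Order matters: nils are dropped before the clipping filter sees them.
filtered_data = apply_filters(data, [drop_nils, clip_at_ten])
```

Since each step receives the previous step's output and none of them mutate their input, `data` stays intact and the same filter list can be reused on other arrays. On Ruby 2.6 or later, `Proc#>>` should allow the same chain to be composed into a single callable.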

The result is an array that has been run through all of the filters. The eventual goal is to have static data arrays that can be run through arbitrary filters, and filters that can be passed around (and shared) as objects, the way Yahoo! Pipes does with feeds. I'm not looking for too generalized a solution right now; I can make the format of the array and the return values strict.

Has anyone seen something similar in Ruby? Or have some basic pointers?