views:

298

answers:

4

As my Rails app matures, it's becoming increasingly apparent that it has a strong data warehouse flavour, lacking only a facts table to make everything explicit.

On top of that, I just read Chapters 2 (Designing Beautiful APIs) and 3 (Mastering the Dynamic Toolkit) of Ruby Best Practices.

Now I'm trying to figure out how best to design the fact-retrieving part...

Say I have the following dimensions (existing Models in the app):

  • Product (contains funds)
  • Fund
  • Measure (e.g. total holding, average holding, average exposure)

... and a good old general-purpose Fact:

  • Fact (date, value, plus a foreign-key NULLable column for each of my dimensions)

Some aspects on which I'd be grateful to get some advice:

  • What might constitute a flexible retrieval interface?
  • What happens if I have Facts with both NULL (i.e. all, or don't care) and NOT NULL (specific) values for a dimension? A pseudo-value like :all? Or should some convention apply?
  • How to select only a subset of dimension values? Or exclude a subset? :only and :exclude?
  • Has anyone had experience with creating named_scopes to deal with this? There's the obvious attraction of being able to chain one for each dimension of interest, but does it get too clunky if we get to 7 or 8 dimensions?

(I'm aware that an acts_as_fact plugin is reputed to exist in some form (at least, there was some small buzz at RailsConf 2006) but I couldn't find any code or description of how it might have worked.)

Versions: Rails, ActiveRecord 2.1.2, Oracle Enhanced Adapter 1.2.0

EDIT: I looked at ActiveWarehouse and have some reservations: - the main branch hasn't had a commit since Nov-08 and no there's activity at all since Jan-09; - the tutorial dates to 2006, is admitted to be out of date, and 404s on me; - it seems to be wanting to get away from ActiveRecord - much of my app will stay in AR and I think at present that I want an AR solution.

So I'll steer clear of that one, thanks!

+2  A: 

What happens if I have Facts with both NULL (i.e. all, or don't care) and NOT NULL (specific) values for a dimension? A pseudo-value like :all? Or should some convention apply?

NULL would be a bid misleading because it stands for no association. I would use a value like -1 (if it is an integer foreign_key with only values > 0).

How to select only a subset of dimension values? Or exclude a subset?

with_scope()

you could also overwrite the find function

   def self.find(*args)
    if  anything
      with_scope(a_scope) do
         result = super *args
      end
    else
      result = super *args
    end
   end

   def self.a_scope
    {:find => { :conditions => ["person_id  = ?", me] , :readonly => true}}
   end

Has anyone had experience with creating named_scopes to deal with this? There's the obvious attraction of being able to chain one for each dimension of interest, but does it get too clunky if we get to 7 or 8 dimensions?

We have an olap database with 4 dimension and it works nice. I think if you implement some custom methods for active_record, you will have fun with your app.

I also found this: http://github.com/aeden/activewarehouse/tree/master

Beffa
+1  A: 

there's another which I haven't used, but looks good:

http://github.com/wvanbergen/active_olap/tree/master

http://techblog.floorplanner.com/2008/07/29/active-olap-released/


and this for SOLR which i found in google

http://code.google.com/p/kettle-solr-plugin/

Gene T
Ooh, more links. Shows my DW newbness (previous DW apps worked on = 1) that I didn't think of those search terms. Thanks!
Mike Woodhouse
A: 

I was looking into using ActiveWarehouse a while ago before I got pull of onto other things, so I can't tell you how well it works, but it's something to add to your list to check out. It has generators for facts, dimensions, and cubes as well as an ETL toolkit.

Luke Francl
A: 

So we have a number of gems or plugins of varying degrees of complexity, none of which seem to be very actively under development (or there's something going on but it's under the radar).

In any case, I'm not looking to build a data warehouse, just to implement a fact table or two in star schema style. What I was asking for was ideas on how to access such a table.

I'm leaning towards dropping the idea of multiple-levels, where an aggregate across a dimension would have a NULL in the foreign key for that dimension. While it would reduce the number of rows involved in a query the benefit would be small and not free: I'd have a larger fact table and more complex code.

Retrieval looks like it might then be handled with a set of named_scopes, one for each dimension, for filtering. Or a custom finder, fed with a suitably-constructed Hash, might be better.

When I've built it, I'll come back with some better information...

Mike Woodhouse