views:

716

answers:

6

I've got a lot of mysql data that I need to generate reports from. It's mostly historic data so it won't be changing much, but it weighs in at 20-30 gigabytes easily and is expected to grow. I currently have a collection of php scripts that will do some complex queries and output csv and excel files. I also use phpMyAdmin with bookmarked queries. I manually edit them to change the parameters. The amount of data is growing and the number of people who need access to it is also growing, so I'm making the time to improve this situation.

I started reading about data warehousing the other day and it seems that this an area that relates to what I need to do. I've read some good articles and am even waiting on a book. I think I'm getting a handle on what these sorts of systems do and what's possible.

Creating a reporting system for my data has always been on a todo list, but until recently I figured it would be a highly niche programing venture. Since I now know data warehousing is a common thing, I figure there must be some sort of reporting/warehousing frames available to ease in the development. I'd gladly skip writing interfaces and scripts to schedule and email reports and the like and stick to writing queries and setting up relations.

I've mostly been a lamp guy, but I'm not above switching languages or platforms. I just need a more robust solution as my one off scripts don't scale well.

So where's a good place to get started?

+3  A: 

Pentaho has put together a pretty comprehensive suite of products. The products are "free", but be prepared for the usual heavy sell once you fork over your identifying information.

I haven't had a chance to really stretch them as we're a Microsoft shop from one sad end to the other.

Nick
What do you use at your microsoft shop?
wizard
+3  A: 
6eorge Jetson
+3  A: 

I think you should first check out Kimball and Inmon and see if you want to approach your data warehouse in a particular way. Kimball, in particular, lays out a very good framework for the modelling and construction of the warehouse.

Cade Roux
Kimball's Data Warehouse books are probably one of the few books you'll need.
Codewerks
It is! I have The Data Warehouse Toolkit and it's amazing.
wizard
+1  A: 

It hasn't been updated in a while but there's a nice Data Warehousing/ETL Ruby package called ActiveWarehouse.

But I would check out the Pentaho products like Nick mentioned in another answer. It should easily handle the volume of data you have and may provide you with more ways to slice and dice your data than you could have ever imagined.

mwilliams
The Active Warehouse guys link to a greate article by Anothy Eden which no longer works, here is the archive. http://web.archive.org/web/20070510141152/http://anthonyeden.com/2006/12/20/activewarehouse-example-with-rails-svn-logs
wizard
A: 

Kimball is the simpler method for data warehousing.

We use Informatica for moving data around, but it doesn't do DW things like indexing by default.
I like the idea of Wherescape RED, as a DW tool and using MS SQL's Linked Servers to obviate the need for an ETL tool.

Joe K
A: 

There are a number of tools which try to make the process of designing, implementing and managing/operating a Data Warehouse and they each have their strengths and weaknesses and often vastly differing price points. Under the covers you are always going to be best off if you have a good knowledge of warsehousing principles from the Kimball and/or Inmon camps.

As well as tools like Kalido and Wherescape RED (which do similar thing in very different ways), many of the ETL platforms now have good in-built support for the donkey work of implementation - SCD components etc and lineage tracking.

Best though to view all these as tools to be used in the hands of you, the craftsman, they make certain easy things even easier (or even trivial), some hard things easier but some things they just get in they way of IMHO ;) Learn the methodology and principles first and get a good understanding of them and then you will know which tools to apply from your kitbag and when...

MarkH