views:

41

answers:

6

We recently put a new production database into use. The schema of this database is optimized for OLTP. We're also getting ready to implement a reporting server to be used for reporting purposes. I'm not convinced we should just blindly use the same schema for our reporting database as we do for our production database, and replicate data over.

For those of you that have dealt with having separate production and reporting databases, have you chosen to use the same database schema for your reporting database, or a schema that is more efficient for reporting; for example, perhaps something more denormalized?

Thanks for thoughts on this.

+1  A: 

It's easiest just to copy over.

You could add some views to that schema to simplify queries - to conceptually denormalize.

If you want to go the full Data Warehouse/Analysis Services route, it will be quite a bit of work. But it's very fast, takes up less space, and users seem to like it. If you're concerned about large amounts of data and response times, you should look into this.

If you have many many tables being joined, you might look into actually denormalizing the data. I'd do a test case just to see how much gain for pain you'll be getting.

Sam
+2  A: 

I think that the reporting database schema should be optimized for reporting - so you'll need a ETL Process to load your data. In my experience I was quickly at the point that the production schema does not fit my reporting needs.

If you are starting your reporting project I would suggest that you design your reporting database for your reports needs.

Arthur
+2  A: 

For serious reporting, usually you create data warehouse (Which is typically at least somewhat denormalized and certain types of calculations are done when the data is refreshed to save from averaging the values of 1.3 million records when you run the report. This is for the kind of reporting reporting that includes a lot of aggregate data.

If your reporting needs are not that great a replicated database might work. It may also depend on how up-to-date you need the data to be as data warehouses are typically updated once or twice a day so the reporting data is often one day behind, OK for monthly and quarterly reports not so good to see how many widgits have been ordered so far today.

The determinate of whether you need a data warehouse tends to be how long it would take to runthe reports they need. This is why datawarehouse pre-aggregate data on loading it. IF your reoports are running fine and you just want to get the worokload away from the input workload a replicated adatabase should do the trick. If you are trying to do math on all the records for the last ten years, you need a data warehouse.

You could do this in steps too. Do the replication now, to get reporting away from data input. That should be an immediate improvement (even if not as much as you want), then design and implement the datawarehouse (which can be a fairly long and involved project and which will take some time to get right).

HLGEM
+3  A: 

There's really two sides to the story:

  • if you keep the schema identical, then updating the reporting database from the production is a simple copy (or MERGE in SQL Server 2008) command. On the other hand, the reports might get a bit harder to write, and might not perform optimally

  • if you devise a separate reporting schema, you can optimize it for reporting needs - then the creation of new reports might be easier and faster, and the reports should perform better. BUT: The updating is going to be harder

So it really boils down to: are you going to create a lot of reports? If so: I'd recommend coming up with a specific reporting schema optimized for reports.

Or is the main pain point the upgrade? If you can define and implement that once (e.g. with SQL Server Integration Services), maybe that's not really going to be a big issue after all?

Typically, chances are that you'll be creating a lot of reports of time, so there's a good chance it might be beneficial in the long run to invest a bit upfront in a separate reporting schema, and a data loading process (typically using SSIS) and then reap the benefit of having better performing reports and faster report creation time.

marc_s
+1  A: 

Without going directly for the data warehouse solution you could always put together some views that rearrange data for better reporting access. This helps you in that you don't have to start a large warehouse project right away and could help scope out a warehouse project if you decide to go that way.

CTKeane
+1  A: 

All the answers I've read here are good, I would just add that you do this in stages, stopping as soon as your goals for performance and functionality are met:

Keep the schema identical - this just takes contention and load off the OLTP server

Keep the schema identical - but add new indexed views OR index base tables differently

Build a partial data-warehouse style model (perhaps not keeping snapshot-style history or slowly changing dimensions or anything special not catered for in your normal database) from the copy-schema in another schema or database on the same reporting server. The benefits of star-schema models are huge for reporting, views flattened for users and data dictionaries etc. In this model, if your OLTP database loses changes (for instance customer name changes) due to overwrites, the data warehouse doesn't capture that information (often it's not that important if you stop at this spot). Effectively you are getting data warehouse-style organization for "current" data only. The benefits of retaining the copy of the original schema on your reporting server at this point are that you can pull from the source data in original SQL Server form instead of some kind of intermediate form (like text files) without affecting production OLTP, and you can migrate data models gradually, some in stars, some in normal form, all without affecting production. At some point later, you might be able to drop all or part of the copy.

Build a full data-warehouse including slowly changing dimensions where all the data is captured from the source system.

Cade Roux