Hi,

I recently spoke to a user of SQL Server 2005 who said that their database was overly normalized and they replicate data to a report server. Isn't a database supposed to handle both transactions and reporting? Why would I have to invest in 2 servers and replicate?

I know this is an open-ended, subjective question, and I don't have statistics for the case above, but isn't tuning a database enough to handle transactional reporting? I can understand that a data mining scenario needs a separate server with Analysis Services and denormalization. But for current-year transactions?

Thanks.

+3  A: 

It depends.

It's entirely possible that a single year, or even a month's detailed data might be better handled in a database where the schema is optimized for reporting, or even just the indexing scheme is different.

It also depends on the type of reporting: if you are comparing current-month trends with past months, it's a lot easier to have them in the same database. And if you compute daily moving averages, it's a lot easier to do that in a single database than to perform that operation across a database boundary.
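
To illustrate the moving-average point, here is a hedged sketch in T-SQL against a hypothetical `Sales` table (all names are made up; it assumes `SaleDate` is stored at day granularity). SQL Server 2005 has no windowed frames, so a self-join on a daily rollup does the job:

```sql
-- Hypothetical schema: Sales(SaleDate DATETIME, Amount MONEY).
-- 7-day moving average of daily totals, 2005-style (no ROWS BETWEEN),
-- via a self-join between each day and its trailing 6 days.
WITH Daily AS (
    SELECT SaleDate, SUM(Amount) AS DayTotal
    FROM Sales
    GROUP BY SaleDate
)
SELECT d.SaleDate,
       AVG(w.DayTotal) AS MovingAvg7
FROM Daily d
JOIN Daily w
  ON w.SaleDate BETWEEN DATEADD(DAY, -6, d.SaleDate) AND d.SaleDate
GROUP BY d.SaleDate;
```

Doing the same thing when half the history sits on another server means linked-server queries or shipping data around, which is exactly the pain being described.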

As far as overly normalized - that can mean a lot of things.

Cade Roux
A: 

Overly normalized usually means the reporting users don't understand the data model. You might want to keep that kind of user away from the transaction database. A replicated server is a very inexpensive solution compared to having your transaction database become unresponsive because a reporting user is doing a fancy join.

It is mostly a simple organizational measure, creating a boundary between operational and reporting users.

Stephan Eggermont
A: 

At a pure technical level, there is no reason why the two servers need to be separate. It's likely they made their decision for "business reasons" such as:

  1. Depending on the complexity of the reports, they could consume considerable resources when being run. If this impacts the performance for other users of the system, this is an argument for dumping the data into a separate "reporting" database server.

  2. If the people running reports are writing raw SQL but are not experienced DB developers, it might be useful to convert the data into a denormalized format first so that it's easier for them to work with. It might also help speed performance of the reports themselves.
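
A sketch of point 2, with entirely hypothetical table names: the reporting copy flattens the joins once, so report writers query one wide table instead of four normalized ones.

```sql
-- Materialize a denormalized "one row per order line" table on the
-- reporting server, so report writers never touch the joins.
-- All table and column names here are invented for illustration.
SELECT o.OrderID,
       o.OrderDate,
       c.CustomerName,
       c.Region,
       p.ProductName,
       od.Quantity,
       od.Quantity * od.UnitPrice AS LineTotal
INTO Reporting.OrderFact
FROM Orders o
JOIN OrderDetails od ON od.OrderID = o.OrderID
JOIN Customers c     ON c.CustomerID = o.CustomerID
JOIN Products p      ON p.ProductID = od.ProductID;
```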

cliff.meyers
+2  A: 

I think having a separate reporting server, apart from the production/transaction server, is often a good idea. I have set up reporting servers with a data structure that is completely "un-normalized" and would make relational purists cringe... but it is a reporting server, so it doesn't matter.

Users love being able to get to "their" data without a DBA standing in the way (reporting database is read-only of course).

A set of routines (or better yet, an unattended nightly batch process) that pulls data from the production servers and gets everything rolled up, summarized, crosstabbed, and cleaned, for the sole purpose of getting usable information to the users as fast as possible, is very often a good solution.
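
One step of such a nightly batch might look like the following sketch (schema and table names are hypothetical, and a real job would also handle reruns and late-arriving rows):

```sql
-- Nightly rollup: summarize yesterday's production rows into a
-- reporting table. DATEADD/DATEDIFF against day 0 truncates a
-- DATETIME to midnight (the SQL Server 2005 idiom).
INSERT INTO Reporting.DailySales (SaleDay, ProductID, Units, Revenue)
SELECT DATEADD(DAY, DATEDIFF(DAY, 0, s.SaleDate), 0) AS SaleDay,
       s.ProductID,
       SUM(s.Quantity)               AS Units,
       SUM(s.Quantity * s.UnitPrice) AS Revenue
FROM Production.Sales s
WHERE s.SaleDate >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 1, 0)
  AND s.SaleDate <  DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0)
GROUP BY DATEADD(DAY, DATEDIFF(DAY, 0, s.SaleDate), 0), s.ProductID;
```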

In my case it definitely took the workload off me for those "can you create a report for me that will show....." type of requests. Give the users access to the data, train them on the tools, and let 'em at it.

EJB
A: 

It really depends on your environment and the applications in play. Having a separate reporting server is a safe bet. If you have a production system with a highly normalized schema, lots of transactions occurring, and record locking thrown in, then running complex reports against it can produce devastating performance hits. If, for example, report queries constructed by another developer don't include (NOLOCK) hints on complex joins, there will almost certainly be trouble. With the right query (i.e. the wrong one) you can bring the whole database to a grinding halt.

If the reports allow users to pull large amounts of data, you might want to look at that too. You may need to guard against letting a user run such a query; make such reports available by request only. IMHO
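
For context, here is what the (NOLOCK) hint mentioned above looks like in practice, on a made-up pair of tables. It is equivalent to reading at READ UNCOMMITTED isolation: the report stops taking shared locks that block writers, at the cost of possible dirty reads.

```sql
-- Hypothetical reporting query with NOLOCK hints on every table.
-- Alternatively, SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
-- achieves the same effect for the whole session.
SELECT c.CustomerName,
       SUM(o.Total) AS YearToDate
FROM Orders o WITH (NOLOCK)
JOIN Customers c WITH (NOLOCK) ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '20090101'
GROUP BY c.CustomerName;
```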

Palakwai
A: 

Two databases can make sense. Here's an example from my own experience.

Database 1 is used for collecting payment history on millions of equipment leases. The main purpose of this database is to collect the data from the various lenders and serve as INPUT for the calculation of credit scores. This database is huge, updated a lot, and never exposed to the web.

Database 2 is for reporting. Much smaller. Never updated. Has the OUTPUT of the credit score calculations. Accessible via the web. Includes many tables and indexes to support fuzzy searches by name, address, etc.

If you think of database 1 as getting many, many updates, it would be wasteful to keep updating the search-related indexes over and over. If you think of database 1 as huge and database 2 as small, it would be wasteful to ship excess data to the web-facing machine.

Corey Trager
+3  A: 

Application (OLTP) and reporting (DW) loads can be, and usually are, very different on applications of scale. OLTP transactions deal with a small number of records at a time, happen often, and can be selects, inserts, or updates. DW queries tend to process a larger number of records, happen less often, and should be read-only.

On smaller applications or a young application that does not have a history of data yet, performance will not be a problem. But as your application grows and gains popularity a separate database and eventually a separate server will be required to meet both the business needs of application performance and analytic reporting.

Here is an overview of the two types of workload.

OLTP queries are usually written by developers who have a vested interest in application performance and know exactly what business function they are trying to meet. The same query is performed many times a day, and problems are tuned out. Here are some examples of this type of workload:

  • record a sale.
  • verify password.
  • retrieve product detail.
  • update user profile.

DW queries may be auto-generated by query tools for ad hoc reporting, or may be written directly by an analyst or business user with little technical experience. Some may prefer to do a SELECT * into their tool of choice, like SAS or Mathematica. These types of queries, if not done with dirty reads, can wreak havoc on the performance of the OLTP application. Even a well-written query, such as a trend analysis or one that groups a large number of customers into percentiles, could require a full table scan by virtue of touching all the data. Here are the types of questions a DW may need to answer:

  • How many bicycles were sold today, this week, and last month?
  • What is the most popular product?
  • What time of day do products with high margins sell?
  • Give me a trended graph of pageviews for the year.
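
The first question above might come out as something like this sketch against a hypothetical star schema (fact and dimension names are invented). Note how it touches every qualifying row in the fact table rather than a handful of keyed records, which is exactly why it hurts an OLTP server:

```sql
-- "How many bicycles were sold today / this week / over the past month?"
-- SalesFact and ProductDim are hypothetical DW tables.
SELECT SUM(CASE WHEN f.SaleDate >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0)
                THEN f.Quantity ELSE 0 END) AS SoldToday,
       SUM(CASE WHEN f.SaleDate >= DATEADD(DAY, -7, GETDATE())
                THEN f.Quantity ELSE 0 END) AS SoldThisWeek,
       SUM(f.Quantity)                      AS SoldPastMonth
FROM SalesFact f
JOIN ProductDim p ON p.ProductKey = f.ProductKey
WHERE p.Category = 'Bicycles'
  AND f.SaleDate >= DATEADD(MONTH, -1, GETDATE());
```
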
jms
A: 

It may be the best solution, depending on the savvy of the users doing the reporting and the tools they are using; if they have to manually join 8 tables just to get an ad hoc customer report, they're better off using a report server with views that do all the dirty work for them.

SqlACID
A: 

Create some read-only views for the report writers if they don't understand the data structure. Run resource-intensive reports when the transaction burden is low. Have a development database to avoid interfering with production.
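
A minimal sketch of that approach, with hypothetical table names and a hypothetical `ReportUsers` role: the view hides the join, and the report writers get SELECT on the view only, not on the base tables.

```sql
-- Read-only view that does the join for the report writers.
CREATE VIEW dbo.vCustomerOrders
AS
SELECT c.CustomerName,
       o.OrderDate,
       o.Total
FROM dbo.Orders o
JOIN dbo.Customers c ON c.CustomerID = o.CustomerID;
GO
-- Grant access to the view only; no rights on the base tables.
GRANT SELECT ON dbo.vCustomerOrders TO ReportUsers;
```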

A situation will always come up where things get out of sync and you spend a lot of time finding the problem.

Highly theoretical, but my relational database professor would say that the only acceptable duplication of data is a primary/foreign key relationship or a copy for backup/testing purposes. I'd like to hear what he thinks of data warehousing.

Jeff O