When building a transactional system with a highly normalized DB, reporting-style queries, or even the queries that display data on a UI, can involve several joins. In a data-heavy scenario this can, and usually does, impact performance. Joins are expensive.

The guidance often espoused is that you should never run these queries off your transactional DB model; instead, you should use a denormalized, flattened model that is tailored for specific UI views or reports, which eliminates the need for many joins. Data duplication is not an issue in this scenario.

This concept makes perfect sense, but what I rarely see when experts make these statements is exactly HOW to implement it. For example (and quite frankly I'd appreciate an example using any platform), in a mid-sized system running on a SQL Server back-end you have a normalized transactional model. You also have some reports and a website that require queries. So, you create a "reporting" database that flattens out the normalized data. How do you keep this in sync? Transaction log shipping? If so, how do you transform the data to fit the reporting model?

A: 

Short answer: try working with indexed views. There are a number of limitations on the underlying tables, but you get synchronisation out-of-the-box.
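For illustration only (the table, view, and column names below are made up, and this assumes the aggregated columns are declared NOT NULL), an indexed view that pre-joins and pre-aggregates two normalized tables looks roughly like this; SQL Server then keeps the materialized result in sync with the base tables automatically:

-- Hypothetical normalized tables: dbo.Orders and dbo.OrderLines.
CREATE VIEW dbo.OrderTotalsByCustomer WITH SCHEMABINDING
AS
SELECT o.CustomerID,
  COUNT_BIG(*) AS LineCount,
  SUM(ol.Quantity * ol.UnitPrice) AS TotalAmount
FROM dbo.Orders AS o
JOIN dbo.OrderLines AS ol ON ol.OrderID = o.OrderID
GROUP BY o.CustomerID;
GO
-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_OrderTotalsByCustomer
  ON dbo.OrderTotalsByCustomer(CustomerID);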

Noel Abrahams
I've been told this before. I will investigate it further. Thanks.
Saraz
-1: you will introduce a lot of lock contention and deadlocks. Most likely you will have to drop them altogether.
AlexKuznetsov
Eh? What on earth are you talking about?! :-)
Noel Abrahams
@AlexKuznetsov, I still could not understand "when should I use an indexed view instead of a real table?" http://stackoverflow.com/questions/3861476/in-sql-server-when-should-i-use-an-indexed-view-instead-of-a-real-table
vgv8
@vgv8: I don't understand your question.
AlexKuznetsov
+3  A: 

In our shop, we set up a continuous transactional replication from the OLTP system to another DB server used for reporting. You wouldn't want to use log shipping for this purpose as it requires an exclusive lock on the database every time it restores a log, which would prevent your users from running reports.
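(The actual setup is normally done through the SSMS replication wizards; purely to sketch the idea, with made-up publication, server, and database names, and with the snapshot and log reader agent configuration omitted, the T-SQL side boils down to something like the following.)

-- On the publisher (the OLTP server); all names are placeholders.
EXEC sp_replicationdboption @dbname = N'OLTPDb', @optname = N'publish', @value = N'true';
EXEC sp_addpublication @publication = N'ReportingPub', @status = N'active';
-- Publish only the tables the reports need.
EXEC sp_addarticle @publication = N'ReportingPub',
  @article = N'Orders', @source_object = N'Orders';
-- Push changes continuously to the reporting server.
EXEC sp_addsubscription @publication = N'ReportingPub',
  @subscriber = N'REPORTSRV', @destination_db = N'ReportingDb',
  @subscription_type = N'push';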

With the optimizer in SQL Server today, I think the notion that the joins in a normalized database are "too expensive" for reporting is a bit outdated. Our design is fully in third normal form, with several million rows in our main tables, and we have no problems running any of our reports. Having said that, if push came to shove, you could look into creating some indexed views on your reporting server to help out.

Joe Stefanelli
But transactional replication doesn't allow you to transform the data, correct? That would have to happen after it made its way over to the reporting DB?
Saraz
I suppose you could look into customizing the stored procs that replication uses on the subscriber or doing something with triggers on the subscriber tables. But as I mentioned, I'd start simple and try running the reports off of your normalized schema. Don't over-complicate things by assuming performance problems before you have them.
Joe Stefanelli
@Saraz: you can have a denormalized reporting schema built as an *extension* to the normalized OLTP one, using indexing strategies like large (wide) covering indexes and especially indexed views (see the sketch below). These indexes are automatically maintained by replication, at the cost of slower inserts/updates. Often snapshot isolation is also deployed to protect reporting from lock contention with the replication agent applying the changes. Having two completely distinct schemas (so distinct that replication cannot work) is really hard to keep in sync.
Remus Rusanu
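(To illustrate the snapshot isolation part of the comment above, with a placeholder database name: enabling it on the reporting side is a one-time setting, after which reads work against row versions instead of blocking on the replication agent's locks.)

ALTER DATABASE ReportingDb SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE ReportingDb SET READ_COMMITTED_SNAPSHOT ON;  -- requires no other active connections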
A: 

We use transactional replication to another database.

We filter the data so that only the rows we need end up in our replication database.

We also select only the columns we want, so the tables are 'smaller'.

Then we combine the data in the replication database, either via views or by building triggers that add data from one table to another.
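For example (purely illustrative, with made-up table and column names), the combining step can be as simple as a view over the replicated tables in the reporting database:

CREATE VIEW dbo.CustomerOrderSummary
AS
SELECT c.CustomerID,
  c.CustomerName,
  o.OrderID,
  o.OrderDate,
  o.OrderTotal
FROM dbo.Customers AS c
JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID;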

Michel
A: 

Canned answer:

In all-too-many cases an indexed view may solve your short term performance goals but at some later time become counterproductive. So if you choose to use an indexed view, you may need an exit strategy. Let me describe a few common problems with indexed views.

Indexed views may increase lock contention.

It is very easy to demonstrate. Create the following table:

CREATE TABLE dbo.ChildTable(ChildID INT NOT NULL 
  CONSTRAINT PK_ChildTable PRIMARY KEY,
  ParentID INT NOT NULL,
  Amount INT NOT NULL);
GO   

From one tab in SSMS, run this script:

BEGIN TRAN;
INSERT INTO dbo.ChildTable(ChildID, ParentID, Amount)
  VALUES(1,1,1); 

From another tab, run a similar one:

BEGIN TRAN;
INSERT INTO dbo.ChildTable(ChildID, ParentID, Amount)
  VALUES(2,1,1);
ROLLBACK;

Note that both inserts complete; they do not block each other. Roll back in both tabs, then create an indexed view:

CREATE VIEW dbo.ChildTableTotals WITH SCHEMABINDING
AS
SELECT ParentID, 
  COUNT_BIG(*) AS ChildRowsPerParent, 
  SUM(Amount) AS SumAmount
FROM dbo.ChildTable
GROUP BY ParentID;
GO
CREATE UNIQUE CLUSTERED INDEX ChildTableTotals_CI 
  ON dbo.ChildTableTotals(ParentID);

Rerun the two inserts. Note that the second one does not complete; it is blocked. The reason is very simple: the first insert modifies the corresponding entry in the indexed view, so the insert acquires and holds a lock on it.

It is just as easy to demonstrate that when you create an indexed view, deadlocks may become more likely too.

Note: this is not a problem with the way indexed views are implemented. If you roll your own summary table and develop triggers that directly modify it to keep it up-to-date, you will encounter the same problem. Only by not maintaining your summary table all the time can you get around this locking problem, but a more detailed discussion of that is beyond the scope of this post.
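For illustration (this table and trigger are not part of the demo above; the names are made up, and the sketch handles inserts only), a hand-rolled equivalent of the indexed view would look roughly like this. The UPDATE inside the trigger takes and holds a lock on the summary row until the outer transaction ends, which is exactly the same contention:

CREATE TABLE dbo.ChildTableSummary(ParentID INT NOT NULL
  CONSTRAINT PK_ChildTableSummary PRIMARY KEY,
  ChildRowsPerParent BIGINT NOT NULL,
  SumAmount BIGINT NOT NULL);
GO
-- Sketch only: maintains the summary for inserts, not updates or deletes.
CREATE TRIGGER dbo.ChildTable_MaintainSummary ON dbo.ChildTable
AFTER INSERT
AS
BEGIN
  SET NOCOUNT ON;
  -- Adjust existing summary rows; this locks them until the transaction commits.
  UPDATE s
  SET s.ChildRowsPerParent = s.ChildRowsPerParent + i.cnt,
      s.SumAmount = s.SumAmount + i.total
  FROM dbo.ChildTableSummary AS s
  JOIN (SELECT ParentID, COUNT_BIG(*) AS cnt, SUM(Amount) AS total
        FROM inserted GROUP BY ParentID) AS i
    ON i.ParentID = s.ParentID;
  -- Add summary rows for parents not seen before.
  INSERT INTO dbo.ChildTableSummary(ParentID, ChildRowsPerParent, SumAmount)
  SELECT i.ParentID, COUNT_BIG(*), SUM(i.Amount)
  FROM inserted AS i
  WHERE NOT EXISTS (SELECT 1 FROM dbo.ChildTableSummary AS s
                    WHERE s.ParentID = i.ParentID)
  GROUP BY i.ParentID;
END;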

Edit: the example may look contrived to you, but the problem it demonstrates is very real and very common. Indexed views in OLTP environments are of limited use, because they seriously increase lock contention and cause many deadlocks. It is quite common that someone creates them in OLTP, but eventually drops, because they introduce more problems than they solve.

There are two common ways of demonstrating problems caused by concurrency: we either write loops and run them from multiple connections, or we explicitly begin transactions in two or more connections. I encourage everyone to come up with a simpler way to demonstrate this problem.

AlexKuznetsov
Is there any significance (hidden from me) in having both NOT NULL and a PRIMARY KEY constraint on the same field (in "CREATE TABLE dbo.ChildTable(ChildID INT NOT NULL CONSTRAINT PK_ChildTable PRIMARY KEY,")?
vgv8
@AlexKuznetsov, and as a result of this (rather contrived) example you want to completely rule out the benefits of indexed views? (Why do you want a begin tran on a simple insert?)
Noel Abrahams
@Noel Abrahams: the example may look contrived to you, but the problem it demonstrates is very real and very common. I encourage you to come up with a simpler way to demonstrate this problem.
AlexKuznetsov
A: 

Proper indexing, covering indexes, and restructuring your queries could probably do you a lot of good. However, if you're already doing that, then you could mirror your databases, replicate them, or create an ETL package and build an Analysis Services cube (or cubes).
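As a small illustration of the covering-index part (table and column names are made up), an index that INCLUDEs the columns a report reads lets that query be answered from the index alone, without touching the base table:

CREATE NONCLUSTERED INDEX IX_Orders_Customer_Covering
  ON dbo.Orders (CustomerID, OrderDate)
  INCLUDE (OrderTotal, Status);
-- A report query like this can then be served entirely from the index:
-- SELECT CustomerID, OrderDate, OrderTotal, Status
-- FROM dbo.Orders
-- WHERE CustomerID = @CustomerID AND OrderDate >= @FromDate;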

DForck42