views:

364

answers:

4

UPDATE: This issue is note related to the XML, I duplicated the table using an nvarchar(MAX) instead and still same issue. I will repost a new topic.

I have a table with about a million records, the table has an XML field. The query is running extremely slow, even when selecting just an ID. Is there anything I can do to increase the speed of this, I have tried setting text in row on, but SQL server will not allow me to, I receive the error "Cannot switch to in row text in table".

I would appreciate any help in a fix or knowledge that I seem to be missing.

Thanks

TABLE

/****** Object:  Table [dbo].[Audit]    Script Date: 08/14/2009 09:49:01 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Audit](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [ParoleeID] [int] NOT NULL,
    [Page] [int] NOT NULL,
    [ObjectID] [int] NOT NULL,
    [Data] [xml] NOT NULL,
   [Created] [datetime] NULL,
CONSTRAINT [PK_Audit] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF,          ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

QUERY

DECLARE @ID int
SET @ID = NULL  
DECLARE @ParoleeID int
SET @ParoleeID = 158 
DECLARE @Page int
SET @Page = 2
DECLARE @ObjectID int
SET @ObjectID = 93 
DECLARE @Created datetime
SET @Created = NULL
SET NOCOUNT ON;
Select TOP 1 [Audit].* from [Audit]
where 
 (@ID IS NULL OR Audit.ID = @ID)  AND
 (@ParoleeID IS NULL OR Audit.ParoleeID = @ParoleeID)  AND
 (@Page IS NULL OR Audit.Page = @Page)  AND
 (@ObjectID IS NULL OR Audit.ObjectID = @ObjectID)  AND
 (@Created is null or(Audit.Created > @Created and Audit.Created < DATEADD (d, 1, @Created ))  )
A: 

Some information like the query you run, the table structure, the XML content etc would definitely help. A lot...

Without any info, I will guess. The query is running slow when selecting just an ID because you don't have in index on ID.

Updated

There are at least a few serious problems with your query.

  • Unless an ID is provided, the table can only be scanned end-to-end because there are no indexes
  • Even if an ID is provided, the condition (@ID is NULL OR ID = @ID) is no guaranteed to be SARGable so it may still result in a table scan.
  • And most importantly: the query will generate a plan 'optimized' for the first set of parameters it sees. It will reuse this plan on any combination of parameters, no matter which are NULL or not. That would make a difference if there would be some variations on the access path to choose from (ie. indexes) but as it is now, the query can only choose between using a scan or a seek, if @id is present. Due to the ways is constructed, it will pretty much always choose a scan because of the OR.

With this table design your query will run slow today, slower tomorrow, and impossibly slow next week as the size increases. You must look back at your requirements, decide which fields are impoortant to query on, index them and provide separate queryes for them. OR-ing together all possible filters like this is not going to work.

The XML you're trying to retrieve has absolutely nothing to do with the performance problem. You are simply brute forcing a table scan and expect SQL to magically find the records you want.

So if you want to retrieve a specific ParoleeID, Page and ObjectID, you index the fields you search on and run a run a query for those and only those:

CREATE INDEX idx_Audit_ParoleeID ON Audit(ParoleeID);
CREATE INDEX idx_Audit_Page ON Audit(Page);
CREATE INDEX idx_Audit_ObjectID ON Audit(ObjectID);
GO

DECLARE @ParoleeID int
SET @ParoleeID = 158    
DECLARE @Page int
SET @Page = 2
DECLARE @ObjectID int
SET @ObjectID = 93      

SET NOCOUNT ON;
Select TOP 1 [Audit].* from [Audit]
where Audit.ParoleeID = @ParoleeID
  AND Audit.Page = @Page
  AND Audit.ObjectID = @ObjectID;
Remus Rusanu
I have added some table/query info to my post. The query selects *, but even just selecting ID takes a long time. This particular query returns 44 records and it takes close to a minute to run. I did some testing and the query starts to bog down when I have about 700K rows, prior to that it ran fast.
Dustin Laine
And most importantly: the query will generate a plan 'optimized' for the first set of parameters it sees. It will reuse this plan on any combination of parameters, no matter which are NULL or not. --- This statement is wrong. SQL will optimise for ANY value unless you explicitly provide a hint if you know you have a better plan than the server does. Most of the time that is caused by not having good statistics to allow SQL to optimise your plan for the data you have.
Spence
See OPTIMIZE FOR ( @variable_name { UNKNOWN | = literal_constant } [ , ...n ] ) http://msdn.microsoft.com/en-us/library/ms181714.aspx
Spence
+1  A: 

Sql Server 2005 – Twelve Tips For Optimizing Query Performance by Tony Wright

  1. Turn on the execution plan, and statistics
  2. Use Clustered Indexes
  3. Use Indexed Views
  4. Use Covering Indexes
  5. Keep your clustered index small.
  6. Avoid cursors
  7. Archive old data
  8. Partition your data correctly
  9. Remove user-defined inline scalar functions
  10. Use APPLY
  11. Use computed columns
  12. Use the correct transaction isolation level

http://tonesdotnetblog.wordpress.com/2008/05/26/twelve-tips-for-optimising-sql-server-2005-queries/

RRUZ
+2  A: 

You need to create a primary XML index on the column. Above anything else having this will assist ALL your queries.

Once you have this, you can create indexing into the XML columns on the xml data.

From experience though, if you can store some information in the relation tables, SQL is much better at searching and indexing that than XML. Ie any key columns and commonly searched data should be stored relationally where possible.

Spence
Long time, but this was the answer. The index make the query in less than a second!
Dustin Laine
+1  A: 

I had the very same scenario - and the solution in our case is computed columns.

For those bits of information that you need frequently from your XML, we created a computed column on the "hosting" table, which basically reaches into the XML and pulls out the necessary value from the XML using XPath. In most cases, we're even able to persist this computed column, so that it becomes part of the table and can be queried and even indexed and query speed is absolutely no problem anymore (on those columns).

We also tried XML indices in the beginning, but their disadvantage is the fact that they're absolutely HUGE on disk - this may or may not be a problem. Since we needed to ship back and forth the whole database frequently (as a SQL backup), we eventually gave up on them.

OK, to setup a computed column to retrieve from information bits from your XML, you first need to create a stored function, which will take the XML as a parameter, extract whatever information you need, and then pass that back - something like this:

CREATE FUNCTION dbo.GetShopOrderID(@ShopOrder XML)
RETURNS VARCHAR(100)
AS BEGIN
  DECLARE @ShopOrderID VARCHAR(100)

  SELECT  
    @ShopOrderID = @ShopOrder.value('(ActivateOrderRequest/ActivateOrder/OrderHead/OrderNumber)[1]', 'varchar(100)')

  RETURN @ShopOrderID
END

Then, you'll need to add a computed column to your table and connect it to this stored function:

ALTER TABLE dbo.YourTable 
ADD ShopOrderID AS dbo.GetShipOrderID(ShopOrderXML) PERSISTED

Now, you can easily select data from your table using this new column, as if it were a normal column:

SELECT (fields) FROM dbo.YourTable
WHERE ShopOrderID LIKE 'OSA%'

Best of all - whenever you update your XML, all the computed columns are updated as well - they're always in sync, no triggers or other black magic needed!

Marc

marc_s
This is a good reply, but it does not fit my needs as I am not trying to query out a particular piece of XML data, just selecting. You can see my query above, which I just added thanks.
Dustin Laine
+1. Didn't know about the Persisted keyword :) That is actually a really neat way of fast querying down XML...
Spence