views:

65

answers:

4

I have a table with a xml column named Data:

CREATE TABLE [dbo].[Users](
    [UserId] [int] IDENTITY(1,1) NOT NULL,
    [FirstName] [nvarchar](max) NOT NULL,
    [LastName] [nvarchar](max) NOT NULL,
    [Email] [nvarchar](250) NOT NULL,
    [Password] [nvarchar](max) NULL,
    [UserName] [nvarchar](250) NOT NULL,
    [LanguageId] [int] NOT NULL,
    [Data] [xml] NULL,
    [IsDeleted] [bit] NOT NULL,...

In the Data column there's this xml

<data>
  <RRN>...</RRN>
  <DateOfBirth>...</DateOfBirth>
  <Gender>...</Gender>
</data>

Now, executing this query:

SELECT UserId FROM Users 
WHERE data.value('(/data/RRN)[1]', 'nvarchar(max)') = @RRN

after clearing the cache takes (if I execute it a couple of times after each other) 910, 739, 630, 635, ... ms.

Now, a db specialist told me that adding a function, a view and changing the query would make it much more faster to search a user with a given RRN. But, instead, these are the results when I execute with the changes from the db specialist: 2584, 2342, 2322, 2383, ...

This is the added function:

CREATE FUNCTION dbo.fn_Users_RRN(@data xml)
RETURNS nvarchar(100)
WITH SCHEMABINDING
AS
BEGIN
      RETURN @data.value('(/data/RRN)[1]', 'varchar(max)');
END;

The added view:

CREATE VIEW vwi_Users
WITH SCHEMABINDING
AS
SELECT UserId, dbo.fn_Users_RRN(Data) AS RRN from dbo.Users

Indexes:

CREATE UNIQUE CLUSTERED INDEX cx_vwi_Users ON vwi_Users(UserId)
CREATE NONCLUSTERED INDEX cx_vwi_Users__RRN ON vwi_Users(RRN)

And then the changed query:

SELECT UserId FROM Users 
WHERE dbo.fn_Users_RRN(Data) = @RRN

Why is the solution with a function and a view going slower?

+1  A: 

Check the query execution plan and confirm whether or not the new query is even using the view. If the query doesn't use the view, that's the problem.


How does this query fair?

SELECT UserId FROM vwi_Users
WHERE RRN = '59021626919-61861855-S_FA1E11'

I see you're freely mixing nvarchar and varchar. Don't do that! It can cause full index conversions (eeeeevil).

David B
Thx David, no, varchar is a typo, sorry, I'll edit it.
Lieven Cardoen
And this query doesn't make it faster. It stays the same.
Lieven Cardoen
How about the query execution plan (in sql studio, click "Display Estimated Execution Plan")
David B
"stays the same" is good. It probably means that whatever is messing up one query is messing up the other. The query execution plan will confirm this.
David B
+2  A: 

the point of the view was to pre-compute the XML value into a regular column. To then use that precomputed value in the index on the view, shouldn't you actually query the view?

SELECT
    UserId
    FROM vwi_Users
    WHERE RRN= '59021626919-61861855-S_FA1E11'

also, make the index this:

CREATE NONCLUSTERED INDEX cx_vwi_Users__RRN ON vwi_Users(RRN) INCLUDE (UserId)

it is called a covering index, since all columns needed in the query are in the index.

KM
Even this doesn't make it faster.
Lieven Cardoen
the only way to tell why is to look at the execution plan. So run `set showplan_xml on` and then run the first query and post the output here. Then run my query and post that output here as well.
KM
@Lieven Cardoen, since you said in another comment that `no, varchar is a typo, sorry, I'll edit it` make sure you run the query in my answer with `= N'59021626919-61861855-S_FA1E11'`
KM
+2  A: 

Have you tried to add that function result to your table (not a view) as a persisted, computed column??

ALTER TABLE dbo.Users
   ADD dbo.fn_Users_RRN(Data) PERSISTED

Doing so will extract that piece of information from the XML, store it in a computed, always up-to-date column, and the persisted flag makes it physically stored along side the other columns in your table.

If this works (the PERSISTED flag is a bit iffy in terms of all the limitations it has), then you should see nearly the same performance as querying any other string field on your table... and if the computed column is PERSISTED, you can even put an index on it if you feel the need for that.

marc_s
I think this would be the best solution. In our next release, we have added a column RRN, which makes it obviously a lot faster. In the current release I'll try your solution. But that still leaves me with the question why the query with function and view goes a lot slower than without...
Lieven Cardoen
@Lieven: did you analyze the query execution plans for both cases? Maybe those can give you a pointer in the right direction
marc_s
A: 

Scalar functions tend to perform very poorly in SQL Server. I'm not sure why if you make it a persisted computed column and index it, it doesn't have identical performance to a normal indexed-column, but it may be due to the UDF being called even though you think it's no longer needed to be called once the data is computed.

I think you know this from another answer, but your final query is wrongly calling the scalar UDF on every row (defeating the point of persisting the computation):

SELECT UserId FROM Users  
WHERE dbo.fn_Users_RRN(Data) = @RRN 

It should be

SELECT UserId FROM vwi_Users  
WHERE RNN = @RRN 
Cade Roux