ansaurus

Question

Performance issue with SQL Server stored procedure

Answer 1

+3 A:

I'd try to change the application to only call this one time per ID, but if that is not possible, try this (make sure that there is an index on similarity.id1 and another index on similarity.id2):

ALTER PROCEDURE [dbo].[readerSimilarity]
-- Add the parameters for the stored procedure here
@id int,
@type int
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- Insert statements for procedure here
    IF @type=1 --by Article
    BEGIN
        SELECT
            id1, id2,similarity_byArticle
            FROM similarity
            WHERE id1 = @id AND similarity_byArticle!=0
        UNION
        SELECT
            id1, id2,similarity_byArticle
            FROM similarity
            WHERE id2 = @id AND similarity_byArticle!=0

    END
    ELSE IF @type=2 --by Parent
    BEGIN
        SELECT
            id1, id2,similarity_byParent
            FROM similarity
            WHERE id1 = @id AND similarity_byParent!=0
        UNION
        SELECT
            id1, id2,similarity_byParent
            FROM similarity
            WHERE id2 = @id AND similarity_byParent!=0

    END

    ELSE IF @type=3 --by Child
    BEGIN
        SELECT
            id1, id2,similarity_byChild
            FROM similarity
            WHERE id1 = @id AND similarity_byChild!=0
        UNION
        SELECT
            id1, id2,similarity_byChild
            FROM similarity
            WHERE id2 = @id AND similarity_byChild!=0

    END
    ELSE IF @type=4 --combined
    BEGIN
        SELECT
            id1, id2,similarity_combined
            FROM similarity
            WHERE id1 = @id AND similarity_combined!=0
        UNION
        SELECT
            id1, id2,similarity_combined
            FROM similarity
            WHERE id2 = @id AND similarity_combined!=0

    END

END

GO
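
The two indexes mentioned at the top could be created roughly as follows. This is only a sketch: the index names are made up, and the table and column names are taken from the procedure above.

```sql
-- Hypothetical index names; similarity, id1 and id2 come from the procedure above.
-- One index per column lets each half of the UNION seek on its own column.
CREATE NONCLUSTERED INDEX IX_similarity_id1 ON similarity (id1);
CREATE NONCLUSTERED INDEX IX_similarity_id2 ON similarity (id2);
```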

EDIT based on OP's latest comment:

The whole graph is stored in the MSSQL-Database and I load it successively with the procedure into some Dictionary structures

You need to redesign your load process. You should call the database just once to load all of this data. Since the IDs are already in a database table, you can use a join in this query to get the proper IDs from the other table. Edit your question to include the schema of the tables that contain the IDs to graph, and how they relate to the code already posted. Once you have a single query that returns all the data, it will be much faster than 17,000 separate calls, each for a single row.
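
A minimal sketch of such a single query, assuming the IDs sit in a table called graphIds with a unique column id (both names are hypothetical, since the real schema was not posted):

```sql
-- graphIds(id) is a hypothetical stand-in for the table that holds the IDs.
-- UNION removes an edge that matches on both id1 and id2.
SELECT s.id1, s.id2, s.similarity_byArticle
FROM similarity AS s
INNER JOIN graphIds AS g ON g.id = s.id1
WHERE s.similarity_byArticle <> 0
UNION
SELECT s.id1, s.id2, s.similarity_byArticle
FROM similarity AS s
INNER JOIN graphIds AS g ON g.id = s.id2
WHERE s.similarity_byArticle <> 0;
```

This keeps the same UNION shape as the procedure above, so each half can still use its own index.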

KM 2009-09-18 13:12:27

Are you calling this procedure in a loop? If so, post the details; there are ways to return one result set for all items in your loop and save massive amounts of time.

KM 2009-09-18 13:33:19

The problem with this is that I don't know which IDs I need. The clustering algorithm calculates which IDs of the graph should belong to the cluster. I could try to load all IDs within a given distance, but this would load a *lot* of data I do not need, and I have no assurance that the given distance is enough to load all the IDs that will finally belong to the cluster. Loading the whole graph would exceed the machine's memory (28.5 million edges). I redesigned the algorithm. Now it is a bit more blurry, but I only need to load the edges of an ID when actually adding it to the cluster.

Aaginor 2009-09-21 07:58:05

How complex is the "clustering algorithm"? Is it also looping over data one row at a time? Post the table definitions and the clustering algorithm; perhaps someone here can figure it out.

KM 2009-09-21 12:31:50

I optimized the code and, where possible, returned a set of results with one request via the method Charles Bretana offered. Now I have an acceptable runtime :)

Aaginor 2009-09-22 11:31:30

Answer 2

+4 A:

If it runs that quickly, your problem is probably in the sheer number of repeated calls to the procedure. Is there a way that you could modify the stored procedure and code to return all the results the app needs in a single call?

Optimizing a query that runs in less than 2ms is probably not a fruitful effort. I doubt you will be able to shave more than fractions of a millisecond with query tweaks.

JohnFx 2009-09-18 13:14:16

I think you're right. I have tweaked the SP with some of the tips here (using UNION instead of OR), but this only saved a small amount of time.

Aaginor 2009-09-18 14:19:28

Even the fastest stored proc called 17K times is going to be a poor performer; the overhead of making the calls alone probably exceeds the time taken by the actual SQL code.
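
As a rough back-of-the-envelope check, using only the figures quoted in this thread (about 17,000 calls, under 2 ms per call) and treating the per-call overhead as an unknown, the total scales with the number of calls:

```latex
T_{\text{total}} \approx N \cdot \left(t_{\text{query}} + t_{\text{overhead}}\right),
\qquad
17{,}000 \times 2\,\text{ms} = 34{,}000\,\text{ms} = 34\,\text{s}
```

So even at 2 ms all-in per call, the loop costs on the order of half a minute, and shaving fractions of a millisecond off the query barely moves that number. Cutting N does.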

JohnFx 2009-09-18 15:00:45

Answer 3

A:

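First create a view. A view cannot take parameters, so it only filters out the zero similarities; the filtering by ID happens in the function below.

CREATE VIEW ViewArticles
AS
SELECT id1, id2, similarity_byArticle
FROM similarity
WHERE similarity_byArticle != 0

In your code, populate a table with all the needed IDs.

Create a function that takes the table of IDs as a parameter. A table parameter needs a user-defined table type and must be declared READONLY (SQL Server 2008 or later); IdList is just a placeholder name.

CREATE TYPE IdList AS TABLE (Id int NOT NULL PRIMARY KEY)
GO

CREATE FUNCTION
  SelectArticles
(
  @Ids IdList READONLY
)
RETURNS TABLE
AS
RETURN
(
     SELECT id1, id2, similarity_byArticle FROM ViewArticles
     INNER JOIN @Ids I ON I.Id = id1
     UNION
     SELECT id1, id2, similarity_byArticle FROM ViewArticles
     INNER JOIN @Ids I ON I.Id = id2
)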

Greco 2009-09-18 13:17:25

Answer 4

+1 A:

Pass all the IDs into the stored proc at once, using a delimited list (use a comma, a slash, or whatever; I use a pipe character [ | ]). Add the user-defined function (UDF) listed below to your database. It will convert a delimited list into a table which you can join to your similarity table. Then in your actual stored proc, you can write...

Create Procedure GetSimilarityIDs
@IdValues Text -- @IdValues is pipe-delimited [|] list of Id Values
As
Set NoCount On
Declare @IDs Table 
   (rowNum Integer Primary Key Identity Not Null,
    Id Integer Not Null)
Insert Into @IDs(Id)
Select Cast(sVal As Integer)
From dbo.ParseTextString(@IdValues, '|') -- specify delimiter
-- ---------------------------------------------------------

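-- similarity has no Id column, so match the list against id1 and id2
Select s.id1, s.id2, s.similarity_byArticle
From similarity s Join @IDs i On i.Id = s.id1
Where s.similarity_byArticle <> 0
Union
Select s.id1, s.id2, s.similarity_byArticle
From similarity s Join @IDs i On i.Id = s.id2
Where s.similarity_byArticle <> 0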
Return 0

-- *******************************************************

The code below creates the generic UDF, which can parse any text string into a table of string values:

Create FUNCTION [dbo].[ParseTextString] (@S Text, @delim VarChar(5))
Returns @tOut Table 
    (ValNum Integer Identity Primary Key, 
     sVal VarChar(8000))
As
Begin 
Declare @dLLen TinyInt       -- Length of delimiter
Declare @sWin  VarChar(8000) -- Will Contain Window into text string
Declare @wLen  Integer       -- Length of Window
Declare @wLast TinyInt     -- Boolean to indicate processing Last Window
Declare @wPos  Integer     -- Start Position of Window within Text String
Declare @sVal  VarChar(8000) -- String Data to insert into output Table
Declare @BtchSiz Integer     -- Maximum Size of Window
    Set @BtchSiz = 7900      -- (Reset to smaller values to test routine)
Declare @dPos Integer        -- Position within Window of next Delimiter
Declare @Strt Integer        -- Start Position of each data value within Window
-- -------------------------------------------------------------------------
If @delim is Null Set @delim = '|'
If DataLength(@S) = 0 Or
      Substring(@S, 1, @BtchSiz) = @delim Return
-- ---------------------------
Select @dLLen = Len(@delim),
       @Strt = 1, @wPos = 1,
       @sWin = Substring(@S, 1, @BtchSiz)
Select @wLen = Len(@sWin),
       @wLast = Case When Len(@sWin) = @BtchSiz
           Then 0 Else 1 End,
       @dPos = CharIndex(@delim, @sWin, @Strt)
-- ------------------------------------
  While @Strt <= @wLen
  Begin
      If @dPos = 0 -- No More delimiters in window
      Begin                      
          If @wLast = 1 Set @dPos = @wLen + 1 
          Else 
          Begin
              Set @wPos = @wPos + @Strt - 1
              Set @sWin = Substring(@S, @wPos, @BtchSiz)
              -- ----------------------------------------
              Select @wLen = Len(@sWin), @Strt = 1,
                     @wLast = Case When Len(@sWin) = @BtchSiz
                              Then 0 Else 1 End,
                     @dPos = CharIndex(@delim, @sWin, 1)
              If @dPos = 0 Set @dPos = @wLen + 1 
          End
      End
      -- -------------------------------
      Set @sVal = LTrim(Substring(@sWin, @Strt, @dPos - @Strt))
      Insert @tOut (sVal) Values (@sVal)
      -- -------------------------------
      -- Move @Strt to char after last delimiter
      Set @Strt = @dPos + @dLLen 
      Set @dPos = CharIndex(@delim, @sWin, @Strt)
   End
   Return
End
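
A quick way to check the function on its own before wiring it into the procedure, assuming it was created as dbo.ParseTextString exactly as listed above (the sample IDs are made up):

```sql
-- Split a pipe-delimited list and cast each piece to an integer.
-- The literal list is sample data, not taken from the original question.
SELECT ValNum, CAST(sVal AS int) AS Id
FROM dbo.ParseTextString('17|42|99', '|');
```

This should return one row per value in the list, numbered by ValNum in the order the values appear.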

Charles Bretana 2009-09-18 13:55:56

Thanks for this code! I was able to shorten the execution time for fetching data by ~30% ... but was hit hard by separating the results for every ID (it raised the execution time from 30 s to 300 s *g*). But I guess I can optimize this; I will have a look into it later.
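
One possible way to make the per-ID separation cheaper is to have the query return the requested ID as an extra column, so the client can group rows in a single pass. This is only a sketch: the requestedId column and the sample IDs are assumptions, and @IDs stands in for the table variable filled inside GetSimilarityIDs.

```sql
-- Sample IDs are made up; @IDs mirrors the table variable in GetSimilarityIDs.
DECLARE @IDs TABLE (Id int NOT NULL PRIMARY KEY);
INSERT INTO @IDs (Id) SELECT 17 UNION ALL SELECT 42;

-- requestedId records which input ID produced each edge.
SELECT i.Id AS requestedId, s.id1, s.id2, s.similarity_byArticle
FROM similarity AS s
INNER JOIN @IDs AS i ON i.Id = s.id1
WHERE s.similarity_byArticle <> 0
UNION ALL
SELECT i.Id AS requestedId, s.id1, s.id2, s.similarity_byArticle
FROM similarity AS s
INNER JOIN @IDs AS i ON i.Id = s.id2
WHERE s.similarity_byArticle <> 0
  AND s.id1 <> s.id2   -- self-loops were already returned by the first branch
ORDER BY requestedId;
```

Because the rows arrive sorted by requestedId, the client can fill one dictionary entry per ID while reading the result set, without searching it again for every ID.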

Aaginor 2009-09-18 15:30:44
