views:

51

answers:

1

Hi Folks,

I have a serious performance problem.

I have a database with (related to this problem), 2 tables.

1 Table contains strings with some global information. The second table contains the string stripped down to each individual word. So the string is like indexed in the second table, word by word.

The validity of the data in the second table is of less important then the validity of the data in the first table.

Since the first table can grow like towards 1*10^6 records and the second table having an average of like 10 words for 1 string can grow like 1*10^7 records, i use a nolock in order to read the second this leaves me free for inserting new records without locking it (Expect many reads on both tables).

I have a script which keeps on adding and updating rows to the first table in a MERGE statement. On average, the data beeing merged are like 20 strings a time and the scripts runs like ones every 5 seconds.

On the first table, i have a trigger which is beeing invoked on a Insert or Update, which takes the newly inserted or updated data and calls a stored procedure on it which makes sure the data is indexed in the second table. (This takes some significant time).

The problem is that when having the trigger disbaled, Reading the first table happens in a few ms. However, when enabling the trigger and your in bad luck of trying to read the first table while this is beeing updated, Our webserver gives you a timeout after 10 seconds (which is way to long anyways).

I can quess from this part that when running the trigger, the first table is kept (partially) in a lock untill the trigger is completed.

What do you think, if i'm right, is there a easy way around this?

Thanks in advance!

Cheers, Koen

As requested:

-- =============================================
-- Author:      Koen Bekkenutte
-- Create date: 05 - 02 - 2010
-- Description: Triggers the indexation of the phrases of a feeditem
-- =============================================
ALTER TRIGGER [dbo].[OnFeedItemsChanged] 
   ON  [dbo].[FeedItems] 
   AFTER INSERT,UPDATE
AS 
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    DECLARE @id int;
    SELECT @id = ID FROM INSERTED;
    IF @id IS NOT NULL
    BEGIN
        DECLARE @title nvarchar(MAX);
        SELECT @title = Title FROM INSERTED;
        DECLARE @description nvarchar(MAX);
        SELECT @description = [Description] FROM INSERTED;

        SELECT @title = dbo.RemoveNonAlphaCharacters(@title)
        SELECT @description = dbo.RemoveNonAlphaCharacters(@description)

        -- Insert statements for trigger here
        EXEC dbo.usp_index_itemstring @id, @title;
        EXEC dbo.usp_index_itemstring @id, @description;
    END
END

The FeedItems table is populated by this query:

MERGE INTO FeedItems i
USING @newitems d ON i.Service = d.Service AND i.GUID = d.GUID
WHEN matched THEN UPDATE
    SET i.Title = d.Title,
        i.Description = d.Description,
        i.Uri = d.Uri,
        i.Readers = d.Readers
WHEN NOT matched THEN INSERT
    (Service, Title, Uri, GUID, Description, Readers)
    VALUES
    (d.Service, d.Title, d.Uri, d.GUID, d.Description, d.Readers);

The sproc: IndexItemStrings is populating the second table, executing this proc does indeed take his time. The problem is that while executing this trigger. Queries applied to the FeedItems table are mostly timing out (even those queries who dont uses the second table)

First table:

USE [ICI]
GO

/****** Object:  Table [dbo].[FeedItems]    Script Date: 04/09/2010 15:03:31 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[FeedItems](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Service] [int] NOT NULL,
    [Title] [nvarchar](max) NULL,
    [Uri] [nvarchar](max) NULL,
    [Description] [nvarchar](max) NULL,
    [GUID] [nvarchar](255) NULL,
    [Inserted] [smalldatetime] NOT NULL,
    [Readers] [int] NOT NULL,
 CONSTRAINT [PK_FeedItems] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[FeedItems]  WITH CHECK ADD  CONSTRAINT [FK_FeedItems_FeedServices] FOREIGN KEY([Service])
REFERENCES [dbo].[FeedServices] ([ID])
ON DELETE CASCADE
GO

ALTER TABLE [dbo].[FeedItems] CHECK CONSTRAINT [FK_FeedItems_FeedServices]
GO

ALTER TABLE [dbo].[FeedItems] ADD  CONSTRAINT [DF_FeedItems_Inserted]  DEFAULT (getdate()) FOR [Inserted]
GO

Second table:

USE [ICI]
GO

/****** Object:  Table [dbo].[FeedItemPhrases]    Script Date: 04/09/2010 15:04:47 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[FeedItemPhrases](
    [FeedItem] [int] NOT NULL,
    [Phrase] [int] NOT NULL,
    [Count] [smallint] NOT NULL,
 CONSTRAINT [PK_FeedItemPhrases] PRIMARY KEY CLUSTERED 
(
    [FeedItem] ASC,
    [Phrase] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[FeedItemPhrases]  WITH CHECK ADD  CONSTRAINT [FK_FeedItemPhrases_FeedItems] FOREIGN KEY([FeedItem])
REFERENCES [dbo].[FeedItems] ([ID])
ON UPDATE CASCADE
ON DELETE CASCADE
GO

ALTER TABLE [dbo].[FeedItemPhrases] CHECK CONSTRAINT [FK_FeedItemPhrases_FeedItems]
GO

ALTER TABLE [dbo].[FeedItemPhrases]  WITH CHECK ADD  CONSTRAINT [FK_FeedItemPhrases_Phrases] FOREIGN KEY([Phrase])
REFERENCES [dbo].[Phrases] ([ID])
ON UPDATE CASCADE
ON DELETE CASCADE
GO

ALTER TABLE [dbo].[FeedItemPhrases] CHECK CONSTRAINT [FK_FeedItemPhrases_Phrases]
GO

And more:

-- =============================================
-- Author:      Koen Bekkenutte
-- Create date: 05 - 02 - 2010
-- Description: Extracts and matches phrases from a feeditem string
-- =============================================
ALTER PROCEDURE [dbo].[usp_index_itemstring] 
    -- Add the parameters for the stored procedure here
    @item int, 
    @text nvarchar(MAX) 
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- DECLARE a table containing all words within the text
    DECLARE @tempPhrases TABLE 
    ( 
        [Index] int,
        [Phrase] NVARCHAR(256) 
    );

    -- extract each word from text and store it in the temp table
    WITH Pieces(pn, start, [stop]) AS 
    ( 
        SELECT 1, 1, CHARINDEX(' ', @text) 
        UNION ALL 
        SELECT pn + 1, CAST([stop] + 1 AS INT), CHARINDEX(' ', @text, [stop] + 1) 
        FROM Pieces 
        WHERE [stop] > 0 
    )   
    INSERT INTO @tempPhrases
    SELECT pn, SUBSTRING(@text, start, CASE WHEN [stop] > 0 THEN [stop]-start ELSE LEN(@text) END) AS s 
    FROM Pieces
    OPTION (MAXRECURSION 0);    

    WITH CombinedPhrases ([Phrase]) AS 
    (
        -- SELECT ALL 2-WORD COMBINATIONS
        SELECT w1.[Phrase] + ' ' + w2.[Phrase] 
        FROM @tempPhrases w1
        JOIN @tempPhrases w2 ON w1.[Index] + 1 = w2.[Index]
        UNION ALL -- SELECT ALL 3-WORD COMBINATIONS
        SELECT w1.[Phrase] + ' ' + w2.[Phrase] + ' ' + w3.[Phrase]
        FROM @tempPhrases w1
        JOIN @tempPhrases w2 ON w1.[Index] + 1 = w2.[Index]
        JOIN @tempPhrases w3 ON w1.[Index] + 2 = w3.[Index]
        UNION ALL  -- SELECT ALL 4-WORD COMBINATIONS
        SELECT w1.[Phrase] + ' ' + w2.[Phrase] + ' ' + w3.[Phrase] + ' ' + w4.[Phrase]
        FROM @tempPhrases w1
        JOIN @tempPhrases w2 ON w1.[Index] + 1 = w2.[Index]
        JOIN @tempPhrases w3 ON w1.[Index] + 2 = w3.[Index]
        JOIN @tempPhrases w4 ON w1.[Index] + 3 = w4.[Index]
    )

    -- ONLY INSERT THE NEW PHRASES IN THE Phrase TABLE      
    INSERT INTO @tempPhrases
    SELECT 0, [Phrase] FROM CombinedPhrases

    -- DELETE PHRASES WHICH ARE EXCLUDED
    DELETE FROM @tempPhrases
    WHERE [Phrase] IN
    (
        SELECT [Text] FROM Phrases p
        JOIN ExcludedPhrases ex
        ON ex.ID = p.ID
    );

    MERGE INTO Phrases p
    USING 
    (
        SELECT DISTINCT Phrase FROM @tempPhrases
    ) t
    ON p.[Text] = t.Phrase
    WHEN NOT MATCHED THEN
        INSERT VALUES (t.Phrase);


    -- Finally create relations between the phrases and feeditem,   
    MERGE INTO FeedItemPhrases p
    USING 
    (
        SELECT @item as [Item], MIN(p.[ID]) as Phrase, COUNT(t.[Phrase]) as [Count]
        FROM Phrases p WITH (NOLOCK)
        JOIN @tempPhrases t ON p.[Text] = t.[Phrase]
        GROUP BY t.[Phrase]
    ) t
    ON p.FeedItem = t.Item
    AND p.Phrase = t.Phrase
    WHEN MATCHED THEN
        UPDATE SET p.[Count] = t.[Count]
    WHEN NOT MATCHED THEN
        INSERT VALUES (t.[Item], t.Phrase, t.[Count]);
END

and more:

ALTER Function [dbo].[RemoveNonAlphaCharacters](@Temp NVarChar(max)) 
Returns NVarChar(max) 
AS 
Begin 
    SELECT @Temp = REPLACE (@Temp, '%20', ' ');

    While PatIndex('%[^a-z ]%', @Temp) > 0 
        Set @Temp = Stuff(@Temp, PatIndex('%[^a-z ]%', @Temp), 1, '') 
    Return @TEmp 
End 
A: 

I looked around on the internet, and I couldn't find any way of making the trigger happen without claiming a lock. Therefore I choose to do the inserts via a stored procedure, which in turn performs the logic previously found in the trigger. This allowed me to execute the content of the trigger in a transaction AFTER the actual data was inserted and the insertion lock was lifted.

Hope this helps!

Polity