views:

149

answers:

1

Excuse the long post, but below I have included a full script to generate and populate my test harness.

My test harness has the following tables

|--------|  |-------------|  |-----|  |--------------|
|Column  |  |ColumnValue  |  |Row  |  |RowColumnValue|
|--------|  |-------------|  |-----|  |--------------|
|ColumnId|  |ColumnValueId|  |RowId|  |RowId         |
|Name    |  |ColumnId     |  |Name |  |ColumnValueId |
|--------|  |Value        |  |-----|  |--------------|
            |-------------|

They represent Rows and Columns in a table. The possible values of a cell in a column are stored in ColumnValue. The selected values for a Row are stored in RowColumnValue. (I hope that is clear)

I have populated the data with 10 Columns, 10,000 Rows, 50 Column Values per column (500) and 25 selected column values per Row (250,000).

I have some dynamic sql which returns all the rows, pivotted with the columns and contains an XML list of the selected column values for each column.

Note: For performance testing purposes I have wrapped the query in a SELECT COUNT(*) so that the query will not return a large amount of data over the network.

My test harness runs this query (with the count) in approx 5-6 seconds. The execution plan shows that 92% of the query is spent on a Clustered Index Seek on [ColumnValue].[PK_ColumnValue]. The Client Statistics show a Client processing time, Total execution time and Wait time on server replies all at 0.

I realise that 250k rows in the RowColumnValue table is quite a lot and I might be expecting too much from SQL Server. However, my expectation is that query should be able to run a lot faster than this. Or at the very least the execution plan should be presenting a different bottle neck rather than the Clustered Index Seek.

Can anyone shed any light on the problem or give me some suggestions on how to make this more efficient?

Dynamic SQL that runs the pivot to display the table:

DECLARE @columnDataList NVARCHAR(MAX)
SELECT
    @columnDataList = 
    CAST
    (
        (
            SELECT
                ', CONVERT(xml, [PVT].[' + [Column].[Name] + ']) [Column.' + [Column].[Name] + ']'
            FROM
                [Column]
            ORDER BY
                [Column].[Name]
            FOR XML PATH('')
        ) AS XML
    ).value('.', 'NVARCHAR(MAX)')

DECLARE @columnPivotList NVARCHAR(MAX)
SELECT
    @columnPivotList = 
    CAST
    (
        (
            SELECT
                ', [' + [Column].[Name] + ']'
            FROM
                [Column]
            ORDER BY
                [Column].[Name]
            FOR XML PATH('')
        ) AS XML
    ).value('.', 'NVARCHAR(MAX)')

EXEC('
    SELECT
        COUNT(*)
    FROM
    (
        SELECT
            [PVT].[RowId]
            ' + @columnDataList + '
        FROM
        (
            SELECT
                [Row].[RowId],
                [Column].[Name] [ColumnName],
                [XmlRowColumnValues].[XmlRowColumnValues] [XmlRowColumnValues]
            FROM
                [Row]
            CROSS JOIN
                [Column]
            CROSS APPLY
            (
                SELECT
                    [ColumnValue].[Value] [Value]
                FROM
                    [RowColumnValue]
                INNER JOIN
                    [ColumnValue]
                ON
                    [ColumnValue].[ColumnValueId] = [RowColumnValue].[ColumnValueId]
                WHERE
                    [RowColumnValue].[RowId] = [Row].[RowId]
                AND
                    [ColumnValue].[ColumnId] = [Column].[ColumnId]
                FOR XML PATH (''''), ROOT(''Values'')
            ) [XmlRowColumnValues] ([XmlRowColumnValues])
        ) [PivotData]
        PIVOT
        (
            MAX([PivotData].[XmlRowColumnValues])
        FOR
            [ColumnName]
            IN
            ([0]' + @columnPivotList + ')
        ) PVT
    ) RowColumnData
')

Script to generate and populate database:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Row](
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Row] PRIMARY KEY CLUSTERED 
(
    [RowId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Column](
    [ColumnId] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Column] PRIMARY KEY CLUSTERED 
(
    [ColumnId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[RowColumnValue](
    [RowId] [int] NOT NULL,
    [ColumnValueId] [int] NOT NULL,
 CONSTRAINT [PK_RowColumnValue] PRIMARY KEY CLUSTERED 
(
    [RowId] ASC,
    [ColumnValueId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[ColumnValue](
    [ColumnValueId] [int] IDENTITY(1,1) NOT NULL,
    [ColumnId] [int] NOT NULL,
    [Value] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_ColumnValue] PRIMARY KEY CLUSTERED 
(
    [ColumnValueId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [FKIX_ColumnValue_ColumnId] ON [dbo].[ColumnValue] 
(
    [ColumnId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ColumnValue]  WITH CHECK ADD  CONSTRAINT [FK_ColumnValue_Column] FOREIGN KEY([ColumnId])
REFERENCES [dbo].[Column] ([ColumnId])
GO
ALTER TABLE [dbo].[ColumnValue] CHECK CONSTRAINT [FK_ColumnValue_Column]
GO
ALTER TABLE [dbo].[RowColumnValue]  WITH CHECK ADD  CONSTRAINT [FK_RowColumnValue_ColumnValue] FOREIGN KEY([ColumnValueId])
REFERENCES [dbo].[ColumnValue] ([ColumnValueId])
GO
ALTER TABLE [dbo].[RowColumnValue] CHECK CONSTRAINT [FK_RowColumnValue_ColumnValue]
GO
ALTER TABLE [dbo].[RowColumnValue]  WITH CHECK ADD  CONSTRAINT [FK_RowColumnValue_Row] FOREIGN KEY([RowId])
REFERENCES [dbo].[Row] ([RowId])
GO
ALTER TABLE [dbo].[RowColumnValue] CHECK CONSTRAINT [FK_RowColumnValue_Row]
GO

DECLARE @columnLoop INT
DECLARE @columnValueLoop INT
DECLARE @rowLoop INT

DECLARE @columnId INT
DECLARE @columnValueId INT
DECLARE @rowId INT

SET @columnLoop = 0

WHILE @columnLoop < 10
BEGIN

    INSERT INTO [Column] ([Name]) VALUES(NEWID())

    SET @columnId = @@IDENTITY

    SET @columnValueLoop = 0

    WHILE @columnValueLoop < 50
    BEGIN

        INSERT INTO [ColumnValue] ([ColumnId], [Value]) VALUES(@columnId, NEWID())

        SET @columnValueLoop = @columnValueLoop + 1

    END

    SET @columnLoop = @columnLoop + 1

END

SET @rowLoop = 0

WHILE @rowLoop < 10000
BEGIN

    INSERT INTO [Row] ([Name]) VALUES(NEWID())

    SET @rowId = @@IDENTITY

    INSERT INTO [RowColumnValue] ([RowId], [ColumnValueId]) SELECT TOP 25 @rowId, [ColumnValueId] FROM [ColumnValue] ORDER BY NEWID()

    SET @rowLoop = @rowLoop + 1

END
+1  A: 

I agree with @marc_s and @KM that this grand design is doomed from the start.

Millions of Microsoft's developer-hours have gone into building and fine-tuning a robust and powerful database engine, but you're going to reinvent it all by cramming everything into a small number of generic tables and re-implementing everything that SQL Server is already engineered to do for you.

SQL Server already has tables containing names of entities, names of columns, and so on. The fact that you don't normally interact directly with these system tables is a good thing: It's called abstraction. And it's unlikely that you're going to do a better job of implementing that abstraction than SQL Server does.

At the end of the day, with your approach (a) even the simplest queries will be monstrous; and (b) you're never going to come close to optimal performance, because you're forgoing all of the query optimization that you'd otherwise get for free.

Without knowing anything more about your application or your requirements, it's hard to give any sort of specific advice. But I'd suggest that some good old normalization would go a long way. Any well-implemented, non-trivial database has a lot of tables; ten tables plus ten xtab tables shouldn't scare you away.

And don't be afraid of SQL code generation as a way of implementing common interfaces across disparate tables. A little can go a long way.

Herb Caudill
For starters, I am in NO WAY trying to build a database engine to store a table... The Row / Column table names were purely to make it easy to create a test harness and show how I wanted it to be returned for the query in question. The actual tables are a way of classifying data, a little like a tag cloud with the tags being the ColumnValue's with an additional "type of tag" which is the Column table. The item being tagged is the Row table. I will spend some time putting this actual concept into the question to see if anyone can come up with a better implementation.
Robin Day
To add to that, the "Column" table would not be "10x", this was just in this example. There are currently ~3000 entries in the "Column" table, each would need it's own list of valid entries "ColumnValue" varying from 2-3 valid entries up to hundreds. Each Client has anything from 2-3 specific types of tagging up to some that classify their tags 20 different ways.
Robin Day
@Robin - sorry then if I went off half-cocked. I look forward to seeing the details - please add a comment here so I'll see the revised post (or new question if you choose to do it that way).
Herb Caudill
Whilst its quite a while later, I have a new question which is of a simliary nature. I'm trying to get rid of the EAV design. http://stackoverflow.com/questions/3013831/db-design-pattern-many-to-many-classification-categorised-tagging
Robin Day