views:

156

answers:

3

I have a MS SQL 2008 database which stores data for creating a weighted, undirected graph. The data is stored in tables with the following structure:

[id1] [int] NOT NULL,
[id2] [int] NOT NULL,
[weight] [float] NOT NULL

where [id1] and [id2] represents the two connected nodes and [weight] the weight of the edge that connects these nodes.

There are several different algorithms, that create the graph from some basic data. For each algorithm, I want to store the graph-data in a separate table. Those tables all have the same structure (as shown above) and use a specified prefix (similarityALB, similaritybyArticle, similaritybyCategory, ...) so I can identify them as graph-tables.

The client program can select, which table (i.e. by which algorithm the graph is created) to use for the further operations.

Access to the data is done by stored procedures. As I have different tables, I would need to use a variable tablename e.g.:

SELECT id1, id2, weight FROM @tableName

This doesn't work because SQL doesn't support variable tablenames in the statement. I have searched the web and all solutions to this problem use the dynamic SQL EXEC() statement e.g.:

EXEC('SELECT id1, id2, weight FROM ' + @tableName)

As most of them mentioned, this makes the statement prone to SQL-injection, which I'd like to avoid. A simple redesign idea would be to put all the different graphs in one table and add a column to identify the different graphs.

[graphId] [int] NOT NULL,
[id1] [int] NOT NULL,
[id2] [int] NOT NULL,
[weight] [float] NOT NULL

My problem with this solution is, that the graphs can be very large depending on the used algorithm (up to 500 Million entries). I need to index the table over (id1, id2) and (id2, id1). Now putting them all in one big table would makes the table even huger (and requests slower). Adding a new graph would result in bad performance, because of the active indicees. Deleting a graph could not be done by TRUNCATE anymore, I would need to use

DELETE * FROM myTable WHERE graphId=@Id

which performs very bad with large tables and creates a very large logfile (which would exceed my disk space when the graph is big enough). So I'd like to keep the independent tables for each graph.

Any suggestions how to solve this problems by either find a way to parametrize the tablename or to redesign the database structure while avoiding the aforementioned problems?

+1  A: 

SQL injection can easily be avoided in this case by comparing @tableName to the names of the existing tables. If it isn't one of them, it's bad input. (Obligatory xkcd reference: That is, unless you have a table called "bobby'; drop table students;")

Anyway, regarding your performance problems, with partitioned tables (since SQLServer 2005), you can have the same advantages like having several tables, but without the need for dynamic SQL.

ammoQ
I like the table name check most, think I will go that way. I had a look at partitioned tables - one problem there would be that I can't use truncate to delete whole graphs (as IronGoofy mentioned). Additionally, I'll have a look at how to use sp_executesql, 'cause it supports parameter substitution and execution plans (for reuse).
Aaginor
In Oracle, you can truncate a single partition within a partitioned table (ALTER TABLE foo TRUNCATE PARTITION bar;). In SQL-Server, you have to move the data from the partition into a new table, which you can drop afterwards... weird...
ammoQ
A: 

Partioned Table may be the answer to your problem. I've got another idea, that's "the other way around":

  • each graph has it's own table (so you can still truncate table)
  • define a view (with the structured you mentioned for your redefined table) as a UNION ALL over all graph-tables

I have no idea of the performance of a select on this view and so on, but it may give you what you are looking for. I'd be interested in the results if try this out ..

IronGoofy
I'd guess that it UNIONS the tables first (and thus has worse performance), but I'll check this out.
Aaginor
I checked it out: The first query over the view took about the time it would need to query over all tables. Any repeated calls to this query took about the same time as it need to query over a single table. (I made the view with a UNION over 5 tables) So it's only the first call, but still this is not good, I think
Aaginor
+1  A: 

Maybe I did not understand everything, but:

CREATE PROCEDURE dbo.GetMyData (
     @TableName AS varchar(50)
    )
AS 
BEGIN
    IF @TableName = 'Table_1' 
        BEGIN
            SELECT  id1
                   ,id2
                   ,[weight]
            FROM    dbo.Table_1
        END

    IF @TableName = 'Table_2' 
        BEGIN
            SELECT  id1
                   ,id2
                   ,[weight]
            FROM    dbo.Table_2
        END
END

and then:

EXEC dbo.GetMyData @TableName = 'Table_1'

A different technique involves using synonyms dynamically, for example:

DECLARE @TableName varchar(50)
SET @TableName = 'Table_1' 

-- drop synonym if it exists
IF object_id('dbo.MyCurrentTable', 'SN') IS NOT NULL 
    DROP SYNONYM MyCurrentTable ;

-- create synonym for the current table
IF @TableName = 'Table_1' 
    CREATE SYNONYM dbo.MyCurrentTable FOR dbo.Table_1 ;

IF @TableName = 'Table_2' 
    CREATE SYNONYM dbo.MyCurrentTable FOR dbo.Table_2 ;

-- use synonym
SELECT  id1, id2, [weight] 
FROM dbo.MyCurrentTable
Damir Sudarevic
I already use the IF (@Table='mytable1') ... ELSE IF (...) construct in my stored procedures, but I wanted to get away from it. Whenever I add, rename or delete a new table, I need to update ALL my SPs. But this synonym-thing looks interesting! Is it possible to make a single procedure or function that creates the synonym, that is then called from my stored procs, so they can use the synonym? Then I would only need to update ONE proc/function when I add/update/delete a table!
Aaginor
Yes synonym is a db object, it is an alternative name for an another db object.
Damir Sudarevic
I have implemented the Synonym-Object-Setting-Stored-Procedure and it worked fine ... until I started to access the SPs concurrently. Looks like I need to define a special Synonym for each procedure-call.
Aaginor
Synonyms have database scope, two processes can not fight to redirect a synonym before one of them finishes. Synonym is a DB object, like table, view, index..
Damir Sudarevic