I have a MS SQL 2008 database which stores data for creating a weighted, undirected graph. The data is stored in tables with the following structure:
[id1] [int] NOT NULL,
[id2] [int] NOT NULL,
[weight] [float] NOT NULL
where [id1] and [id2] represents the two connected nodes and [weight] the weight of the edge that connects these nodes.
There are several different algorithms, that create the graph from some basic data. For each algorithm, I want to store the graph-data in a separate table. Those tables all have the same structure (as shown above) and use a specified prefix (similarityALB, similaritybyArticle, similaritybyCategory, ...) so I can identify them as graph-tables.
The client program can select, which table (i.e. by which algorithm the graph is created) to use for the further operations.
Access to the data is done by stored procedures. As I have different tables, I would need to use a variable tablename e.g.:
SELECT id1, id2, weight FROM @tableName
This doesn't work because SQL doesn't support variable tablenames in the statement. I have searched the web and all solutions to this problem use the dynamic SQL EXEC() statement e.g.:
EXEC('SELECT id1, id2, weight FROM ' + @tableName)
As most of them mentioned, this makes the statement prone to SQL-injection, which I'd like to avoid. A simple redesign idea would be to put all the different graphs in one table and add a column to identify the different graphs.
[graphId] [int] NOT NULL,
[id1] [int] NOT NULL,
[id2] [int] NOT NULL,
[weight] [float] NOT NULL
My problem with this solution is, that the graphs can be very large depending on the used algorithm (up to 500 Million entries). I need to index the table over (id1, id2) and (id2, id1). Now putting them all in one big table would makes the table even huger (and requests slower). Adding a new graph would result in bad performance, because of the active indicees. Deleting a graph could not be done by TRUNCATE anymore, I would need to use
DELETE * FROM myTable WHERE graphId=@Id
which performs very bad with large tables and creates a very large logfile (which would exceed my disk space when the graph is big enough). So I'd like to keep the independent tables for each graph.
Any suggestions how to solve this problems by either find a way to parametrize the tablename or to redesign the database structure while avoiding the aforementioned problems?