I used the ANTS profiler to identify the remaining bottleneck in my C# application: the SQL Server stored procedure. I am using SQL Server 2008. Can anybody here help me increase performance, or give me pointers as to what I can do to make it better or more performant?
First, here's the procedure:
PROCEDURE [dbo].[readerSimilarity]
-- Add the parameters for the stored procedure here
@id int,
@type int
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- Insert statements for procedure here
IF (@type=1) --by Article
SELECT id1, id2, similarity_byArticle FROM similarity WHERE (id1 = @id OR id2 = @id)
AND similarity_byArticle != 0
ELSE IF (@type=2) --by Parent
SELECT id1, id2, similarity_byParent FROM similarity WHERE (id1 = @id OR id2 = @id)
AND similarity_byParent != 0
ELSE IF (@type=3) --by Child
SELECT id1, id2, similarity_byChild FROM similarity WHERE (id1 = @id OR id2 = @id)
AND similarity_byChild != 0
ELSE IF (@type=4) --combined
SELECT id1, id2, similarity_combined FROM similarity WHERE (id1 = @id OR id2 = @id)
AND similarity_combined != 0
END
The table 'similarity
' consists of two id
s (id1
and id2
) and a number of columns that store double
values. The constraint is that id1 < id2
.
Column Data ----- ---- ID1 PK, Indexed ID2 PK, Indexed The table contains 28.5 million entries.
Stored Procedure Background
The job of the stored procedure is to get all the rows that have the parameter id
in either id1
or id2
. Additionally, the column specified by the type-parameter cannot be zero.
The stored procedure is called multiple times for different ids
. Although only taking ~1.6 ms
per call, it sums up, when calling it 17,000 times.
The processor is running at only 25%, which seems to be because the application is waiting for the procedure call to return.
Do you see any way to speed things up?
Calling the Stored Procedure C# Code Snippet
private HashSet<NodeClustering> AddNeighbourNodes(int id)
{
HashSet<NodeClustering> resultSet = new HashSet<NodeClustering>();
HashSet<nodeConnection> simSet = _graphDataLoader.LoadEdgesOfNode(id);
foreach (nodeConnection s in simSet)
{
int connectedId = s.id1;
if (connectedId == id)
connectedId = s.id2;
// if the corresponding node doesn't exist yet, add it to the graph
if (!_setNodes.ContainsKey(connectedId))
{
NodeClustering nodeToAdd = CreateNode(connectedId);
GraphAddOuter(nodeToAdd);
ChangeWeightIntoCluster(nodeToAdd.id, s.weight);
_bFlowOuter += s.weight;
resultSet.Add(nodeToAdd);
}
}
// the nodes in the result set have been added
to the outernodes -> add to the outernodes count
_setNodes[id].countEdges2Outside += resultSet.Count;
return resultSet;
}
C# Code Background Information
This method is called each time a new id
is added to the cluster. It gets all the connected nodes of that id
(they are connected, when there is an entry in the db with id1=id
or id2=id
) via
_graphDataLoader.LoadEdgesOfNode(id);
Then it checks all the connected ids
and if they are not loaded yet:
if (!_setNodes.ContainsKey(connectedId))
It Loads them:
CreateNode(connectedId);
The Method:
_graphDataLoader.LoadEdgesOfNode(id);
is called again, this time with the connectedId
.
I need this to get all the connections of the new nodes with those nodes that are already in the set.
I probably could collect the ids
of all nodes i need to add and call my stored procedure only once with a list of the ids.
Ideas
I could probably load the connected ids connection at once via something like
SELECT id1, id2, similarity_byArticle FROM similarity WHERE
(id1 = @id OR id2 = @id OR
id1 IN (SELECT id1 FROM similarity WHERE id2 = @id) OR
id2 IN (SELECT id1 FROM similarity WHERE id2 = @id) OR
id1 IN (SELECT id2 FROM similarity WHERE id1 = @id) OR
id2 IN (SELECT id2 FROM similarity WHERE id1 = @id))
AND similarity_byArticle != 0
but then I would get more entries than I'd need, because I would get them for already loaded nodes too (which from my tests would make up around 75% of the call).
Questions
- How can I speed up the Stored Procedure?
- Can I do it differently, is there a more performant way?
- Can I use a
List<int>
as a SP-Parameter? - Any other thoughts?