I have a table with a large number of rows (~200 million), and I want to process these values in C# after reading them into memory. The processing requires grouping entries by column values in a way that can't be done inside SQL Server itself. The problem is that reading all the data at once gives me an OutOfMemory exception, and it takes a long time to execute even partially.

So I want to break my query into smaller pieces. One obvious method is to do an independent SELECT first and then use a WHERE ... IN clause. Another method that has been suggested to me is to use SQL cursors. I want to choose one of these methods (or another one if possible), especially with regard to the following points:

  1. What would be the performance impact of each scheme on the server? Which would be faster?
  2. Can I safely parallelize the SQL cursor queries? Would I get a performance benefit if I parallelized the first scheme (the one with the WHERE ... IN clause)?
  3. How many values can I specify in a WHERE ... IN clause? Is it limited only by the size of the query string?

Any other suggestions are also welcome.

Edit1: I have been given different solutions, but I would still like to know the answers to my original questions (out of curiosity).

+1  A: 

If you have to do the grouping logic in code, you can try writing the logic as a managed stored procedure in SQL Server, which can then be used in the grouping query.

Check out

This will allow you to group on the server before returning the dataset to your client.

[Edit - regarding your comments on using Dictionaries]

You can check out my project on Codeplex, which has a disk-persisting Dictionary<T,V>. This would prevent the out-of-memory exception. It would be interesting to see how it performs for your scenario. (If you are on a 32-bit system, read the note on the intro page.)
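
To illustrate the idea of keeping the grouping results on disk rather than in memory, here is a minimal sketch. It is not the Codeplex project mentioned above: it uses Python's standard `shelve` module as a stand-in for a disk-backed dictionary, and the function name `group_counts_on_disk` and the (key, value) row shape are assumptions made purely for illustration.

```python
import os
import shelve
import tempfile


def group_counts_on_disk(rows, path):
    """Count rows per group key, keeping the running totals in a disk-backed mapping.

    `rows` is any iterable of (key, value) pairs (a hypothetical row shape).
    """
    with shelve.open(path) as totals:
        for key, _value in rows:
            k = str(key)  # shelve keys must be strings
            totals[k] = totals.get(k, 0) + 1
        # Copying into a dict is only reasonable when the number of groups is small;
        # it is done here just so the result can be inspected.
        return dict(totals)


if __name__ == "__main__":
    sample = [("a", 1), ("b", 2), ("a", 3)]
    with tempfile.TemporaryDirectory() as tmp:
        print(group_counts_on_disk(sample, os.path.join(tmp, "groups")))
```

Each update goes through the on-disk store, so memory use stays roughly flat, at the cost of slower lookups than an in-memory dictionary.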

Mikael Svenson
Seems like a useful technology... but I am currently using dictionaries to store the output of my logic, and I don't know how to transfer dictionaries from SQL Server to a .NET app.
apoorv020
Can you use a temp table as your dictionary, and return the data from it instead after the grouping?
Mikael Svenson
I added a note on my answer about using disk persisting dictionaries.
Mikael Svenson
Hmmm... I think I can make a temporary table out of dictionary values, and return that out of my sql procedure.
apoorv020
+1  A: 

If you are using SQL Server 2005 or higher, you should check out SQL-based paging.

http://blogs.x2line.com/al/archive/2005/11/18/1323.aspx

It should work for what you are trying to do and is a better option than the two you listed.
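
As a rough illustration of what ROW_NUMBER()-based paging looks like from the client side, here is a small sketch. It runs against an in-memory SQLite database as a stand-in for SQL Server (SQLite 3.25 or later is assumed for window function support), and the table `entries`, its columns `id` and `grp`, and the helper names are all hypothetical.

```python
import sqlite3


def make_demo_db(n):
    """Build a small in-memory table standing in for the real 200-million-row table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, grp TEXT)")
    conn.executemany(
        "INSERT INTO entries (id, grp) VALUES (?, ?)",
        [(i, "g%d" % (i % 3)) for i in range(1, n + 1)],
    )
    return conn


def read_pages(conn, page_size):
    """Yield successive pages of (id, grp) rows, numbered with ROW_NUMBER()."""
    first = 1
    while True:
        page = conn.execute(
            """
            SELECT id, grp FROM (
                SELECT id, grp, ROW_NUMBER() OVER (ORDER BY id) AS rn
                FROM entries
            ) AS numbered
            WHERE rn BETWEEN ? AND ?
            ORDER BY rn
            """,
            (first, first + page_size - 1),
        ).fetchall()
        if not page:
            return
        yield page
        first += page_size


if __name__ == "__main__":
    for page in read_pages(make_demo_db(10), 4):
        print(page)
```

Only one page is held in memory at a time, so the client can process and discard each page before requesting the next.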

spinon
Seems like what I need... I will have to read and see how this works. Thanks very much :).
apoorv020
But ORDER BY is an expensive operation... any idea how much performance this method would yield?
apoorv020
Here is another link to check out that might have some better information: http://www.eggheadcafe.com/tutorials/aspnet/fe1e8749-26dd-4db7-9c18-b6ea5c39aa73/sql-server-2005-paging-pe.aspx
spinon
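
On the ORDER BY cost raised above: one commonly used alternative is to page on an indexed key ("keyset" paging), where each batch starts after the last key already seen, so the ordering can usually be served from the index instead of a separate sort. The sketch below is an assumption-laden illustration rather than a measured answer: it uses in-memory SQLite in place of SQL Server (where the query would use `TOP` instead of `LIMIT`), a hypothetical `entries` table with a positive integer primary key `id`, and a simple per-group count in place of the real grouping logic.

```python
import sqlite3
from collections import Counter


def make_demo_db(n):
    """Build a small in-memory table standing in for the real table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, grp TEXT)")
    conn.executemany(
        "INSERT INTO entries (id, grp) VALUES (?, ?)",
        [(i, "g%d" % (i % 3)) for i in range(1, n + 1)],
    )
    return conn


def count_groups_keyset(conn, batch_size):
    """Read the table in key order, batch by batch, updating group counts as we go."""
    last_id = 0  # assumes ids are positive integers
    counts = Counter()
    while True:
        batch = conn.execute(
            "SELECT id, grp FROM entries WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return counts
        for _id, grp in batch:
            counts[grp] += 1
        last_id = batch[-1][0]  # the next batch starts after the last key seen


if __name__ == "__main__":
    print(count_groups_keyset(make_demo_db(10), 4))
```

Because each query seeks directly to `id > last_id`, later batches do not have to re-number or skip over the rows that were already read.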