I have some data analysis that I need to perform. On average, it will involve somewhere between 50K and 150K rows. From these rows I need to extract Sum(X) as well as Count(X) based on five different criteria. There are two ways of going about it:
- Write 10 different queries, each designed to aggregate the data from column X using Sum() or Count(). Run each one and retrieve the result using SqlCommand.ExecuteScalar(). (A rough sketch of this appears after the list.)
- Create a custom object to hold all of the different parameters needed to evaluate the different conditions. Run one query that returns all of the data making up the superset containing all of the conditional subsets, using SqlCommand.ExecuteReader(). Read each row from the DataReader into a new object, adding each one to a List collection. Once all data is retrieved, use LINQ-to-Objects to compute the different Sum() and Count() values based on the different conditions. (There is a second sketch of this below.)
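To make option 1 concrete, here is a minimal sketch. The connection string, table name (dbo.MyTable), column names, and the five WHERE clauses are all placeholders I made up; the real ones would come from my schema.

```csharp
using System;
using System.Data.SqlClient;

// Hypothetical connection string, table, columns, and criteria; substitute the real ones.
var connectionString = "Server=dbserver;Database=Analysis;Integrated Security=true;";
var criteria = new[]
{
    "Category = 'A'",
    "Category = 'B'",
    "Region = 'West'",
    "Amount > 1000",
    "CreatedOn >= '2023-01-01'"
};

var sums = new decimal[criteria.Length];
var counts = new int[criteria.Length];

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    for (int i = 0; i < criteria.Length; i++)
    {
        // One round trip for the SUM of this criterion...
        using (var sumCmd = new SqlCommand(
            $"SELECT COALESCE(SUM(X), 0) FROM dbo.MyTable WHERE {criteria[i]}", connection))
        {
            sums[i] = Convert.ToDecimal(sumCmd.ExecuteScalar());
        }

        // ...and another for the COUNT, for 10 round trips in total.
        using (var countCmd = new SqlCommand(
            $"SELECT COUNT(X) FROM dbo.MyTable WHERE {criteria[i]}", connection))
        {
            counts[i] = Convert.ToInt32(countCmd.ExecuteScalar());
        }
    }
}
```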
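And a corresponding sketch of option 2, under the same assumptions (hypothetical connection string, table, and row shape); the real custom object would carry whatever columns the five criteria actually need:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Data.SqlClient;

// Hypothetical connection string and column list; replace with the real superset query.
var connectionString = "Server=dbserver;Database=Analysis;Integrated Security=true;";
var rows = new List<Row>();

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT X, Category, Region, Amount, CreatedOn FROM dbo.MyTable", connection))
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        // Materialize every row into the custom object before aggregating.
        while (reader.Read())
        {
            rows.Add(new Row(
                reader.GetDecimal(0),
                reader.GetString(1),
                reader.GetString(2),
                reader.GetDecimal(3),
                reader.GetDateTime(4)));
        }
    }
}

// LINQ-to-Objects then does the aggregation in memory, one Sum/Count pair per criterion.
var sumA   = rows.Where(r => r.Category == "A").Sum(r => r.X);
var countA = rows.Count(r => r.Category == "A");

var sumWest   = rows.Where(r => r.Region == "West").Sum(r => r.X);
var countWest = rows.Count(r => r.Region == "West");
// ...and so on for the remaining criteria.

// Minimal row object holding only the columns the criteria need.
public record Row(decimal X, string Category, string Region, decimal Amount, DateTime CreatedOn);
```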
I know that I could try each one out to see which is fastest, but I am interested in the community's advice on which one is likely to be faster. Assume SQL Server and the web server are each running on their own machine, each with sufficient memory.
Right now I am leaning towards option 1. Even though there are many more queries to the DB, the DB itself does all of the aggregation work and very little data passes between SQL Server and the web server. With option 2 there is only one query, but it pushes a large amount of data over to .NET, and .NET then has to do all of the heavy lifting for the aggregate functions (and although I don't have anything to base it on, I suspect that SQL Server is more efficient at running these kinds of large aggregations).
Any thoughts on which way to go (or a third option that I am missing)?