I am working on an ASP.NET MVC project which allows users to construct arbitrarily complex queries by adding items one clause at a time.

The application then generates the appropriate SQL, runs it (currently against SQL Server 2008) and displays the results, with a breakdown that includes the number of matching records for each added item.

e.g.

UserQuery:

Has ConditionA (45)
And ConditionB (33)
Or ConditionC (55)
And ConditionD (15)

Total: 48

The problem is how best to build and run the SQL to produce these results with performance and scalability in mind.

The initial implementation built a query (using subqueries) for each item in turn (combined with the previous ones) and ran each separately as a scalar. Each execution involved generating the SQL, opening a new SqlConnection, creating a new SqlCommand and executing it.
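In rough outline it worked like the sketch below (a minimal C# illustration only; ClauseCounter and the pre-built COUNT(*) statements are my shorthand here, not the real code):

    using System.Collections.Generic;
    using System.Data.SqlClient;

    public static class ClauseCounter
    {
        // Executes one cumulative COUNT(*) statement per added clause.
        public static IList<int> CountPerClause(string connectionString,
                                                IEnumerable<string> countStatements)
        {
            var counts = new List<int>();
            foreach (string sql in countStatements)
            {
                // A fresh connection and command per clause: one round trip each.
                using (var connection = new SqlConnection(connectionString))
                using (var command = new SqlCommand(sql, connection))
                {
                    connection.Open();
                    counts.Add((int)command.ExecuteScalar());
                }
            }
            return counts;
        }
    }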

I spent a while re-writing this to produce a single query (using CTEs) that returns a single row with the result of each item as a column.
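The generated statement took roughly this shape (a hedged sketch; the Records table and Condition columns are hypothetical stand-ins, and each CTE layers one clause onto the previous):

    public static class SingleQuerySketch
    {
        // One statement, one row back, one count column per clause.
        public const string Sql = @"
    WITH StepA AS (
        SELECT Id FROM Records WHERE ConditionA = 1
    ),
    StepB AS (
        SELECT Id FROM StepA WHERE ConditionB = 1
    )
    SELECT
        (SELECT COUNT(*) FROM StepA) AS CountA,
        (SELECT COUNT(*) FROM StepB) AS CountB;";
    }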

This required only a single execution, and performance seemed marginally favourable until the queries became complex and SQL Server started throwing errors:

The query processor ran out of internal resources and could not produce a query plan

What would be the most scalable and efficient way of building and running such a query?

+2  A: 

How about using LINQ?

You can have an IQueryable that starts as a straight "from t in table select t" and keep adding expressions to it (via Where), then let the LINQ to SQL provider generate the query. All you need is some code to compile the arbitrary ConditionA expressions (presumably text) into equivalent lambda expressions. This is not trivial.

The hope is that the LINQ provider collapses all expressions into one single WHERE clause and avoids subqueries.
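A minimal sketch of that composition, assuming the clauses have already been compiled into predicate expressions (note that chained Where calls AND together, so an OR between clauses would need the expressions combined first, e.g. via a predicate-builder helper):

    using System;
    using System.Linq;
    using System.Linq.Expressions;

    public static class QueryComposer
    {
        public static int CountWithClauses<T>(IQueryable<T> table,
                                              params Expression<Func<T, bool>>[] predicates)
        {
            IQueryable<T> query = table; // starts as a straight "from t in table select t"
            foreach (var predicate in predicates)
            {
                // Each Where only grows the expression tree; nothing runs yet.
                query = query.Where(predicate);
            }
            return query.Count(); // the provider generates and executes the SQL here
        }
    }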

Ultimately though, you cannot keep adding filtering expressions to a query ad nauseam; at some point it will become too complex no matter how you express it.

Remus Rusanu
Thanks for this - a useful suggestion as it made me consider the problem differently. I didn't get a chance to implement it, though.
TonE
A: 

You're returning a count for each condition along with the count for all the conditions combined, and that's unnecessary. If users want to know the results for one condition, they can run that condition alone to get it.

I would store the clauses in a table and have a stored procedure append the conditions for the query into one WHERE clause. That might tax the server less than having one subquery for each condition clause.
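Something along these lines, perhaps (a loose sketch only; the QueryClauses and Records tables, their columns and the procedure name are all made up for illustration):

    public static class SavedQuerySketch
    {
        // Stitches stored clause fragments into a single WHERE and runs it once.
        public const string CreateProcedure = @"
    CREATE PROCEDURE dbo.RunSavedQuery @QueryId INT
    AS
    BEGIN
        DECLARE @where NVARCHAR(MAX);

        -- Concatenate fragments such as 'ConditionA = 1', each preceded by
        -- the AND/OR operator saved alongside it.
        SELECT @where = COALESCE(@where + ' ' + Op + ' ', '') + Fragment
        FROM dbo.QueryClauses
        WHERE QueryId = @QueryId
        ORDER BY Position;

        -- Dynamic SQL: the fragments must come from a trusted query builder.
        DECLARE @sql NVARCHAR(MAX) =
            N'SELECT COUNT(*) FROM dbo.Records WHERE ' + @where;
        EXEC sp_executesql @sql;
    END";
    }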

Beth
A: 

I'm not sure of the requirement behind this, but you may want to consider some alternatives.

First, do you really gain anything by showing the number of records affected by each clause? In your example, the numbers don't really help understand the overall impact of the query. Removing that would probably simplify things.

Could you go with a client-side solution, where you use a JavaScript framework to filter the results?

I've been involved in projects that tried to do something similar; in the end, there were a small number of people who needed the ability to do complex ad-hoc queries, and we ended up just giving them appropriate reporting tools.

chris
A: 

The way forward for us was to construct the query using temporary tables for each clause, where each subsequent clause added was applied (via UNION/INTERSECT/EXCEPT) to the temp table resulting from the previous clause.

A temp table is also created for the results; it is updated with each clause's temp table id and row count as they are populated.

When the query has been processed, the results are returned by selecting all rows from the results temp table, which gives a full item-by-item breakdown.

This avoided the need for gigantic SQL statements featuring many nested subqueries and also prevented constant re-execution of the same SQL, providing a massive improvement in scalability and performance.
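For a concrete idea of the shape, the generated batch for the example above would look something like this (a condensed sketch; the Records table and Condition columns are illustrative only):

    public static class TempTableSketch
    {
        public const string GeneratedBatch = @"
    CREATE TABLE #Results (StepId INT, RowCnt INT);

    -- Clause 1: 'Has ConditionA'
    SELECT Id INTO #Step1 FROM Records WHERE ConditionA = 1;
    INSERT INTO #Results VALUES (1, @@ROWCOUNT);

    -- Clause 2: 'And ConditionB' intersects with the previous step;
    -- an 'Or' clause would use UNION, a negated 'And' EXCEPT.
    SELECT Id INTO #Step2 FROM (
        SELECT Id FROM #Step1
        INTERSECT
        SELECT Id FROM Records WHERE ConditionB = 1
    ) AS Combined;
    INSERT INTO #Results VALUES (2, @@ROWCOUNT);

    -- Full item-by-item breakdown for display.
    SELECT StepId, RowCnt FROM #Results ORDER BY StepId;";
    }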

TonE