views: 203
answers: 9

We currently have a search on our website that allows users to enter a date range. The page calls a stored procedure that queries for the date range and returns the appropriate data. However, many of our tables contain 30 to 60 million rows. If a user enters a date range of a year (or some other large range), the database grinds to a halt.

Is there any solution that doesn't involve putting a time constraint on the search? Paging is already implemented to show only the first 500 rows, but the database is still getting hit hard. We can't put a hard limit on the number of results returned because the user "may" need all of them.

+1  A: 

I don't know which of these are possible in your situation:

  1. Use a search engine rather than a database?
  2. Don't allow very general searches
  3. Cache the results of popular searches
  4. Break the database into shards on separate servers and combine the results in your application.
  5. Do multiple queries with smaller date ranges internally
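Option 5 can be sketched in application code: split the requested range into fixed-size windows and issue one query per window. A minimal sketch (the function and the window size are illustrative, not from the original post):

```python
from datetime import date, timedelta

def split_range(start, end, days=30):
    """Yield (window_start, window_end) pairs that tile [start, end] inclusively."""
    cur = start
    step = timedelta(days=days)
    while cur <= end:
        # Each window covers at most `days` calendar days.
        window_end = min(cur + step - timedelta(days=1), end)
        yield cur, window_end
        cur = window_end + timedelta(days=1)

windows = list(split_range(date(2009, 1, 1), date(2009, 3, 15), days=30))
# The windows are contiguous, non-overlapping, and end exactly at the range end,
# so the stored procedure can be called once per window with small date ranges.
```

The application can stop issuing window queries as soon as it has gathered enough rows for the current page, which is what keeps the load down.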
Jeremy French
+2  A: 

We can't put a hard limit on the number of results returned because the user "may" need all of them.

You seem to be saying that you can't prevent the user from requesting large datasets for business reasons. I can't see any technical way around that.

anon
+1  A: 

It sounds like you really aren't paging. I would have the stored procedure take a row range (which you calculate from the page number and page size) and return only the rows for the current page. Assuming that the data doesn't change frequently, this would reduce the load on the database server.
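The idea above can be sketched as: compute the row bounds in the application and let the database materialize only one page. This sketch uses SQLite's LIMIT/OFFSET as a stand-in (on SQL Server 2005 the stored procedure would typically filter a ROW_NUMBER() OVER (ORDER BY ...) range instead); the table and column names are made up:

```python
import sqlite3

def page_bounds(page, page_size=500):
    """Return (first_row, last_row), 1-based and inclusive, for a page number."""
    first = (page - 1) * page_size + 1
    return first, first + page_size - 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, odate TEXT)")
conn.executemany("INSERT INTO orders (odate) VALUES (?)",
                 [("2009-01-%02d" % (i % 28 + 1),) for i in range(1200)])

first, last = page_bounds(page=2, page_size=500)
# LIMIT/OFFSET plays the role of the ROW_NUMBER range filter here.
rows = conn.execute(
    "SELECT id, odate FROM orders ORDER BY id LIMIT ? OFFSET ?",
    (last - first + 1, first - 1),
).fetchall()
# `rows` holds only rows 501-1000 of the result, not the whole result set.
```

The key point is that the 500-row limit is applied inside the query, so the server never builds the full multi-million-row result just to throw most of it away.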

casperOne
+2  A: 

Index your date field and force a query to use that index:

CREATE INDEX ix_mytable_mydate ON mytable (mydate)

SELECT TOP 100 *
FROM mytable WITH (INDEX (ix_mytable_mydate))
WHERE mydate BETWEEN @start AND @end

It seems that the optimizer chooses a full table scan when it sees the large range.

Could you please post the query you use and execution plan of that query?

Quassnoi
I agree, the execution plan would tell a lot, and it also provides a baseline to see whether changes yield improvements. I am always very reluctant to force index use with an index hint; when SQL Server doesn't use an index, I expect it warrants investigation.
Kristen
A: 

How is your table data physically structured, i.e. partitioned, split across filegroups and disk storage, etc.?

Are you using table partitioning? If not, you should look into aligned partitioning. You could partition your data by date, with one partition per year, for example.

Were I to run a query spanning three years on a multiprocessor system, all three partitions could be accessed concurrently, thereby improving query performance.

John Sansom
+3  A: 

It sounds to me like this is a design problem, not a technical one. No one ever needs millions of records of data on the fly.

You're going to have to ask yourself some hard questions: Is there another way of getting people their data than the web? Is there a better way you can ask for filtering? What exactly is it that the users need this information for and is there a way you can provide that level of reporting instead of spewing everything?

Reevaluate what it is that the users want and need.

notnot
Of course they do. You'd have a problem delivering them, though. Take a look at what data warehousing is doing. I can put a 1920×1200 pixel image on screen; that can represent 2 million records.
Stephan Eggermont
If it's something on the fly, people generally want some sort of breakdown or summary. If it's less frequent/more scheduled, then the query time becomes less of an issue. It seems less likely that the users would need that much data on the fly. Maybe they'd prefer a daily report.
notnot
+4  A: 

If the user-entered date range is too large, have your application do the search in smaller date-range steps, possibly using a slow-start approach: the first search is limited to, say, a one-month range, and if it brings back fewer than 500 rows, search the preceding months until you have 500 rows.

You will want to start with the most recent dates for descending order and with the oldest dates for ascending order.
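The slow-start approach above, for descending order, can be sketched as: query the most recent month first, then double the window and keep stepping back until 500 rows are collected or the user's start date is reached. The sketch below uses SQLite and invented table/column names as a stand-in for the real stored procedure:

```python
import sqlite3
from datetime import date, timedelta

def slow_start_search(conn, start, end, want=500):
    """Collect up to `want` rows, newest first, doubling the window each pass."""
    rows, days, hi = [], 30, end
    while len(rows) < want and hi >= start:
        lo = max(hi - timedelta(days=days - 1), start)
        # Each pass queries only a small date window; ISO date strings sort
        # the same way as the dates themselves, so BETWEEN works on them.
        rows += conn.execute(
            "SELECT id, odate FROM orders "
            "WHERE odate BETWEEN ? AND ? ORDER BY odate DESC",
            (lo.isoformat(), hi.isoformat()),
        ).fetchall()
        hi = lo - timedelta(days=1)
        days *= 2  # slow start: 1 month, then 2, then 4, ...
    return rows[:want]

# Demo data: three rows per day over 2008.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, odate TEXT)")
base = date(2008, 1, 1)
conn.executemany(
    "INSERT INTO orders (odate) VALUES (?)",
    [((base + timedelta(days=d)).isoformat(),)
     for d in range(366) for _ in range(3)],
)

result = slow_start_search(conn, date(2008, 1, 1), date(2008, 12, 31))
# Rows within each window come back newest-first, and the windows themselves
# step backward in time, so the combined list stays in descending date order.
```

For most searches the first small window already yields 500 rows, so the large range is never actually scanned.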

kmkaplan
A: 

How are you implementing the paging?

I remember I faced a problem like this a few years back, and the issue had to do with how I implemented the paging. However, the data that I was dealing with was not as big as yours.

Malachi
A: 

Parallelize, and put it in RAM (or a cloud). You'll find that once you want to access large amounts of data at the same time, an RDBMS becomes the problem instead of the solution. Nobody doing visualizations uses an RDBMS.

Stephan Eggermont