views: 203
answers: 9

We currently have a search on our website that allows users to enter a date range. The page calls a stored procedure that queries for the date range and returns the appropriate data. However, many of our tables contain 30 to 60 million rows. If a user enters a date range of a year (or some other large range), the database grinds to a halt.

Is there any solution that doesn't involve putting a time constraint on the search? Paging is already implemented to show only the first 500 rows, but the database is still getting hit hard. We can't put a hard limit on the number of results returned because the user "may" need all of them.

+1  A: 

I don't know which of these are possible in your situation:

  1. Use a search engine rather than a database?
  2. Don't allow very general searches
  3. Cache the results of popular searches
  4. Break the database into shards on separate servers and combine the results in your application.
  5. Do multiple queries with smaller date ranges internally
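Option 5 can be sketched in application code: split the requested range into fixed-size windows and issue one query per window. A minimal sketch (the function and the window size are illustrative, not from the original post):

```python
from datetime import date, timedelta

def split_range(start, end, days=30):
    """Yield (window_start, window_end) pairs that tile [start, end] inclusively."""
    cur = start
    step = timedelta(days=days)
    while cur <= end:
        # Each window covers at most `days` calendar days.
        window_end = min(cur + step - timedelta(days=1), end)
        yield cur, window_end
        cur = window_end + timedelta(days=1)

windows = list(split_range(date(2009, 1, 1), date(2009, 3, 15), days=30))
# The windows are contiguous, non-overlapping, and end exactly at the range end,
# so the stored procedure can be called once per window with small date ranges.
```

The application can stop issuing window queries as soon as it has gathered enough rows for the current page, which is what keeps the load down.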
Jeremy French
+2  A: 

We can't put a hard limit on the number of results returned because the user "may" need all of them.

You seem to be saying that you can't prevent the user from requesting large datasets for business reasons. I can't see any technical way around that.

anon
+1  A: 

It sounds like you really aren't paging. I would have the stored procedure take a row range (which you calculate from the page number and page size) and return only the rows for the current page. Assuming that the data doesn't change frequently, this would reduce the load on the database server.
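The idea above can be sketched as: compute the row bounds in the application and let the database materialize only one page. This sketch uses SQLite's LIMIT/OFFSET as a stand-in (on SQL Server 2005 the stored procedure would typically filter a ROW_NUMBER() OVER (ORDER BY ...) range instead); the table and column names are made up:

```python
import sqlite3

def page_bounds(page, page_size=500):
    """Return (first_row, last_row), 1-based and inclusive, for a page number."""
    first = (page - 1) * page_size + 1
    return first, first + page_size - 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, odate TEXT)")
conn.executemany("INSERT INTO orders (odate) VALUES (?)",
                 [("2009-01-%02d" % (i % 28 + 1),) for i in range(1200)])

first, last = page_bounds(page=2, page_size=500)
# LIMIT/OFFSET plays the role of the ROW_NUMBER range filter here.
rows = conn.execute(
    "SELECT id, odate FROM orders ORDER BY id LIMIT ? OFFSET ?",
    (last - first + 1, first - 1),
).fetchall()
# `rows` holds only rows 501-1000 of the result, not the whole result set.
```

The key point is that the 500-row limit is applied inside the query, so the server never builds the full multi-million-row result just to throw most of it away.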

casperOne
+2  A: 

Index your date field and force a query to use that index:

CREATE INDEX ix_mytable_mydate ON mytable (mydate)

SELECT TOP 100 *
FROM mytable WITH (INDEX (ix_mytable_mydate))
WHERE mydate BETWEEN @start AND @end

It seems that the optimizer chooses a full table scan when it sees the large range.

Could you please post the query you use and execution plan of that query?

Quassnoi
I agree, the execution plan would tell a lot, and it also provides a baseline to see whether changes yield improvements. I am always very reluctant to force index use with an index hint; when SQL Server doesn't use an index, I expect it warrants investigation.
Kristen
A: 

How is your table data physically structured, i.e. partitioned, split across filegroups and disk storage, etc.?

Are you using table partitioning? If not, you should look into aligned partitioning. You could partition your data by date, with one partition per year, for example.

Were I to run a query spanning three years on a multiprocessor system, all three partitions could be accessed concurrently, thereby improving query performance.

John Sansom
+3  A: 

It sounds to me like this is a design problem, not a technical one. No one ever needs millions of records of data on the fly.

You're going to have to ask yourself some hard questions: Is there another way of getting people their data than the web? Is there a better way you can ask for filtering? What exactly is it that the users need this information for and is there a way you can provide that level of reporting instead of spewing everything?

Reevaluate what it is that the users want and need.

notnot
Of course they do. You'd have a problem delivering them, though. Take a look at what data warehousing is doing. I can put a 1920×1200 pixel image on screen; that can represent 2 million records.
Stephan Eggermont
If it's something on the fly, people generally want some sort of breakdown or summary. If it's less frequent/more scheduled, then the query time becomes less of an issue. It seems less likely that the users would need that much data on the fly. Maybe they'd prefer a daily report.
notnot
+4  A: 

If the user-entered date range is too large, have your application do the search in smaller date-range steps, possibly using a slow-start approach: the first search is limited to, say, a one-month range, and if it brings back fewer than 500 rows, search the preceding months until you have 500 rows.

You will want to start with the most recent dates for descending order and with the oldest dates for ascending order.
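The slow-start approach above, for descending order, can be sketched as: query the most recent month first, then double the window and keep stepping back until 500 rows are collected or the user's start date is reached. The sketch below uses SQLite and invented table/column names as a stand-in for the real stored procedure:

```python
import sqlite3
from datetime import date, timedelta

def slow_start_search(conn, start, end, want=500):
    """Collect up to `want` rows, newest first, doubling the window each pass."""
    rows, days, hi = [], 30, end
    while len(rows) < want and hi >= start:
        lo = max(hi - timedelta(days=days - 1), start)
        # Each pass queries only a small date window; ISO date strings sort
        # the same way as the dates themselves, so BETWEEN works on them.
        rows += conn.execute(
            "SELECT id, odate FROM orders "
            "WHERE odate BETWEEN ? AND ? ORDER BY odate DESC",
            (lo.isoformat(), hi.isoformat()),
        ).fetchall()
        hi = lo - timedelta(days=1)
        days *= 2  # slow start: 1 month, then 2, then 4, ...
    return rows[:want]

# Demo data: three rows per day over 2008.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, odate TEXT)")
base = date(2008, 1, 1)
conn.executemany(
    "INSERT INTO orders (odate) VALUES (?)",
    [((base + timedelta(days=d)).isoformat(),)
     for d in range(366) for _ in range(3)],
)

result = slow_start_search(conn, date(2008, 1, 1), date(2008, 12, 31))
# Rows within each window come back newest-first, and the windows themselves
# step backward in time, so the combined list stays in descending date order.
```

For most searches the first small window already yields 500 rows, so the large range is never actually scanned.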

kmkaplan
A: 

How are you implementing the paging?

I remember I faced a problem like this a few years back, and the issue had to do with how I implemented the paging. However, the data that I was dealing with was not as big as yours.

Malachi
A: 

Parallelize, and put it in RAM (or a cloud). You'll find that once you want to access large amounts of data at the same time, an RDBMS becomes the problem instead of the solution. Nobody doing visualizations uses an RDBMS.

Stephan Eggermont