views:

70

answers:

10

In a certain app I must constantly query data that are likely to be amongst the last inserted rows. Since this table is going to grow a lot, I wonder if theres a standard way of optimizing the queries by making them start the lookup at the table's end. I think I would get the same optmization if the database stored data for the table in a stack-like structure, so the last inserted rows would be searched first.

+2  A: 

Add a counter or a time field in your table, sort on it and get top rows.

In other words: You should forget the idea that SQL tables are accessed in any particular order by default. A seqscan does not mean the oldest rows will be searched first, only that all rows will be checked. If you want to optimize some search you add indexes on some fields. What you are looking for is probably indexes.

kriss
...and create an index on that counter/time field/whatever.
FrustratedWithFormsDesigner
@FrustratedWithFormsDesigne: was typing :-)
kriss
+2  A: 
  1. If your data is indexed, it won't matter. The index is doing a binary search, not a sequential scan.
  2. Unless you're doing TOP 1 (or something like it), the SELECT will have to scan the whole table or index anyway.
egrunin
@egrunin: Regarding your first point: "The index is doing a binary search, not a sequential scan." Not always. Some queries can result in an "INDEX SCAN". If this is the case you can see it in the query plan.
Mark Byers
@Mark Byers: yes, that's why I mentioned indexes in point 2.
egrunin
+4  A: 

The SQL spec doesn't mention anything about maintaining the insertion order. In practice, most of decent DB's also doesn't maintain it. Then it stops here. Sorting the table first ain't going to make it faster. Just index the column(s) of interest (at least the ones which you use in the WHERE).

BalusC
I need to query an 'accounting entries' table. The table has a surrogate big integer key and I must find the row that matches a certain condition in two columns(c1/c2) and has the biggest key. If I create an index with c1 and c2, what can you tell me about the performance of the following meta-query : 'SELECT TOP 1 FROM t WHERE CONDITION(c1) and CONDITION2(c2) ORDER BY id DESC' ?
Thiado de Arruda
If at least `c1`, `c2` and `id` are indexed, it'll be blazing fast when compared to unindexed columns. It'll be as exponentially faster as the table grows.
BalusC
do I need to explicitly index the primary key 'id' ?
Thiado de Arruda
No, it's at its own already an index. All keys are already indexes. You just choose the kind of index it needs to be: Primary key (unique ID), Foreign key (reference to another table), Unique key (unique in table, but no primary), Indexed ("just" indexed in memory).
BalusC
A: 

If your table has a create date, then I'd reverse sort by that and take the top 1.

Andy Evans
+2  A: 

There is no standard way.

In some databases you can specify the sort order on an index.

SQL Server allows you to write ASC or DESC on an index:

[ ASC | DESC ]

Determines the ascending or descending sort direction for the particular index column. The default is ASC.

In MySQL you can also write ASC or DESC when you create the index but currently this is ignored. It might be implemented in a future version.

Mark Byers
+3  A: 

One of the "tenets" of a proper RDBMS is that this kind of matters shouldn't concern you or anyone else using the DB.

The DB engine is "free" to use whatever method it wants to store/retrieve records, so if you want to enforce a "top" behaviour do what other suggested: add a timestamp field to the table (or tables), add an index on it and query using it as a sort and/or query criteria (e.g.: you poll the table each minute, and ask for records with timestamp>=systime-1 minute)

p.marino
+1  A: 

If you have enough rows that its actually becomming a problem, and you know how many "the most recently inserted rows" should be, you could try a round-about method.

Note: Even for pretty big tables, this is less efficient, but once your main table gets big enough, I've seen this work wonders for user-facing performance.

Create a "staging" table that exactly mimics your table's structure. Whenever you insert into your main table, also insert into your "staging" area. Limit your "staging" area to n rows by using a trigger to delete the lowest id row in the table when a new row over your arbitrary maximum is reached (say, 10,000 or whatever your limit is).

Then, queries can hit that smaller table first looking for the information. Since the table is arbitrarilly limited to the last n rows, it's only looking in the most recent data. Only if that fails to find a match would your query (actually, at this point a stored procedure because of the decision making) hit your main table.

Some Gotchas:
1) Make sure your trigger(s) is(are) set up properly to maintain the correct concurrancy between your "main" and "staging" tables.
2) This can quickly become a maintenance nightmare if not handled properly- and depending on your scenario it be be a little finiky.
3) I cannot stress enough that this is only efficient/useful in very specific scenarios. If yours doesn't match it, use one of the other answers.

AllenG
+2  A: 

According to Data Independence you shouldn't care. That said a clustered index would probably suit your needs if you typically look for a date range. (sorting acs/desc shouldn't matter but you should try it out.)

If you find that you really need it you can also shard your database to increase perf on the most recently added data.

Conrad Frix
+1  A: 

ISO/ANSI Standard SQL does not consider optimization at all. For example the widely recognized CREATE INDEX SQL DDL does not appear in the Standard. This is because the Standard makes no assumptions about the underlying storage medium and nor should it. I regularly use SQL to query data in text files and Excel spreadsheets, neither of which have any concept of database indexes.

onedaywhen
+1  A: 

You can't do this.

However, there is a way to do something that might be even better. Depending on the design of your table, you should be able to create an index that keeps things in almost the order of entry. For example, if you adopt the common practice of creating an id field that autoincrements, then that index is just about in chronological order.

Some DBMSes permit you to declare a backwards index, one that descends instead of ascending. If you create a backwards index on the ID field, and if the optimizer uses that index, it will look at the most recent entries first. This will give you a rapid response for the first row.

The next step is to get the optimizer to use the index. You need to use explain plan to see if the index is being used. If you ask for the rows in order of id descending, the optimizer will almost certainly use the backwards index. If not you may be able to use hints to guide the optimizer.

If you still need to avoid reading all the rows in order to avoid wasting time, you may be able to use the LIMIT feature to dclare that you only want, say 10 rows, and no more, or 1 row and no more. That should do it.

Good luck.

Walter Mitty