We have a table with 6 million records, and a SQL query that takes around 7 minutes to return its result. I don't think the SQL can be optimized any further.

The query time causes our WebLogic server to throw its max stuck thread exception.

Is there any recommendation for how to handle this problem?

Following is the query; unfortunately, it is hard for me to change it:

SELECT * FROM  table1 
WHERE trim(StudentID) IN ('354354','0') 
AND concat(concat(substr(table1.LogDate,7,10),'/'),substr(table1.LogDate,1,5)) 
       BETWEEN '2009/02/02' AND '2009/03/02' 
AND TerminalType='1' 
AND RecStatus='0' ORDER BY StudentID, LogDate DESC, LogTime

I know that comparing dates as strings is time-consuming, but as someone wrote before, I cannot change the table structure...

LogDate is defined as a string in mm/dd/yyyy format, so we need to substring and concatenate it before we can use BETWEEN ... AND ..., and I think it's hard to optimize here.
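A small sketch of why the rearrangement is needed (plain Python mirroring the SQL substr/concat; `to_sortable` is a hypothetical helper name): mm/dd/yyyy strings do not compare in chronological order, but yyyy/mm/dd strings do.

```python
# Sketch: why mm/dd/yyyy strings cannot be range-compared directly.
# Mirrors the SQL: substr(LogDate,7,10) is the year, substr(LogDate,1,5)
# is "mm/dd", and the query glues them together as yyyy/mm/dd.

def to_sortable(log_date: str) -> str:
    """Rearrange 'mm/dd/yyyy' into 'yyyy/mm/dd' so string order == date order."""
    return log_date[6:10] + "/" + log_date[0:5]

# '12/31/2008' is earlier than '02/02/2009' as a date, but not as a string:
assert not ("12/31/2008" < "02/02/2009")
# After rearranging, string comparison agrees with chronological order:
assert to_sortable("12/31/2008") < to_sortable("02/02/2009")
assert "2009/02/02" <= to_sortable("02/15/2009") <= "2009/03/02"
```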

+2  A: 

Without any further information about what kind of query you are executing and whether you are using indexes, it is hard to give specific advice.

But here are a few general tips.

  1. Make sure you use indexes on the columns you often filter/order by.
  2. If only a certain query is way too slow, then perhaps you can avoid executing that query by generating the results automatically as the database changes. For example, instead of a count() you can usually keep a running count stored somewhere.

Try to remove the trim() from the query by automatically calling trim() on your data before/while inserting it into the table. That way you can simply use an index to find the StudentID.

Also, the date filter should be possible natively in your database. Without knowing which database it might be more difficult, but something like this should probably work: LogDate BETWEEN '2009-02-02' AND '2009-02-02'

If you also add a composite index on all of these columns together (i.e. StudentID, LogDate, TerminalType, RecStatus and EmployeeID), then it should be lightning fast.
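To illustrate the effect of such a composite index, here is a sketch using SQLite as a stand-in for the (unspecified) database; the table and index names (`table1`, `idx_t1`) are illustrative. With the filtered columns covered by one index, the planner can satisfy the WHERE clause with an index search instead of a full scan.

```python
import sqlite3

# SQLite stand-in: a composite index over the columns the query filters on.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE table1 (
    StudentID TEXT, LogDate TEXT, TerminalType TEXT, RecStatus TEXT)""")
conn.execute("""CREATE INDEX idx_t1
    ON table1 (StudentID, TerminalType, RecStatus, LogDate)""")

plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT * FROM table1
    WHERE StudentID = '354354' AND TerminalType = '1' AND RecStatus = '0'""").fetchall()

# The plan detail should name the index rather than a table scan.
assert any("idx_t1" in row[3] for row in plan)
```

Oracle's planner behaves analogously, though it may choose differently based on statistics.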

WoLpH
I updated the query; could you give me some advice?
MemoryLeak
If you cannot trim the data in the application doing the insert, you can force a TRIM using an on insert/update trigger to modify new.StudentId. But it seems very unlikely an Id column would be a string.
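A sketch of this trigger idea, using SQLite as a stand-in (the trigger name `trim_student` is illustrative). In Oracle, a BEFORE INSERT trigger would assign `:new.StudentID := trim(:new.StudentID)`; SQLite triggers cannot modify NEW, so this AFTER INSERT trigger rewrites the row instead. The effect is the same: stored IDs carry no padding, so the query no longer needs trim(StudentID) and a plain index applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (StudentID TEXT)")
# SQLite workaround for Oracle's ":new.StudentID := trim(...)" pattern:
conn.execute("""
    CREATE TRIGGER trim_student AFTER INSERT ON table1
    BEGIN
        UPDATE table1 SET StudentID = trim(NEW.StudentID)
        WHERE rowid = NEW.rowid;
    END""")

conn.execute("INSERT INTO table1 VALUES ('  354354  ')")
# The stored value has been trimmed by the trigger:
assert conn.execute("SELECT StudentID FROM table1").fetchone()[0] == "354354"
```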
JulesLt
Will Oracle automatically determine the execution order of those AND clauses?
MemoryLeak
@MemoryLeak: Oracle will have no problem understanding how to execute a `BETWEEN` clause. As for the rest: if you only have `AND` clauses, then the order does not matter; all conditions have to be met.
WoLpH
I just think that different execution orders of the ANDs could have different performance.
MemoryLeak
@MemoryLeak: it shouldn't make a difference. The query planner should automatically pick the best way to execute the query, regardless of the order of your filters.
WoLpH
+2  A: 

The query time causes our WebLogic server to throw the max stuck thread exception.

If the query takes 7 minutes and cannot be made faster, you have to stop running this query real-time. Can you change your application to query a cached results table that you periodically refresh?

As an emergency stop-gap before that, you can implement a latch (in Java) that allows only one thread at a time to execute this query. A second thread would immediately fail with an error (instead of bringing the whole system down). That is probably not making users of this query happy, but at least it protects everyone else.

I updated the query; could you give me some advice?

Those string manipulations make indexing pretty much impossible. Are you sure you cannot at least get rid of the "trim"? Is there really redundant whitespace in the actual data? If so, you could narrow down on just a single student_id, which should speed things up a lot.

You want a composite index on (student_id, log_date), and hopefully the complex log_date condition can still be resolved using an index range scan (for a given student id).

Thilo
+3  A: 

If your database supports it, you might want to try a materialized view.

If not, it might be worth implementing something similar yourself: a scheduled job that runs the expensive trims and concats and refreshes a results table, so that you can query that cleaner table and avoid the expensive work. Or use triggers to maintain such a table.
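The scheduled-refresh idea can be sketched like this, with SQLite standing in for the real database and `table1_clean` / `refresh_clean_table` as illustrative names: the job precomputes the trim and date-rearrangement once, so reads hit a plain, indexable table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (StudentID TEXT, LogDate TEXT)")
conn.execute("INSERT INTO table1 VALUES (' 354354 ', '02/15/2009')")

def refresh_clean_table(conn):
    """Rebuild table1_clean: trimmed IDs and sortable yyyy/mm/dd dates."""
    conn.executescript("""
        DROP TABLE IF EXISTS table1_clean;
        CREATE TABLE table1_clean AS
            SELECT trim(StudentID) AS StudentID,
                   substr(LogDate, 7, 4) || '/' || substr(LogDate, 1, 5) AS LogDate
            FROM table1;
        CREATE INDEX idx_clean ON table1_clean (StudentID, LogDate);
    """)

refresh_clean_table(conn)  # in production: run from a scheduled job

# The application now filters without trim() or concat() in the WHERE clause:
row = conn.execute("""SELECT * FROM table1_clean
    WHERE StudentID = '354354'
      AND LogDate BETWEEN '2009/02/02' AND '2009/03/02'""").fetchone()
assert row == ("354354", "2009/02/15")
```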

Don Roby
A: 

With the little information you have provided, my hunch is that the following clause gives us a clue:

     ... WHERE trim(StudentID) IN ('354354','0') 

If you have large numbers of records with an unidentified student (i.e. StudentID = '0'), an index on StudentID would be very unbalanced.

Of the 6 million records, how many have studentId=0?

Tim
Also, is student id a number or text? Looking at the quotation marks in your query it would seem to be defined as a text column. If it's an integer, remove the quotation marks.
Tim
+1  A: 

Without knowing which database you are using and what your table structure is, it's very difficult to suggest any improvement, but queries can generally be improved by using indexes, hints, etc.

In your query, the following part:

     concat(concat(substr(table1.LogDate,7,10),'/'), substr(table1.LogDate,1,5)) BETWEEN '2009/02/02' AND '2009/02/02'

is quite funny. BETWEEN '2009/02/02' AND '2009/02/02'?? Man, what are you trying to do?

Can you post your table structure here?

And 6 million records is not a big thing anyway.

Amit
+1 for the BETWEEN on strings ;) I'd love to see this code handling 7th of Gazillionember 2010.
eyescream
+4  A: 
SELECT * ... WHERE trim(StudentID) IN ('354354','0')

If this is a typical construct, then you need a function-based index, because without one you force the DB server to perform a full table scan.

As a rule of thumb, you should avoid using functions in the WHERE clause as much as possible. The trim(StudentID) and substr(table1.LogDate,7,10) calls prevent the DB server from using any index or applying any optimization to the query. Use native data types as much as possible, e.g. DATE instead of VARCHAR for LogDate. StudentID should also be managed properly in the client software, e.g. by trimming the data before INSERT/UPDATE.
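A sketch of a function-based index, using SQLite (which calls these "indexes on expressions") as a stand-in for Oracle; the index name `idx_trim_sid` is illustrative. Because the index stores trim(StudentID) itself, `WHERE trim(StudentID) = ...` becomes an index lookup instead of a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (StudentID TEXT, RecStatus TEXT)")
# Index on the expression the query actually filters with:
conn.execute("CREATE INDEX idx_trim_sid ON table1 (trim(StudentID))")

plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT * FROM table1 WHERE trim(StudentID) = '354354'""").fetchall()

# The plan uses the expression index rather than scanning the table:
assert any("idx_trim_sid" in row[3] for row in plan)
```

Oracle's equivalent would be `CREATE INDEX idx ON table1 (trim(StudentID))` as well, provided the optimizer statistics are current.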

Dummy00001
+6  A: 

The odds are that this query is doing a full table scan, because your WHERE conditions are unlikely to be able to take advantage of any indexes.

Is LogDate a date field or a text field? If it's a date field, then don't do the substr's and concat's; just say "LogDate BETWEEN '2009-02-02' AND '2009-02-03'" or whatever the date range is. If it's defined as a text field, you should seriously consider redefining it as a date field. (If your date really is text written as mm/dd/yyyy, then your ORDER BY ... LOGDATE DESC is not going to give useful results if the dates span more than one year.)
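The ORDER BY pitfall in that parenthetical can be shown in a couple of lines (plain Python; the sample dates are made up): sorting mm/dd/yyyy strings orders by month first, so dates spanning more than one year come out interleaved.

```python
dates = ["12/31/2008", "02/02/2009", "11/05/2008"]

# Naive string sort: the 2009 date lands between the two 2008 dates.
assert sorted(dates) == ["02/02/2009", "11/05/2008", "12/31/2008"]

# Rearranged to yyyy/mm/dd as a sort key, string order matches date order:
fixed = sorted(dates, key=lambda d: d[6:10] + d[0:5])
assert fixed == ["11/05/2008", "12/31/2008", "02/02/2009"]
```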

Is it necessary to do the trim on StudentID? It is far better to clean up your data before putting it in the database than to try to clean it up every time you retrieve it.

If LogDate is defined as a date and you can trim studentid on input, then create indexes on one or both fields and the query time should fall dramatically.

Or if you want a quick and dirty solution, create an index on "trim(studentid)".

If that doesn't help, give us more info about your table layouts and indexes.

Jay
Will Oracle automatically determine the execution order of those AND clauses? Since RecStatus='0' will filter out most of the data.
MemoryLeak
Short answer: Basically, Yes. Longer answer: Oracle (or any SQL engine) has an "optimizer" that builds a "query plan", which is the logic it will use to satisfy the query. Basically this means deciding what order to read and join the tables -- a moot point here being this query uses only one table -- and what indexes to use. It's not so much what order it tests conditions, as what conditions it is able to satisfy with an index and what conditions it has to read record-by-record and keep or throw away. (continued ...)
Jay
If the RecStatus test is what will eliminate most of the unwanted records, then you should create an index on RecStatus. Normally you can only use one index per table per query, so if you normally want only a certain RecStatus within a certain StudentID, then you should create an index on (StudentID, RecStatus). (Oracle also has bitmap indexes where you can use more than one per query, but that's getting more complicated.)
Jay
A: 

Your main problem is that your query is treating everything as a string.

If LogDate is a Date WITHOUT a time component, you want something like the following

SELECT * FROM  table1 
WHERE  StudentID  IN (:SearchStudentId,0) 
AND  table1.LogDate = :SearchDate
AND TerminalType='1' 
AND RecStatus='0' 
ORDER BY EmployeeID, LogDate DESC, LogTime

If LogDate has a time component, and SearchDate does NOT have a time component, then something like this. (The .99999 will set the time to 1 second before midnight)

SELECT * FROM  table1 
WHERE  StudentID  IN (:SearchStudentId,:StudentId0) 
AND  table1.LogDate BETWEEN :SearchDate AND :SearchDate+0.99999 
AND TerminalType='1' 
AND RecStatus='0' 
ORDER BY EmployeeID, LogDate DESC, LogTime

Note the use of bind variables for the parameters that change between calls. It won't make the query much faster, but it is 'best practice'.

Depending on your calling language, you may need to add TO_DATE, etc, to cast the incoming bind variable into a Date type.

JulesLt
+1  A: 

As has been said a lot already, your problem is the date field. You definitely should change your date from a string field to a native date type. If it is a legacy field that your app uses in exactly this way, you may still create a function-based index, to_date(LogDate, 'MM/DD/YYYY'), that transforms your "string" date into a "date" date and allows the fast BETWEEN search already mentioned, without modifying your table data.

This should speed things up a lot.
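The same idea can be sketched with SQLite as a stand-in (SQLite has no to_date, so this hypothetical index is built on the substr rearrangement from the original query instead): an index on the date expression turns the range filter into an index range scan without touching the stored mm/dd/yyyy strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (LogDate TEXT)")
# Expression index on the rearranged (sortable) form of the string date:
conn.execute("""CREATE INDEX idx_logdate
    ON table1 (substr(LogDate, 7, 4) || '/' || substr(LogDate, 1, 5))""")
conn.execute("INSERT INTO table1 VALUES ('02/15/2009')")

# The WHERE clause uses the exact same expression, so the index applies:
plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT * FROM table1
    WHERE substr(LogDate, 7, 4) || '/' || substr(LogDate, 1, 5)
          BETWEEN '2009/02/02' AND '2009/03/02'""").fetchall()
assert any("idx_logdate" in row[3] for row in plan)
```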

be here now
A: 

If StudentID is a CHAR column (usually the reason for using trim()), you may be able to get better performance by padding the search values instead of trimming the field, like this (assuming StudentID is a CHAR(10) holding right-justified values):

StudentID IN (lpad('354354',10),lpad('0',10))

This will allow the index on StudentID to be used, if one exists.
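The padding trick in plain Python (assuming, as the answer does, a CHAR(10) column holding right-justified values; `lpad` mirrors Oracle's function of the same name, which pads with blanks by default): padding the search value to the column width matches the stored form exactly, so no per-row trim is needed.

```python
WIDTH = 10  # assumed CHAR(10) column width

def lpad(value: str, width: int = WIDTH) -> str:
    """Left-pad with spaces to the column width, like Oracle's lpad."""
    return value.rjust(width)

stored = lpad("354354")            # how a right-justified CHAR(10) holds the ID
assert stored == "    354354"
assert stored.strip() == "354354"  # what trim(StudentID) would have yielded
assert stored == lpad("354354")    # padded comparison: plain index-friendly
```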

Allan