views:

162

answers:

4

I have a series of queries against a very mega large database, and I have hundreds-of-thousands of ORs in WHERE clauses. What is the best and easiest way to optimize such SQL queries? I found some articles about creating temporary tables and using joins, but I am unsure. I'm new to serious SQL, and have been cutting and pasting results from one into the next.

SELECT doc_id, language, author, title FROM doc_text WHERE language='fr' OR language='es'
SELECT doc_id, ref_id FROM doc_ref WHERE doc_id=1234567 OR doc_id=1234570 OR doc_id=1234572 OR doc_id=1234596 OR OR OR ...
SELECT ref_id, location_id FROM ref_master WHERE ref_id=098765 OR ref_id=987654 OR ref_id=876543 OR OR OR ...
SELECT location_id, location_display_name FROM location
SELECT doc_id, index_code, FROM doc_index WHERE doc_id=1234567 OR doc_id=1234570 OR doc_id=1234572 OR doc_id=1234596 OR OR OR x100,000

These unoptimized query can take over 24 hours each. Cheers.

+6  A: 

I think I just answered my own question... NESTED TABLES!

SELECT doc_text.doc_id, doc_text.language, doc_text.author, doc_text.title, doc_ref.ref_id, ref_master.location_id, location.location_display_name, doc_index.doc_id, doc_index.display_heading
FROM DOC_TEXT, DOC_REF, REF_MASTER, LOCATION, DOC_INDEX
WHERE
    doc_text.language='fr' OR doc_text.language='es'
AND
    doc_text.doc_id=doc_ref.doc_id
AND
    doc_ref.doc_id=ref_master.ref_id
AND
    ref_master.location_id=location.location_id
AND
    doc_text.doc_id=doc_index.doc_id
bugaboo
That's not nested tables. These are common inner joins. Welcome to the world of relational databases. ;)
Tomalak
Thanks. I managed to go from day-long wait times to ".20145 SECONDS". Wooooooo!
bugaboo
Yeah… it's the small things in life that are the most enjoyable. ;) (Oh, and +1 for finding out on your own)
Tomalak
Mark your answer as correct if indeed, you were code generating the where clause from a select of the previous. You are correct that INNER JOIN is the correct solution and trying to simulate them otherwise would be a massive performance loss.
MatthewMartin
@Matthew: Accepting one's own answer is possible after 48 hours, not right away.
Tomalak
+1 for the "NESTED TABLES!" - that made my day :)
Jeffrey Kemp
+4  A: 

The easiest way to get that done is this:

  • Make indexes on the columns that are being filtered on (language, ref_id, doc_id, etc), at least double check their existence. Make them clustered if they are the primary index of the table.
  • Create helper tables that contain the conditions (add/delete conditions through INSERT/DELETE statements), index them too.
  • instead of 1000 "OR" components, make an INNER JOIN:

So...

SELECT doc_id, language, author, title 
  FROM doc_text
 WHERE language='fr' OR language='es'

becomes

INSERT language_search (language) VALUES ('fr')
INSERT language_search (language) VALUES ('es')
/* and 50 more */

SELECT dt.doc_id, dt.language, dt.author, dt.title 
  FROM doc_text dt
       INNER JOIN language_search ls ON dt.language = ls.language
Tomalak
As it seems, the question author was in search of something less complex - a join across 5 tables was what actually trying to do. Ah well.
Tomalak
+2  A: 

Instead of having a lot of conditions on the same field, you can use the in keyword:

SELECT doc_id, ref_id FROM doc_ref WHERE doc_id in (1234567, 1234570, 1234572, 1234596, ...)

This will make the queries shorter, but it's not certain that the performance will differ much. You should make sure that you have indexes on the relevant fields, that usually makes a huge difference for the performance.

Edit

However, it seems that the reason that you have a lot of values to compare is that you are using the result from one query to create the next. This should of course be solved with a join instead of a dynamic query:

select
  doc_text.doc_id, doc_text.language, doc_text.author, doc_text.title,
  doc_ref.ref_id, ref_master.location_id, location.location_display_name,
  doc_index.doc_id, doc_index.display_heading
from DOC_TEXT
inner join DOC_REF on doc_text.doc_id = doc_ref.doc_id
inner join REF_MASTER on doc_ref.doc_id = ref_master.ref_id
inner join LOCATION on ref_master.location_id = location.location_id
inner join DOC_INDEX on doc_text.doc_id = doc_index.doc_id
where
  doc_text.language in ('fr', 'es')
Guffa
"using the result from one query to create the next" should have occurred to me as well. My psychic debugging abilities have failed me on this one. ;) +1
Tomalak
A: 

I think your real problem is that you are not JOINing tables.

this is a guess, but I'll bet that you run a query and then get all the IDs in your application and then run another query WHERE all the rows match from the previous query. You would greatly improve performance by writing a query with a join:

SELECT
    *
    FROM YourTableA            a
        INNER JOIN YourTableB  b ON a.ID=b.ID
    WHERE a. .....

then process the single result set in your application.

KM