In MySQL I have two tables:

tableA

col1   col2  SIM1 ..........col24
-----------------------------------
a       x     1             5 
b       y     1             3
c       z     0             2
d       g     2             1

tableB

colA   colB   SIM2
-------------------
x       g     1
y       f     0
x       s     0
y       e     2

In reality, the number of records in the two tables is 0.4 million.

I have a Java program from which I execute an SQL query using JDBC.

Here is the query:

SELECT *
FROM TableA
INNER JOIN TableB ON TableA.SIM1 = TableB.SIM2
INTO OUTFILE 'c:/test12226.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'

This query is taking a really long time. For my application to be feasible, it should not take more than 30 seconds. I understand there are 0.4 million records, but the same operation in MS Access takes less than 10 seconds. Is the Java-MySQL combination more time-consuming than MS Access?

I have allocated 1 GB of RAM in the debug configuration. Please advise.

+2  A: 

My guess is that one or both of TableA.SIM1 and TableB.SIM2 aren't indexed. Either that, or they're different data types (e.g. VARCHAR and NUMERIC). Try:

CREATE INDEX index_name1 ON TableA (SIM1);
CREATE INDEX index_name2 ON TableB (SIM2);

Without indexes that query will be really slow. One table will be accessed record by record, which is fine since you're outputting the whole table. To find the corresponding record in the other table it needs to look up based on the SIM1 = SIM2 relationship.

To find records in the other table without an index, it has to look through every record. This is a linear, or O(n), lookup. Put half a million records in each table and that's an awful lot of comparisons required to find all the matches (billions, in fact).

With the indexes the record matching is near-instant.
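
The difference can be sketched in Java, the language the question already uses. This is only an illustration under assumptions: the names JoinSketch, nestedLoopJoin and indexedJoin are made up here, a HashMap stands in for the database index (MySQL's own indexes are usually B-trees), and the key values are the SIM1 and SIM2 columns from the sample tables in the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not how MySQL works internally.
class JoinSketch {

    // Join without an index: every key in a is compared with every key in b.
    // Returns {matches, comparisons}.
    static long[] nestedLoopJoin(int[] a, int[] b) {
        long matches = 0, comparisons = 0;
        for (int x : a) {
            for (int y : b) {
                comparisons++;
                if (x == y) matches++;
            }
        }
        return new long[] {matches, comparisons};
    }

    // Join with an "index": b's keys are counted once in a map,
    // then each key in a needs a single lookup.
    // Returns {matches, lookups}.
    static long[] indexedJoin(int[] a, int[] b) {
        Map<Integer, Integer> index = new HashMap<>();
        for (int y : b) index.merge(y, 1, Integer::sum);
        long matches = 0, lookups = 0;
        for (int x : a) {
            lookups++;
            matches += index.getOrDefault(x, 0);
        }
        return new long[] {matches, lookups};
    }

    public static void main(String[] args) {
        int[] sim1 = {1, 1, 0, 2}; // TableA.SIM1 from the sample data
        int[] sim2 = {1, 0, 0, 2}; // TableB.SIM2 from the sample data
        long[] slow = nestedLoopJoin(sim1, sim2);
        long[] fast = indexedJoin(sim1, sim2);
        System.out.println("nested loop: " + slow[0] + " matches, " + slow[1] + " comparisons");
        System.out.println("indexed:     " + fast[0] + " matches, " + fast[1] + " lookups");
    }
}
```

With the four sample rows, both versions find the same 5 matching pairs, but the nested loop needs 16 comparisons while the map needs 4 lookups. At 0.4 million rows per table, the nested loop would need 400,000 x 400,000 = 160 billion comparisons.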

Think of it this way: indexing the columns is like putting a telephone book in alphabetical order. That makes it easy to find surnames. If the telephone book weren't sorted at all, how long would it take you to find someone's phone number?

Now multiply that by half a million.
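
The phone book analogy can be tried out in a few lines of Java. Again, this is just a sketch: PhoneBookSketch and its methods are hypothetical names, and a sorted int array stands in for the alphabetised book.

```java
// Hypothetical illustration of looking up one entry with and without ordering.
class PhoneBookSketch {

    // Ignores any ordering: check entries one by one until the target turns up.
    static int linearSteps(int[] entries, int target) {
        int steps = 0;
        for (int e : entries) {
            steps++;
            if (e == target) break;
        }
        return steps;
    }

    // Uses the ordering: halve the remaining range on each step (binary search).
    static int binarySteps(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1, steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] == target) return steps;
            if (sorted[mid] < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return steps;
    }

    public static void main(String[] args) {
        int n = 500_000;
        int[] book = new int[n];
        for (int i = 0; i < n; i++) book[i] = i; // already in sorted order
        int target = n - 1;                      // worst case for the linear scan
        System.out.println("linear: " + linearSteps(book, target) + " steps");
        System.out.println("binary: " + binarySteps(book, target) + " steps");
    }
}
```

For half a million entries, the linear scan can take up to 500,000 steps for a single lookup, while the binary search never needs more than about 20.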

cletus
I don't understand. What is an indexed table?
silverkid
You don't want to join every row in one table with every row in the other table (a Cartesian join) and only then filter for the rows that match. If you create an index on the table, it can be used to sort/join or to identify the specific rows that need to be matched, instead of processing the full set.
Xepoch
create index index_name on TableA (SIM1) is the correct syntax
silverkid
A: 

Do you have indexes set up on TableA.SIM1 and TableB.SIM2?

Francis Upton
A: 

When you perform an inner join between two tables containing 10000 rows each, it has to go through 10000*10000 row combinations (if the columns aren't indexed). If you want it to be fast, you have to index TableA.SIM1 and TableB.SIM2. This will bring down the query execution time.

To create the indexes, use the following commands (MySQL requires a name for each index):

create index index_name1 on TableA (SIM1);
create index index_name2 on TableB (SIM2);
Saeros