I've been trying to perform a join on two tables in MySQL, and the query will run for a minute or two before I run out of memory without getting results. I'm far from a database expert, so I'm not sure if I'm writing my queries poorly, if I have some MySQL settings poorly configured, or if I really should be doing something else entirely with my query. FYI the database is located locally on my machine.

I have a large table (~2 million records) where one of the columns is an ID into a small table (~3000 records). In case this matters, the ID is not unique in the large table but is unique in the small table. I've tried various flavors of the following query, but nothing seems to be working:

SELECT big_table.*,
       small_table.col
  FROM big_table
  LEFT OUTER JOIN small_table ON (big_table.small_id = small_table.id)

I'm doing a lot of analysis on the data that does require all 2 million rows, though not necessarily in a single query. Here are the results of my "show create table":

'big_table', 'CREATE TABLE `big_table` (
  `BIG_ID_1` varchar(12) NOT NULL,
  `BIG_ID_2` int(100) NOT NULL,
  `SMALL_ID` varchar(8) DEFAULT NULL,
  `TYPICAL_OTHER_COLUMN` varchar(3) DEFAULT NULL,
  ...
  PRIMARY KEY (`BIG_ID_1`, `BIG_ID_2`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1'

'small_table', 'CREATE TABLE `small_table` (
  `id` varchar(8) NOT NULL DEFAULT '''',
  `col` varchar(1) DEFAULT NULL,
  ...
  PRIMARY KEY (`id`),
  KEY `inx_id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1'

Here is the "explain" result for one of my candidate queries:

id  select_type  table        type    possible_keys   key      key_len  ref                         rows     extra
1   SIMPLE       big_table    ALL     NULL            NULL     NULL     NULL                        1962193       
1   SIMPLE       small_table  eq_ref  PRIMARY,inx_id  PRIMARY  10       db_name.big_table.SMALL_ID  1             
A:

You are selecting about 2 million records in a single query. Depending on the amount of data in each row it could be hundreds of megabytes of data that you are requesting.

Things you might want to try:

  • If you don't need all columns then query for the columns you need instead of using SELECT table.*.
  • See if you can move some (or all) of the processing to the database instead of fetching the data and processing it in the client.
  • Avoid reading the entire result set into memory in one go.
  • Process the rows in batches of a few thousand at a time rather than fetching all of them at once.
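
As an illustration of the second suggestion (doing the work in the database), here is a minimal Java sketch. The analysis itself is hypothetical, since the question does not say what processing is needed: it simply counts big_table rows per small_table.col value. The class and method names are made up for this example; the table and column names come from the question. With about 3000 rows in small_table and a one-character col, the result set is tiny no matter how large big_table is.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

class InDatabaseAggregate {
    // Hypothetical analysis: number of big_table rows per small_table.col value.
    // The join and the grouping both run inside MySQL; only the summary
    // rows travel to the client.
    static final String COUNT_BY_COL =
        "SELECT s.col, COUNT(*) AS n "
      + "FROM big_table b "
      + "LEFT OUTER JOIN small_table s ON b.SMALL_ID = s.id "
      + "GROUP BY s.col";

    static Map<String, Long> countByCol(Connection conn) throws SQLException {
        Map<String, Long> counts = new LinkedHashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(COUNT_BY_COL);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // The key is null when no small_table row matched
                // (or when col itself is NULL); LinkedHashMap allows a null key.
                counts.put(rs.getString(1), rs.getLong(2));
            }
        }
        return counts;
    }
}
```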
Mark Byers
Yes, of course restricting the result set with "WHERE" or "LIMIT" will speed it up, but then the query is not doing what I want. I really do want to process all 2 million records, so how would this help me? Or are you proposing that I turn this query into many small queries with "LIMIT 0, 10000", "LIMIT 10000, 10000", etc.? In my naivety I would have assumed that the latter shouldn't be necessary or helpful...
Michael McGowan
@Michael McGowan: What sort of processing? Have you considered whether it is possible to process the data in the database instead of bringing it onto the client for processing? I also think it would be a good idea to post the Java code you are using to execute the query and read the results. You may have a problem there. If possible, you want to process one row at a time rather than reading the entire result set in one go. Splitting it into batches may also be a good idea, but using LIMIT with an offset is not the optimal way to do this, because each query gets slower as the offset increases.
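
One common alternative to LIMIT with an offset is to page by key: remember the last primary-key value seen and ask for the rows after it. The sketch below is only an illustration under assumptions, not code from this thread. It assumes big_table's primary key (BIG_ID_1, BIG_ID_2) from the question, and the class, interface, and method names are invented for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class KeysetBatches {
    static final String SELECT_JOIN =
        "SELECT b.BIG_ID_1, b.BIG_ID_2, s.col "
      + "FROM big_table b LEFT OUTER JOIN small_table s ON b.SMALL_ID = s.id ";
    static final String ORDER_LIMIT = "ORDER BY b.BIG_ID_1, b.BIG_ID_2 LIMIT ?";

    static final String FIRST_PAGE = SELECT_JOIN + ORDER_LIMIT;
    // Rows strictly after the last (BIG_ID_1, BIG_ID_2) pair already processed.
    static final String NEXT_PAGE = SELECT_JOIN
      + "WHERE b.BIG_ID_1 > ? OR (b.BIG_ID_1 = ? AND b.BIG_ID_2 > ?) "
      + ORDER_LIMIT;

    // Hypothetical callback invoked once per row.
    interface RowHandler { void handle(String id1, int id2, String col); }

    static long processAll(Connection conn, int batchSize, RowHandler handler)
            throws SQLException {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        String lastId1 = null;
        int lastId2 = 0;
        long total = 0;
        while (true) {
            int seen = 0;
            String sql = (lastId1 == null) ? FIRST_PAGE : NEXT_PAGE;
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                int i = 1;
                if (lastId1 != null) {
                    ps.setString(i++, lastId1);
                    ps.setString(i++, lastId1);
                    ps.setInt(i++, lastId2);
                }
                ps.setInt(i, batchSize);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId1 = rs.getString(1);
                        lastId2 = rs.getInt(2);
                        handler.handle(lastId1, lastId2, rs.getString(3));
                        seen++;
                    }
                }
            }
            total += seen;
            // A short (or empty) page means the table is exhausted.
            if (seen < batchSize) return total;
        }
    }
}
```

Each batch query can seek into the primary-key index at the remembered position instead of reading and discarding all earlier rows, which is what a growing offset forces the server to do.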
Mark Byers
I think processing it in the database will accomplish much of what I wanted; thanks. For some reason I foolishly didn't realize the select would put the whole thing in memory.
Michael McGowan
@Michael McGowan: I think whether or not the entire result set is read into memory depends on what API you are using and how you are using it. I was asking you to post your code, so that I could see if you are doing it correctly. But if you can do the operation in the database then I think that's an even better solution.
Mark Byers
@Mark Byers: Thanks for the help. I believe I made a java.sql.Statement object and called the executeQuery method. Even though in this instance I believe I can indeed process within the database, it would be nice to be able to iterate over each record in a recordset.
Michael McGowan
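
For the java.sql.Statement case mentioned in the last comment, here is a minimal sketch of iterating over the rows without holding them all in memory. It assumes the driver is MySQL Connector/J, which by default buffers the whole result set on the client; a forward-only, read-only statement with a fetch size of Integer.MIN_VALUE is that driver's documented signal to stream rows one at a time. The class name, method name, and callback are invented for the example, and the column list is trimmed to three columns for illustration.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.function.Consumer;

class StreamingJoin {
    static final String JOIN_SQL =
        "SELECT b.BIG_ID_1, b.BIG_ID_2, s.col "
      + "FROM big_table b LEFT OUTER JOIN small_table s ON b.SMALL_ID = s.id";

    // Calls onCol with small_table.col for every joined row and returns the row count.
    static long streamRows(Connection conn, Consumer<String> onCol) throws SQLException {
        long rows = 0;
        try (Statement stmt = conn.createStatement(
                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // Connector/J-specific: request row-by-row streaming
            // instead of buffering the full result set in the client.
            stmt.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = stmt.executeQuery(JOIN_SQL)) {
                while (rs.next()) {
                    onCol.accept(rs.getString("col")); // may be null (no match)
                    rows++;
                }
            }
        }
        return rows;
    }
}
```

While a streaming result set is open, the connection cannot be used for other statements, so the rows have to be read to the end (or the result set closed) before issuing another query on the same connection.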