Dear all,

I know there are similar threads around, but this is the first time I have realized that query speed might affect me, so it's not easy for me to transfer the lessons from other people's problems to my own.

That being said, I have been using the following query successfully with smaller data, but when I run it on mildly large tables (about 120,000 records), I end up waiting for hours.

  INSERT INTO anothertable
  (id,someint1,someint1,somevarchar1,somevarchar1)
  SELECT DISTINCT md.id,md.someint1,md.someint1,md.somevarchar1,pd.somevarchar1
  FROM table1 AS md
  JOIN table2 AS pd
  ON (md.id = pd.id);

Tables 1 and 2 contain about 120,000 records. The query has been running for almost 2 hours now. Is this normal? Do I just have to wait? I really have no idea, but I am pretty sure it could be done better, since this is my very first try.

I have read about indexing, but I don't know yet what to index in my case.

Thanks for any suggestions, and feel free to point me to the very beginners' guides!

+1  A: 

Index the things you are joining on. In this case, create indexes on table1.id and table2.id. You should probably also have a foreign key from one table to the other, though without meaningful names, it is difficult to advise on the direction.
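As a minimal sketch of the indexing suggested above, using the table and column names from the question: the index names here are hypothetical, and the exact syntax may vary slightly between database systems.

```sql
-- Index the join column on both tables.
-- idx_table1_id and idx_table2_id are made-up names; any valid names work.
CREATE INDEX idx_table1_id ON table1 (id);
CREATE INDEX idx_table2_id ON table2 (id);
```

If id turns out to be unique in one of the tables, a unique index or primary key on that column would be the more natural choice there.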

Marcelo Cantos
+1  A: 

Assuming id is an auto-increment PK, the DISTINCT is useless, since each row would be unique. In that case, removing it should also boost performance, as SELECT DISTINCT can be quite slow.

And as previously mentioned, make sure the id field has an index on both tables (which it does if it's the PK).

reko_t
Sorry for not mentioning that. The id used here is not an auto_increment PK. The tables do have an auto_increment PK id, but it's not used here because I can't match the data using those two.
ran2
A: 

The only things you could index that might gain you some speed are the join keys (md.id and pd.id). As they are most likely primary keys, they should be indexed already. A clustered index might help.

Is the DISTINCT really necessary? It only removes duplicates, which can occur only if there are duplicate entries in your source tables. I think DISTINCT is the biggest problem here.
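One way to check whether the join actually uses an index is to put EXPLAIN in front of the SELECT part of the statement. This is only a sketch, assuming MySQL or PostgreSQL (the question does not name the database); other systems use a different keyword, and the output format differs between systems.

```sql
-- Shows the query plan instead of running the query.
-- Look for whether table1 and table2 are read via an index or a full scan.
EXPLAIN
SELECT DISTINCT md.id, md.someint1, md.someint1, md.somevarchar1, pd.somevarchar1
FROM table1 AS md
JOIN table2 AS pd
  ON md.id = pd.id;
```

Running it once with DISTINCT and once without should also show how much of the work comes from removing duplicates.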

Marks
Thanks. DISTINCT is necessary here because what I do is aggregate data. table1 contains observations from different points in time, while table2 contains some observation metadata. "anothertable" is a table that aggregates information, so every individual is only needed once, which is why I use DISTINCT.
ran2
A: 

Thanks, everybody, for helping out a rookie! Indexing as suggested above did speed it up quite a bit. It took 17 minutes before; now it's less than 4.

ran2