views:

55

answers:

4

I want to add only unique records into a database table. Is there any way to do this without having to do a select to see if the current record already exists, because that will get very time consuming as the table grows. I am open to all types of suggestions (if they exist).

Also as a possible alternative, perhaps there are some indexing options that make these selects faster, or perhaps the data can be ordered in such a way as to make the select statement execute faster.

I am using MySQL and Java.

+1  A: 

Somebody has to perform the check, unless you can deduce from the data itself whether it has been stored before; whether that's possible depends on the data and the use case.

Given that somebody has to do the check anyway, why not let the database check it for you? It will check its unique index (you DO have a unique index, and have it enforced, right?) and return an error if the record already exists.

In other words, just try the insert and catch any resulting error; if it is a duplicate-key error, skip the record.
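In MySQL you can even push the "catch the error" step into the statement itself. A minimal sketch, assuming a hypothetical `records` table whose `email` column defines uniqueness:

```sql
-- Hypothetical table: the unique index is what lets the database
-- reject duplicates for you.
CREATE TABLE records (
    id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_email (email)
);

-- A plain INSERT raises error 1062 (duplicate key) on a repeat.
-- INSERT IGNORE turns that error into a warning and skips the row,
-- so the Java side doesn't even need to catch an exception:
INSERT IGNORE INTO records (email) VALUES ('a@example.com');
INSERT IGNORE INTO records (email) VALUES ('a@example.com');  -- skipped
```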

Vinko Vrsalovic
I expressed myself incorrectly: there is of course a unique index, but what I'm inserting is not a record as such, rather a set of data that is equivalent to a record. But I guess I will do the select.
Ankur
In what way do "a set of data equivalent to a record" and a record differ?
Vinko Vrsalovic
The equivalent data has the same fields but not the unique ID, so it's not 100% identical, but from a user's point of view it holds the same data.
Ankur
A table can have more than one unique index and a unique index can be built using more than one column. Therefore, you can create another unique index using those fields that logically make two records equal. As stated above, the check has to be done somewhere - let the DB handle it.
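For instance, a composite unique index over the fields that make two records logically equal (the column names here are hypothetical):

```sql
-- Two rows count as "the same" when first_name, last_name and email
-- all match, regardless of the auto-generated id.
ALTER TABLE records
    ADD UNIQUE KEY uq_logical_record (first_name, last_name, email);
```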
dairemac
+3  A: 

The easiest way is to have the database enforce uniqueness (a unique index will most likely do fine) so that any duplicates are rejected. Your code then needs to ignore the resulting rejection errors.

Thorbjørn Ravn Andersen
+1  A: 

You might load the data into a temporary table, then from that table insert only the rows that don't already exist in the target table (i.e. where tmp.id != id), and after that truncate the temporary table.

If transactional inserts are not important to you, you can simply create a unique constraint.

For faster access, create a primary key; in MySQL (InnoDB) this creates a clustered index for your table, and lookups will be very fast.
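The staging-table idea above might look like this in MySQL (table and column names are hypothetical):

```sql
-- Bulk-load into a staging table, copy over only the rows not
-- already present in the target, then clear the staging table.
CREATE TEMPORARY TABLE tmp_records LIKE records;

-- ... bulk-load the incoming data into tmp_records ...

INSERT INTO records (first_name, last_name, email)
SELECT t.first_name, t.last_name, t.email
FROM tmp_records t
LEFT JOIN records r ON r.email = t.email
WHERE r.id IS NULL;           -- keep only rows with no existing match

TRUNCATE TABLE tmp_records;
```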

Vash
Thanks will look into these ideas
Ankur
+1  A: 

There are two possibilities.

  1. Assume it's not a duplicate, so perform an INSERT and cope with the error by doing an UPDATE.
  2. Assume it is a duplicate, so perform an UPDATE and cope with the error by doing an INSERT.

Which is better depends on the relative probabilities.
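In MySQL, both variants can be collapsed into a single statement. A sketch, assuming a hypothetical `records` table with a unique key on `email`:

```sql
-- Inserts the row if the key is new, otherwise runs the UPDATE
-- clause on the existing row, so no error handling is needed.
INSERT INTO records (email, visits)
VALUES ('a@example.com', 1)
ON DUPLICATE KEY UPDATE visits = visits + 1;
```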

EJP
That's a good solution.
Ankur