views:

962

answers:

6

I'm going to running thousands of queries into SQL and I need to prevent the duplication of field 'domain'. Never had to do this before and any help would be appreciated.

+9  A: 

You probably want to create a "UNIQUE" constraint on the field "Domain" - this constraint will raise an error if you create two rows that have the same domain in the database. For an explanation, see this tutorial in W3C school -

http://www.w3schools.com/sql/sql_unique.asp

If this doesn't solve your problem, please clarify the database you have chosen to use (MySql?).

NOTE: This constraint is completely separate from your choice of PHP as a programming language, it is a SQL database definition thing. A huge advantage of expressing this constraint in SQL is that you can trust the database to preserve the constraint even when people import / export data from the database, your application is buggy or another application shares the database.

Tom Leys
A: 

I'm not really sure I understood your question, but perhaps you are looking for SQL's "UNIQUE" constraint. If the query tries to insert a pre-existing value to a field, you (PHP) will be notified about this constraint breach.

Henrik Paul
A: 

There are a bunch of ways to approach this. You could set a unique constraint (like a primary key) on that column. This will cause the insert to fail if that domain has also been inserted. You could also insert all of the duplicate domains and just delete them later on. This will work well if not that many of the domains are duplicated. There are a few questions posted already on finding duplicate rows.

Dana the Sane
+1  A: 

If this is an absolute database integrity requirement (It's not likely to change, nor does existing data have this problem), I would enforce it at the database with a unique constraint.

As far as detecting it before or after the attempt in order to notify the user, there are a number of techniques which could be used.

Cade Roux
A: 

This can be doen with sql, rather than with php.

i am assuming that you are using MySQl, but the same principles will work with different databases.

make the Domain column the primary key. (makes sense, as it has to unique.)

Rather than using INSERT, use UPDATE.

if the primary key already exists (that you are trying to put into the table), update will update the existing tuple, rather than creating a new tuple.

so you will overwrite existing data if it is different, and if it is identical the update will be skipped.

Bingy
A: 

Where is the data coming from? Is this something you only want to run once, or a couple of times, or often? If the domain-value already exists, do you just want to skip the insert or do something else (ie increment a counter)?

Depending on your answers, there are many possible solutions:

  1. Pre-sort your data, eliminate duplicates, then insert (assumes relatively static data, empty table to begin with)

  2. Use an associative array in PHP as a local domain-value cache (if table already contains data, start by reading existing content; not thread-safe, but works if it only runs once at a time)

  3. Make domain a UNIQUE column and write wrapper code to handle return errors

  4. Make domain a UNIQUE or PRIMARY KEY column and use an ON DUPLICATE KEY clause: INSERT INTO mydata ( domain, count ) VALUES ( 'firstdomain', 1 ), ( 'seconddomain', 1 ), ( 'thirddomain', 1 ) ON DUPLICATE KEY UPDATE count = count+1

  5. Insert all data into the table, then remove duplicates

Note that batching inserts (ie using multiple value clauses per statement) can be significantly faster.

Hugh Bothwell