I've got a table of URLs and I don't want any duplicate URLs. How do I check whether a given URL is already in the table using PHP/MySQL?
Use a SELECT statement, or put a unique index on the column and let the INSERT fail.
You could do this query:
SELECT url FROM urls WHERE url = 'http://asdf.com' LIMIT 1
Then check whether mysql_num_rows() == 1 to see if it exists. (The mysql_* functions were removed in PHP 7; with mysqli the equivalent is mysqli_num_rows().)
I don't know the syntax for MySQL, but all you need to do is wrap your INSERT in an IF statement that queries the table to see whether a record with the given URL EXISTS. If it exists, don't insert a new record.
In MSSQL you can do this:
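A minimal sketch of the SELECT-then-check approach. It is written in Python with the standard sqlite3 module standing in for PHP/MySQL so it runs on its own; the helper name url_exists is hypothetical, and the table and column names follow the query above. The placeholder keeps the URL out of the SQL string itself.

```python
import sqlite3

def url_exists(conn, url):
    """Return True if `url` is already stored in the urls table."""
    # The ? placeholder passes the URL as data, never as SQL text.
    row = conn.execute(
        "SELECT 1 FROM urls WHERE url = ? LIMIT 1", (url,)
    ).fetchone()
    return row is not None

# Throwaway in-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT)")
conn.execute("INSERT INTO urls (url) VALUES (?)", ("http://asdf.com",))

print(url_exists(conn, "http://asdf.com"))
print(url_exists(conn, "http://example.com"))
```

Keep in mind that a check like this only tells you what was true at the moment of the query.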
IF NOT EXISTS (SELECT 1 FROM YOURTABLE WHERE URL = 'URL')
INSERT INTO YOURTABLE (...) VALUES (...)
If you don't want to have duplicates, you can do the following:
- add a uniqueness constraint
- use the "REPLACE" or "INSERT ... ON DUPLICATE KEY UPDATE" syntax
If multiple users can insert data into the DB, the method suggested by @Jeremy Ruten can lead to an error: after you have performed the check, someone else can insert the same data into the table before your own INSERT runs.
If you just want a yes or no answer, this syntax returns 1 or 0 directly, and EXISTS can stop at the first matching row, so it should perform well:
select if(exists (select url from urls where url = 'http://asdf.com'), 1, 0) from dual
If you just want to make sure there are no duplicates, add a unique index to the url field. That way there is no need to explicitly check whether the URL exists: just insert as normal, and if it is already there the insert will fail with a duplicate key error.
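A rough sketch of the "insert and let it fail" pattern, again using Python's sqlite3 in place of PHP/MySQL so it is self-contained. The function name insert_url and the index name are illustrative assumptions. SQLite raises sqlite3.IntegrityError when the unique index rejects a row; MySQL reports the same situation as error 1062 (duplicate entry), which you would catch in your PHP driver instead.

```python
import sqlite3

def insert_url(conn, url):
    """Try to insert `url`; return False if the unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO urls (url) VALUES (?)", (url,))
        return True
    except sqlite3.IntegrityError:
        # The database refused the row because the URL is already present.
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT NOT NULL)")
conn.execute("CREATE UNIQUE INDEX unique_url ON urls (url)")

print(insert_url(conn, "http://asdf.com"))
print(insert_url(conn, "http://asdf.com"))
```

Because the database enforces the rule, there is no gap between a check and the insert for another user to slip into.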
To guarantee uniqueness you need to add a unique constraint. Assuming your table name is "urls" and the column name is "url", you can add the unique constraint with this alter table command:
alter table urls add constraint unique_url unique (url);
The ALTER TABLE will fail with a duplicate-entry error if the table already contains duplicate URLs, so clean those up first.
Are you concerned purely about URLs that are the exact same string? If so, there is a lot of good advice in the other answers. Or do you also have to worry about canonicalization?
For example, http://google.com and http://go%4fgle.com refer to the same URL (%4f decodes to the letter O, and host names are case-insensitive), but any of the database-only techniques would store them as two distinct rows. If this is an issue, preprocess the URLs to resolve any character escape sequences before they reach the database.
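Before adding the constraint it can help to list which URLs are already duplicated. The sketch below uses Python's sqlite3 as a stand-in for MySQL; the GROUP BY ... HAVING query itself is plain SQL that should work unchanged in MySQL. The helper name find_duplicate_urls is hypothetical.

```python
import sqlite3

def find_duplicate_urls(conn):
    """Return (url, count) pairs for every URL stored more than once."""
    return conn.execute(
        "SELECT url, COUNT(*) FROM urls "
        "GROUP BY url HAVING COUNT(*) > 1 ORDER BY url"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT)")
conn.executemany(
    "INSERT INTO urls (url) VALUES (?)",
    [("http://a.example",), ("http://a.example",), ("http://b.example",)],
)

for url, count in find_duplicate_urls(conn):
    print(url, count)
```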
Depending on where the URLs are coming from, you will also have to worry about query parameters and whether they are significant in your application.
If you want to insert URLs into the table, but only those that don't exist already, you can add a UNIQUE constraint on the column and add IGNORE to your INSERT query so that you don't get an error.
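One possible way to preprocess URLs before storing them, sketched in Python with the standard urllib.parse module. The function name canonicalize and the specific rules (lower-case the scheme and host, decode escapes in the host, use "/" for an empty path, drop user info and the fragment, keep the query string untouched) are assumptions for illustration; which parts are significant depends on your application.

```python
from urllib.parse import urlsplit, urlunsplit, unquote

def canonicalize(url):
    """Return a normalized form of `url` suitable for duplicate checks."""
    parts = urlsplit(url)
    # Decode percent-escapes in the host, then lower-case it.
    host = unquote(parts.hostname or "").lower()
    # Keep an explicit port if one was given; user info is dropped.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Treat an empty path as "/"; keep the query as-is; drop the fragment.
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path or "/", parts.query, "")
    )

print(canonicalize("http://go%4fgle.com") == canonicalize("http://google.com"))
```

Store the canonical form in the column that carries the unique constraint, so that the database compares like with like.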
Example: INSERT IGNORE INTO urls
SET url = 'url-to-insert'
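A small sketch of the ignore-on-duplicate pattern, using Python's sqlite3 in place of PHP/MySQL. Note the swap: SQLite spells it INSERT OR IGNORE, whereas MySQL uses INSERT IGNORE as in the example above. The helper name insert_if_new is hypothetical; it uses the affected-row count to tell whether a row was actually added.

```python
import sqlite3

def insert_if_new(conn, url):
    """Insert `url` unless it is already present; return True if a row was added."""
    with conn:  # commit on success
        cur = conn.execute(
            "INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,)
        )
    # An ignored duplicate changes no rows.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT NOT NULL UNIQUE)")

print(insert_if_new(conn, "http://asdf.com"))
print(insert_if_new(conn, "http://asdf.com"))
```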
The answer depends on whether you want to know when an attempt is made to enter a record with a duplicate field. If you don't care, use the "INSERT ... ON DUPLICATE KEY UPDATE" syntax (which still relies on a unique key on the column), as this will make your attempt quietly succeed without creating a duplicate.
If, on the other hand, you want to know when such an event happens and prevent it, use a plain INSERT against the unique key constraint, which will cause the attempted insert/update to fail with a meaningful error.