tags:

views:

169

answers:

6

I have a large list of email addresses, and I need to determine which of them are not already in my database. Can I construct a query that will tell me this information?

I am using SQL Server 2000, if it requires non-standard extensions

+4  A: 

For a huge list, I would recommend loading that list into a second table (e.g., TEMP_EMAIL_ADDRESS), then use:

SELECT
  EMAIL
FROM
  TEMP_EMAIL_ADDRESS
WHERE
  EMAIL NOT IN (SELECT EMAIL FROM EMAIL_ADDRESS)

Data Transformation

If your data is in a text file named emails.txt (one row per line), you can create the insert statements using the following DOS command:

FOR /F %i IN (emails.txt) DO echo INSERT INTO TEMP_EMAIL_ADDRESS (EMAIL) VALUES ('%i') >> insert-email.sql

That command will create a new file called insert-email.sql in the current directory, containing all the inserts you need to inject the existing list of e-mail addresses into the database.

Dave Jarvis
there should be bold around the "large list" part.
TheTXI
I think this is backwards. It would tell me which addresses _in the database_ are not _in my list_. But I want to know which addresses _in my list_ are not _in the database_
Gabe Moothart
It was backwards. Fortunately the fix is easy: reverse TEMP_EMAIL_ADDRESS with EMAIL_ADDRESS.
Dave Jarvis
+2  A: 

Why don't you import the email addresses as a table and just check that way against whatever table you already have?

Other than that, you could pass in the list of email addresses as an XML datatype or a table datatype and query against that.

TheTXI
+2  A: 

make a temporary table, load the e-mailaddresses into the temporary table and then do a NOT IN query such as

SELECT emailaddress FROM temp_table WHERE emailaddress NOT IN (SELECT emailaddress FROM table)

you could extend that with an INSERT INTO

jao
+1  A: 

Create a table "newaddresses" that has all of your query email addresses.

Do something like this:

(off the top of my head, syntax may not be 100% correct; hopefully gives you the right idea)

select n.email
from newaddresses n
left join alreadysubscribed a
    on n.email = a.email
where a.email is null

This joins the two tables on the email addresses, and then gives you a list of only the entries where the join failed.

Aric TenEyck
I'd vote for this one, since it does it without a sub-query, and so will be much more efficient
rwmnau
A: 

Maybe you could create an index on emails:

CREATE INDEX email_index ON my_table(email)

and then simply check with

SELECT email FROM my_table WHERE my_table.email = some_email
dpetek
A: 

Hmm... Nobody mentioned IF EXISTS. That would have been my first thought:

IF NOT EXISTS (SELECT * FROM MyTable where MyTable.email = "[email protected]

Although, perhaps the performance is no different from NOT IN.

Hope this helps.

EoRaptor013