Say I have an array of strings in a PHP array called $foo with a few hundred entries, and a MySQL table 'people' that has a field named 'name' with a few thousand entries. What is an efficient way to find out which strings in $foo aren't a 'name' in any entry in 'people' without submitting a query for every string in $foo?

So I want to find out what strings in $foo have not already been entered in 'people.'

Note that it is clear that all of the data will have to be on one box at some point. The goal is to do this while minimizing both the number of queries and the amount of PHP processing.

+1  A: 

I'd put your $foo data in another table and do a LEFT OUTER JOIN with your names table. Otherwise, there aren't a lot of great ways to do this that don't involve iteration at some point.

UltimateBrent
How would you use the LEFT JOIN to return the $foo data *not* in the names table?
Steven Noble
Sorry, LEFT OUTER JOIN: http://dev.mysql.com/doc/refman/5.0/en/join.html
UltimateBrent
A: 

I'm not sure there is a more efficient way to do this other than to submit all the strings to the database.

Basically there are two options: get a list of all the strings in MySQL and pull them into PHP and do the comparisons, or send the list of all the strings to the MySQL server and let it do the comparisons. MySQL is going to do the comparisons much faster than PHP, unless the list in the database is a great deal smaller than the list in PHP.

You can either build one long select statement or create a temporary table, but either way you're pushing all the data to the database.

acrosman
Can you give an example of what the long select statement might look like?
Steven Noble
On further reflection, a long select probably will not work; you'll need the temp table idea that was accepted as the right answer.
acrosman
A: 

For a few hundred entries, just use array_diff() or array_diff_assoc().
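
A minimal sketch of the array_diff() route, assuming the names have already been pulled out of 'people' with a single SELECT. The $names array and the sample values are made up for illustration:

```php
<?php
// $foo: the candidate strings from the question.
// $names: assumed to hold every people.name value, fetched beforehand
// with one "SELECT name FROM people" query (hard-coded here).
$foo   = array('alice', 'bob', 'carol');
$names = array('bob', 'dave');

// array_diff() keeps the entries of $foo that do not appear in $names.
// Keys from $foo are preserved, so reindex with array_values().
$missing = array_values(array_diff($foo, $names));
```

Keep in mind that array_diff() compares strings case-sensitively, while MySQL's default collations are case-insensitive, so the two can disagree on names that differ only in case.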

Andrew
A: 

The best I can come up with without using a temporary table is:

 $list = join(",", $foo);

// fetch all rows of the result of 
// "SELECT name FROM people WHERE name IN($list)" 
// into an array $result

$missing_names = array_diff($foo, $result);

Note that if $foo contains user input it would have to be escaped first.
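
One way to sidestep manual quoting and escaping is to use placeholders. The following is only a sketch under assumptions: it uses PDO, the function names build_in_query() and find_missing_names() are hypothetical, and $pdo is assumed to be an already open connection.

```php
<?php
// Build "SELECT name FROM people WHERE name IN (?,?,...)" with one
// placeholder per entry, so the driver handles quoting and escaping.
// Assumes $foo is non-empty, since "IN ()" is a syntax error in MySQL.
function build_in_query(array $foo)
{
    $placeholders = implode(',', array_fill(0, count($foo), '?'));
    return "SELECT name FROM people WHERE name IN ($placeholders)";
}

// Hypothetical helper: $pdo is assumed to be an open PDO connection.
function find_missing_names(PDO $pdo, array $foo)
{
    if (count($foo) === 0) {
        return array();
    }
    $stmt = $pdo->prepare(build_in_query($foo));
    $stmt->execute(array_values($foo));            // bind values by position
    $result = $stmt->fetchAll(PDO::FETCH_COLUMN);  // flat array of names
    return array_diff($foo, $result);
}
```

This is still one query for the whole list; only the way the values reach the server changes.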

jakber
Ack! No placeholders! Not quoted! No escapes!
bart
Well, commented lines are obviously pseudo-code. And I mentioned the lack of escapes, did I not? Not quoting was an omission though, my bad.
jakber
A: 
$query = 'SELECT name FROM table WHERE name != '.implode(' OR name != ', $foo);

Yeash, that doesn't look like it would scale well at all.

cole
That should be "AND", not "OR".
Lucas Oman
A: 
CREATE TEMPORARY TABLE PhpArray (name varchar(50));

-- you can probably do this more efficiently
INSERT INTO PhpArray VALUES ($foo[0]), ($foo[1]), ...;

-- names from $foo (now in PhpArray) that have no matching row in people
SELECT PhpArray.name
FROM PhpArray
 LEFT OUTER JOIN people USING (name)
WHERE people.name IS NULL;
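
Driven from PHP, the temporary-table approach might look roughly like the sketch below. It assumes PDO with an open MySQL connection in $pdo; the helper names build_insert_sql() and missing_via_temp_table() are hypothetical. The whole array goes to the server in one multi-row INSERT, and the final query selects the $foo entries that have no matching row in people.

```php
<?php
// Build a multi-row INSERT with one "(?)" group per name.
// Assumes $foo is non-empty.
function build_insert_sql(array $foo)
{
    $rows = implode(', ', array_fill(0, count($foo), '(?)'));
    return "INSERT INTO PhpArray (name) VALUES $rows";
}

// Hypothetical helper: $pdo is assumed to be an open PDO connection to MySQL.
function missing_via_temp_table(PDO $pdo, array $foo)
{
    if (count($foo) === 0) {
        return array();
    }
    $pdo->exec('CREATE TEMPORARY TABLE PhpArray (name varchar(50))');
    $pdo->prepare(build_insert_sql($foo))->execute(array_values($foo));

    // Anti-join: keep PhpArray rows that found no partner in people.
    $sql = 'SELECT PhpArray.name FROM PhpArray'
         . ' LEFT OUTER JOIN people USING (name)'
         . ' WHERE people.name IS NULL';
    return $pdo->query($sql)->fetchAll(PDO::FETCH_COLUMN);
}
```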
Bill Karwin
+1  A: 

What about the following:

  1. Get the list of names that are already in the db, using something like: SELECT name FROM people WHERE name IN (imploded list of names)
  2. Run array_diff() on $foo against that list and insert each item it returns.

If you want to do it completely in SQL:

  1. Create a temp table with every name in the PHP array.
  2. Perform a query to populate a second temp table that will only include the new names.
  3. Do an INSERT ... SELECT from the second temp table into the people table.

Neither will be terribly fast, although the second option might be slightly faster.

Darryl Hein