views:

97

answers:

4

Hi, how do I get the fields of a column that's without a match in another column?

I tried:

SELECT table1.page_title 
FROM table1, table2 
WHERE table1.page_title != table2.page_title

It produces a lot of duplicate fields so I did a:

SELECT DISTINCT table1.page_title 
FROM table1, table2 
WHERE table1.page_title != table2.page_title

but it just hangs.

Any help would be greatly appreciated, thanks!

P.S. I'm doing this so I could create an exclude list for mediawiki's MWDumper tool. I need it so that when I import the outputted sql, my current wiki entries will not be overwritten.

EDIT: Yes they're 2 different tables. Each has about 70,000+ records Also why are my queries slow? I'd appreciate it if someone could clarify so I could learn why :) Thanks again!

A: 

You could try a self-join, which I have used in the past, but I am not sure if that would be any faster, as I don't use MySQL. This page might give you some insight: http://www.xaprb.com/blog/2006/10/11/how-to-delete-duplicate-rows-with-sql/

rpkelly
A: 

Are a and b different tables, both having a "page_title" column?

If so, try this:

SELECT DISTINCT page_title FROM a
WHERE page_title NOT IN (SELECT page_title FROM b)

If all you're interested in is removing duplicates (if you only have one table), then there are several ways to do it, two of which are:

SELECT DISTINCT page_title FROM a

or

SELECT page_title FROM a
GROUP BY page_title

The GROUP BY option is stronger albeit slower - you can add a HAVING clause to choose only those titles that appear e.g. more than twice:

SELECT page_title FROM a
GROUP BY page_title
HAVING COUNT(page_title) > 2

Hope that helps

(Thanks Aaron F for the comment)

Roee Adler
your first suggestion worked Rax, thanks! :)
alimango
Accepted answer with no up-votes, that's sad...
Roee Adler
A: 

Slight improvement on Rax's answer:

SELECT DISTINCT a.page_title FROM a
WHERE a.page_title NOT IN (SELECT DISTINCT b.page_title FROM b)

Do your tables have an index on their page_title column? What does the explain plan say for your queries?

I can't imagine you needing an index anyway, given only 70k rows in your table.

Aaron F.
Thanks, the first query must obviously have DISTINCT
Roee Adler
A: 

You could do this with a join:

SELECT DISTINCT table1.page_title 
FROM table1
LEFT JOIN table2 
    ON table1.page_title = table2.page_title
WHERE table2.page_title is null

If it's slow, add an index on (table2.page_title)

Andomar