tags:

views:

54

answers:

3

I have written a psychological testing application, in which the user is presented with a list of words, and s/he has to choose ten words which very much describe himself, then choose words which partially describe himself, and words which do not describe himself. The application itself works fine, but I was interested in exploring the meta-data possibilities: which words have been most frequently chosen in the first category, and which words have never been chosen in the first category. The first query was not a problem, but the second (which words have never been chosen) leaves me stumped.

The table structure is as follows:

table words: id, name
table choices: pid (person id), wid (word id), class (value between 1-6)

Presumably the answer involves a left join between words and choices, but there has to be a modifying statement - where choices.class = 1 - and this is causing me problems. Writing something like

select words.name
from words left join choices
on words.id = choices.wid
where choices.class = 1
and choices.pid = null

causes the database manager to go on a long trip to nowhere. I am using Delphi 7 and Firebird 1.5.

TIA, No'am

+1  A: 

Something like that should do the trick:

SELECT name
FROM words
WHERE id NOT IN
   (SELECT DISTINCT wid   -- DISTINCT is actually redundant
    FROM choices 
    WHERE class == 1)
mjv
This works, but it's slow. There are 5103 rows in the 'choices' table with class = 1, and the query returns 24 rows in 36 seconds. At the moment I'm using two queries: the first returns all the words which are used in 'choices', then I build a string of these values and query the words table for words which aren't in this string. This takes a few seconds at most. I had hoped that a sql-pure solution would take less time that my multiple stage method.
No'am Newman
Such slowness is abnormal. A likely reason for this slowness is the `lack of a proper index` on the choices table. I suggest: "CREATE INDEX ClassWidIdx ON choices(class, wid)" (and, [unnecessary for this performance issue, but possibly of help in other cases] since the only other column in words is the short pid, you can also add pid, at the end of this index, hence making it a covering index, i.e. one that FireBird can use to return the pid as well without requiring the indirection back to the table itself).
mjv
I added the suggested index (wid, class), ran the query again - and got the answer back in 62 ms. What's that - 500 times faster? THANKS.
No'am Newman
A: 
SELECT words.name
FROM
    words
    LEFT JOIN choices ON words.id = choices.wid AND choices.class = 1
WHERE choices.pid IS NULL

Make sure you have an index on choices (class, wid).

Lukáš Lalinský
+1  A: 

Maybe this is a bit faster:

SELECT w.name
FROM words w
WHERE NOT EXISTS
   (SELECT 1
    FROM choices c 
    WHERE c.class = 1 and c.wid = w.id)
Douglas Tosi
This works twice as fast as mjv's answer - only 31 milliseconds (after having added the index on class,wid).
No'am Newman