tags:

views:

99

answers:

4

It seems pretty simple i have a table 'question' which stores a list of all questions and a many to many table which sits between 'question' and 'user' called 'question_answer'.

Is it possible to do one query to get back all questions within questions table and the ones a user has answered with the un answered questions being NULL values

question:

| id | question |

question_answer:

| id | question_id | answer | user_id |

I am doing this query, but the condition is enforcing that only the questions answered are returned. Will i need to resort to nested select?

SELECT * FROM `question` LEFT JOIN `question_answer`
ON question_answer.question_id = question.id
WHERE user_id = 14583461 GROUP BY question_id
A: 

Shouldn't you use RIGHT JOIN?

SELECT * FROM question_answer RIGHT JOIN question ON question_answer.question_id = question.id
WHERE user_id = 14583461 GROUP BY question_id
Petr Kozelek
+1  A: 

if user_id is in the outer joined to table then your predicate user_id = 14583461 will result in not returning any rows where user_id is null i.e. the rows with unanswered questions. You need to say "user_id = 14583461 or user_id is null"

Lord Peter
yeah that works only if every question has a row in question_answer table.
bertsisterwanda
A: 

something like this might help (http://pastie.org/1114844)

drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varchar(32) not null
)engine=innodb;

drop table if exists question;
create table question
(
question_id int unsigned not null auto_increment primary key,
ques varchar(255) not null
)engine=innodb;

drop table if exists question_ans;
create table question_ans
(
user_id int unsigned not null,
question_id int unsigned not null,
ans varchar(255) not null,
primary key (user_id, question_id)
)engine=innodb;

insert into users (username) values 
('user1'),('user2'),('user3'),('user4');

insert into question (ques) values 
('question1 ?'),('question2 ?'),('question3 ?');

insert into question_ans (user_id,question_id,ans) values
(1,1,'foo'), (1,2,'mysql'), (1,3,'php'),
(2,1,'bar'), (2,2,'oracle'),
(3,1,'foobar');

select
 u.*,
 q.*,
 a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
order by
 u.user_id,
 q.question_id;

select
 u.*,
 q.*,
 a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
where
 u.user_id = 2
order by
 q.question_id;

edit: added some stats/explain plan & runtime:

runtime: 0.031 (10,000 users, 1000 questions, 3.5 million answers)

select count(*) from users
count(*)
========
10000 

select count(*) from question
count(*)
========
1000 

select count(*) from question_ans
count(*)
========
3682482 

explain
select
 u.*,
 q.*,
 a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
where 
 u.user_id = 256
order by
 u.user_id,
 q.question_id;


id  select_type table   type    possible_keys   key         key_len ref                         rows    Extra
==  =========== =====   ====    =============   ===         ======= ===                         ====    =====
1   SIMPLE          u   const   PRIMARY         PRIMARY         4   const                       1       Using filesort
1   SIMPLE          q   ALL                                                                     687 
1   SIMPLE          a   eq_ref  PRIMARY         PRIMARY         8   const,foo_db.q.question_id  1   
f00
A cross join is not required here and will be expensive on even modest datasets.
ar
i doubt it will be that expensive considering they are specifying a single user. 1 user * n questions * n ans is kinda trivial.
f00
My apologies, here it's effectively an inner join as the join condition includes a predicate on the question table.
ar
explain the group by to me and have you tried your query on more than one user_id ? (i'ts nonsense) http://paste.pocoo.org/show/254375/
f00
@f00, the group by was left in from the original query as provided. The cross join is not pretty to me and increases the chance of a nasty Cartesian join if someone inexperienced was to change the query later on.
ar
A: 

Move the user_id predicate into the join condition. This will then ensure that all rows from question are returned, but only rows from question_answer with the specified user ID and question ID.

SELECT * FROM question
    LEFT JOIN question_answer ON question_answer.question_id = question.id 
                              AND user_id = 14583461
ORDER BY user_id, question_id
ar
i'm sry but i can't believe you've accpeted this answer http://paste.pocoo.org/show/254375/
f00
I'm removed the `GROUP BY` as it is unnecessary (it was only there as it was in the original query). And added an order by clause.To get questions for more than one user, you can just change the join condition to use `user_id IN (...)`.
ar
@ar - it still doesnt work properly, please take a look at the following: http://i37.tinypic.com/24y8w2s.jpg
f00