I have a table in my database, user_answers, that stores users' answers to a series of questions, with columns user_id, question_id, answer_id and text_entry. Question text and answer text (if any) are stored in lookup tables. There are three types of questions: single-answer, multiple-answer and text-entry. So a single user might have entries like the following in the user_answers table:

user_id    question_id    answer_id    text_entry
-------    -----------    ---------    ----------
  123          100          1010         (null)
  123          200          2010         (null)
  123          200          2030         (null)
  123          300          3000       "code 789"

Let's say the questions_text table has:

question_id         text
-----------    -------------
    100           "Gender"
    200         "Interests"
    300         "Your code"

and the answers_text table has:

answer_id       text
---------    -----------
   1010       "Female"
   1020        "Male"
   2010       "Sports"
   2020      "Computers"
   2030       "Movies"
   3000        (null)

I want to extract the data into a CSV file with one line per user_id showing the answers, something like this:

User,Gender,Sports,Computers,Movies,Code
123,Female,1,0,1,code 789

I know how to generate the CSV file via SQL*Plus (I only have access to the DB via SQL*Plus for reasons beyond my control...) but I don't know how to generate the PL/SQL statement.

In PL/SQL I know I can generate a pivot of the Gender question by doing

SELECT
   user_id || ',' ||
   MIN(DECODE(question_id, '100', (SELECT text FROM answers_text WHERE answers_text.answer_id = user_answers.answer_id)))
FROM user_answers
GROUP BY user_id
ORDER BY user_id
;

(I'm not an SQL guy, so this is copied off the internets!)

This code is (at least as far as my testing is telling me) good for the single-answer questions but will not work on multiple-answer or text-entry type questions.

I saw some stuff online about using the case statement in PL/SQL like so:

MAX(CASE WHEN question_id = '200' AND answer_id = '2010' THEN '1' ELSE '0' END)

...but I can't figure out how to get the answers into columns. And all the SO questions I can find that might be related are specific to SQL Server.

Is there a way to generate my desired output from a single PL/SQL statement? Preferably written in a way that does not depend on the data in the tables as we have a number of databases that this might need to be run on.

+1  A: 

To accomplish what you are looking for (and not be specific to this data) I believe you are going to need some extra fields in your tables. For example, you will need to know which questions are Single-Answer, Multi-Answer, and Text-Entry without having to look at the data. You will also need to know which answers are possible for your Multi-answer questions without having to link through the data. From there, you can loop through the meta information about each question / answer combination and build yourself a query to run that will return the data in your desired format. Something like:

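/* Create tables with data. Note the new columns: question_type and answer_set_id on questions_text (whose text column is named question_text here), and answer_set_id on answers_text */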
create table user_answers
as
 select 123 user_id, 100 question_id, 1010 answer_id, null text_entry from dual
 union all
 select 123 user_id, 200 question_id, 2010 answer_id, null text_entry from dual
 union all
 select 123 user_id, 200 question_id, 2030 answer_id, null text_entry from dual
 union all
 select 123 user_id, 300 question_id, 3000 answer_id, 'code 789' text_entry from dual;

create table questions_text
as
 select 100 question_id, 'Gender' question_text, 'S' question_type, 1000 answer_set_id from dual
 union all
 select 200 question_id, 'Interests' question_text, 'M' question_type, 2000 answer_set_id from dual
 union all
 select 300 question_id, 'Your code' question_text, 'T' question_type, 3000 answer_set_id from dual;

create table answers_text
as
 select 1010 answer_id, 'Female' text, 1000 answer_set_id from dual
 union all
 select 1020 answer_id, 'Male' text, 1000 answer_set_id from dual
 union all
 select 2010 answer_id, 'Sports' text, 2000 answer_set_id from dual
 union all
 select 2020 answer_id, 'Computers' text, 2000 answer_set_id from dual
 union all
 select 2030 answer_id, 'Movies' text, 2000 answer_set_id from dual
 union all
 select 3000 answer_id, null text, 3000 answer_set_id from dual;


/* PL/SQL for creating SQL statement to return data in desired format */
declare
 v_sql VARCHAR2(32767);
begin
 v_sql := 'select ua.user_id "User",';
 FOR question IN (
  select question_id, question_text, question_type, answer_set_id
  from questions_text
  order by question_id -- keeps the column order stable between runs
 )
 LOOP
  IF question.question_type = 'M'
  THEN
   FOR answer IN (
    select answer_id, text
    from answers_text
    where answer_set_id = question.answer_set_id
    order by answer_id -- keeps the column order stable between runs
   )
   LOOP
    v_sql := v_sql||chr(10)||'max(case when ua.question_id = '||question.question_id||' and ua.answer_id = '||answer.answer_id||' then 1 else 0 end) "'||answer.text||'",';
   END LOOP;
  ELSIF question.question_type = 'S'
  THEN
   v_sql := v_sql||chr(10)||'min(case when ua.question_id = '||question.question_id||' then at.text end) "'||question.question_text||'",';
  ELSIF question.question_type = 'T'
  THEN
   v_sql := v_sql||chr(10)||'min(case when ua.question_id = '||question.question_id||' then ua.text_entry end) "'||question.question_text||'",';
  END IF;
 END LOOP;
 v_sql := rtrim(v_sql,',');
 v_sql := v_sql||' from
 user_answers ua
 inner join questions_text qt
  on qt.question_id = ua.question_id
 inner join answers_text at
  on at.answer_id = ua.answer_id
 group by
  ua.user_id';
 -- replace dbms_output with code to write file;
 -- in SQL*Plus, run SET SERVEROUTPUT ON first or nothing is printed
 dbms_output.put_line(v_sql);
END;
/
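
For reference, here is roughly what the block above prints for the sample data. This is a sketch, assuming the two cursors happen to return rows in question_id and answer_id order (the loops do not guarantee that unless the cursor queries have an ORDER BY); whitespace is tidied slightly and a terminating semicolon is added so it can be pasted straight into SQL*Plus.

```sql
-- Statement generated from the sample tables created above.
-- Single-answer questions: MIN picks the one matching answer text.
-- Multi-answer questions: one 0/1 flag column per possible answer.
-- Text-entry questions: MIN picks the one matching text_entry value.
select ua.user_id "User",
min(case when ua.question_id = 100 then at.text end) "Gender",
max(case when ua.question_id = 200 and ua.answer_id = 2010 then 1 else 0 end) "Sports",
max(case when ua.question_id = 200 and ua.answer_id = 2020 then 1 else 0 end) "Computers",
max(case when ua.question_id = 200 and ua.answer_id = 2030 then 1 else 0 end) "Movies",
min(case when ua.question_id = 300 then ua.text_entry end) "Your code"
from
 user_answers ua
 inner join questions_text qt
  on qt.question_id = ua.question_id
 inner join answers_text at
  on at.answer_id = ua.answer_id
group by
 ua.user_id;
```

With the four sample rows for user 123, this returns a single row: 123, Female, 1, 0, 1, code 789. The last column is headed "Your code" rather than "Code" because the header is taken from question_text.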
Craig
Thanks cmartin2. I ended up using something more dependent on the specific code for this case. Long term I plan to use something based on your answer -- the additional information you have is in the questions table or can be constructed (in the case of the answer_id_set which is the question_id + 10x0 where x increments for each question, i.e. 1010, 1020 etc.)
beggs
+1  A: 

Queries with an unknown number of columns are problematic at best. Will you really not know what the data will look like? You might want to look at this Ask Tom response for a package which might help get you the results you need.
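
If a fixed set of columns is acceptable, one way to sidestep the problem is to return one row per user and question and collapse multiple answers into a single delimited value. The following is only a sketch of that alternative, not what the Ask Tom package does. It assumes Oracle 11g Release 2 or later (for LISTAGG), the table and column names from the question (questions_text.text, answers_text.text), and that answers with no text (such as 3000) should fall back to text_entry.

```sql
-- One row per (user, question); the column list never changes,
-- however many questions or answers exist in the lookup tables.
select ua.user_id,
       qt.text as question,
       -- join all of a user's answers to one question with ';'
       listagg(nvl(at.text, ua.text_entry), ';')
         within group (order by ua.answer_id) as answers
from user_answers ua
join questions_text qt on qt.question_id = ua.question_id
join answers_text at on at.answer_id = ua.answer_id
group by ua.user_id, qt.question_id, qt.text
order by ua.user_id, qt.question_id;
```

For the sample data this gives three rows for user 123: Gender / Female, Interests / Sports;Movies, and Your code / code 789. The trade-off is that the result is in long format, so any per-answer 0/1 columns would have to be produced by whatever consumes the file.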

Adam Hawkes