views:

255

answers:

2

This one seems to be a simple problem, but I can't make it work in a single select or nested select. Retrieve the authors and (if any) advisers of a paper (article) into one row.

I order to explain the problem, here are the two data tables (pseudo)

papers (id, title, c_year)
persons (id, firstname, lastname)

plus a link table w/one extra attribute (pseudo):

paper_person_roles(
  paper_id
  person_id
  act_role ENUM ('AUTHOR', 'ADVISER')
)

This is basically a list of written papers (table: papers) and a list of staff and/or students (table: persons)

An article my have (1,N) authors.
An article may have (0,N) advisers.
A person can be in 'AUTHOR' or 'ADVISER' role (but not at the same time).

The application eventually puts out table rows containing the following entries:

TH: || Paper_ID  |  Author(s)          |   Title                 |   Adviser(s)  |
TD: ||   21334   |John Doe, Jeff Tucker|Why the moon looks yellow|Brown, Rayleigh|
...

My first approach was like:
select/extract a full list of articles into the application, eg.

SELECT 
   q.id, q.title
FROM 
   papers AS q
ORDER BY 
   q.c_year
and save the results of the query into an array (in the application). After this step, loop over the array of the returned information and retrieve authors and advisers (if any), via prepared statement (? is the paper's id) from the link table like:
APPLICATION_LOOP(paper_ids in array)
  SELECT 
      p.lastname, p.firstname, r.act_role 
  FROM 
      persons AS p, paper_person_roles AS r
   WHERE 
      p.id=r.person_id AND r.paper_id = ?
   # The application does further processing from here (pseudo):
   foreach record from resulting records
     if  record.act_role eq 'AUTHOR' then join to author_column
     if  record.act_role eq 'ADVISER' then join to avdiser_column
   end
   print id, author_column, title, adviser_column
APPLICATION_LOOP
This works so far and gives the desired output. Would it make sense to put the computation back into the DB?

I'm not very proficient in nontrivial SQL and can't find a solution with a single (combined or nested) select call. I tried sth. like

SELECT
    q.title 
    (CONCAT_WS(' ',
     (SELECT p.firstname, p.lastname AS aunames
      FROM persons AS p, paper_person_roles AS r
      WHERE q.id=r.paper_id AND r.act_role='AUTHOR')
     )
    ) AS aulist
FROM 
    papers AS q, persons AS p, paper_person_roles AS r
in several variations, but no luck ...

Maybe there is some chance?

Thanks in advance

r.b.

+2  A: 

In my experiences, SQL databases aren't very good at aggregating tabular data like that into a single row of condensed data. Basically I think the approach you are using is fine, however the other alternative that jumps out to me is just to join the persons table such that you'll get one row back for each person that has a role for a given paper.

Something like:

    SELECT q.id, q.title, p.firstName, p.lastName, r.act_role FROM papers q, persons p, 
        paper_person_roles r where r.paper_id = q.id and r.person_id = p.id

Which for the given example you showed above would get you data like the following:

21334   |Why the moon looks yellow|John Doe   |AUTHOR
21334   |Why the moon looks yellow|Jeff Tucker|AUTHOR
21334   |Why the moon looks yellow|Brown      |ADVISER
21334   |Why the moon looks yellow|Rayleigh   |ADVISER

and which is easy enough to parse into the end result you're looking for.

With these sorts of things, it's all about tradeoffs:
- Are you spending too much time going back to the database over and over?
- Is there too much data that you can't join it all at once?
- Does your "optimization" end up making your code too difficult to read?

Frankly, if your code is working the way you want and you haven't hit performance snags yet, then keep it the way it is and come back to this decision on the day when you start seeing performance degrade as your dataset scales up.

BryanD
Bryan, your boldfaced questions are valid and I'll have to think about that. The simple 'all tables' select w/row postprocessing (as you suggested) should be acceptable in this case too (not too many rows in the db). I'd only include a clever multiple join sequence (as Peter proposed) into production code if I'm able to understand it thoroughly at some point in the future ;-)Thanks
rubber boots
+2  A: 

The following query worked with my test-data, please give it a try.

The two sub-queries are necessary to get the list of authors/advisers per paper.

Select
  p.id,
  p.title,
  p_aut.aut_name,
  p_adv.adv_name
From papers p
Left Join (
  Select pp_aut.paper_id,
         Group_Concat(Concat(p_aut.firstname, ' ', p_aut.lastname)) aut_name
  From paper_person_roles pp_aut
  Join persons p_aut On (p_aut.id = pp_aut.person_id)
  Where pp_aut.act_role='AUTHOR'
  Group By pp_aut.paper_id
) p_aut On ( p_aut.paper_id = p.id )
Left Join (
  Select pp_adv.paper_id,
         Group_Concat(Concat(p_adv.firstname, ' ', p_adv.lastname)) adv_name
  From paper_person_roles pp_adv
  Join persons p_adv On (p_adv.id = pp_adv.person_id)
  Where pp_adv.act_role='ADVISER'
  Group By pp_adv.paper_id
) p_adv On ( p_adv.paper_id = p.id )
Group By p.id, p.title
Peter Lang
Peter, your answer is incredible. This one does *really* work, I only had to remove the inner CONCAT(..) from the adviser join to get exactly the desired output (last name of advisers only). The database is not very large (until now), so the response time across these multiple joins is below 1/10'th of a second.I'll try to figure how to insert a blank space if no advisers returned (NULL field returned).Thank you very much!
rubber boots
You're welcome :) To insert a blank instead of `NULL`, you can use `COALESCE(p_aut.aut_name, ' ')`
Peter Lang
rubber boots