tags:

views:

1396

answers:

2

I have these tables:

media table - id int primary key, uri varchar. media_to_people - media_id int primary key, people_id int primary key people - id int primary key, name varchar, role int -- role specifies whether the person is an artist, publisher, writer, actor, etc relative to the media and has range(1-10)

This is a many to many relation

I want to fetch a media and all its associated people in a select. So if a media has 10 people associated with it, all 10 must come.

Further more, if multiple people with the same role exist for a given media, they must come as comma separated values under a column for that role.

Result headings must look like: media.id, media.uri, people.name(actor), people.name(artist), people.name(publisher) and so on.

I'm using sqlite.

+4  A: 

SQLite doesn't have the "pivot" functionality you'd need for starters, and the "comma separated values" part is definitely a presentation issue that it would be absurd (and possibly unfeasible) to try to push into any database layer, whatever dialect of SQL may be involved -- it's definitely a part of the job you'd do in the client, e.g. a reporting facility or programming language.

Use SQL for data access, and leave presentation to other layers.

How you get your data is

SELECT media.id, media.uri, people.name, people.role
FROM media
JOIN media_to_people ON (media.id = media_to_people.media_id)
JOIN people ON (media_to_people.people_id = people.id)
WHERE media.id = ?
ORDER BY people.role, people.name

(the ? is one way to indicate a parameter in SQLite, to be bound to the specific media id you're looking for in ways that depend on your client); the data will come from the DB to your client code in several rows, and your client code can easily put them into the single column form that you want.

It's hard for us to say how to code the client-side part w/o knowing anything about the environment or language you're using as the client. But in Python for example:

def showit(dataset):
  by_role = collections.defaultdict(list)
  for mediaid, mediauri, name, role in dataset:
    by_role[role].append(name)
  headers = ['mediaid', 'mediauri']
  result = [mediaid, mediauri]
  for role in sorted(by_role):
    headers.append('people(%s)' % role)
    result.append(','.join(by_role[role]))
  return ' '.join(headers) + '\n' + ' '.join(result)

even this won't quite match your spec -- you ask for headers such as 'people(artist)' while you also specify that the role's encoded as an int, and mention no way to go from the int to the string 'artist', so it's obviously impossible to match your spec exactly... but it's as close as my ingenuity can get;-).

Alex Martelli
+1 for recommending post-processing of the rows in a client application language!
Bill Karwin
Hmm, so there is no way to fetch the people.name columns specifically for different roles?I was thinking something like this:SELECT media.id, media.uri, artists.name, publishers.name FROM media JOIN media_to_people ON media_to_people.media_id = media.id JOIN people AS artists, people AS publishers ON (artists.id = media_to_people.people_id AND artists.role = 2) OR (publishers.role = 8 AND publishers.id = media_to_people.people_id)But this doesn't work as intended...
jetru
Well the header name is not so important as getting it in a row. I want all the stuff done in one query.I'm writing C. You can imagine why I want to avoid post processing my results. ;)
jetru
@Alex: I'm so sorry I wrote my answer. My intention was to show the OP how inappropriate it would have to be to write a single SQL query to do the whole job. I assumed if I marked my answer as 'community wiki' jetru wouldn't be able to accept it.
Bill Karwin
@Bill, heh, NP -- your answer had excellent code and explanation, what mode can you do?-)
Alex Martelli
Hmm. any possibility of accepting two answers? :D I needed that query because I need to get a module working fast(because there are other parts of the module that rely on this statement) and that query does the job exactly. Besides the database is on the local system for every user. Perhaps later down the line, when I have loads of time, I will replace it with post processing the data. :)
jetru
+1  A: 

I agree with Alex Martelli's answer, that you should get the data in multiple rows and do some processing in your application.

If you try to do this with just joins, you need to join to the people table for each role type, and if there are multiple people in each role, your query will have Cartesian products between these roles.

So you need to do this with GROUP_CONCAT() and produce a scalar subquery in your select-list for each role:

SELECT m.id, m.uri, 
 (SELECT GROUP_CONCAT(name) 
  FROM media_to_people JOIN people ON (people_id = id) 
  WHERE media_id = m.id AND role = 1) AS Actors,
 (SELECT GROUP_CONCAT(name) 
  FROM media_to_people JOIN people ON (people_id = id) 
  WHERE media_id = m.id AND role = 2) AS Artists,
 (SELECT GROUP_CONCAT(name) 
  FROM media_to_people JOIN people ON (people_id = id) 
  WHERE media_id = m.id AND role = 3) AS Publishers
FROM media m;

This is truly ugly! Don't try this at home!

Take our advice, and don't try to format the pivot table using only SQL.

Bill Karwin
This works perfectly! This is exactly what I need! Does it hit performance hard?
jetru
Yes! It hits performance hard!! That's why I told you not to do it! In addition to it being ugly, unmaintainable code.
Bill Karwin