views:

492

answers:

5

I'm trying to model artists and songs and I have a problem where I have a Song_Performance can be performed by many artists (say a duet) so I have an Artist_Group to represent who the songs is performed by.

Well, I now have a many-to-many relationship between Artist and Artist_Group, where an Artist_Group is uniquely identified by the collection of artists in that group. I can create an intersection entity that represents an Artist's participation in an Artist_Group (Artist_Group_Participation?)

I'm having trouble coming up with how to come up with a primary key for the Artist_Group entity that preserves the fact that the same set of artists represents the same group, and lacking a primary key for the Artist_Group entity means I'm lacking a foreign key for the Artist_Group_Participation entity.

The book "Mastering Data Modeling" by John Carlis and Joseph Maguire mention this shape and refer it to as a "Many-Many Collection Entity" and state that it is very rare, but doesn't state how to resolve it since obviously a many-to-many relationship can't be stored directly in a RDBMS. How do I go about representing this?

Edit:

Looks like everyone is suggesting an intersection table, but that's not my issue here. I have that. My issue is enforcing the constraint that you cannot add an Artist_Group entry where the group of artists that it contains are the same as an existing group, ignoring order. I thought about having the ID for Artist_Group be a varchar that is the concatenation of the various artists that comprise it, which would solve the issue if order mattered, but having an Artist_Group for "Elton John and Billy Joel" doesn't prevent the addition of a group for "Billy Joel and Elton John".

+1  A: 

The primary key for both the Artist and Artist_Group would be an numeric, incremental ID. Then you'd have an Artist_Group_Participation table that has two columns: artist_id and group_id. These would be foreign keys that refer to the ID of their respective tables. Then to SELECT everything you'd use a JOIN.

EDIT: Sorry, I misunderstood your question. The only other way I can think of is add an "artists" column to your Artist_Group table that contains a serialized array (assuming you're using PHP, but other languages have equivalents) of the artists and their IDs. Then just add a UNIQUE constraint to the column.

musicfreak
But an incremental id for Artist_Group doesn't enforce and an Artist_Group is uniquely identified by the set of artists in that group. It would be possible to create a 2nd Artist_Group with the same members (order doesn't matter) which would just create another Artist_Group instance with a different ID, but same members. I understand how to break up a many-to-many relationship when each member of the relationship has an independent identifier. The issue here is that the identifier consists of the combination of 1 to n different artists.
Davy8
@musicfreak: You'd want to make sure the serialized array is sorted, or else you could get (10,20) and (20,10) which should be the same list. Also this suffers from the usual problems a serialized array has, e.g. how long a string do you need?
Bill Karwin
@Bill Karwin: Yes, that's true, I forgot to mention that. I don't think length is a problem, though. I mean, how many people can a band have? Seven at most? Depending on the language you use, you could probably fit it into a VARCHAR(255).
musicfreak
http://en.wikipedia.org/wiki/We_Are_the_World This recording involved at least 38 artists (some of which count as groups themselves).
Bill Karwin
@Bill Karwin: Regardless, 38 artists shouldn't take up 255 bytes. But point taken.
musicfreak
+1  A: 

You could make each artist's ID correspond to a bit in a bitfield. So if Elton John is ID 12 and Billy Joel is ID 123, then the "group" formed by a duet between Elton John and Billy Joel is Artist_Group ID 10633823966279326983230456482242760704 (i.e. it has the 12th and 123rd bit set).

You could enforce the relationship using the intersection table. For example, using a CHECK constraint in PostgreSQL:

CREATE TABLE Artist_Group_Participation (
  artist_id int not null,
  artist_group_id int not null,
  PRIMARY KEY (artist_id, artist_group_id),
  FOREIGN KEY (artist_id) REFERENCES Artists (artist_id),
  FOREIGN KEY (artist_group_id) REFERENCES Artist_Group (artist_group_id),
  CHECK (B'1'<<artist_id & artist_group_id <> 0)
);

Admittedly, this is a hack. It applies extra significance to the Artist_Group surrogate key, when surrogate keys are supposed to be unique but not contain information.

Also if you have thousands of artists, and new artists every day, things could get unwieldy because the length of the Artist_Group key's data type needs to grow larger all the time.

Bill Karwin
But what happens when you get 1,000 artists?
musicfreak
Could you explain this one a little more? The way I'm understanding it is that setting the 12th bit and 123 bit would be the equivalent of 2^12 + 2^123 which isn't the number you came up with (calc says it's > 10^37, which is larger than even bigint) so obviously I'm understanding it wrong.
Davy8
@musicfreak: Yes, I acknowledge that it could get unwieldy.
Bill Karwin
@Davy8: Right, I tried calculating the value in Perl but it overflowed. I've corrected the value after calculating it in bc.
Bill Karwin
And if I did interpret correctly but your calculation was wrong then while technically accurate it's technically impractical, since just the example above is over the storage limit of traditional hard drives.
Davy8
@Davy8: I agree it's impractical as a primary key, but storing a string of 1,000 bits is certainly within the capability of any storage medium.
Bill Karwin
@Bill, you're right, I must be out of mind. For whatever reason I was thinkingt 2^1000 bits not just 1000 bits
Davy8
+1  A: 

I guess I'm missing the point of the "Artist_Group" relation.

The data model in my mind is:

Artist: an individual person.

Song: The song itself.

Performance: A particular performance or arrangement of a song. Usually this would have one song, but you could provide an m:n linking table to accommodate a medley. Ideally, this would be a single real performance, i.e., there would be an associated date.

Recording: A particular fixed version of a performance (CD or whatever). Usually a Performance only has one Recording, but having a separate table would handle the Grateful Dead / multiple-bootleg scenario, as well as re-release albums, radio play vs. live vs. CD versions, etc.

Performance_Artists: A linking table from a particular performance to a list of performers. For each, you could also have an attribute that describes their role(s) in the performance (vocalist, drummer, etc.).

There's no explicit relationship between a set of performers, except that they share performances in common. Thus, any table that attempts to combine random sets of artists outside the context of a recording is not an accurate relational model, as there is no real relationship.

If you are trying to represent an explicit relationship between a set of artists (i.e., they are in the same band), well, bands have names that have uniqueness (though not enough to be a primary key), and a band could be stored simply as an Artist, and then have an Artist_Member linking table that is self-referencing back to the individual Artist records. Or you could have a separate Band table, and a Band_Members table to assign artists to it, perhaps with dates of membership. Either way, just remember that band members change over time and band roles change from one song to the next, so associating a band with a performance should not substitute for linking performances directly to the artists involved.

richardtallent
Hmm.... I had what you said before, but I crossed it out because I found a problem with it, but I can't remember what it was. Maybe it was just my imagination... +1 anyway, and if I can't come up with the reason I originally rejected this solution then I'll accept.
Davy8
A: 

I guess you could build a primary key by sorting and concatenate the artist ids ??

group: 3,2,6 -> 2-3-6 and 6,3,2 -> 2-3-6

Tom
A: 

I don't have much experience in RDBMS. However, I have read papers of Codd and books by C.J. Date.

So, instead of using RDBMS jargon, I'll try to explain in more common sensical terms (at least to me!)

Here goes -

  1. Singer names should be standard on "First Name - Last Name" basis

  2. Each "Singer" should have an entry in the "Artists Group" table even if they have performed solo

  3. Each entry in the "Artists Group" will consist of multiple "Singer" ordered alphabetically. There should be a single occurance of a specific combination.

  4. Each song will have an entry of a unique record from "Artists Group" regardless of whether they are solo, duets or in a gang.

I don't know if this makes much sense, but it's my two cents!

globetrotter

related questions