tags:

views:

67

answers:

4

Hi,

I'm creating a music player, where the user can search for artists, albums, or songs.

I have created a script that reads all the tags from the mp3s in the music library, and updates a database of songs, in a single table, containing artist names, albums, track titles, etc. Currently, this works well, because it can scan for any changes in the music library, and add/delete rows for corresponding songs in the database. This scan routine is therefore a fairly short an easy to understand piece of code, because it maintains only a single table.

I understand the database would be more powerful if artists, albums, and tracks have their own table, and are all linked to each other. I haven't done anything about the search part yet -- how screwed am I, if I keep everything in one table?

Thanks.

+1  A: 

pretty screwed, indeed. it's hardly normalized. go for separate tables.

if you've never heard of normalization or understood why it was important, perhaps you should read this. it's a succinct, simple explanation without a lot of jargon.

or you could go straight to the source since you're already using mysql:

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

think about the cardinalities and relationships in your model:

  1. an album will have zero or more tracks; a track will belong to only one album (album-to-track is one to many)
  2. an artist can create zero or more albums; an album can be created by one or more artists (artist-to-album is many-to-many)

you'll want to think carefully about indexes, primary, and foreign keys. add indexes to non-key columns or groups that you'll want to search on.

this design would have four tables: album, track, artist, and artist_to_album many to many join table.

duffymo
@drachenstern - Very seriously. You assume that the OP doesn't know about normalization. It's a good assumption, but one that I chose not to make. I'll read your example and comment.
duffymo
@Duffymo ~ Now that you've come back and added quite a bit more into it, I retract my complaint.
drachenstern
one does what one can in the time one has. i don't always have the time to give the answer i want the first time.
duffymo
+2  A: 

Your database is not normalized. You say it's all in one table, but you haven't given any information about the schema.

The kinds of problems which non-normalized databases have include problems with consistency related to storing redundant information - if you have something like:

Album, Track, Artist

then to change the Album name, you have to change it on every track associated with the Album.

Of course, there are all kinds of "database" systems out there which are not normalized, but these usually have mechanisms to handle these kinds of things which are appropriate to their paradigms.

Cade Roux
A: 

So the subject you're asking about is called "Normalization" and while this is useful in many circumstances, it can't always be applied.

Consider the artist Pink. Some of her albums have her name as Pink and others P!nk which we recognize as the same visually, because we know it's her. But a database would by force see these two separately (which also makes searching for her songs harder, but that's another story). Also consider Prince, "The artist formally known as Prince", etc.

So it might be possible to have an artist ID that matches to both Pink and P!nk but that also matches to her albums Funhouse etc. (I'm really gonna stop with the examples now, as any more examples will need to be tabular).

So, I think the question becomes, how complex do you want your searching to be? As is, you're able to maintain a 1:1 correlation between tag and database info. It just depends how fancy you want things to be. Also, for the lookup I mentioned above, consider that most times that information is coming from the user, you really can't supply a lookup from P!nk to Pink any more than you would from Elephant to Pachyderm because you don't know what people are going to want to enter.

I think in this case, the naive approach is just as well.

drachenstern
Also consider not just artists, albums, titles, but also producers, mixers, studios, band members, lyricists, etc... title art also. And then there's special edition re-releases of the same info as the original disc. It becomes complicated. The real question is what's your intent, what's your use case? Most people don't need all these factors in their music list.
drachenstern
Issues like "Pink" and "P!nk" can be solved using some kind of scrubbing prior to insertion. I would have been far more impressed if you had given some description of normalization since you chided me for not doing so. Your answer is rather disappointing, given the tone you took with mine. I expected something far more insightful.
duffymo
Of course I didn't go into an indepth discussion of normalization here since Wikipedia has a good writeup on the topic and there's plenty of other threads here talking about normalization, but I identified it and tried to point out some things for him to consider. When I made my post your comment had about 10 words, which wasn't very helpful. As noted above, you've since corrected that. Just seemed like your heart wasn't in the answer.
drachenstern
I didn't anticipate so much controversy! :) Also, I didn't know about normalization, so I guess I have a few things to learn.I didn't mention other types of info, to keep my question simple. But producers, band members, composers etc, will also be added to the database.Use case: if a user types "Prince", it should list all artists, albums, and songs that contain the string "Prince". Also, if "Prince" is the producer of an album from a different artist, it should list that album.
Wurlitzer
I think I have to read more on the subject of normalization, because I tend to think keeping everything in a single table would be a simple and working solution.
Wurlitzer
Turns out I'm already familiar with the concept of normalization, I just didn't know what it was called.
Wurlitzer
+2  A: 

In regards to the Pink/P!nk situation, if that's a big deal to you, then yes, normalization would be useful.

Your songs table would reference an artist_id.

You'd also have a table of artist aliases, which would map the various names that a particular artist has gone by to that artist_id.

But this can get pretty complex, and technically, it may not even be correct in your situation, as if an artist chooses to release projects under different names, they may not want them all lumped together.

In general, normalized databases are a safe place to start, but there are plenty of good reasons to denormalize, and it is more important to understand those reasons then blindly always do things one way.

daryn
Thanks @Daryn, that's what I was trying to illustrate. I don't think it's a good idea to normalize this particular data set, not yet anyways...
drachenstern
I disagree - if the OP can't fit their head around four tables they're already in grave danger.
duffymo
The Pink/P!nk situation is not a big deal, as it would require to manually maintain the database, and there are endless cases of artists with different aliases. If Last.fm lists Pink and P!nk as two different entities, then having them as two different artists, is good enough for me.
Wurlitzer