views:

492

answers:

4

I am writing a search application specifically for music playlists.

The genre and file format differs from playlist to playlist, and sometimes within the playlist there are differences too. There is also a concept of "synonymous" tags (e.g. urban would cover both hiphop and r&b, but not the other way around).

Below is a list of search terms and my expected results.

gospel: should return all playlists with at least one gospel song. playlists with all gospel songs would be shown first. urban: should return all r&b and hiphop. again playlists with all urban tracks would come first. hiphop: should return all hiphop but not r&b. flac: should return all playlists that contain flac files. starting with the ones that are pure flac. hiphop flac: should return hiphop flacs first, followed by other hiphop audio hiphop AND flac: should return hiphop flacs only hiphop audio: should return hiphop flacs, hiphop mp3s, etc

As I'm just starting this project, I'm thinking of the best way to index all this. Would a fulltext search thing like Lucene be of any use here? Note I don't have any text describing these playlists, but I could generate some.

I'm thinking of organising all these terms as "tags" and storing them in the db many-to-many.

table: playlist ( pk(id), desc ) table: tag ( pk(id), desc ) table: playlist_has_tag ( pk(link_id, tag_id) )

To solve the urban == hiphop || rnb thing, I would maybe add a tag_synonyms table:

table: tag_synonyms ( pk(tag_id, synonym_tag_id) )

Then I'd have two records to indicate that urban encompasses hiphop and rnb: urban's tag id, hiphop's tag id urban's tag id, rnb's tag id

I'm feeling though that the query could be come quite convoluted using this approach.

Could CouchDB be of use here? I'm currently using PostgreSQL. Is there some software out there that will make this kind of thing easy?

I would like to be able to drill down and support complex search terms in the future like:

(hiphop OR house) AND filetype:mp3 AND artwork:no

And also incorporate things like duration, etc.

+1  A: 

If you try to think too hard on how to structure your data for searching, there is a good chance you will miss an important search that you could have really used in your app.

Alternatively (and this is from experience) you end up re-inventing all sorts of indexing techniques.

I have some experience with lucene (there is java and .net version, there was a C port but I am not sure how alive it is these days) - and it can do amazing things with data that is stored in any structure.

I like the look of couch db, just depends how much you want to experiment with something new and powerful, or go for something which is (currently) fairly battle hardened: lucene.

Michael Neale
A: 

I don't see how database software would play a role in your solution.

If I were to be the one implementing this, I would first ensure all related data is captured in a normalized way. This would include things like category, artwork, lyrics, etc.

The main advantage of this is your idea of 'complex' searches actually become quite simple.

Nick Stinemates
A: 

OK, just brainstorming here --

Perhaps using octal or binary to store your "format" types as a bitmask?

http://www.nitrogen.za.org/viewtutorial.asp?id=17

RandB: 1 HipHop:2 Gospel:4 Urban: 8

Now, these things are additive. You know that if something is tagged Urban, you're not going to store "8" in the flag field, but you'll store 11...Urban && HipHop && RandB. This is just a bit of "business intelligence" you'll have to have spelled out somewhere.

You can then use binary comparisons to figure out which flags you're looking for.

nathaniel
i am curious why every comment was downmodded -- they all seem to be cogent.
nathaniel
A: 

A fulltext index will serve you best if your users are going to be the ones defining the queries. Just create a custom text field that describes each attribute you want to be searchable e.g. "urban filetype:pdf gospel" and search that.

Seun Osewa