I am writing a search application specifically for music playlists.
The genre and file format differ from playlist to playlist, and sometimes they also vary within a single playlist. There is also a concept of "synonymous" tags: for example, urban should cover both hiphop and r&b, but not the other way around.
Below is a list of search terms and my expected results.
gospel: should return all playlists with at least one gospel song; playlists made up entirely of gospel songs should be shown first.
urban: should return all r&b and hiphop; again, playlists made up entirely of urban tracks should come first.
hiphop: should return all hiphop, but not r&b.
flac: should return all playlists that contain flac files, starting with the ones that are pure flac.
hiphop flac: should return hiphop flacs first, followed by other hiphop audio.
hiphop AND flac: should return hiphop flacs only.
hiphop audio: should return hiphop flacs, hiphop mp3s, etc.
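To pin down the ranking rule in the list above (any matching track qualifies a playlist, and a higher share of matching tracks ranks it higher), here is a minimal in-memory sketch. It assumes that each track carries a set of tags (genre plus file format) and that synonyms are stored as a one-way mapping from a broader tag to the narrower tags it covers; the names SYNONYMS, PLAYLISTS, expand and search, and the sample data, are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table: a broader tag maps to the narrower tags it covers.
# The mapping is one-way: "urban" covers "hiphop", but "hiphop" does not cover "urban".
SYNONYMS: Dict[str, Set[str]] = {"urban": {"hiphop", "rnb"}}

# Hypothetical sample data: each playlist is a list of tracks, each track a set of tags.
PLAYLISTS: Dict[str, List[Set[str]]] = {
    "pure_gospel": [{"gospel", "flac"}, {"gospel", "mp3"}],
    "mixed": [{"gospel", "mp3"}, {"hiphop", "mp3"}],
    "urban_mix": [{"hiphop", "flac"}, {"rnb", "mp3"}],
}


def expand(term: str) -> Set[str]:
    """Return the term itself plus every tag it covers (one level only)."""
    return {term} | SYNONYMS.get(term, set())


def search(term: str) -> List[str]:
    """Playlists with at least one matching track, purest matches first."""
    wanted = expand(term)
    scored: List[Tuple[float, str]] = []
    for name, tracks in PLAYLISTS.items():
        # A track matches if it carries any of the expanded tags.
        hits = sum(1 for tags in tracks if tags & wanted)
        if hits:
            scored.append((hits / len(tracks), name))
    # Highest fraction of matching tracks first; the name breaks ties.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]
```

With this sample data, search("urban") ranks urban_mix ahead of mixed, while search("hiphop") ignores the rnb track. The mapping is only expanded one level, so a synonym of a synonym would not be picked up without a recursive lookup.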
As I'm just starting this project, I'm considering the best way to index all of this. Would a full-text search engine like Lucene be of any use here? Note that I don't have any text describing these playlists, but I could generate some.
I'm thinking of organising all these terms as "tags" and storing them in the database as a many-to-many relationship:
table: playlist ( pk(id), desc )
table: tag ( pk(id), desc )
table: playlist_has_tag ( pk(link_id, tag_id) )
To handle the urban == hiphop || rnb case, I might add a tag_synonyms table:
table: tag_synonyms ( pk(tag_id, synonym_tag_id) )
Then I'd have two records to indicate that urban encompasses hiphop and rnb:
(urban's tag id, hiphop's tag id)
(urban's tag id, rnb's tag id)
I feel, though, that the query could become quite convoluted with this approach.
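For the synonym lookup at least, the query can stay fairly compact. A minimal sketch, using Python's built-in sqlite3 module so it runs without a server: a recursive common table expression collects the searched tag plus everything it covers, and the outer query matches playlists against that set. PostgreSQL also supports WITH RECURSIVE, so the SQL should carry over with small changes, though this sketch targets SQLite. Assumptions: link_id is taken to refer to the playlist and is named playlist_id here, "desc" is quoted because DESC is an SQL keyword, and the sample rows and the functions build_sample and playlists_for are hypothetical.

```python
import sqlite3
from typing import List

# Schema from the question (assumption: link_id is the playlist id).
SCHEMA = """
CREATE TABLE playlist (id INTEGER PRIMARY KEY, "desc" TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, "desc" TEXT UNIQUE);
CREATE TABLE playlist_has_tag (
    playlist_id INTEGER REFERENCES playlist(id),
    tag_id INTEGER REFERENCES tag(id),
    PRIMARY KEY (playlist_id, tag_id)
);
CREATE TABLE tag_synonyms (
    tag_id INTEGER REFERENCES tag(id),
    synonym_tag_id INTEGER REFERENCES tag(id),
    PRIMARY KEY (tag_id, synonym_tag_id)
);
"""

# Start from the searched tag and keep following tag_synonyms
# (broader -> narrower). UNION discards duplicate rows, so the recursion
# stops even if the synonym graph contains a cycle.
QUERY = """
WITH RECURSIVE expanded(id) AS (
    SELECT id FROM tag WHERE "desc" = :term
    UNION
    SELECT s.synonym_tag_id
    FROM tag_synonyms s JOIN expanded e ON s.tag_id = e.id
)
SELECT DISTINCT p."desc"
FROM playlist p
JOIN playlist_has_tag pht ON pht.playlist_id = p.id
WHERE pht.tag_id IN (SELECT id FROM expanded)
ORDER BY p."desc"
"""


def build_sample(conn: sqlite3.Connection) -> None:
    """Create the tables and load hypothetical sample rows."""
    conn.executescript(SCHEMA)
    conn.executemany('INSERT INTO tag (id, "desc") VALUES (?, ?)',
                     [(1, "urban"), (2, "hiphop"), (3, "rnb"), (4, "gospel")])
    # The two records from the question: urban covers hiphop and rnb.
    conn.executemany("INSERT INTO tag_synonyms VALUES (?, ?)", [(1, 2), (1, 3)])
    conn.executemany('INSERT INTO playlist (id, "desc") VALUES (?, ?)',
                     [(1, "rap classics"), (2, "slow jams"), (3, "sunday choir")])
    conn.executemany("INSERT INTO playlist_has_tag VALUES (?, ?)",
                     [(1, 2), (2, 3), (3, 4)])


def playlists_for(conn: sqlite3.Connection, term: str) -> List[str]:
    """Names of playlists tagged with term or with any tag it covers."""
    return [row[0] for row in conn.execute(QUERY, {"term": term})]
```

Against this sample data, searching for urban returns both "rap classics" and "slow jams", while searching for hiphop returns only "rap classics", which matches the one-way behaviour described above.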
Could CouchDB be of use here? I'm currently using PostgreSQL. Is there any software out there that makes this kind of thing easy?
I would like to be able to drill down and support complex search terms in the future like:
(hiphop OR house) AND filetype:mp3 AND artwork:no
I'd also like to incorporate other attributes, such as duration.
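A query like (hiphop OR house) AND filetype:mp3 AND artwork:no can be handled by a small recursive-descent parser in which OR binds more loosely than AND and parentheses group. The sketch below assumes each track is represented as a set of strings, with bare genre tags such as hiphop alongside field:value strings such as filetype:mp3; that representation and the names tokenize, Parser and parse_query are hypothetical.

```python
import re
from typing import Callable, List, Optional, Set

# A compiled query: takes a track's tag set and says whether it matches.
Predicate = Callable[[Set[str]], bool]


def tokenize(query: str) -> List[str]:
    # Parentheses are tokens on their own; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", query)


class Parser:
    """Grammar: or := and ("OR" and)*; and := atom ("AND" atom)*;
    atom := "(" or ")" | TERM."""

    def __init__(self, query: str) -> None:
        self.tokens = tokenize(query)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def parse(self) -> Predicate:
        pred = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return pred

    def parse_or(self) -> Predicate:
        parts = [self.parse_and()]
        while self.peek() == "OR":
            self.take()
            parts.append(self.parse_and())
        return lambda tags: any(p(tags) for p in parts)

    def parse_and(self) -> Predicate:
        parts = [self.parse_atom()]
        while self.peek() == "AND":
            self.take()
            parts.append(self.parse_atom())
        return lambda tags: all(p(tags) for p in parts)

    def parse_atom(self) -> Predicate:
        token = self.take()
        if token == "(":
            pred = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return pred
        if token in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token {token!r}")
        # A bare tag ("hiphop") or field:value ("filetype:mp3") matches
        # when the track carries exactly that string.
        return lambda tags: token in tags


def parse_query(query: str) -> Predicate:
    return Parser(query).parse()
```

The predicate from the example query accepts a track tagged {"hiphop", "filetype:mp3", "artwork:no"} and rejects the same track tagged artwork:yes. Because atoms only test exact membership, a numeric attribute such as duration would need an extra comparison form in parse_atom, and synonym expansion could be added by replacing the membership test with a lookup against the tag_synonyms data.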