For my current project we want to present statistical data and rank it. In my case I'm talking about "favouriting" an artist, counting the number of times an artist's track has been played, and displaying how many playlists an artist's tracks have been added to. These are all very domain-specific issues, but they give a concrete example of my problem.
The main issue is that I'm going to be returning result sets ordered by each of these statistical attributes.
Here are some examples:
- The music landing page should display the top 5 most-favourited artists.
- The music landing page should display the top 5 most-played tracks.
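For reference, computing these rankings directly on every page load would look roughly like the sketch below. It assumes hypothetical raw tables `ArtistFavourites(UserId, ArtistId)` and `TrackPlays(TrackId, PlayedAt)` that hold one row per favourite and one row per play; neither table is part of the original design. The full aggregation over these tables on each request is the cost a precomputed stats table is meant to avoid.

```sql
-- Hypothetical raw tables (assumptions, not from the original design):
--   ArtistFavourites(UserId INT, ArtistId INT)   -- one row per favourite
--   TrackPlays(TrackId INT, PlayedAt DATETIME2)  -- one row per play

-- Top 5 most-favourited artists, aggregated at query time.
-- ArtistId is a tiebreaker so equal counts come back in a stable order.
SELECT TOP (5) ArtistId, COUNT(*) AS FavouriteCount
FROM ArtistFavourites
GROUP BY ArtistId
ORDER BY FavouriteCount DESC, ArtistId;

-- Top 5 most-played tracks, aggregated at query time.
SELECT TOP (5) TrackId, COUNT(*) AS PlayCount
FROM TrackPlays
GROUP BY TrackId
ORDER BY PlayCount DESC, TrackId;
```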
My first thought is that I need a precomputed aggregate column. Since I want to order by these values, a CLUSTERED INDEX on each aggregate I want to order by seems optimal. Second, since DML on CLUSTERED INDEX key columns can be costly when the values are not inserted sequentially, I'd update these aggregates in a scheduled job rather than on every change.
So, for the artist favourite stats, here's the DDL I have come up with. Note that my T-SQL might be off, but I think the intent is clear.
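```sql
CREATE TABLE Stats_ArtistFavourites (
    FavouriteCount INT NOT NULL DEFAULT 0,
    ArtistId INT NOT NULL PRIMARY KEY NONCLUSTERED,
    -- References the primary key of Artists
    FOREIGN KEY (ArtistId) REFERENCES Artists
);

-- Sort direction is given per key column, not after the column list
CREATE CLUSTERED INDEX IDX_Favourites
ON Stats_ArtistFavourites (FavouriteCount DESC, ArtistId);
```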
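The scheduled job described above could be sketched as a single `MERGE` that rebuilds the counts, followed by the landing-page read. This is only an illustration under assumptions: the source table `ArtistFavourites(UserId, ArtistId)` is hypothetical, and deleting rows for artists that no longer have any favourites is one possible choice (resetting them to 0 would be another).

```sql
-- Hypothetical scheduled-job body: recompute favourite counts from the
-- assumed raw table ArtistFavourites(UserId, ArtistId).
MERGE Stats_ArtistFavourites AS target
USING (
    SELECT ArtistId, COUNT(*) AS FavouriteCount
    FROM ArtistFavourites
    GROUP BY ArtistId
) AS source
ON target.ArtistId = source.ArtistId
-- Only touch rows whose count changed, to limit clustered index churn
WHEN MATCHED AND target.FavouriteCount <> source.FavouriteCount THEN
    UPDATE SET FavouriteCount = source.FavouriteCount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ArtistId, FavouriteCount)
    VALUES (source.ArtistId, source.FavouriteCount)
-- Artists with no remaining favourites drop out of the ranking
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;

-- Landing-page read: the ORDER BY matches the clustered index key
-- (FavouriteCount DESC, ArtistId), so the optimizer can typically use an
-- ordered index scan instead of a separate sort.
SELECT TOP (5) ArtistId, FavouriteCount
FROM Stats_ArtistFavourites
ORDER BY FavouriteCount DESC, ArtistId;
```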
As you can see, I would need to create a separate table for each stat I want to track; otherwise I would have to ORDER BY columns that aren't in the CLUSTERED INDEX. The fact that this seems ugly makes me think I'm going about it all wrong.
Should I start thinking about integrating OLAP (I have very little experience with OLAP cubes)? Or maybe Lucene?