I have this table: id, bookmarkID, tagID. I want to fetch the top N bookmarkIDs for a given list of tags. Does anyone know a very fast solution for this? The table is quite large (12 million records). I am using MySQL.
A:
It really depends on how the tag-to-bookmark relation is structured. Ideally, each tag is mapped to one or more bookmarks, which is essentially a huge reverse index of tags to bookmarks. If that is the case, you can fetch all rows that map the given tags to bookmarks and then apply a basic scoring function across the results.
You could probably base it on the Lucene scoring algorithm, which takes into account the use/spread of the tag across the entire corpus, the density of tags for a given bookmark, and some sort of normalizing factor based on when it was bookmarked.
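As a rough illustration of that idea, here is a minimal Python sketch of a scoring pass over the fetched rows. It is not Lucene's actual formula: it only uses an IDF-style weight (rarer tags count for more) and sums the weights of the matching tags per bookmark. The function name `top_bookmarks` and the `(bookmarkID, tagID)` row shape are assumptions made for the example, and the recency factor is left out because the table as described has no timestamp column.

```python
import math
from collections import defaultdict


def top_bookmarks(rows, query_tags, n):
    """Rank bookmarks for the given tags (hypothetical helper, not from the thread).

    rows: iterable of (bookmark_id, tag_id) pairs, e.g. fetched from the table.
    Returns up to n bookmark ids, best score first.
    """
    rows = list(rows)
    total_bookmarks = len({bookmark for bookmark, _ in rows})

    # Reverse index: tag -> set of bookmarks carrying that tag.
    tag_to_bookmarks = defaultdict(set)
    for bookmark, tag in rows:
        tag_to_bookmarks[tag].add(bookmark)

    scores = defaultdict(float)
    for tag in set(query_tags):
        carriers = tag_to_bookmarks.get(tag)
        if not carriers:
            continue  # tag unknown in the corpus, contributes nothing
        # IDF-style weight: a tag used by few bookmarks is worth more.
        weight = math.log(1 + total_bookmarks / len(carriers))
        for bookmark in carriers:
            scores[bookmark] += weight

    # Highest score first; ties broken by bookmark id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [bookmark for bookmark, _ in ranked[:n]]
```

With 12 million rows you would likely fetch only the rows for the requested tags and precompute the per-tag counts, rather than loading the whole table as this sketch does.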
Nick Gerakines
2010-05-13 23:58:05
A:
I mainly operate in MSSQL but I think something along the lines of this should work out for you:
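SELECT bookmarkID
FROM myTable
-- each tag must be its own list item, not one comma-joined string
WHERE tagID IN ('tag1', 'tag2', 'tag3')
GROUP BY bookmarkID
-- bookmarks matching the most of the listed tags come first
ORDER BY COUNT(*) DESC, bookmarkID ASC
LIMIT 0, n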
I could be wrong though, please let me know :)
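For anyone who wants to try a query of this shape from code, here is a small Python sketch that ranks bookmarks by how many of the requested tags they match. It uses an in-memory SQLite database purely as a stand-in so the example runs on its own; it is not MySQL. The index name `idx_tag_bookmark`, the helper `top_bookmarks_sql`, the integer tag ids, and the sample rows are all assumptions for illustration. A composite index on (tagID, bookmarkID) is likely to matter on a table of this size, but that is worth checking with EXPLAIN on the real data.

```python
import sqlite3

# In-memory SQLite stands in for the real MySQL table (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER PRIMARY KEY, bookmarkID INTEGER, tagID INTEGER)"
)
# Composite index so the tag lookup and the grouping can be served from the index.
conn.execute("CREATE INDEX idx_tag_bookmark ON myTable (tagID, bookmarkID)")
conn.executemany(
    "INSERT INTO myTable (bookmarkID, tagID) VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 10), (3, 10), (3, 30), (4, 20)],  # made-up sample rows
)


def top_bookmarks_sql(conn, tag_ids, n):
    """Return up to n bookmarkIDs, ordered by number of matching tags."""
    if not tag_ids:
        return []  # avoid generating an empty IN () list
    # One placeholder per tag keeps the values parameterized.
    placeholders = ",".join("?" for _ in tag_ids)
    sql = (
        "SELECT bookmarkID FROM myTable "
        f"WHERE tagID IN ({placeholders}) "
        "GROUP BY bookmarkID "
        "ORDER BY COUNT(*) DESC, bookmarkID ASC "
        "LIMIT ?"
    )
    return [row[0] for row in conn.execute(sql, (*tag_ids, n))]
```

Calling `top_bookmarks_sql(conn, [10, 20], 2)` on the sample rows puts bookmark 1 first, since it is the only one carrying both tags. The sketch assumes each (bookmarkID, tagID) pair appears at most once; if duplicates are possible, `COUNT(DISTINCT tagID)` would be the safer ranking expression.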
Yoda
2010-05-14 00:02:13