In my app BugTracker.NET, I make an assumption that there won't be TOO many bugs. Maybe tens of thousands, but not tens of millions. That assumption allows me to cache the tags and the ids of the items they reference.
In the database, the tags are stored as they are entered, alongside the bugs, in a comma-delimited text field.
When a tag field is added or changed, that kicks off a background thread. The thread selects all bug ids and their tags, parses the text, and builds a map where the key is the tag and the value is a list of all the ids that have that tag. I then cache that map in the ASP.NET Application object.
The code I've just described is the threadproc_tags method further down.
The code could be optimized so that instead of going through all the bugs it just incrementally modified the cached map, but even unoptimized, it works fine.
When somebody does a search using a tag, I look up the value in the map, get the list of ids, and then fetch those bugs using SQL with a "where id in (1, 2, 3...)" clause.
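As a rough sketch of that lookup step (not the actual BugTracker.NET code): the helper name BuildSqlForTag is hypothetical, the bg_id column is taken from the select statement in threadproc_tags, and "select *" stands in for whatever column list the real query uses. It also assumes .NET 4 or later for the generic string.Join overload, and that the tag passed in has already been normalized the same way the cached keys were.

```csharp
using System;
using System.Collections.Generic;

public static class TagSearch
{
    // Hypothetical helper: turns a cached tag lookup into a SQL statement.
    // Returns null when the map is missing or no bug carries the tag.
    public static string BuildSqlForTag(
        SortedDictionary<string, List<int>> tags, string normalizedTag)
    {
        List<int> ids;
        if (tags == null || !tags.TryGetValue(normalizedTag, out ids) || ids.Count == 0)
        {
            return null;
        }

        // The ids are ints read from the database, never user text,
        // so concatenating them into the statement is safe.
        return "select * from bugs where bg_id in (" + string.Join(",", ids) + ")";
    }
}
```

In a page, the map would come from the Application object, cast back to SortedDictionary<string, List<int>>. Until the background thread has run once, that entry is null, which is why the helper checks for it.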
    public static void threadproc_tags(object obj)
    {
        System.Web.HttpApplicationState app = (System.Web.HttpApplicationState)obj;
        SortedDictionary<string,List<int>> tags = new SortedDictionary<string,List<int>>();

        // update the cache
        DbUtil dbutil = new DbUtil();
        DataSet ds = dbutil.get_dataset("select bg_id, bg_tags from bugs where isnull(bg_tags,'') <> ''");

        foreach (DataRow dr in ds.Tables[0].Rows)
        {
            string[] labels = btnet.Util.split_string_using_commas((string) dr[1]);

            // for each tag label, build a list of bugids that have that label
            for (int i = 0; i < labels.Length; i++)
            {
                string label = normalize_tag(labels[i]);
                if (label != "")
                {
                    if (!tags.ContainsKey(label))
                    {
                        tags[label] = new List<int>();
                    }
                    tags[label].Add((int)dr[0]);
                }
            }
        }

        app["tags"] = tags;
    }
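
For the incremental update mentioned earlier, here is one way it might look. This is only a sketch and not something BugTracker.NET does: ApplyChange is a hypothetical helper, and it assumes the caller knows the bug's old and new tag labels and has already normalized them. It copies the map instead of editing it in place, so a request reading the cached map never sees it half-updated.

```csharp
using System.Collections.Generic;

public static class TagCacheUpdater
{
    // Hypothetical helper: returns a new map that reflects one bug's tag change.
    public static SortedDictionary<string, List<int>> ApplyChange(
        SortedDictionary<string, List<int>> current,
        int bugId,
        IEnumerable<string> oldLabels,
        IEnumerable<string> newLabels)
    {
        // Copy first so readers holding the old map are unaffected.
        var updated = new SortedDictionary<string, List<int>>();
        foreach (KeyValuePair<string, List<int>> pair in current)
        {
            updated[pair.Key] = new List<int>(pair.Value);
        }

        // Drop the bug from every tag it used to have.
        foreach (string label in oldLabels)
        {
            List<int> ids;
            if (updated.TryGetValue(label, out ids))
            {
                ids.Remove(bugId);
                if (ids.Count == 0) updated.Remove(label);
            }
        }

        // Add the bug under every tag it has now.
        foreach (string label in newLabels)
        {
            List<int> ids;
            if (!updated.TryGetValue(label, out ids))
            {
                ids = new List<int>();
                updated[label] = ids;
            }
            if (!ids.Contains(bugId)) ids.Add(bugId);
        }

        return updated;
    }
}
```

The result would be stored back with app["tags"] = updated, the same swap the full rebuild does. Copying is still proportional to the size of the map, but it skips the database query and the text parsing.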