What tag schema(s) are the most efficient/effective?

It all depends on data volumes and on the content-to-tag distribution and density ratios.

If you have a low tag distribution and density ratio (typical of human-generated data), you can simply generate a unique id or hash for each distinct collection of tags in use by the data, then associate that 'tag collection' id with each data instance that carries those tags.

This can work surprisingly well for many forms of human-generated data.

e.g. Stack Overflow has ~500,000 questions and ~20,000 tags (too many dupe-ish tags!). Most questions have fewer than five tags. In the worst case you will have 500,000 'tag collection' ids to associate, but more realistically you will have several thousand.

You will also need either instance tracking or garbage collection on the 'tag collection' collection, as specific combinations of tags fall out of use.
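A minimal sketch of the instance-tracking option, assuming an in-memory counter keyed by the tag set. The class name TagCollectionRegistry and its acquire/release methods are made up for illustration; in a real database this would be the instanceCount column kept up to date on insert and delete.

```python
from collections import Counter


class TagCollectionRegistry:
    """Counts how many data instances reference each tag collection."""

    def __init__(self):
        # frozenset of tag names -> number of data instances using that set
        self.instance_count = Counter()

    def acquire(self, tags):
        """Register one more data instance that uses this set of tags."""
        key = frozenset(tags)
        self.instance_count[key] += 1
        return key

    def release(self, tags):
        """Unregister one data instance; drop the collection when unused."""
        key = frozenset(tags)
        if key not in self.instance_count:
            raise KeyError("unknown tag collection")
        self.instance_count[key] -= 1
        if self.instance_count[key] == 0:
            # Nothing references this combination any more, so collect it.
            del self.instance_count[key]
```

The garbage-collection alternative would skip the counter and periodically delete tag collections that no data row points at.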

e.g.

Tag: id, tagName
TagCollection: id, instanceCount
TagCollectionTag: tagCollectionId, tagId
Data: id, title, content, tagCollectionId
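For something concrete, the four collections above could be set up roughly like this with Python's built-in sqlite3 module. The column types, the text (hash) key on TagCollection and the two indexes are my assumptions; the layout above only fixes the names.

```python
import sqlite3

# Column types, constraints and indexes are assumptions for this sketch.
SCHEMA = """
CREATE TABLE Tag (
    id      INTEGER PRIMARY KEY,
    tagName TEXT NOT NULL UNIQUE
);
CREATE TABLE TagCollection (
    id            TEXT PRIMARY KEY,        -- e.g. a hash of the tag set
    instanceCount INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE TagCollectionTag (
    tagCollectionId TEXT    NOT NULL REFERENCES TagCollection(id),
    tagId           INTEGER NOT NULL REFERENCES Tag(id),
    PRIMARY KEY (tagCollectionId, tagId)
);
CREATE TABLE Data (
    id              INTEGER PRIMARY KEY,
    title           TEXT,
    content         TEXT,
    tagCollectionId TEXT REFERENCES TagCollection(id)
);
-- Lookups go tag -> collections -> data rows, so index both hops.
CREATE INDEX idx_tct_tag ON TagCollectionTag(tagId);
CREATE INDEX idx_data_collection ON Data(tagCollectionId);
"""


def create_schema(conn):
    """Create the four tables and their indexes on the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```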

Inserting tags is fast if a hash is used (a hash over all tags of the collection, taken in a fixed order such as sorted, so the same set of tags always yields the same id). Otherwise you have to search the TagCollection and TagCollectionTag collections, but these should not be too large anyway.
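One way the hash could be computed, as a sketch: normalise the tag names, sort them, join them with a separator that cannot occur in a tag, and hash the result. The function name tag_collection_id, the choice of SHA-1 and the lower-casing are all my assumptions here; any stable hash over a canonical ordering would do.

```python
import hashlib


def tag_collection_id(tags):
    """Return an id for a set of tags, independent of order and duplicates."""
    # Assumption: tags are case-insensitive and surrounding spaces are noise.
    normalized = sorted({tag.strip().lower() for tag in tags})
    # The ASCII unit separator keeps ["ab", "c"] distinct from ["a", "bc"].
    joined = "\x1f".join(normalized)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()
```

With this, an insert only needs to compute the id and look it up by key, creating the TagCollection and TagCollectionTag rows the first time a combination appears.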

Searching is fast: search TagCollectionTag for the tag collections that contain the specific set of tags, then find the Data rows with any of those tagCollectionIds.
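The two-step search can be sketched in plain Python, with a dict standing in for TagCollectionTag and a list of dicts standing in for Data. The function name find_data_with_tags and the sample rows are made up for illustration.

```python
def find_data_with_tags(required_tags, collection_tags, data_rows):
    """Return the data rows whose tag collection contains every required tag."""
    required = set(required_tags)
    # Step 1: find the tag collections that contain the whole required set.
    matching = {
        collection_id
        for collection_id, tags in collection_tags.items()
        if required <= tags
    }
    # Step 2: pick the data rows that point at any of those collections.
    return [row for row in data_rows if row["tagCollectionId"] in matching]


# Stand-in for TagCollectionTag: collection id -> set of tag names.
collection_tags = {
    1: {"python", "sql"},
    2: {"python"},
    3: {"sql", "schema", "python"},
}

# Stand-in for Data.
data_rows = [
    {"id": 10, "title": "A", "tagCollectionId": 1},
    {"id": 11, "title": "B", "tagCollectionId": 2},
    {"id": 12, "title": "C", "tagCollectionId": 3},
]
```

Searching for python and sql here matches collections 1 and 3, and so returns rows 10 and 12. In SQL the first step is typically a GROUP BY on tagCollectionId with a HAVING COUNT equal to the number of tags searched for.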

Hope that wasn't too confusing :-)
