How to build a tagging system like SO?

I'm using a single textbox on my ASP.NET MVC website to submit "tags".

First, I tried splitting tags on commas: "asp.net, c#, sql server". It works, but if the user forgets to separate tags with commas, I can't split the string correctly.

"asp.net c# sql server": "sql server" should be a single tag, not two ("sql" + "server").

Moreover, I "can't" ask the user to use the "-" character to separate the words of a tag ("sql-server"); they shouldn't have to care about this.

Can someone help?

+5  A: 

Either you match the string against existing tags, so that you can have tags with spaces (assuming you search for the longer tags first, so you find 'sql server' before you search for 'sql'). You could make this more robust by only allowing existing tags to be used, and have a separate mechanism for creating new tags. That way users could easily create tags with spaces, as anything entered in the new tag box would be a single tag like 'sql server 2005'.
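A minimal sketch of the "search for the longer tags first" idea, assuming the existing tags are available as an in-memory set. The names `TagMatcher`, `Match`, `knownTags` and `maxWords` are made up for illustration; words that match no existing tag are simply kept as single-word tags here.

```csharp
using System;
using System.Collections.Generic;

public static class TagMatcher
{
    // Splits the input on whitespace, then at each position tries the longest
    // run of consecutive words first and keeps it if it is a known tag.
    // 'knownTags' is a hypothetical set loaded from wherever tags are stored.
    public static List<string> Match(string input, ISet<string> knownTags, int maxWords = 4)
    {
        // A null separator array means "split on any whitespace".
        string[] words = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string>();
        int i = 0;
        while (i < words.Length)
        {
            int taken = 1; // fall back to a single word
            for (int len = Math.Min(maxWords, words.Length - i); len > 1; len--)
            {
                string candidate = string.Join(" ", words, i, len);
                if (knownTags.Contains(candidate))
                {
                    taken = len;
                    break;
                }
            }
            result.Add(string.Join(" ", words, i, taken));
            i += taken;
        }
        return result;
    }
}
```

With a case-insensitive set containing 'asp.net', 'c#', 'sql' and 'sql server', the input 'asp.net c# sql server' comes back as three tags, the last one being 'sql server'.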

EDIT:

Alternatively you could have some special syntax in the tags for creating new ones:

'sql,asp.net,[NEWTAG]sql server,c#' would use existing tags 'sql','asp.net','c#' and would create a new tag 'sql server'

/EDIT
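The special-syntax idea could be parsed roughly like this. It is only a sketch: the `[NEWTAG]` marker comes from the example above, while `NewTagSyntax` and `Parse` are hypothetical names, and the tuple return type assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

public static class NewTagSyntax
{
    private const string Marker = "[NEWTAG]"; // marker text from the example above

    // Splits on commas and separates items that should match existing tags
    // from items the user explicitly asked to create.
    public static (List<string> Existing, List<string> Created) Parse(string input)
    {
        var existing = new List<string>();
        var created = new List<string>();
        foreach (string raw in input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string item = raw.Trim();
            if (item.StartsWith(Marker, StringComparison.OrdinalIgnoreCase))
            {
                string name = item.Substring(Marker.Length).Trim();
                if (name.Length > 0) created.Add(name);
            }
            else if (item.Length > 0)
            {
                existing.Add(item);
            }
        }
        return (existing, created);
    }
}
```

For 'sql,asp.net,[NEWTAG]sql server,c#' this yields 'sql', 'asp.net', 'c#' as existing tags and 'sql server' as a tag to create. Checking that the "existing" ones really exist is left to the caller.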

Or you split on spaces and don't allow tags with spaces.

In your example, how do you tell the difference between 'sql server' (1 tag) and 'sql' 'server' (2 tags)?

If you look at SO, the tags are all space-separated, so one tag is sql-server.

As long as the tags are suggested to users as they type them, I don't think this will be a problem.

Sam Holder
In my first example, I separate tags with commas, so I can tell the difference.
Yoann. B
But you said yourself that this is not a good solution. Otherwise, if a user submits 'asp.net c# sql server', how can you tell that this is not one tag?
Sam Holder
Yes, that's my problem...
Yoann. B
And the easiest solution is to disallow spaces as a separator, or match the given string for existing tags first, and use a separate mechanism for adding new tags.
Sam Holder
Added a third alternative in an edit.
Sam Holder
Thanks bebop, I'll try that.
Yoann. B
+6  A: 

StackOverflow has exactly the same problem with users incorrectly entering tags, for example if you had entered 'string manipulation' instead of 'string-manipulation'. You've just changed the tag separator from space to comma.

The fundamental problem is still the same, so it is no surprise that the solution is also the same:

  1. Educating your users via convenient help on your site.
  2. Moderators that helpfully go round fixing other users' mistakes, cleaning up the site.
  3. Moderators also educating your users when they make mistakes so that they don't make the same mistake in future.

StackOverflow proves that this model can work well. An automated solution for correcting user errors will sometimes make errors itself because of the ambiguity you pointed out yourself. This will frustrate people who are doing it correctly only to be foiled by the software "fixing" their tags for them.

Mark Byers
+2  A: 

You might try a statistical spelling correction kind of approach: if there are a bunch of things already tagged "sql server" it could make an educated guess. Of course, it would get it wrong sometimes.
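One very rough way to sketch such a guess, assuming you keep a usage count per tag. The rule used here (merge two adjacent words when the combined tag is used at least as often as the rarer of the two words) is an invented heuristic for illustration, not an established algorithm, and `TagGuesser`, `Guess` and `usage` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class TagGuesser
{
    // 'usage' maps a tag to how many items already carry it (assumed data).
    public static List<string> Guess(string input, IDictionary<string, int> usage)
    {
        string[] words = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string>();
        int i = 0;
        while (i < words.Length)
        {
            if (i + 1 < words.Length)
            {
                string pair = words[i] + " " + words[i + 1];
                int pairCount = Count(usage, pair);
                int weakest = Math.Min(Count(usage, words[i]), Count(usage, words[i + 1]));
                // Merge only when the two-word tag is known and is at least
                // as common as the rarer of its two parts.
                if (pairCount > 0 && pairCount >= weakest)
                {
                    result.Add(pair);
                    i += 2;
                    continue;
                }
            }
            result.Add(words[i]);
            i++;
        }
        return result;
    }

    private static int Count(IDictionary<string, int> usage, string tag)
    {
        return usage.TryGetValue(tag, out int n) ? n : 0;
    }
}
```

Since it will get some cases wrong, it is probably safer to show the result as a suggestion the user can accept than to apply it silently.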

Darius Bacon
+8  A: 

There's one (easy) way I can think of to allow your user to include any character in a tag. That solution is to allow the user to enter only one tag at a time. You could have a textbox where the user enters the tag (autocompletion for existing tags is a definite plus) and presses Enter or clicks a button when finished; the entered tags then appear below the textbox in a list of applied tags. Each applied tag needs its own button so the user can remove it.
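On the server side, the list of applied tags could be modelled with something as small as the following sketch. `AppliedTags`, `Add` and `Remove` are hypothetical names, and ignoring case when comparing tags is an assumption, not something the approach requires.

```csharp
using System;
using System.Collections.Generic;

public class AppliedTags
{
    private readonly List<string> _tags = new List<string>();

    public IReadOnlyList<string> Tags => _tags;

    // The whole textbox content is one tag, so spaces and commas are allowed.
    // Returns false for empty input or a tag that is already applied.
    public bool Add(string tag)
    {
        string cleaned = (tag ?? "").Trim();
        if (cleaned.Length == 0 || IndexOf(cleaned) >= 0) return false;
        _tags.Add(cleaned);
        return true;
    }

    // Called when the user clicks a tag's remove button.
    public bool Remove(string tag)
    {
        int index = IndexOf((tag ?? "").Trim());
        if (index < 0) return false;
        _tags.RemoveAt(index);
        return true;
    }

    private int IndexOf(string tag)
    {
        return _tags.FindIndex(t => string.Equals(t, tag, StringComparison.OrdinalIgnoreCase));
    }
}
```

Because nothing is ever split, a tag like 'sql server 2005' goes in exactly as typed.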

WordPress has a similar tagging mechanism when you create posts, but they allow multiple tags to be entered at once by simply stating what character delimits tags. Asking for a delimiter is not a big deal, but if you don't want to mandate a particular delimiter, you'll simply have to restrict the user to entering a single tag at a time.

[Image: WordPress's tag box]

Another Idea (edit)

I just read this today: Tokenizing Control

Benny Jobigan
A: 

My advice:

First off, decide up front: do you want to allow tags with spaces in their names or not? Pick one or the other; don't try to create some crazy mish-mash with heuristic prediction about whether the user meant one thing or the other.

Either this:

sql server

always means 1 tag, or always means 2. Just choose right now what you want. One or the other. If you should choose to not allow spaces in tags (that means it's 2 tags), but you also want to allow users to separate tags with commas, e.g.:

sql,server

Then you could deal with the user entering a bunch of tags mixed, e.g.:

sql server,regular expressions,java c#

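With code like this (using a character class rather than a capturing group such as (,|\s)+, because .NET's Regex.Split adds captured separators to the result array):

string[] tags = Regex.Split(input, @"[,\s]+");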

Which will get you:

tags[0]: sql
tags[1]: server
tags[2]: regular
tags[3]: expressions
tags[4]: java
tags[5]: c#
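For completeness, here is a slightly fuller sketch of the same split that also drops the empty entries produced by leading or trailing separators, and removes duplicates. `TagSplitter` is a made-up name, and lower-casing the tags is an assumption about how you want to normalize them. The pattern is a character class because Regex.Split in .NET returns the text matched by any capturing group along with the pieces.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagSplitter
{
    // Treats any run of commas and/or whitespace as one separator.
    public static string[] Split(string input)
    {
        return Regex.Split(input ?? "", @"[,\s]+")
            .Where(t => t.Length > 0)           // leading/trailing separators yield ""
            .Select(t => t.ToLowerInvariant())  // assumed normalization
            .Distinct()
            .ToArray();
    }
}
```

Calling `TagSplitter.Split("sql server,regular expressions,java c#")` gives the six tags listed above.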
Mike Clark