Everywhere I look, whenever a site implements a tag system, it converts the tag names to lowercase. Even here on StackOverflow.

I was wondering why that is. Other than preventing duplication, I can't think of a reason to use lowercase. I believe it hurts the practical side of tags. People are used to reading "IBM", not "ibm", and "C#", not "c#". It takes the user a bit longer to understand what the tag means. I'm wondering whether I should allow capitals in my tag system, or whether lowercase is a convention and I've got it all wrong.

I want to hear your opinion.

A: 

That sounds like a valid point to me. I'm sure they could come up with some simple parsing to capitalize each word (separated by dashes), but how would you know that it's supposed to be IBM instead of Ibm? I think someone would have to manually change the tag lookup table to accomplish this.

SkippyFire
Yeah, but since the editors of a website are usually the ones who decide on the tags, they know whether it's supposed to be IBM or ibm. And if users do the tagging, they usually choose from a given set of tags. I'm only saying that allowing capitalization is more of an advantage than a disadvantage.
Roy Peleg
Most sites I've seen which use tags use freeform tags - just type in whatever tag you want. Cases such as you describe where admins/editors build a list of all tags and users select tags from that list are, in my experience, very much the exception.
Dave Sherohman
+16  A: 

As you already noticed, it prevents duplication. People are not consistent in their capitalization. Just look at the tags here and notice that people can't decide whether it's "objective-c", "objc" or "objectivec". Throw in "Objective-C", "Objective-c" and so on, and you'd have a real mess.

Note I'm not saying it would be impossible to deal with capitals, just difficult. For example, how do you know the correct capitalization? Just accept the first one entered as correct? Rely on moderators to clean up?

Paul Tomblin
Comparison could be case-insensitive, and people with Editor permissions could clean up any new ones - folk seem to be happy to correct typos in posts, I reckon they would be happy to clean up Tags too.
Kristen
But you could easily add a tolower() comparison of the existing tags... or even have a column in the DB table already containing the lowered version of the string. Then there isn't really any overhead other than the actual space the extra column takes up.
SkippyFire
Yeah, it can create a mess. But when choosing a tag you get a drop-down, so you can see whether it has duplicates. Limiting functionality just to prevent duplication doesn't sound ideal. I'm sure duplication can be prevented in other ways.
Roy Peleg
I try to clean up tags when I can (at least the ones I'm interested in). For instance, I change "OS" to the more popular "OperatingSystem." I'd be more than willing to help fix casing in tags.
Giovanni Galbo
+3  A: 

Different cases should always be considered equivalent for tags.

That's another reason to store your tags normalized. The single normalized version contains the accepted case, and tags are linked using a many-to-many link table. Comparison against the tag table is case-insensitive, so there will never be duplicates.
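
A minimal sketch of that layout, using Python's built-in sqlite3 module. The table names (tags, post_tags), the tag_post helper, and the use of SQLite's NOCASE collation are assumptions for illustration, not something this answer specifies. Note that NOCASE folds only ASCII letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (
    id   INTEGER PRIMARY KEY,
    -- One row per tag; the stored text is the accepted capitalization.
    name TEXT NOT NULL UNIQUE COLLATE NOCASE
);
CREATE TABLE post_tags (
    -- Many-to-many link between posts and tags.
    post_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (post_id, tag_id)
);
""")


def tag_post(conn, post_id, raw_tag):
    """Attach a tag to a post, reusing an existing tag regardless of case."""
    name = raw_tag.strip()
    # The column's NOCASE collation makes this comparison case-insensitive.
    row = conn.execute("SELECT id FROM tags WHERE name = ?", (name,)).fetchone()
    if row is None:
        # First use of the tag: whatever case was typed becomes the accepted case.
        tag_id = conn.execute("INSERT INTO tags (name) VALUES (?)", (name,)).lastrowid
    else:
        tag_id = row[0]
    conn.execute(
        "INSERT OR IGNORE INTO post_tags (post_id, tag_id) VALUES (?, ?)",
        (post_id, tag_id),
    )
    return tag_id


first = tag_post(conn, 1, "IBM")
second = tag_post(conn, 2, "ibm")
print(first == second)  # True
```

Because the accepted case lives in one row, a moderator could correct it with a single UPDATE on tags without touching the link table.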

Cade Roux
You seem to be saying that the site administrators have to create a list of valid tags, or accept that the first person to enter a new tag has the correct capitalization.
Paul Tomblin
I think that tags should be moderated, just as any other user-entered content is. And assuming it is done right, there isn't any reason to limit tags to lowercase.
Roy Peleg
@Paul Neither. The first person to create the tag sets the case, which is attached to the tag (and can be changed by any moderator, just like a tag should be changeable).
Cade Roux
@Roy I agree. Until we have proper tag maintenance, though, I think it has to be this way. I asked for some tags to be forbidden - like server. Tagging something sql+server is wrong, and server is nearly meaningless. Also, we need to coalesce tags - the design of a good tagging system facilitates this.
Cade Roux
+2  A: 

(I am not advocating for any particular site or system in this answer - each specific system may have its own considerations)

I guess the reason is to prevent duplication and ease sorting or identification (it's easier if you do not need to consider multiple options). And possibly to maintain some consistency, as many web user interfaces are geared towards people who will sometimes bother to capitalize correctly and sometimes not.

But then, those are problems anyway, because there is all too often more than one way to refer to something. If your tags are ever used as symbols in some sort of script, configuration, or code (e.g. mail filters, settings files, command lines), it's good to have a simple convention for specifying them, and if all symbols are of similar significance, allowing or distinguishing between different case variations, delimiters, etc. can be problematic. As a Unix user, I try to keep file names simple, short, lowercase, and without special characters, and more so when they are (for example) mailbox names or source files - as they are likely to have to be typed, and specified in many contexts where doing otherwise would be inconvenient.

On the other hand, when using a sophisticated graphical or web-based interface which allows easy selection from a list, completes typed entries, suggests closest matches, etc., it makes sense to allow some sort of mapping. Give each tag a short, simple, lowercase identifying name, but also allow it a "long" or "human" name, which is shown where it makes sense. Tags can be uniquely identified and specified by their short name, but read more conveniently by their long name.
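
A rough sketch of that short-name / long-name mapping. The Tag and TagCatalog names and their methods are hypothetical, invented here only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    short_name: str  # unique, lowercase identifier used for lookup and typing
    long_name: str   # "human" name shown wherever tags are displayed


class TagCatalog:
    """Hypothetical in-memory catalog keyed by the short name."""

    def __init__(self):
        self._by_short = {}

    def add(self, short_name, long_name=None):
        key = short_name.strip().lower()
        if key in self._by_short:
            raise ValueError(f"tag {key!r} already exists")
        # Fall back to the short name when no long name is given.
        tag = Tag(short_name=key, long_name=long_name or key)
        self._by_short[key] = tag
        return tag

    def lookup(self, typed):
        # Whatever case the user typed, identification goes through the short name.
        return self._by_short.get(typed.strip().lower())

    def rename(self, short_name, new_long_name):
        # The displayed form can be corrected without touching the identifier.
        self._by_short[short_name.strip().lower()].long_name = new_long_name


catalog = TagCatalog()
catalog.add("ibm", "IBM")
catalog.add("c#", "C#")
print(catalog.lookup("Ibm").long_name)  # IBM
```

Scripts and filters would refer to tags by short_name only, while the interface renders long_name.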

This is similar to how usernames work in many systems. I wouldn't choose a mixed-case username, and would rather have usernames treated case-insensitively (so I would just use the case that makes sense on the system I am on, which is lowercase in Unix but uppercase in some other, older systems). Most systems also store some other information about users, like their long or full name, which is nicer to read and which many user interfaces (e.g. Windows XP, Mac OS, and I guess also some newer Unix desktop interfaces like GNOME and KDE) therefore display in login choosers, messages, etc.

In the case of tags for community systems on the web, I guess the solution to the duplication problem is some level of moderation of tags, even if just by the community itself, and the ability to rename and merge tags (unlike usernames in most cases) or edit their long names, in case something was mistagged.

Tom Alsberg
A: 

I agree that in principle this could be done in a more sophisticated manner. For example, you could implement a similarity metric that could recognize all of these as being likely synonyms:

  • IBM
  • ibm
  • I B M
  • I. B. M.
  • I.B.M.

However, there's a tradeoff between the increased runtime (not to mention development effort) and the increase in utility.

It's also been my general experience that as heuristics become more complex, their failure modes become more mysterious and bizarre. At least the convert-alphabetics-to-standard-case technique is easy for humans to understand and do in their heads when they have questions.
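
To make the tradeoff concrete, here is a deliberately crude stand-in for such a metric: instead of scoring similarity, it reduces each tag to a canonical key by dropping whitespace and periods and folding case. The function names tag_key and group_variants are made up for this sketch.

```python
import re
from collections import defaultdict


def tag_key(raw):
    """Canonical key: remove whitespace and periods, then fold case."""
    return re.sub(r"[\s.]+", "", raw).casefold()


def group_variants(tags):
    """Group raw tag spellings that share the same canonical key."""
    groups = defaultdict(list)
    for tag in tags:
        groups[tag_key(tag)].append(tag)
    return dict(groups)


variants = ["IBM", "ibm", "I B M", "I. B. M.", "I.B.M."]
print({tag_key(v) for v in variants})  # {'ibm'}
```

Even this small rule has surprising edges: it also merges "node.js" with "nodejs" and turns ".NET" into "net", which may or may not be what you want.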

joel.neely
+2  A: 

Ask an engineer the reason why something is a certain way, and they'll go to great lengths to figure it out. ;)

In this case, I'd be inclined to explain the prevalence of lowercase by a combination of laziness (programmers not willing to consider the points you bring up) and imitation (once you see it done a certain way on site S, you tend to reimplement it for site S' with similar assumptions).

It certainly seems feasible to store tags in such a way that case doesn't matter (for purposes of sorting, querying and so on) but display the tags with the capitalization originally intended.

Morendil
A: 

When typing, you would have to turn on caps lock to make everything upper-case. People are lazy.

jm04469
OP is arguing for allowing mixed-case tags, not suggesting that all-lowercase should be replaced by all-uppercase.
Dave Sherohman