My web application stores URL segments in a database. These URL segments are based on user-submitted content.

What collation should I use for character strings that will appear in URLs?

My assumption is ASCII General CI (?) based on this question: http://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid

+2  A: 

It doesn't really matter, as far as I can see. The characters valid in a URL can be represented in every character set I know of. I also wouldn't mix different collations across tables and columns: you'll get "illegal mix of collations" errors whenever you join them or perform any other kind of cross-column or cross-table operation (see my recent problem here).

Correct me if I'm wrong of course.

Pekka
Then is there a performance cost to using, say, UTF-8 General CI instead of a "simpler" encoding that could store the same data (ASCII General CI)?
Dolph
I'm no database guru so I can't say for sure, but logic tells me no, because the characters you mention take up one byte each in both the ASCII and UTF-8 character sets. I'm pretty sure any overhead *must* be minuscule.
Pekka
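
For what it's worth, the one-byte claim in the comment above is easy to check outside the database. This is only a rough sketch in Python, not anything MySQL-specific: the names UNRESERVED and byte_lengths are made up for illustration, and the character list is the "unreserved" set from RFC 3986 rather than anything taken from the question. It says nothing about how a storage engine allocates space or how expensive collation comparisons are; it only compares the encoded sizes.

```python
import string

# Assumption: the URL segments use only RFC 3986 "unreserved" characters
# (letters, digits, hyphen, period, underscore, tilde).
UNRESERVED = string.ascii_letters + string.digits + "-._~"


def byte_lengths(text):
    """Return (ascii_bytes, utf8_bytes) for text.

    Raises UnicodeEncodeError if text contains any non-ASCII character.
    """
    return len(text.encode("ascii")), len(text.encode("utf-8"))


ascii_len, utf8_len = byte_lengths(UNRESERVED)
print(ascii_len, utf8_len)  # prints "66 66": one byte per character in both encodings
```

A segment such as "my-url-segment" likewise comes out at 14 bytes either way, so the encoded data is the same size as long as the segments really are restricted to these characters.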