I have user-submitted tags that can be any kind of (valid) UTF-8 string. I want to know whether it is safe to include them in a URL merely by running them through urlencode().

In other words, is urlencode() safe to use for valid UTF-8 strings? (By valid I mean I have already force-encoded them to UTF-8.)

+1  A: 

Yes, urlencode() should make a safe URL string out of any input string, as long as whatever that URL maps to (folder, file, .htaccess rule) doesn't have funky characters in it. Whenever I sanitize user input that could contain something funky, I love this function:

utf8_encode()

Code Monkey
+1 for the Awesome username. Now I'll read your answer...
Xeoncross
Sorry, utf8_encode() is not a safe function. It is only meant to be used for safe strings (which user input is not).
Xeoncross
Huh... didn't realize that. Thanks! Right now I'm using that function to clean up data coming straight from a WordPress database. It's the only function I found that would take out those funky characters that WordPress puts in there (like double spaces and styled quotes). Is there a better way to do this?
Code Monkey
+8  A: 

urlencode does not depend on a specific character encoding. It just looks at the bytes, interprets them as ASCII characters, and replaces any byte that is either outside the ASCII range (0x80–0xFF) or not allowed unescaped in a URL with its percent-encoded form (%XX).
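A minimal sketch of that byte-wise behavior. The strings are written with explicit byte escapes, so the source file's own encoding doesn't matter; the variable names are only for illustration.

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9) but a single byte in ISO-8859-1 (0xE9).
$utf8   = "\xC3\xA9";
$latin1 = "\xE9";

// urlencode() escapes each non-ASCII byte on its own, whatever the encoding.
echo urlencode($utf8), "\n";   // %C3%A9
echo urlencode($latin1), "\n"; // %E9
```

Either result is a safe URL component; whether the receiving side decodes it to the intended character depends on it assuming the same encoding.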

Now to your question: Yes, urlencode encodes any string, in any character encoding, so that it can be used safely, but only in the URL query! That is because urlencode formats its input according to application/x-www-form-urlencoded, which differs from “normal” percent encoding in how the space is encoded: application/x-www-form-urlencoded replaces spaces with +, while “normal” percent encoding replaces them with %20.

If you want “normal” percent encoding, use rawurlencode instead.
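To make the difference concrete, here is a small sketch. The tag value and the /tags/ and /search URL layouts are made up for illustration.

```php
<?php
$tag = "c++ tips"; // hypothetical user-submitted tag

// Form encoding: the space becomes "+", which is what query strings expect.
$query = urlencode($tag);   // c%2B%2B+tips

// "Normal" percent encoding: the space becomes "%20", suitable for path segments.
$path = rawurlencode($tag); // c%2B%2B%20tips

echo "/tags/" . $path . "\n";
echo "/search?tag=" . $query . "\n";
```

Note that a literal + in the input is escaped as %2B by both functions, so it cannot be confused with an encoded space.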

Gumbo
A: 

Just to be entirely on the safe side, I would remove newlines first. They are not dangerous in themselves, but they can be stepping stones in exploiting other vulnerabilities.
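One way this could look, as a rough sketch: tag_to_url_segment() is a hypothetical helper name, and using rawurlencode (rather than urlencode) assumes the tag ends up in a URL path.

```php
<?php
// Hypothetical helper: drop CR/LF characters, then percent-encode the rest.
function tag_to_url_segment(string $tag): string
{
    $clean = str_replace(["\r", "\n"], '', $tag);
    return rawurlencode($clean);
}

echo tag_to_url_segment("php\r\ntips"), "\n"; // phptips
```

Without the stripping step the encoder would still escape the line break as %0D%0A, so removing it first is purely an extra precaution.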

Tgr