I'm parsing my nginx logs, and I want to extract some details from the HTTP_REFERER string, for example the query string used to find the website. One user typed in "México", which gets encoded in the log as "query=M%E9xico".

Passing this through Rack::Utils.parse_query('query=M%E9xico') returns a hash, {"query" => "M?xico"}.

When you try to stuff "M?xico" into Postgres (but not the more forgiving SQLite), it pukes because the string isn't proper UTF-8. Looking at http://rack.rubyforge.org/doc/Rack/Utils.html#M000324, unescape simply packs the hex digits into raw bytes.

How can I convert the string back to UTF-8, or can I get parse_query to return UTF-8 in the first place?
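For reference, here is a minimal sketch of the kind of conversion I have in mind. It assumes the stray byte (%E9) was sent as ISO-8859-1 (Latin-1), which is a guess about the referring site and not something the log tells me. The helper name decode_to_utf8 is made up for illustration, and it uses the standard library's URI.decode_www_form_component instead of Rack so that it runs on its own:

```ruby
require 'uri'

# Hypothetical helper: decode a percent-encoded value and return valid UTF-8.
# Assumption: bytes that are not valid UTF-8 were sent as ISO-8859-1.
def decode_to_utf8(escaped, fallback = Encoding::ISO_8859_1)
  # The decoded string is tagged as UTF-8, but its bytes are not validated.
  decoded = URI.decode_www_form_component(escaped)
  return decoded if decoded.valid_encoding?

  # Re-label the raw bytes as the fallback encoding, then transcode to UTF-8.
  decoded.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

puts decode_to_utf8('M%E9xico')     # Latin-1 input
puts decode_to_utf8('M%C3%A9xico')  # input that was already UTF-8
```

The valid_encoding? check leaves values that are already UTF-8 untouched, so only the broken ones get reinterpreted. If the referrer used some other encoding, the fallback would have to change accordingly.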