Let's assume a user enters the address of some resource and we need to translate it to:
<a href="valid URI here">human readable form</a>
The HTML4 specification refers to RFC 3986, which allows only ASCII alphanumeric characters and hyphens in the host part and requires all non-ASCII characters in the other parts to be percent-encoded. That is what I want to put in the href attribute so that the link works properly in all browsers. IDNs should be encoded with Punycode.
The HTML5 draft refers to RFC 3987, which also allows percent-encoded Unicode characters in the host part, and a large subset of Unicode without any encoding in both the host and the other parts. The user may enter an address in any of these forms. To provide a human-readable form of it, I need to decode all printable characters. Note that some parts of an address might not correspond to valid UTF-8 sequences, usually because the target site uses some other character encoding.
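To make the decoding requirement concrete, here is a minimal Python sketch of the "human readable" direction. It is only an illustration under my own assumptions, not a complete solution: the name `decode_for_display` is hypothetical, every escaped ASCII character is deliberately left encoded (decoding something like `%2F` or `%3F` could change the meaning of the address), and bytes that are not valid UTF-8 are kept as percent escapes.

```python
import re
from urllib.parse import quote, unquote_to_bytes

# One or more consecutive percent escapes, e.g. "%D0%BF%D1%83".
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match: re.Match) -> str:
    raw = unquote_to_bytes(match.group(0))
    # "surrogateescape" maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, so valid and invalid parts can be told apart.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # Invalid UTF-8 byte: keep it percent-encoded.
            out.append(f"%{code - 0xDC00:02X}")
        elif code < 0x80:
            # Assumption: escaped ASCII stays escaped (it may be reserved).
            out.append(f"%{code:02X}")
        elif ch.isprintable():
            out.append(ch)
        else:
            # Non-printable non-ASCII character: re-encode as UTF-8 escapes.
            out.append(quote(ch))
    return "".join(out)


def decode_for_display(text: str) -> str:
    """Decode percent escapes that form printable non-ASCII characters."""
    return _ESCAPE_RUN.sub(_decode_run, text)
```

Escapes that are kept come back with uppercase hex digits, so `%ff` is shown as `%FF`. Which ASCII characters are safe to decode depends on the URL component, which this sketch does not try to decide.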
An example of what I'd like to get:
<a href="http://xn--80aswg.xn--p1ai/%D0%BF%D1%83%D1%82%D1%8C?%D0%B7%D0%B0%D0%BF%D1%80%D0%BE%D1%81">
http://сайт.рф/путь?запрос</a>
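For the href side, the example above can be roughly reproduced with the Python standard library alone. This is a sketch under stated assumptions rather than a full solution: `iri_to_uri` here is my own hypothetical helper (not the Werkzeug function of the same name), it relies on the built-in `idna` codec (IDNA 2003, no extra normalization), it drops any user:password part, and it does not handle IPv6 literals.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched by quote(). "%" is included so that escapes
# already present in the input are not encoded a second time.
_SAFE_PATH = "/%:@!$&'()*+,;=~"
_SAFE_QUERY = _SAFE_PATH + "?"


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to an ASCII-only URI (sketch, see assumptions above)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Punycode-encode each label of the host name.
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_SAFE_PATH),      # UTF-8 percent-encoding
        quote(parts.query, safe=_SAFE_QUERY),
        quote(parts.fragment, safe=_SAFE_QUERY),
    ))
```

Calling `iri_to_uri("http://сайт.рф/путь?запрос")` yields the href value shown in the example. The context-dependent rules about which characters must always or never be encoded are not covered here.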
Are there any tools to solve these tasks? I'm especially interested in libraries for Python and JavaScript.
Update: I know that percent-encoding and Punycode encoding/decoding can be done in Python and JavaScript (without proper normalization, but I can live with that). The whole task needs much more work, though, and there are pitfalls: some characters must always be encoded, or must never be encoded, depending on context. I wonder whether there are ready-to-use libraries for the whole problem, since it seems to be quite common and modern browsers already do such conversions. For example, type http://%D1%81%D0%B0%D0%B9%D1%82.%D1%80%D1%84/ into Google Chrome: it is replaced with http://сайт.рф/ in the address bar, but the HTTP request is sent with Host: xn--80aswg.xn--p1ai.
Update 2: Vinay Sajip pointed out that Werkzeug has iri_to_uri and uri_to_iri functions that handle most cases correctly. So far I've found only two cases where they fail: a percent-encoded host (quite easy to fix) and invalid UTF-8 sequences (a bit tricky to handle nicely, but it shouldn't be a problem).
I'm still looking for a JavaScript library. It's not hard to write, but I'd prefer to avoid reinventing the wheel.
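The percent-encoded host case can be handled by normalizing the host before any further conversion. A minimal sketch with the standard library, assuming the escapes in the host are UTF-8 (`host_to_ascii` is a hypothetical helper of mine, not part of Werkzeug):

```python
from urllib.parse import unquote


def host_to_ascii(host: str) -> str:
    """Turn a possibly percent-encoded or Unicode host into its ASCII form."""
    # errors="strict" raises UnicodeDecodeError when the escapes are not
    # valid UTF-8 instead of silently inserting replacement characters.
    decoded = unquote(host, encoding="utf-8", errors="strict")
    # The built-in "idna" codec Punycode-encodes every non-ASCII label.
    return decoded.lower().encode("idna").decode("ascii")
```

With this, `host_to_ascii("%D1%81%D0%B0%D0%B9%D1%82.%D1%80%D1%84")` gives `xn--80aswg.xn--p1ai`, the same value Chrome sends in the Host header.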