I'm using the Microsoft AntiXss 3.1 library. We run a number of international sites that use non-Latin scripts, and we use SEO-friendly URLs, so non-ASCII characters end up in the URL.
AntiXss.UrlEncode (at least in 3.1) treats "international characters" as safe, so we end up with an IRI instead of a URI:
http://somesite.com/ja-JP/applications/search/セキュリティ-b200009
HttpUtility.UrlEncode generates the correct encoding for a URI (RFC 3986):
http://somesite.com/ja-JP/applications/search/%e3%82%bb%e3%82%ad%e3%83%a5%e3%83%aa%e3%83%86%e3%82%a3-b200009
but I'd rather follow our standard of using the AntiXss library.
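For reference, here's a minimal console sketch of the difference; the output comments simply restate the two URLs above (AntiXss here is the static class in Microsoft.Security.Application from the 3.1 assembly):

    using System;
    using System.Web;                       // HttpUtility
    using Microsoft.Security.Application;   // AntiXss 3.1

    class EncodingComparison
    {
        static void Main()
        {
            string segment = "セキュリティ-b200009";

            // AntiXss 3.1 treats these characters as safe and passes them
            // through, producing an IRI-style path segment:
            Console.WriteLine(AntiXss.UrlEncode(segment));
            // セキュリティ-b200009

            // HttpUtility percent-encodes the UTF-8 bytes per RFC 3986,
            // producing a plain-ASCII URI segment:
            Console.WriteLine(HttpUtility.UrlEncode(segment));
            // %e3%82%bb%e3%82%ad%e3%83%a5%e3%83%aa%e3%83%86%e3%82%a3-b200009
        }
    }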
I know that AntiXss/WPL 4.0 has been released (and no longer appears to treat international characters as safe by default), but it renames the API, so upgrading would mean significant changes to our application.
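To illustrate the rename (the 4.0 line is from memory of the new API surface, so treat it as an assumption):

    // AntiXss 3.1:
    Microsoft.Security.Application.AntiXss.UrlEncode(segment);

    // AntiXss/WPL 4.0: the equivalent call moves to the new Encoder class.
    Microsoft.Security.Application.Encoder.UrlEncode(segment);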
So, I'd be happy with an answer to any of the following:
- How to coax AntiXss into doing a UrlEncode that is compatible with the URI standard (see the sketch after this list for the kind of workaround I'm imagining).
- Some reassurance that if we go with the IRI-compliant output of the AntiXss library (which we'd prefer), we aren't setting ourselves up for compatibility issues with older proxy servers in Thailand (or anywhere else; we can test against our browser matrix, but not against all the intermediate networking equipment that might sit between us and our customers).
- Confirmation that HttpUtility.UrlEncode is what we should use and is not noticeably less secure than AntiXss.UrlEncode.
- Some other, better solution I've not considered.
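For the first option, the only workaround I've come up with so far is to post-process: let AntiXss encode first, then percent-encode the UTF-8 bytes of any non-ASCII characters it left behind. The UrlEncodeUriSafe helper below is entirely my own sketch, not part of the AntiXss API:

    using System.Text;
    using Microsoft.Security.Application;

    static class UriSafeEncoder
    {
        public static string UrlEncodeUriSafe(string value)
        {
            // AntiXss 3.1 has already percent-encoded the unsafe ASCII
            // characters; anything non-ASCII it passed through untouched.
            string encoded = AntiXss.UrlEncode(value);

            var sb = new StringBuilder(encoded.Length);
            int i = 0;
            while (i < encoded.Length)
            {
                if (encoded[i] <= 0x7F)
                {
                    // ASCII (including AntiXss's %xx sequences): keep as-is.
                    sb.Append(encoded[i]);
                    i++;
                }
                else
                {
                    // Take a whole code point (handles surrogate pairs) and
                    // percent-encode its UTF-8 bytes, lowercase like HttpUtility.
                    int len = char.IsHighSurrogate(encoded[i])
                              && i + 1 < encoded.Length
                              && char.IsLowSurrogate(encoded[i + 1]) ? 2 : 1;
                    foreach (byte b in Encoding.UTF8.GetBytes(encoded.Substring(i, len)))
                        sb.Append('%').Append(b.ToString("x2"));
                    i += len;
                }
            }
            return sb.ToString();
        }
    }

It seems to produce output matching the HttpUtility example above, but hand-rolled encoding code is exactly what I was hoping a vetted library would save me from.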
Thanks