I'm looking at encoding strings to prevent XSS attacks. Right now we want to use a whitelist approach, where any characters outside of that whitelist will get encoded. Right now, we're taking things like '(' and outputting '(' instead. As far as we can tell, this will prevent most XSS.
The problem is that we've got a lot of international users, and when the whole site's in japanese, encoding becomes a major bandwidth hog. Is it safe to say that any character outside of the basic ASCII set isn't a vulnerability and they don't need to be encoded, or are there characters outside the ASCII set that still need to be encoded?