We use the excellent validator plugin for jQuery here on Stack Overflow to do client-side validation of input before it is submitted to the server.
It generally works well; however, this one has us scratching our heads.
The following validator method is used on the ask/answer form for the user name field (note that you must be logged out to see this field on the live site; it's on every /question page and the /ask page):
    $.validator.addMethod("validUserName",
        function (value, element) {
            return this.optional(element) ||
                /^[\w\-\s\dÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöüçÇßØøÅåÆæÞþÐð]+$/.test(value);
        },
        "Can only contain A-Z, 0-9, spaces, and hyphens.");
Now this regex looks weird, but it's pretty simple:
- match the beginning of the string (^)
- match one or more of any of these:
    - word character (\w)
    - dash (-)
    - whitespace (\s)
    - digit (\d)
    - crazy moon language characters (àèìòù etc)
- now match the end of the string ($)
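To make the breakdown above easier to poke at outside the plugin, here is a minimal sketch that assembles the same character class piece by piece. The names `classParts`, `userNamePattern`, and `isValidUserName` are hypothetical, and accepting an empty string is only an assumed stand-in for the plugin's `this.optional(element)` check, not its actual behavior.

```javascript
// Pieces of the character class, in the same order as the list above.
var classParts = [
  "\\w", // word character
  "\\-", // dash
  "\\s", // whitespace
  "\\d", // digit
  // the accented characters, copied from the validator's regex
  "ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöüçÇßØøÅåÆæÞþÐð"
];

// ^ ... $ anchor the match to the whole string; + requires one or more characters.
var userNamePattern = new RegExp("^[" + classParts.join("") + "]+$");

// Hypothetical standalone version of the check.
// Assumption: an empty value counts as "optional" and is accepted.
function isValidUserName(value) {
  return value === "" || userNamePattern.test(value);
}

console.log(isValidUserName("Bill de hOra"));
console.log(isValidUserName("not*allowed"));
```

Running this from a file and again from the console should make it easier to see whether the two environments really disagree on the same input.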
Yes, we ran into the Internationalized Regular Expressions problem. JavaScript's definition of "word character" does not include international characters.. at all.
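A quick way to see this, as a minimal sketch: the helper name `isWordChar` is hypothetical, and the accented letters are written as `\u` escapes only so the snippet does not depend on how the file happens to be saved.

```javascript
// Hypothetical helper: true when a single character matches \w.
function isWordChar(ch) {
  return /^\w$/.test(ch);
}

var results = {
  asciiLetter: isWordChar("a"),   // plain ASCII letter
  digit: isWordChar("7"),         // \w already covers 0-9
  underscore: isWordChar("_"),    // ...and the underscore
  oAcute: isWordChar("\u00D3"),   // Ó (U+00D3)
  eGrave: isWordChar("\u00E8")    // è (U+00E8)
};

console.log(results);
```

Only the first three come back `true`; without extra help, `\w` is limited to `[A-Za-z0-9_]`, which is why the accented characters have to be listed by hand.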
Here's the weird part: even though we've gone to the trouble of manually adding tons of the valid international characters to the regex, it doesn't work. You cannot enter these international characters in the input box for user name without getting the "Can only contain A-Z, 0-9, spaces, and hyphens" validation error!
Obviously the validation is working for the other parts of the regex.. so.. what gives?
The other strange part is that this validation works in the browser's JavaScript console but not when executed as part of our standard *.js includes:
    /^[\w-\sÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöüçÇßØøÅåÆæÞþÐð]+$/.test('ÓBill de hÓra')  // true in the console
We've run into some really bizarre international character issues in JavaScript code before, resulting in some very, very nasty hacks. We'd like to understand what's going on here and why. Please enlighten us!