In October 2009, the Internet Corporation for Assigned Names and Numbers (ICANN) approved the creation of country code top-level domains (ccTLDs) that use the IDNA standard for native-language scripts.

I'm pretty sure that the standard regexes most sites currently use won't mark these as valid, or am I wrong? Has anyone actually thought about how this would play out or has anyone done anything about it?

Hope I'm not jumping the gun here.

+3  A: 

When a user types an internationalized domain into a browser, it's translated to an ASCII form; e-mail, surely, must work the same way (however, I've never received mail from an IDNA domain and I have reason to believe browsers are the only implementors of it).

Mailing agents would have to know that when they see Unicode in an address, it must be translated to IDNA form, and then the MX records looked up. I don't think in all of my system administration I've ever accounted for this. Being able to accept something the browser will translate as IDNA in a form element is not something I know how to do. If it is indeed translated to IDNA and a regex attempts to validate it, it should work.
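As a rough sketch of that last point, assuming Python and its built-in `idna` codec (which follows the older IDNA 2003 rules), the domain part of an address can be converted to its ASCII form before a conventional ASCII-only regex sees it. The pattern and the `to_ascii_address` helper are illustrative assumptions, not anything a particular site or mail agent is known to use:

```python
import re

# Deliberately simple ASCII-only pattern of the kind many sites use (illustrative only).
ASCII_EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z0-9-]{2,}$")


def to_ascii_address(address: str) -> str:
    """Hypothetical helper: convert only the domain part to its IDNA (ASCII) form."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no '@'")
    # The built-in "idna" codec applies nameprep, Punycode and the "xn--" prefix per label.
    return local + "@" + domain.encode("idna").decode("ascii")


raw = "info@b\u00fccher.example"  # info@bücher.example
converted = to_ascii_address(raw)

print(bool(ASCII_EMAIL_RE.match(raw)))        # False: the raw Unicode form is rejected
print(converted)                              # info@xn--bcher-kva.example
print(bool(ASCII_EMAIL_RE.match(converted)))  # True: the translated form passes
```

Only the domain is converted here; the local part is left untouched, since IDNA says nothing about it.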

I wouldn't be surprised if an internationalized domain fails most e-mail regular expressions, but I think such a failure matters in less than 1% of cases. IDNA is really an "address bar" system, and an awful hack; I would really be surprised if e-mail worked on top of it.

Everyone is freaking out like something is changing. It isn't. IDNA is just moving from the domain to the TLD, and business will be as usual like it was before. Don't overthink it, OP.

Jed Smith
Is IDNA the same as Punycode?
Kinopiko
@Kinopiko: The IDNA technique produces Punycode.
Jed Smith
Almost the same. It's Punycode + nameprep Unicode normalization + a prefix, "xn--", which for some reason is called the "ACE" prefix.
ZJR
+2  A: 

Old regexes will mark IDNA domains as valid once they have been correctly translated into ASCII DNS names.

So yes, we have a problem here. One cannot expect a user to simply type Unicode into a textarea and have the server receive an ASCII version of the domain name.

IDNA encoding is neither nice nor easy: Unicode characters are removed from the word they are in and placed after it, with a position marker.
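To make that concrete, here is a small sketch using Python's built-in `punycode` and `idna` codecs (an assumption about tooling; any IDNA library should behave similarly for this label). The example label is my own choice:

```python
label = "b\u00fccher"  # bücher

# Raw Punycode: the ASCII characters come first, then a "-" delimiter,
# then an encoding of which non-ASCII characters go where.
puny = label.encode("punycode").decode("ascii")
print(puny)  # bcher-kva

# The IDNA form of a label is the "xn--" prefix plus that Punycode
# (after nameprep normalization, which leaves this lowercase label unchanged).
ace = label.encode("idna").decode("ascii")
print(ace)  # xn--bcher-kva

# The transformation is reversible.
roundtrip = ace.encode("ascii").decode("idna")
print(roundtrip == label)  # True
```

The "ü" disappears from the middle of the word, and the trailing "kva" records both which character it was and where it belongs.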

Implementing it in, say, JavaScript will be slow, sad and boring. A urlencode-style approach would have been better, and people whose systems do not support IDNA will have trouble figuring out what a given domain looks like in ASCII.

I feel IDNA came out pretty ugly and that will hinder its adoption.

ZJR