Author: Glenn Fleishman / Source: Macworld
When is an “a” not necessary the “a” you think it is? When a browser shows it as part of the URL in the location or smart-search field. Due to the late entry of non-Roman characters to domain names, a backwards-compatible method of representing them aids phishing.
Unicode allows the representation of nearly all the glyphs—characters, symbols, ideograms, script element, and more—that form the basis of language and other written subjects, like math and games, in use around the world. While the Unicode Consortium started its work decades ago, but it’s only in the last few years that it’s finally permeated operating systems, browsers, and apps to the point where you can almost rely on it working almost everywhere.
But the Domain Name System (DNS) that operating systems use to turn human-readable location and resource names into the numeric and other data needed to make a connection dates back even before Unicode. And because of its ubiquity, making any change could break compatibility for hundreds of millions of people and devices—maybe more. This is why some sensible improvements, like having a cryptographic component to a domain name that prevented its being spoofed by a party that didn’t own the domain, has still not been rolled out.
Domains were encoded in a very tiny subset of all possible glyphs: the 26 Roman letters (capitalization aside) plus the numbers 0 to 9 and the hyphen. (The period or “dot” is used to separate elements of a domain name.) This rankled people who live or write or do commerce in scripts other than Roman English—even folks who just needed a ñ or a é were left out. Since that’s the vast majority of people on the planet, something had to give.
Hide one script in another
In 2003, the patch job was “punycode,” a funny name for using ASCII characters to represent glyphs outside of the DNS-allowed set. This would let me register, say,