Thanks for your interesting report. I checked Gmail myself and can confirm that anheß.de is being converted into anhess.de. However, I'm afraid that this isn't a vulnerability: while preparing to open a security bug against Gmail, I discovered that the IDNA-2008 rules actually do require this behavior. If you were to try to register the anheß.de domain with DENIC today, they would tell you that it is already registered, as https://gwhois.org/anhe%C3%9F.de+dns shows the normalized domain name as anhess.de.
According to the IDN technical standards, the "ß" (Eszett) is effectively mapped to "ss" by the Nameprep mechanism. Therefore, in short, YES, you may register domain names that contain the "ß", but that name, when passed to the registry by your registrar, will essentially be registered with a double s: "ss".
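The mapping is easy to reproduce locally. Here is a minimal sketch using Python's standard library, whose built-in idna codec implements the older IDNA2003/Nameprep rules; registered_form is just an illustrative helper name, and this is not necessarily the code path Gmail or any registry uses.

```python
from encodings.idna import nameprep  # stdlib Nameprep (IDNA2003) implementation


def registered_form(domain: str) -> str:
    """Return the ASCII form that the stdlib IDNA2003 codec produces."""
    return domain.encode("idna").decode("ascii")


# Nameprep case-folds the Eszett to "ss" before any Punycode step.
print(nameprep("anheß"))            # anhess
print(registered_form("anheß.de"))  # anhess.de
```

Because the mapped label is plain ASCII, no xn-- prefix appears at all.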
The IDNA2008 protocol supports both the German Eszett (ß) and the Greek final sigma (ς) on input as fully allowed characters. That said, both characters are covered by the homoglyph bundling mechanism, meaning that already-registered domain names containing "ss" or the normal Greek sigma (σ) prevent the corresponding names with the German Eszett (ß) or the Greek final sigma (ς) from being registered.
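To make the blocking effect concrete, here is a rough sketch of how a bundling check could work. Everything in it (BUNDLE_MAP, bundle_key, is_blocked, and the sample set of registered labels) is hypothetical and only illustrates the idea; it is not any registry's actual algorithm.

```python
# Hypothetical bundling table: variant character -> bundle representative.
BUNDLE_MAP = {"ß": "ss", "ς": "σ"}


def bundle_key(label: str) -> str:
    """Collapse bundled variants so that equivalent labels compare equal."""
    return "".join(BUNDLE_MAP.get(ch, ch) for ch in label.lower())


def is_blocked(label: str, registered: set) -> bool:
    """True if an already-registered label falls into the same bundle."""
    return bundle_key(label) in {bundle_key(r) for r in registered}


registered = {"anhess", "οδοσ"}           # made-up example registrations
print(is_blocked("anheß", registered))    # True: collides with anhess
print(is_blocked("οδος", registered))     # True: final sigma vs. normal sigma
print(is_blocked("anhex", registered))    # False
```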
More technically, any IDNA domain name is actually registered as an ASCII domain name using Punycode with an xn-- prefix. For example, the IDNA domain ähnlich.de would be converted into xn--hnlich-9ta.de. However, the IDNA-2008 specification says that bundled "homoglyphs" (different ways of writing the same characters, such as the ligatures for ff, fi, fl, ffi, and ffl in English) must be normalized before Punycode conversion is performed. For the German Eszett, that normalization converts it to two 's' characters (just as Gmail, gwhois.org, Afilias, and DENIC all do).
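The two steps described above can be tried out with Python's standard library. This is only a sketch: to_a_label is an illustrative helper that applies Punycode to a single label without any normalization, and NFKC is used here as the compatibility normalization that Nameprep applies to ligatures.

```python
import unicodedata


def to_a_label(label: str) -> str:
    """Punycode-encode one label and add the xn-- prefix (no normalization)."""
    if label.isascii():
        return label  # pure ASCII labels are registered unchanged
    return "xn--" + label.encode("punycode").decode("ascii")


print(to_a_label("ähnlich"))                     # xn--hnlich-9ta

# Ligatures collapse to plain letters under NFKC normalization.
print(unicodedata.normalize("NFKC", "\ufb01"))   # fi  (from U+FB01)
print(unicodedata.normalize("NFKC", "\ufb03"))   # ffi (from U+FB03)
```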
The downside of the homoglyph normalization is that there is no way to de-normalize a double 's' back to ß in domain names, since the rules for ß versus ss are complicated, changed in 1996, and in any case apply only to German words, not to domain names, which are usually not words and even less often German words. The upside of the normalization is that the ASCII form of anheß.de is anhess.de and not xn--anhe-yna.de.
If you really want to see something like an Eszett in your domain name, you could also (try to) register a version of the name with a Greek beta in place of the Eszett, like anheβ.de (xn--anhe-8ld.de), but it is likely to be rejected by the registry as an IDN homograph attack.
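As a rough illustration of why such a name stands out, here is a crude mixed-script check. It guesses each letter's script from the first word of its Unicode character name, which is a simplifying assumption; real registries use proper script data and their own policies, and scripts_in and looks_mixed_script are made-up helper names.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the scripts in a label via Unicode character names."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def looks_mixed_script(label: str) -> bool:
    """True if the letters of the label appear to come from several scripts."""
    return len(scripts_in(label)) > 1


print(looks_mixed_script("anheβ"))   # True: Latin letters plus a Greek beta
print(looks_mixed_script("anheß"))   # False: the Eszett is a Latin letter
print("xn--" + "anheβ".encode("punycode").decode("ascii"))  # xn--anhe-8ld
```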