11Internationalized Domain Names in Applications (IDNA)
22=====================================================
33
4- Support for the Internationalized Domain Names in
5- Applications (IDNA) protocol as specified in `RFC 5891
6- <https://tools.ietf.org/html/rfc5891>`_. This is the latest version of
7- the protocol and is sometimes referred to as “IDNA 2008”.
8-
9- This library also provides support for Unicode Technical
10- Standard 46, `Unicode IDNA Compatibility Processing
4+ Support for the `Internationalized Domain Names in
5+ Applications (IDNA) <https://tools.ietf.org/html/rfc5891>`_
6+ and `Unicode IDNA Compatibility Processing
117<https://unicode.org/reports/tr46/>`_.
128
139This acts as a suitable replacement for the “encodings.idna”
1410module that comes with the Python standard library, but which
1511only supports the older superseded IDNA specification (`RFC 3490
16- <https://tools.ietf.org/html/rfc3490>`_).
12+ <https://tools.ietf.org/html/rfc3490>`_). Using the latest
13+ version of IDNA along with UTS46 support provides more
14+ comprehensive language coverage with the latest versions of
15+ the respective standards, and reduces the potential of allowing
16+ domains with known security vulnerabilities.
1717
1818Basic functions are simply executed:
1919
@@ -29,7 +29,8 @@ Basic functions are simply executed:
2929 Installation
3030------------
3131
32- This package is available for installation from PyPI:
32+ This package is available for installation from PyPI via the
33+ typical mechanisms, such as:
3334
3435.. code-block:: bash
3536
4041-----
4142
4243For typical usage, the ``encode`` and ``decode`` functions will take a
43- domain name argument and perform a conversion to A-labels or U-labels
44+ domain name argument and perform a conversion to ASCII compatible encoding
45+ (known as A-labels), or to Unicode strings (known as U-labels)
4446respectively.
4547
4648.. code-block:: pycon
@@ -51,17 +53,6 @@ respectively.
5153 >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
5254 ドメイン.テスト
5355
54- You may use the codec encoding and decoding methods using the
55- ``idna.codec`` module:
56-
57- .. code-block:: pycon
58-
59- >>> import idna.codec
60- >>> print('домен.испытание'.encode('idna2008'))
61- b'xn--d1acufc.xn--80akhbyknj4f'
62- >>> print(b'xn--d1acufc.xn--80akhbyknj4f'.decode('idna2008'))
63- домен.испытание
64-
6556 Conversions can be applied at a per-label basis using the ``ulabel`` or
6657``alabel`` functions if necessary:
6758
@@ -70,19 +61,17 @@ Conversions can be applied at a per-label basis using the ``ulabel`` or
7061 >>> idna.alabel('测试')
7162 b'xn--0zwm56d'
7263
64+
7365 Compatibility Mapping (UTS #46)
7466+++++++++++++++++++++++++++++++
7567
76- As described in `RFC 5895 <https://tools.ietf.org/html/rfc5895>`_, the
77- IDNA specification does not normalize input from different potential
78- ways a user may input a domain name. This functionality, known as
79- a “mapping”, is considered by the specification to be a local
80- user-interface issue distinct from IDNA conversion functionality.
81-
82- This library provides one such mapping — `Unicode IDNA Compatibility
83- Processing <https://unicode.org/reports/tr46/>`_ developed by the Unicode
84- Consortium. Strings are preprocessed according to Section 4.4
85- “Preprocessing for IDNA2008” prior to the IDNA operations.
68+ This library provides support for `Unicode IDNA Compatibility
69+ Processing <https://unicode.org/reports/tr46/>`_ which normalizes input from
70+ different potential ways a user may input a domain prior to performing the IDNA
71+ conversion operations. This functionality, known as a
72+ `mapping <https://tools.ietf.org/html/rfc5895>`_, is considered by the
73+ specification to be a local user-interface issue distinct from IDNA
74+ conversion functionality.
8675
8776For example, “Königsgäßchen” is not a permissible label as *LATIN
8877CAPITAL LETTER K* is not allowed (nor are capital letters in general).
@@ -100,13 +89,6 @@ conversion.
10089 >>> print(idna.decode('xn--knigsgchen-b4a3dun'))
10190 königsgäßchen
10291
103- ``encodings.idna`` Compatibility
104- ++++++++++++++++++++++++++++++++
105-
106- Function calls from the Python built-in ``encodings.idna`` module are
107- mapped to their IDNA 2008 equivalents using the ``idna.compat`` module.
108- Simply substitute the ``import`` clause in your code to refer to the new
109- module name.
11092
11193 Exceptions
11294----------
@@ -120,16 +102,16 @@ when the error reflects an illegal combination of left-to-right and
120102right-to-left characters in a label; ``idna.InvalidCodepoint`` when
121103a specific codepoint is an illegal character in an IDN label (i.e.
122104INVALID); and ``idna.InvalidCodepointContext`` when the codepoint is
123- illegal based on its positional context (i.e. it is CONTEXTO or CONTEXTJ
105+ illegal based on its position in the string (i.e. it is CONTEXTO or CONTEXTJ
124106but the contextual requirements are not satisfied.)
125107
126108Building and Diagnostics
127109------------------------
128110
129111The IDNA and UTS 46 functionality relies upon pre-calculated lookup
130112tables for performance. These tables are derived from computing against
131- eligibility criteria in the respective standards. These tables are
132- computed using the command-line script ``tools/idna-data``.
113+ eligibility criteria in the respective standards using the command-line
114+ script ``tools/idna-data``.
133115
134116This tool will fetch relevant codepoint data from the Unicode repository
135117and perform the required calculations to identify eligibility. There are