ISO/IEC 8859-2
MIME / IANA | ISO-8859-2 |
---|---|
Alias(es) | iso-ir-101, csISOLatin2, latin2, l2, IBM1111 |
Language(s) | (see below) |
Standard | ECMA-94:1986, ISO/IEC 8859 |
Classification | Extended ASCII, ISO/IEC 8859 |
Extends | US-ASCII |
Based on | ISO-8859-1 |
Other related encoding(s) | Windows-1250, MacCroatian |
ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings. Its first edition was published in 1987. It is informally referred to as "Latin-2" and is generally intended for Central[1] or "Eastern European" languages written in the Latin script. ISO/IEC 8859-2 is very different from code page 852 (MS-DOS Latin 2, PC Latin 2), which is also referred to as "Latin-2" in Czech and Slovak regions.[2] Almost half of the encoding's use is for Polish, for which it is the main legacy encoding; on the web, however, virtually all of its use has been replaced by UTF-8.
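To make the difference between ISO-8859-2 and code page 852 concrete, the following Python sketch encodes the Polish city name "Łódź" with the standard-library codecs `iso8859_2` and `cp852`, and with UTF-8 for comparison. The sample word and variable names are chosen only for illustration. Both legacy encodings use one byte per letter, but the byte values differ, so reading Latin-2 bytes with the wrong "Latin-2" code page silently produces mojibake instead of raising an error.

```python
# Compare how the same Polish word is stored in ISO-8859-2, code page 852 and UTF-8,
# using codecs from Python's standard library.
TEXT = "Łódź"

latin2 = TEXT.encode("iso8859_2")  # ISO-8859-2: one byte per character
cp852 = TEXT.encode("cp852")       # MS-DOS Latin 2: one byte each, but different values
utf8 = TEXT.encode("utf-8")        # UTF-8: the non-ASCII letters here take two bytes each

# Decoding ISO-8859-2 bytes as CP852 does not fail; it yields different characters.
misread = latin2.decode("cp852")

if __name__ == "__main__":
    print(latin2)   # b'\xa3\xf3d\xbc'
    print(cp852)
    print(utf8)     # b'\xc5\x81\xc3\xb3d\xc5\xba'
    print(misread)
```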
ISO-8859-2 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. Fewer than 0.04% of all web pages used ISO-8859-2 as of October 2022.[3][4] Microsoft has assigned code page 28592 (a.k.a. Windows-28592) to ISO-8859-2 in Windows. IBM assigned code page 912 to ISO-8859-2[5] until that code page was extended in 1999.[6] Code page 1111 is similar but replaces byte B0 ° (degree sign) with U+02DA ˚ (ring above).
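Python's `iso8859_2` codec follows the IANA definition: bytes 0x00 to 0x9F decode to the C0 and C1 control range (U+0000 to U+009F), and every byte value is assigned, so any byte string decodes without error. Python ships no codec for IBM code page 1111, so the `decode_cp1111` helper below is hypothetical. It assumes, as the paragraph above states, that byte B0 is the only difference from ISO-8859-2.

```python
# Sketch: ISO-8859-2 with C0/C1 controls (Python's "iso8859_2" codec) and a
# hypothetical decoder for IBM code page 1111.

def decode_iso8859_2(data: bytes) -> str:
    # Every byte value 0x00-0xFF is assigned, so strict decoding never fails;
    # 0x80-0x9F come out as the C1 control characters U+0080-U+009F.
    return data.decode("iso8859_2")


def decode_cp1111(data: bytes) -> str:
    """Hypothetical helper: ASSUMES code page 1111 equals ISO-8859-2 except that
    byte 0xB0 is U+02DA RING ABOVE instead of U+00B0 DEGREE SIGN."""
    text = decode_iso8859_2(data)
    # In ISO-8859-2 only byte 0xB0 decodes to U+00B0, so this swap is byte-exact.
    return text.replace("\u00b0", "\u02da")


if __name__ == "__main__":
    print(decode_iso8859_2(b"25\xb0C"))  # 25°C
    print(decode_cp1111(b"25\xb0C"))     # 25˚C
```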
Windows-1250 is similar to ISO-8859-2 and contains all of its printable characters plus several more. However, a few of them are placed at different byte values (unlike Windows-1252, which keeps all printable characters from ISO-8859-1 in the same place).
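The rearrangement can be checked with Python's built-in `iso8859_2` and `cp1250` codecs. This sketch decodes each printable ISO-8859-2 byte (0xA0 to 0xFF), re-encodes the character as Windows-1250, and records the bytes whose position changes; because encoding is strict, it would also raise if Windows-1250 lacked any of those characters. The function name `moved_in_cp1250` is illustrative.

```python
def moved_in_cp1250() -> dict[int, int]:
    """Map ISO-8859-2 byte -> Windows-1250 byte for printable characters that move."""
    moved = {}
    for b in range(0xA0, 0x100):          # printable upper half of ISO-8859-2
        ch = bytes([b]).decode("iso8859_2")
        cp = ch.encode("cp1250")          # strict: raises if cp1250 lacks the character
        if cp != bytes([b]):
            moved[b] = cp[0]
    return moved


if __name__ == "__main__":
    for latin2_byte, cp1250_byte in sorted(moved_in_cp1250().items()):
        ch = bytes([latin2_byte]).decode("iso8859_2")
        print(f"{ch}  ISO-8859-2 0x{latin2_byte:02X} -> Windows-1250 0x{cp1250_byte:02X}")
```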
Language coverage
These code values can be used for the following languages:
- ^ The missing letter Å is officially part of the Finnish alphabet; however, it has no native use and appears only in foreign names.
- ^ In 2017, the Council for German Orthography officially added a capital ẞ, but it is not required, since SS can be used instead.
- ^ This character set unifies Ș and Ț (S, T with comma below) with Ş and Ţ (S, T with cedilla), as did virtually all other character sets, including Microsoft's Windows-1250 and the first version of Unicode. Unicode subsequently disunified them; however, Unicode notes as of 2014[citation needed] that disunifying the letters with comma below was a mistake that caused corruption of Romanian data: pre-existing data and input methods still used the older cedilla code points, which complicated text searching.
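Because ISO-8859-2 has only the cedilla forms, Romanian text using the comma-below letters (U+0218 to U+021B) cannot be encoded as-is. One possible workaround, shown here as a Python sketch with hypothetical helper names, is to fold the comma-below letters to their cedilla counterparts before encoding; applying the same folding to both sides of a comparison lets a search match text regardless of which form it contains.

```python
# Map S/T with comma below (U+0218-U+021B) to S/T with cedilla
# (U+015E, U+015F, U+0162, U+0163), the only forms ISO-8859-2 can encode.
COMMA_TO_CEDILLA = str.maketrans({
    "\u0218": "\u015e",  # Ș -> Ş
    "\u0219": "\u015f",  # ș -> ş
    "\u021a": "\u0162",  # Ț -> Ţ
    "\u021b": "\u0163",  # ț -> ţ
})


def encode_romanian_latin2(text: str) -> bytes:
    """Hypothetical helper: fold comma-below letters, then encode as ISO-8859-2."""
    return text.translate(COMMA_TO_CEDILLA).encode("iso8859_2")


def same_romanian_text(a: str, b: str) -> bool:
    """Hypothetical search helper: compare strings ignoring comma/cedilla differences."""
    return a.translate(COMMA_TO_CEDILLA) == b.translate(COMMA_TO_CEDILLA)


if __name__ == "__main__":
    print(encode_romanian_latin2("\u0219i"))                      # b'\xbai'
    print(same_romanian_text("\u021bar\u0103", "\u0163ar\u0103"))  # True
```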
Code page layout
Characters that differ from ISO-8859-1 are shown with their Unicode code point (in hexadecimal) next to the character.
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0x | | | | | | | | | | | | | | | | |
1x | | | | | | | | | | | | | | | | |
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
8x | | | | | | | | | | | | | | | | |
9x | | | | | | | | | | | | | | | | |
Ax | NBSP | Ą 0104 | ˘ 02D8 | Ł 0141 | ¤ | Ľ 013D | Ś 015A | § | ¨ | Š 0160 | Ş 015E | Ť 0164 | Ź 0179 | SHY | Ž 017D | Ż 017B |
Bx | ° | ą 0105 | ˛ 02DB | ł 0142 | ´ | ľ 013E | ś 015B | ˇ 02C7 | ¸ | š 0161 | ş 015F | ť 0165 | ź 017A | ˝ 02DD | ž 017E | ż 017C |
Cx | Ŕ 0154 | Á | Â | Ă 0102 | Ä | Ĺ 0139 | Ć 0106 | Ç | Č 010C | É | Ę 0118 | Ë | Ě 011A | Í | Î | Ď 010E |
Dx | Đ 0110 | Ń 0143 | Ň 0147 | Ó | Ô | Ő 0150 | Ö | × | Ř 0158 | Ů 016E | Ú | Ű 0170 | Ü | Ý | Ţ 0162 | ß |
Ex | ŕ 0155 | á | â | ă 0103 | ä | ĺ 013A | ć 0107 | ç | č 010D | é | ę 0119 | ë | ě 011B | í | î | ď 010F |
Fx | đ 0111 | ń 0144 | ň 0148 | ó | ô | ő 0151 | ö | ÷ | ř 0159 | ů 016F | ú | ű 0171 | ü | ý | ţ 0163 | ˙ 02D9 |
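The code points listed in the table can be regenerated from Python's standard `iso8859_2` codec by comparing each byte in the upper half with its ISO-8859-1 (`latin-1`) interpretation. This sketch assumes Python's codec tables match the published standard; the function name `latin2_differences` is illustrative.

```python
def latin2_differences() -> dict[int, str]:
    """Return {byte: 'U+XXXX'} for bytes 0xA0-0xFF whose ISO-8859-2 character
    differs from the ISO-8859-1 character at the same position."""
    diffs = {}
    for b in range(0xA0, 0x100):
        latin2 = bytes([b]).decode("iso8859_2")
        latin1 = bytes([b]).decode("latin-1")  # ISO-8859-1: byte value equals code point
        if latin2 != latin1:
            diffs[b] = f"U+{ord(latin2):04X}"
    return diffs


if __name__ == "__main__":
    diffs = latin2_differences()
    for b, code_point in diffs.items():
        print(f"0x{b:02X}  {bytes([b]).decode('iso8859_2')}  {code_point}")
    print(len(diffs), "positions differ from ISO-8859-1")
```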
See also
References
- ^ "Microsoft Outlook Message Encodings". 10 January 2017.
- ^ "The Czech and Slovak Character Encoding Mess Explained". luki.sdf-eu.org. Retrieved 2022-02-27.
- ^ "Usage Statistics and Market Share of ISO-8859-2 for Websites, October 2022". w3techs.com. Retrieved 2022-10-23.
- ^ "Historical trends in the usage statistics of character encodings for websites, February 2022".
- ^ "Icu-data/Charset/Data/XML/Ibm-912_P100-1995.XML at main · unicode-org/Icu-data". GitHub.
- ^ "Icu-data/Charset/Data/Ucm/Ibm-912_P100-1999.ucm at main · unicode-org/Icu-data". GitHub.
External links
- ISO/IEC 8859-2:1999
- Standard ECMA-94: 8-Bit Single Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4, 2nd edition (June 1986)
- ISO-IR 101 Right-Hand Part of Latin Alphabet No.2 (February 1, 1986)
- ISO 8859-2 (Latin 2) Resources