ISO 8859, more formally ISO/IEC 8859, is a set of currently 15 related ISO standards for 8-bit character encodings for use by computers. These standards are based upon ASCII, the most widely used 7-bit character encoding.
Introduction
While the 128 ASCII characters are sufficient to exchange information in modern English without loss of comprehension, most other languages that use the Roman alphabet need additional symbols not covered by ASCII, such as ß (German) and å (Swedish and other Nordic languages). ISO 8859 sought to remedy this problem by extending 7-bit ASCII to eight bits, making room for another 128 characters. However, more characters were needed than could fit in a single 8-bit character encoding, so several encodings were developed, ten of them for the Latin script alone. All of the encodings assign the first 128 positions (0 to 127) identically, and in the same way as ASCII. Positions 128 to 159 contain control characters. The upper 96 code points (160 to 255) differ from one ISO 8859 encoding to the next.
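The layout described above can be checked with a short Python sketch. It assumes the codec names `iso8859_1` to `iso8859_16` from Python's standard library as stand-ins for the standards; these codecs are not part of ISO 8859 itself, and the variable names are purely illustrative.

```python
# Python codec names for the 15 published parts (there is no part 12).
PARTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16]
CODECS = [f"iso8859_{n}" for n in PARTS]

ascii_bytes = bytes(range(128))
ascii_text = ascii_bytes.decode("ascii")

# Every part decodes bytes 0..127 exactly as ASCII does.
lower_half_matches = all(ascii_bytes.decode(c) == ascii_text for c in CODECS)

# The same upper-half byte can stand for different characters in different parts.
byte = b"\xa4"
latin1_char = byte.decode("iso8859_1")    # currency sign
latin9_char = byte.decode("iso8859_15")   # euro sign
cyrillic_char = byte.decode("iso8859_5")  # a Cyrillic letter

print(lower_half_matches, latin1_char, latin9_char, cyrillic_char)
```

A byte stream therefore cannot be interpreted correctly above position 127 unless the receiver knows which part was used to encode it.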
Characters
The ISO 8859 standards are designed for reliable information exchange, not typography. They therefore omit symbols needed for high-quality typography, such as optional ligatures, curly quotation marks, and dashes. As a result, high-quality typesetting systems often use proprietary or idiosyncratic extensions on top of the ASCII and ISO 8859 standards, or use Unicode instead.
As a rule of thumb, if a character or symbol was not already part of a widely used data-processing character set and was also not usually provided on typewriter keyboards for a national language, it was left out. Hence the directional double quotation marks « and » used for some European languages were included, but not the directional double quotation marks “ and ” used for English and some other languages. French did not get its œ and Œ ligatures, because French speakers had not previously needed them enough to demand them on their keyboards, nor did it get Ÿ, because in French this character occurs only in all-caps text. These oversights were finally fixed with ISO 8859-15, which also introduced the new Euro character. Likewise, Dutch did not get the 'ĳ' and 'Ĳ' letters, because Dutch speakers had become used to typing 'ij' and 'IJ' instead. Romanian did not initially get its 'Ș/ș' and 'Ț/ț' letters (with a comma below), because they were wrongly identified with 'Ş/ş' and 'Ţ/ţ' (with a cedilla). The latter oversight was fixed with ISO 8859-16.
Most of the ISO 8859 encodings provide the diacritic marks required for various European languages. Others provide non-Roman alphabets: Greek, Cyrillic, Hebrew, Arabic and Thai. However, the standard makes no provision for the scripts of East Asian languages (CJK), as their ideographic writing systems require many thousands of code points. Although it uses Latin-based characters, Vietnamese does not fit into 96 positions either. The Japanese syllabic kana scripts, on the other hand, might fit, but like several other alphabets of the world they are not encoded.
Character Sets
The encodings defined by ISO 8859 include:
- ISO 8859-1 (Latin-1 or Western European) — perhaps the most widely used ISO 8859 standard, covering most Western European languages: Albanian, Basque, Catalan, Danish, Dutch (partial¹), English, Faeroese, Finnish (partial²), French (partial²), German, Icelandic, Irish, Italian, Norwegian, Portuguese, Rhaeto-Romanic, Scottish, Spanish, Kurdish, and Swedish, as well as the African languages Afrikaans and Swahili. The missing Euro symbol and capital Ÿ are in the revised version ISO 8859-15.
- ISO 8859-2 (Latin-2 or Central European) — supports those Central and Eastern European languages that use a Roman alphabet, including Polish, Czech, Slovak, Slovenian, and Hungarian. The missing Euro symbol can be found in version ISO 8859-16.
- ISO 8859-3 (Latin-3 or South European) — Turkish, Maltese, and Esperanto; largely superseded by ISO 8859-9 for Turkish and Unicode for Esperanto.
- ISO 8859-4 (Latin-4 or North European) — Estonian, Latvian, Lithuanian, Greenlandic, and Saami.
- ISO 8859-5 (Cyrillic) — Covers most East European languages that use a Cyrillic alphabet, including Russian, Ukrainian, and Belarusian.
- ISO 8859-6 (Arabic) — Covers the most common Arabic glyphs, although not nearly all of them.
- ISO 8859-7 (Greek) — Covers the modern Greek language (monotonic orthography). Can also be used for Ancient Greek written without accents or in monotonic orthography, but lacks the diacritics for polytonic orthography.
- ISO 8859-8 (Hebrew) — Covers the modern Hebrew alphabet as used in Israel. In practice two different encodings exist, logical and visual.
- ISO 8859-9 (Latin-5 or Turkish) — Largely the same as ISO 8859-1, replacing the rarely used Icelandic letters with Turkish ones.
- ISO 8859-10 (Latin-6 or Nordic) — a rearrangement of Latin-4 that is considered more useful for Nordic languages; Baltic languages more often use Latin-4.
- ISO 8859-11 (Thai) — Contains most glyphs needed for the Thai language.
- ISO 8859-12 — was supposed to cover Celtic, but this draft was rejected. Numbering continued with -13.
- ISO 8859-13 (Latin-7 or Baltic Rim) — Added some glyphs for Baltic languages which were missing from Latin-4 and Latin-6.
- ISO 8859-14 (Latin-8 or Celtic) — Mostly a rearrangement of the ISO-8859-12 draft. Covers Celtic languages like Gaelic and the Breton language.
- ISO 8859-15 (Latin-9) — a revision of 8859-1 that removes some little-used symbols, replacing them with the Euro symbol € and the letters Š, š, Ž, ž, Œ, œ, and Ÿ, which completes the coverage of French and Finnish.
- ISO 8859-16 (Latin-10 or South-Eastern European) — Intended for Albanian, Croatian, Hungarian, Italian, Polish, Romanian and Slovenian, but also Finnish, French, German and Irish Gaelic (new orthography). The focus lies more on letters than symbols. The currency sign is replaced with the Euro symbol.
¹: only the Ĳ/ĳ ligature (Dutch Y) is missing, which can be represented as IJ/ij
²: missing characters are in ISO 8859-15
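The relationship between parts in the list above, for example ISO 8859-15 as a revision of ISO 8859-1, can be made concrete by comparing two parts position by position. The following is a minimal sketch that again relies on the codecs shipped with Python; the helper name `differing_positions` is hypothetical and chosen only for illustration.

```python
def differing_positions(codec_a, codec_b):
    """Return the byte values in 0xA0..0xFF that the two codecs decode differently."""
    result = []
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        if raw.decode(codec_a) != raw.decode(codec_b):
            result.append(value)
    return result

# Latin-1 versus its revision Latin-9.
latin1_vs_latin9 = differing_positions("iso8859_1", "iso8859_15")
print([f"{v:02X}" for v in latin1_vs_latin9])
print(bytes(latin1_vs_latin9).decode("iso8859_15"))

# Latin-1 versus Latin-5, which swaps the Icelandic letters for Turkish ones.
latin1_vs_latin5 = differing_positions("iso8859_1", "iso8859_9")
print(bytes(latin1_vs_latin5).decode("iso8859_9"))
```

This comparison only works directly for parts in which all 96 upper positions are assigned; decoding an unassigned position raises an error in Python.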
The ISO 8859 standards are designed to allow many language combinations within one character set, but even within the Latin script many combinations are not possible without transcription. Writing English is possible with ASCII alone and thus in every ISO 8859 set. Beyond that, efforts were made to make conversions as smooth as possible: for example, German has all seven of its special characters (Ä, Ö, Ü, ä, ö, ü, ß) at the same positions in all Latin variants (parts 1-4, 9, 10 and 13-16), and at many positions the characters in different sets differ only in their diacritics.
As of 2004 no further parts are being drafted, as 8-bit character encodings are gradually being replaced by Unicode. No ISO 8859-17 exists, so that designation has no meaning as a character encoding.
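The claim about the German letters can be verified with a few lines of Python. This is only a sketch that uses Python's bundled codecs as a proxy for the standards; the names `LATIN_PARTS` and `GERMAN` are illustrative.

```python
# The ten Latin-script parts of ISO 8859.
LATIN_PARTS = [1, 2, 3, 4, 9, 10, 13, 14, 15, 16]
GERMAN = "ÄÖÜäöüß"

# Encode the same seven letters with every Latin part.
encoded = {n: GERMAN.encode(f"iso8859_{n}") for n in LATIN_PARTS}

# If the positions are shared, all ten byte strings are identical.
same_everywhere = len(set(encoded.values())) == 1
print(same_everywhere, encoded[1].hex())
```

German text can thus be relabelled from one Latin part to another without changing a single byte, which is not true for most other languages.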
Alternatives
An alternative character set standard called Unicode was developed to unify the coverage of the other character sets. It provides over a million code points (each ISO 8859 part has at most 191 printable characters) and can be stored in several character encodings based on 8-bit, 16-bit, or variable-length units. Because Unicode does away with the limitations of 8-bit character encodings, it is often preferred for new applications. However, ISO 8859 has the advantage of being well established, and simpler software is needed to manipulate it: one byte always equals one character, there are no combining characters or variant forms, and fonts remain conveniently small. Unicode's first 256 code points are identical to ISO 8859-1 (Latin-1). Most modern software uses Unicode internally and maps the older 8-bit encodings such as ISO 8859 to Unicode using conversion tables.
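The following sketch illustrates three points from the paragraph above: the identity between ISO 8859-1 and the first 256 Unicode code points, a conversion table for another part, and the one-byte-per-character property. It assumes Python's standard codecs; the sample string and variable names are made up for the example.

```python
all_bytes = bytes(range(256))

# Decoding ISO 8859-1 is the identity mapping: byte value n becomes code point n.
decoded = all_bytes.decode("iso8859_1")
latin1_is_identity = [ord(ch) for ch in decoded] == list(range(256))

# Other parts need a lookup table for the upper half; build one for ISO 8859-15.
upper = bytes(range(0xA0, 0x100))
table = {b: ord(ch) for b, ch in zip(upper, upper.decode("iso8859_15"))}

# One byte per character in ISO 8859, a variable number of bytes in UTF-8.
text = "Œuvre à 5 €"
as_latin9 = text.encode("iso8859_15")
as_utf8 = text.encode("utf-8")

print(latin1_is_identity, hex(table[0xA4]))
print(len(text), len(as_latin9), len(as_utf8))
```

The table built here is derived from Python's own codec data, so it shows the shape of such a conversion table rather than serving as an authoritative copy of the standard.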
Table
Binary | Oct | Dec | Hex | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 13 | 14 | 15 | 16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10100000 | 240 | 160 | A0 | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP |
10100001 | 241 | 161 | A1 | ¡ | Ą | Ħ | Ą | Ё | | ʽ | | ¡ | Ą | ก | ” | Ḃ | ¡ | Ą |
10100010 | 242 | 162 | A2 | ¢ | ˘ | ˘ | ĸ | Ђ | | ʼ | ¢ | ¢ | Ē | ข | ¢ | ḃ | ¢ | ą |
10100011 | 243 | 163 | A3 | £ | Ł | £ | Ŗ | Ѓ | | £ | £ | £ | Ģ | ฃ | £ | £ | £ | Ł |
10100100 | 244 | 164 | A4 | ¤ | ¤ | ¤ | ¤ | Є | ¤ | | ¤ | ¤ | Ī | ค | ¤ | Ċ | € | € |
10100101 | 245 | 165 | A5 | ¥ | Ľ | | Ĩ | Ѕ | | | ¥ | ¥ | Ĩ | ฅ | „ | ċ | ¥ | „ |
10100110 | 246 | 166 | A6 | ¦ | Ś | Ĥ | Ļ | І | | ¦ | ¦ | ¦ | Ķ | ฆ | ¦ | Ḋ | Š | Š |
10100111 | 247 | 167 | A7 | § | § | § | § | Ї | | § | § | § | § | ง | § | § | § | § |
10101000 | 250 | 168 | A8 | ¨ | ¨ | ¨ | ¨ | Ј | | ¨ | ¨ | ¨ | Ļ | จ | Ø | Ẁ | š | š |
10101001 | 251 | 169 | A9 | © | Š | İ | Š | Љ | | © | © | © | Đ | ฉ | © | © | © | © |
10101010 | 252 | 170 | AA | ª | Ş | Ş | Ē | Њ | | | × | ª | Š | ช | Ŗ | Ẃ | ª | Ș |
10101011 | 253 | 171 | AB | « | Ť | Ğ | Ģ | Ћ | | « | « | « | Ŧ | ซ | « | ḋ | « | « |
10101100 | 254 | 172 | AC | ¬ | Ź | Ĵ | Ŧ | Ќ | ، | ¬ | ¬ | ¬ | Ž | ฌ | ¬ | Ỳ | ¬ | Ź |
10101101 | 255 | 173 | AD | SHY | SHY | SHY | SHY | SHY | SHY | SHY | SHY | SHY | SHY | ญ | SHY | SHY | SHY | SHY |
10101110 | 256 | 174 | AE | ® | Ž | | Ž | Ў | | | ® | ® | Ū | ฎ | ® | ® | ® | ź |
10101111 | 257 | 175 | AF | ¯ | Ż | Ż | ¯ | Џ | | ― | ‾ | ¯ | Ŋ | ฏ | Æ | Ÿ | ¯ | Ż |
10110000 | 260 | 176 | B0 | ° | ° | ° | ° | А | | ° | ° | ° | ° | ฐ | ° | Ḟ | ° | ° |
10110001 | 261 | 177 | B1 | ± | ą | ħ | ą | Б | | ± | ± | ± | ą | ฑ | ± | ḟ | ± | ± |
10110010 | 262 | 178 | B2 | ² | ˛ | ² | ˛ | В | | ² | ² | ² | ē | ฒ | ² | Ġ | ² | Č |
10110011 | 263 | 179 | B3 | ³ | ł | ³ | ŗ | Г | | ³ | ³ | ³ | ģ | ณ | ³ | ġ | ³ | ł |
10110100 | 264 | 180 | B4 | ´ | ´ | ´ | ´ | Д | | ΄ | ´ | ´ | ī | ด | “ | Ṁ | Ž | Ž |
10110101 | 265 | 181 | B5 | µ | ľ | µ | ĩ | Е | | ΅ | µ | µ | ĩ | ต | µ | ṁ | µ | ” |
10110110 | 266 | 182 | B6 | ¶ | ś | ĥ | ļ | Ж | | Ά | ¶ | ¶ | ķ | ถ | ¶ | ¶ | ¶ | ¶ |
10110111 | 267 | 183 | B7 | · | ˇ | · | ˇ | З | | · | · | · | · | ท | · | Ṗ | · | · |
10111000 | 270 | 184 | B8 | ¸ | ¸ | ¸ | ¸ | И | | Έ | ¸ | ¸ | ļ | ธ | ø | ẁ | ž | ž |
10111001 | 271 | 185 | B9 | ¹ | š | ı | š | Й | | Ή | ¹ | ¹ | đ | น | ¹ | ṗ | ¹ | č |
10111010 | 272 | 186 | BA | º | ş | ş | ē | К | | Ί | ÷ | º | š | บ | ŗ | ẃ | º | ș |
10111011 | 273 | 187 | BB | » | ť | ğ | ģ | Л | ؛ | » | » | » | ŧ | ป | » | Ṡ | » | » |
10111100 | 274 | 188 | BC | ¼ | ź | ĵ | ŧ | М | | Ό | ¼ | ¼ | ž | ผ | ¼ | ỳ | Œ | Œ |
10111101 | 275 | 189 | BD | ½ | ˝ | ½ | Ŋ | Н | | ½ | ½ | ½ | ― | ฝ | ½ | Ẅ | œ | œ |
10111110 | 276 | 190 | BE | ¾ | ž | | ž | О | | Ύ | ¾ | ¾ | ū | พ | ¾ | ẅ | Ÿ | Ÿ |
10111111 | 277 | 191 | BF | ¿ | ż | ż | ŋ | П | ؟ | Ώ | | ¿ | ŋ | ฟ | æ | ṡ | ¿ | ż |
11000000 | 300 | 192 | C0 | À | Ŕ | À | Ā | Р | | ΐ | | À | Ā | ภ | Ą | À | À | À |
11000001 | 301 | 193 | C1 | Á | Á | Á | Á | С | ء | Α | | Á | Á | ม | Į | Á | Á | Á |
11000010 | 302 | 194 | C2 | Â | Â | Â | Â | Т | آ | Β | | Â | Â | ย | Ā | Â | Â | Â |
11000011 | 303 | 195 | C3 | Ã | Ă | | Ã | У | أ | Γ | | Ã | Ã | ร | Ć | Ã | Ã | Ă |
11000100 | 304 | 196 | C4 | Ä | Ä | Ä | Ä | Ф | ؤ | Δ | | Ä | Ä | ฤ | Ä | Ä | Ä | Ä |
11000101 | 305 | 197 | C5 | Å | Ĺ | Ċ | Å | Х | إ | Ε | | Å | Å | ล | Å | Å | Å | Ć |
11000110 | 306 | 198 | C6 | Æ | Ć | Ĉ | Æ | Ц | ئ | Ζ | | Æ | Æ | ฦ | Ę | Æ | Æ | Æ |
11000111 | 307 | 199 | C7 | Ç | Ç | Ç | Į | Ч | ا | Η | | Ç | Į | ว | Ē | Ç | Ç | Ç |
11001000 | 310 | 200 | C8 | È | Č | È | Č | Ш | ب | Θ | | È | Č | ศ | Č | È | È | È |
11001001 | 311 | 201 | C9 | É | É | É | É | Щ | ة | Ι | | É | É | ษ | É | É | É | É |
11001010 | 312 | 202 | CA | Ê | Ę | Ê | Ę | Ъ | ت | Κ | | Ê | Ę | ส | Ź | Ê | Ê | Ê |
11001011 | 313 | 203 | CB | Ë | Ë | Ë | Ë | Ы | ث | Λ | | Ë | Ë | ห | Ė | Ë | Ë | Ë |
11001100 | 314 | 204 | CC | Ì | Ě | Ì | Ė | Ь | ج | Μ | | Ì | Ė | ฬ | Ģ | Ì | Ì | Ì |
11001101 | 315 | 205 | CD | Í | Í | Í | Í | Э | ح | Ν | | Í | Í | อ | Ķ | Í | Í | Í |
11001110 | 316 | 206 | CE | Î | Î | Î | Î | Ю | خ | Ξ | | Î | Î | ฮ | Ī | Î | Î | Î |
11001111 | 317 | 207 | CF | Ï | Ď | Ï | Ī | Я | د | Ο | | Ï | Ï | ฯ | Ļ | Ï | Ï | Ï |
11010000 | 320 | 208 | D0 | Ð | Đ | | Đ | а | ذ | Π | | Ğ | Ð | ะ | Š | Ŵ | Ð | Đ |
11010001 | 321 | 209 | D1 | Ñ | Ń | Ñ | Ņ | б | ر | Ρ | | Ñ | Ņ | ั | Ń | Ñ | Ñ | Ń |
11010010 | 322 | 210 | D2 | Ò | Ň | Ò | Ō | в | ز | | | Ò | Ō | า | Ņ | Ò | Ò | Ò |
11010011 | 323 | 211 | D3 | Ó | Ó | Ó | Ķ | г | س | Σ | | Ó | Ó | ำ | Ó | Ó | Ó | Ó |
11010100 | 324 | 212 | D4 | Ô | Ô | Ô | Ô | д | ش | Τ | | Ô | Ô | ิ | Ō | Ô | Ô | Ô |
11010101 | 325 | 213 | D5 | Õ | Ő | Ġ | Õ | е | ص | Υ | | Õ | Õ | ี | Õ | Õ | Õ | Ő |
11010110 | 326 | 214 | D6 | Ö | Ö | Ö | Ö | ж | ض | Φ | | Ö | Ö | ึ | Ö | Ö | Ö | Ö |
11010111 | 327 | 215 | D7 | × | × | × | × | з | ط | Χ | | × | Ũ | ื | × | Ṫ | × | Ś |
11011000 | 330 | 216 | D8 | Ø | Ř | Ĝ | Ø | и | ظ | Ψ | | Ø | Ø | ุ | Ų | Ø | Ø | Ű |
11011001 | 331 | 217 | D9 | Ù | Ů | Ù | Ų | й | ع | Ω | | Ù | Ų | ู | Ł | Ù | Ù | Ù |
11011010 | 332 | 218 | DA | Ú | Ú | Ú | Ú | к | غ | Ϊ | | Ú | Ú | ฺ | Ś | Ú | Ú | Ú |
11011011 | 333 | 219 | DB | Û | Ű | Û | Û | л | | Ϋ | | Û | Û | | Ū | Û | Û | Û |
11011100 | 334 | 220 | DC | Ü | Ü | Ü | Ü | м | | ά | | Ü | Ü | | Ü | Ü | Ü | Ü |
11011101 | 335 | 221 | DD | Ý | Ý | Ŭ | Ũ | н | | έ | | İ | Ý | | Ż | Ý | Ý | Ę |
11011110 | 336 | 222 | DE | Þ | Ţ | Ŝ | Ū | о | | ή | | Ş | Þ | | Ž | Ŷ | Þ | Ț |
11011111 | 337 | 223 | DF | ß | ß | ß | ß | п | | ί | ‗ | ß | ß | ฿ | ß | ß | ß | ß |
11100000 | 340 | 224 | E0 | à | ŕ | à | ā | р | ـ | ΰ | א | à | ā | เ | ą | à | à | à |
11100001 | 341 | 225 | E1 | á | á | á | á | с | ف | α | ב | á | á | แ | į | á | á | á |
11100010 | 342 | 226 | E2 | â | â | â | â | т | ق | β | ג | â | â | โ | ā | â | â | â |
11100011 | 343 | 227 | E3 | ã | ă | | ã | у | ك | γ | ד | ã | ã | ใ | ć | ã | ã | ă |
11100100 | 344 | 228 | E4 | ä | ä | ä | ä | ф | ل | δ | ה | ä | ä | ไ | ä | ä | ä | ä |
11100101 | 345 | 229 | E5 | å | ĺ | ċ | å | х | م | ε | ו | å | å | ๅ | å | å | å | ć |
11100110 | 346 | 230 | E6 | æ | ć | ĉ | æ | ц | ن | ζ | ז | æ | æ | ๆ | ę | æ | æ | æ |
11100111 | 347 | 231 | E7 | ç | ç | ç | į | ч | ه | η | ח | ç | į | ็ | ē | ç | ç | ç |
11101000 | 350 | 232 | E8 | è | č | è | č | ш | و | θ | ט | è | č | ่ | č | è | è | è |
11101001 | 351 | 233 | E9 | é | é | é | é | щ | ى | ι | י | é | é | ้ | é | é | é | é |
11101010 | 352 | 234 | EA | ê | ę | ê | ę | ъ | ي | κ | ך | ê | ę | ๊ | ź | ê | ê | ê |
11101011 | 353 | 235 | EB | ë | ë | ë | ë | ы | ً | λ | כ | ë | ë | ๋ | ė | ë | ë | ë |
11101100 | 354 | 236 | EC | ì | ě | ì | ė | ь | ٌ | μ | ל | ì | ė | ์ | ģ | ì | ì | ì |
11101101 | 355 | 237 | ED | í | í | í | í | э | ٍ | ν | ם | í | í | ํ | ķ | í | í | í |
11101110 | 356 | 238 | EE | î | î | î | î | ю | َ | ξ | מ | î | î | ๎ | ī | î | î | î |
11101111 | 357 | 239 | EF | ï | ď | ï | ī | я | ُ | ο | ן | ï | ï | ๏ | ļ | ï | ï | ï |
11110000 | 360 | 240 | F0 | ð | đ | | đ | № | ِ | π | נ | ğ | ð | ๐ | š | ŵ | ð | đ |
11110001 | 361 | 241 | F1 | ñ | ń | ñ | ņ | ё | ّ | ρ | ס | ñ | ņ | ๑ | ń | ñ | ñ | ń |
11110010 | 362 | 242 | F2 | ò | ň | ò | ō | ђ | ْ | ς | ע | ò | ō | ๒ | ņ | ò | ò | ò |
11110011 | 363 | 243 | F3 | ó | ó | ó | ķ | ѓ | | σ | ף | ó | ó | ๓ | ó | ó | ó | ó |
11110100 | 364 | 244 | F4 | ô | ô | ô | ô | є | | τ | פ | ô | ô | ๔ | ō | ô | ô | ô |
11110101 | 365 | 245 | F5 | õ | ő | ġ | õ | ѕ | | υ | ץ | õ | õ | ๕ | õ | õ | õ | ő |
11110110 | 366 | 246 | F6 | ö | ö | ö | ö | і | | φ | צ | ö | ö | ๖ | ö | ö | ö | ö |
11110111 | 367 | 247 | F7 | ÷ | ÷ | ÷ | ÷ | ї | | χ | ק | ÷ | ũ | ๗ | ÷ | ṫ | ÷ | ś |
11111000 | 370 | 248 | F8 | ø | ř | ĝ | ø | ј | | ψ | ר | ø | ø | ๘ | ų | ø | ø | ű |
11111001 | 371 | 249 | F9 | ù | ů | ù | ų | љ | | ω | ש | ù | ų | ๙ | ł | ù | ù | ù |
11111010 | 372 | 250 | FA | ú | ú | ú | ú | њ | | ϊ | ת | ú | ú | ๚ | ś | ú | ú | ú |
11111011 | 373 | 251 | FB | û | ű | û | û | ћ | | ϋ | | û | û | ๛ | ū | û | û | û |
11111100 | 374 | 252 | FC | ü | ü | ü | ü | ќ | | ό | | ü | ü | | ü | ü | ü | ü |
11111101 | 375 | 253 | FD | ý | ý | ŭ | ũ | § | | ύ | | ı | ý | | ż | ý | ý | ę |
11111110 | 376 | 254 | FE | þ | ţ | ŝ | ū | ў | | ώ | | ş | þ | | ž | ŷ | þ | ț |
11111111 | 377 | 255 | FF | ÿ | ˙ | ˙ | ˙ | џ | | | | ÿ | ĸ | | ’ | ÿ | ÿ | ÿ |
Position 0xA0 is always the non-breaking space (NBSP), and 0xAD is the soft hyphen (SHY) in every part except ISO 8859-11; the soft hyphen is displayed only at line breaks. Empty fields are either unassigned or cannot be displayed by the system used.
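These two fixed positions, and the behaviour of an unassigned position, can be checked against the codecs bundled with Python. The sketch below assumes those codecs follow the standards closely enough for this purpose; the variable names are illustrative.

```python
PARTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16]

# 0xA0 decodes to the non-breaking space (U+00A0) in every part.
nbsp_everywhere = all(
    b"\xa0".decode(f"iso8859_{n}") == "\u00a0" for n in PARTS
)

# Parts whose 0xAD is something other than the soft hyphen (U+00AD).
not_soft_hyphen = [
    n for n in PARTS if b"\xad".decode(f"iso8859_{n}") != "\u00ad"
]

# An unassigned position cannot be decoded; 0xA1 in the Arabic part is one.
try:
    b"\xa1".decode("iso8859_6")
    unassigned_decodes = True
except UnicodeDecodeError:
    unassigned_decodes = False

print(nbsp_everywhere, not_soft_hyphen, unassigned_decodes)
```

Software that has to accept arbitrary input in a part with unassigned positions usually passes an error handler such as `errors="replace"` to `decode` rather than letting the exception propagate.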
External References
- Descriptions and code charts for most ISO 8859 standards are found in "ISO 8859 Alphabet Soup": http://www.lysator.liu.se/~jmo/czyborra_index.html