Base32: Difference between revisions

From Wikipedia, the free encyclopedia
Xyb (talk | contribs)
add "5 bits", review intro and rev See also, change "software" position
Line 1: Line 1:
'''Base32''' is one of several base 32 transfer encodings. Base32 uses a 32-character set comprising the twenty-six upper-case letters A–Z, and the digits 2–7.
'''Base32''' is the [[Base (exponentiation)|base]]-32 [[numeral system]]. It uses a set of 32 [[Numerical digit|digits]], that can be encoded into 5 [[bits]] (2<sup>5</sup>). The most common way to represent digits in [[human-readable]] way, is using a standard 32-character set, comprising the twenty-six upper-case letters A–Z, and the digits 2–7; but many other variations are used in different contexts.


Example ([[InterPlanetary File System|IPFS]] CIDv1 in Base32 upper-case encoding): {{Code|code=BAFYBEICZSSCDSBS7FFQZ55ASQDF3SMV6KLCW3GOFSZVWLYARCI47BGF354}}
Base32 is primarily used to encode binary data, but Base32 is also able to encode binary text like ASCII.

Example ([[InterPlanetary File System|IPFS]] CIDv1 in Base32 upper-case encoding):

{{Code|code=BAFYBEICZSSCDSBS7FFQZ55ASQDF3SMV6KLCW3GOFSZVWLYARCI47BGF354}}

== Software ==
Base32 is a notation for encoding arbitrary byte data using a restricted set of symbols that can be conveniently used by humans and processed by computers.

Base32 consists of a symbol set made up of 32 different characters, as well as an algorithm for encoding arbitrary sequences of 8-bit bytes into the Base32 alphabet. Because more than one 5-bit Base32 symbol is needed to represent each 8-bit input byte, it also specifies requirements on the allowed lengths of Base32 strings (which must be multiples of 40 bits). The closely related Base64 system, in contrast, uses a set of 64 symbols.

Base32 implementations in C/C++,<ref>http://sourceforge.net/projects/cyoencode/</ref><ref>https://www.gnu.org/software/gnulib/</ref> Perl,<ref>{{cite web|url=https://metacpan.org/release/MIME-Base32|title=MIME-Base32 - Base32 encoder and decoder|accessdate=2018-07-29|website=MetaCPAN}}</ref> Java,<ref>https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base32.html</ref>, JavaScript<ref>https://www.npmjs.com/package/base32</ref> Python<ref>https://docs.python.org/3/library/base64.html</ref>, Go<ref>https://golang.org/pkg/encoding/base32</ref> and Ruby<ref>https://rubygems.org/gems/base32</ref> are available.


== Advantages ==
== Advantages ==
Line 228: Line 217:
Thus, the characters are generally some minor variation of the following set: 0–9, B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X, Y, Z, and some punctuation marks.
Thus, the characters are generally some minor variation of the following set: 0–9, B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X, Y, Z, and some punctuation marks.
Games known to use such a system include ''[[Mario Is Missing!]]'', ''[[Mario's Time Machine]]'', ''[[Tetris Blast]]'', and [[Middle-earth in video games|''The Lord of the Rings'' (Super NES)]].
Games known to use such a system include ''[[Mario Is Missing!]]'', ''[[Mario's Time Machine]]'', ''[[Tetris Blast]]'', and [[Middle-earth in video games|''The Lord of the Rings'' (Super NES)]].


== Software ==
Base32 is a notation for encoding arbitrary byte data using a restricted set of symbols that can be conveniently used by humans and processed by computers.

Base32 consists of a symbol set made up of 32 different characters, as well as an algorithm for encoding arbitrary sequences of 8-bit bytes into the Base32 alphabet. Because more than one 5-bit Base32 symbol is needed to represent each 8-bit input byte, it also specifies requirements on the allowed lengths of Base32 strings (which must be multiples of 40 bits). The closely related Base64 system, in contrast, uses a set of 64 symbols.

Base32 implementations in C/C++,<ref>http://sourceforge.net/projects/cyoencode/</ref><ref>https://www.gnu.org/software/gnulib/</ref> Perl,<ref>{{cite web|url=https://metacpan.org/release/MIME-Base32|title=MIME-Base32 - Base32 encoder and decoder|accessdate=2018-07-29|website=MetaCPAN}}</ref> Java,<ref>https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base32.html</ref>, JavaScript<ref>https://www.npmjs.com/package/base32</ref> Python<ref>https://docs.python.org/3/library/base64.html</ref>, Go<ref>https://golang.org/pkg/encoding/base32</ref> and Ruby<ref>https://rubygems.org/gems/base32</ref> are available.


==See also==
==See also==
{|
* [[Ascii85]] (also called Base85)
|"Powers of 2" related bases:
* [[Base4]]
* [[Base16]]
* [[Base64]]
* [[Base64]]
|&nbsp;&nbsp;&nbsp;&nbsp;
|Other bases:
* [[Base36]]
* [[Base58]]
* [[Base58]]
* [[Ascii85]] (also called Base85)
*[[Base36]]
|&nbsp;&nbsp;&nbsp;&nbsp;
* [[Base16]]
|Applications of base32:
* [[Binary-to-text encoding]] for a comparison of various encoding algorithms
* [[Binary-to-text encoding]] for a comparison of various encoding algorithms
* [[Geohash]]
* [[Geohash]]
|}


==References==
==References==

Revision as of 23:26, 8 July 2020

Base32 is the base-32 numeral system. It uses a set of 32 digits, each of which can be represented by 5 bits (2^5 = 32). The most common human-readable representation uses a standard 32-character set comprising the twenty-six upper-case letters A–Z and the digits 2–7, but many other variations are used in different contexts.

Example (IPFS CIDv1 in Base32 upper-case encoding): BAFYBEICZSSCDSBS7FFQZ55ASQDF3SMV6KLCW3GOFSZVWLYARCI47BGF354

Advantages

Base32 has a number of advantages over Base64:

  1. The resulting character set is all one case, which can often be beneficial when using a case-insensitive filesystem, DNS names, spoken language, or human memory.
  2. The result can be used as a file name because it cannot possibly contain the '/' symbol, which is the Unix path separator.
  3. The alphabet can be selected to avoid similar-looking pairs of different symbols, so the strings can be accurately transcribed by hand. (For example, the RFC 4648 symbol set omits the digits for one, eight and zero, since they could be confused with the letters 'I', 'B', and 'O'.)
  4. A result excluding padding can be included in a URL without encoding any characters.
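The file-name and URL points can be checked with a small sketch using Python's standard `base64` module. The helper name `base32_token` and the sample bytes are chosen here purely for illustration.

```python
import base64
import string

# RFC 4648 Base32 alphabet: A-Z followed by 2-7 ('=' is used only as padding).
BASE32_ALPHABET = set(string.ascii_uppercase + "234567")


def base32_token(data: bytes) -> str:
    """Encode data as unpadded RFC 4648 Base32 (hypothetical helper name)."""
    return base64.b32encode(data).decode("ascii").rstrip("=")


sample = b"\xff\xff\xff"
b64 = base64.b64encode(sample).decode("ascii")
b32 = base32_token(sample)

print(b64)  # standard Base64 output for these bytes contains '/'
print(b32)  # Base32 output uses only A-Z and 2-7
```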

Base32 also has advantages over hexadecimal/Base16:

  1. Base32 representation takes roughly 20% less space. (1000 bits takes 200 characters, compared with 250 for Base16)
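The arithmetic behind the 20% figure can be reproduced as follows. This is a minimal sketch; the function name `chars_needed` is an assumption made for the example.

```python
def chars_needed(bits: int, bits_per_char: int) -> int:
    """Number of symbols needed to hold `bits` bits, rounding up."""
    return -(-bits // bits_per_char)  # integer ceiling division


base16_chars = chars_needed(1000, 4)  # each Base16 digit carries 4 bits
base32_chars = chars_needed(1000, 5)  # each Base32 digit carries 5 bits
saving = 1 - base32_chars / base16_chars

print(base16_chars, base32_chars, round(saving * 100))
```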

Disadvantages

Base32 representation takes roughly 20% more space than Base64. Also, because it encodes 5 bytes to 8 characters (rather than 3 bytes to 4 characters), padding to an 8-character boundary is a greater burden on short messages.

Length of Base64 and Base32 notations as percentage of binary data
Input Base64 Base32
8-bit 133% 160%
7-bit 117% 140%
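The percentages in the table follow from the ratio of input bits to bits carried per output symbol (6 for Base64, 5 for Base32). The sketch below recomputes them and also illustrates the padding burden on a one-byte message, using Python's standard `base64` module. The helper name `percent` is an assumption for this example.

```python
import base64


def percent(bits_per_input_unit: int, bits_per_symbol: int) -> int:
    """Encoded length as a rounded percentage of the input length."""
    return round(100 * bits_per_input_unit / bits_per_symbol)


table = {
    "8-bit": (percent(8, 6), percent(8, 5)),
    "7-bit": (percent(7, 6), percent(7, 5)),
}

# 15 bytes is a multiple of both 3 and 5, so neither encoding needs padding.
raw = bytes(range(15))
b64_len = len(base64.b64encode(raw))
b32_len = len(base64.b32encode(raw))

# A one-byte message is padded to 4 characters in Base64 but 8 in Base32.
short_b64 = base64.b64encode(b"a").decode("ascii")
short_b32 = base64.b32encode(b"a").decode("ascii")

print(table, b64_len, b32_len, short_b64, short_b32)
```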

RFC 4648 Base32 alphabet

The most widely used Base32 alphabet is defined in RFC 4648. It uses an alphabet of A–Z, followed by 2–7. The digits 0 and 1 are skipped because of their similarity to the letters O and I (thus the symbol "2" actually has a decimal value of 26).

In some circumstances padding is not required or used (the padding can be inferred from the length of the string modulo 8). RFC 4648 states that padding must be used unless the specification of the standard referring to the RFC explicitly states otherwise. Excluding padding is useful when using base32 encoded data in URL tokens or file names where the padding character could pose a problem.

The RFC 4648 Base 32 alphabet
Value Symbol Value Symbol Value Symbol Value Symbol
0 A 8 I 16 Q 24 Y
1 B 9 J 17 R 25 Z
2 C 10 K 18 S 26 2
3 D 11 L 19 T 27 3
4 E 12 M 20 U 28 4
5 F 13 N 21 V 29 5
6 G 14 O 22 W 30 6
7 H 15 P 23 X 31 7
padding =
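As a sketch of how padding can be dropped and later inferred from the string length modulo 8, the following uses Python's standard `base64` module. The helper names `encode_unpadded` and `decode_unpadded` are hypothetical.

```python
import base64


def encode_unpadded(data: bytes) -> str:
    """RFC 4648 Base32 with the trailing '=' padding removed."""
    return base64.b32encode(data).decode("ascii").rstrip("=")


def decode_unpadded(text: str) -> bytes:
    """Restore padding to a multiple of 8 characters, then decode."""
    padding = "=" * (-len(text) % 8)
    return base64.b32decode(text + padding)


padded = base64.b32encode(b"foobar").decode("ascii")
token = encode_unpadded(b"foobar")

print(padded)  # MZXW6YTBOI======
print(token)   # MZXW6YTBOI
print(decode_unpadded(token))
```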

Alternative versions

The alternative standards change the Base32 alphabet, but all of them use similar combinations of alphanumeric symbols.

z-base-32

z-base-32[1] is a Base32 encoding designed to be easier for human use and more compact. It includes 1, 8 and 9 but excludes l, v and 2. It also permutes the alphabet so that the easier characters are the ones that occur more frequently. It compactly encodes bitstrings whose length in bits is not a multiple of 8, and omits trailing padding characters. z-base-32 was used in the Mnet open source project, and is currently used in Phil Zimmermann's ZRTP protocol, and in the Tahoe-LAFS open source project.

z-base-32 alphabet
Value Symbol Value Symbol Value Symbol Value Symbol
0 y 8 e 16 o 24 a
1 b 9 j 17 t 25 3
2 n 10 k 18 1 26 4
3 d 11 m 19 u 27 5
4 r 12 c 20 w 28 h
5 f 13 p 21 i 29 7
6 g 14 q 22 s 30 6
7 8 15 x 23 z 31 9
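For whole-byte input, the table above can be applied by re-mapping RFC 4648 output symbol by symbol. The sketch below assumes that the bit grouping matches RFC 4648 for byte-aligned data; it does not cover bitstrings whose length is not a multiple of 8, which z-base-32 also supports. The function names are hypothetical.

```python
import base64

RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"  # values 0..31 from the table

_TO_Z = str.maketrans(RFC4648_ALPHABET, ZBASE32_ALPHABET)
_FROM_Z = str.maketrans(ZBASE32_ALPHABET, RFC4648_ALPHABET)


def zbase32_encode(data: bytes) -> str:
    """Encode whole bytes, omitting trailing padding characters."""
    rfc = base64.b32encode(data).decode("ascii").rstrip("=")
    return rfc.translate(_TO_Z)


def zbase32_decode(text: str) -> bytes:
    """Map back to the RFC 4648 alphabet, restore padding, and decode."""
    rfc = text.translate(_FROM_Z)
    return base64.b32decode(rfc + "=" * (-len(rfc) % 8))


print(zbase32_encode(b"\xff"))
print(zbase32_decode(zbase32_encode(b"hello")))
```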

Crockford's Base32

Another alternative design for Base32 was created by Douglas Crockford, who proposes using additional characters for a checksum.[2] It excludes the letters I, L, and O to avoid confusion with digits. It also excludes the letter U to reduce the likelihood of accidental obscenity.

Libraries to encode binary data in Crockford's Base32 are available in a variety of languages.

Crockford's Base32 alphabet
Value Encode Digit Decode Digit Value Encode Digit Decode Digit
0 0 0 o O 16 G g G
1 1 1 i I l L 17 H h H
2 2 2 18 J j J
3 3 3 19 K k K
4 4 4 20 M m M
5 5 5 21 N n N
6 6 6 22 P p P
7 7 7 23 Q q Q
8 8 8 24 R r R
9 9 9 25 S s S
10 A a A 26 T t T
11 B b B 27 V v V
12 C c C 28 W w W
13 D d D 29 X x X
14 E e E 30 Y y Y
15 F f F 31 Z z Z
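The encode and decode columns of the table can be sketched as a pair of lookup structures. This example treats the value as a non-negative integer and leaves out the optional checksum characters. The function names are hypothetical.

```python
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # encode digits 0..31

# Decoding is case-insensitive and also accepts the look-alike aliases
# listed in the table: o/O for 0, and i/I/l/L for 1.
DECODE = {symbol: value for value, symbol in enumerate(CROCKFORD_ALPHABET)}
DECODE.update({symbol.lower(): value for symbol, value in list(DECODE.items())})
DECODE.update({"O": 0, "o": 0, "I": 1, "i": 1, "L": 1, "l": 1})


def crockford_encode(number: int) -> str:
    """Encode a non-negative integer using the encode digits."""
    if number < 0:
        raise ValueError("number must be non-negative")
    digits = []
    while True:
        number, remainder = divmod(number, 32)
        digits.append(CROCKFORD_ALPHABET[remainder])
        if number == 0:
            break
    return "".join(reversed(digits))


def crockford_decode(text: str) -> int:
    """Decode a string, accepting any symbol from the decode columns."""
    value = 0
    for symbol in text:
        value = value * 32 + DECODE[symbol]  # KeyError for symbols such as 'U'
    return value


print(crockford_encode(1234))
print(crockford_decode("l6j"))
```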

Electrologica

An earlier form of base 32 notation was used by programmers working on the Electrologica X1 to represent machine addresses. The "digits" were represented as decimal numbers from 0 to 31. For example, 12-16 would represent the machine address 400 (= 12*32 + 16).
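The conversion in the example can be sketched as follows. The hyphen separator is taken from the example above, and the function names are assumptions made for illustration.

```python
def x1_to_address(notation: str) -> int:
    """Convert hyphen-separated base-32 digits (each 0..31) to an integer."""
    address = 0
    for part in notation.split("-"):
        digit = int(part)
        if not 0 <= digit < 32:
            raise ValueError(f"digit out of range: {part}")
        address = address * 32 + digit
    return address


def address_to_x1(address: int) -> str:
    """Convert a non-negative integer to hyphen-separated base-32 digits."""
    digits = []
    while True:
        address, digit = divmod(address, 32)
        digits.append(str(digit))
        if address == 0:
            break
    return "-".join(reversed(digits))


print(x1_to_address("12-16"))
print(address_to_x1(400))
```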

base32hex

Triacontakaidecimal[citation needed] is another alternative design for Base 32, which extends hexadecimal in a more natural way. It was first proposed by Christian Lanctot, a programmer working at Sage software, in a letter to Dr. Dobb's magazine in March 1999[3] as a solution to the Y2K bug, and was referred to as "Double Hex". This version was described in RFC 2938 under the name "Base-32". RFC 4648, while acknowledging existing use of this version in NSEC3, refers to it as base32hex and discourages labelling it as "base32".

Similarly to hexadecimal, the digits used are 0-9 followed by consecutive letters of the alphabet. This matches the digits used by the JavaScript parseInt() function[4] and the Python int() constructor[5] when a base larger than 10 (such as 16 or 32) is specified. It also retains hexadecimal's property of preserving bitwise sort order of the represented data, unlike RFC 4648's base-32 or base-64.[6]
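Both properties can be illustrated with a short sketch. The encoder below re-maps the output of the standard-library `base64.b32encode` to the extended hex alphabet, so that it does not depend on `base64.b32hexencode`, which is only available in newer Python versions. The function name `base32hex_encode` and the sample bytes are assumptions for this example.

```python
import base64

RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
BASE32HEX_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
_TO_HEX = str.maketrans(RFC4648_ALPHABET, BASE32HEX_ALPHABET)


def base32hex_encode(data: bytes) -> str:
    """Encode with the extended hex alphabet; '=' padding is left unchanged."""
    return base64.b32encode(data).decode("ascii").translate(_TO_HEX)


# The digits agree with Python's int() for base 32.
largest_digit = int("V", 32)
thirty_two = int("10", 32)

# Sort order: as bytes, 0x00 sorts before 0xd0.
low, high = b"\x00", b"\xd0"
hex_order_kept = base32hex_encode(low) < base32hex_encode(high)
std_order_kept = base64.b32encode(low) < base64.b32encode(high)

print(largest_digit, thirty_two, hex_order_kept, std_order_kept)
```

For this pair of inputs the extended hex strings sort in the same order as the bytes, while the standard RFC 4648 strings do not, because the digits 2–7 sort before the letters in ASCII.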

Unlike many other base 32 notation systems, triacontakaidecimal is contiguous and includes characters that may visually conflict. With the right font it is possible to visually distinguish between 0, O and 1, I. Other fonts are unsuitable because the context that English usually provides is not provided by a notation system that is expressing numbers. However, the choice of font is not controlled by notation or encoding which is why it's risky to assume a distinguishable font will be used.

The "Extended Hex" Base 32 Alphabet
Value Symbol Value Symbol Value Symbol Value Symbol
0 0 9 9 18 I 27 R
1 1 10 A 19 J 28 S
2 2 11 B 20 K 29 T
3 3 12 C 21 L 30 U
4 4 13 D 22 M 31 V
5 5 14 E 23 N
6 6 15 F 24 O
7 7 16 G 25 P
8 8 17 H 26 Q
pad =

Geohash

The Geohash algorithm represents latitude and longitude values in one (bit-interlaced) positive integer.[7] The base32 representation of Geohash uses all decimal digits (0–9) and almost all of the lower-case alphabet, omitting the letters "a", "i", "l" and "o", as shown by the following character map:

Decimal 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Base 32 0 1 2 3 4 5 6 7 8 9 b c d e f g
 
Decimal 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Base 32 h j k m n p q r s t u v w x y z
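The character map can be applied as in the sketch below, which converts between a Geohash string and the integer it represents. Splitting that integer back into latitude and longitude is not covered. The function names and the sample string are assumptions for illustration.

```python
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"  # values 0..31 from the map
GEOHASH_VALUES = {symbol: value for value, symbol in enumerate(GEOHASH_ALPHABET)}


def geohash_to_int(geohash: str) -> int:
    """Interpret a Geohash string as a base-32 number."""
    value = 0
    for symbol in geohash:
        value = (value << 5) | GEOHASH_VALUES[symbol]  # 5 bits per symbol
    return value


def int_to_geohash(value: int, length: int) -> str:
    """Render an integer as a Geohash string of the given length."""
    symbols = []
    for _ in range(length):
        symbols.append(GEOHASH_ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(symbols))


print(geohash_to_int("ezs42"))
print(int_to_geohash(14672002, 5))
```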

Video games

Before NVRAM became universal, several video games for Nintendo platforms used base 32 numbers for passwords. These systems omit vowels to prevent the game from accidentally giving a profane password. Thus, the characters are generally some minor variation of the following set: 0–9, B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X, Y, Z, and some punctuation marks. Games known to use such a system include Mario Is Missing!, Mario's Time Machine, Tetris Blast, and The Lord of the Rings (Super NES).


Software

Base32 is a notation for encoding arbitrary byte data using a restricted set of symbols that can be conveniently used by humans and processed by computers.

Base32 consists of a symbol set made up of 32 different characters, as well as an algorithm for encoding arbitrary sequences of 8-bit bytes into the Base32 alphabet. Because more than one 5-bit Base32 symbol is needed to represent each 8-bit input byte, it also specifies requirements on the allowed lengths of Base32 strings (which must be multiples of 40 bits). The closely related Base64 system, in contrast, uses a set of 64 symbols.
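A minimal sketch of such an encoding algorithm, assuming the RFC 4648 alphabet and '=' padding, is shown below. It processes the input in 40-bit (5-byte) groups, each of which yields eight 5-bit symbols. The function name `base32_encode` is hypothetical, and in practice a library routine such as Python's `base64.b32encode` would normally be used instead.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def base32_encode(data: bytes) -> str:
    """Encode bytes in 40-bit groups of eight 5-bit symbols."""
    out = []
    for start in range(0, len(data), 5):
        chunk = data[start:start + 5]
        # Left-align the chunk in a 40-bit integer, zero-filling missing bytes.
        block = int.from_bytes(chunk.ljust(5, b"\x00"), "big")
        # Symbols that carry real data: ceil(len(chunk) * 8 / 5).
        used = (len(chunk) * 8 + 4) // 5
        symbols = [ALPHABET[(block >> shift) & 0x1F] for shift in range(35, -1, -5)]
        # Pad the final group with '=' so the output length is a multiple of 8.
        out.append("".join(symbols[:used]) + "=" * (8 - used))
    return "".join(out)


for sample in (b"", b"f", b"fo", b"foo", b"foob", b"fooba", b"foobar"):
    print(sample, base32_encode(sample))
```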

Base32 implementations in C/C++,[8][9] Perl,[10] Java,[11] JavaScript,[12] Python,[13] Go[14] and Ruby[15] are available.

See also
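
"Powers of 2" related bases:

  • Base4
  • Base16
  • Base64

Other bases:

  • Base36
  • Base58
  • Ascii85 (also called Base85)

Applications of base32:

  • Binary-to-text encoding for a comparison of various encoding algorithms
  • Geohash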

References

  • RFC 4648