xascii85
Standard Encoding interface on top of "encoding/ascii85".
Copyright
Copyright (c) 2022 Teal.Finance contributors
This file is part of Teal.Finance/BaseXX licensed under the MIT License.
See the LICENSE file or https://opensource.org/licenses/MIT.
SPDX-License-Identifier: MIT
Ascii85 advantages
The main idea is to encode by chunk of 4 bytes, instead of 3 bytes for Base64.
There are 95 printable ASCII characters including the space.
To represent 4 bytes, 5 printable ASCII characters are required:
95⁵ = 7 737 809 375 <-- Minimum 5 printable ASCII characters
256⁴ = 4 294 967 296
95⁴ = 81 450 625
The minimum set is 85 characters:
86⁵ = 4 704 270 176
85⁵ = 4 437 053 125 <-- Minimum 85 different ASCII characters
256⁴ = 4 294 967 296
84⁵ = 4 182 119 424
83⁵ = 3 939 040 643
Therefore, 85 is the minimal number of different characters,
to encode any sequence of 4 bytes as 5 printable ASCII characters.
Interface
The idea is to provide the same interface as "encoding/base64".
See https://pkg.go.dev/encoding/base64
func NewEncoding(encoder string) *Encoding
interface Encoding {
Decode(dst, src []byte) (n int, err error)
Encode(dst, src []byte) (n int)
// Here Encode() returns the number of written bytes.
// This is different with encoding/base64.
// Ascii85 encoded length cannot be known from just
// the number of bytes to encode, whereas it can with Base64.
DecodeString(s string) ([]byte, error)
EncodeToString(src []byte) string
DecodedLen(n int) int // Returns the Max.
EncodedLen(n int) int // Returns the Max.
// Not implemented.
// Strict() *Encoding
// WithPadding(padding rune) *Encoding
}
Definition in PostScript documentation
Asci85 encodes binary data in an ASCII base-85 representation.
This encoding uses nearly all of the printable ASCII character set.
The resulting expansion factor is 4:5, making this encoding
much more efficient than hexadecimal.
Encoding alphabet
ASCII characters from 0x21 ! through 0x75 u.
Comparison to other encodings
Base95 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ (and space)
Base94 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Base92 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!"#$%& ()*+,-./:;<=>?@[ ]^_`{|}~
Base91 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!"#$%& ()*+, ./:;<=>?@[ ]^_`{|}~
Ascii85 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstu z!"#$%&'()*+,-./:;<=>?@[\]^_`
Z85 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz! #$%& ()*+ -./: <=>?@[ ]^ { }
Base70 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz + -./ _ ~
Base64 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz + /
Base62 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Base58 123456789ABCDEFGH JKLMN PQRSTUVWXYZabcdefghijk mnopqrstuvwxyz
Hexa 0123456789ABCDEF
Specification in PostScript documentation
The ASCII85Encode filter encodes binary data in the ASCII base-85 encoding.
Generally, for every 4 bytes of binary data, it produces 5 ASCII printing
characters in the range ! through u. It inserts a newline in the encoded
output at least once every 80 characters, thereby limiting the lengths of lines.
When the ASCII85Encode filter is closed, it writes the 2-character sequence ~>
as an EOD marker.
Binary data bytes are encoded in 4-tuples (groups of 4). Each 4-tuple is
used to produce a 5-tuple of ASCII characters. If the binary 4-tuple is
(b1 b2 b3 b4) and the encoded 5-tuple is (c1 c2 c3 c4 c5), then the relation
between them is
(b1 × 256³) + (b2 × 256²) + (b3 × 256¹) + b4 =
(c1 × 85⁴) + (c2 × 85³) + (c3 × 85²) + (c4 × 85¹) + c5
In other words, 4 bytes of binary data are interpreted as a base-256 number
and then converted into a base-85 number. The five “digits” of this number,
(c1 c2 c3 c4 c5), are then converted into ASCII characters by adding 33,
which is the ASCII code for !, to each. ASCII characters in the range ! to u
are used, where ! represents the value 0 and u represents the value 84.
As a special case, if all five digits are 0, they are represented by a
single character z instead of by !!!!!.