Note: These definitions were taken from https://github.com/w3f/polkadot-spec/blob/main/docs. They are provided here for reference. All credits to respective authors.
SCALE
The Polkadot Host uses the "Simple Concatenated Aggregate Little-Endian" (SCALE) codec to encode byte arrays as well as other data structures. SCALE provides a canonical encoding to produce consistent hash values across their implementation, including the Merkle hash proof for the State Storage.
Decoding
refers to the decoding of a blob of data. Since the SCALE codec is not self-describing, it’s up to the decoder to validate whether the blob of data can be deserialized into the given type or data structure.
It’s accepted behavior for the decoder to partially decode the blob of data. This means any additional data that does not fit into a data structure can be ignored.
caution Considering that the decoded data is never larger than the encoded message, this information can serve as a way to validate values that can vary in size, such as sequences. The decoder should strictly use the size of the encoded data as an upper bound when decoding in order to prevent denial of service attacks.
Notation
- Let be the set of all byte sequences.
- Let , denotes the -th byte of , and denotes the -th bit of the -th byte of .
Definitions
Little Endian
By the little-endian representation of a non-negative integer, , represented as
in base 256, we refer to a byte array such that
Accordingly, we define the function :
Scale Types
Fixed Length Integers
The SCALE codec, , for fixed length integers not defined here otherwise, is equal to the little-endian encoding of those values.
Tuple
The SCALE codec for Tuple, , such that:
Where ’s are values of different types, is defined as:
In the case of a tuple (or a structure), the knowledge of the shape of data is not encoded even though it is necessary for decoding. The decoder needs to derive that information from the context where the encoding/decoding is happening.
Varying Data Type
This library does not provide means for encoding/decoding varying data types, but the definitions are provided here for completeness and reference. The implementation is left to the user of the library.
We define a varying data type to be an ordered set of data types.
A value of varying data type is a pair where for some and is its value of type , which can be empty. We define , unless it is explicitly defined as another value in the definition of a particular varying data type.
The SCALE codec for value of varying data type , formally referred to as is defined as follows:
The SCALE codec does not encode the correspondence between the value and the data type it represents; the decoder needs prior knowledge of such correspondence to decode the data.
Boolean
The SCALE codec for a boolean value defined as a byte as follows:
Compact
SCALE Length encoding , also known as a compact encoding, of a non-negative number is defined as follows:
and the rest of the bits of store the value of in little-endian format in base-2 as follows:
such that:
Note that denotes the length of the original integer being encoded and does not include the extra byte describing the length. The encoding can be used for integers up to: $$2^{(63+4)8} -1 = 2^{536} -1$$
Sequence
The SCALE codec for sequence such that:
where ’s are values of the same type (and the decoder is unable to infer value of from the context) is defined as:
where is defined here.
In some cases, the length indicator is omitted if the length of the sequence is fixed and known by the decoder upfront. Such cases are explicitly stated by the definition of the corresponding type.
String
The SCALE codec for a string value is an encoded sequence consisting of UTF-8 encoded bytes.
This can be achieved via encoding the UTF-8 sequence as a
uint8[]orbytes, which is supported by this library.
Option Type
The Option type is a varying data type of which indicates if data of type is available (referred to as some state) or not (referred to as empty, none or null state). The presence of type none, indicated by , implies that the data corresponding to type is not available and contains no additional data. Whereas the presence of type indicated by implies that the data is available.
Result Type
The Result type is a varying data type of which is used to indicate if a certain operation or function was executed successfully (referred to as "ok" state) or not (referred to as "error" state). implies success, implies failure. Both types can either contain additional data or are defined as empty types otherwise.
Dictionaries, Hashtables, Maps
SCALE codec for dictionary or hashtable D with key-value pairs , such that:
is defined as the SCALE codec of as a sequence of key-value pairs (as tuples):
where is encoded the same way as but argument refers to the number of key-value pairs rather than the length.