Tuesday, August 23, 2016
Shape Security is proud to announce the release of SuperPack, a language-agnostic schemaless binary data serialisation format.
First of all, what does it mean to be schemaless?
Data serialisation formats like JSON or MessagePack encode values in such a way that the structure of those values (the schema) can be determined simply by observing the encoded value. These formats, like SuperPack, are said to be “schemaless”.
In contrast, a schema-driven serialisation format such as Protocol Buffers makes use of ahead-of-time knowledge of the schema to pack the encoded values into one extremely efficient byte sequence free of any schema markers. Schema-driven encodings have some obvious downsides. The schema must remain fixed (ignoring versioning), and if the encoding party is not also the decoding party, the schema must be shared between them and kept in sync.
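To make the distinction concrete, here is a minimal sketch that encodes the same value both ways. The two-byte "schema" and the `encodePoint`/`decodePoint` helpers are hypothetical, invented for illustration; they are not part of Protocol Buffers or SuperPack.

```javascript
// A point with two small unsigned integer coordinates.
const point = { x: 3, y: 250 };

// Schemaless: key names and value types travel inside the payload itself.
const schemaless = new TextEncoder().encode(JSON.stringify(point));

// Schema-driven (hypothetical schema): both parties agree in advance that a
// point is exactly two unsigned bytes, x followed by y.
function encodePoint({ x, y }) {
  return Uint8Array.of(x, y);
}

function decodePoint(bytes) {
  return { x: bytes[0], y: bytes[1] };
}

const schemaDriven = encodePoint(point);

console.log(schemaless.length);   // 15 bytes: {"x":3,"y":250}
console.log(schemaDriven.length); // 2 bytes, but unreadable without the schema
console.log(decodePoint(schemaDriven));
```

The second payload is far smaller, but a decoder that does not already know the layout cannot tell what the two bytes mean.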
Choose the right tool for the job. Usually, it is better to choose a schema-driven format if it is both possible and convenient. For other occasions, we have a variety of schemaless encodings.
What separates it from the others?
In short, SuperPack payloads are very compact without losing the ability to represent any type of data you desire.
The major differentiator between SuperPack and JSON or bencode is that it is extensible. Almost everyone has had to deal with JSON and its very limited set of data types. When you try to serialise a JS undefined value, a regular expression, a date, a typed array, or countless other more exotic data types through JSON, your JSON encoder will either give you an error or give you an encoding that will not decode back to the input value. You will never have that problem with SuperPack.
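The JSON limitations described above are easy to reproduce with the standard `JSON` object. This short sketch uses only built-in JavaScript; the `roundTrip` helper and the `results` object are names chosen for illustration.

```javascript
// Encode a value to JSON and decode it again.
const roundTrip = value => JSON.parse(JSON.stringify(value));

const results = {
  // Properties whose value is undefined are silently dropped.
  undefinedProperty: JSON.stringify({ a: undefined }), // '{}'
  // A regular expression has no own enumerable properties.
  regExp: JSON.stringify(/ab+c/gi),                    // '{}'
  // A Date comes back as an ISO 8601 string, not a Date.
  dateType: typeof roundTrip(new Date(0)),             // 'string'
  // A typed array becomes a plain object keyed by index.
  typedArray: JSON.stringify(new Uint8Array([1, 2])),  // '{"0":1,"1":2}'
};

console.log(results);
```

None of these encodings decodes back to the original value.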
SuperPack doesn’t have a very rich set of built-in data types. Instead, it is extensible. Say we wanted to encode/decode (aka transcode) regular expressions, a data type that is not natively supported by SuperPack. This is all you have to do:
    SuperPackTranscoder.extend(
      // extension point: 0 through 127
      0,
      // detect values which require this custom serialisation
      x => x instanceof RegExp,
      // serialiser: return an intermediate value which will be encoded instead
      // (the pattern text of a RegExp is exposed as its `source` property)
      r => [r.source, r.flags],
      // deserialiser: from the intermediate value, reconstruct the original value
      ([pattern, flags]) => RegExp(pattern, flags),
    );
And if we want to transcode TypedArrays:
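    SuperPackTranscoder.extend(
      // extension point
      1,
      // detect typed arrays (and other ArrayBuffer views)
      ArrayBuffer.isView,
      // serialiser: the constructor name and the underlying buffer
      a => [a[Symbol.toStringTag], a.buffer],
      // deserialiser: look the constructor up on the global object
      // (`self` in browsers and workers; use `globalThis` in Node.js)
      ([ctor, buffer]) => new self[ctor](buffer),
    );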
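Because the serialiser and deserialiser are plain functions, each pair can be checked on its own before it is registered. The sketch below does that without touching the SuperPack library; the function names are chosen for illustration, it reads the pattern text from the standard `source` property, and it uses `globalThis` so that it also runs outside the browser.

```javascript
// Regular expressions: pattern text and flags are enough to rebuild the value.
const serialiseRegExp = r => [r.source, r.flags];
const deserialiseRegExp = ([source, flags]) => new RegExp(source, flags);

const original = /ab+c/gi;
const restored = deserialiseRegExp(serialiseRegExp(original));
console.log(restored.source === original.source); // true
console.log(restored.flags === original.flags);   // true

// Typed arrays: constructor name plus the underlying buffer.
// Caveat: a view over only part of a buffer (non-zero byteOffset or a
// shorter length) would come back as a view over the whole buffer.
const serialiseView = a => [a[Symbol.toStringTag], a.buffer];
const deserialiseView = ([ctor, buffer]) => new globalThis[ctor](buffer);

const words = Uint16Array.of(1, 2, 3);
const copy = deserialiseView(serialiseView(words));
console.log(copy instanceof Uint16Array, Array.from(copy)); // true [ 1, 2, 3 ]
```

In this check the buffer is passed through directly; in real use it would itself travel through the encoder as the intermediate value.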
The philosophy behind SuperPack is that, even if you cannot predict your data’s schema in advance, the data likely has structures or values that are repeated many times in a single payload. Also, some values are just very common and should have efficient representations.
Numbers between -15 and 63 (inclusive) are a single byte; so are booleans, null, undefined, empty arrays, empty maps, and empty strings. Strings which don’t contain a null (\0) character can avoid storing their length by using a C-style null terminator. Boolean-valued arrays and maps use a single bit per value.
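As an illustration of the bit-packing idea for boolean-valued collections, here is a minimal sketch that stores eight booleans per byte. It is not SuperPack's actual wire format: the bit order (least significant bit first) and the `packBooleans`/`unpackBooleans` helpers are assumptions made for this example.

```javascript
// Pack booleans eight to a byte, least significant bit first
// (an arbitrary choice for this sketch).
function packBooleans(values) {
  const bytes = new Uint8Array(Math.ceil(values.length / 8));
  values.forEach((value, i) => {
    if (value) bytes[i >> 3] |= 1 << (i & 7);
  });
  return bytes;
}

// The element count must be known separately, because the last byte
// may contain unused padding bits.
function unpackBooleans(bytes, count) {
  return Array.from(
    { length: count },
    (_, i) => (bytes[i >> 3] & (1 << (i & 7))) !== 0,
  );
}

const flags = [true, false, true, true, false, false, false, false, true];
const packed = packBooleans(flags);
console.log(packed.length);                        // 2 bytes for 9 booleans
console.log(unpackBooleans(packed, flags.length)); // the original 9 values
```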
When an encoder sees multiple strings with the same value, it will store the value in a lookup table, and each reference will cost only two additional bytes. Note that this string deduplication optimisation could have been taken further to allow deduplication of arbitrary structures, but that would allow encoders to create circular references, which is something we’d like to avoid.
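The lookup-table idea can be sketched in a few lines. This is a conceptual illustration over a flat array only, not SuperPack's byte-level encoding; the `{ ref: n }` placeholder and the helper names are hypothetical.

```javascript
// Move every string that occurs more than once into a lookup table and
// replace each occurrence with a small reference to its table index.
function dedupeStrings(values) {
  const counts = new Map();
  for (const value of values) {
    if (typeof value === 'string') counts.set(value, (counts.get(value) || 0) + 1);
  }
  const table = [...counts].filter(([, n]) => n > 1).map(([s]) => s);
  const index = new Map(table.map((s, i) => [s, i]));
  const body = values.map(v => (index.has(v) ? { ref: index.get(v) } : v));
  return { table, body };
}

// Resolve each reference back to the string it points at.
function restoreStrings({ table, body }) {
  return body.map(v =>
    (v !== null && typeof v === 'object' && 'ref' in v ? table[v.ref] : v));
}

const input = ['GET', '/a', 'GET', 200, 'GET'];
const deduped = dedupeStrings(input);
console.log(deduped.table);           // [ 'GET' ]
console.log(restoreStrings(deduped)); // [ 'GET', '/a', 'GET', 200, 'GET' ]
```

Because references point only at strings, which cannot contain further references, no cycle can be expressed.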
When an encoder sees multiple maps with the same set of keys, it can make an optional optimisation that is reminiscent of the schema-directed encoding approach but with the schema included in the payload. Instead of storing the key names once for each map, it can use what we call a “repeated keyset optimisation” to refer back to the object shape and encode its values as a super-efficient contiguous byte sequence.
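A rough sketch of the repeated keyset idea follows. It works on plain JavaScript values rather than bytes, and the `encodeRecords`/`decodeRecords` helpers and the `{ keysets, rows }` layout are assumptions made for illustration, not the format defined in the specification.

```javascript
// Store each distinct list of keys once; every map then becomes a row of
// [keyset index, ...values in key order].
function encodeRecords(records) {
  const keysets = [];          // each distinct key list, stored once
  const keysetIds = new Map(); // serialised key list -> index into keysets
  const rows = records.map(record => {
    const keys = Object.keys(record);
    const id = JSON.stringify(keys);
    if (!keysetIds.has(id)) {
      keysetIds.set(id, keysets.length);
      keysets.push(keys);
    }
    return [keysetIds.get(id), ...keys.map(k => record[k])];
  });
  return { keysets, rows };
}

// Rebuild each map by pairing its values with the referenced key list.
function decodeRecords({ keysets, rows }) {
  return rows.map(([id, ...values]) =>
    Object.fromEntries(keysets[id].map((k, i) => [k, values[i]])));
}

const records = [{ id: 1, name: 'a' }, { id: 2, name: 'b' }, { id: 3 }];
const encoded = encodeRecords(records);
console.log(encoded.keysets); // [ [ 'id', 'name' ], [ 'id' ] ]
console.log(encoded.rows);    // [ [ 0, 1, 'a' ], [ 0, 2, 'b' ], [ 1, 3 ] ]
```

In this sketch, two maps with the same keys in a different order count as different keysets; a real encoder could normalise key order first.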
The downside of this compactness is that, unlike JSON, YAML, or edn, SuperPack payloads are not human-readable.
After surveying existing data serialisation formats, we knew we could design one that would be better suited to our particular use case. And our use case is not so rare as to make SuperPack only useful to us; it is very much a general purpose serialisation format. If you want to create very small payloads for arbitrary data of an unknown schema in an environment without access to a lossless data compression algorithm, SuperPack is for you. If you want to see a more direct comparison to similar formats, see the comparison table in the specification.
I’m sold. How do I use it?
$ npm install --save superpack