Feeds and Messages

Ssb is designed around a data model optimized for simple data replication and strong cryptographic integrity guarantees in a social setting. The central entities in this model are feeds and messages. Messages are the pieces of data inserted into the system; feeds describe who authored a piece of data.

Messages carry schemaless, free-form pieces of data, together with some metadata necessary for message replication and verification. Each message belongs to exactly one feed, and each message contains a backlink to the previous message posted to that feed. Feeds thus form linked lists. Links are implemented via cryptographically secure hashes, so in that sense feeds behave mostly like blockchains. But whereas blockchains are traditionally used to create a global, single source of truth, ssb chooses a different approach.

Instead of one single, global linked list for all data, each ssb user has their own linked list. To ensure that no malicious actor can append data to another user's list, all messages are signed. We call this data structure - a signed hash-based linked list - a sigchain.

The sigchain-per-user design has some very desirable properties:

  • data can be moved across untrusted machines, and you still know that a piece of data has indeed been posted by a certain identity
  • immutability of data, nothing gets lost
  • the order between messages cannot be confused
  • replicating data becomes simple: just exchange the number of messages you know about for a certain feed, and your peer can send you everything that is newer
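
The replication step in the last bullet can be sketched as follows. This is a minimal illustration and not the ssb wire protocol: the feed is modelled as a plain Python list ordered by sequence number, and messages_newer_than is a hypothetical helper name.

```python
from typing import List


def messages_newer_than(feed: List[dict], known_count: int) -> List[dict]:
    """Return the messages a peer is missing, given how many it already has.

    `feed` is the locally stored feed, ordered by sequence number.
    `known_count` is the number of messages the requesting peer reports.
    """
    if known_count < 0:
        raise ValueError("known_count must not be negative")
    # A feed is append-only and totally ordered, so everything past the
    # first `known_count` entries is exactly what the peer lacks.
    return feed[known_count:]


# Hypothetical local copy of a feed with three messages.
local_feed = [
    {"sequence": 1, "content": {"type": "post", "text": "a"}},
    {"sequence": 2, "content": {"type": "post", "text": "b"}},
    {"sequence": 3, "content": {"type": "post", "text": "c"}},
]

# A peer that knows one message of this feed is sent the other two.
missing = messages_newer_than(local_feed, 1)
```

Because a single number summarizes everything a peer knows about a feed, no per-message bookkeeping is needed during the exchange.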

Of course, this design also has some drawbacks:

  • the simple replication scheme does not allow subscription to only parts of the data - it's all or nothing
  • immutability and non-repudiation are not appropriate for all use-cases
  • a single identity cannot append data from multiple machines in parallel, as that would result in a tree rather than a linked list

In some sense, ssb can be seen as an experiment to find out whether the advantages of a sigchain-based distributed database outweigh the disadvantages. So far, it appears to be working sufficiently well.

The remainder of this chapter describes the exact format of the metadata ssb maintains to build the sigchain. Conceptually, to form the sigchain, the metadata of each message must include:

  • the feed the message belongs to
  • the hash of the previous message from the same feed, or null
  • the free-form content of the message
  • a signature to prove that the author knew the private key of the feed
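
The hash-linking part of this conceptual model can be sketched as below. Several things here are assumptions made for illustration only: sorted-key JSON stands in for the real signing encoding, plain hex SHA-256 digests stand in for multihashes, and signatures are left out entirely because the Python standard library has no ed25519 implementation. The names link_hash, append and links_are_intact are hypothetical.

```python
import hashlib
import json
from typing import List, Optional


def link_hash(message: dict) -> str:
    """Hash a message for use as a backlink (illustrative encoding only)."""
    # Assumption: sorted-key JSON stands in for ssb's real signing encoding.
    encoded = json.dumps(message, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append(feed: List[dict], author: str, content: dict) -> dict:
    """Append a new message that links back to the current tip of the feed."""
    previous: Optional[str] = link_hash(feed[-1]) if feed else None
    message = {"author": author, "previous": previous, "content": content}
    feed.append(message)
    return message


def links_are_intact(feed: List[dict]) -> bool:
    """Check that every message points at the hash of its predecessor."""
    expected: Optional[str] = None
    for message in feed:
        if message["previous"] != expected:
            return False
        expected = link_hash(message)
    return True


feed: List[dict] = []
append(feed, "alice", {"type": "post", "text": "hello"})
append(feed, "alice", {"type": "post", "text": "world"})
```

Changing any earlier message changes its hash, so the backlink stored in its successor no longer matches and links_are_intact returns False. In a real sigchain, the signature additionally ties each message to the key of the feed.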

The actual metadata formats also need to include some extra information.

Metadata

The abstract model for metadata is a tuple containing the following entries:

  • previous: either a multihash, or the distinct value null
  • author: a multikey
  • sequence: an unsigned 53 bit integer
  • timestamp: an IEEE 754 64 bit float except the infinities, negative zero, and NaNs
  • content: a data value that is either:
    • an object containing an entry "type", whose value is a string that takes between 3 and 53 (inclusive) code units when encoded as utf16
    • a multibox
  • swapped: a boolean indicating how to encode the metadata
  • signature: a signature of the message, generated by the cryptographic primitive of the author
    • this can only be computed once all other metadata is known
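
The value constraints on sequence, timestamp and the "type" entry can be checked with a few small predicates. The following is a sketch under stated assumptions: the function names are hypothetical, an "unsigned 53 bit integer" is read as the range 0 to 2^53 - 1, and multihash, multikey and multibox values are not modelled at all.

```python
import math
from typing import Any

MAX_SEQUENCE = 2**53 - 1  # largest value of an unsigned 53 bit integer


def is_valid_sequence(value: Any) -> bool:
    """True for integers in the unsigned 53 bit range."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 0 <= value <= MAX_SEQUENCE


def is_valid_timestamp(value: Any) -> bool:
    """True for floats that are not infinite, NaN, or negative zero."""
    if not isinstance(value, float):
        return False
    if math.isnan(value) or math.isinf(value):
        return False
    # Negative zero compares equal to 0.0, so inspect the sign instead.
    return not (value == 0.0 and math.copysign(1.0, value) < 0)


def utf16_length(text: str) -> int:
    """Number of UTF-16 code units needed to encode `text`."""
    # "utf-16-le" emits no byte order mark; each code unit is two bytes.
    # "surrogatepass" keeps lone surrogates from raising an error.
    return len(text.encode("utf-16-le", "surrogatepass")) // 2


def is_valid_type(value: Any) -> bool:
    """True for strings of 3 to 53 (inclusive) UTF-16 code units."""
    return isinstance(value, str) and 3 <= utf16_length(value) <= 53
```

Counting UTF-16 code units rather than characters matters for characters outside the Basic Multilingual Plane, which take two code units each.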

Json Encoding

Metadata can be encoded as json, just like regular data. The json encoding is a json object containing the entries listed above, with the following additional regulations:

  • the previous multihash must use the message hash encoding as a string
  • the author multikey must use the multikey encoding as a string
  • the sequence is serialized as a float, since that's the only number type available
    • it must not be negative
    • it must not contain a decimal point
  • the signature value is a string whose content is the concatenation of:
    • the canonic base64 encoding of the message's signature itself (see next section)
    • the characters .sig. ([0x2E, 0x73, 0x69, 0x67, 0x2E])
    • a primitive-specific suffix, depending on the primitive of the author
      • for ed25519, this is ed25519 ([0x65, 0x64, 0x32, 0x35, 0x35, 0x31, 0x39])
  • the swapped boolean is omitted
  • an entry "hash": "sha256" is added
  • if swapped, the order of entries must be previous, sequence, author, timestamp, hash, content, signature, else it must be previous, author, sequence, timestamp, hash, content, signature
  • if content is a multibox, it must use the multibox encoding as a string
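
The entry order and the shape of the signature string can be illustrated with a short sketch. It makes several assumptions: encode_metadata and signature_string are hypothetical names, standard padded base64 is assumed to match the canonic base64 encoding, previous and author are passed in as already encoded strings, and the whitespace and number formatting produced by json.dumps is not claimed to match what ssb requires.

```python
import base64
import json
from typing import Any, Optional

ED25519_SUFFIX = ".sig.ed25519"  # ".sig." followed by the primitive suffix


def signature_string(raw_signature: bytes) -> str:
    """Build the json signature entry for an ed25519 author."""
    # Assumption: standard padded base64 matches the canonic encoding.
    return base64.b64encode(raw_signature).decode("ascii") + ED25519_SUFFIX


def encode_metadata(
    previous: Optional[str],
    author: str,
    sequence: int,
    timestamp: float,
    content: Any,
    signature: str,
    swapped: bool = False,
) -> str:
    """Serialize metadata as a json object with the required entry order."""
    values = {
        "previous": previous,
        "author": author,
        "sequence": sequence,
        "timestamp": timestamp,
        "hash": "sha256",  # added entry; `swapped` itself is never emitted
        "content": content,
        "signature": signature,
    }
    # Only the positions of author and sequence depend on `swapped`.
    middle = ["sequence", "author"] if swapped else ["author", "sequence"]
    order = ["previous", *middle, "timestamp", "hash", "content", "signature"]
    # Python dicts keep insertion order, and json.dumps preserves it.
    return json.dumps({key: values[key] for key in order})


example = encode_metadata(
    previous=None,
    author="<author multikey string>",  # placeholder, not a real multikey
    sequence=1,
    timestamp=0.0,
    content={"type": "post"},
    signature=signature_string(bytes(64)),  # dummy all-zero signature
    swapped=True,
)
```

Passing the sequence as a Python int makes json.dumps emit it without a decimal point, as required above.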

When creating new messages, the (purely logical) value of swapped should be false. Or you might set it to true, or generate it randomly - nobody can stop you, and everything will still work.

Messages

Exactly one data value with a fixed order of object entries corresponds to this json encoding. This value is referred to as a message.

Computing a Message's Signature

To obtain the signature of a message, first compute the json encoding specified above, but without the "signature" entry. Then compute the signing encoding of the corresponding data value, using the entry order of the json as a tie-breaker for the entry order of the signing encoding. Finally, use the signing primitive of the message's author on the signing encoding to obtain the signature.

Computing the Hash of a Message

To compute the hash of a message, use the value hash computation on the signing encoding, with the object entry order that produces the correct signature.

Computing the Size of a Message

To compute the size of a message, use the value length computation on the signing encoding, with the object entry order that produces the correct signature.

Message Validation

A message is considered valid if and only if all of the following conditions are met:

  • its json encoding is a possible output of the message creation algorithm
    • in particular, the claimed signature must match the data and public key
  • its length (the size computed as described above) is smaller than 16385
  • if previous is null, the sequence number of the message must be the float 1, else it must be one larger than that of the message whose hash is the value of previous
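
The sequence number rule in the last condition can be sketched as follows. This covers only that one condition, not signature or length checks. The lookup table by_hash, which maps message hashes to already known messages, and the function name sequence_is_consistent are assumptions made for illustration; raising an error for an unknown predecessor is a design choice of this sketch.

```python
from typing import Dict, Optional


def sequence_is_consistent(message: dict, by_hash: Dict[str, dict]) -> bool:
    """Check the sequence number of `message` against its predecessor.

    `by_hash` maps the hash of each known message to that message.
    Raises KeyError if the predecessor is not known, since the rule
    cannot be evaluated without it.
    """
    previous: Optional[str] = message["previous"]
    if previous is None:
        # The first message of a feed must carry sequence number 1.
        return message["sequence"] == 1
    predecessor = by_hash[previous]
    return message["sequence"] == predecessor["sequence"] + 1


# Hypothetical messages; "hash-of-first" is a placeholder, not a real hash.
first = {"previous": None, "sequence": 1}
second = {"previous": "hash-of-first", "sequence": 2}
known = {"hash-of-first": first}
```

With these values, sequence_is_consistent(first, known) and sequence_is_consistent(second, known) both hold, while a message that skips a number or starts a feed at a sequence other than 1 is rejected.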