2021-10-31

This article describes the role and sources of randomness in cryptographic protocols

The post is co-authored with Tom.

Background

Randomness is ubiquitous across the cryptography underpinning blockchain infrastructures — in the Merkle trees securing state, in the random-selection mechanism in proof of work, in the selection of committees in proof of stake, and in the zero knowledge proofs underpinning both scaling and privacy.

Hash functions are the most important source of randomness in these protocols.

What’s a Hash Function?

A hash function takes in a ‘message’ (this could be any data — the state of a blockchain, the contents of a transaction, etc) and ‘spits out’ a digest of a fixed length, in some predefined ‘space’ of numbers. This could be all numbers up to $2^{256}$, or it could be elements of a prime field (common in zero knowledge proofs), or even sometimes elliptic curve points, which are numbers of a different kind.

The point is, the ‘magnitude’ of the output given the input should be completely unpredictable, until the hash is actually computed.

Hashes are typically seen in two settings:

As a short ‘unique fingerprint’ on data (typically the ‘state’ of the blockchain, or the contents of a transaction), and

A way of replacing questions that should be supplied by the verifier in certain ‘interactive protocols’. This is a common feature of zero knowledge proofs. The security of interactive protocols depends on a sequence of questions and answers between the ‘verifier’ and the ‘prover’. Hashes can be used to remove the need for a direct back-and-forth dialogue between the prover (usually the spender of money) and the verifier (the blockchain nodes checking all transactions are correct)

Property #1 for Hashes: Randomness

A core property of hash functions is therefore they scramble input data in a seemingly random fashion. When new hash functions are proposed, they must be subjected to rigorous statistical tests to satisfy the cryptography community that they do indeed exhibit their claimed randomness. Any failure to do so would produce features that expose protocols relying on them to potential attack.

This randomness is useful in many contexts.

Pseudorandom Function Families (PRFs) are functions that, similar to hash functions, map a message to an output, but additionally have a key as an input. PRFs can be built from these random-looking hashes, and, for example, can be used to build message authentication codes, to protect message authenticity and integrity. These hashes can also help when converting interactive proofs to non-interactive. If during the interactive proof a verifier sends uniformly random elements to the prover as challenges, the Fiat-Shamir heuristic can use a hash to replace the interactive random challenge with a hash of the proving process messages so far, as this should produce a sufficient random element for this purpose.

Property #2 for Hashes: Collision Resistance

Additionally, we want it to be the following:

Collision-resistant - hard to find two different messages that produce the same result