see part 1
This work is already done.
Succinct has already implemented sha256 in Plonky3 (as an SP1 precompile), so why do this as a learning exercise? Isn’t there something more useful to do as a learning exercise?
Succinct Labs’ sha256 precompile is heavily coupled to SP1 itself; SP1’s architecture has optimizations baked deep into its proof system. SHA-256 is a great example of a computation that leverages cross-table lookups and lookups into precomputed tables, and it is ubiquitous in applications that benefit from ZK tech. Learning how to implement it optimally in a circuit will be useful when we later want to arithmetize it for new proof systems like Nethermind’s LatticeFold, or PSE’s Sonobe: research implementations that seem to favor CCS, a constraint system that doesn’t have any good tooling yet, although it can capture R1CS, Plonkish, or AIR without overhead. I wonder if that implies it can express some computations more efficiently.
This is a mostly pedagogical endeavour.
Plonky3’s keccak-air is simple
In one of the examples, the author (jonathanpwang et al) passes a `Vec<[u64; 25]>` into the circuit as its inputs. It’s not yet clear to me how to interpret those u64s (stay tuned for the next part, when/if I figure it out).
Then, in the trace generation, we can see that xor operations are performed on the values, which leads me to suspect that the circuit is written assuming these columns are bits.
This is also apparent from the code and comments in the constraints.
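The constraints make sense once you know the algebraic trick a bits-as-columns design relies on. As a sketch (plain Rust, not Plonky3 code; the function names are mine), here is the identity: if each column is constrained to be boolean via `x * (x - 1) = 0`, then XOR is the degree-2 polynomial `x + y - 2xy`:

```rust
// For x, y constrained to {0, 1}, XOR is expressible as a degree-2 polynomial.
// The booleanity constraint x * (x - 1) = 0 is what makes this sound.
fn xor_via_arithmetic(x: i64, y: i64) -> i64 {
    x + y - 2 * x * y
}

fn is_boolean(x: i64) -> bool {
    x * (x - 1) == 0
}

fn main() {
    for x in 0..2 {
        for y in 0..2 {
            assert!(is_boolean(x) && is_boolean(y));
            assert_eq!(xor_via_arithmetic(x, y), x ^ y);
        }
    }
    println!("bit-level XOR identity holds");
}
```

Similar low-degree identities exist for AND (`x * y`) and NOT (`1 - x`), which is why treating columns as bits makes keccak’s bitwise operations straightforward to constrain.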
SP1’s is not as simple
Rather than treating AIR columns as bits, SP1’s AIR columns have a notion of a `Word`: column values represent more than one bit of information. SP1 uses four columns (31-bit BabyBear field elements, each holding one byte) to represent a 32-bit word, like on a 32-bit RISC-V CPU.
/// The size of a word in bytes.
pub const WORD_SIZE: usize = 4;
/// An array of four bytes to represent a 32-bit value.
///
/// We use the generic type `T` to represent the different representations of a byte, ranging from
/// a `u8` to a `AB::Var` or `AB::Expr`.
#[derive(
AlignedBorrow, Clone, Copy, Debug, Default, PartialEq, Eq, Hash, Serialize, Deserialize,
)]
#[repr(C)]
pub struct Word<T>(pub [T; WORD_SIZE]);
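To make the encoding concrete, here is a minimal sketch (plain Rust, not SP1 code; the helper names are mine) of how four byte-sized limbs represent a 32-bit value as base-256 digits. I’ve assumed little-endian byte order, matching RISC-V, though the order is illustrative here:

```rust
/// The size of a word in bytes (mirrors SP1's WORD_SIZE).
const WORD_SIZE: usize = 4;

// Decompose a 32-bit value into four byte-sized limbs. In SP1 each limb
// lives in a BabyBear field element, which comfortably holds a byte; we
// use u32 here purely to illustrate the encoding.
fn decompose(value: u32) -> [u32; WORD_SIZE] {
    let mut limbs = [0u32; WORD_SIZE];
    for i in 0..WORD_SIZE {
        limbs[i] = (value >> (8 * i)) & 0xFF; // assumed little-endian bytes
    }
    limbs
}

// Recompose: the word's value is the base-256 combination of its limbs.
fn recompose(limbs: [u32; WORD_SIZE]) -> u32 {
    limbs.iter().enumerate().map(|(i, &b)| b << (8 * i)).sum()
}

fn main() {
    let v = 0xDEADBEEFu32;
    let limbs = decompose(v);
    assert_eq!(limbs, [0xEF, 0xBE, 0xAD, 0xDE]);
    assert_eq!(recompose(limbs), v);
    println!("round-trip ok");
}
```

Note that each limb must also be range-checked to fit in a byte; a BabyBear element could otherwise hold values far larger than 255.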
In the sha256 precompile, we can see many of the columns are defined as `Word`s:
/// A set of columns needed to compute the SHA-256 compression function.
///
/// Each sha compress syscall is processed over 80 columns, split into 10 octets. The first octet is
/// for initialization, the next 8 octets are for compression, and the last octet is for finalize.
/// During init, the columns are initialized with the input values, one word at a time. During each
/// compression cycle, one iteration of sha compress is computed. During finalize, the columns are
/// combined and written back to memory.
#[derive(AlignedBorrow, Default, Debug, Clone, Copy)]
#[repr(C)]
pub struct ShaCompressCols<T> {
/// struct fields omitted
pub a: Word<T>,
pub b: Word<T>,
pub c: Word<T>,
pub d: Word<T>,
pub e: Word<T>,
pub f: Word<T>,
pub g: Word<T>,
pub h: Word<T>,
/// more struct fields omitted
}
So… how do you write algebraic constraints for sha256’s many bitwise operations, over `Word` values?
Represent bitwise operations on `Word`s… with algebra?!?!
At first I thought I was going crazy, and I looked into whether it was possible to compute bitwise XOR for 32-bit values using arithmetic operations. Google led me to this StackExchange answer, which momentarily spiked my blood pressure until I realized it was assuming boolean values. Dammit. If there is something I’m missing here, let me know.
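The boolean-values answer does generalize, but only by paying for a full bit decomposition: split both operands into 32 boolean-constrained bits, apply `x + y - 2xy` per bit, and recompose. A sketch (plain Rust, function name mine) of what that route costs:

```rust
// The only purely arithmetic route I know of for 32-bit XOR: decompose both
// operands into bits, XOR each bit pair via x + y - 2xy, and recompose.
// In an AIR this means ~32 boolean-constrained columns per operand, which
// is the overhead lookup-based designs like SP1's avoid.
fn xor_u32_via_bits(x: u32, y: u32) -> u32 {
    let mut out = 0u32;
    for i in 0..32 {
        let (xi, yi) = ((x >> i) & 1, (y >> i) & 1);
        let zi = xi + yi - 2 * xi * yi; // degree-2 XOR on a single bit
        out |= zi << i;
    }
    out
}

fn main() {
    assert_eq!(
        xor_u32_via_bits(0xDEADBEEF, 0x01234567),
        0xDEADBEEF ^ 0x01234567
    );
    println!("decompose-xor-recompose agrees with native XOR");
}
```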
Use lookups like a real man
It turns out, this is one of the places where SP1 uses some slightly fancier tech.
The `Word` values are constrained with range checks, and bitwise operations are defined in another AIR. This is achieved via a cross-table lookup.
Lookups are a blind spot in my personal ZK knowledge. My original understanding of lookups was tables of precomputed values for useful functions, like binary operations on `u8`s, but AIR whiz kids use lookups to glue together multiple STARKs containing variables.
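The precomputed-table flavor is easy to picture. As a toy sketch (plain Rust, names mine; this shows only the table shape, not the permutation or log-derivative argument that actually constrains a lookup): an XOR table over byte pairs has just 2^16 entries, so XOR-ing two 32-bit words becomes four table lookups instead of 64 bit columns.

```rust
// Build the full XOR table over byte pairs: 2^16 entries, keyed by (a, b).
fn build_xor_table() -> Vec<u8> {
    let mut table = vec![0u8; 1 << 16];
    for a in 0..=255u16 {
        for b in 0..=255u16 {
            table[((a << 8) | b) as usize] = (a ^ b) as u8;
        }
    }
    table
}

// XOR two 32-bit words one byte at a time, via table lookups.
fn xor_word_via_lookups(table: &[u8], x: u32, y: u32) -> u32 {
    (0..4).fold(0u32, |acc, i| {
        let (xb, yb) = ((x >> (8 * i)) & 0xFF, (y >> (8 * i)) & 0xFF);
        let key = ((xb << 8) | yb) as usize;
        acc | (u32::from(table[key]) << (8 * i))
    })
}

fn main() {
    let table = build_xor_table();
    assert_eq!(
        xor_word_via_lookups(&table, 0xDEADBEEF, 0xCAFEBABE),
        0xDEADBEEF ^ 0xCAFEBABE
    );
    println!("byte-wise lookup XOR agrees with native XOR");
}
```

The cross-table flavor SP1 uses generalizes this: instead of a fixed precomputed table, the "table" is another AIR's trace, and the lookup argument proves the two traces agree on the shared rows.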
There is no cross-table lookup example in Plonky3
I was unable to find a simple pedagogical example of a cross-table lookup in pure plonky3 (outside SP1).
This means my learning project can go in multiple directions
- I implement cross-table lookups in Plonky3
- I implement a simpler sha256 AIR, using field elements as bits.
As of August 2nd, I am leaning towards option #1: it might be a more interesting and useful learning project, as well as more state-of-the-art, and it gives me an excuse to finally dive into lookups, which have been one of my knowledge gaps.