Plonky3, the small-fast-cheap revolution

Why plonky3 — small fields, FRI commitments, no trusted setup — is the proof system to watch in 2026. The Mersenne31 / BabyBear / Goldilocks landscape, the FRI folding step, and why your laptop is suddenly a viable prover.

FROM
Dax the Dev
SOURCE
https://blog.skill-issue.dev/blog/plonky3_small_fast_cheap/
FILED
2026-05-02 17:00 UTC
REVISED
2026-05-02 17:00 UTC
TIME
9 min read
SERIES
ZK SNARKs in production
TAGS
#plonky3 #fri #stark #mersenne31 #babybear #goldilocks #zk #phd

For a decade the dominant question in proof-system engineering was which curve. BN254 because Ethereum verifies it cheaply. BLS12-381 because Zcash and Filecoin standardised on it. The conversation orbited 254-bit and 381-bit pairing-friendly prime fields, and the engineering economy followed: every multiplier, every NTT, every MSM was tuned for those sizes.

Then Polygon Zero shipped plonky2 in 2022, then plonky3 in 2024, and the question changed. The new question is which 31-bit prime. Mersenne31. BabyBear. KoalaBear. Fields small enough that two elements fit in a single 64-bit word. Fields where AVX-512 SIMD lanes hold sixteen field elements at once. Fields where a consumer laptop is suddenly a viable prover for circuits that used to require a small datacentre.

This is the small-fast-cheap revolution. It is also the most underrated story in production cryptography in 2026, because most of the conversation about it is happening inside Polygon, Succinct, and a handful of zkVM teams, and it hasn’t yet hit the popular “ZK in 2026” articles. This post is my attempt to write the article I keep wishing existed.

The case for small fields

Every proof-system operation eventually reduces to multiplying two field elements modulo a prime. The cost of one of those multiplies is essentially:

$$\text{cost}(\mathbb{F}_p) = O(\lceil \log_2 p / W \rceil^2)$$

where $W$ is your machine’s word size (typically 64 bits), i.e., the cost is quadratic in the number of machine words required to hold a field element. For BN254’s 254-bit prime that’s 4 limbs, so $\sim 16$ low-level multiplies per high-level field multiplication. For Mersenne31, the prime $p = 2^{31} - 1$, that’s one limb, so one low-level multiply. Sixteen times faster on the floor.

The headline win is fewer cycles per multiply. The hidden win, and the one that actually shifts the deployment landscape, is SIMD parallelism. AVX2 holds eight 32-bit lanes; AVX-512 holds sixteen. With BN254 you can fit two field elements in an AVX-512 register and parallelism is awkward. With Mersenne31 you fit sixteen, and operations like NTTs become embarrassingly parallel.
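The limb and lane arithmetic above can be sketched in a few lines of Python. This is a back-of-envelope model only: the function names are hypothetical, it counts schoolbook word multiplies and ignores modular reduction, and it assumes a BN254 element is padded to 256 bits inside a SIMD register.

```python
import math

def limbs(prime_bits: int, word_bits: int = 64) -> int:
    """Machine words needed to hold one field element."""
    return math.ceil(prime_bits / word_bits)

def schoolbook_word_multiplies(prime_bits: int, word_bits: int = 64) -> int:
    """Word-level multiplies in a schoolbook product: limbs squared."""
    return limbs(prime_bits, word_bits) ** 2

def elements_per_register(lane_bits: int, register_bits: int = 512) -> int:
    """Field elements that fit in one SIMD register at this lane width."""
    return register_bits // lane_bits

# Bit sizes taken from the text; Goldilocks is a 64-bit prime.
FIELD_BITS = {"BN254": 254, "BLS12-381": 381, "Goldilocks": 64, "Mersenne31": 31}

for name, bits in FIELD_BITS.items():
    print(name, limbs(bits), schoolbook_word_multiplies(bits))
```

Under this model BN254 needs 4 limbs and 16 word multiplies, Mersenne31 needs 1 and 1, and a 512-bit register holds sixteen 32-bit lanes but only two 256-bit elements.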

There is one cost. Soundness. A 31-bit prime gives you ~31 bits of security per query in a STARK / FRI-based protocol. To get to the standard 100-bit security, you query the FRI oracle multiple times (~100 queries), or you work in a quadratic / quartic / quintic extension field during the protocol’s soundness-critical steps. Plonky3 does both: prover work happens in the base field for speed, and the random-evaluation challenges (where soundness lives) happen in an extension field.

This is the core trick. Big fields where you need security; small fields everywhere else. It buys an order of magnitude in prover time without compromising the threat model.
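To make "big fields where you need security" concrete, here is a minimal sketch of a quadratic extension of Mersenne31. The class name `Ext2` and the helper `challenge_bits` are hypothetical, not Plonky3's API, and the degree-2 extension is chosen only because it is the shortest to write down. The size estimate is just the bit length of the challenge space, not a soundness analysis.

```python
P = 2**31 - 1  # Mersenne31

class Ext2:
    """Element a + b*i of F_p[i] / (i^2 + 1) with p = Mersenne31.

    i^2 + 1 is irreducible here because p = 3 (mod 4), so -1 has no
    square root in the base field.
    """

    def __init__(self, a: int, b: int = 0):
        self.a, self.b = a % P, b % P

    def __add__(self, other: "Ext2") -> "Ext2":
        return Ext2(self.a + other.a, self.b + other.b)

    def __mul__(self, other: "Ext2") -> "Ext2":
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return Ext2(self.a * other.a - self.b * other.b,
                    self.a * other.b + self.b * other.a)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Ext2) and (self.a, self.b) == (other.a, other.b)

    def inverse(self) -> "Ext2":
        # 1 / (a + bi) = (a - bi) / (a^2 + b^2); the norm is non-zero
        # for every non-zero element, so Fermat inversion applies.
        norm = (self.a * self.a + self.b * self.b) % P
        inv = pow(norm, P - 2, P)
        return Ext2(self.a * inv, -self.b * inv)

def challenge_bits(base_bits: int, degree: int) -> int:
    """Approximate bit size of a challenge drawn from a degree-k extension."""
    return base_bits * degree
```

With this model a quartic extension of a 31-bit field gives a challenge space of about 124 bits, while every multiplication still decomposes into a handful of 31-bit base-field multiplies.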

The four small-field contenders

There are four primes the 2026 ecosystem cares about. They’re all chosen because they admit fast modular reduction (no expensive division per multiply) and they all fit comfortably in a 64-bit word.

| Field | Prime | Why this prime |
| --- | --- | --- |
| Mersenne31 | $p = 2^{31} - 1$ | Mersenne prime; reduction is one shift + one add; smallest sensible prime field |
| BabyBear | $p = 2^{31} - 2^{27} + 1$ | NTT-friendly; has a 2-adicity of 27, so domain sizes up to $2^{27}$ admit fast FFTs |
| KoalaBear | $p = 2^{31} - 2^{24} + 1$ | NTT-friendly; slightly worse 2-adicity (24) but better extension-field arithmetic |
| Goldilocks | $p = 2^{64} - 2^{32} + 1$ | 64-bit prime; used by plonky2 and Risc Zero; fits in one machine word |
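The properties in the table can be checked directly. The sketch below computes the 2-adicity of each prime and shows a division-free reduction for Mersenne31. The helper names are hypothetical, and the reduction is written as two shift-and-add folds plus a final comparison for clarity, which is a little more conservative than the "one shift + one add" a tuned implementation aims for.

```python
M31 = 2**31 - 1
BABYBEAR = 2**31 - 2**27 + 1
KOALABEAR = 2**31 - 2**24 + 1
GOLDILOCKS = 2**64 - 2**32 + 1

def two_adicity(p: int) -> int:
    """Largest k such that 2^k divides p - 1."""
    n, k = p - 1, 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

def reduce_m31(x: int) -> int:
    """Reduce 0 <= x < 2^62 modulo 2^31 - 1 without division.

    Since 2^31 = 1 (mod p), the high bits can be added onto the low bits.
    """
    x = (x & M31) + (x >> 31)  # first fold: result fits in 32 bits
    x = (x & M31) + (x >> 31)  # second fold absorbs the carry bit
    return 0 if x == M31 else x  # p itself represents zero

def mul_m31(a: int, b: int) -> int:
    """Multiply two reduced Mersenne31 elements."""
    return reduce_m31(a * b)
```

Running `two_adicity` on the four primes gives 1, 27, 24 and 32 respectively. The value of 1 for Mersenne31 means it has no large power-of-two multiplicative subgroup for a standard NTT, which is consistent with the circle-STARK construction mentioned for it later in the post.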

Plonky3 supports all of them and lets you pick at compile time. The choice changes the constant in front of the prover time and the security analysis but doesn’t change the protocol shape.

In production:

The convergence is striking: every serious 2026 zkVM is on a small field. The big-field era for zkVMs specifically is closing.

FRI — the polynomial commitment behind everything small

The reason small fields work in proof systems at all is FRI (Fast Reed-Solomon Interactive Oracle Proof of Proximity), introduced in Ben-Sasson, Bentov, Horesh, Riabzev (2018). FRI is a polynomial commitment scheme that works over any field: no pairing-friendliness required, no trusted setup, no SRS. The trade-off is proof size: FRI proofs are tens of kilobytes, where KZG proofs are 600 bytes.

For the prover, FRI is the most expensive thing in the protocol. Most of it is folding: at each round you take a polynomial of degree $d$ and reduce it to a polynomial of degree $d/2$ by combining adjacent coefficient pairs. Repeat $\log_2 d$ times and you arrive at a constant-degree polynomial that the verifier can check directly.

The folding step is one line of arithmetic:

$$f'(x^2) = \frac{f(x) + f(-x)}{2} + r \cdot \frac{f(x) - f(-x)}{2x}$$

where $r$ is a random challenge from the verifier. If $f$ has degree $d$, $f'$ has degree $\lfloor d/2 \rfloor$. The verifier checks consistency at a small number of query points drawn at random.

Below is a tiny Sandpack demo that visualises the folding step on a small polynomial — you pick a degree-7 polynomial, the demo folds it to degree-3, then degree-1, then a constant, and shows the coefficients at each step.

[Interactive demo: FRI folding, visualised on a tiny polynomial (vanilla-ts)]

What’s worth internalising from the demo: each fold is a linear combination over field elements. There’s nothing exotic here. The reason FRI is fast in production is that the inner loop of “combine pairs of coefficients with a random multiplier” is exactly the kind of thing AVX-512 was built for. Sixteen lanes. Per cycle. Per core.
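For readers without the interactive version, the fold can also be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how a production prover is structured: it works on coefficients rather than committed evaluations, omits the Merkle commitments and query phase, and uses Mersenne31 only as a convenient small prime. The function names are hypothetical.

```python
P = 2**31 - 1  # Mersenne31, used here only as a convenient small prime

def evaluate(coeffs: list[int], x: int) -> int:
    """Horner evaluation of sum(coeffs[i] * x^i) mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def fold(coeffs: list[int], r: int) -> list[int]:
    """One fold in coefficient form: f'(y) = f_even(y) + r * f_odd(y).

    Each output coefficient combines one adjacent (even, odd) pair.
    """
    if len(coeffs) % 2:
        coeffs = coeffs + [0]  # pad so every even coefficient has a partner
    return [(coeffs[i] + r * coeffs[i + 1]) % P
            for i in range(0, len(coeffs), 2)]

def fold_at_point(coeffs: list[int], x: int, r: int) -> int:
    """The same fold computed from f(x) and f(-x); x must be non-zero.

    Returns (f(x) + f(-x)) / 2 + r * (f(x) - f(-x)) / (2x), i.e. f'(x^2).
    """
    inv_2 = pow(2, P - 2, P)
    inv_2x = pow(2 * x % P, P - 2, P)
    fx, f_neg_x = evaluate(coeffs, x), evaluate(coeffs, -x % P)
    return ((fx + f_neg_x) * inv_2 + r * (fx - f_neg_x) * inv_2x) % P

# Degree 7 -> 3 -> 1 -> constant, with fixed stand-ins for the challenges.
f = [1, 2, 3, 4, 5, 6, 7, 8]
for r in (5, 7, 3):
    f = fold(f, r)
```

Starting from coefficients `[1, 2, 3, 4, 5, 6, 7, 8]` with challenge 5, the first fold yields `[11, 23, 35, 47]`, and `fold_at_point` agrees with evaluating the folded polynomial at $x^2$. The list comprehension in `fold` is the "combine pairs with a random multiplier" inner loop that vectorises so well.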

Why “consumer hardware” matters in 2026

Here are wall-clock prover times for a 1-million-cycle zkVM trace, measured across the major 2026 zkVM stacks on a consumer machine — a 2024 MacBook Pro with M3 Max, 14 cores, 48 GB RAM. (Numbers from public benchmarks, normalised to the same reference input.)

| Stack | Field | Prover time | Notes |
| --- | --- | --- | --- |
| RISC Zero (zkVM) | Goldilocks | ~3 minutes | STARK + AIR |
| SP1 (zkVM) | BabyBear | ~95 seconds | plonky3-based |
| Stwo (zkVM) | Mersenne31 | ~80 seconds | circle-STARK on M31 |
| zkSync (Boojum) | Goldilocks | ~5 minutes | older arithmetisation |

Two years ago, none of these were under five minutes. Today the leaderboard is a band between 80 seconds and 5 minutes, and the difference is dominated by which small field. The big-field equivalent (a pure BN254 PLONK prover at the same trace) would take 30+ minutes on the same machine.

This is what “consumer hardware is now a viable prover” means in 2026. The substantial barrier — the one that kept zkVMs off consumer hardware until 2024 — was the cost of MSMs and NTTs over big fields. Small fields removed that barrier.

The four-prime tradeoff

Four prime choices for proof-system arithmetic in 2026. BN254 is the EVM-verifier endpoint; small fields are where the prover lives.
| Option | Cost | Latency | Blast radius | Notes |
| --- | --- | --- | --- | --- |
| BN254 (~254 bits) | Pairing-friendly; 4 limbs per element; small SIMD parallelism | Slow per-op; required for EVM verification | Standard; battle-tested by Ethereum and every Groth16 circuit | The default in 2020-2024; still required for EVM verifier outputs |
| BLS12-381 (~381 bits) | Pairing-friendly; 6 limbs per element | Slower than BN254 in-circuit; better aggregate signatures | Standard; Filecoin / Ethereum consensus signatures | Use when you need 128-bit security pairings, not for prover work |
| Mersenne31 ($2^{31}-1$) | Tiny; trivial reduction; 16x SIMD parallelism on AVX-512 | ~30x faster per multiply than BN254 | Newer; requires extension-field handling for soundness | What StarkWare's circle-STARK uses; future-proof choice |
| Goldilocks ($2^{64}-2^{32}+1$) | Single u64 limb; clean reduction via algebraic identity | Slower than M31 but more 2-adicity for big NTTs | Used by plonky2, Risc Zero, zkSync Boojum; mature | The pragmatic 2024-2026 default for STARK-based zkVMs |

Why this should change how you think about ZK costs

The dominant ZK cost model from 2018 to 2024 was: more constraints = more dollars. Field arithmetic was the bottleneck, the constants were huge, and a million-constraint circuit was a real research expense.

The 2026 cost model is different. Constraint count still matters, but the constants have collapsed. A million-constraint Plonky3 trace proves on a $1500 laptop in under two minutes. That’s three orders of magnitude cheaper than the equivalent BN254 PLONK prover four years ago. Prover-side cost is no longer the binding constraint for most applications.

The new binding constraints are:

  1. Memory bandwidth. Big NTTs are memory-bound, not compute-bound. The win from small fields is partly that more elements fit in cache.
  2. Verifier cost on-chain. Plonky3 proofs are 50–200 KB; verifying them on-chain requires either an EVM-friendly final wrap (which is what the SP1 / RISC0 / Stwo verifiers do) or a Solana-style permissive compute budget.
  3. Ecosystem maturity. snarkjs / Halo2-axiom / circomlib have a decade of accreted gadgets; Plonky3 is in year three of its current incarnation. The libraries are catching up but they’re not at parity yet.

Where this leaves zera-sdk

Inside zera-sdk the substrate is BN254 + Groth16 because Solana’s verifier is BN254-and-only-BN254 today. There’s no equivalent of sol_alt_bn128_pairing for any of the small-field protocols. That means Plonky3 is not a choice we get to make for the deposit / transfer / withdraw circuits — the on-chain side fixes the curve.

What we do track is the Solana CPI proposal for STARK verification (no number yet; was last discussed in 2025) and the related “compute-budget-friendly Halo2 verifier” path. The day Solana ships either of those, the prover-side win from migrating off BN254 is large enough to justify a circuit rewrite. Until then, BN254 it is.

For off-chain proving — CI checks, offline auditing, batch verification — Plonky3 is already the right tool, and we’re using it inside the test harness for cross-validating circuit semantics.

What I’d build differently in 2027

Three follow-ups, in order of how much I expect them to matter:

  1. A small-field shielded pool. Every privacy pool today is BN254 + Groth16 + per-circuit ceremony. The day Solana (or any high-throughput L1) ships a STARK verifier, the design space opens: no ceremony, faster proving, smaller wallets. Someone will publish this design before the verifier ships and they’ll be right to.
  2. A unified extension-field abstraction. Plonky3 has different extension-field arithmetic per base field. A single Ext<F, k> with consistent ergonomics would make cross-field experimentation trivial. The team is aware; not yet shipped.
  3. A small-field Poseidon variant. Poseidon-128 is parameterised for BN254. The recommended hash for BabyBear is Monolith or Poseidon2 over BabyBear, and the constraint counts are different enough that constraint-counting intuition from BN254 doesn’t transfer. A “Poseidon constraint cost calculator” that takes a field as input and emits constraint counts for common circuits would close a real reasoning gap.
