Plonky3, the small-fast-cheap revolution

Why plonky3 — small fields, FRI commitments, no trusted setup — is the proof system to watch in 2026. The Mersenne31 / BabyBear / Goldilocks landscape, the FRI folding step, and why your laptop is suddenly a viable prover.

For a decade the dominant question in proof-system engineering was which curve. BN254 because Ethereum verifies it cheaply. BLS12-381 because Zcash and Filecoin standardised on it. The conversation orbited 254-bit and 381-bit pairing-friendly prime fields, and the engineering economy followed: every multiplier, every NTT, every MSM was tuned for those sizes.

Then Polygon Zero shipped plonky2 in 2022, then plonky3 in 2024, and the question changed. The new question is which 31-bit prime. Mersenne31. BabyBear. KoalaBear. Fields small enough that two elements fit in a single 64-bit word. Fields where a single AVX-512 register holds sixteen field elements at once. Fields where a consumer laptop is suddenly a viable prover for circuits that used to require a small datacentre.

This is the small-fast-cheap revolution. It is also the most underrated story in production cryptography in 2026, because most of the conversation about it is happening inside Polygon, Succinct, and a handful of zkVM teams, and it hasn’t yet hit the popular “ZK in 2026” articles. This post is my attempt to write the article I keep wishing existed.

The case for small fields

Every proof-system operation eventually reduces to multiplying two field elements modulo a prime. With schoolbook multiplication, the cost of one of those multiplies is essentially:

$$\text{cost}(\mathbb{F}_p) = O\left(\lceil \log_2 p / W \rceil^2\right)$$

where $W$ is your machine’s word size (typically 64 bits) — i.e., the cost is quadratic in the number of machine words required to hold a field element. For BN254’s 254-bit prime that’s 4 limbs, so $\sim 16$ low-level multiplies per high-level field multiplication. For Mersenne31 — the prime $p = 2^{31} - 1$ — that’s one limb, so one low-level multiply. Sixteen times faster on the floor.

The headline win is fewer cycles per multiply. The hidden win, and the one that actually shifts the deployment landscape, is SIMD parallelism. AVX2 holds eight 32-bit lanes; AVX-512 holds sixteen. With BN254 you can fit only two field elements in an AVX-512 register, and parallelism is awkward. With Mersenne31 you fit sixteen, and operations like NTTs become embarrassingly parallel.
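The limb and lane arithmetic above can be checked in a few lines. This is a minimal sketch, not a benchmark: the function names are hypothetical, the cost model counts only schoolbook word multiplies (it ignores modular reduction), and the lane count assumes each element is padded to a power-of-two lane of at least 32 bits.

```typescript
// Number of W-bit machine words (limbs) needed to hold one field element.
function limbs(fieldBits: number, wordBits: number = 64): number {
  return Math.ceil(fieldBits / wordBits);
}

// Word-level multiplies in one schoolbook field multiplication: limbs squared.
// Assumption: reduction cost is ignored, so this is a floor, not a measurement.
function schoolbookMultiplies(fieldBits: number, wordBits: number = 64): number {
  const n = limbs(fieldBits, wordBits);
  return n * n;
}

// Field elements per SIMD register, assuming each element occupies a
// power-of-two lane of at least 32 bits.
function lanesPerRegister(fieldBits: number, registerBits: number): number {
  const laneBits = Math.max(32, 2 ** Math.ceil(Math.log2(fieldBits)));
  return Math.floor(registerBits / laneBits);
}

for (const [name, bits] of [["BN254", 254], ["Mersenne31", 31]] as const) {
  console.log(
    `${name}: ${limbs(bits)} limb(s), ` +
    `${schoolbookMultiplies(bits)} word multiplies, ` +
    `${lanesPerRegister(bits, 512)} lane(s) per 512-bit register`
  );
}
```

For 254 bits this gives 4 limbs, 16 word multiplies and 2 lanes per 512-bit register; for 31 bits it gives 1, 1 and 16, matching the figures in the text.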

There is one cost: soundness. A random challenge drawn from a 31-bit field can contribute at most ~31 bits of security in a STARK / FRI-based protocol, because a cheating prover guesses it with probability about $2^{-31}$. To get to the standard 100-bit security you need two things: the FRI oracle is queried many times (~100 queries, since each query adds only a few bits), and the soundness-critical steps of the protocol run in a quadratic / quartic / quintic extension field. Plonky3 does both: prover work happens in the base field for speed, and the random-evaluation challenges (where soundness lives) happen in an extension field.

This is the core trick. Big fields where you need security; small fields everywhere else. It buys an order of magnitude in prover time without compromising the threat model.
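To make the two soundness knobs concrete, here is a rough sketch under stated assumptions. `challengeBits` measures the size of the challenge space in a degree-$k$ extension. `queriesNeeded` uses a common rule of thumb, an assumption here and not something the source states: under conjectured FRI soundness each query contributes about $\log_2(\text{blowup})$ bits, with no proof-of-work grinding. `cmul` multiplies in the quadratic extension $\mathbb{F}_p[i]/(i^2+1)$ of Mersenne31, which is a field because $p \equiv 3 \pmod 4$. All names are hypothetical and none of this is Plonky3's API.

```typescript
const M31 = (1n << 31n) - 1n;

// Bit length of p^k: roughly how many bits of security one random
// challenge from a degree-k extension of F_p can contribute.
function challengeBits(p: bigint, k: number): number {
  let size = 1n;
  for (let i = 0; i < k; i++) size *= p;
  return size.toString(2).length;
}

// Rule-of-thumb query count (assumption: conjectured soundness,
// log2(blowup) bits per query, no grinding).
function queriesNeeded(targetBits: number, log2Blowup: number): number {
  return Math.ceil(targetBits / log2Blowup);
}

// Elements of F_p[i]/(i^2 + 1) over Mersenne31, stored as [real, imaginary].
type Complex = [bigint, bigint];

function cmul(x: Complex, y: Complex): Complex {
  const modP = (v: bigint): bigint => ((v % M31) + M31) % M31;
  const [a, b] = x;
  const [c, d] = y;
  // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
  return [modP(a * c - b * d), modP(a * d + b * c)];
}

console.log(challengeBits(M31, 1)); // base field challenge space
console.log(challengeBits(M31, 4)); // degree-4 extension challenge space
console.log(queriesNeeded(100, 1)); // blowup factor 2
```

A base-field challenge tops out at 31 bits, a degree-4 extension clears 100 bits, and a blowup factor of 2 lands on the ~100 queries mentioned above.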

The four small-field contenders

There are four primes the 2026 ecosystem cares about. They’re all chosen because they admit fast modular reduction (no expensive division per multiply) and they all fit comfortably in a 64-bit word.

Field	Prime	Why this prime
Mersenne31	$p = 2^{31} - 1$	Mersenne prime — reduction is one shift + one add; smallest sensible prime field
BabyBear	$p = 2^{31} - 2^{27} + 1$	NTT-friendly — has a 2-adicity of 27, so domain sizes up to $2^{27}$ admit fast FFTs
KoalaBear	$p = 2^{31} - 2^{24} + 1$	NTT-friendly — slightly worse 2-adicity (24) but better extension-field arithmetic
Goldilocks	$p = 2^{64} - 2^{32} + 1$	64-bit prime; used by plonky2 and zkSync's Boojum; fits in one machine word
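Two of the properties in the table are easy to check directly. The sketch below computes the 2-adicity of each prime (the largest $s$ with $2^s$ dividing $p - 1$) and implements the shift-and-add reduction for Mersenne31, which relies on $2^{31} \equiv 1 \pmod p$. The helper names are hypothetical, and the code uses `bigint` for clarity; a production implementation would work on fixed-width integers.

```typescript
const MERSENNE31 = (1n << 31n) - 1n;
const BABYBEAR = (1n << 31n) - (1n << 27n) + 1n;
const KOALABEAR = (1n << 31n) - (1n << 24n) + 1n;
const GOLDILOCKS = (1n << 64n) - (1n << 32n) + 1n;

// Largest s such that 2^s divides p - 1. This bounds the size of the
// power-of-two multiplicative subgroup available for NTTs.
function twoAdicity(p: bigint): number {
  let n = p - 1n;
  let s = 0;
  while (n % 2n === 0n) {
    n /= 2n;
    s++;
  }
  return s;
}

// Reduce x (assumed 0 <= x < 2^62, i.e. a product of two reduced elements)
// modulo 2^31 - 1 without division: fold the high bits onto the low bits.
function reduceM31(x: bigint): bigint {
  let t = (x & MERSENNE31) + (x >> 31n); // t < 2^32
  t = (t & MERSENNE31) + (t >> 31n);     // t <= 2^31 - 1
  return t === MERSENNE31 ? 0n : t;
}

function mulM31(a: bigint, b: bigint): bigint {
  return reduceM31(a * b);
}

console.log(twoAdicity(BABYBEAR), twoAdicity(KOALABEAR));
console.log(twoAdicity(GOLDILOCKS), twoAdicity(MERSENNE31));
console.log(mulM31(MERSENNE31 - 1n, MERSENNE31 - 1n));
```

Mersenne31's multiplicative group has 2-adicity 1, so it cannot host large power-of-two NTT domains directly; that is the gap the circle-STARK construction mentioned later works around.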

Plonky3 supports all of them and lets you pick at compile time. The choice changes the constant in front of the prover time and the security analysis but doesn’t change the protocol shape.

In production:

plonky2 (the older Polygon Zero proof system, still widely deployed) uses Goldilocks.
plonky3 primarily ships with BabyBear or KoalaBear as the recommended defaults.
RISC Zero’s zkVM uses BabyBear.
Succinct’s SP1 uses BabyBear.
Stwo / StarkWare’s next-gen uses Mersenne31 (the M31 / circle-stark program).

The convergence is striking: every serious 2026 zkVM is on a small field. The big-field era for zkVMs specifically is closing.

flowchart LR
Z[2014: Pinocchio] --> G[2016: Groth16 - BN254]
G --> P[2019: PLONK + KZG]
P --> H[2020: Halo2 - Pasta IPA]
H --> H2[2024: Halo2 - KZG/BN254]
G --> S[2018: STARK - Goldilocks]
S --> P2[2022: plonky2 - Goldilocks]
P2 --> P3[2024: plonky3 - BabyBear]
P3 --> ZK1[zkVMs: SP1, RISC0, Stwo]
H2 --> EVM[EVM rollups]
classDef big fill:#3a0a0a,stroke:#f87171,color:#fff
classDef small fill:#0a4014,stroke:#4ade80,color:#fff
class G,P,H,H2,EVM big
class S,P2,P3,ZK1 small

FRI — the polynomial commitment behind everything small

The reason small fields work in proof systems at all is FRI (Fast Reed-Solomon Interactive Oracle Proof of Proximity), introduced by Ben-Sasson, Bentov, Horesh and Riabzev (2018). FRI is the basis of a polynomial commitment scheme that works over any field, with no pairing-friendliness required, no trusted setup and no SRS. The trade-off is proof size: FRI-based proofs are tens of kilobytes, where KZG-based proofs are a few hundred bytes.

For the prover, FRI is the most expensive thing in the protocol. Most of it is folding: at each round you take a polynomial of degree $d$ and reduce it to a polynomial of degree about $d/2$ by combining adjacent coefficient pairs. Repeat about $\log_2 d$ times and you arrive at a constant polynomial that the verifier can check directly.

The folding step is one line of arithmetic:

$$f'(x^2) = \frac{f(x) + f(-x)}{2} + r \cdot \frac{f(x) - f(-x)}{2x}$$

where $r$ is a random challenge from the verifier. If $f$ has degree $d$, $f'$ has degree at most $\lfloor d/2 \rfloor$. The verifier checks consistency at a small number of query points drawn at random.
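The folding formula is stated on evaluations, while folding is usually described on coefficients, so it is worth confirming that the two agree. The sketch below works over the toy prime 101 (an illustrative choice, not a production field) and checks that folding the values $f(x)$ and $f(-x)$ gives the same result as folding the coefficients and evaluating at $x^2$. Function names are hypothetical.

```typescript
const P = 101n; // toy prime so the numbers stay readable

const mod = (a: bigint): bigint => ((a % P) + P) % P;

// Modular inverse via Fermat's little theorem: a^(P-2) = a^-1 mod P.
function inv(a: bigint): bigint {
  let result = 1n;
  let base = mod(a);
  let e = P - 2n;
  while (e > 0n) {
    if ((e & 1n) === 1n) result = mod(result * base);
    base = mod(base * base);
    e >>= 1n;
  }
  return result;
}

// Horner evaluation; coeffs[i] is the coefficient of x^i.
function evalPoly(coeffs: bigint[], x: bigint): bigint {
  let acc = 0n;
  for (let i = coeffs.length - 1; i >= 0; i--) acc = mod(acc * x + coeffs[i]);
  return acc;
}

// Coefficient-form fold: f'(y) = f_even(y) + r * f_odd(y).
function foldCoeffs(coeffs: bigint[], r: bigint): bigint[] {
  const out: bigint[] = [];
  for (let i = 0; i < coeffs.length; i += 2) {
    const odd = i + 1 < coeffs.length ? coeffs[i + 1] : 0n;
    out.push(mod(coeffs[i] + r * odd));
  }
  return out;
}

// Evaluation-form fold: the right-hand side of the formula above, at one x != 0.
function foldAtPoint(coeffs: bigint[], r: bigint, x: bigint): bigint {
  const fx = evalPoly(coeffs, x);
  const fNegX = evalPoly(coeffs, mod(-x));
  const evenPart = mod((fx + fNegX) * inv(2n));
  const oddPart = mod((fx - fNegX) * inv(2n * x));
  return mod(evenPart + r * oddPart);
}

const f = [3n, 1n, 4n, 1n, 5n, 9n, 2n, 6n]; // degree 7
const r = 17n;
const x = 5n;
const lhs = evalPoly(foldCoeffs(f, r), mod(x * x));
const rhs = foldAtPoint(f, r, x);
console.log(lhs === rhs);
```

The check works because $f(x) = f_{\text{even}}(x^2) + x \cdot f_{\text{odd}}(x^2)$, so the sum and difference of $f(x)$ and $f(-x)$ isolate the even and odd halves.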

Below is a tiny Sandpack demo that visualises the folding step on a small polynomial — you pick a degree-7 polynomial, the demo folds it to degree-3, then degree-1, then a constant, and shows the coefficients at each step.

FRI folding — visualised on a tiny polynomial [ vanilla-ts ]

/index.ts

// FRI folding step, visualised. We work over a small toy prime
// (101) so the numbers stay readable.
//
// Real plonky3 folds polynomials of degree 2^20+ over BabyBear or
// Mersenne31 and spot-checks on the order of a hundred query points.
// The shape of the fold below is identical; only the numbers change.

const P = 101n;

// Modular utilities.
function mod(a: bigint, m: bigint): bigint { return ((a % m) + m) % m; }
function add(a: bigint, b: bigint): bigint { return mod(a + b, P); }
function sub(a: bigint, b: bigint): bigint { return mod(a - b, P); }
function mul(a: bigint, b: bigint): bigint { return mod(a * b, P); }
function inv(a: bigint): bigint {
  // Fermat's little theorem since P is prime: a^(P-2) = a^-1
  let r = 1n, e = P - 2n, b = mod(a, P);
  while (e > 0n) { if (e & 1n) r = mul(r, b); b = mul(b, b); e >>= 1n; }
  return r;
}

// Evaluate polynomial f at x (Horner's rule).
// Not used by the demo below; kept, like sub and inv, for experimenting.
function evalPoly(coeffs: bigint[], x: bigint): bigint {
  let acc = 0n;
  for (let i = coeffs.length - 1; i >= 0; i--) acc = add(mul(acc, x), coeffs[i]);
  return acc;
}

// Split coefficients into even-indexed and odd-indexed parts.
// f(x) = f_even(x^2) + x * f_odd(x^2)
function split(coeffs: bigint[]): [bigint[], bigint[]] {
  const even: bigint[] = [];
  const odd: bigint[] = [];
  for (let i = 0; i < coeffs.length; i++) {
    if (i % 2 === 0) even.push(coeffs[i]);
    else odd.push(coeffs[i]);
  }
  return [even, odd];
}

// FRI folding: given f(x) and challenge r, return
//   f'(y) = f_even(y) + r * f_odd(y)
// where y = x^2. The new polynomial has half the degree.
function fold(coeffs: bigint[], r: bigint): bigint[] {
  const [even, odd] = split(coeffs);
  const out: bigint[] = [];
  const n = Math.max(even.length, odd.length);
  for (let i = 0; i < n; i++) {
    const e = i < even.length ? even[i] : 0n;
    const o = i < odd.length ? odd[i] : 0n;
    out.push(add(e, mul(r, o)));
  }
  return out;
}

const out = document.getElementById("out")!;
const reroll = document.getElementById("reroll") as HTMLButtonElement;

function fmt(coeffs: bigint[]): string {
  return "[ " + coeffs.map((c) => c.toString().padStart(2, " ")).join(", ") + " ]";
}

function run() {
  // A degree-7 polynomial over F_101.
  const f = [3n, 1n, 4n, 1n, 5n, 9n, 2n, 6n];
  const lines: string[] = [];
  lines.push("FRI folding over F_101");
  lines.push("======================");
  lines.push("");
  lines.push(`degree-7 poly: ${fmt(f)}`);
  // Fold until one coefficient is left, with a fresh random challenge per round.
  let curr = f;
  let round = 0;
  while (curr.length > 1) {
    // Challenge in [1, 100], i.e. a nonzero element of F_101.
    const r = BigInt(Math.floor(Math.random() * 100) + 1);
    const folded = fold(curr, r);
    lines.push("");
    lines.push(`round ${round + 1}: r = ${r}`);
    lines.push(`  before: ${fmt(curr)}  (degree ${curr.length - 1})`);
    lines.push(`  after:  ${fmt(folded)}  (degree ${folded.length - 1})`);
    curr = folded;
    round++;
  }
  lines.push("");
  lines.push(`final constant:  ${curr[0]}`);
  lines.push("");
  lines.push("verifier checks consistency between rounds at randomly chosen");
  lines.push("evaluation points; those are the FRI query points.");
  out.textContent = lines.join("\n");
}

reroll.addEventListener("click", run);
run();

/index.html

<!DOCTYPE html>
<html>
<body style="margin:0;padding:1rem;background:#000;color:#e8e8e8;font-family:'Geist Mono',ui-monospace,monospace;">
  <button id="reroll" style="padding:0.5rem 0.85rem;background:#0a0a0a;color:#4ade80;border:1px solid #2a2a2a;border-radius:4px;font-family:inherit;cursor:pointer;margin-bottom:0.75rem;">re-roll random challenges</button>
  <pre id="out" style="background:#0a0a0a;color:#4ade80;padding:0.75rem;border:1px solid #2a2a2a;border-radius:4px;margin:0;white-space:pre;overflow-x:auto;">starting...</pre>
  <script type="module" src="/index.ts"></script>
</body>
</html>

What’s worth internalising from the demo: each fold is a linear combination over field elements. There’s nothing exotic here. The reason FRI is fast in production is that the inner loop of “combine pairs of coefficients with a random multiplier” is exactly the kind of thing AVX-512 was built for. Sixteen lanes. Per cycle. Per core.

Why “consumer hardware” matters in 2026

Here are wall-clock prover times for a 1-million-cycle zkVM trace, measured across the major 2026 zkVM stacks on a consumer machine — a 2024 MacBook Pro with M3 Max, 14 cores, 48 GB RAM. (Numbers from public benchmarks, normalised to the same reference input.)

Stack	Field	Prover time	Notes
RISC Zero (zkVM)	BabyBear	~3 minutes	STARK + AIR
SP1 (zkVM)	BabyBear	~95 seconds	plonky3-based
Stwo (zkVM)	Mersenne31	~80 seconds	circle-STARK on M31
zkSync (Boojum)	Goldilocks	~5 minutes	older arithmetisation

Two years ago, none of these were under five minutes. Today the top three sit in a tight band between 80 seconds and 3 minutes, with Boojum at about 5 minutes, and the spread comes down to the choice of small field and arithmetisation. The big-field equivalent (a pure BN254 PLONK prover at the same trace) would take 30+ minutes on the same machine.

This is what “consumer hardware is now a viable prover” means in 2026. The substantial barrier — the one that kept zkVMs off consumer hardware until 2024 — was the cost of MSMs and NTTs over big fields. Small fields removed that barrier.

The four-prime tradeoff

Four prime choices for proof-system arithmetic in 2026. BN254 is the EVM-verifier endpoint; small fields are where the prover lives.

Option	Arithmetic cost	Speed	Maturity	Notes
BN254 (~254 bits)	Pairing-friendly; 4 limbs per element; small SIMD parallelism	Slow per-op; required for EVM verification	Standard; battle-tested by Ethereum and every Groth16 circuit	The default in 2020-2024; still required for EVM verifier outputs
BLS12-381 (~381 bits)	Pairing-friendly; 6 limbs per element	Slower than BN254 in-circuit; better aggregate signatures	Standard; Filecoin / Ethereum consensus signatures	Use when you need 128-bit security pairings, not for prover work
Mersenne31 ($2^{31}-1$)	Tiny; trivial reduction; 16x SIMD parallelism on AVX-512	~30x faster per multiply than BN254	Newer; requires extension-field handling for soundness	What StarkWare's circle-STARK uses; future-proof choice
Goldilocks ($2^{64}-2^{32}+1$)	Single u64 limb; clean reduction via algebraic identity	Slower than M31 but more 2-adicity for big NTTs	Used by plonky2 and zkSync Boojum; mature	The pragmatic 2024-2026 default for STARK-based zkVMs

Why this should change how you think about ZK costs

The dominant ZK cost model from 2018 to 2024 was: more constraints = more dollars. Field arithmetic was the bottleneck, the constants were huge, and a million-constraint circuit was a real research expense.

The 2026 cost model is different. Constraint count still matters, but the constants have collapsed. A million-constraint Plonky3 trace proves on a consumer laptop in under two minutes. That is more than an order of magnitude faster than the 30+ minute BN254 PLONK figure on the same machine, and the gap against the provers of four years ago is wider still. Prover-side cost is no longer the binding constraint for most applications.

The new binding constraints are:

Memory bandwidth. Big NTTs are memory-bound, not compute-bound. The win from small fields is partly that more elements fit in cache.
Verifier cost on-chain. Plonky3 proofs are 50–200 KB; verifying them on Ethereum requires an EVM-friendly final wrap (which is what the SP1 / RISC0 / Stwo verifiers do), and elsewhere it takes either a similar wrap or a Solana-style permissive compute budget.
Ecosystem maturity. snarkjs / Halo2-axiom / circomlib have a decade of accreted gadgets; Plonky3 is in year three of its current incarnation. The libraries are catching up but they’re not at parity yet.

Where this leaves zera-sdk

Inside zera-sdk the substrate is BN254 + Groth16 because Solana’s verifier is BN254-and-only-BN254 today. There’s no equivalent of the alt_bn128 pairing syscall (sol_alt_bn128_group_op) for any of the small-field protocols. That means Plonky3 is not a choice we get to make for the deposit / transfer / withdraw circuits: the on-chain side fixes the curve.

What we do track is the Solana CPI proposal for STARK verification (no number yet; was last discussed in 2025) and the related “compute-budget-friendly Halo2 verifier” path. The day Solana ships either of those, the prover-side win from migrating off BN254 is large enough to justify a circuit rewrite. Until then, BN254 it is.

For off-chain proving — CI checks, offline auditing, batch verification — Plonky3 is already the right tool, and we’re using it inside the test harness for cross-validating circuit semantics.

What I’d build differently in 2027

Three follow-ups, in order of how much I expect them to matter:

A small-field shielded pool. Every privacy pool today is BN254 + Groth16 + per-circuit ceremony. The day Solana (or any high-throughput L1) ships a STARK verifier, the design space opens: no ceremony, faster proving, smaller wallets. Someone will publish this design before the verifier ships and they’ll be right to.
A unified extension-field abstraction. Plonky3 has different extension-field arithmetic per base field. A single Ext<F, k> with consistent ergonomics would make cross-field experimentation trivial. The team is aware; not yet shipped.
A small-field Poseidon variant. Poseidon-128 is parameterised for BN254. The recommended hash for BabyBear is Monolith or Poseidon2 over BabyBear, and the constraint counts are different enough that constraint-counting intuition from BN254 doesn’t transfer. A “Poseidon constraint cost calculator” that takes a field as input and emits constraint counts for common circuits would close a real reasoning gap.