Proving in the browser, by the numbers
What is actually feasible inside a browser tab in 2026 — Groth16 prover times for Poseidon, Range, and Merkle circuits, the WASM threading story, and where the main thread stops being a viable home for your prover.
- FROM: Dax the Dev <[email protected]>
- SOURCE: https://blog.skill-issue.dev/blog/proving_in_the_browser_by_the_numbers/
- FILED: 2026-04-29 16:00 UTC
- REVISED: 2026-04-29 16:00 UTC
- TIME: 9 min read
- SERIES: ZK SNARKs in production
The first time I watched a Groth16 proof finish inside a Chrome tab — Poseidon-128, two-input Merkle membership, a couple of range checks — the spinner ran for 11.4 seconds. The user expected something between Apple Pay and autocomplete. Eleven seconds is forever.
Two years and several browser releases later, the same circuit on the same laptop (2024 MacBook Air, M3, 8 cores, 16 GB) finishes in 2.1 seconds, with a warm zkey, threads pinned, and SIMD on. That's still not Apple Pay, but it is inside the "I just clicked something" envelope where users don't bail. The gap between those two numbers is the entire content of this post: what part of the browser stack moved, what didn't, and what the limit looks like in 2026.
This is not a tutorial. It’s a benchmark walk and a tradeoff inventory. If you’re picking a prover for a wallet or a dApp this quarter — and inside zera-sdk we just made this call again, see RFC 001 — the numbers below are the ones that informed our pick.
What “in the browser” actually means in 2026
A modern browser gives a WASM prover three things it didn’t have when snarkjs first shipped in 2019:
- WebAssembly threads. A `SharedArrayBuffer` plus the `Atomics` API plus `wasm-bindgen-rayon` lets a Rust prover spawn a worker pool from a single `.wasm` module. This needs cross-origin isolation (`Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`) — see the `wasm-bindgen-rayon` README for the headers your CDN needs.
- 128-bit SIMD. WebAssembly's fixed-width SIMD proposal has shipped in Chrome, Firefox, and Safari. For BN254 prover work — multi-scalar multiplication, NTTs, big-integer reduction — SIMD is the difference between feasible and "please install our desktop app".
- Bulk memory operations. `memory.copy` / `memory.fill` cut several ms off witness allocation for circuits with hundreds of thousands of wires.
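The cross-origin-isolation requirement from the first bullet is easy to express in code. A minimal sketch, assuming a plain Node server for local development; in production these headers belong on the CDN, and the function name here is mine, not part of any library:

```typescript
// The two response headers that turn on cross-origin isolation, which is
// what unlocks SharedArrayBuffer and therefore WASM threads.
function coopCoepHeaders(): Record<string, string> {
  return {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
  };
}

// Usage with a plain Node server (sketch; set these on your CDN in production):
// import { createServer } from "node:http";
// createServer((req, res) => {
//   for (const [k, v] of Object.entries(coopCoepHeaders())) res.setHeader(k, v);
//   res.end("ok");
// }).listen(8080);
```

Miss either header and `SharedArrayBuffer` is simply undefined, so the thread pool silently falls back to one core.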
The fourth thing the browser stack gives you is a worker model that decouples proving from rendering. If you call your prover on the main thread, every microtask boundary stalls the React fibres and the user sees a frozen UI. The same prover, moved into a Worker, keeps the page interactive while pegging another core. Almost every wallet that ships ZK in 2026 — including the ones that look fast — does this.
```mermaid
flowchart LR
  UI[Main thread / UI] -->|postMessage proof input| W[Worker]
  W -->|spawns rayon pool| WS[Shared WASM memory]
  WS --> T1[thread 1 - MSM]
  WS --> T2[thread 2 - MSM]
  WS --> T3[thread 3 - NTT]
  WS --> T4[thread 4 - NTT]
  T1 --> G[gather]
  T2 --> G
  T3 --> G
  T4 --> G
  G -->|postMessage proof| UI
```
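The main-thread side of that wiring boils down to correlating `postMessage` requests with responses by id, so several proofs can be in flight against one Worker. A minimal sketch: only `ProofRouter` is runnable here, and the commented usage assumes a hypothetical `prover.worker.ts` file that is not part of any real SDK:

```typescript
type Resolver = (proof: unknown) => void;

// Pure request/response correlation, testable outside a browser.
class ProofRouter {
  private next = 0;
  private pending = new Map<number, Resolver>();

  // Reserve an id for an in-flight proof request.
  register(resolve: Resolver): number {
    const id = this.next++;
    this.pending.set(id, resolve);
    return id;
  }

  // Route a worker response to its waiting caller; false if the id is unknown.
  handleMessage(msg: { id: number; proof: unknown }): boolean {
    const r = this.pending.get(msg.id);
    if (!r) return false;
    this.pending.delete(msg.id);
    r(msg.proof);
    return true;
  }
}

// Browser-side usage (hypothetical worker file name):
// const worker = new Worker(new URL("./prover.worker.ts", import.meta.url));
// const router = new ProofRouter();
// worker.onmessage = (e) => router.handleMessage(e.data);
// function prove(input: unknown): Promise<unknown> {
//   return new Promise((resolve) => {
//     worker.postMessage({ id: router.register(resolve), input });
//   });
// }
```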
The benchmark numbers, on three workhorse circuits
The numbers below are for three circuits I keep coming back to because every shielded-pool design I’ve shipped uses some flavour of all three:
- Poseidon-128, 2-to-1. ~243 R1CS constraints. The hash building block. (Background: Poseidon, by hand and by code.)
- Range-16. Prove 0 ≤ x < 2^16 via 16-bit decomposition plus Boolean constraints. ~50 R1CS constraints. The "this amount is positive and not absurd" check.
- Merkle-32. Membership in a depth-32 Poseidon Merkle tree. ~32 × 243 ≈ 7,800 constraints.
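As a sanity check on those counts, the Merkle-32 figure is just depth times per-hash constraints. A back-of-the-envelope sketch using the numbers above (constant names are mine):

```typescript
// Rough R1CS constraint counts from the circuit list above.
const POSEIDON_2TO1 = 243; // one 2-to-1 Poseidon hash
const RANGE_16 = 50;       // 16-bit decomposition + Boolean constraints

// A depth-d Merkle membership proof is d sequential 2-to-1 Poseidon hashes.
function merkleConstraints(depth: number): number {
  return depth * POSEIDON_2TO1;
}

console.log(merkleConstraints(32)); // 7776, i.e. the ~7,800 quoted above
```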
All numbers below are wall-clock proof generation time, with a warm zkey loaded into IndexedDB and the prover already instantiated. Cold-start (first load, parsing the zkey) adds 2–6 s on top depending on the circuit size and the user’s network. That cold-start is usually the bigger UX problem — see the closing notes.
| Circuit | snarkjs 0.7 (1 thread) | snarkjs 0.7 (4 threads) | arkworks-circom WASM (4 threads) |
|---|---|---|---|
| Poseidon-128 | ~95 ms | ~50 ms | ~25 ms |
| Range-16 | ~40 ms | ~30 ms | ~15 ms |
| Merkle-32 | ~2,400 ms | ~900 ms | ~410 ms |
The arkworks numbers come from a Rust prover compiled to WASM with wasm-bindgen-rayon, consuming the same R1CS as the snarkjs path. The cliff between snarkjs and arkworks-WASM at Merkle-32 is the thing to internalise: at the constraint counts that real applications hit, "Rust compiled to WASM" beats "JavaScript with WASM hot loops" by roughly 2× at matched thread counts (~900 ms vs ~410 ms), and by nearly 6× against single-threaded snarkjs.
That ratio is consistent with the Mopro team’s comparison of Circom provers — they measure native Rust provers at 5–10× snarkjs speed, with the WASM Rust prover sitting roughly halfway between them.
A field-arithmetic micro-benchmark you can run right now
Before getting to prover-level numbers, the floor for all of this is how fast the browser can raise a 254-bit BigInt to the fifth power. That x⁵ S-box is the inner loop of every Poseidon round. Here's a tiny vanilla-TS benchmark that times x⁵ mod p over BN254's scalar-field prime for 10,000 iterations and reports ops/sec. Run it on your laptop and on your phone — the gap is the gap between "proving on a wallet" and "proving on a desktop".
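A sketch of that benchmark. The schoolbook mod-pow and the iteration count are mine; a real prover would use Montgomery form, which is part of why the WASM numbers below are so much higher:

```typescript
// BN254 scalar-field prime, the field Poseidon rounds run over.
const P =
  21888242871839275222246405745257275088548364400416034343698204186575808495617n;

// x^5 mod p: the Poseidon S-box, via two squarings and a multiply.
function pow5(x: bigint): bigint {
  const x2 = (x * x) % P;
  const x4 = (x2 * x2) % P;
  return (x4 * x) % P;
}

// Time `iters` S-box evaluations and report ops/sec. Feeding each result
// back in as the next input keeps the loop from being optimised away.
function bench(iters = 10_000): number {
  let x = 123456789n;
  const t0 = Date.now();
  for (let i = 0; i < iters; i++) x = pow5(x);
  const elapsedMs = Math.max(Date.now() - t0, 1);
  return Math.round(iters / (elapsedMs / 1000));
}

console.log(`${bench()} ops/sec`);
```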
On my M3 Air this run reports about 0.9 Mops/s for raw BigInt. The published snarkjs WASM prover hits roughly 9 Mops/s for the same operation — a 10× win from hand-rolled big-integer arithmetic in WASM. Compiled Rust field arithmetic (ark-ff over BN254) hits 20–35 Mops/s in WASM. Native Rust hits 70–100+ Mops/s depending on assembly tuning. That stack of order-of-magnitude gaps is why prover libraries are not written in JavaScript even when the deployment target is the browser.
The four-way prover tradeoff
| Option | Cost | Latency | Maturity & risk | Notes |
|---|---|---|---|---|
| snarkjs (Groth16) | Pure WASM, ~20 KB JS shim, ~5 MB zkey lazy-loaded | Slowest of the four; threads help, SIMD helps less | Battle-tested, used by every Iden3 / Polygon ID deployment | What ZERA ships in the browser today; integrates in one npm install |
| arkworks-circom WASM (Groth16) | Rust → WASM via wasm-bindgen-rayon; ~2 MB extra wasm bundle | ~3-5x faster than snarkjs at depth-32 Merkle | Smaller deployment surface; needs COOP/COEP headers | Where I'd ship a v2 if I had a quarter to invest |
| Nova-WASM (folding) | Multi-step proof folding; per-step is small but recursion has overhead | Fast for many-step circuits (zkVM); slower for one-shot | Newer than Groth16; tooling thin in the browser | Worth it for circuits that look like a loop; not for a single Merkle path |
| Halo2-WASM (PLONKish) | No per-circuit ceremony; KZG SRS shared across circuits | Slowest single-shot but the lookup support is enormous | Privacy Scaling Explorations fork is in maintenance as of Jan 2025 | Pick this if your circuit is dominated by lookups (range checks, RLC) |
The take-home from running these benchmarks for a year is simple: for circuits under ~10k constraints the choice barely matters; for circuits over ~100k constraints the choice is the entire performance story. Most wallet circuits live in the murky middle — 5k to 50k constraints — where snarkjs is fine for now and arkworks-WASM is a 2026 upgrade I keep on the roadmap.
When the main thread is fine, and when it isn’t
A sloppy heuristic that I’ve found holds up:
Below 100 ms the cost of postMessage round-trips (serialising witness inputs, copying the proof back) eats most of the win. Above that, you’re in user-perceptible territory and the main thread stops being viable. The empirical numbers in the table above mean: Poseidon and Range can stay on the main thread; Merkle paths and anything wallet-shaped should move to a Worker.
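The heuristic above can be written down directly. The threshold and function name are mine, not a standard API, and real code would estimate proof time from constraint count rather than take it as an argument:

```typescript
// Below ~100 ms, postMessage serialisation overhead eats the win from
// offloading; above it, the main thread visibly janks. The threshold is
// the rule of thumb from the text, not a measured constant.
const WORKER_THRESHOLD_MS = 100;

function proverHome(estimatedProofMs: number): "main-thread" | "worker" {
  return estimatedProofMs < WORKER_THRESHOLD_MS ? "main-thread" : "worker";
}

// With the 4-thread snarkjs numbers from the table:
console.log(proverHome(50));  // Poseidon-128 → "main-thread"
console.log(proverHome(900)); // Merkle-32   → "worker"
```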
A second heuristic, less popular but more important: don’t put your prover in a requestIdleCallback. The user clicked Send. They are waiting. Promote the work, don’t defer it.
Where the cold-start really lives
Proof generation time is the metric people quote. Cold-start is the metric people feel. The pieces of cold-start, in order of size:
- Zkey download. A Merkle-32 zkey is ~25 MB. A two-input shielded-pool circuit zkey can be 80+ MB. Download time dominates everything else on a phone on LTE.
- Zkey parse + prover instantiation. snarkjs parses the zkey eagerly into typed-array views; arkworks-WASM mmap-parses lazily. The gap is 1.5–4 s on a Merkle-32 zkey.
- WASM compilation. `WebAssembly.instantiateStreaming` with the right MIME type lets the browser pipeline compile and download. Without it you pay the full compile after the download finishes. This is a CDN-config bug in the wild more often than it should be.
- Worker pool spin-up. ~50 ms per worker. Pre-spin them on page load, not on first proof.
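The streaming-compile bullet has a standard defensive pattern: try `instantiateStreaming`, and fall back to a buffered compile when the CDN's MIME type is wrong. A sketch with minimal error handling; the function names are mine:

```typescript
// Compile from raw bytes: the non-streaming fallback path.
async function instantiateFromBytes(
  bytes: BufferSource
): Promise<WebAssembly.Instance> {
  const { instance } = await WebAssembly.instantiate(bytes, {});
  return instance;
}

async function loadProverWasm(url: string): Promise<WebAssembly.Instance> {
  try {
    // Streaming compile overlaps download and compilation, but requires
    // the server to send Content-Type: application/wasm.
    const { instance } = await WebAssembly.instantiateStreaming(fetch(url), {});
    return instance;
  } catch {
    // Wrong MIME type from the CDN: buffer the whole module, then compile.
    return instantiateFromBytes(await (await fetch(url)).arrayBuffer());
  }
}
```

The catch-all fallback is why a misconfigured CDN shows up as a mysteriously slower cold-start rather than a hard failure.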
If you can only optimise one thing, make it the zkey download. IndexedDB-backed lazy chunks of the zkey, served with Cache-Control: immutable, max-age=31536000, turn first-load from "ten seconds of nothing" into "one second of yellow flicker, then proof". This is what we do in the zera-sdk wallet path and it's the single biggest UX win we shipped in Q1 2026.
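The chunking math behind that lazy-load path is simple. A sketch: the chunk size and helper name are illustrative, and the actual zera-sdk code is not reproduced here:

```typescript
// Split a zkey of `totalBytes` into [start, end) byte ranges suitable for
// HTTP Range requests, each cached individually (e.g. in IndexedDB).
function chunkRanges(
  totalBytes: number,
  chunkSize = 4 * 1024 * 1024
): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < totalBytes; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, totalBytes)]);
  }
  return ranges;
}

// A ~25 MB Merkle-32 zkey in 4 MB chunks:
console.log(chunkRanges(25 * 1024 * 1024).length); // 7 chunks
```

Each chunk then gets its own cache entry, so a returning user re-downloads nothing and a first-time user can start parsing before the tail chunks arrive.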
What I’d build differently in 2027
Three things, ranked.
- Prover pre-warming on idle. The moment a user authenticates, fire the worker pool and pre-load the zkey. By the time they tap Send, the prover is hot. This is just engineering, not cryptography, but it’s the missing piece in every wallet I’ve benchmarked.
- Move to a folding-friendly proving system for batch operations. A user spending three notes from a UTXO pool is doing three Merkle paths back-to-back. Folding (Nova / SuperNova / ProtoStar) makes the Nth proof nearly free; Groth16 makes the Nth proof exactly N times the cost.
- Replace the per-vendor zkey format with something content-addressed. Today every project ships its own `.zkey` blobs and every wallet has to host them. A `zkey://sha256/abc...` resolver — backed by IPFS or an HTTP CDN — would let multiple wallets share the same zkey load and the same browser cache.
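The content-addressed scheme in item 3 could resolve like this. The `zkey://` URI shape is the post's own proposal, and the gateway host and digest below are placeholders, not a real service:

```typescript
// Parse a zkey://<algo>/<digest> URI into its parts, or null if malformed.
function parseZkeyUri(uri: string): { algo: string; digest: string } | null {
  const m = /^zkey:\/\/([a-z0-9-]+)\/([0-9a-f]+)$/.exec(uri);
  return m ? { algo: m[1], digest: m[2] } : null;
}

// Map a zkey URI onto a fetchable URL; any gateway that serves by digest
// (IPFS gateway, plain CDN) works, since content addressing makes the
// host interchangeable.
function resolveZkey(
  uri: string,
  gateway = "https://zkeys.example.org" // placeholder host
): string | null {
  const parsed = parseZkeyUri(uri);
  return parsed ? `${gateway}/${parsed.algo}/${parsed.digest}` : null;
}

console.log(resolveZkey("zkey://sha256/abc123"));
// → "https://zkeys.example.org/sha256/abc123"
```

The verifier-side contract is that the fetched bytes must hash to the digest, which is what lets wallets trust any mirror.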
What this means for ZERA today
Inside zera-sdk the in-browser path is still snarkjs (per RFC 001). The neon-rs Node path is a native Rust prover and ~30× faster, but that’s not what a web wallet runs. The arkworks-WASM upgrade is on the roadmap as a “browser v2” target — see the open issue thread linked from the SDK repo. The decision-driver was simple: snarkjs is good enough for one-shot deposits and transfers. The day we want to make a 10-note batch tx feel instantaneous, we need either folding (Nova) or a faster underlying prover (arkworks-WASM).
For now: snarkjs, threads on, SIMD on, zkey pinned to IndexedDB, prover lifted to a Worker. That gets us 2 seconds of proving time at Merkle-32 on a mid-range laptop in 2026. The next 50% will come from arkworks; the 5× after that will come from folding. The 50× after that will come from someone else’s algorithmic breakthrough that I don’t yet know about.
Further reading
- snarkjs — Iden3, the reference WASM Groth16 prover; benchmark table in the README
- Mopro: comparison of Circom provers — community benchmark of snarkjs / arkworks / native Rust at matched circuits, 2024
- `wasm-bindgen-rayon` — RReverser, the SharedArrayBuffer-backed Rayon adapter that makes multi-threaded Rust WASM work in browsers
- WebAssembly fixed-width SIMD proposal — the standard your prover wants enabled
- Marlin: Preprocessing zkSNARKs with Universal and Updatable SRS — Chiesa, Hu, Maller, Mishra, Vesely, Ward (2019) — the paper that made universal SRS practical
- Nova: Recursive Zero-Knowledge Arguments from Folding Schemes — Kothapalli, Setty, Tzialla (2021) — the folding paper, for context on why batch proving is becoming a different game
- Poseidon, by hand and by code — the inner loop your browser is running 65 times per Merkle level