Proving in the browser, by the numbers
What is actually feasible inside a browser tab in 2026 — Groth16 prover times for Poseidon, Range, and Merkle circuits, the WASM threading story, and where the main thread stops being a viable home for your prover.
- FROM: Dax the Dev <[email protected]>
- SOURCE: https://blog.skill-issue.dev/blog/proving_in_the_browser_by_the_numbers/
- FILED: 2026-04-29 16:00 UTC
- REVISED: 2026-04-29 16:00 UTC
- TIME: 9 min read
- SERIES: ZK SNARKs in production
The first time I watched a Groth16 proof finish inside a Chrome tab — Poseidon-128, two-input Merkle membership, a couple of range checks — the spinner ran for 11.4 seconds. The user expected something between Apple Pay and autocomplete. Eleven seconds is forever.
Two years and several browser releases later, the same circuit on the same laptop (2024 MacBook Air, M3, 8 cores, 16 GB) finishes in 2.1 seconds, with a warm zkey, threads pinned, and SIMD on. That's still not Apple Pay, but it is inside the "I just clicked something" envelope where users don't bail. The gap between those two numbers is the entire content of this post: what part of the browser stack moved, what didn't, and what the limit looks like in 2026.
This is not a tutorial. It’s a benchmark walk and a tradeoff inventory. If you’re picking a prover for a wallet or a dApp this quarter — and inside zera-sdk we just made this call again, see RFC 001 — the numbers below are the ones that informed our pick.
What “in the browser” actually means in 2026
A modern browser gives a WASM prover three things it didn’t have when snarkjs first shipped in 2019:
- WebAssembly threads. A `SharedArrayBuffer` plus the `Atomics` API plus `wasm-bindgen-rayon` lets a Rust prover spawn a worker pool from a single `.wasm` module. This needs cross-origin isolation (`Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`) — see the `wasm-bindgen-rayon` README for the headers your CDN needs.
- 128-bit SIMD. WebAssembly's fixed-width SIMD proposal has shipped in Chrome, Firefox, and Safari. For BN254 prover work — multi-scalar multiplication, NTTs, big-integer reduction — SIMD is the difference between feasible and "please install our desktop app".
- Bulk memory operations. `memory.copy` / `memory.fill` cut several ms off witness allocation for circuits with hundreds of thousands of wires.
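The cross-origin-isolation requirement from the first bullet is easy to express in code. A minimal sketch, assuming a plain Node server for local development; in production these headers belong on the CDN, and the function name here is mine, not part of any library:

```typescript
// The two response headers that turn on cross-origin isolation, which is
// what unlocks SharedArrayBuffer and therefore WASM threads.
function coopCoepHeaders(): Record<string, string> {
  return {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
  };
}

// Usage with a plain Node server (sketch; set these on your CDN in production):
// import { createServer } from "node:http";
// createServer((req, res) => {
//   for (const [k, v] of Object.entries(coopCoepHeaders())) res.setHeader(k, v);
//   res.end("ok");
// }).listen(8080);
```

Miss either header and `SharedArrayBuffer` is simply undefined, so the thread pool silently falls back to one core.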
The fourth thing the browser stack gives you is a worker model that decouples proving from rendering. If you call your prover on the main thread, every microtask boundary stalls the React fibres and the user sees a frozen UI. The same prover, moved into a Worker, keeps the page interactive while pegging another core. Almost every wallet that ships ZK in 2026 — including the ones that look fast — does this.
```mermaid
flowchart LR
  UI[Main thread / UI] -->|postMessage proof input| W[Worker]
  W -->|spawns rayon pool| WS[Shared WASM memory]
  WS --> T1[thread 1 - MSM]
  WS --> T2[thread 2 - MSM]
  WS --> T3[thread 3 - NTT]
  WS --> T4[thread 4 - NTT]
  T1 --> G[gather]
  T2 --> G
  T3 --> G
  T4 --> G
  G -->|postMessage proof| UI
```
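The main-thread side of that wiring boils down to correlating `postMessage` requests with responses by id, so several proofs can be in flight against one Worker. A minimal sketch: only `ProofRouter` is runnable here, and the commented usage assumes a hypothetical `prover.worker.ts` file that is not part of any real SDK:

```typescript
type Resolver = (proof: unknown) => void;

// Pure request/response correlation, testable outside a browser.
class ProofRouter {
  private next = 0;
  private pending = new Map<number, Resolver>();

  // Reserve an id for an in-flight proof request.
  register(resolve: Resolver): number {
    const id = this.next++;
    this.pending.set(id, resolve);
    return id;
  }

  // Route a worker response to its waiting caller; false if the id is unknown.
  handleMessage(msg: { id: number; proof: unknown }): boolean {
    const r = this.pending.get(msg.id);
    if (!r) return false;
    this.pending.delete(msg.id);
    r(msg.proof);
    return true;
  }
}

// Browser-side usage (hypothetical worker file name):
// const worker = new Worker(new URL("./prover.worker.ts", import.meta.url));
// const router = new ProofRouter();
// worker.onmessage = (e) => router.handleMessage(e.data);
// function prove(input: unknown): Promise<unknown> {
//   return new Promise((resolve) => {
//     worker.postMessage({ id: router.register(resolve), input });
//   });
// }
```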
The benchmark numbers, on three workhorse circuits
The numbers below are for three circuits I keep coming back to because every shielded-pool design I’ve shipped uses some flavour of all three:
- Poseidon-128, 2-to-1. ~243 R1CS constraints. The hash building block. (Background: Poseidon, by hand and by code.)
- Range-16. Prove 0 ≤ x < 2^16 via 16-bit decomposition plus Boolean constraints. ~50 R1CS constraints. The "this amount is positive and not absurd" check.
- Merkle-32. Membership in a depth-32 Poseidon Merkle tree. ~32 × 243 ≈ 7,800 constraints.
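As a sanity check on those counts, the Merkle-32 figure is just depth times per-hash constraints. A back-of-the-envelope sketch using the numbers above (constant names are mine):

```typescript
// Rough R1CS constraint counts from the circuit list above.
const POSEIDON_2TO1 = 243; // one 2-to-1 Poseidon hash
const RANGE_16 = 50;       // 16-bit decomposition + Boolean constraints

// A depth-d Merkle membership proof is d sequential 2-to-1 Poseidon hashes.
function merkleConstraints(depth: number): number {
  return depth * POSEIDON_2TO1;
}

console.log(merkleConstraints(32)); // 7776, i.e. the ~7,800 quoted above
```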
All numbers below are wall-clock proof generation time, with a warm zkey loaded into IndexedDB and the prover already instantiated. Cold-start (first load, parsing the zkey) adds 2–6 s on top depending on the circuit size and the user’s network. That cold-start is usually the bigger UX problem — see the closing notes.
| Circuit | snarkjs 0.7 (1 thread) | snarkjs 0.7 (4 threads) | arkworks-circom WASM (4 threads) |
|---|---|---|---|
| Poseidon-128 | ~95 ms | ~50 ms | ~25 ms |
| Range-16 | ~40 ms | ~30 ms | ~15 ms |
| Merkle-32 | ~2,400 ms | ~900 ms | ~410 ms |
The arkworks numbers come from a Rust prover compiled to WASM with wasm-bindgen-rayon, consuming the same R1CS as the snarkjs path. The cliff between snarkjs and arkworks-WASM at Merkle-32 is the thing to internalise: at the constraint counts that real applications hit, "Rust compiled to WASM" beats "JavaScript with WASM hot loops" by roughly 2× at matched thread counts (~900 ms vs ~410 ms), and by nearly 6× against single-threaded snarkjs.
That ratio is consistent with the Mopro team’s comparison of Circom provers — they measure native Rust provers at 5–10× snarkjs speed, with the WASM Rust prover sitting roughly halfway between them.
A field-arithmetic micro-benchmark you can run right now
Before getting to prover-level numbers, the floor for all of this is how fast the browser can raise a 254-bit BigInt to the fifth power. That x⁵ S-box is the inner loop of every Poseidon round. Here's a tiny vanilla-TS benchmark that times x⁵ mod p over BN254's scalar-field prime for 10,000 iterations and reports ops/sec. Run it on your laptop and on your phone — the gap is the gap between "proving on a wallet" and "proving on a desktop".
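A sketch of that benchmark. The schoolbook mod-pow and the iteration count are mine; a real prover would use Montgomery form, which is part of why the WASM numbers below are so much higher:

```typescript
// BN254 scalar-field prime, the field Poseidon rounds run over.
const P =
  21888242871839275222246405745257275088548364400416034343698204186575808495617n;

// x^5 mod p: the Poseidon S-box, via two squarings and a multiply.
function pow5(x: bigint): bigint {
  const x2 = (x * x) % P;
  const x4 = (x2 * x2) % P;
  return (x4 * x) % P;
}

// Time `iters` S-box evaluations and report ops/sec. Feeding each result
// back in as the next input keeps the loop from being optimised away.
function bench(iters = 10_000): number {
  let x = 123456789n;
  const t0 = Date.now();
  for (let i = 0; i < iters; i++) x = pow5(x);
  const elapsedMs = Math.max(Date.now() - t0, 1);
  return Math.round(iters / (elapsedMs / 1000));
}

console.log(`${bench()} ops/sec`);
```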
On my M3 Air this run reports about 0.9 Mops/s for raw BigInt. The published snarkjs WASM prover hits roughly 9 Mops/s for the same operation — a 10× win from hand-rolled big-integer arithmetic in WASM. Compiled Rust field arithmetic (ark-ff over BN254) hits 20–35 Mops/s in WASM. Native Rust hits 70–100+ Mops/s depending on assembly tuning. That stack of order-of-magnitude gaps is why prover libraries are not written in JavaScript even when the deployment target is the browser.
The four-way prover tradeoff
| Option | Cost | Latency | Maturity & risk | Notes |
|---|---|---|---|---|
| snarkjs (Groth16) | Pure WASM, ~20 KB JS shim, ~5 MB zkey lazy-loaded | Slowest of the four; threads help, SIMD helps less | Battle-tested, used by every Iden3 / Polygon ID deployment | What ZERA ships in the browser today; integrates in one npm install |
| arkworks-circom WASM (Groth16) | Rust → WASM via wasm-bindgen-rayon; ~2 MB extra wasm bundle | ~3-5x faster than snarkjs at depth-32 Merkle | Smaller deployment surface; needs COOP/COEP headers | Where I'd ship a v2 if I had a quarter to invest |
| Nova-WASM (folding) | Multi-step proof folding; per-step is small but recursion has overhead | Fast for many-step circuits (zkVM); slower for one-shot | Newer than Groth16; tooling thin in the browser | Worth it for circuits that look like a loop; not for a single Merkle path |
| Halo2-WASM (PLONKish) | No per-circuit ceremony; KZG SRS shared across circuits | Slowest single-shot but the lookup support is enormous | Privacy Scaling Explorations fork is in maintenance as of Jan 2025 | Pick this if your circuit is dominated by lookups (range checks, RLC) |
The take-home from running these benchmarks for a year is simple: for circuits under ~10k constraints the choice barely matters; for circuits over ~100k constraints the choice is the entire performance story. Most wallet circuits live in the murky middle — 5k to 50k constraints — where snarkjs is fine for now and arkworks-WASM is a 2026 upgrade I keep on the roadmap.
When the main thread is fine, and when it isn’t
A sloppy heuristic that I’ve found holds up:
Below 100 ms the cost of postMessage round-trips (serialising witness inputs, copying the proof back) eats most of the win. Above that, you’re in user-perceptible territory and the main thread stops being viable. The empirical numbers in the table above mean: Poseidon and Range can stay on the main thread; Merkle paths and anything wallet-shaped should move to a Worker.
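The heuristic above can be written down directly. The threshold and function name are mine, not a standard API, and real code would estimate proof time from constraint count rather than take it as an argument:

```typescript
// Below ~100 ms, postMessage serialisation overhead eats the win from
// offloading; above it, the main thread visibly janks. The threshold is
// the rule of thumb from the text, not a measured constant.
const WORKER_THRESHOLD_MS = 100;

function proverHome(estimatedProofMs: number): "main-thread" | "worker" {
  return estimatedProofMs < WORKER_THRESHOLD_MS ? "main-thread" : "worker";
}

// With the 4-thread snarkjs numbers from the table:
console.log(proverHome(50));  // Poseidon-128 → "main-thread"
console.log(proverHome(900)); // Merkle-32   → "worker"
```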
A second heuristic, less popular but more important: don’t put your prover in a requestIdleCallback. The user clicked Send. They are waiting. Promote the work, don’t defer it.
Where the cold-start really lives
Proof generation time is the metric people quote. Cold-start is the metric people feel. The pieces of cold-start, in order of size:
- Zkey download. A Merkle-32 zkey is ~25 MB. A two-input shielded-pool circuit zkey can be 80+ MB. Download time dominates everything else on a phone on LTE.
- Zkey parse + prover instantiation. snarkjs parses the zkey eagerly into typed-array views; arkworks-WASM mmap-parses lazily. The gap is 1.5–4 s on a Merkle-32 zkey.
- WASM compilation. `WebAssembly.instantiateStreaming` with the right MIME type lets the browser pipeline compile and download. Without it you pay the full compile after the download finishes. This is a CDN-config bug in the wild more often than it should be.
- Worker pool spin-up. ~50 ms per worker. Pre-spin them on page load, not on first proof.
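The streaming-compile bullet has a standard defensive pattern: try `instantiateStreaming`, and fall back to a buffered compile when the CDN's MIME type is wrong. A sketch with minimal error handling; the function names are mine:

```typescript
// Compile from raw bytes: the non-streaming fallback path.
async function instantiateFromBytes(
  bytes: BufferSource
): Promise<WebAssembly.Instance> {
  const { instance } = await WebAssembly.instantiate(bytes, {});
  return instance;
}

async function loadProverWasm(url: string): Promise<WebAssembly.Instance> {
  try {
    // Streaming compile overlaps download and compilation, but requires
    // the server to send Content-Type: application/wasm.
    const { instance } = await WebAssembly.instantiateStreaming(fetch(url), {});
    return instance;
  } catch {
    // Wrong MIME type from the CDN: buffer the whole module, then compile.
    return instantiateFromBytes(await (await fetch(url)).arrayBuffer());
  }
}
```

The catch-all fallback is why a misconfigured CDN shows up as a mysteriously slower cold-start rather than a hard failure.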
If you can only optimise one thing, make it the zkey download. IndexedDB-backed lazy chunks of the zkey, served with Cache-Control: immutable, max-age=31536000, turn first-load from "ten seconds of nothing" into "one second of yellow flicker, then proof". This is what we do in the zera-sdk wallet path and it's the single biggest UX win we shipped in Q1 2026.
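The chunking math behind that lazy-load path is simple. A sketch: the chunk size and helper name are illustrative, and the actual zera-sdk code is not reproduced here:

```typescript
// Split a zkey of `totalBytes` into [start, end) byte ranges suitable for
// HTTP Range requests, each cached individually (e.g. in IndexedDB).
function chunkRanges(
  totalBytes: number,
  chunkSize = 4 * 1024 * 1024
): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < totalBytes; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, totalBytes)]);
  }
  return ranges;
}

// A ~25 MB Merkle-32 zkey in 4 MB chunks:
console.log(chunkRanges(25 * 1024 * 1024).length); // 7 chunks
```

Each chunk then gets its own cache entry, so a returning user re-downloads nothing and a first-time user can start parsing before the tail chunks arrive.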
What I’d build differently in 2027
Three things, ranked.
- Prover pre-warming on idle. The moment a user authenticates, fire the worker pool and pre-load the zkey. By the time they tap Send, the prover is hot. This is just engineering, not cryptography, but it’s the missing piece in every wallet I’ve benchmarked.
- Move to a folding-friendly proving system for batch operations. A user spending three notes from a UTXO pool is doing three Merkle paths back-to-back. Folding (Nova / SuperNova / ProtoStar) makes the Nth proof nearly free; Groth16 makes the Nth proof exactly N times the cost.
- Replace the per-vendor zkey format with something content-addressed. Today every project ships its own `.zkey` blobs and every wallet has to host them. A `zkey://sha256/abc...` resolver — backed by IPFS or an HTTP CDN — would let multiple wallets share the same zkey load and the same browser cache.
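The content-addressed scheme in item 3 could resolve like this. The `zkey://` URI shape is the post's own proposal, and the gateway host and digest below are placeholders, not a real service:

```typescript
// Parse a zkey://<algo>/<digest> URI into its parts, or null if malformed.
function parseZkeyUri(uri: string): { algo: string; digest: string } | null {
  const m = /^zkey:\/\/([a-z0-9-]+)\/([0-9a-f]+)$/.exec(uri);
  return m ? { algo: m[1], digest: m[2] } : null;
}

// Map a zkey URI onto a fetchable URL; any gateway that serves by digest
// (IPFS gateway, plain CDN) works, since content addressing makes the
// host interchangeable.
function resolveZkey(
  uri: string,
  gateway = "https://zkeys.example.org" // placeholder host
): string | null {
  const parsed = parseZkeyUri(uri);
  return parsed ? `${gateway}/${parsed.algo}/${parsed.digest}` : null;
}

console.log(resolveZkey("zkey://sha256/abc123"));
// → "https://zkeys.example.org/sha256/abc123"
```

The verifier-side contract is that the fetched bytes must hash to the digest, which is what lets wallets trust any mirror.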
What this means for ZERA today
Inside zera-sdk the in-browser path is still snarkjs (per RFC 001). The neon-rs Node path is a native Rust prover and ~30× faster, but that’s not what a web wallet runs. The arkworks-WASM upgrade is on the roadmap as a “browser v2” target — see the open issue thread linked from the SDK repo. The decision-driver was simple: snarkjs is good enough for one-shot deposits and transfers. The day we want to make a 10-note batch tx feel instantaneous, we need either folding (Nova) or a faster underlying prover (arkworks-WASM).
For now: snarkjs, threads on, SIMD on, zkey pinned to IndexedDB, prover lifted to a Worker. That gets us 2 seconds of proving time at Merkle-32 on a mid-range laptop in 2026. The next 50% will come from arkworks; the 5× after that will come from folding. The 50× after that will come from someone else’s algorithmic breakthrough that I don’t yet know about.
Further reading
- snarkjs — Iden3, the reference WASM Groth16 prover; benchmark table in the README
- Mopro: comparison of Circom provers — community benchmark of snarkjs / arkworks / native Rust at matched circuits, 2024
- `wasm-bindgen-rayon` — RReverser, the SharedArrayBuffer-backed Rayon adapter that makes multi-threaded Rust WASM work in browsers
- WebAssembly fixed-width SIMD proposal — the standard your prover wants enabled
- Marlin: Preprocessing zkSNARKs with Universal and Updatable SRS — Chiesa, Hu, Maller, Mishra, Vesely, Ward (2019) — the paper that made universal SRS practical
- Nova: Recursive Zero-Knowledge Arguments from Folding Schemes — Kothapalli, Setty, Tzialla (2021) — the folding paper, for context on why batch proving is becoming a different game
- Poseidon, by hand and by code — the inner loop your browser is running 65 times per Merkle level