
Merkle inclusion proofs over compressed account state on Solana

How a 32-byte hash and a logarithmic path replace a multi-kilobyte account. Walk the tree-height math, the Light Protocol compressed-account model, and an inclusion-proof construction you can run in Node.

FROM
Dax the Dev <[email protected]>
SOURCE
https://blog.skill-issue.dev/blog/merkle_inclusion_compressed_solana/
FILED
2026-04-29 15:00 UTC
REVISED
2026-04-29 15:00 UTC
TIME
10 min read
SERIES
ZK SNARKs in production
TAGS
#cryptography #merkle #solana #light-protocol #compression #zk #phd

The cheapest piece of state in a privacy pool — and the most contested one — is the commitment tree. Every shielded note’s commitment goes in. Every spend proves an inclusion. The tree is read on every transfer and written on every deposit. If the tree state is expensive, every operation is expensive. If the inclusion proofs are big, every spend is big.

In 2024 Light Protocol shipped ZK Compression on Solana, and the production primitive for “store a lot of state cheaply, prove inclusion in zero-knowledge” became standard. This post is the math behind that primitive, the deployment shape, and a runnable inclusion-proof construction. It’s a sibling piece to Pedersen commitments, in production and Nullifiers without the witchcraft — those tell you what we put into the tree; this one tells you how we prove things about the tree.

The minimum tree

A binary Merkle tree is the simplest commitment scheme that supports logarithmic-size inclusion proofs. Start with a sequence of leaves $\ell_0, \ell_1, \dots, \ell_{N-1}$. Define the tree recursively:

$$\text{node}(i, j) = H(\text{node}(i, m) \,\|\, \text{node}(m+1, j)), \quad m = \lfloor (i+j)/2 \rfloor,$$

with $\text{node}(i, i) = \ell_i$ at the leaves. The root is $\text{node}(0, N-1)$. To prove that $\ell_k$ is in the tree, you reveal the root plus the co-path — for each level $j$, the sibling node along the path from leaf $k$ to the root. There are exactly $\log_2 N$ siblings.

The proof size is $\log_2 N$ hash outputs. At $N = 2^{32}$ leaves and 32-byte hashes, that's 1024 bytes. At $N = 2^{20}$ (about a million leaves), it's 640 bytes. The verifier cost is $\log_2 N$ hashes. Both numbers are unreasonably small compared to the account-state cost of storing all $N$ leaves directly on chain.

That’s the entire shape. Two equations and a co-path. The reason it shows up everywhere is that nothing else hits the same combination of small proof, cheap verifier, and append-only update path.

To prove inclusion of leaf 1 (commitment_1), the prover reveals the co-path [h_A, h_CD]. The verifier hashes up: h_B = leaf_1, h_AB = H(h_A || h_B), root = H(h_AB || h_CD), and checks that root matches the public root.

Inclusion proof size, exactly

For a tree of height $h$ (so $N \le 2^h$ leaves) with hash output size $s$ bytes, the inclusion proof is exactly $h \cdot s$ bytes plus the leaf index (typically 4 bytes). For Poseidon over BN254 with $s = 32$:

$$|\pi_{\text{inclusion}}| = 32h + 4 \text{ bytes}.$$

A useful table for production planning:

| Option | Cost | Latency | Blast radius | Notes |
| --- | --- | --- | --- | --- |
| h = 20 (1M leaves) | ~644-byte inclusion proof | 20 hashes verifier-side; ~0.2 ms in WASM | More than enough for testnet; comfortable for a typical L2 | Light Protocol's default address tree height. The right choice for state-tree v1. |
| h = 26 (67M leaves) | ~836-byte inclusion proof | 26 hashes; ~0.3 ms | Used by Light Protocol for state trees; capacity for years of mainnet usage | The deployment-grade choice. |
| h = 32 (4B leaves) | ~1028-byte inclusion proof | 32 hashes; ~0.4 ms | Overkill for any single program; useful if multiple programs share one tree | Bitcoin-scale capacity. Almost never the right initial choice. |
| MMR (variable height) | ~64 bytes per peak + log(n) per leaf | More complex update path; same verifier cost | Append-only with no rewrite-on-grow; ideal for batched deposits | Worth it when you batch many leaves per slot. Niche today. |
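As a sanity check on those proof-size figures, the formula is a one-liner (assuming 32-byte hashes and a 4-byte leaf index, as above):

```javascript
// |π| = h·s + 4 bytes: h sibling hashes of s bytes each, plus a 4-byte leaf index.
const proofBytes = (h, s = 32) => h * s + 4;

for (const h of [20, 26, 32]) console.log(`h = ${h}: ${proofBytes(h)} bytes`);
// h = 20: 644 bytes
// h = 26: 836 bytes
// h = 32: 1028 bytes
```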

Light Protocol’s state trees default to $h = 26$ — see their account-compression program — which gives 67 million leaves of capacity per tree. For zeraswap and the Dax911/z_trade/programs/zeraswap program, we use the same $h = 26$ default for the same reason: it’s the right balance between proof size and capacity, and it’s what the on-chain compression program is parameterised for.

Compressed accounts, in one diagram

The Light Protocol model is the cleanest way to think about “Solana accounts that don’t take Solana account space.” A compressed account is a tuple of fields hashed together; the hash is a leaf in a state tree; the tree’s root is what lives in account state on chain.

The on-chain footprint is the root (32 bytes) plus the rolling-hash update buffer (a few KB amortised across many writes). The account data, the discriminator, the owner, the lamports — none of that lives in account state. It lives in the indexer’s Postgres or in a Photon RPC node, and it’s reconstructed at proof-construction time.

The inclusion proof is the trick that makes this work. To execute against a compressed account, the client constructs an inclusion proof against a recent root, the on-chain program verifies the proof against the root it has stored, and the program operates on the (now-trusted) account contents. The root is the only piece of state that has to live on chain. Everything else is reconstruction.

Compressed accounts are stored as leaves in append-only Merkle trees, with only the tree’s root maintained in Solana account state. State validity is enforced through inclusion proofs verified by the on-chain program at execution time, allowing arbitrary amounts of state to be referenced at constant on-chain storage cost.

— Light Protocol whitepaper, 2024

The reason this is the primitive for production privacy on Solana is that Solana’s account-state cost is the load-bearing constraint. A normal Solana account is rent-exempt at roughly 0.002 SOL per kilobyte, meaning a megabyte of state costs ~2 SOL ($300+ at 2026 prices). A compressed account is storage-amortised across the tree, and the cost per leaf is sub-cent. Five orders of magnitude.

The Node simulator

Here is a working Merkle inclusion-proof construction over a 16-leaf tree, with the prover-side path construction and the verifier-side root check. It uses SHA-256 for readability — a real ZERA tree uses Poseidon — but the algorithmic shape is identical.


Two things to notice when you run this. The proof is 132 bytes for a 16-leaf tree (4 hashes + an index). Scaled to $h = 26$, it’s 836 bytes — independent of how many leaves are in the tree. That’s the $O(\log n)$ argument with the constant factor pinned down. The other thing: the tampering attempt at the end fails because verifyInclusion(leaves[3], 7, path, root) re-hashes up the wrong path. The direction bits derived from the leaf index are what make this work; without them, the verifier doesn’t know whether the sibling goes on the left or the right.

Batched inclusion via Merkle Mountain Ranges

The basic Merkle tree is append-only but expensive to grow — every new leaf forces a recompute of the $\log_2 N$ internal nodes along its path. For workloads that batch many leaves at once (rollups, periodic deposit windows, settlement layers), the Merkle Mountain Range is the structural improvement.

An MMR is a forest of perfect binary trees. New leaves are appended to the rightmost tree; when two trees of equal height exist, they’re merged. The peaks (one per tree) are then “bagged” — hashed together — to produce the MMR root. The math:

$$|\text{peaks}| = \text{popcount}(N),$$

so for $N = 2^k$ there’s exactly one peak, and for $N$ between powers of two the peak count is bounded by $\lceil \log_2 N \rceil$. An inclusion proof for a leaf in an MMR is the path within its containing perfect tree (size $\le \log_2 N$), plus the peaks of the other trees (size $< \log_2 N$). Total:

$$|\pi_{\text{MMR}}| \le 2 \log_2 N \cdot s,$$

with $s$ the hash output size. Slightly larger than a balanced tree, but appends cost $O(1)$ amortised per leaf in the steady state ($O(\log N)$ worst case when equal-height trees merge), which matters for high-throughput deposit workloads. Todd’s original spec and Robinson’s optimality result (2025) are the references; FlyClient and Mina use MMRs in production.
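The peak arithmetic is easy to check. A sketch (helper names are mine; the proof bound is the worst case from the inequality above, with $s = 32$):

```javascript
// An MMR with N leaves is a forest of popcount(N) perfect trees, one peak each.
const popcount = (n) => { let c = 0; for (; n; n >>= 1) c += n & 1; return c; };

// Worst-case inclusion-proof size: intra-tree path plus the other peaks.
const mmrProofBound = (N, s = 32) => 2 * Math.ceil(Math.log2(N)) * s;

console.log(popcount(1_000_000));      // 7 peaks at a million leaves
console.log(popcount(1 << 20));        // 1 peak at exactly 2^20 leaves
console.log(mmrProofBound(1_000_000)); // 1280 bytes
```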

For zeraswap we don’t use an MMR — the deposit cadence is interactive enough that the balanced tree dominates — but the design seam is in programs/zeraswap/src/state.rs so we can swap if/when the deposit pattern shifts.

What the on-chain program checks

The on-chain inclusion check is two operations:

  1. Recompute the root from the leaf, the index, and the co-path. This is $\log_2 N$ Poseidon hashes. On Solana with the sol_poseidon syscall, that’s roughly $26 \times 1500 = 39{,}000$ compute units at $h = 26$.
  2. Compare the recomputed root against a recent canonical root stored on-chain. Light Protocol keeps a sliding window of recent roots (the rolling-hash update buffer) so that proofs constructed against a root from 5-10 slots ago still verify. This is what makes high-throughput compressed accounts work — you can’t force every prover to re-derive a proof on every slot tick.
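The two steps compose into a small pattern. A hedged sketch, not Light Protocol’s actual on-chain code — the names are illustrative and `recomputeRoot` is injected; the 32-root window matches the sliding window described in this post:

```javascript
const ROOT_WINDOW = 32; // sliding window of recent canonical roots

// Step 1: fold the co-path back up to a root (log2(N) hashes on chain).
// Step 2: accept if that root is any of the recent canonical roots, so a
// prover a few slots behind the tip doesn't have to rebuild the proof.
function checkInclusion(recentRoots, leaf, index, coPath, recomputeRoot) {
  const root = recomputeRoot(leaf, index, coPath);
  return recentRoots.slice(-ROOT_WINDOW).some((r) => r.equals(root));
}
```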

The on-chain program does not store the leaves. It does not store the inner nodes. It stores the root and the change history, period. The state-cost difference between “32 bytes on chain plus an indexer” and “32 KB per account” is the entire reason ZK Compression got to mainnet on Solana.

What lives where, in the ZERA stack

To make this concrete:

crates/zera-sdk-core/src/merkle.rs        # Rust prover-side path construction
crates/zera-sdk-onchain/src/lib.rs        # on-chain root verification
packages/sdk/src/merkle.ts                # JS path construction (browser wallet)
programs/zeraswap/src/state.rs            # commitment-tree state account layout

All four read the same Poseidon parameters (see the Pedersen post for why we have four implementations of Poseidon and how they’re cross-validated). The same way the Poseidon hash has to agree byte-for-byte across the four implementations, the Merkle tree construction has to agree on the leaf encoding, the left-to-right child ordering, the default value for empty subtrees, and the tree height.

If any of these drift, the prover and verifier disagree silently and the protocol stops accepting proofs. We have integration tests in tests/merkle_cross_impl.rs that build the same tree from the JS and Rust sides and assert equality at every level. They run in CI on every commit.

Where this leaves the design space

The thing I keep coming back to: a Merkle tree is the cheapest possible primitive for “prove this thing is in this set” with a logarithmic-size proof. There is no clever lattice, no exotic accumulator, that comes close on the cost-per-byte budget Solana imposes. Verkle trees are interesting on paper and impractical in production for this surface (the polynomial commitment overhead dominates at the leaf counts we care about). KZG-based vector commitments are interesting for trusted-setup-tolerant rollups and overkill for a privacy pool.

So the answer is the boring one. A balanced binary Merkle tree, height 26, Poseidon hash, default-zero subtrees, sliding window of 32 recent roots. It is what Light Protocol shipped, what zera-sdk ships, and what every serious Solana privacy stack will ship through the rest of the decade. The interesting work is in the layers above (the SNARK that uses the inclusion proof, the nullifier set that enforces single-spend, the curve choice underneath the hash) — see the curve post for that.
