Latitude bare-metal primary, Fly.io backup: the deploy story for a 1-min-block chain
Sections
- The two-tier topology
- fly.toml, annotated
- App name: the rebrand artefact
- Kill signal and timeout
- Volumes
- Rolling deploy strategy
- Health checks
- Sizing
- The Latitude box
- Why a 1-minute-block chain hates cold starts
- Cost math: Latitude vs Fly
- A tradeoff table
- What changes after Phase 4
- What I changed my mind about
- Further reading
The 2026-04-13 commit d3d532cc ("deploy: vanta v1 LIVE on Latitude") is the moment Vanta moved from “regtest on a Mac mini under my desk” to “mainnet on a real-world internet host.” The seed node IP — 64.34.82.145:9333 — has been the bootstrap addnode in the desktop wallet’s auto-config since that commit.
What the commit message doesn’t tell you is that there’s a second deploy target. The fly.toml in the repo declares an 11-region fleet on Fly.io, hardcoded to an old zeracoin-seed app name. That fleet is the backup — the failover that the network falls back to when the bare-metal box goes down. Bare metal is primary. Fly is the safety net.
This post is the architecture, the fly.toml walk-through, the cost math that makes bare metal cheaper than equivalent Fly machines, and a candid paragraph about why a 1-minute-block chain particularly hates cold starts.
The two-tier topology
There’s a single primary bare-metal box, and a fleet of small Fly machines. The wallet’s auto-config lists both IPs for redundancy:
addnode=64.34.82.145:9333 # Latitude bare metal — primary
addnode=66.241.124.138:9333 # Fly.io fleet — backup
The node’s P2P layer (inherited from Bitcoin Core) connects to whichever it can reach first and rotates if a peer disappears. There’s nothing fancy here — Bitcoin Core’s peer discovery does the work. The architecture is “primary host, secondary host, network sorts itself out.”
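If you want to see which tier a running node actually latched onto, the stock getpeerinfo RPC shows it. A minimal sketch, assuming the CLI binary is named vanta-cli (my inference from vantad, not confirmed by the repo):

```sh
# list connected peers; the seed you bootstrapped from shows up as
# 64.34.82.145:9333 (Latitude) or 66.241.124.138:9333 (Fly)
vanta-cli getpeerinfo | grep '"addr"'
```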
The reason for two tiers (and not just two bare-metal boxes, or just a Fly fleet) is operational. Bare metal is cheap when you can give it your full attention. Bare metal is brittle when you can’t — disk failures happen, ISPs renumber, hardware ages. The Fly fleet is the “I am asleep, the chain stays up” insurance.
fly.toml, annotated
The full fly.toml is short. The interesting parts are below.
App name: the rebrand artefact
app = "zeracoin-seed"
primary_region = "iad"
The Fly app is still named zeracoin-seed — the pre-rebrand name. Renaming a Fly app requires recreating it (you lose the IPs and volumes), and the IPs are baked into the desktop wallet’s addnode lines. Recreating the app would force a wallet upgrade for every existing user.
The fix lives in commit 1b72aec6 — fly: match actual app name (zeracoin-seed) + clamp grace_period — which is the moment I committed to postponing the rebrand and updated the deploy script to match the actual app name instead of pretending we’d already migrated. The tradeoff: an ugly artefact in fly.toml versus forcing a migration that every existing user has to participate in. The artefact wins.
Kill signal and timeout
kill_signal = "SIGTERM"
kill_timeout = "120s"
Bitcoin Core flushes its database on shutdown. Get SIGKILL’d mid-flush and you can corrupt chainstate or block files. The 2-minute kill_timeout is how long Fly’s orchestrator waits after SIGTERM before escalating to SIGKILL; in practice vantad flushes in 10–20 seconds, so 120 seconds is generous insurance.
Fly defaults to a 5-second kill_timeout. Five seconds is not enough to flush a UTXO database, full stop. Every Bitcoin-Core deploy I’ve seen on Fly that didn’t override this had at least one chainstate-corruption incident. Override it.
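If you want to measure how long a clean shutdown actually takes on your hardware before picking a timeout, something like this works (again assuming a bitcoin-cli-style vanta-cli):

```sh
# ask the node to flush and exit, then wait for the process to disappear;
# the elapsed time is what your kill_timeout has to cover
time ( vanta-cli stop && while pgrep -x vantad > /dev/null; do sleep 1; done )
```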
Volumes
[mounts]
source = "vanta_data"
destination = "/root/.vanta"
A persistent volume mounted at ~/.vanta — the Bitcoin Core data dir. Fly creates one volume per machine (the volume names get auto-numbered: vanta_data, vanta_data_v2, etc). The volume survives machine restarts; only a fly volumes destroy deletes it.
The data dir contains chainstate, blocks, the mempool, the peers cache, and the wallet (if any). On a fresh deploy this is empty and the machine does an initial-block-download from peers; on a restart it picks up where it left off. The volume is what makes “restart a machine” cheap and “destroy a machine” expensive.
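Volumes have to exist before a machine can mount them. Creating one looks roughly like this; the 40 GB size is illustrative, I don't know the fleet's actual volume size:

```sh
# one volume per region that runs a machine; repeat with --region for each region
fly volumes create vanta_data --app zeracoin-seed --region iad --size 40
```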
Rolling deploy strategy
[deploy]
strategy = "rolling"
max_unavailable = 0.25
wait_timeout = "10m"
Rolling deploys take at most 25% of the fleet down at once. With 11 machines spread across 11 regions, that’s about 3 machines unavailable during any given deploy. The other 8 keep the network reachable for the wallet’s addnode lookups.
wait_timeout = "10m" gives each machine ten minutes to come back up and pass health checks before the deploy considers it failed. Bitcoin Core sometimes takes that long to verify chainstate at startup, especially on a small machine; default Fly wait_timeout (5m) was tripping us during deploys and leaving the cluster in a partially-deployed state.
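Nothing in the deploy command itself changes, since the strategy and timeouts come from fly.toml, so a deploy plus a watch on machine states is just the standard flyctl pair:

```sh
fly deploy --app zeracoin-seed   # rolling deploy; strategy and timeouts come from fly.toml
fly status --app zeracoin-seed   # per-machine state while the roll is in flight
```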
Health checks
[[services]]
internal_port = 9333
protocol = "tcp"
auto_stop_machines = false
auto_start_machines = true
[[services.ports]]
port = 9333
[[services.tcp_checks]]
interval = "30s"
timeout = "5s"
grace_period = "1m"
auto_stop_machines = false is intentional. Fly’s autostop will spin a machine down after a few minutes of no traffic. A seed node with no traffic is suspicious, but it’s not “stop the machine” suspicious — peer discovery is bursty, and a seed that’s stopped when a wallet starts up is a seed that’s not doing its job.
auto_start_machines = true lets Fly start a stopped machine on a cold tcp connection. This is the safety net for any case where the autostop did fire.
tcp_checks is a 30-second TCP-handshake probe against port 9333. If vantad dies or wedges, its P2P listener goes away, the TCP check fails, and Fly restarts the machine. The grace_period = "1m" is the startup window where we don’t penalise a machine for being mid-IBD.
grace_period is capped at 1m by Fly — anything higher gets clamped, which is a thing I learned by setting it to 5m and watching the deploy log it as “1m (clamped).” The 1-minute window is enough for a warm restart but not enough for a cold IBD; we work around it by not destroying machines casually.
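Fly's tcp_check is just a TCP handshake, so you can reproduce what the orchestrator sees by hand from anywhere on the internet:

```sh
# succeeds if the P2P listener accepts a connection within 5 seconds,
# which is what the 30-second health check probes for
nc -vz -w 5 66.241.124.138 9333
```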
Sizing
[vm]
size = "shared-cpu-1x"
memory = "2gb"
swap_size_mb = 1024
shared-cpu-1x is Fly’s smallest paid tier. 2 GB RAM is bumped from the default 1 GB because txindex=1 plus the UTXO set needs headroom on a Vanta-sized chain. 1 GB swap is insurance against OOM kills during IBD bursts (specifically: the moment when the UTXO set is being loaded into memory at startup).
This is sized for a seed node, not a miner node. We don’t run mining workloads on Fly. The Bitaxe rig at home is the actual mining setup.
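If you need to change the memory allocation on a running fleet without an edit-and-redeploy cycle, flyctl can do it directly; fly.toml stays the source of truth on the next deploy:

```sh
# bump every machine in the app to 2 GB; the [vm] section overrides this on the next fly deploy
fly scale memory 2048 --app zeracoin-seed
```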
The Latitude box
The bare-metal primary is on Latitude.sh (formerly Latitude.net), a smaller-than-OVH-but-bigger-than-Hetzner bare-metal provider with hourly billing. The spec is a single AMD Ryzen 9, 32 GB ECC RAM, 1 TB NVMe, with a /29 subnet and an unmetered 1 Gbps port. TODO: Dax confirm the exact tier — I have it as c2.medium.x86 but want to verify against the Latitude billing dashboard.
What it runs:
- vantad — the L1 node, listening on port 9333 (P2P) and 9332 (RPC, bound to localhost).
- vanta-node — the L2 sidecar, listening on port 9380 for the REST API.
- nginx — TLS termination for the L2 REST API (port 443 → 9380).
- The Bitaxe pool (port 3333) — the home rig actually plugs into a separate machine, but the pool stratum server lives on the Latitude box.
- The vanta-explorer (port 80 → 8080) — block explorer.
- The fly-deploy mirror — a backup of the Fly fleet’s deploy state, in case Fly itself goes down for an extended period.
This is more than a “seed node.” It’s the primary operational deploy of the chain. The Fly fleet is, again, the seed fallback — they don’t run the explorer or the L2 sidecar. They just keep the P2P network reachable.
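The nginx piece is the only one above that isn't captured in the repo's deploy configs, so here is a minimal sketch of the 443 → 9380 termination. The hostname and certificate paths are placeholders, not the production values:

```nginx
# sketch of the reverse-proxy vhost in front of vanta-node's REST API
server {
    listen 443 ssl;
    server_name api.example.invalid;   # placeholder hostname

    ssl_certificate     /etc/letsencrypt/live/api.example.invalid/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.invalid/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:9380;   # the L2 sidecar
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```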
Why a 1-minute-block chain hates cold starts
Worth dwelling on this. On Bitcoin (10-minute blocks), a node that’s been off for an hour comes back up and is six blocks behind. Catching up is fast. The chain’s “average” block production rate is generous enough that a 60-second startup delay is invisible.
On Vanta (1-minute blocks), an hour off is sixty blocks behind. A 60-second startup is one full block of latency. If the seed nodes are slow to come back up, wallet UX degrades visibly: the user opens the wallet, sees “syncing,” and waits sixty seconds where Bitcoin would have synced in ten.
WARNING: This is the operational property that makes Fly’s autostop dangerous for a fast-block chain. A seed node that’s been auto-stopped after 30 minutes of idle, then woken up by a wallet’s first connection, takes ~15 seconds of cold start. During that 15 seconds, the wallet sees no peers and reports “L1 disconnected.” This is a real user-visible regression compared to a warm seed.
The mitigations are stacked:
- auto_stop_machines = false in fly.toml — Fly never stops the seeds.
- The Latitude bare-metal primary handles 99% of the bootstrap traffic, so most wallets never even hit the Fly fleet.
- The Fly machines keep each other warm with their own P2P traffic — bitcoind’s peer-keepalive interval is short enough that they stay active even with no client traffic.
- The Latitude box has a systemd unit with Restart=always, so any local crash recovers in under 10 seconds (a sketch of the unit follows this list).
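For completeness, the shape of that systemd unit. Binary paths and the vanta-cli name are my assumptions, not copied from the box:

```ini
# /etc/systemd/system/vantad.service (illustrative sketch)
[Unit]
Description=Vanta L1 node
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/vantad -datadir=/root/.vanta
ExecStop=/usr/local/bin/vanta-cli stop
Restart=always
RestartSec=5
# mirror the Fly kill_timeout: give the database flush two minutes
TimeoutStopSec=120

[Install]
WantedBy=multi-user.target
```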
I would not run a fast-block chain on a serverless-by-default platform. Fly is a great fit because it can be configured to behave like an always-on host. Its defaults are not one.
Cost math: Latitude vs Fly
Approximate, monthly:
| Component | Latitude (bare metal) | Equivalent Fly |
|---|---|---|
| 1× AMD Ryzen 9 (8c/16t) | ~$140 | shared-cpu-8x: ~$160 |
| 32 GB RAM | included | ~$80 ($2.50/GB) |
| 1 TB NVMe | included | ~$150 ($0.15/GB) |
| 1 Gbps unmetered | included | bandwidth metered, est. $30 |
| Total per box | ~$140 | ~$420 |
Latitude’s all-included pricing for a single bare-metal box is roughly one third the cost of an equivalently-specced Fly machine. The Fly fleet (11 small seeds at roughly $7/month each) costs another ~$80/month combined.
So the total bill: Latitude $140 + Fly $80 = ~$220/month, versus an all-Fly setup at ~$500/month for a worse outcome (no actual bare-metal performance for the L2 indexer, no NVMe write-throughput for the chainstate, no dedicated network port).
This is a textbook case for hybrid deploy. The thing you’re optimising for cost on (the heavy, always-on workload) goes on bare metal. The thing you’re optimising for availability on (the geographic-redundancy seed fleet) goes on the platform with built-in geographic distribution.
A tradeoff table
I keep telling people to do this kind of comparison explicitly, so:
| Option | Cost (1 yr) | Latency to seed | Cold-start risk | Operational burden |
|---|---|---|---|---|
| Bare metal only (Latitude) | ~$1,700 | Variable by region (single PoP) | Low — always on | High if hardware fails |
| Fly fleet only (11 regions) | ~$5,000 | Low (regional anycast) | High if autostop is enabled | Low — managed platform |
| Hybrid (Latitude primary + Fly backup) | ~$2,600 | Low (Fly fronts geographic) | Low (primary always on) | Medium |
| DigitalOcean / Linode dedicated | ~$2,000 | Moderate (one PoP per droplet) | Medium | Medium |
| Hetzner dedicated | ~$1,400 | High (mostly EU PoPs) | Low | Medium |
The Hetzner option is genuinely tempting on cost grounds — half the price of Latitude. The reason I didn’t pick it for this chain is that Hetzner’s IP ranges are widely flagged by reputation services as “spam-adjacent” (because they’re cheap and hosters use them for everything), and a small-network seed node whose IP gets transiently blocked by some random ISP’s anti-spam filter is a problem I do not want.
DigitalOcean’s catch is bandwidth: egress beyond the included transfer is billed at $0.01/GB, and a chain seed serving IBD to fresh nodes can easily push 100 GB/day during a busy period.
What changes after Phase 4
Phase 4 in the architecture roadmap is “full Rust node rewrite using rust-bitcoin stack.” When that lands, the deploy story shifts:
- The L2 sidecar and L1 node are one binary, not two. Operationally that’s a smaller blast radius — one PID to monitor instead of two.
- The Rust node is statically linked and ships as a single ~30 MB binary. Container size collapses.
- We can in principle deploy on smaller Fly machines (256 MB instead of 2 GB) once the C++ is gone.
But Phase 4 is the future. The current deploy story is “C++ node + Rust sidecar on bare metal, with a Fly fleet of C++-node-only seeds for failover.”
What I changed my mind about
I started this project assuming Fly was the right deploy target for everything. It’s a great platform, the developer experience is unmatched on its tier, and the ergonomics of fly deploy after years of Kubernetes is genuinely refreshing.
The thing that changed my mind was the cold-start property. A 1-minute-block chain has a different operational profile than a request-response web service. Fly’s defaults — autostop, autoresurrect on demand, regional load balancing — are tuned for a workload where 100 ms latency is fine and 5-second cold starts are tolerable. Neither tolerance holds for a chain seed.
Once I’d configured Fly out of its defaults — auto_stop_machines = false, larger memory, longer kill_timeout, longer wait_timeout — I was running a Fly machine as if it were an always-on box. At which point: an always-on box is what bare metal is, at one-third the price, with a real network interface and dedicated NVMe.
The Fly fleet still has a job — geographic redundancy, multi-region warm seeds — that bare metal can’t do without a substantial multi-PoP investment. So Fly stays as the backup ring. Latitude is the primary. Both are needed; neither is sufficient.
Further reading
- fly.toml — the Fly config this post walks through
- fly-deploy.sh — the multi-region deploy wrapper
- doc/vanta-architecture.md — the infra section in the architecture doc
- Dockerfile — the container both Latitude and Fly run
- Mining Vanta with a Bitaxe BM1368 — the home-rig side of the operation
- What running a Bitcoin mine taught me — the small-operator unit-economics post that informs all of this