skip to content
Skill Issue Dev | Dax the Dev
search
← all docs

RFC 003: Rusty Pipes detection heuristic

A runtime + install-time heuristic for flagging suspect Rust binaries shipped via npm. Builds on the Rusty Pipes posts; aimed at a CI gate or an `npm audit`-class tool.

draft Sketch — actively being shaped, not yet endorsed.

by Dax the Dev

Summary

The Rusty Pipes series documented a class of npm supply-chain attacks where a maintainer ships a small, ostensibly benign JavaScript package whose postinstall (or, more sneakily, runtime require) loads a precompiled Rust .node / .dylib / .so binary, and that binary does the malicious work — out of view of every JS-aware static analysis tool. The follow-up post (Rust in Peace) walked through how to actually build such a payload, and the exploit post showed it working end-to-end.

This RFC defines a heuristic for flagging suspect Rust binaries in npm packages. It is not a malware scanner. It is a “this package contains a native artefact whose provenance does not match its claims” detector, suitable for running as a CI gate or as a one-shot CLI before npm install.

Motivation

Pure-JS supply-chain attacks (event-stream, ua-parser-js, colors, faker) are detectable by static analysis: the payload is JS, you can grep it, you can run it under a sandbox. Native attacks are not: a .node file is opaque to every JS-ecosystem tool. The maintainer can ship a benign JS surface, a malicious binary, and npm audit is silent about both.

The Rusty Pipes attack surface, in order of decreasing audit visibility:

  1. postinstall script that downloads a binary. Detectable: the script is JS, you can read it. Mitigation: npm install --ignore-scripts is a known posture.
  2. postinstall script that compiles from source. Detectable: same as above. Plus, the build is reproducible from source.
  3. Prebuilt binary committed to the package. Opaque. The package contains a .node or platform-specific binary, the package.json main (or a require() deeper in the tree) loads it, and the binary’s behaviour is invisible to JS-aware tools.
  4. Conditional binary load. The binary is loaded only when a specific environment is detected — the runtime is Node 20, the platform is linux-x64, the host has git installed, etc. Even dynamic analysis under npm install may not exercise the bad path.

This RFC focuses on (3) and (4). The goal is not to prove a binary is malicious — that is undecidable — but to make shipping an unsanctioned native binary noisy enough that maintainers and downstream auditors notice.

Detailed design

The heuristic runs in two modes:

  • Install-time (preferred): hooks into the package install pipeline (a wrapper around npm/pnpm/yarn install, or a pre-publish hook in CI).
  • Audit-time (always available): scans an extracted package tarball or a node_modules tree.

In both modes the input is a package directory. The output is a verdict: clean, notable, flagged. Notable and flagged emit human-readable findings.

Signal 1: native artefact present without declared binary field

Packages that legitimately ship native code (@napi-rs/*, node-gyp projects, neon-rs projects) declare so in package.json — typically a binary field, an engines.node constraint, an optionalDependencies set of platform packages, or a napi field.

If the package contains any of *.node, *.dylib, *.so, *.dll, *.wasm and none of the legitimacy markers above is present, that is a Signal 1 hit. Severity: notable.

False-positive class: legacy packages predating the napi/neon conventions. Mitigation: maintain a small allowlist of packages that ship native artefacts the old-fashioned way (e.g. fsevents, bufferutil, utf-8-validate).

Signal 2: native artefact’s claimed source not present

If a package ships *.node but no Cargo.toml, binding.gyp, build.rs, src/lib.rs, or equivalent, the binary cannot be rebuilt from the package. That is suspicious because:

  • A user who wants to audit the binary cannot reproduce it.
  • A user who wants to harden their build by re-compiling from source cannot.
  • The npm registry has no way to verify the binary matches the source.

Severity: notable. Promoted to flagged if combined with Signal 4 or Signal 5.

Signal 3: binary architecture mismatch

A linux-x64 package containing a darwin-arm64 binary is at minimum a packaging mistake; at worst it is a maintainer accidentally publishing their dev-machine binary in place of the CI-built one. The heuristic uses file(1) (or its magic-bytes Node port) to read the Mach-O / ELF / PE header and compare against the expected platform from the os and cpu fields in package.json and the platform-suffixed package name.

Severity: notable. Promoted to flagged if the actual architecture is not declared by any optionalDependencies entry, because at that point the binary is for an architecture the package does not claim to support.

Signal 4: binary imports suspicious symbols

nm-style symbol enumeration (or for stripped binaries, string extraction) for known network-egress, process-spawn, and crypto-stealer symbols:

  • socket, connect, getaddrinfo, https?_request_* — outbound network from a binary that does not advertise itself as a network client is a strong signal.
  • execve, posix_spawn, CreateProcess — process spawn.
  • keytar, wallet.dat, id_rsa, ~/.aws, ~/.config/solana, id.json (Solana wallet path) — string constants pointing at known credential store paths.
  • fetch_proxy_settings, BrowserData — Chromium-profile-stealer markers.

Severity: flagged. The presence of any one of these in a package whose stated job is “format dates” or “parse YAML” is the actual smoking gun.

Signal 5: binary is dynamically loaded by a non-obvious code path

Walk the JS surface of the package looking for require(...), import(...), and process.dlopen calls whose argument resolves at runtime to a native artefact. If the load is gated by:

  • An environment variable check (if (process.env.NODE_ENV === 'production') require('./bin/x.node'))
  • A platform check whose set of supported platforms is suspiciously narrow (only linux-x64)
  • A version check on Node, OpenSSL, or a peer dep

the heuristic flags it as notable. The Rusty Pipes exploit specifically used a Node-version gate to avoid running in CI matrices that test on multiple Node versions.

Severity: notable. Combined with Signal 4, flagged.

Signal 6: maintainer churn at the boundary of a binary-shipping release

This signal needs registry metadata, not just the package contents. The heuristic looks at:

  • The set of npm publishers who have ever published the package.
  • Whether the most recent publisher is publishing for the first time.
  • Whether the most recent release is the first to ship a *.node file.
  • Whether the version bump that introduced the binary was a patch-level bump (i.e. the maintainer who is shipping native code for the first time is doing so on a 1.4.7 → 1.4.8 release, not a major).

Hit-conjunction (new publisher) ∧ (first binary-shipping release) ∧ (patch bump) is flagged. This is the exact attacker pattern from the exploit post.

Reporting

Each finding is a JSON object:

{
  "package": "[email protected]",
  "signal": "S4",
  "severity": "flagged",
  "evidence": {
    "binary": "lib/native/linux-x64/colorstring.node",
    "symbols": ["getaddrinfo", "execve", "wallet.dat"]
  },
  "explainer": "https://docs.example.com/rusty-pipes/S4"
}

The CLI exits non-zero on any flagged finding so it can be wired into CI directly.

Alternatives considered

A1: Sandbox postinstall and execute the binary in a tracer

Run the package install + a require() of the package’s main inside a seccomp/sandbox that records syscalls.

Pros: dynamic — sees what the binary actually does at runtime. Cons: defeated by environment-gated payloads (Signal 5). Also expensive: every npm install would take seconds longer. And: a sandbox bypass is itself a research target.

Status: complementary, not alternative. The dynamic analysis is a great addition to the static heuristics; it is not a substitute, because a binary that does nothing in a sandbox could still be malicious in production.

A2: Reproducible-build attestation

Require every package shipping a .node to ship a SLSA-style attestation pointing at the source commit and the build environment.

Pros: long-term correct answer. Cons: requires ecosystem-wide adoption. A heuristic that works today on packages that lack attestation is needed in the meantime.

Status: this RFC is the heuristic; SLSA is the future. They coexist.

A3: Block all .node files in packages that do not declare native code

The strict version of Signal 1.

Pros: maximally defensive. Cons: breaks legacy packages, breaks experimentation, breaks bufferutil-class packages that pre-date napi.

Status: rejected as a default; available as an opt-in --strict mode.

Drawbacks

  • False positives are real. Legitimate native packages that don’t follow napi conventions will trip Signal 1. The allowlist needs maintenance.
  • Symbol enumeration on stripped binaries is weak. Strings extraction catches the wallet.dat-class signal but not the getaddrinfo-class signal if the binary statically links a custom DNS resolver. Mitigation: combine with Signal 6 (publisher churn).
  • Signal 6 needs registry metadata. That requires a privileged or rate-limited path to the npm registry. Audit-time-on-tarball mode cannot compute Signal 6.

Open questions

  1. What is the exit-code policy? Should flagged always exit non-zero, or should there be a --no-fail flag for CI surveys that want to inventory rather than gate? Leaning toward “exit non-zero by default, --no-fail for inventory mode”.
  2. Where does this live? As a standalone CLI? An npm audit plugin? A pre-publish hook for maintainers to self-check? Probably all three, but the standalone CLI ships first.
  3. How does it handle optionalDependencies platform packages? A package like @foo/native-linux-x64 is a sibling package containing only the binary. The heuristic needs to scan those, not just the parent. Plumbing TBD.
  4. Should it also flag pure-JS obfuscation patterns? Out of scope for this RFC — that is a different attack class, well-covered by existing tools.

Adoption

This RFC is draft. The progression to proposed requires:

  1. A reference implementation that runs each signal against a corpus of known-good packages (top-1000 npm) and the reproductions from the Rusty Pipes series. Target false-positive rate <1% on the corpus, true-positive rate >95% on the reproductions.
  2. A worked example showing the heuristic flagging the Rusty Pipes exploit binary while not flagging @napi-rs/canvas.
  3. A documented allowlist process.

The progression to accepted requires a shipping CLI and at least one CI integration in an external project.

Security considerations

  • The heuristic is a target. Once attackers know what we look for, they will adapt. Signal 4 in particular is easily evaded by symbol stripping plus indirect call patterns. Signal 6 cannot be evaded without compromising additional npm accounts, which is the actual attacker work-factor we want to raise.
  • Running the heuristic itself is a potential attack surface. It walks tarballs and reads binaries. An attacker could craft a malformed .node to exploit our binary parser. Mitigation: use a hardened parser library (e.g. goblin’s pure-Rust object-file parser) and run the audit in a process with no network access.
  • Reporting to a central service is out of scope. A future version might centralise findings to track ecosystem health, but that introduces telemetry concerns and is not in this RFC.

References