Skill Issue Dev | Dax the Dev

RFC 004: Blog MCP tool surface

The contract for a personal MCP server that exposes blog.skill-issue.dev to AI agents — search posts, fetch posts as plain text, list notes, and discover the canonical /llms.txt index, without scraping the public site.

Status: proposed. Up for review; open to feedback before adoption.

by Dax the Dev

Summary

This RFC defines the contract for a small Model Context Protocol server that re-publishes blog.skill-issue.dev as a tool surface for AI agents. It is the scope-for-scope counterpart to the zera-sdk MCP server — same protocol, entirely different domain. Where @zera-labs/mcp-server exposes shielded-pool primitives, this RFC’s server exposes the blog corpus: posts, notes, design docs (this collection), and the canonical /llms.txt index.

Status is proposed because the surface is small and contentious only at the edges (rate-limiting, request shape).

Motivation

The blog already publishes machine-readable indices: /llms.txt, /llms-full.txt, /feed.json, /atom.xml, /rss.xml, /notes.xml. Agents that want to read the corpus today scrape one of those, parse the markdown, and possibly fetch individual posts. That works, but:

  • An agent reading the site makes N requests where one would do; nothing in the HTML guides it to the right resource the way a tool-call description can.
  • Agents do not benefit from the structured frontmatter (tags, series, pubDate) that the site already has — they re-derive it from prose.
  • A search query against the corpus has to be implemented per-agent because there is no /search endpoint that returns structured results (the site uses Pagefind client-side).

A purpose-built MCP server fixes all three. It also costs almost nothing: the server is a thin wrapper over the same content collections that the Astro site already uses.

Detailed design

Deployment shape

The server is a separate package, not part of this Astro project’s runtime. It runs as a stdio MCP server invoked by the agent’s local config (Claude Desktop, Cursor, Continue, etc.):

{
  "mcpServers": {
    "skill-issue-blog": {
      "command": "npx",
      "args": ["-y", "@daxts/blog-mcp"],
      "env": {
        "BLOG_BASE_URL": "https://blog.skill-issue.dev"
      }
    }
  }
}

The server fetches from the public blog at runtime (it is not bundled with the corpus) so that the corpus stays canonical at the deployed URL. There is no registry-style daemon; each agent runs its own.

Tools

blog_list_posts

Return the post index, sorted by date descending, paginated.

Input:

  • limit (number, default 25, max 100)
  • offset (number, default 0)
  • tag (string, optional) — filter by tag
  • series (string, optional) — filter by series

Output: Array<{ slug: string; title: string; description: string; pubDate: string; tags: string[]; series?: string; }>

Implementation: fetch /feed.json, transform.
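The transform is a pure function over the feed items, which makes the pagination and filter semantics easy to pin down. A minimal sketch, assuming /feed.json is a JSON Feed whose items carry a `tags` array and a hypothetical `_series` extension (both field names are assumptions, not the site’s confirmed schema):

```typescript
// Sketch of the blog_list_posts transform over parsed /feed.json items.
// `_series` is a hypothetical JSON Feed extension field, not a confirmed one.
interface FeedItem {
  id: string;              // canonical post URL
  title: string;
  summary?: string;
  date_published: string;  // ISO 8601
  tags?: string[];
  _series?: string;
}

interface PostSummary {
  slug: string;
  title: string;
  description: string;
  pubDate: string;
  tags: string[];
  series?: string;
}

interface ListPostsInput {
  limit?: number;
  offset?: number;
  tag?: string;
  series?: string;
}

function listPosts(items: FeedItem[], input: ListPostsInput = {}): PostSummary[] {
  const limit = Math.min(input.limit ?? 25, 100); // contract: default 25, max 100
  const offset = input.offset ?? 0;
  return items
    .filter((i) => (input.tag ? (i.tags ?? []).includes(input.tag) : true))
    .filter((i) => (input.series ? i._series === input.series : true))
    // contract: date descending
    .sort((a, b) => Date.parse(b.date_published) - Date.parse(a.date_published))
    .slice(offset, offset + limit)
    .map((i) => ({
      // last non-empty path segment of the canonical URL
      slug: i.id.split("/").filter(Boolean).pop() ?? i.id,
      title: i.title,
      description: i.summary ?? "",
      pubDate: i.date_published,
      tags: i.tags ?? [],
      ...(i._series ? { series: i._series } : {}),
    }));
}
```

Filtering before pagination means `offset`/`limit` page through the filtered set, which is the behaviour an agent iterating a tag wants.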

blog_get_post

Fetch the markdown body of a single post.

Input:

  • slug (string, required)
  • format ("markdown" | "plain", default "markdown")

Output: { slug, title, description, pubDate, body, wordCount }

Implementation: fetch /blog/<slug>/index.md if available, else fall back to scraping /blog/<slug>/ and stripping the layout. The site already publishes /llms-full.txt which is the entire corpus inline; for individual posts the markdown source is the cleanest fetch.
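Once the markdown source is fetched, shaping the response is frontmatter parsing plus a word count. A sketch, assuming the posts carry simple `key: value` YAML frontmatter with `title`, `description`, and `pubDate` fields (the real frontmatter schema is whatever the Astro content collection defines):

```typescript
// Sketch of shaping the blog_get_post response from raw markdown source.
// Assumes flat `key: value` frontmatter; nested YAML would need a real parser.
interface PostResponse {
  slug: string;
  title: string;
  description: string;
  pubDate: string;
  body: string;
  wordCount: number;
}

function shapePost(slug: string, raw: string): PostResponse {
  const meta: Record<string, string> = {};
  let body = raw;
  // split off a leading `---\n...\n---` frontmatter block, if present
  const fm = raw.match(/^---\n([\s\S]*?)\n---\n?/);
  if (fm) {
    body = raw.slice(fm[0].length);
    for (const line of fm[1].split("\n")) {
      const m = line.match(/^(\w+):\s*(.*)$/);
      if (m) meta[m[1]] = m[2].replace(/^["']|["']$/g, "");
    }
  }
  const trimmed = body.trim();
  return {
    slug,
    title: meta.title ?? slug,
    description: meta.description ?? "",
    pubDate: meta.pubDate ?? "",
    body: trimmed,
    wordCount: trimmed.split(/\s+/).filter(Boolean).length,
  };
}
```

The `"plain"` format would additionally strip markdown syntax from `body`; that step is omitted here.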

blog_search

Server-side keyword search over titles, descriptions, and bodies.

Input:

  • query (string, required)
  • collection ("blog" | "notes" | "docs" | "all", default "all")
  • limit (number, default 10, max 50)

Output: Array<{ slug: string; collection: string; title: string; snippet: string; score: number }>

Implementation: the blog already ships a Pagefind index for the production search box. The MCP server fetches the Pagefind index (or a JSON-shaped derivative — TODO: Dax confirm whether to publish a search-only JSON endpoint) and runs the same query the client would.
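Until the Pagefind question is settled, it is worth pinning down the fallback behaviour. The sketch below is a naive keyword scorer — explicitly not the Pagefind algorithm — that produces the tool’s output shape: title hits outweigh body hits, and the snippet is a window around the first body match. The weights are arbitrary illustrative choices:

```typescript
// Naive fallback scorer for blog_search, for the case where the Pagefind
// index turns out not to be consumable server-side. Not Pagefind's ranking.
interface SearchDoc {
  slug: string;
  collection: string;
  title: string;
  body: string;
}

interface SearchHit {
  slug: string;
  collection: string;
  title: string;
  snippet: string;
  score: number;
}

function search(docs: SearchDoc[], query: string, limit = 10): SearchHit[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const hits: SearchHit[] = [];
  for (const d of docs) {
    const title = d.title.toLowerCase();
    const body = d.body.toLowerCase();
    let score = 0;
    let first = -1; // offset of the earliest body match, for the snippet
    for (const t of terms) {
      if (title.includes(t)) score += 5; // title hits weigh more (arbitrary weight)
      const idx = body.indexOf(t);
      if (idx >= 0) {
        score += 1;
        if (first < 0 || idx < first) first = idx;
      }
    }
    if (score > 0) {
      const start = Math.max(0, first < 0 ? 0 : first - 40);
      hits.push({
        slug: d.slug,
        collection: d.collection,
        title: d.title,
        snippet: d.body.slice(start, start + 120),
        score,
      });
    }
  }
  // contract: best-first, default 10, max 50
  return hits.sort((a, b) => b.score - a.score).slice(0, Math.min(limit, 50));
}
```

Whatever backend is chosen, keeping this output shape stable means agents do not care which one they got.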

blog_list_notes

Return the notes index.

Input:

  • limit (number, default 25, max 100)
  • offset (number, default 0)
  • tag (string, optional)

Output: Array<{ slug: string; title?: string; pubDate: string; link?: string; tags: string[]; bodyExcerpt: string; }>

Implementation: parse /notes.xml.
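A sketch of that parse, assuming /notes.xml is RSS 2.0 with one `<item>` per note. A real implementation should use a proper XML parser; the regex walk here only illustrates the shape of the transform, and the element names (`title`, `link`, `pubDate`, `description`, `category`) are the standard RSS ones, not confirmed against the actual feed:

```typescript
// Illustrative-only regex parse of an RSS 2.0 notes feed into the
// blog_list_notes output shape. Use a real XML parser in the actual server.
interface NoteSummary {
  slug: string;
  title?: string;
  pubDate: string;
  link?: string;
  tags: string[];
  bodyExcerpt: string;
}

function parseNotes(xml: string): NoteSummary[] {
  const notes: NoteSummary[] = [];
  const items = xml.match(/<item>[\s\S]*?<\/item>/g) ?? [];
  for (const item of items) {
    // extract the text content of a single child element
    const tag = (name: string) =>
      item.match(new RegExp(`<${name}>([\\s\\S]*?)</${name}>`))?.[1].trim();
    const link = tag("link");
    notes.push({
      slug: link?.split("/").filter(Boolean).pop() ?? "",
      title: tag("title"),
      pubDate: tag("pubDate") ?? "",
      link,
      tags: (item.match(/<category>([\s\S]*?)<\/category>/g) ?? []).map((c) =>
        c.replace(/<\/?category>/g, "").trim(),
      ),
      bodyExcerpt: (tag("description") ?? "").slice(0, 280),
    });
  }
  return notes;
}
```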

blog_list_docs

Return the design-docs index from this collection (/docs).

Input:

  • status (enum "draft" | "proposed" | "accepted" | "shipped" | "rejected" | "superseded" | "all", default "all")
  • limit (number, default 25)

Output: Array<{ slug: string; title: string; description: string; status: string; date: string; }>

Implementation: TODO — depends on the docs collection exposing a JSON index. We will ship a /docs.json endpoint alongside this RFC if not present yet.
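Pending that endpoint, the tool-side logic is trivial. A sketch assuming the hypothetical /docs.json payload mirrors the tool’s output shape (the endpoint does not exist yet, so every field name here is an assumption):

```typescript
// Sketch of blog_list_docs over a hypothetical /docs.json payload.
// The endpoint and its schema are not shipped yet; shape is assumed.
type DocStatus = "draft" | "proposed" | "accepted" | "shipped" | "rejected" | "superseded";

interface DocEntry {
  slug: string;
  title: string;
  description: string;
  status: DocStatus;
  date: string; // ISO 8601
}

function listDocs(
  docs: DocEntry[],
  status: DocStatus | "all" = "all",
  limit = 25,
): DocEntry[] {
  return docs
    .filter((d) => status === "all" || d.status === status)
    .sort((a, b) => Date.parse(b.date) - Date.parse(a.date)) // newest first
    .slice(0, limit);
}
```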

blog_about

A canned, low-friction “who is this person” tool. Returns the author bio, current focus, and key links (resume, GitHub, Cal.com).

No input.

Output:

{
  name: "Dax the Dev",
  pitch: string,
  focus: string,            // pulled from /now
  links: { github, linkedin, cal, resume, rss },
  recent_posts: Array<{ slug, title, pubDate }>,
}

Implementation: pulls from /about and /now and the most recent feed entries. This tool exists because agents asking “what is this person up to?” should get one good answer, not a synthesis of three scrapes.

Things this server will not do

  • No write tools. Agents cannot post, edit, or delete content. This is a read-only mirror of a public site.
  • No private endpoints. The server fetches from the public URL; there is no shortcut to drafts.
  • No analytics or telemetry from the server. Each agent’s local invocation is its own concern.
  • No bundled corpus. The corpus lives at the deployed URL; the server is a thin transformer.

Caching and rate limiting

The server caches blog_list_posts, blog_list_notes, and blog_list_docs for 5 minutes in a local on-disk cache (~/.cache/skill-issue-blog-mcp/). Individual post fetches are cached for 24 hours, keyed by slug + ETag.
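The policy above reduces to two TTLs and a key scheme. A minimal sketch — shown as an in-memory map for clarity, where the real server would persist entries under ~/.cache/skill-issue-blog-mcp/; the injectable clock is only there to make expiry testable:

```typescript
// Sketch of the cache policy: index tools at 5 minutes, individual posts at
// 24 hours keyed by slug + ETag. In-memory stand-in for the on-disk cache.
const INDEX_TTL_MS = 5 * 60 * 1000;
const POST_TTL_MS = 24 * 60 * 60 * 1000;

interface CacheEntry {
  value: string;
  storedAt: number;
  ttlMs: number;
}

class TtlCache {
  private store = new Map<string, CacheEntry>();

  // clock is injectable so expiry can be tested deterministically
  constructor(private now: () => number = Date.now) {}

  postKey(slug: string, etag: string): string {
    return `post:${slug}:${etag}`; // a changed ETag is simply a new key
  }

  set(key: string, value: string, ttlMs: number): void {
    this.store.set(key, { value, storedAt: this.now(), ttlMs });
  }

  get(key: string): string | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > entry.ttlMs) {
      this.store.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }
}
```

Keying posts on slug + ETag means a republished post is a cache miss automatically, with no explicit invalidation path.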

There is no rate limit imposed by the MCP server itself. The blog is on Cloudflare Pages and inherits Cloudflare’s edge throttling. If an agent goes wild, Cloudflare returns 429 and the server propagates that to the agent.

Alternatives considered

A1: Don’t ship an MCP server; rely on /llms.txt and /llms-full.txt

These already exist on the blog. An agent that knows about them works fine.

Pros: zero new code, zero new install for agents. Cons: the MCP layer offers tool descriptions that guide agent decisions; an agent reading /llms.txt has to figure out the structure on its own. Also: search is not solved by the txt indices.

Status: complementary, not alternative. The MCP server uses the txt indices internally; their existence does not make the MCP server redundant.

A2: Ship the server as a Cloudflare Worker (HTTP MCP transport)

Once the HTTP transport for MCP stabilises, host the server on the same Cloudflare Pages deployment as the blog. Agents connect over HTTPS, no local install.

Pros: zero install for users, latest corpus always. Cons: HTTP MCP transport is still in flux as of this writing; stdio is the well-supported default. Also: a hosted MCP server is a target for abuse in a way that a local stdio server is not.

Status: future work. The stdio version ships first; the HTTP version is a follow-up RFC if HTTP MCP transport stabilises.

A3: Bundle the corpus with the server

Ship @daxts/blog-mcp with the entire corpus inlined, no network calls.

Pros: works offline, predictable, can be diff-ed at install time. Cons: corpus goes stale; users would need to update the package on every blog post. Defeats the purpose.

Status: rejected.

A4: Ship as an OpenAPI / mcp.json spec instead of a server

Publish a static /.well-known/mcp.json that describes the available tools, and require agents to implement a generic MCP-from-OpenAPI bridge.

Pros: zero runtime dependency. Cons: nobody has shipped that bridge yet. Stdio is what works today.

Status: rejected for v1.

Drawbacks

  • Yet another package to maintain. Mitigated by the server being a thin (~300 LOC) transformer.
  • Tool descriptions need to be good. A bad tool description is worse than no tool — the agent calls it for the wrong reason. Iterating on descriptions costs nothing but attention.
  • The blog_search tool depends on the Pagefind index format. If Pagefind changes its format on a major bump, the server breaks. Pinning the Pagefind version in the server’s deps mitigates.

Open questions

  1. Does the blog publish a /docs.json index? Today it does not. Adding one is part of the Lever-1 work that contains this RFC. (Meta: this RFC is itself a doc in the collection it describes.) TODO: Dax confirm publication ordering.
  2. Should blog_get_post return rendered HTML in addition to markdown? Some agents prefer rendered text. Deferred — markdown first, HTML if requested.
  3. How does the MCP server discover new content? Today, by re-fetching the index every cache TTL. A more efficient design would subscribe to the RSS feed and poll for changes. Deferred to v0.2.
  4. Authentication. The blog has no private endpoints, so there is nothing to authenticate. If we ever add a private “ask me anything” endpoint (see RFC 002 sibling work), this changes. Out of scope.
  5. What is the server’s relationship to the zera-sdk MCP server? They are independent. An agent can install both. There is no shared code; the only commonality is the MCP SDK they both wrap. Documented in the README to avoid confusion.

Adoption

Status: proposed. Acceptance gate:

  1. Reference implementation in a sibling repo (likely Dax911/blog-mcp).
  2. Worked example: an agent answers “what has Dax written about Rust supply-chain attacks?” using blog_search followed by blog_get_post.
  3. The blog publishes the supporting endpoints this server depends on (/docs.json, the search-friendly Pagefind export — TODO: Dax confirm scope).

The progression to accepted requires a public package on npm and a documented Claude Desktop install. Progression to shipped is when there is a non-author agent that habitually uses it.

Security considerations

  • The MCP server fetches from the public blog at runtime. That is a network egress on the user’s machine. Agents that block ambient network egress need to allowlist blog.skill-issue.dev.
  • No write surface, so no abuse via the agent. The server cannot post comments, edit posts, or anything else; everything is read-only.
  • Cache poisoning is the worst-case bug. A malicious response from a network-on-path attacker could be cached on the user’s disk for 24 hours. Mitigation: HTTPS with cert pinning is overkill; standard cert validation + ETag is sufficient.
  • The server is not a circumvention of robots policies. The blog has no robots restriction on its public corpus, so this is moot, but worth stating: if the blog ever adds a robots policy, the MCP server respects it.
