The Tool class

Every tool the agent can call — and every matching CLI subcommand you can run yourself — is defined once as a ToolDefinition. A single definition drives three consumers:

  1. The Anthropic SDK (via input_schema: JSONSchema) so the model can call it.
  2. Commander.js via an auto-generated subcommand.
  3. Tests, which import the tool directly and call execute().

This lives in src/tools/tool.ts.


Shape of a tool

```ts
import { z } from "zod";
import type { ToolDefinition } from "../tool.ts";

const inputSchema = z.object({
  summary: z.string().describe("Summary of work done"),
});

const outputSchema = z.object({
  message: z.string(),
  is_error: z.boolean(),
});

export const completeTaskTool = {
  name: "complete_task",
  description:
    "Mark the current task as complete with a summary of what was accomplished.",
  group: "task",
  terminal: true,
  inputSchema,
  outputSchema,
  execute: async (input, ctx) => ({
    message: `Task completed: ${input.summary}`,
    is_error: false,
  }),
} satisfies ToolDefinition<typeof inputSchema, typeof outputSchema>;
```

Fields:

| Field | Purpose |
| --- | --- |
| name | Snake-case identifier; also the CLI subcommand name |
| description | Used for both the LLM tool definition and CLI help text |
| group | Groups tools into CLI namespaces (task, file, dir, …) |
| terminal | If true, the agent loop ends when this tool is called (e.g., complete_task, fail_task, wait_task) |
| inputSchema | Zod schema with .describe() per field; becomes JSON Schema for the model and Commander flags for the CLI |
| outputSchema | Zod schema guaranteeing the shape of the response |
| execute | The actual implementation, receiving validated input and a ToolContext |

ToolContext

Every tool receives a ToolContext:

```ts
interface ToolContext {
  conn: DbConnection;             // short-lived connection, scoped to this tool call
  dbPath: string;                 // for long-running tools that manage their own withDb
  projectDir: string;             // absolute path to the project
  config: Required<BotholomewConfig>;  // resolved config (API keys, model, …)
  mcpxClient: McpxClient | null;  // external MCP tools (may be null)
}
```

This is the only capability surface. A tool that isn't handed an mcpxClient can't reach the network; a tool that doesn't use conn or dbPath can't touch the database.
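Because the context is the whole capability surface, a test can exercise a tool by handing it a stub. The sketch below is a dependency-free illustration, not the project's test code: `DbConnection`, `McpxClient`, and `BotholomewConfig` are reduced to placeholder types (including the assumed `query` method), and `echoProjectTool` and `makeStubContext` are hypothetical names.

```typescript
// Placeholder types: the real DbConnection, McpxClient, and BotholomewConfig
// live in the project and are not shown in this document.
type DbConnection = { query: (sql: string) => Promise<unknown[]> };
type McpxClient = { listTools: () => Promise<string[]> };
type BotholomewConfig = { model?: string };

interface ToolContext {
  conn: DbConnection;
  dbPath: string;
  projectDir: string;
  config: Required<BotholomewConfig>;
  mcpxClient: McpxClient | null;
}

// Hypothetical tool that only reads projectDir from its context.
const echoProjectTool = {
  name: "echo_project",
  execute: async (input: { label: string }, ctx: ToolContext) => ({
    message: `${input.label}: ${ctx.projectDir}`,
    is_error: false,
  }),
};

// Stub context: the connection throws if touched and no MCP client is supplied,
// so a test fails loudly if the tool reaches for a capability it should not need.
function makeStubContext(overrides: Partial<ToolContext> = {}): ToolContext {
  return {
    conn: {
      query: async () => {
        throw new Error("unexpected DB access in test");
      },
    },
    dbPath: "/tmp/example-project/stub.duckdb",
    projectDir: "/tmp/example-project",
    config: { model: "stub-model" },
    mcpxClient: null,
    ...overrides,
  };
}

// Call execute() directly, the way a unit test would.
const stubRun = echoProjectTool.execute({ label: "dir" }, makeStubContext());
```

Passing `overrides` lets an individual test swap in a working fake for just the capability it wants to observe.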

conn vs dbPath

The executor (runAgentLoop / runChatTurn) wraps each tool call in withDb(dbPath, async (conn) => tool.execute(input, { ...ctx, conn })). That means:

  • ctx.conn is already open for the duration of one execute() call and will be closed immediately after. Use it for ordinary tools that do one or two quick queries.
  • ctx.dbPath is for tools that run long enough that holding the file lock would block the worker or CLI (e.g., membot_refresh re-fetching many URLs). Wrap each DB touch in await withDb(ctx.dbPath, async (conn) => { … }) so the lock is released between items.

DuckDB holds the file lock at the instance level. A tool that holds on to ctx.conn through a long network round-trip keeps that lock held for the whole wait. When in doubt, prefer granular ctx.dbPath wrapping.
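The granular pattern can be sketched as follows. This is an illustration under assumptions: `withDb` here is a stand-in that only records open and close events (the real helper opens DuckDB), and `fetchUrl`, `refreshAll`, and the `run` method are hypothetical names.

```typescript
// Event log that makes the open/close ordering visible.
const events: string[] = [];
type Conn = { run: (sql: string) => Promise<void> };

// Stand-in for the project's withDb: open, run the callback, always close.
async function withDb<T>(dbPath: string, fn: (conn: Conn) => Promise<T>): Promise<T> {
  events.push(`open:${dbPath}`);
  try {
    return await fn({
      run: async (sql) => {
        events.push(`sql:${sql}`);
      },
    });
  } finally {
    events.push("close");
  }
}

// Hypothetical slow network call; it runs with no connection open.
async function fetchUrl(url: string): Promise<string> {
  events.push(`fetch:${url}`);
  return `body of ${url}`;
}

// Long-running pattern: do the slow work outside withDb, then open a
// connection only for the write, so the lock is released between items.
async function refreshAll(ctx: { dbPath: string }, urls: string[]): Promise<number> {
  let refreshed = 0;
  for (const url of urls) {
    const body = await fetchUrl(url); // no lock held during the fetch
    await withDb(ctx.dbPath, async (conn) => {
      await conn.run(`store ${url} (${body.length} bytes)`);
    });
    refreshed += 1;
  }
  return refreshed;
}

const refreshDone = refreshAll({ dbPath: "demo.duckdb" }, ["a", "b"]);
```

In the recorded events, every `fetch` falls outside an `open`/`close` pair, which is the property that keeps the worker and CLI from being blocked.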


Anthropic adapter

toAnthropicTools() walks the registry and converts each Zod input schema to the Anthropic SDK's Tool type using z.toJSONSchema():

```ts
{
  name: "membot_write",
  description:
    "Write a file under context/. By default, fails if the path already exists — pass on_conflict='overwrite' to replace.",
  input_schema: {
    type: "object",
    properties: { /* derived from Zod */ },
    required: ["path", "content"],
  }
}
```

membot_write accepts an optional on_conflict: "error" | "overwrite" input (default "error"). A collision returns is_error: true, error_type: "path_conflict", and a next_action_hint that steers the model back to membot_read or a retry with on_conflict='overwrite'.

runAgentLoop() feeds this array into client.messages.create({ tools: ... }). When the model emits a tool_use block, the loop looks up the tool by name via getTool(name), validates the input against inputSchema, calls execute(), and returns the result as a tool_result block.

Terminal tools (the ones with terminal: true) tell the loop to stop. For workers, those are complete_task, fail_task, and wait_task — any of which transitions the task out of in_progress.
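The lookup, validate, execute, and stop steps described above can be sketched without any dependencies. This is a simplified illustration, not the project's loop: `SketchTool` and `handleToolUse` are hypothetical names, a hand-written `parse` function stands in for the Zod input schema, and the choice to keep looping after a validation failure is an assumption.

```typescript
type ToolResult = { message: string; is_error: boolean };

// Simplified stand-in for ToolDefinition; `parse` throws on invalid input.
interface SketchTool {
  name: string;
  terminal?: boolean;
  parse: (raw: unknown) => Record<string, unknown>;
  execute: (input: Record<string, unknown>) => Promise<ToolResult>;
}

interface ToolUseBlock { type: "tool_use"; id: string; name: string; input: unknown }
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error: boolean;
}

const registry = new Map<string, SketchTool>();
const getTool = (toolName: string): SketchTool | undefined => registry.get(toolName);

registry.set("complete_task", {
  name: "complete_task",
  terminal: true,
  parse: (raw) => {
    const summary = (raw as { summary?: unknown } | null)?.summary;
    if (typeof summary !== "string") throw new Error("summary: expected string");
    return { summary };
  },
  execute: async (input) => ({
    message: `Task completed: ${String(input["summary"])}`,
    is_error: false,
  }),
});

// Handle one tool_use block: look up, validate, execute, wrap as tool_result.
// `stop` tells the caller whether a terminal tool was executed.
async function handleToolUse(
  block: ToolUseBlock,
): Promise<{ result: ToolResultBlock; stop: boolean }> {
  const wrap = (content: string, is_error: boolean): ToolResultBlock => ({
    type: "tool_result",
    tool_use_id: block.id,
    content,
    is_error,
  });

  const tool = getTool(block.name);
  if (!tool) return { result: wrap(`Unknown tool: ${block.name}`, true), stop: false };

  let input: Record<string, unknown>;
  try {
    input = tool.parse(block.input);
  } catch (err) {
    // Validation failures go back to the model as a recoverable error.
    const reason = err instanceof Error ? err.message : String(err);
    return { result: wrap(`Invalid input: ${reason}`, true), stop: false };
  }

  const output = await tool.execute(input);
  return { result: wrap(JSON.stringify(output), output.is_error), stop: tool.terminal === true };
}

const okRun = handleToolUse({ type: "tool_use", id: "tu_1", name: "complete_task", input: { summary: "done" } });
const badRun = handleToolUse({ type: "tool_use", id: "tu_2", name: "complete_task", input: {} });
```

A caller would append each `result` to the next user message and break out of its loop once `stop` is true.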


CLI adapter

registerToolsAsCLI(program) iterates the registry and generates a Commander subcommand per tool, grouped by group:

```bash
botholomew membot read notes/meeting.md --offset 10 --limit 20
botholomew membot tree notes --max-depth 3
botholomew membot search "quarterly revenue"
```

Positional args and --options are derived from the Zod schema shape. The same validation that runs for the LLM runs here, so you get the same error messages.
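How a schema field becomes a flag is not spelled out here. As one plausible sketch, assume input fields are snake_case and map to kebab-case options in the flag syntax that Commander's `.option()` accepts. `FieldSpec` and `toCommanderFlag` are hypothetical, the field name `max_depth` is inferred from `--max-depth` above, and the descriptions are invented for illustration.

```typescript
// Hypothetical, simplified description of one input field.
interface FieldSpec {
  name: string; // snake_case field name from the schema
  kind: "string" | "number" | "boolean";
  description: string;
}

// Build a Commander-style flag string, e.g. "--max-depth <number>".
// Booleans become bare switches with no value placeholder.
function toCommanderFlag(field: FieldSpec): string {
  const kebab = field.name.replace(/_/g, "-");
  return field.kind === "boolean" ? `--${kebab}` : `--${kebab} <${field.kind}>`;
}

const exampleFields: FieldSpec[] = [
  { name: "max_depth", kind: "number", description: "Maximum depth to descend" },
  { name: "offset", kind: "number", description: "First line to return" },
  { name: "recursive", kind: "boolean", description: "Descend into subdirectories" },
];

const flags = exampleFields.map(toCommanderFlag);
// flags is ["--max-depth <number>", "--offset <number>", "--recursive"]
```

Deriving the flag from the same field name the model sees keeps the two surfaces from drifting apart.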


Registry

Adding a tool takes one new file plus a one-line registration:

  1. Create src/tools/<group>/<name>.ts exporting a ToolDefinition.
  2. Add registerTool(myTool); to src/tools/registry.ts.
  3. Write a test in test/tools/<group>/<name>.test.ts.

Beyond that registration line, there is no dispatch table to edit, no LLM tool list to update, and no CLI command to wire. The Zod schema is the source of truth.
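A registry of this kind can be as small as a map keyed by tool name. The sketch below is a guess at the general shape rather than the contents of src/tools/registry.ts: `RegisteredTool` is a trimmed-down stand-in for ToolDefinition, `toolsByGroup` is a hypothetical helper, the duplicate-name check is an assumption, and the descriptions are illustrative.

```typescript
// Trimmed-down stand-in for ToolDefinition (schemas and execute omitted).
interface RegisteredTool {
  name: string;
  group: string;
  description: string;
}

const tools = new Map<string, RegisteredTool>();

// Reject duplicates so two tools can never claim the same name or subcommand.
function registerTool(tool: RegisteredTool): void {
  if (tools.has(tool.name)) throw new Error(`Duplicate tool name: ${tool.name}`);
  tools.set(tool.name, tool);
}

function getTool(toolName: string): RegisteredTool | undefined {
  return tools.get(toolName);
}

function getAllTools(): RegisteredTool[] {
  return Array.from(tools.values());
}

// Group tool names by `group`, as a CLI adapter might when building namespaces.
function toolsByGroup(): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const tool of getAllTools()) {
    groups.set(tool.group, [...(groups.get(tool.group) ?? []), tool.name]);
  }
  return groups;
}

registerTool({ name: "complete_task", group: "task", description: "Mark the current task as complete." });
registerTool({ name: "fail_task", group: "task", description: "Mark the current task as failed." });
```

Both adapters can then read from `getAllTools()`, which is why a new tool needs no further wiring.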

Shared LinePatchSchema for edit tools

Any tool that mutates a markdown file via line-range patches — membot_edit, skill_edit, schedule_edit, task_edit, prompt_edit — should import { LinePatchSchema, applyLinePatches } from "../../fs/patches.ts" and reuse the same shape so the agent sees identical field descriptions across tools. The tool is responsible for re-parsing the patched output against its resource schema and rolling back on failure; the helper itself only handles the line splice. See Patch format for the field semantics.


membot_pipe — pipe a tool's output straight into context

Sometimes the agent wants a tool's full output to be searchable later but doesn't actually need to read it. A web fetch, an mcp_exec that returns a big JSON dump, a search_grep over a wide pattern — all of these can blow through the conversation budget if the bytes round-trip through the LLM.

membot_pipe is a meta-tool: you give it the name and arguments of another tool, plus a destination path, and it dispatches the inner tool, captures the stringified result, and writes it under context/ via the same ingest pipeline membot_write uses (chunked + embedded + indexed). The model only ever sees a small acknowledgment — path, byte count, and a 200-char preview — never the raw bytes.

```text
agent → membot_pipe(tool_name="search_grep",
                         tool_input={...},
                         path="research/grep-results.txt")
        → { path: "research/grep-results.txt",
            bytes_written: 184321, preview: "…" }
agent → membot_search("the thing I actually wanted to know")
```

Two guards apply at the dispatch site:

  • Terminal tools (complete_task, fail_task, wait_task) and membot_pipe itself are rejected with error_type: "forbidden_tool". Piping a terminal tool would let the loop end without the orchestrator seeing the result; recursion is meaningless.
  • The inner tool's input is validated against its own inputSchema before dispatch, so bad arguments come back as error_type: "invalid_input" with field-level detail instead of an opaque crash.

If the inner tool returns is_error: true, nothing is written — the pipe returns error_type: "inner_tool_error" with the inner message inlined (capped at 2KB), so the agent can retry with different arguments.
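The guards and the error path can be sketched as one function. This is an illustration under assumptions rather than the real membot_pipe: `InnerTool`, `pipeSketch`, and the injected `write` callback are hypothetical, `JSON.stringify` is one possible reading of "stringified", the 2KB cap is approximated in characters, and the `pattern` input field for search_grep is invented.

```typescript
type InnerResult = { message: string; is_error: boolean };

interface InnerTool {
  name: string;
  terminal?: boolean;
  validate: (input: unknown) => string | null; // null when the input is valid
  execute: (input: unknown) => Promise<InnerResult>;
}

type PipeResult =
  | { is_error: false; path: string; bytes_written: number; preview: string }
  | { is_error: true; error_type: "forbidden_tool" | "invalid_input" | "inner_tool_error"; message: string };

const MAX_INLINED_ERROR_CHARS = 2048; // "2KB" cap, measured in characters here
const PREVIEW_CHARS = 200;

async function pipeSketch(
  tool: InnerTool,
  toolInput: unknown,
  path: string,
  write: (path: string, content: string) => Promise<void>,
): Promise<PipeResult> {
  // Guard 1: terminal tools and membot_pipe itself are never dispatched.
  if (tool.terminal === true || tool.name === "membot_pipe") {
    return { is_error: true, error_type: "forbidden_tool", message: `${tool.name} cannot be piped` };
  }
  // Guard 2: validate against the inner tool's own schema before dispatch.
  const problem = tool.validate(toolInput);
  if (problem !== null) {
    return { is_error: true, error_type: "invalid_input", message: problem };
  }
  const inner = await tool.execute(toolInput);
  if (inner.is_error) {
    // Nothing is written; the inner message is inlined, capped in size.
    return {
      is_error: true,
      error_type: "inner_tool_error",
      message: inner.message.slice(0, MAX_INLINED_ERROR_CHARS),
    };
  }
  const content = JSON.stringify(inner);
  await write(path, content);
  // Only a small acknowledgment goes back to the model.
  return {
    is_error: false,
    path,
    bytes_written: new TextEncoder().encode(content).length,
    preview: content.slice(0, PREVIEW_CHARS),
  };
}

// In-memory stand-in for the ingest pipeline.
const written = new Map<string, string>();
const write = async (p: string, c: string): Promise<void> => {
  written.set(p, c);
};

const grepTool: InnerTool = {
  name: "search_grep",
  validate: (input) =>
    typeof (input as { pattern?: unknown } | null)?.pattern === "string" ? null : "pattern: expected string",
  execute: async () => ({ message: "line\n".repeat(1000), is_error: false }),
};
const completeTask: InnerTool = {
  name: "complete_task",
  terminal: true,
  validate: () => null,
  execute: async () => ({ message: "done", is_error: false }),
};

const piped = pipeSketch(grepTool, { pattern: "TODO" }, "research/grep-results.txt", write);
const forbidden = pipeSketch(completeTask, { summary: "x" }, "research/out.txt", write);
const invalid = pipeSketch(grepTool, {}, "research/out.txt", write);
```

Checking the guards before `execute` means a rejected call has no side effects, so the agent can retry safely.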


membot_query — reduce a JSON blob without reading it

membot_pipe lands a big JSON result in the store; membot_query reduces it there. It runs a JSONata expression over the JSON at a logical_path and returns only the (usually small) result — so "bucket 303 entries by day" or "pull the id and subject of each" costs the size of the answer, not the size of the source.

```text
agent → membot_pipe(tool_name="mcp_exec", tool_input={…}, path="mcp/inbox.json")
agent → membot_query(logical_path="mcp/inbox.json",
                     expression="${ $substring(ts,0,10): $count($) }")
        → { result: { "2026-05-31": 2, "2026-06-01": 1 }, result_type: "object" }
```

JSONata expressions run against the parsed JSON root ($): filter $[amount > 100], pluck $.{ 'id': id, 'subject': subject }, dedup $distinct(email), sort+slice $^(>created)[[0..9]], sum $sum(amount). Set output_logical_path to write the result back as a new entry instead of returning it inline — that's how you chain pipe → query → query.
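To make the grouping expression `${ $substring(ts,0,10): $count($) }` concrete without pulling in a JSONata library, the sketch below computes the same day-bucket counts in plain TypeScript. It only mirrors what the expression produces and says nothing about how membot_query is implemented; the `Entry` type, `countByDay`, and the sample entries are invented for illustration.

```typescript
interface Entry {
  id: string;
  ts: string; // ISO-8601 timestamp
}

// Invented sample data shaped like a small inbox dump.
const inbox: Entry[] = [
  { id: "m1", ts: "2026-05-31T08:15:00Z" },
  { id: "m2", ts: "2026-05-31T17:40:00Z" },
  { id: "m3", ts: "2026-06-01T09:05:00Z" },
];

// Bucket entries by the first 10 characters of `ts` (the YYYY-MM-DD prefix)
// and count how many entries fall into each bucket.
function countByDay(entries: Entry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of entries) {
    const day = entry.ts.substring(0, 10);
    counts[day] = (counts[day] ?? 0) + 1;
  }
  return counts;
}

const perDay = countByDay(inbox);
// perDay is { "2026-05-31": 2, "2026-06-01": 1 }
```

The result grows with the number of distinct days, not the number of entries, which is the saving the tool is after.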

It is a declarative transform, not code execution: a JSONata expression can only read and reshape the document it's given, with no filesystem, network, or host access. (Arbitrary code execution is a separate, deferred design; see Milestone 15 for the reasoning.)

Disclosure is token-light: the tool description carries only a handful of examples, and the full syntax reference comes back on a malformed expression or when you pass expression: "?".

The source must be a logical_path (a complete JSON document). membot_query deliberately won't read from a paged read_large_result id, whose page boundaries aren't valid JSON.


capabilities_refresh — the meta-tool

The capabilities-group tool capabilities_refresh exists so the agent can keep its own tool inventory fresh. It walks getAllTools() and mcpxClient.listTools(), then asks the configured chunker_llm to produce a thematic summary — one line per theme (e.g. "Gmail — read, send, draft, search, and reply to emails") rather than a line per tool. The result is written to prompts/capabilities.md (preserving frontmatter). Because that file is loaded into every system prompt, the next boot picks up the new inventory without another round-trip. Specific tool names are intentionally absent from the rendered file; the agent uses mcp_list_tools / mcp_search / mcp_info to look them up at call-time. See prompts.md for when the agent should call it. The matching CLI surface is botholomew capabilities, and the slash command is /capabilities.


Why Zod for the schema?

Zod gives us three things at once:

  • Runtime validation. Untrusted inputs (from the model, from the CLI) are validated before execute() runs. A malformed tool call becomes a clear tool_result error the model can recover from, not a crash.
  • TypeScript inference. z.infer<typeof inputSchema> gives execute() a statically-typed input parameter.
  • JSON Schema export. z.toJSONSchema() produces the schema the Anthropic API needs without a separate definition.

The entire adapter layer is ~80 lines (src/tools/tool.ts) because Zod does the heavy lifting.

Released under the MIT License.