The Tool class
Every tool the agent can call — and every matching CLI subcommand you can run yourself — is defined once as a ToolDefinition. A single definition drives three consumers:
- The Anthropic SDK (via input_schema: JSONSchema) so the model can call it.
- Commander.js via an auto-generated subcommand.
- Tests, which import the tool directly and call execute().
This lives in src/tools/tool.ts.
Shape of a tool
import { z } from "zod";
import type { ToolDefinition } from "../tool.ts";

const inputSchema = z.object({
  summary: z.string().describe("Summary of work done"),
});

const outputSchema = z.object({
  message: z.string(),
  is_error: z.boolean(),
});

export const completeTaskTool = {
  name: "complete_task",
  description:
    "Mark the current task as complete with a summary of what was accomplished.",
  group: "task",
  terminal: true,
  inputSchema,
  outputSchema,
  execute: async (input, ctx) => ({
    message: `Task completed: ${input.summary}`,
    is_error: false,
  }),
} satisfies ToolDefinition<typeof inputSchema, typeof outputSchema>;

Fields:
| Field | Purpose |
|---|---|
| name | Snake-case identifier; also the CLI subcommand name |
| description | Used for both the LLM tool definition and CLI help text |
| group | Groups tools into CLI namespaces (task, file, dir, …) |
| terminal | If true, the agent loop ends when this tool is called (e.g., complete_task, fail_task, wait_task) |
| inputSchema | Zod schema with .describe() per field; becomes JSON Schema for the model and Commander flags for the CLI |
| outputSchema | Zod schema guaranteeing the shape of the response |
| execute | The actual implementation, receiving validated input and a ToolContext |
ToolContext
Every tool receives a ToolContext:
interface ToolContext {
  conn: DbConnection; // short-lived connection, scoped to this tool call
  dbPath: string; // for long-running tools that manage their own withDb
  projectDir: string; // absolute path to the project
  config: Required<BotholomewConfig>; // resolved config (API keys, model, …)
  mcpxClient: McpxClient | null; // external MCP tools (may be null)
}

This is the only capability surface. A tool that isn't handed an mcpxClient can't reach the network; a tool that doesn't use conn or dbPath can't touch the database.
conn vs dbPath
The executor (runAgentLoop / runChatTurn) wraps each tool call in withDb(dbPath, async (conn) => tool.execute(input, { ...ctx, conn })). That means:
- ctx.conn is already open for the duration of one execute() call and will be closed immediately after. Use it for ordinary tools that do one or two quick queries.
- ctx.dbPath is for tools that run long enough that holding the file lock would block the worker or CLI (e.g., membot_refresh re-fetching many URLs). Wrap each DB touch in await withDb(ctx.dbPath, async (conn) => { … }) so the lock is released between items.
DuckDB holds the file lock at the instance level. A tool that holds on to ctx.conn through a long network round-trip keeps that lock held for the whole round-trip. When in doubt, prefer granular ctx.dbPath wrapping.
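The granular pattern can be sketched as follows. This is a minimal illustration, not the project's code: withDb here is a stand-in that only records when the "lock" is opened and closed, and fetchUrl, refreshAll, and lockLog are hypothetical names introduced for the example.

```typescript
// Stand-in for the project's DbConnection; only what the sketch needs.
type DbConnection = { run: (sql: string) => Promise<void> };

// Records the order of events so the lock window is visible.
const lockLog: string[] = [];

// Hypothetical withDb: opens a connection, runs fn, always closes afterwards.
async function withDb<T>(
  dbPath: string,
  fn: (conn: DbConnection) => Promise<T>,
): Promise<T> {
  lockLog.push(`open:${dbPath}`);
  const conn: DbConnection = {
    run: async (sql) => {
      lockLog.push(`query:${sql}`);
    },
  };
  try {
    return await fn(conn);
  } finally {
    lockLog.push("close");
  }
}

// Hypothetical slow network call; no database lock is held while it runs.
async function fetchUrl(url: string): Promise<string> {
  lockLog.push(`fetch:${url}`);
  return `body of ${url}`;
}

// Long-running tool body: fetch first, then take the lock only to store.
async function refreshAll(dbPath: string, urls: string[]): Promise<number> {
  let refreshed = 0;
  for (const url of urls) {
    const body = await fetchUrl(url);
    await withDb(dbPath, async (conn) => {
      await conn.run(`store ${url} (${body.length} bytes)`);
    });
    refreshed++;
  }
  return refreshed;
}
```

Each fetch lands outside an open/close pair in lockLog, which is the property that keeps the worker and CLI from blocking on a slow item.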
Anthropic adapter
toAnthropicTools() walks the registry and converts each Zod input schema to the Anthropic SDK's Tool type using z.toJSONSchema():
{
name: "membot_write",
description:
"Write a file under context/. By default, fails if the path already exists — pass on_conflict='overwrite' to replace.",
input_schema: {
type: "object",
properties: { /* derived from Zod */ },
required: ["path", "content"],
}
}

membot_write accepts an optional on_conflict: "error" | "overwrite" input (default "error"). A collision returns is_error: true, error_type: "path_conflict", and a next_action_hint that steers the model back to membot_read or a retry with on_conflict='overwrite'.
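The conflict behaviour described above might look roughly like this. It is a sketch under assumptions: contextFiles is an in-memory stand-in for the files under context/, writeContextFile is a hypothetical name, and the message and hint wording are invented for illustration.

```typescript
type WriteInput = {
  path: string;
  content: string;
  on_conflict?: "error" | "overwrite";
};

type WriteResult =
  | { is_error: false; message: string }
  | {
      is_error: true;
      error_type: "path_conflict";
      message: string;
      next_action_hint: string;
    };

// In-memory stand-in for the store behind membot_write (hypothetical).
const contextFiles = new Map<string, string>();

function writeContextFile(input: WriteInput): WriteResult {
  // The default is "error": an existing path is never replaced silently.
  const onConflict = input.on_conflict ?? "error";
  if (contextFiles.has(input.path) && onConflict === "error") {
    return {
      is_error: true,
      error_type: "path_conflict",
      message: `${input.path} already exists`,
      next_action_hint:
        "Read it with membot_read, or retry with on_conflict='overwrite'.",
    };
  }
  contextFiles.set(input.path, input.content);
  return { is_error: false, message: `Wrote ${input.path}` };
}
```

Returning the conflict as data rather than throwing keeps it inside the tool_result channel, where the model can act on the hint.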
runAgentLoop() feeds this array into client.messages.create({ tools: ... }). When the model emits a tool_use block, the loop looks up the tool by name via getTool(name), validates the input against inputSchema, calls execute(), and returns the result as a tool_result block.
Terminal tools (the ones with terminal: true) tell the loop to stop. For workers, those are complete_task, fail_task, and wait_task — any of which transitions the task out of in_progress.
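The lookup, validate, execute, reply sequence can be sketched like this. It is not the actual runAgentLoop(): validate stands in for Zod parsing, SketchTool and handleToolUse are hypothetical names, and leaving the loop running after an invalid call to a terminal tool is an assumption.

```typescript
type ToolResult = { message: string; is_error: boolean };

// Stand-in for a ToolDefinition; the real one validates with a Zod schema.
type SketchTool = {
  name: string;
  terminal?: boolean;
  validate: (input: Record<string, unknown>) => string | null; // error text or null
  execute: (input: Record<string, unknown>) => Promise<ToolResult>;
};

type ToolUseBlock = {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
};

type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error: boolean;
};

const completeTask: SketchTool = {
  name: "complete_task",
  terminal: true,
  validate: (input) =>
    typeof input["summary"] === "string" ? null : "summary: expected string",
  execute: async (input) => ({
    message: `Task completed: ${String(input["summary"])}`,
    is_error: false,
  }),
};

const toolsByName = new Map<string, SketchTool>([
  [completeTask.name, completeTask],
]);
const getTool = (toolName: string): SketchTool | undefined =>
  toolsByName.get(toolName);

async function handleToolUse(
  block: ToolUseBlock,
): Promise<{ result: ToolResultBlock; stop: boolean }> {
  const reply = (content: string, is_error: boolean): ToolResultBlock => ({
    type: "tool_result",
    tool_use_id: block.id,
    content,
    is_error,
  });
  const tool = getTool(block.name);
  if (!tool) {
    return { result: reply(`Unknown tool: ${block.name}`, true), stop: false };
  }
  const problem = tool.validate(block.input);
  if (problem !== null) {
    // Bad input becomes an error result the model can recover from.
    return { result: reply(`Invalid input: ${problem}`, true), stop: false };
  }
  const output = await tool.execute(block.input);
  // Only a terminal tool that actually ran tells the loop to stop.
  return {
    result: reply(output.message, output.is_error),
    stop: tool.terminal === true,
  };
}
```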
CLI adapter
registerToolsAsCLI(program) iterates the registry and generates a Commander subcommand per tool, grouped by group:
botholomew membot read notes/meeting.md --offset 10 --limit 20
botholomew membot tree notes --max-depth 3
botholomew membot search "quarterly revenue"

Positional args and --options are derived from the Zod schema shape. The same validation that runs for the LLM runs here, so you get the same error messages.
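As a rough illustration of that derivation, the sketch below turns a list of field specs into a usage string. The mapping rules are assumptions rather than the documented behaviour of registerToolsAsCLI: required fields become positional args, optional fields become flags, and snake_case becomes kebab-case. FieldSpec, toCliUsage, and the path and max_depth field names are hypothetical.

```typescript
// Minimal description of one schema field (stand-in for the Zod shape).
type FieldSpec = { name: string; required: boolean };

function toCliUsage(command: string, fields: FieldSpec[]): string {
  // Assumption: required fields are positional, in declaration order.
  const positionals = fields
    .filter((f) => f.required)
    .map((f) => `<${f.name}>`);
  // Assumption: optional fields become --kebab-case options.
  const options = fields
    .filter((f) => !f.required)
    .map((f) => `[--${f.name.replace(/_/g, "-")} <value>]`);
  return [command, ...positionals, ...options].join(" ");
}

const usage = toCliUsage("membot tree", [
  { name: "path", required: true },
  { name: "max_depth", required: false },
]);
console.log(usage); // membot tree <path> [--max-depth <value>]
```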
Registry
Each tool is registered with a single registerTool() call, so adding a tool takes three small steps:
- Create src/tools/<group>/<name>.ts exporting a ToolDefinition.
- Add registerTool(myTool); to src/tools/registry.ts.
- Write a test in test/tools/<group>/<name>.test.ts.
No central dispatch table to edit, no LLM tool list to update, no CLI command to wire. The Zod schema is the source of truth.
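A registry with this surface can be very small. The sketch below assumes a Map keyed by tool name; registerTool, getTool, and getAllTools are the names used on this page, while RegisteredTool and the duplicate-name check are assumptions made for the example.

```typescript
// Only the fields the registry itself needs (stand-in for ToolDefinition).
type RegisteredTool = { name: string; group: string; terminal?: boolean };

const registry = new Map<string, RegisteredTool>();

function registerTool(tool: RegisteredTool): void {
  // Assumption: tool names are unique, since they double as CLI subcommands.
  if (registry.has(tool.name)) {
    throw new Error(`Duplicate tool name: ${tool.name}`);
  }
  registry.set(tool.name, tool);
}

function getTool(toolName: string): RegisteredTool | undefined {
  return registry.get(toolName);
}

function getAllTools(): RegisteredTool[] {
  return Array.from(registry.values());
}

registerTool({ name: "complete_task", group: "task", terminal: true });
registerTool({ name: "membot_read", group: "membot" });
console.log(getAllTools().map((t) => t.name)); // [ 'complete_task', 'membot_read' ]
```

Both adapters can then iterate getAllTools(), which is why neither needs its own list of tools.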
Shared LinePatchSchema for edit tools
Any tool that mutates a markdown file via line-range patches — membot_edit, skill_edit, schedule_edit, task_edit, prompt_edit — should import { LinePatchSchema, applyLinePatches } from "../../fs/patches.ts" and reuse the same shape so the agent sees identical field descriptions across tools. The tool is responsible for re-parsing the patched output against its resource schema and rolling back on failure; the helper itself only handles the line splice. See Patch format for the field semantics.
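The split of responsibilities (the helper splices, the tool validates and rolls back) can be sketched as below. Everything here is hypothetical: the LinePatch fields (start_line, end_line, content) are invented for illustration and are not the real LinePatchSchema, and spliceLines and editWithRollback are stand-ins for applyLinePatches and the tool's own logic.

```typescript
// Hypothetical patch shape: replace lines start_line..end_line (1-indexed, inclusive).
type LinePatch = { start_line: number; end_line: number; content: string };

// Helper: pure line splice, no knowledge of what the file means.
function spliceLines(text: string, patches: LinePatch[]): string {
  const lines = text.split("\n");
  // Apply bottom-up so earlier line numbers stay valid after each splice.
  const ordered = patches.slice().sort((a, b) => b.start_line - a.start_line);
  for (const p of ordered) {
    lines.splice(
      p.start_line - 1,
      p.end_line - p.start_line + 1,
      ...p.content.split("\n"),
    );
  }
  return lines.join("\n");
}

// Tool side: re-validate the patched text and keep the original on failure.
function editWithRollback(
  original: string,
  patches: LinePatch[],
  isValid: (text: string) => boolean, // stand-in for parsing against the resource schema
): { text: string; is_error: boolean } {
  const patched = spliceLines(original, patches);
  return isValid(patched)
    ? { text: patched, is_error: false }
    : { text: original, is_error: true };
}
```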
membot_pipe — pipe a tool's output straight into context
Sometimes the agent wants a tool's full output to be searchable later but doesn't actually need to read it. A web fetch, an mcp_exec that returns a big JSON dump, a search_grep over a wide pattern — all of these can blow through the conversation budget if the bytes round-trip through the LLM.
membot_pipe is a meta-tool: you give it the name and arguments of another tool, plus a destination path, and it dispatches the inner tool, captures the stringified result, and writes it under context/ via the same ingest pipeline membot_write uses (chunked + embedded + indexed). The model only ever sees a small acknowledgment — path, byte count, and a 200-char preview — never the raw bytes.
agent → membot_pipe(tool_name="search_grep",
                    tool_input={...},
                    path="research/grep-results.txt")
      → { path: "research/grep-results.txt",
          bytes_written: 184321, preview: "…" }
agent → membot_search("the thing I actually wanted to know")

Two guards apply at the dispatch site:
- Terminal tools (complete_task, fail_task, wait_task) and membot_pipe itself are rejected with error_type: "forbidden_tool". Piping a terminal tool would let the loop end without the orchestrator seeing the result; recursion is meaningless.
- The inner tool's input is validated against its own inputSchema before dispatch, so bad arguments come back as error_type: "invalid_input" with field-level detail instead of an opaque crash.
If the inner tool returns is_error: true, nothing is written — the pipe returns error_type: "inner_tool_error" with the inner message inlined (capped at 2KB), so the agent can retry with different arguments.
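Taken together, the guards and the error path can be sketched like this. It is an illustration, not the real membot_pipe: pipeSketch and InnerTool are hypothetical, the inner tool is synchronous for brevity, the write callback stands in for the ingest pipeline, and the 2 KB cap is approximated by character count.

```typescript
type InnerResult = { is_error: boolean; message: string };

type InnerTool = {
  name: string;
  terminal?: boolean;
  validate: (input: Record<string, unknown>) => string | null; // error text or null
  execute: (input: Record<string, unknown>) => InnerResult; // sync here for brevity
};

type PipeResult =
  | { is_error: false; path: string; bytes_written: number; preview: string }
  | {
      is_error: true;
      error_type: "forbidden_tool" | "invalid_input" | "inner_tool_error";
      message: string;
    };

function pipeSketch(
  inner: InnerTool,
  toolInput: Record<string, unknown>,
  path: string,
  write: (path: string, text: string) => void, // stand-in for the ingest pipeline
): PipeResult {
  // Guard 1: terminal tools and the pipe itself are never dispatched.
  if (inner.terminal === true || inner.name === "membot_pipe") {
    return {
      is_error: true,
      error_type: "forbidden_tool",
      message: `${inner.name} cannot be piped`,
    };
  }
  // Guard 2: validate against the inner tool's own schema before running it.
  const problem = inner.validate(toolInput);
  if (problem !== null) {
    return { is_error: true, error_type: "invalid_input", message: problem };
  }
  const result = inner.execute(toolInput);
  if (result.is_error) {
    // Nothing is written; the inner message is inlined, capped at about 2 KB.
    return {
      is_error: true,
      error_type: "inner_tool_error",
      message: result.message.slice(0, 2048),
    };
  }
  write(path, result.message);
  // The caller sees only a small acknowledgment, never the full output.
  return {
    is_error: false,
    path,
    bytes_written: new TextEncoder().encode(result.message).length,
    preview: result.message.slice(0, 200),
  };
}
```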
membot_query — reduce a JSON blob without reading it
membot_pipe lands a big JSON result in the store; membot_query reduces it there. It runs a JSONata expression over the JSON at a logical_path and returns only the (usually small) result — so "bucket 303 entries by day" or "pull the id and subject of each" costs the size of the answer, not the size of the source.
agent → membot_pipe(tool_name="mcp_exec", tool_input={…}, path="mcp/inbox.json")
agent → membot_query(logical_path="mcp/inbox.json",
                     expression="${ $substring(ts,0,10): $count($) }")
      → { result: { "2026-05-31": 2, "2026-06-01": 1 }, result_type: "object" }

JSONata expressions run against the parsed JSON root ($): filter $[amount > 100], pluck $.{ 'id': id, 'subject': subject }, dedup $distinct(email), sort+slice $^(>created)[[0..9]], sum $sum(amount). Set output_logical_path to write the result back as a new entry instead of returning it inline; that's how you chain pipe → query → query.
It is a declarative transform, not code execution: a JSONata expression can only read and reshape the document it's given, with no filesystem, network, or host access. (Arbitrary code execution is a separate, deferred design — see Milestone 15 for the reasoning.) Disclosure is token-light: the tool description carries only a handful of examples, and the full syntax reference comes back on a malformed expression or when you pass expression: "?". The source must be a logical_path (a complete JSON document) — membot_query deliberately won't read from a paged read_large_result id, whose page boundaries aren't valid JSON.
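For readers less familiar with JSONata, the day-bucketing expression from the example above computes roughly what this plain TypeScript does. This is only an equivalent of the result, not how membot_query is implemented, and the Entry shape with its ts field and the sample data are assumptions.

```typescript
// Assumed shape of one entry in mcp/inbox.json; only `ts` matters here.
type Entry = { ts: string; subject: string };

// Group entries by the first 10 characters of ts (the date) and count each group.
function countByDay(entries: Entry[]): Record<string, number> {
  const buckets: Record<string, number> = {};
  for (const entry of entries) {
    const day = entry.ts.substring(0, 10);
    buckets[day] = (buckets[day] ?? 0) + 1;
  }
  return buckets;
}

const inbox: Entry[] = [
  { ts: "2026-05-31T08:15:00Z", subject: "Invoice" },
  { ts: "2026-05-31T17:40:00Z", subject: "Re: Invoice" },
  { ts: "2026-06-01T09:02:00Z", subject: "Standup notes" },
];

console.log(countByDay(inbox)); // { '2026-05-31': 2, '2026-06-01': 1 }
```

The returned object stays a few bytes per day however large the inbox is, which is the point of reducing in the store rather than in the conversation.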
capabilities_refresh — the meta-tool
The capabilities-group tool capabilities_refresh exists so the agent can keep its own tool inventory fresh. It walks getAllTools() and mcpxClient.listTools(), then asks the configured chunker_llm to produce a thematic summary — one line per theme (e.g. "Gmail — read, send, draft, search, and reply to emails") rather than a line per tool. The result is written to prompts/capabilities.md (preserving frontmatter). Because that file is loaded into every system prompt, the next boot picks up the new inventory without another round-trip. Specific tool names are intentionally absent from the rendered file; the agent uses mcp_list_tools / mcp_search / mcp_info to look them up at call-time. See prompts.md for when the agent should call it. The matching CLI surface is botholomew capabilities, and the slash command is /capabilities.
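The "preserving frontmatter" step can be illustrated with a small pure function. This is a sketch under assumptions: it treats frontmatter as a leading block delimited by --- lines, and replaceBodyKeepingFrontmatter is a hypothetical name rather than a helper from the codebase.

```typescript
// Replace everything after the leading frontmatter block, keeping the block itself.
function replaceBodyKeepingFrontmatter(
  fileText: string,
  newBody: string,
): string {
  // Matches "---\n", optional frontmatter lines, then a closing "---\n".
  const match = /^---\n(?:[\s\S]*?\n)?---\n/.exec(fileText);
  const frontmatter = match ? match[0] : "";
  return frontmatter + newBody;
}

const before = "---\ntitle: Capabilities\n---\nold summary\n";
const after = replaceBodyKeepingFrontmatter(
  before,
  "- Gmail: read, send, draft, search, and reply to emails\n",
);
console.log(after);
```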
Why Zod for the schema?
Zod gives us three things at once:
- Runtime validation. Untrusted inputs (from the model, from the CLI) are validated before execute() runs. A malformed tool call becomes a clear tool_result error the model can recover from, not a crash.
- TypeScript inference. z.infer<typeof inputSchema> gives execute() a statically-typed input parameter.
- JSON Schema export. z.toJSONSchema() produces the schema the Anthropic API needs without a separate definition.
The entire adapter layer is ~80 lines (src/tools/tool.ts) because Zod does the heavy lifting.