The knowledge store

Botholomew's agent has no access to your real filesystem. Its world is the membot knowledge store backing this project — a single DuckDB file at <projectDir>/index.duckdb, addressed by logical_path (an opaque string key, not a filesystem path). Every read, write, search, and delete the agent makes goes through the membot_* tools.

The safety properties this gives you:

  • No filesystem access. A prompt-injected instruction to "read ~/.ssh/id_rsa" fails because there is no tool that takes a host-filesystem path. The agent can only address entries already in the store.
  • Versioned. Every membot_write / membot_edit creates a new version_id. Deletes are tombstones, not unlinks. Use membot_versions to inspect history, membot_diff to compare two snapshots, and botholomew membot prune to permanently drop old versions when you want to.
  • Auditable. The DB is local, plain DuckDB, and your data lives in tables you can query directly with the DuckDB CLI if you ever want to.
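
The versioning behaviour described above can be pictured with a small in-memory model. This is only a sketch: the class name, the numeric `version_id`, and the `null` tombstone marker are assumptions made for illustration and say nothing about membot's real DuckDB schema.

```typescript
// Hypothetical in-memory model of append-only versioning; not membot's schema.
interface Version {
  version_id: number;     // assumed: a monotonically increasing counter
  content: string | null; // assumed: null marks a tombstone
}

class VersionedStoreSketch {
  private history = new Map<string, Version[]>();
  private nextId = 1;

  // Every write appends a new version; earlier versions are never overwritten.
  write(logicalPath: string, content: string): number {
    return this.append(logicalPath, content);
  }

  // A delete appends a tombstone instead of dropping earlier versions.
  remove(logicalPath: string): number {
    return this.append(logicalPath, null);
  }

  // Current content, or undefined if the entry is missing or tombstoned.
  read(logicalPath: string): string | undefined {
    const versions = this.history.get(logicalPath) ?? [];
    const latest = versions[versions.length - 1];
    if (latest === undefined || latest.content === null) return undefined;
    return latest.content;
  }

  // Full history, newest first.
  versions(logicalPath: string): Version[] {
    return [...(this.history.get(logicalPath) ?? [])].reverse();
  }

  private append(logicalPath: string, content: string | null): number {
    const version: Version = { version_id: this.nextId++, content };
    const versions = this.history.get(logicalPath) ?? [];
    versions.push(version);
    this.history.set(logicalPath, versions);
    return version.version_id;
  }
}
```

In this model a removed entry stops being readable, yet its full history stays available until something explicitly discards it, which is the role the text gives to prune.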

The store itself is owned by membot — including the ingestion pipeline (PDF/DOCX/HTML → markdown, local WASM embeddings, hybrid BM25 + semantic search), URL refresh, and append-only versioning. This page documents the Botholomew-side surface: the agent tools, the line-patch edit shape, and the CLI passthrough.

Agent tools

Each membot_* tool wraps one membot operation. Names mirror upstream membot exactly so reading membot's docs gives you the same vocabulary the agent uses.

| Tool | Purpose |
| --- | --- |
| membot_add | Ingest a local file, directory, glob, URL, or inline:<text> literal. |
| membot_list | List current entries (one row per logical_path). |
| membot_tree | Render the path tree synthesized from / segments in logical_path. |
| membot_read | Read the current (or a historical) version of an entry. |
| membot_search | Hybrid semantic + BM25 search with RRF fusion. |
| membot_info | Inspect metadata (source, mime, sha256s, refresh status) for one entry. |
| membot_stats | Counts and storage summary for the whole store. |
| membot_versions | List every version of an entry (newest first). |
| membot_diff | Unified diff between two versions of an entry. |
| membot_write | Write inline content as a new version. Whole-file replace. |
| membot_move | Rename a logical_path (creates a new version, tombstones the old). |
| membot_remove | Tombstone one or more entries. Use membot_prune to GC. |
| membot_refresh | Re-fetch a URL-backed entry (if its source supports refresh). |
| membot_prune | Permanently drop history older than a cutoff. |

Botholomew adds six wrappers on top so the agent can use the file-shaped idioms it already knows:

| Wrapper | Behavior |
| --- | --- |
| membot_edit | read → apply git-hunk line patches → write. Same LinePatchSchema as task_edit, schedule_edit, prompt_edit, and skill_edit. |
| membot_copy | read → write under a new logical_path. The source is untouched (use membot_move if you want to rename). |
| membot_exists | info + catch not_found. Returns { exists: true } or { exists: false }; never throws. |
| membot_count_lines | wc -l over the markdown surrogate. Useful before a paginated read. |
| membot_pipe | Run another tool and write its output as a new membot entry without ever flowing the body through the conversation. |
| membot_query | Run a JSONata transform over a JSON entry (group, filter, pluck, dedup, sort, aggregate) without loading the blob into context. |

Reducing large JSON blobs with membot_query

External MCP tools often return big JSON arrays (an inbox, a list of issues, a table dump). Pulling the whole thing into the conversation to "count by day" or "pull these three fields" burns the context window. The pattern instead:

  1. Land it. membot_pipe the MCP call into a logical_path — the bytes go straight to the store, never through the conversation.
  2. Reduce it. membot_query that logical_path with a JSONata expression. Only the (usually small) result returns to the agent.

JSONata expressions run against the parsed JSON root ($). A few examples:

```
count by day:    ${ $substring(ts,0,10): $count($) }
filter:          $[amount > 100]
pluck fields:    $.{ 'id': id, 'subject': subject }
dedup a field:   $distinct(email)
top-10 newest:   $^(>created)[[0..9]]
sum a field:     $sum(amount)
```

Set output_logical_path to write the result back into the store as a new entry instead of returning it inline — handy for chaining pipe → query → query. Pass expression: "?" to get the full syntax reference back from the tool. This is a declarative transform, not code execution: a JSONata expression can only read and reshape the document it's given — it has no filesystem, network, or host access.
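
To make the size of these reductions concrete, here is a plain TypeScript sketch of what four of the expressions above compute. It does not call membot_query or a JSONata engine. The `Row` shape and the sample data are assumptions invented for illustration.

```typescript
// Hypothetical record shape; real MCP payloads will differ.
interface Row {
  ts: string; // ISO-8601 timestamp, e.g. "2024-05-01T09:30:00Z"
  amount: number;
  email: string;
}

// "count by day": group on the first 10 characters of ts (the YYYY-MM-DD part).
function countByDay(rows: Row[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const row of rows) {
    const day = row.ts.substring(0, 10);
    counts[day] = (counts[day] ?? 0) + 1;
  }
  return counts;
}

// "filter": keep rows whose amount exceeds a threshold.
function filterByAmount(rows: Row[], min: number): Row[] {
  return rows.filter((row) => row.amount > min);
}

// "dedup a field": unique email values, in first-seen order.
function distinctEmails(rows: Row[]): string[] {
  return Array.from(new Set(rows.map((row) => row.email)));
}

// "sum a field": total of all amounts.
function sumAmounts(rows: Row[]): number {
  return rows.reduce((total, row) => total + row.amount, 0);
}

const sample: Row[] = [
  { ts: "2024-05-01T09:30:00Z", amount: 50, email: "a@example.com" },
  { ts: "2024-05-01T17:05:00Z", amount: 150, email: "b@example.com" },
  { ts: "2024-05-02T08:00:00Z", amount: 300, email: "a@example.com" },
];

console.log(countByDay(sample), sumAmounts(sample));
```

Each function returns something far smaller than its input, and only that reduced result needs to reach the conversation.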

The patch format

membot_edit uses the shared LinePatchSchema from src/fs/patches.ts:

```ts
{
  start_line: number,  // 1-based, inclusive
  end_line: number,    // 1-based, inclusive; 0 = insert without replacing
  content: string      // empty string deletes
}
```

Patches are applied bottom-up so earlier line numbers stay stable across a multi-hunk edit. The same shape powers task_edit, schedule_edit, prompt_edit, and skill_edit — one mental model across every resource the agent can mutate in place.
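
A minimal sketch of bottom-up application follows. It is not the code in src/fs/patches.ts, and the function name `applyLinePatches` is hypothetical. It assumes that an insert (`end_line: 0`) places the content before `start_line`, and that patches do not overlap.

```typescript
interface LinePatch {
  start_line: number; // 1-based, inclusive
  end_line: number;   // 1-based, inclusive; 0 = insert without replacing
  content: string;    // empty string deletes
}

// Hypothetical helper illustrating bottom-up application of line patches.
function applyLinePatches(text: string, patches: LinePatch[]): string {
  const lines = text.split("\n");

  // Apply the highest start_line first, so the line numbers of hunks nearer
  // the top of the file are not shifted by edits made below them.
  const ordered = [...patches].sort((a, b) => b.start_line - a.start_line);

  for (const patch of ordered) {
    const index = patch.start_line - 1;
    // end_line 0 replaces nothing (assumption: insert before start_line).
    const deleteCount =
      patch.end_line === 0 ? 0 : patch.end_line - patch.start_line + 1;
    // Empty content contributes no lines, which turns the patch into a delete.
    const replacement = patch.content === "" ? [] : patch.content.split("\n");
    lines.splice(index, deleteCount, ...replacement);
  }

  return lines.join("\n");
}

const patched = applyLinePatches("a\nb\nc\nd", [
  { start_line: 2, end_line: 2, content: "B" }, // replace line 2
  { start_line: 4, end_line: 0, content: "x" }, // insert before line 4
]);
console.log(patched);
```

Because the insert at line 4 is applied before the replacement at line 2, both patches can use line numbers taken from the original text.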

CLI passthrough

botholomew membot <verb> … spawns membot <verb> … --config <resolvedDir> and forwards stdio. <resolvedDir> is resolved from membot_scope: ~/.membot by default, or <projectDir> if the scope is "project". Run botholomew membot --help for the verb list.

```bash
botholomew membot add ./docs/howto.md
botholomew membot add https://docs.google.com/document/d/...
botholomew membot search "how does the worker tick claim tasks?"
botholomew membot ls
botholomew membot tree
botholomew membot read docs/howto.md
botholomew membot versions docs/howto.md
botholomew membot diff docs/howto.md v1 v2
```
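
The passthrough described above can be sketched as follows. The helper names are hypothetical. The sketch treats any scope other than "project" as the ~/.membot default, which is an assumption, and uses Node's `spawn` with `stdio: "inherit"` as one plausible way to forward stdio, not necessarily how Botholomew does it.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess } from "node:child_process";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical: pick the directory passed to --config.
function resolveMembotDir(scope: string | undefined, projectDir: string): string {
  return scope === "project" ? projectDir : path.join(os.homedir(), ".membot");
}

// Hypothetical: build the argv for `membot <verb> … --config <resolvedDir>`.
function buildMembotArgv(
  verb: string,
  rest: string[],
  scope: string | undefined,
  projectDir: string,
): string[] {
  return [verb, ...rest, "--config", resolveMembotDir(scope, projectDir)];
}

// Spawn membot with the parent's stdin/stdout/stderr attached.
// Defined but not called here, since it needs a membot binary on PATH.
function runMembot(argv: string[]): ChildProcess {
  return spawn("membot", argv, { stdio: "inherit" });
}

console.log(buildMembotArgv("search", ["worker tick"], "project", "/tmp/proj"));
```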

The Botholomew-specific helper is:

```bash
botholomew membot import-global
```

It copies ~/.membot/index.duckdb and ~/.membot/config.json into the project so you can seed a new project with whatever you've built up in your personal membot. It refuses to overwrite a non-empty project store unless you pass --force.

Where Botholomew still uses real files

Knowledge is the only thing that moved into membot. These still live as real files under <projectDir>/:

  • tasks/<id>.md, schedules/<id>.md — markdown + strict frontmatter, with O_EXCL lockfiles for worker claim
  • threads/<YYYY-MM-DD>/<id>.csv — RFC-4180 conversation logs
  • workers/<id>.json — pidfile + heartbeat per worker
  • prompts/*.md — agent's persistent context (goals, beliefs, capabilities, and any you add)
  • skills/*.md — slash-command skills
  • logs/<YYYY-MM-DD>/<workerId>.log — worker stdout/stderr
  • config/config.json, mcpx/servers.json — settings
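
The O_EXCL lockfiles used for worker claims can be illustrated with Node's `"wx"` open flag, which fails if the file already exists. This is a sketch of the general technique only: the function name, the `.lock` filename, and writing the worker id into the file are assumptions, not Botholomew's actual layout.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Returns true if this caller created the lockfile, false if it already existed.
function tryClaim(lockPath: string, workerId: string): boolean {
  try {
    // "wx" maps to O_CREAT | O_EXCL: creation is atomic and fails with EEXIST
    // when another worker got there first.
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, workerId);
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err; // anything else (permissions, missing directory) is a real error
  }
}

// Demo in a throwaway temp directory (hypothetical file name).
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "claim-sketch-"));
const demoLock = path.join(demoDir, "task-1.lock");
console.log(tryClaim(demoLock, "worker-a")); // true: first caller wins
console.log(tryClaim(demoLock, "worker-b")); // false: lock already held
```

The existence check and the creation happen in a single system call, so two workers racing for the same task cannot both succeed.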

All of those still route through src/fs/sandbox.ts::resolveInRoot for path safety (NFC normalize, reject .. / NUL / absolute paths, lstat-walk every component) — that helper is general, not specific to knowledge content.
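
The checks listed above can be sketched roughly as below. This is not the code in src/fs/sandbox.ts and the function name is hypothetical. It assumes the lstat walk exists to reject symlinked components, and it treats both / and \ as segment separators.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical sketch of a resolve-inside-root check.
function resolveInRootSketch(root: string, relativePath: string): string {
  // Normalize Unicode so visually identical names compare equal.
  const normalized = relativePath.normalize("NFC");

  if (normalized.includes("\0")) throw new Error("path contains a NUL byte");
  if (path.isAbsolute(normalized)) throw new Error("absolute paths are rejected");

  const parts = normalized
    .split(/[\\/]+/)
    .filter((part) => part !== "" && part !== ".");
  if (parts.includes("..")) throw new Error("'..' segments are rejected");

  // Walk each component with lstat, which does not follow symlinks.
  // Assumption: a symlinked component is refused because it could point
  // outside the root.
  let current = path.resolve(root);
  for (const part of parts) {
    current = path.join(current, part);
    const stat = fs.lstatSync(current, { throwIfNoEntry: false });
    if (stat !== undefined && stat.isSymbolicLink()) {
      throw new Error(`symlink in path: ${current}`);
    }
  }
  return current;
}
```

Components that do not exist yet pass through the walk unchanged, so the same helper can vet paths for files that are about to be created.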

Released under the MIT License.