The virtual filesystem

Botholomew's agent has no access to your real filesystem. Every piece of content the agent can touch lives in the context_items table as a row identified by a (drive, path) pair. When the agent calls context_read({ drive: "disk", path: "/Users/evan/notes/meeting.md" }), it is not opening that file on disk — it's reading the row that was ingested from it.

This is deliberate, and it's the single most important safety property of the system:

  • Safety. The agent cannot read your home directory, cannot overwrite your SSH keys, cannot rm -rf anything, cannot exfiltrate files it wasn't handed. A prompt-injected instruction telling it to "read ~/.ssh/id_rsa" has nothing to act on — that path doesn't exist in its world unless you ingested it. The worst a rogue agent can do is corrupt rows inside .botholomew/data.duckdb, which you can recover from a backup of a single file.
  • Portability. The entire "filesystem" is a single DuckDB file you can copy, share, or back up.
  • Searchability. Every "file" is already indexed, chunked, embedded, and queryable.
  • History. Everything the agent writes is recorded in threads/interactions, so you can audit every change.

Drives

Every context item lives under a drive. The drive names the origin of the content; the path is whatever that origin natively uses.

| Drive | Path shape | Example ref |
| --- | --- | --- |
| disk | absolute filesystem path | disk:/Users/evan/notes/meeting.md |
| url | full URL (with scheme) | url:/https://example.com/post |
| agent | arbitrary agent-chosen path | agent:/notes/scratch.md |
| google-docs | Google Docs document id | google-docs:/1AbCDEFGhij |
| github | /<owner>/<repo>/<rest> | github:/evantahler/botholomew/README.md |

The drive:/path string form is the display and CLI convention. Internally, context_items has two columns — drive TEXT and path TEXT — with a UNIQUE(drive, path) index. That index is the identity key: an ingest that hits an existing (drive, path) is a refresh, never a duplicate.
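
The refresh-not-duplicate rule can be illustrated with a small in-memory model. This is only a sketch of the behavior described above, not Botholomew's schema code: `IdentityIndex`, `ContextRow`, and the `refreshCount` field are hypothetical names introduced for the example.

```typescript
// Hypothetical in-memory model of the UNIQUE(drive, path) rule.
// None of these names come from the Botholomew codebase.
interface ContextRow {
  drive: string;
  path: string;
  content: string;
  refreshCount: number; // illustrative only, not a real column
}

class IdentityIndex {
  private rows = new Map<string, ContextRow>();

  // JSON-encode the pair so ("a", "b/c") and ("a/b", "c") can never collide.
  private static keyOf(drive: string, path: string): string {
    return JSON.stringify([drive, path]);
  }

  // An ingest that hits an existing (drive, path) updates that row in place.
  ingest(drive: string, path: string, content: string): "created" | "refreshed" {
    const key = IdentityIndex.keyOf(drive, path);
    const existing = this.rows.get(key);
    if (existing) {
      existing.content = content;
      existing.refreshCount += 1;
      return "refreshed";
    }
    this.rows.set(key, { drive, path, content, refreshCount: 0 });
    return "created";
  }

  get size(): number {
    return this.rows.size;
  }
}
```

Ingesting the same path under two different drives produces two rows, because the drive is part of the key.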

New drives (additional MCP services) can be added by teaching src/context/drives.ts:detectDriveFromUrl to recognize their URLs and extract the right path shape.
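
As a rough illustration of what URL-to-drive detection involves, here is a minimal sketch. It is not the real `detectDriveFromUrl`: the function name `detectDriveSketch`, the `DriveTarget` type, and the regular expression are assumptions, and the only mapping taken from this page is the GitHub blob URL example shown in the CLI section.

```typescript
// Hypothetical sketch; the real logic lives in src/context/drives.ts.
interface DriveTarget {
  drive: string;
  path: string;
}

// Matches https://github.com/<owner>/<repo>/blob/<branch>/<rest>.
// Assumption: branch names contain no "/", which is not true in general.
const GITHUB_BLOB_RE = /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/blob\/[^/]+\/(.+)$/;

function detectDriveSketch(url: string): DriveTarget {
  const m = GITHUB_BLOB_RE.exec(url);
  if (m) {
    // Drop "blob/<branch>" so the path has the /<owner>/<repo>/<rest> shape.
    return { drive: "github", path: `/${m[1]}/${m[2]}/${m[3]}` };
  }
  // Anything unrecognized falls back to the generic url drive (full URL as path).
  return { drive: "url", path: url };
}
```

Supporting a new service would mean adding another recognizer ahead of the fallback that returns that service's drive name and native path shape.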

The agent drive

Content written by the agent itself (via context_write) defaults to the agent drive. It has no external origin, so it's never a candidate for context_refresh.


The mapping

| Filesystem concept | DuckDB representation |
| --- | --- |
| Identity | (context_items.drive, context_items.path), unique together |
| Display form | drive:/path (e.g. disk:/Users/x/foo.md) |
| File contents | context_items.content (TEXT) or content_blob (BLOB) |
| MIME type | context_items.mime_type |
| Directory | A row with mime_type = 'inode/directory' |
| Directory listing | Items filtered by drive and a path prefix, with intermediate directory segments derived from the matching paths |
| Binary file | is_textual = false, content in content_blob |
| Ingestion time | indexed_at, created_at, updated_at |
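
The "Directory listing" row is the least filesystem-like part of the mapping: directories do not need their own rows, because they can be derived from the paths that sit beneath a prefix. A minimal sketch of that derivation, assuming the paths for one drive have already been fetched into an array (`listDirectory` is a hypothetical helper, not a Botholomew function):

```typescript
// Hypothetical helper: derive the immediate children of `dir` from flat paths.
// Sub-directories are reported once, with a trailing "/".
function listDirectory(paths: string[], dir: string): string[] {
  const prefix = dir.endsWith("/") ? dir : dir + "/";
  const children = new Set<string>();
  for (const p of paths) {
    if (!p.startsWith(prefix)) continue;
    const rest = p.slice(prefix.length);
    if (rest === "") continue;
    const slash = rest.indexOf("/");
    // No further "/" means a file directly in `dir`; otherwise keep only the
    // first segment, which is an intermediate directory.
    children.add(slash === -1 ? rest : rest.slice(0, slash) + "/");
  }
  return Array.from(children).sort();
}
```

In this sketch a directory appears as soon as any item lives under it, whether or not an explicit `inode/directory` row exists.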

The agent's tools

All tools that operate on context items take (drive, path) together. For context_read, context_info, and context_exists, path can also be a bare UUID or a drive:/path string — in those cases drive is ignored.
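
One way to picture the three accepted forms is a small classifier over the `path` argument. This is a sketch under assumptions: `classifyLookup` and `Lookup` are hypothetical names, the drive list is copied from the table above, and the real tools may detect refs differently (for example by consulting the drives actually present in the database).

```typescript
// Hypothetical classifier for the `path` argument of context_read and friends.
const KNOWN_DRIVES = ["disk", "url", "agent", "google-docs", "github"];
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

type Lookup =
  | { kind: "id"; id: string }
  | { kind: "ref"; ref: string }
  | { kind: "pair"; drive: string; path: string };

function classifyLookup(path: string, drive?: string): Lookup {
  // A bare UUID addresses the row directly; drive is ignored.
  if (UUID_RE.test(path)) return { kind: "id", id: path };

  // A drive:/path string carries its own drive; the drive argument is ignored.
  // Checking against known drives keeps "https://..." from being read as a ref.
  const sep = path.indexOf(":/");
  if (sep > 0 && KNOWN_DRIVES.indexOf(path.slice(0, sep)) !== -1) {
    return { kind: "ref", ref: path };
  }

  // Otherwise the caller must supply the drive alongside the path.
  if (drive === undefined) throw new Error("drive is required for a plain path");
  return { kind: "pair", drive, path };
}
```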

Discovery:

| Tool | What it does |
| --- | --- |
| context_list_drives | List every drive that has content, with counts; a good first call when you don't know what's ingested |
| context_tree | With no drive: list drives. With a drive: render a tree of that drive (the agent's bird's-eye view) |

Directory operations:

| Tool | What it does |
| --- | --- |
| context_create_dir | Create a directory placeholder row (defaults to drive: "agent") |
| context_dir_size | Sum length(content) for items under a drive/prefix |

File operations:

| Tool | What it does |
| --- | --- |
| context_read | Read an item's content; slice by line (offset/limit) |
| context_write | Upsert a row, trigger re-chunk + re-embed, return a tree snapshot (defaults to drive: "agent") |
| context_edit | Apply git-style line-range patches |
| context_delete | Remove by (drive, path) or recursively by prefix |
| context_copy | Duplicate a row to a new (drive, path) |
| context_move | Rename or relocate a row; can move between drives |
| context_info | Return metadata (size, lines, mime, indexed_at, drive, path, ref) |
| context_exists | (drive, path) existence check |
| context_count_lines | Count \n in content |

These are also exposed from the host CLI:

```bash
botholomew context add ~/notes/meeting.md        # ingests as disk:/Users/.../meeting.md
botholomew context add https://github.com/evantahler/botholomew/blob/main/README.md
                                                 # ingests as github:/evantahler/botholomew/README.md
botholomew context list
botholomew context read disk:/Users/evan/notes/meeting.md
botholomew context tree disk:/Users/evan/notes
```

Structured errors from context_read / context_info

When the agent passes a path that doesn't resolve, these tools return a structured is_error: true response (they do not throw) so the model can recover inside the same tool loop:

```json
{
  "is_error": true,
  "error_type": "not_found",
  "message": "No context item at disk:/Users/evan/notes/architecture.md",
  "next_action_hint": "Nearby items under disk:/Users/evan/notes: disk:/Users/evan/notes/readme.md, disk:/Users/evan/notes/guide.md. Call context_tree({drive:\"disk\",path:\"/Users/evan/notes\"}) to see more."
}
```

The hint is built from findNearbyContextPaths — up to five siblings of the requested path's parent directory within the same drive, walking up until it finds a populated ancestor. context_read also returns error_type: "no_text_content" when the target exists but is binary (e.g. an image row).
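
The walk-up search can be sketched over a plain array of paths from a single drive. This is an approximation of the behavior described above, not the source of `findNearbyContextPaths`: `findNearbySketch` and `parentOf` are hypothetical names, and for brevity the sketch returns any item under the ancestor rather than strictly direct siblings.

```typescript
// Hypothetical helper: parent directory of a "/"-separated path ("/" at the top).
function parentOf(p: string): string {
  const i = p.lastIndexOf("/");
  return i <= 0 ? "/" : p.slice(0, i);
}

// Start at the requested path's parent and walk up until some ancestor
// contains at least one item, then return up to `limit` of those items.
function findNearbySketch(paths: string[], requested: string, limit = 5): string[] {
  let dir = parentOf(requested);
  while (true) {
    const prefix = dir === "/" ? "/" : dir + "/";
    const hits = paths
      .filter((p) => p !== requested && p.startsWith(prefix))
      .slice(0, limit);
    // Stop at the first populated ancestor, or at the root if nothing matches.
    if (hits.length > 0 || dir === "/") return hits;
    dir = parentOf(dir);
  }
}
```

Capping the result keeps the hint short enough to fit in a tool response while still giving the model concrete paths to retry with.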


Patch format for context_edit

```ts
{ start_line: number, end_line: number, content: string }
```

  • start_line / end_line are 1-based inclusive.
  • end_line: 0 means insert without replacing.
  • content: "" means delete the line range.
  • Patches are applied bottom-up (descending start_line) so earlier line numbers remain stable.
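
The rules above can be sketched as a single pass over a line array. This is a minimal illustration, not the context_edit implementation: `applyPatches` and `LinePatch` are hypothetical names, and the sketch assumes that an `end_line: 0` patch inserts its content before `start_line`, which the rules above do not spell out.

```typescript
interface LinePatch {
  start_line: number; // 1-based
  end_line: number;   // 1-based inclusive; 0 means "insert, replace nothing"
  content: string;    // "" deletes the range
}

// Hypothetical sketch of line-range patching.
function applyPatches(text: string, patches: LinePatch[]): string {
  const lines = text.split("\n");
  // Apply bottom-up so a patch never shifts the line numbers of patches above it.
  const ordered = patches.slice().sort((a, b) => b.start_line - a.start_line);
  for (const p of ordered) {
    const replacement = p.content === "" ? [] : p.content.split("\n");
    const deleteCount = p.end_line === 0 ? 0 : p.end_line - p.start_line + 1;
    // Assumption: inserts land before start_line.
    lines.splice(p.start_line - 1, deleteCount, ...replacement);
  }
  return lines.join("\n");
}
```

For example, replacing line 1 and deleting lines 3 to 4 of a four-line document in one call works because the deletion is applied first, leaving line 1 where the other patch expects it.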

Embedding cascade

Every mutation cascades into the embeddings table:

  • context_write → delete old chunks, re-chunk, re-embed, insert.
  • context_edit → same.
  • context_move → no embedding changes (embeddings reference the item id, not the path).
  • context_delete → cascade delete embedding rows.
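
Why a move leaves embeddings alone is easiest to see in a toy model where chunks are keyed by item id. The sketch below is hypothetical throughout: `MiniStore`, `StoredItem`, and the paragraph-based `chunkText` are stand-ins, and no real embedding is computed.

```typescript
interface StoredItem {
  id: string;
  drive: string;
  path: string;
  content: string;
}

// Hypothetical chunker: one chunk per blank-line-separated paragraph.
function chunkText(content: string): string[] {
  return content
    .split(/\n\s*\n/)
    .map((c) => c.trim())
    .filter((c) => c.length > 0);
}

class MiniStore {
  items = new Map<string, StoredItem>();
  chunksByItemId = new Map<string, string[]>();

  write(item: StoredItem): void {
    this.items.set(item.id, item);
    // Overwriting the entry stands in for "delete old chunks, re-chunk, re-embed, insert".
    this.chunksByItemId.set(item.id, chunkText(item.content));
  }

  move(id: string, drive: string, path: string): void {
    const item = this.items.get(id);
    if (!item) throw new Error(`no item ${id}`);
    // Only the (drive, path) identity changes; chunks key on the id and stay put.
    this.items.set(id, { ...item, drive, path });
  }

  delete(id: string): void {
    this.items.delete(id);
    this.chunksByItemId.delete(id); // cascade
  }
}
```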

Embeddings are stored as FLOAT[384] and queried by linear scan via array_cosine_distance() — no HNSW index, no VSS extension. The FTS index over chunk_content and title is rebuilt by rebuildSearchIndex() after every ingest write. See context-and-search.md for the full pipeline.
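
A linear scan simply computes the distance from the query vector to every stored vector and keeps the smallest. The sketch below shows the idea in plain TypeScript, using the standard definition of cosine distance (one minus cosine similarity). `cosineDistance`, `nearestChunks`, and `EmbeddedChunk` are hypothetical names, and in Botholomew the equivalent work happens inside DuckDB rather than in application code.

```typescript
interface EmbeddedChunk {
  id: string;
  vector: number[]; // 384 floats in Botholomew; any fixed length works here
}

// Cosine distance = 1 - (a . b) / (|a| * |b|). Zero vectors are not handled.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Linear scan: score every chunk, sort ascending by distance, keep the top k.
function nearestChunks(query: number[], chunks: EmbeddedChunk[], k: number): string[] {
  return chunks
    .map((c) => ({ id: c.id, distance: cosineDistance(query, c.vector) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
    .map((c) => c.id);
}
```

The cost grows linearly with the number of chunks, in exchange for having no index to build or keep in sync.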


Why not just use files on disk?

A real filesystem would require:

  • path escaping, sandboxing, symlink resolution;
  • a separate indexer that must stay consistent with the files;
  • backup/versioning/synchronization logic.

A DuckDB database already covers all of these at once: it is transactional, searchable, and backed by a single file you can cp or open with the duckdb CLI.

And the biggest reason: safety. A filesystem abstraction that happens to be a database is a filesystem the agent cannot escape. There is no .., no symlink, no /etc/passwd — just (drive, path) columns with a UNIQUE constraint. If you're comfortable letting a model make decisions on your behalf but not comfortable letting it touch your disk, that's exactly the trade Botholomew makes.

Released under the MIT License.