---
url: 'https://www.botholomew.com/architecture.md'
---

# Architecture

Botholomew is three cooperating process roles that share a single DuckDB database:

1. **Workers** — short-lived or long-running `bun` processes that claim tasks from the queue, evaluate schedules, and run LLM tool loops. Multiple workers can run at once; each registers itself in the DB and heartbeats so dead ones are reaped.
2. **The chat TUI** — an Ink/React terminal UI you run on demand; it enqueues tasks, browses history, and can dispatch workers via the `spawn_worker` tool.
3. **The CLI** — everything else (`task add`, `schedule list`, `context search`, `worker list`, …). Each invocation opens its own DuckDB connection.

All share `.botholomew/data.duckdb`. DuckDB holds the file lock at the instance level (not the connection), so **no process holds a DB connection longer than a single logical operation**. Each CRUD call runs inside a short-lived `withDb(dbPath, fn)` from `src/db/connection.ts`, which acquires a connection, executes, and releases the instance when the last overlapping caller in the process is done. `withRetry` wraps the acquire path and retries with exponential backoff if another process is holding the lock.

**Safety note.** None of these processes give the agent direct access to your machine. Workers are the only things executing LLM tool calls, and the only tools they see are the ones registered in `src/tools/` (all operating inside `.botholomew/`) plus whichever MCP servers you explicitly configured. There is no "just read the file system" escape hatch. See [the virtual filesystem doc](virtual-filesystem.md) for the full argument.

***

## The worker tick

A worker executes one `tick()` per cycle. In `--once` mode (the default), a single tick runs and then the worker exits. In `--persist` mode, the worker loops over ticks until it receives SIGTERM/SIGINT.
```
tick() ─┐
        ├─► reset stale in_progress tasks (claimed > 3× max_tick_duration)
        ├─► processSchedules(workerId)  — atomically claim each due schedule,
        │                                 ask the LLM which are "due",
        │                                 enqueue their tasks
        ├─► claimNextTask(workerId)     — highest-priority unblocked pending task;
        │                                 worker id is stamped on the `claimed_by` column
        ├─► createThread("worker_tick") — one thread per tick for logging
        ├─► buildSystemPrompt()         — always-context + task-relevant context
        ├─► runAgentLoop()              — multi-turn Anthropic tool-use loop;
        │                                 every message, thinking block, tool call,
        │                                 and tool result is logged as an `interaction` row
        ├─► updateTaskStatus()          — complete / failed / waiting
        └─► endThread()
```

If no task is claimable and no schedule is due, `tick()` returns `false`. A `--persist` worker then sleeps `tick_interval_seconds` before trying again; a `--once` worker exits immediately. See `src/worker/tick.ts`.

### Log format

Worker logs prefix every line with a local `HH:MM:SS` timestamp. Lifecycle phases render as `[[phase-name]]` in bold magenta so they're easy to scan and grep (`grep '\[\[' .botholomew/logs/.log`).

Phases emitted each tick:

* `[[tick-start]] #N`
* `[[evaluating-schedules]]` (only when any are enabled)
* `[[claiming-task]]`
* `[[tick-end]] #N Xs didWork=true|false`
* `[[sleeping]] Ns` (only when there was no work in a persist worker)

Background workers (spawned without a TTY) also mirror the conversation thread to the log between `[[claiming-task]]` and `Task ... -> complete`, so a `tail -f` shows what the LLM is actually doing:

* `[[assistant]] ` — assistant message blocks
* `[[tool-call]] ` — each tool invocation
* `[[tool-result]] ok|err in Ns` — tool outcome and duration

Full content (untruncated input, tool output, tokens) stays in the `interactions` table; the log mirrors enough to follow the trace without opening the DB.
Foreground workers (`worker run`) keep their existing streaming UX (per-token output and `▶`/`✓` markers) — these phase lines are suppressed there to avoid duplication.

***

## Registration, heartbeat, reaping

Every worker writes a row into the `workers` table on start (`registerWorker` in `src/db/workers.ts`) with its id (uuidv7), pid, hostname, mode, optional pinned task id, optional `log_path`, and `status='running'`. Detached workers (spawned via `worker start` or `spawn_worker`) get a per-worker log file at `.botholomew/logs/.log` — the spawn parent generates the id and opens that file before launching the child, so the path is recorded on the row from registration onward. Foreground workers (`worker run`) have `log_path = null` and write to stdout instead.

From that moment, a non-blocking `setInterval` in `src/worker/heartbeat.ts` bumps `last_heartbeat_at` every `worker_heartbeat_interval_seconds` (default 15s) — independent of the tick loop, so a worker mid-LLM-call still heartbeats reliably.

Persist workers also run a reaper interval (`worker_reap_interval_seconds`, default 30s) that does two things:

1. Flips any worker whose heartbeat is older than `worker_dead_after_seconds` (default 60s) to `status='dead'` and releases every task and schedule claim it held. This is the failure-recovery path: anything from a terminal crash to a `kill -9` ends with the work reclaimable by another worker.
2. Deletes cleanly-stopped workers whose `stopped_at` is older than `worker_stopped_retention_seconds` (default 3600s). Dead workers are kept as forensic evidence; only the clean exits get auto-pruned so the `workers` table doesn't grow unbounded.

***

## The chat TUI

`botholomew chat` is a separate agent with its own system prompt and tool set — it does **not** execute long-running work itself.
Instead, it:

* answers questions about tasks, threads, and context,
* creates tasks (via `create_task`) that workers will pick up,
* spawns workers on demand (via `spawn_worker`) when the user wants work run right now,
* reads worker activity (`list_threads`, `view_thread`),
* looks up context items by path or UUID (`context_info`, `context_search`) and can refresh them in place (`context_refresh`),
* invokes **skills** (`/review`, `/standup`, …) defined in `.botholomew/skills/`,
* edits `beliefs.md` and `goals.md` via `update_beliefs` / `update_goals`.

It uses Anthropic's streaming API so tokens render in the TUI as they arrive. Every session is itself a `chat_session` thread with the same interaction log as a worker tick. See `src/chat/` and `src/tui/`.

***

## Why two agents?

A single-agent design would force the chat loop to wait on whatever the user asked — "summarize this 200-page PDF" blocks the UI for minutes. The split:

* **Chat** is fast, streaming, and interactive. It understands the world via the database but doesn't touch it much.
* **Workers** are slow, autonomous, and batch-oriented. Each tick can take as long as it needs.

Both speak to the same database, so a worker's results are immediately visible to the chat agent — and the chat agent can dispatch workers without blocking.

***

## Automation without a resident daemon

Earlier versions of Botholomew shipped an OS-level watchdog (launchd on macOS, systemd on Linux) to keep a single daemon alive. That's been replaced: users now run workers directly, and there is no installed background service. See [automation.md](automation.md) for cron-based recipes and optional launchd/systemd examples if you want Botholomew to advance on its own.

***

## Thread logging

Every interaction is persisted. A **thread** is one tick or one chat session; an **interaction** is a single event within it (user message, assistant message, tool call, tool result, thinking block, status change).
This gives Botholomew total observability without a separate tracing stack — `botholomew thread view ` reads the same rows that produced the work. Threads are also the chat agent's way of reporting on what workers have been doing.

Schema lives in `src/db/sql/2-logging_tables.sql`; thread types are `worker_tick` and `chat_session`.

***

## Connection model

Every process uses the same policy: **open a DuckDB connection for one logical operation, then close it.**

* **Workers**: `tick()` takes a `dbPath`, not a held connection. Each call into `src/db/*` is wrapped in `withDb` — stale-task reset, task claim, thread create, every `logInteraction`, the status update. The LLM network round-trip holds no connection.
* **Heartbeat**: a separate `setInterval` opens its own short `withDb` every ~15s (`src/worker/heartbeat.ts`). This is deliberately decoupled from the tick loop so a long LLM call doesn't stall the heartbeat.
* **Chat**: `ChatSession` carries `dbPath`. Each write (user message log, tool-use log, tool-result log, title update, thread end) is its own `withDb`. Tool execution wraps each call in `withDb` so `ctx.conn` is scoped to that tool call only.
* **CLI invocations**: `withDb` in `src/commands/with-db.ts` opens a connection for the command, applies migrations, and closes when the callback returns.
* **TUI panels**: take `dbPath`, not `conn`, and wrap each refresh poll in `withDb`.

DuckDB's file lock is process-wide and held by the *instance*, not individual connections. Within one process we refcount a shared instance so overlapping `withDb` calls (e.g., parallel tool execution via `Promise.all`, or the heartbeat firing alongside a tick) don't trip DuckDB's "don't open the same DB twice" rule; when the last caller in the process releases, we close the instance and free the OS-level lock so another process can claim it.
Vector search uses `array_cosine_distance()` (core DuckDB, no extension) over a linear scan of the `embeddings.embedding` column; the FTS extension (`INSTALL fts; LOAD fts;`) is loaded at connect time for BM25 keyword search. See `src/db/connection.ts`.

***

## Multi-worker safety

Any number of workers can run against the same project concurrently (spawned by CLI, the chat tool, cron, or `--persist`). Concurrency is handled at the DB level:

* **Task claim** — `claimNextTask(conn, workerId)` issues an atomic `UPDATE tasks SET status='in_progress', claimed_by=?1 WHERE id=?2 AND status='pending' RETURNING *`. If another worker claimed the row first, `RETURNING` comes back empty and the loop tries the next candidate.
* **Schedule claim** — `claimSchedule(conn, id, workerId, opts)` is the same atomic UPDATE pattern, gated by both a `schedule_claim_stale_seconds` (default 300s) window on the existing claim and a `schedule_min_interval_seconds` (default 60s) window on `last_run_at`. Only one worker per schedule per window evaluates and enqueues tasks.
* **Stale release** — If a worker crashes mid-task, its claim is released when the reaper flips its row to `dead`. Existing `claim_at` staleness also catches tasks claimed for longer than 3× the tick duration, independent of the worker's heartbeat.

***

## Nuke: bulk database resets

During development and when reusing a project, you often want to wipe part of the database without blowing away the whole `.botholomew/` directory (which would also erase `soul.md`, `beliefs.md`, `goals.md`, `config.json`, and your skills).
`botholomew nuke` covers that:

| Scope | Clears |
|---|---|
| `nuke context` | `context_items`, `embeddings` |
| `nuke tasks` | `tasks` |
| `nuke schedules` | `schedules` |
| `nuke threads` | `threads`, `interactions` (both worker ticks and chat sessions) |
| `nuke all` | everything above plus `daemon_state` |

Each subcommand requires `-y`/`--yes` to actually delete — running without the flag prints per-table row counts and exits, so it doubles as a dry run. Nothing on disk (soul, beliefs, goals, config, skills) is ever touched.

For safety, `nuke` refuses to run while any worker is in `status='running'` — stop them first with `botholomew worker stop `. The schema itself (tables, `_migrations`) is always preserved. See `src/commands/nuke.ts`.

***

## DB doctor: detect and repair index corruption

Under rare circumstances — typically after a hard crash or interrupted write — DuckDB's primary-key index can fall out of sync with the row data. The symptom is that `UPDATE`/`DELETE` against the affected rows fails with `Invalid Input Error: Failed to delete all rows from index`. Inside Bun, that FATAL error unwinds past the NAPI boundary as a C++ exception, surfacing as `panic: A C++ exception occurred` from `Zig__GlobalObject__onCrash`. The CLI command, the worker tick loop, and anything else that touches a corrupted row die immediately.

`botholomew db doctor` exists to detect and recover from this:

| Mode | What it does |
|---|---|
| `db doctor` (default) | For each user table, spawns a child Bun process that runs a self-update touching the PK index. Reports `ok` / `empty` / `missing` / `corrupt` per table. The child-process isolation is essential — a panic in the probe stays out of the doctor itself. |
| `db doctor --repair` | Refuses if any worker is **actually running** (PID alive). Stale `status='running'` rows whose PIDs are dead — the case that tends to coexist with workers-table corruption — are warned about but do not block repair, because flipping them to `stopped` would just trip the same corruption. Runs `CHECKPOINT`, `EXPORT DATABASE` to a timestamped directory under `.botholomew/`, renames the original `data.duckdb` (and `.wal`) to `data.duckdb.bak-`, opens a fresh DB at the original path, and `IMPORT DATABASE`s back. Indexes are rebuilt from data, which restores write integrity. After repair, `botholomew worker reap` cleans up the stale rows. |

Repair is idempotent and non-destructive: the original DB is preserved as a `.bak-` file next to the new one. Delete the backup once you've confirmed the rebuilt DB looks right.

See `src/db/doctor.ts` and `src/commands/db.ts`.

---

---
url: 'https://www.botholomew.com/automation.md'
---

# Automation

Botholomew no longer ships an OS-level watchdog. Earlier versions installed a `launchd` plist or a `systemd` user service that kept a single daemon alive; we dropped that because the install was heavy and opaque. Instead, you choose how and when workers run. This doc covers the common patterns. None of them are installed for you — you copy the recipe that matches your needs.

***

## The shape of a scheduled run

`botholomew worker run` (one-shot, default mode) does one thing and exits:

1. Register a worker row in the DB.
2. Start a heartbeat `setInterval` so other workers know it's alive.
3. Evaluate any due schedules and enqueue their tasks.
4. Claim the next eligible pending task.
5. Run the LLM tool loop until the task is complete / failed / waiting.
6. Mark the worker `stopped` and exit.

If there's no eligible task, the worker exits immediately — safe to run on a tight cron without overlapping concerns.

Two things make this safe to run concurrently with other workers:

* Task claims are atomic (`UPDATE ... WHERE status='pending' RETURNING *`).
* Schedule evaluation is gated by an atomic claim + a minimum-interval window, so two workers can't enqueue duplicate task batches from the same schedule. See [architecture.md](architecture.md#multi-worker-safety).

***

## Pattern: cron (recommended)

One line. Put this in `crontab -e`:

```cron
# Every 5 minutes, advance one task in ~/projects/inbox-bot
*/5 * * * * cd ~/projects/inbox-bot && /usr/local/bin/botholomew worker run >> .botholomew/cron.log 2>&1
```

* Fire as often as you like; each fire is one task at most.
* Overlap is fine. If two fires start close together, one will claim the task and the other will exit without work.
* Resolve `botholomew` with a full path. cron's `PATH` is minimal; `which botholomew` from your shell gives you the right answer.
* Redirect to `.botholomew/cron.log` (or anywhere you like) so you can see what happened if a run misbehaves.

### More aggressive variants

If you have a backlog you want drained quickly, spawn background workers every minute:

```cron
* * * * * cd ~/projects/inbox-bot && botholomew worker start >> .botholomew/cron.log 2>&1
```

Each worker still exits after one task; they just overlap freely. A crashed worker is reaped within ~60s and its task goes back into the queue.

***

## Pattern: a single long-running worker

Simplest UX for a workstation that's on most of the day: open a tmux or screen pane and run a persist worker in it.

```bash
tmux new -s botholomew
botholomew worker run --persist
# Ctrl+B, D to detach
```

It'll tick every `tick_interval_seconds` (default 300) when the queue is empty and back-to-back while there's work. Ctrl+C to stop cleanly (the shutdown handler marks the worker `stopped`). No cron, no watchdog, no systemd — and when you want to upgrade, you stop the pane and start it again.
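The two concurrency gates above — the atomic task claim and the schedule claim with its minimum-interval window — reduce to simple predicates. This in-memory TypeScript sketch is illustrative only (the real checks are single SQL statements inside the claim functions; all names here are stand-ins):

```typescript
// Task claim — mirrors: UPDATE tasks SET status='in_progress', claimed_by=?1
//                       WHERE id=?2 AND status='pending' RETURNING *
type Task = { id: string; status: "pending" | "in_progress"; claimedBy: string | null };

function tryClaim(tasks: Task[], id: string, workerId: string): Task | null {
  const t = tasks.find((x) => x.id === id);
  if (!t || t.status !== "pending") return null; // another worker won; RETURNING is empty
  t.status = "in_progress";
  t.claimedBy = workerId;
  return t;
}

// Schedule claim — gated by a minimum-interval window on last_run_at and a
// staleness window on any existing claim (so a crashed worker's claim can be stolen).
function canClaimSchedule(
  s: { claimedAt: number | null; lastRunAt: number | null }, // epoch ms
  nowMs: number,
  minIntervalSeconds = 60,  // schedule_min_interval_seconds
  claimStaleSeconds = 300,  // schedule_claim_stale_seconds
): boolean {
  if (s.lastRunAt !== null && nowMs - s.lastRunAt < minIntervalSeconds * 1000) return false;
  if (s.claimedAt !== null && nowMs - s.claimedAt < claimStaleSeconds * 1000) return false;
  return true;
}
```

Two workers racing `tryClaim` on the same task id can only produce one winner, which is exactly why overlapping cron fires are harmless.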
***

## Pattern: launchd (macOS, optional)

If you want Botholomew to survive logouts and start on boot without cron or tmux, a minimal `~/Library/LaunchAgents/com.example.botholomew.plist` looks like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.botholomew</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/botholomew</string>
    <string>--dir</string>
    <string>/Users/you/projects/inbox-bot</string>
    <string>worker</string>
    <string>run</string>
  </array>
  <key>StartInterval</key>
  <integer>300</integer>
  <key>StandardOutPath</key>
  <string>/Users/you/projects/inbox-bot/.botholomew/launchd.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/you/projects/inbox-bot/.botholomew/launchd.log</string>
</dict>
</plist>
```

Then:

```bash
launchctl load ~/Library/LaunchAgents/com.example.botholomew.plist
```

This runs `worker run` every 300s. You own the plist; Botholomew doesn't touch it. If Botholomew lives in a folder launchd can't read (e.g., under `~/Desktop` on newer macOS), grant Full Disk Access to whichever program invokes the binary.

***

## Pattern: systemd user timer (Linux, optional)

Two files in `~/.config/systemd/user/`:

`botholomew-inbox.service`:

```ini
[Unit]
Description=Run one Botholomew worker tick

[Service]
Type=oneshot
WorkingDirectory=/home/you/projects/inbox-bot
ExecStart=/usr/local/bin/botholomew worker run
StandardOutput=append:/home/you/projects/inbox-bot/.botholomew/systemd.log
StandardError=append:/home/you/projects/inbox-bot/.botholomew/systemd.log
```

`botholomew-inbox.timer`:

```ini
[Unit]
Description=Run Botholomew every 5 minutes

[Timer]
OnBootSec=60
OnUnitActiveSec=5min
Unit=botholomew-inbox.service

[Install]
WantedBy=timers.target
```

Enable with:

```bash
systemctl --user daemon-reload
systemctl --user enable --now botholomew-inbox.timer
```

Same concurrency story as cron: each fire is one task at most.

***

## Troubleshooting

* **"Nothing's happening."** `botholomew worker list` shows every worker the DB has ever seen. Filter with `--status running` to see who's alive right now. If you see zero running and a non-empty queue, spawn one: `botholomew worker start --persist`.
* **"I see dead workers piling up."** Reaped crashes stay in the table as forensic evidence; only clean exits (`status='stopped'`) get auto-pruned (after `worker_stopped_retention_seconds`, default 1 hour). If dead rows are bothering you, `DELETE FROM workers WHERE status='dead'` clears them safely. `botholomew worker list --status dead` shows the list first.
* **"Cron runs aren't firing."** Check `grep CRON /var/log/syslog` (Linux) or `log show --predicate 'process == "cron"'` (macOS). Common causes: minimal `PATH`, or a relative path to `botholomew`.
* **"Two workers keep claiming the same task."** They don't — by design. The `claimed_by` column is stamped by the atomic UPDATE, so only one wins. If you're seeing duplicate **output**, it's because the task was re-run after its worker was reaped — check `worker list --status dead`.
* **"The log is getting huge."** Rotate it yourself (logrotate, newsyslog). Botholomew used to do this inside the old watchdog; it no longer does.

***

## Why no built-in watchdog?

Feedback from early users: installing `launchctl`/`systemctl` entries was heavy, platform-specific, and opaque — and because it was installed per project, it accumulated in `~/Library/LaunchAgents/` faster than users expected. Replacing it with "run `worker run` however you already run things" makes the footprint predictable and the failure modes familiar. If you do want boot-time survival, the templates above give you what the old watchdog provided, without the magic.

---

---
url: 'https://www.botholomew.com/owl-character-sheet.md'
---

# Botholomew Owl — Character Sheet

The Botholomew mascot is a small ASCII owl. All poses are 3 lines tall and roughly 7 characters wide so they can be swapped frame-by-frame in the TUI.
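Frame-by-frame swapping needs nothing more than a clock-to-index mapping. A hypothetical helper (illustrative only — the real TUI code may differ):

```typescript
// Pick the frame to render for the current moment from a looping sequence.
// The character sheet below uses ~400ms per frame for the idle blink cycle.
function frameAt<T>(frames: T[], elapsedMs: number, msPerFrame = 400): T {
  return frames[Math.floor(elapsedMs / msPerFrame) % frames.length];
}
```

A "play once" sequence like Startup would clamp the index instead of wrapping it.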
***

## Base / Neutral

```
{o,o}
/)_)
 " "
```

***

## Emotions

### Happy

```
{^,^}
/)_)
 " "
```

### Excited

```
{*,*}
/)_)
 " "
```

### Sad

```
{;,;}
/)_)
 " "
```

### Surprised

```
{O,O}
/)_)
 " "
```

### Sleeping

```
{-,-}
/)_)
 " "
```

### Thinking

```
{o,o}
/)_) ?
 " "
```

### Confused

```
{o,o}
/)_) ~
 " "
```

### Dizzy

```
{@,@}
/)_)
 " "
```

### Alert / Error

```
{!,!}
/)_)
 " "
```

***

## Directional

### Wink

```
{-,o}
/)_)
 " "
```

### Looking Left

```
{o,o}
(_(\
 " "
```

### Looking Right

```
{o,o}
/)_)
 " "
```

***

## Poses

### Wings Up (celebrating)

```
{^,^}
/) (\
 " "
```

### Wings Out (presenting)

```
{o,o}
/)_)/>
 " "
```

### Reading

```
{o,o}
/)_)
_|"|_
```

### Typing

```
{o,o}
/)_)
_|||_
```

***

## Animation Sequences

### Idle (looping)

Slow blink cycle, ~400ms per frame:

```
Frame 0:   Frame 1:   Frame 2:   Frame 3:
{o,o}      {o,o}      {-,-}      {o,o}
/)_)       /)_)       /)_)       /)_)
 " "        " "        " "        " "
```

### Thinking (looping)

Eyes shift side to side while thinking:

```
Frame 0:    Frame 1:    Frame 2:    Frame 3:
{o,o}       {o,o}       {o,o}       {o,o}
/)_) ?      /)_) .      /)_) ..     /)_) ...
 " "         " "         " "         " "
```

### Working (looping)

Typing animation:

```
Frame 0:   Frame 1:   Frame 2:   Frame 3:
{o,o}      {-,o}      {o,o}      {o,-}
/)_)       /)_)       /)_)       /)_)
_|||_      _|||_      _|||_      _|||_
```

### Success

Quick celebration:

```
Frame 0:   Frame 1:   Frame 2:
{o,o}      {^,^}      {^,^}
/)_)       /) (\      /) (\
 " "        " "        " "
```

### Error

Surprise then alert:

```
Frame 0:   Frame 1:   Frame 2:
{o,o}      {O,O}      {!,!}
/)_)       /)_)       /)_)
 " "        " "        " "
```

### Startup

Wake-up sequence (play once):

```
Frame 0:   Frame 1:   Frame 2:   Frame 3:   Frame 4:
{-,-}      {-,-}      {o,-}      {o,o}      {^,^}
/)_)       /)_)       /)_)       /)_)       /)_)
 " "        " "        " "        " "        " "
```

---

---
url: 'https://www.botholomew.com/changelog.md'
description: 'Release history for Botholomew, pulled from GitHub releases.'
---

# Changelog

All releases are published to [GitHub](https://github.com/evantahler/botholomew/releases) and [npm](https://www.npmjs.com/package/botholomew).
---

---
url: 'https://www.botholomew.com/configuration.md'
---

# Configuration

Botholomew reads its settings from `.botholomew/config.json`. The full schema lives in `src/config/schemas.ts`.

```json
{
  "anthropic_api_key": "",
  "model": "claude-opus-4-6",
  "chunker_model": "claude-haiku-4-5-20251001",
  "embedding_model": "Xenova/bge-small-en-v1.5",
  "embedding_dimension": 384,
  "tick_interval_seconds": 300,
  "max_tick_duration_seconds": 120,
  "system_prompt_override": "",
  "max_turns": 0,
  "worker_heartbeat_interval_seconds": 15,
  "worker_dead_after_seconds": 60,
  "worker_reap_interval_seconds": 30,
  "worker_stopped_retention_seconds": 3600,
  "schedule_min_interval_seconds": 60,
  "schedule_claim_stale_seconds": 300,
  "log_level": ""
}
```

***

## Keys

| Key | Default | Purpose |
|---|---|---|
| `anthropic_api_key` | `""` | Anthropic key. `ANTHROPIC_API_KEY` env var overrides. |
| `model` | `claude-opus-4-6` | Claude model for the main agent loop (workers + chat). |
| `chunker_model` | `claude-haiku-4-5-20251001` | Smaller/cheaper model used to propose chunk boundaries during ingestion and evaluate schedules. |
| `embedding_model` | `Xenova/bge-small-en-v1.5` | A local [`@huggingface/transformers`](https://huggingface.co/docs/transformers.js) feature-extraction model. Weights are downloaded on first use and cached under `.botholomew/models/`. Any feature-extraction model in the Xenova/\* namespace works — e.g. `Xenova/multilingual-e5-small` (also 384-dim) for non-English content. |
| `embedding_dimension` | `384` | Vector dimension. Must match the model. Changing model + dimension requires running `botholomew context reembed` to recompute every stored vector — old and new vectors aren't comparable. |
| `tick_interval_seconds` | `300` | Seconds a `--persist` worker sleeps between ticks **when there's no work**. It ticks back-to-back while a backlog exists. |
| `max_tick_duration_seconds` | `120` | Soft cap per tick. Stale-task reset fires at `3×` this value. |
| `system_prompt_override` | `""` | Appended to the built-in system prompt. Use this for project-specific instructions that should be always-loaded without editing `soul.md`. |
| `max_turns` | `0` | Maximum tool-use turns per agent loop (0 = unlimited). Safety net against runaway loops. |
| `worker_heartbeat_interval_seconds` | `15` | How often a running worker writes to `workers.last_heartbeat_at`. Runs on its own `setInterval`, independent of the tick loop, so long LLM calls don't starve the heartbeat. |
| `worker_dead_after_seconds` | `60` | A worker whose heartbeat is older than this is considered dead. The reaper flips its status to `dead` and releases every task/schedule claim it held. |
| `worker_reap_interval_seconds` | `30` | How often a `--persist` worker scans for dead peers to reap and prunes old cleanly-stopped workers. One-shot workers don't run the reaper. |
| `worker_stopped_retention_seconds` | `3600` | Cleanly-stopped workers older than this are deleted from the `workers` table. Dead workers are kept as forensic evidence and not auto-pruned. |
| `schedule_min_interval_seconds` | `60` | Minimum gap between successive evaluations of the same schedule. A schedule that ran less than this many seconds ago is skipped. |
| `schedule_claim_stale_seconds` | `300` | If a worker claimed a schedule but never released it (crash), another worker may steal the claim after this many seconds. |
| `log_level` | `""` | Verbosity for `botholomew` CLI logs. One of `silent`, `error`, `warn`, `info`, `debug`. Empty string falls back to the runtime default (`info` normally, `error` under `NODE_ENV=test`). `BOTHOLOMEW_LOG_LEVEL` env var overrides this. |

***

## Environment variables

| Var | Effect |
|---|---|
| `ANTHROPIC_API_KEY` | Overrides `anthropic_api_key` in config. |
| `BOTHOLOMEW_LOG_LEVEL` | Overrides `log_level` in config. One of `silent`, `error`, `warn`, `info`, `debug`. |
| `BOTHOLOMEW_NO_UPDATE_CHECK` | Disable the background "new version available" check. |

***

## Tuning guidance

**For personal/low-volume use:** defaults are fine. One tick every five minutes is plenty when tasks are mostly "every morning, summarize my email".

**For bursty workloads:** lower `tick_interval_seconds` to 30–60. A persist worker only sleeps when the queue is empty, so this is safe — it just reduces latency between the last item landing and the next tick firing. Alternatively, spawn more one-shot workers (via cron or chat) and leave the interval alone.

**For multi-worker setups:** if you routinely run more than a handful of workers, consider lowering `worker_reap_interval_seconds` (so dead ones are cleaned quickly) and raising `worker_dead_after_seconds` (so a temporary DB-lock hiccup doesn't flip a live worker to dead). The defaults (30s reap, 60s threshold) are conservative.

**For model-cost sensitivity:**

* Switch `model` to `claude-sonnet-4-*` or `claude-haiku-*`. Opus is the default because quality on complex knowledge work matters more than per-token cost for most users, but Sonnet handles the majority of tasks well.
* The `chunker_model` is already Haiku — leave it there.
* Lower `max_turns` (e.g., 15) to hard-cap tool-use budgets.

**For prompt-sensitive workflows:** use `system_prompt_override` to add instructions without touching `soul.md`. This keeps the default personality intact while layering on project-specific rules ("always respond in British English", "never call `mcp_exec` on the slack server without confirmation", …).

***

## Per-project vs. global

There is no global config — everything is per-project. This is deliberate: different projects have different goals, different MCP servers, different beliefs. One Botholomew project's config shouldn't leak into another's.
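The override order implied by the tables above — built-in defaults, then `config.json`, then environment variables — can be sketched as follows. `resolveConfig` and `DEFAULTS` are illustrative names, not the real `src/config/schemas.ts` (which validates the full schema); only a few keys are shown:

```typescript
// Sketch of config resolution: file values override defaults, env vars override both.
const DEFAULTS = {
  anthropic_api_key: "",
  model: "claude-opus-4-6",
  tick_interval_seconds: 300,
  log_level: "",
};

type Config = typeof DEFAULTS;

function resolveConfig(
  file: Partial<Config>,
  env: Record<string, string | undefined>,
): Config {
  const merged = { ...DEFAULTS, ...file };
  // The two documented env overrides:
  if (env.ANTHROPIC_API_KEY) merged.anthropic_api_key = env.ANTHROPIC_API_KEY;
  if (env.BOTHOLOMEW_LOG_LEVEL) merged.log_level = env.BOTHOLOMEW_LOG_LEVEL;
  return merged;
}
```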
---

---
url: 'https://www.botholomew.com/context-and-search.md'
---

# Context & hybrid search

Botholomew's knowledge layer is a hybrid keyword + vector search system backed entirely by DuckDB. It's how the agent finds "that thing I mentioned last week" across thousands of ingested documents without calling out to a vector DB service.

***

## The pipeline

When you add a document (`botholomew context add ./report.pdf` or the agent writes via `context_write`), this happens:

```
content ─► create context_item row (drive, path)
        ─► LLM-driven chunker (claude-haiku-4-5 by default)
        ─► embedder (local @huggingface/transformers,
                     default Xenova/bge-small-en-v1.5, 384-dim)
        ─► embeddings table (FLOAT[384])
        ─► rebuild FTS index (BM25 over chunk_content + title)
        ─► indexed_at set on the context_item
```

See `src/context/ingest.ts`, `src/context/chunker.ts`, and `src/context/embedder.ts`.

***

## LLM-driven chunking

Fixed-size sliding-window chunking shreds structure: a heading lands in one chunk, its bullets in another, and semantic search returns incoherent fragments. Botholomew instead asks a **small, fast** model (Haiku by default) to propose chunk boundaries for each document:

```json
{
  "chunks": [
    { "start_line": 1, "end_line": 42 },
    { "start_line": 43, "end_line": 98 }
  ]
}
```

The chunker only returns line ranges (1-based, inclusive) — see `CHUNKER_TOOL` in `src/context/chunker.ts`. Each chunk is embedded separately; the `title` and `description` come from the parent `context_item` (set at ingestion time), are prepended to the chunk's text at embed time (along with a `Source: drive:/path` line), and surface in search results as the snippet.

If the chunker errors or times out, ingestion falls back to a deterministic paragraph/line splitter (`chunkByTextSplit` in `src/context/chunker.ts`) — semantic quality suffers, but the item still gets embedded.
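The fallback's shape — deterministic and structure-blind — can be sketched like this. `chunkByBlankLines` is an illustrative stand-in for `chunkByTextSplit` (whose exact splitting rules aren't documented here), assuming a split on blank lines; it returns the same 1-based inclusive ranges the LLM chunker does:

```typescript
type Chunk = { start_line: number; end_line: number };

// Split text into chunks at blank lines, emitting 1-based inclusive line ranges.
function chunkByBlankLines(text: string): Chunk[] {
  const lines = text.split("\n");
  const chunks: Chunk[] = [];
  let start: number | null = null; // 1-based line where the current chunk began
  lines.forEach((line, idx) => {
    const n = idx + 1;
    if (line.trim() === "") {
      // Blank line closes the open chunk, if any.
      if (start !== null) chunks.push({ start_line: start, end_line: n - 1 });
      start = null;
    } else if (start === null) {
      start = n; // first non-blank line opens a new chunk
    }
  });
  if (start !== null) chunks.push({ start_line: start, end_line: lines.length });
  return chunks;
}
```

A heading and the paragraph under it still land in one chunk here only if no blank line separates them — exactly the structural blindness the LLM chunker exists to avoid.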
*** ## Storage Context items live in `context_items` with a single identity key: ```sql CREATE TABLE context_items ( id TEXT PRIMARY KEY, title TEXT NOT NULL, content TEXT, mime_type TEXT NOT NULL DEFAULT 'text/plain', drive TEXT NOT NULL, path TEXT NOT NULL, indexed_at TEXT, ... ); CREATE UNIQUE INDEX idx_context_items_drive_path ON context_items(drive, path); ``` That unique index is load-bearing: `context add` looks up `(drive, path)` on every input to decide whether the ingest is a new insert or a refresh of an existing row. Embeddings live in `embeddings`: ```sql CREATE TABLE embeddings ( id TEXT PRIMARY KEY, context_item_id TEXT NOT NULL, chunk_index INTEGER NOT NULL, chunk_content TEXT, title TEXT NOT NULL, description TEXT NOT NULL DEFAULT '', embedding FLOAT[384], created_at TEXT NOT NULL DEFAULT (current_timestamp::VARCHAR), UNIQUE(context_item_id, chunk_index) ); ``` Vector similarity uses `array_cosine_distance` — a core DuckDB function, no extension required. There is no HNSW index: at our scale (hundreds to low thousands of rows) a linear scan beats the operational cost of the experimental-persistence HNSW path, which has bitten us with intermittent corruption more than once. Revisit when row counts reach the millions. Keyword search uses the **DuckDB FTS extension** (`INSTALL fts; LOAD fts;`) for BM25 ranking over `chunk_content` and `title`. The FTS index is a **snapshot** — it does not update incrementally on INSERT / DELETE. Every writer must call `rebuildSearchIndex(conn)` from `src/db/embeddings.ts` after its transaction commits. The ingest pipeline (`src/context/ingest.ts`) is the only writer today and does this automatically. *** ## Hybrid search `hybridSearch()` in `src/db/embeddings.ts` combines two signals: 1. **Keyword** — `fts_main_embeddings.match_bm25(id, query)` over `chunk_content` and `title`. 
BM25 handles tokenization, stemming, stopwords, and length-normalized scoring, so multi-term queries strictly *increase* recall over single-term queries. 2. **Vector** — `array_cosine_distance(embedding, $query_embedding)` via a linear scan over the `embeddings` table. Results are merged with reciprocal rank fusion (k=60), joined back to `context_items` to pick up each hit's `drive` and `path`, and returned as `(ref, title, score, snippet)`. Exposed to the agent as `search_semantic` and `search_grep`, and to you as `botholomew context search "..."`. *** ## Drives Every context item lives under a **drive** — the name of its origin. The built-in drives are: | Drive | What lives there | Refreshable? | |---|---|---| | `disk` | Local files (path = absolute filesystem path) | yes (re-reads from disk) | | `url` | Generic HTTP(S) pages (path = full URL) | yes (re-fetches) | | `agent` | Agent-authored scratch content | no (no external origin) | | `google-docs` | Google Docs documents (path = doc id) | not yet | | `github` | GitHub repo content (path = /owner/repo/...) | not yet | Drive detection lives in `src/context/drives.ts`. `detectDriveFromUrl` inspects the URL (and optionally the MCP server name that served the content) and returns the right `(drive, path)` pair. To add a new drive, extend that function with a new pattern. A refresh dispatch that isn't yet implemented (`google-docs`, `github`) returns a per-item `error` so the user knows to re-add the URL explicitly. Those items are still fully searchable — they just aren't auto-refreshable yet. *** ## Contextual loading When a worker picks up a task, `buildSystemPrompt()` (`src/worker/prompt.ts`) doesn't just dump every context file into the prompt — that would blow the context window. Instead: 1. All markdown files with frontmatter `loading: always` are included verbatim (e.g., `soul.md`, `beliefs.md`, `goals.md`). 2. The task name + description is embedded. 3. 
`hybridSearch()` finds top-N relevant chunks from the database.
4. Those chunks are appended to the system prompt as task-specific context, labelled with their `drive:/path` ref so the agent can jump to the full item via `context_read`.
5. Markdown files with `loading: contextual` are included only if their content shares keywords with the task.

***

## Loading context

Context gets into Botholomew two ways: local ingestion, and an LLM-driven loading agent that handles URLs. There is **no LLM placement** — the origin of the content determines its (drive, path) directly.

### Local files and folders

```bash
botholomew context add ./notes              # walks the directory
botholomew context add ./report.pdf         # single file
botholomew context add ~/Documents/strategy
```

`context add` walks directories recursively, detects mime types, and feeds every file through the ingestion pipeline (item → chunks → embeddings). Every local file is stored with:

* `drive = "disk"`
* `path = <absolute filesystem path>`

Binary files (PDFs, images) are stored in `content_blob` with `is_textual = false`; textual files are indexed for hybrid search.

### Remote content via a loading agent

URLs aren't `fetch()`d directly. Botholomew runs a focused LLM agent (`src/context/fetcher.ts`) whose only job is to retrieve the content at a URL using the MCP tools you have configured:

```bash
botholomew context add https://docs.google.com/document/d/abc123/edit
botholomew context add https://github.com/evantahler/botholomew/blob/main/README.md
botholomew context add https://example.com/blog/post

# Hand the fetcher extra guidance (auth notes, tool hints, etc.)
botholomew context add https://internal.corp/doc \
  --prompt-addition "Use the corp-wiki MCP server, not Firecrawl"
```

The fetcher runs a tool-use loop (up to 10 turns) with a small tool set:

* `mcp_list_tools` / `mcp_search` — discover which MCP tools are available and which might handle this URL.
* `mcp_info` — read a tool's input schema before calling it.
* `mcp_exec` — execute an MCP tool. The harness captures the full result and sends the LLM **only a 2,000-char preview**, keyed by the call's `tool_use_id`. Large pages don't explode the context window.
* `accept_content(exec_call_id, title, mime_type?)` — terminal. The agent picks which captured exec result to save by its id; the harness stores the full content it already has in memory. At save time the harness consults `detectDriveFromUrl(url, serverName)` to assign the right drive (e.g. `google-docs:/<doc id>` when the Google Docs MCP served the content).
* `request_http_fallback()` — terminal. Explicit signal that no MCP tool fits; the harness then runs a plain `fetch()` + HTML strip.
* `report_failure(message)` — terminal. Surfaces an actionable message back to you ("this Google Doc is private — share it with your service account") instead of a silent failure.

If no MCPX client is configured at all, or if the loop exceeds its turn budget, the fetcher falls back to plain HTTP with a 30s timeout and strips the HTML down to plain text for textual content. HTTP-fallback items live under drive `url`.

### Collision handling

Before doing anything expensive, `context add` checks each input's `(drive, path)` against what's already in context. If the same `(drive, path)` is already ingested, the item is routed per `--on-conflict`:

| Policy | Behavior |
| ----------- | ------------------------------------------------------------------------ |
| `error` | Fast-fail if any input is already in context. |
| `overwrite` | Refresh content from the origin (diff + selective re-embed). |
| `skip` *(default)* | Log and move on — no write, no error. |

Re-running `context add` on already-ingested items is a no-op by default. Use `--on-conflict=overwrite` when you want to refresh stored content (or `botholomew context refresh` for the idiomatic flow), and `--on-conflict=error` when you want a hard failure on collisions.
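The per-item routing can be sketched as a small pure function. This is a minimal sketch with hypothetical names — the real decision lives inside the ingest pipeline, not in a standalone helper:

```typescript
// Sketch of --on-conflict routing. "routeItem" and "Decision" are
// hypothetical names for illustration only.
type ConflictPolicy = "error" | "overwrite" | "skip";

interface Decision {
  action: "insert" | "refresh" | "skip";
  error?: string;
}

// `existing` holds the (drive, path) pairs already in context_items,
// encoded as "drive:path" strings.
function routeItem(
  drive: string,
  path: string,
  existing: Set<string>,
  policy: ConflictPolicy = "skip",
): Decision {
  const key = `${drive}:${path}`;
  if (!existing.has(key)) return { action: "insert" }; // new item: always ingest
  switch (policy) {
    case "error":
      return { action: "skip", error: `already in context: ${key}` }; // fast-fail
    case "overwrite":
      return { action: "refresh" }; // diff + selective re-embed
    case "skip":
      return { action: "skip" }; // log and move on
  }
}
```

Note that the lookup key is exactly the unique `(drive, path)` index described under Storage — collision handling is cheap because it's a single indexed probe per input.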
The agent-side `context_write` tool follows the same convention: defaults to `on_conflict='error'` and returns a PATs-style `error_type: "path_conflict"` with a `next_action_hint` that guides the agent to `context_read` first or pass `on_conflict='overwrite'`. On success, `context_write` also returns a `tree` field — a `context_tree` snapshot of the current drive — so the agent can see what else is nearby without a follow-up call. ### Refreshing stale content ```bash botholomew context refresh disk:/Users/evan/notes/strategy.md botholomew context refresh README.md # bare path → resolves to disk:/<abs> botholomew context refresh docs/*.md # multiple paths (shell glob) botholomew context refresh --all # every non-agent item ``` `refresh` dispatches on the drive: * `disk` → re-reads from the filesystem. * `agent` → skipped (no external origin). * Everything else → re-runs the loading agent against `context_items.source_url`, which is captured at ingest time. The built-in `url` drive also accepts its own path as a fallback (the path is the URL). Items without `source_url` — legacy rows created before that column landed, or rows from a drive whose origin isn't a URL — surface a per-item error and the user must re-add from URL. Refresh has no knowledge of any specific remote service; everything goes through `source_url`. In all cases it compares the new content against what's stored, updates only when they differ, and re-embeds only the changed items. Missing files are reported, not silently dropped. The same logic is exposed to the agent as the `context_refresh` tool, which takes `ref` (a UUID, `drive:/path`, or `drive:/prefix` for a subtree) or `all: true` and returns a structured summary along with a post-refresh `tree` snapshot. *** ## Local embeddings Botholomew runs embeddings locally via [`@huggingface/transformers`](https://huggingface.co/docs/transformers.js). The default model is `Xenova/bge-small-en-v1.5` (384-dim, ~33 MB). 
Weights are downloaded the first time the model is used and cached under `.botholomew/models/` — subsequent runs load from disk in milliseconds. No API key, no per-token cost, no network dependency at query time. The model loads lazily on the first embed call, so CLI startup stays fast. ONNX Runtime runs in **WASM** mode (`onnxruntime-web`) rather than the default native `onnxruntime-node` bindings, because the native bindings segfault under Bun when another native module (DuckDB) is loaded in the same process — see [oven-sh/bun#26081](https://github.com/oven-sh/bun/issues/26081). The switch is implemented as a small `bun patch` against `@huggingface/transformers` (see `patches/`) plus a `wasmPaths` override in `src/context/embedder-impl.ts` that points the WASM loader at the `onnxruntime-web/dist/` files already on disk — no CDN fetch at runtime. > **Maintaining the patch.** When bumping `@huggingface/transformers`, > re-run `bun patch '@huggingface/transformers@<version>'`, reapply the > three edits in `src/backends/onnx.js` (drop the static > `onnxruntime-node` import; route the `IS_NODE_ENV` branch to `ONNX_WEB` > with `wasm` defaults), and run `bun patch --commit`. If the patch ever > stops applying cleanly, the new `embedder.test.ts` regression case > (DuckDB + embedder in the same process) will catch it. To use a different model, set `embedding_model` and `embedding_dimension` in `.botholomew/config.json`. Any feature-extraction model from the Xenova/\* namespace works — for example, `Xenova/multilingual-e5-small` (also 384-dim) handles mixed-language content much better than the default. Changing models means old vectors and new vectors live in different embedding spaces and aren't comparable. Run `botholomew context reembed` to rebuild every vector with the new model. History: an older milestone shipped with OpenAI `text-embedding-3-small` (1536-dim) for quality reasons. 
Migration 18 (`18-reset_embeddings_for_local.sql`) reverts that decision — modern small open-source models close the quality gap, and "no API key required" is more in line with Botholomew's local-first stance. --- --- url: 'https://www.botholomew.com/captures.md' --- # Doc captures (screenshots & GIFs) Screenshots and GIFs of the chat TUI are **generated**, not hand-taken, so they stay current as the TUI evolves. One command regenerates every asset; the diff of `docs/assets/` tells reviewers what changed. ## How it works Two pieces: 1. **[VHS](https://github.com/charmbracelet/vhs)** drives a real PTY and renders a declarative `.tape` script (typed keystrokes + sleeps) into a GIF, MP4, or PNG. 2. **Fake LLM mode** — when `BOTHOLOMEW_FAKE_LLM=1` is set, every Anthropic client in the codebase is swapped for a scripted stub that streams fixture-defined replies (see `src/worker/fake-llm.ts`). This makes captures hermetic: no API key required, no network, and every run produces the same output. ## Install once ```bash brew install vhs ttyd ffmpeg ``` (Linux: `apt install ttyd ffmpeg` plus VHS from its [releases page](https://github.com/charmbracelet/vhs/releases).) ## Regenerate all assets ```bash bun run capture ``` The script creates an ephemeral project directory under `$TMPDIR`, runs `botholomew init` in it, then runs VHS once per tape in `docs/tapes/` — serially, since VHS contends for the tty. Output GIFs land in `docs/assets/`. Commit those changes alongside the TUI change that prompted them. Run a single tape: ```bash bun run capture chat-happy-path ``` ## Adding a new capture 1. **Write a fixture** under `docs/tapes/fixtures/<name>.json`: ```json { "turns": [ { "match": "optional regex against the user's message", "text": "The reply to stream back.", "chunkSize": 5, "delayMs": 30 } ] } ``` Turns without a `match` are consumed in order. Add `toolCalls` if the capture needs to show tool use. 
An optional top-level `env` object is merged into the VHS process env — handy for enabling capture-only hooks like `BOTHOLOMEW_CAPTURE_TAB_CYCLE` (see *Capture-only hooks* below). Fixtures are optional: a tape that doesn't invoke `botholomew chat` (e.g. a CLI demo) can skip the fixture file entirely.

2. **Write a tape** at `docs/tapes/<name>.tape`:

   ```tape
   Source docs/tapes/_common.tape
   Output docs/assets/<name>.gif

   Sleep 1s
   Type "botholomew chat"
   Sleep 600ms
   Enter
   Sleep 4s
   Type `whats on my schedule today`
   Sleep 600ms
   Enter
   Sleep 10s
   ```

   Note: `Type "..."` (double-quoted) for the shell command, `` Type `...` `` (backticked) for anything typed into the TUI — see the limitations section below. The fixture file must share the tape's base name. `_common.tape` pins terminal dimensions, theme, font, and typing speed — source it from every tape for a consistent look.

3. **Run** `bun run capture <name>` and review the output in `docs/assets/`.

4. **Embed** the GIF from the relevant doc with `![alt](./assets/<name>.gif)`.

## Why this approach

* **Deterministic.** Fake replies + pinned VHS settings mean byte-stable GIFs (modulo VHS upgrades). `git diff docs/assets/` is meaningful.
* **Hermetic.** No API key needed, so CI can regenerate captures on merge.
* **Decoupled.** The TUI itself is unchanged — the fake swap lives at the worker LLM boundary (`src/worker/llm-client.ts`), so the same stub can be reused for deterministic agent-loop tests.

## Known VHS/ttyd limitations

A few real sharp edges surfaced while building this; they're all worth knowing before you write a new tape.

* **Always use backticks for `Type` content, not double-quotes.** VHS's tape parser drops characters from double-quoted strings when they're piped through ttyd into an Ink raw-mode TUI — you'll see only some of what you typed, or nothing at all.
The correct form is: ```tape Type `whats on my schedule today` ``` Double-quoted `Type "..."` is fine at the shell level (before the TUI launches), but use backticks for anything typed into the TUI input bar. * **`Sleep N` is seconds.** `Sleep 500` is 8 minutes and 20 seconds. Always suffix: `Sleep 500ms`, `Sleep 2s`. * **Non-text keystrokes (`Tab`, `Escape`) don't reliably reach Ink.** VHS's `Tab` / `Escape` commands send escape sequences that Ink's legacy parser under ttyd doesn't recognize. `Enter` works (it's just `\r`). If you need to drive tab navigation in a capture, use `-p "<prompt>"` to auto-submit an initial message, or add a CLI flag that lets the capture land on a specific tab. * **Under `BOTHOLOMEW_FAKE_LLM=1` the chat command forces Ink's kitty-keyboard mode to `"disabled"`** (see `src/commands/chat.ts`), because ttyd can't negotiate the Kitty Keyboard protocol. Without that, even plain-text typing is dropped. Don't remove that guard without re-running `bun run capture`. * **`Hide` … `Show` hides keystrokes from the recording.** If you want viewers to see the command being typed out, just don't use `Hide` — start the tape with the shell prompt visible and let the typing animation play. ## Capture-only hooks The TUI has one env-var-gated affordance that exists purely for captures, because VHS can't keystroke its way through the tab bar: * **`BOTHOLOMEW_CAPTURE_TAB_CYCLE=<dwell-ms>`** (default `2500`) — when set, `src/tui/App.tsx` schedules timers that walk `activeTab` through 2 → 3 → 4 → 5 → 6 → 7 → 1 with the given dwell between tabs. The hook is a no-op unless the env var is defined, so it doesn't affect normal use. `docs/tapes/full-tour.tape` enables this via its fixture's `env` block. Seeded capture data (one task, one high-priority task, one schedule, one context file) is added to every capture's ephemeral project directory by `scripts/capture.ts`, so Tasks / Schedules / Context panels have visible rows from the first frame. 
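The tab-cycle hook amounts to scheduling one timer per hop. A minimal sketch, assuming a `setActiveTab` setter like the one Ink component state provides (the function name and shape here are hypothetical; the real code lives in `src/tui/App.tsx`):

```typescript
// Sketch of the BOTHOLOMEW_CAPTURE_TAB_CYCLE hook. Visit order ends on
// Chat (tab 1) so the capture closes where a viewer expects to land.
const ORDER = [2, 3, 4, 5, 6, 7, 1];

function scheduleTabCycle(
  setActiveTab: (tab: number) => void,
): ReturnType<typeof setTimeout>[] {
  const raw = process.env.BOTHOLOMEW_CAPTURE_TAB_CYCLE;
  if (raw === undefined) return []; // no-op unless the env var is defined
  const dwellMs = Number(raw) || 2500; // documented default dwell
  // One timer per hop: tab 2 after one dwell, tab 3 after two, ...
  return ORDER.map((tab, i) =>
    setTimeout(() => setActiveTab(tab), dwellMs * (i + 1)),
  );
}
```

Because the timers are all scheduled up front, the walk is immune to render hiccups — each tab switch lands at a fixed offset from capture start, which keeps the GIF timing stable across runs.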
## Keybinding reference (for the real TUI — not for tapes) * `Tab` cycles tabs; `Shift+Tab` is not wired up. * `1`–`7` jump to a tab **only when not on the Chat tab** (on Chat those keys are input). * `Escape` returns to Chat from any other tab. * `/` opens the slash-command popup; type to filter; `Escape` dismisses. * `Ctrl+C` exits the TUI. --- --- url: 'https://www.botholomew.com/getting-started.md' --- # Get started This page walks you from a clean machine to a running Botholomew worker processing tasks. For deeper background, see [Architecture](./architecture.md). ## Prerequisites * **[Bun](https://bun.sh) 1.1+** — Botholomew is a Bun-native CLI. * **An Anthropic API key** — Claude is the reasoning model. * Embeddings run locally via `@huggingface/transformers` (default `Xenova/bge-small-en-v1.5`, 384-dim). The first call downloads ~33 MB of weights to `.botholomew/models/`; no API key is required. * Optional: any [MCP servers](./mcpx.md) you want to expose to the agent (Gmail, Slack, GitHub, etc.) — managed through [MCPX](https://github.com/evantahler/mcpx). 
## Install ```bash bun install -g botholomew ``` Or run from a checkout: ```bash git clone https://github.com/evantahler/botholomew cd botholomew bun install bun run dev -- --help ``` ## Initialize a project In any directory you want Botholomew to operate inside: ```bash botholomew init ``` This creates a `.botholomew/` directory with templates and a fresh DuckDB database: ``` my-project/ .botholomew/ soul.md # always-loaded identity (not agent-editable) beliefs.md # always-loaded, agent-editable priors goals.md # always-loaded, agent-editable goals capabilities.md # always-loaded, agent-editable tool inventory config.json # models, tick interval, API keys data.duckdb # tasks, schedules, context, embeddings, logs mcpx/servers.json # external MCP servers (Gmail, Slack, …) skills/ # slash commands (built-ins + user-defined) logs/ # per-worker log files ``` Everything the agent can touch is here — see [The virtual filesystem](./virtual-filesystem.md) for why. ## Configure API keys Either export the environment variable: ```bash export ANTHROPIC_API_KEY=sk-ant-... ``` …or set it in `.botholomew/config.json`. See [Configuration](./configuration.md) for every key and its default. ## Queue work and run a worker ```bash # Add a task to the queue botholomew task add "Summarize every markdown file in ~/notes" # Process it botholomew worker run # one-shot: claim and run one task botholomew worker run --persist # long-running: loop until you stop it ``` Want it to run on its own? See [Automation](./automation.md) for cron, tmux, launchd, and systemd recipes. ## Chat interactively ```bash botholomew chat ``` The chat command opens an [Ink/React TUI](./tui.md) with eight tabs — chat, tasks, workers, context, schedules, threads, history, and logs — plus slash-command autocomplete, a message queue, tool-call visualization, and a live workers panel. 
## What's next * [The CLI reference](https://github.com/evantahler/botholomew#the-cli) on GitHub * [Architecture](./architecture.md) — workers, chat, shared DB * [Tasks & schedules](./tasks-and-schedules.md) — the claim loop and recurring schedules * [Context & hybrid search](./context-and-search.md) — ingest files, folders, and URLs * [MCPX integration](./mcpx.md) — wire up external services * [Skills](./skills.md) — slash-command templates the agent can also author at runtime --- --- url: 'https://www.botholomew.com/mcpx.md' --- # MCPX integration Botholomew has no network, no shell, and no filesystem access on its own. Everything external — reading email, searching the web, talking to GitHub — comes from MCP servers, managed per project via [**MCPX**](https://github.com/evantahler/mcpx). Think of MCPX as the `package.json` of the agent's tools: a project-local manifest (`.botholomew/mcpx/servers.json`) lists the MCP servers this project can use, and workers and the chat session connect to them at startup. You have two options for *how* those servers run: * **Run individual servers yourself.** Point MCPX at a stdio command (`npx ...`) or a remote HTTP endpoint. Good for a handful of well-known integrations. * **Use an MCP gateway.** A gateway like [Arcade.dev](https://www.arcade.dev/) exposes hundreds of authenticated tools (Gmail, Google Drive, Slack, GitHub, Notion, Linear, …) behind one endpoint, handles OAuth for you, and is maintained centrally. Configure it once and Botholomew sees the full tool surface. *** ## Configuration `.botholomew/mcpx/servers.json` uses the standard MCP client config format: ```json { "mcpServers": { "gmail": { "type": "stdio", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-gmail"], "env": {} }, "arcade": { "url": "https://api.arcade.dev/mcp/engineering", "headers": { "Authorization": "Bearer arc_xxxxxxx", "Arcade-User-ID": "you@example.com" } } } } ``` Stdio entries launch a subprocess and speak MCP over pipes. 
Entries with a `url` connect to a remote MCP server — this is the shape Arcade (and most hosted MCP gateways) expect: a gateway endpoint plus `headers` for auth. See [Arcade's docs](https://docs.arcade.dev/mcp-servers) for the list of gateway URLs and how `Arcade-User-ID` scopes tool access per user. MCPX accepts both shapes. *** ## Managing servers from the CLI ```bash botholomew mcpx servers # list configured server names botholomew mcpx list # every tool / resource / prompt across all configured servers botholomew mcpx ping # check connectivity to all servers (or pass names to filter) botholomew mcpx add gmail --command npx --args "-y,@modelcontextprotocol/server-gmail" botholomew mcpx add arcade --url https://api.arcade.dev/mcp/engineering botholomew mcpx remove gmail # --dry-run to preview, --keep-auth to keep stored tokens botholomew mcpx auth arcade # OAuth / token flow for HTTP servers botholomew mcpx deauth arcade # clear stored OAuth tokens for a server botholomew mcpx search "read email" # keyword + semantic search over all tools botholomew mcpx info gmail # server overview botholomew mcpx info gmail list_messages # input schema for one tool botholomew mcpx exec gmail list_messages '{"maxResults":10}' # dry-run a tool call botholomew mcpx import-global # copy ~/.mcpx into this project botholomew mcpx index # rebuild the tool search index botholomew mcpx resource arcade # list resources for a server (or read one by URI) botholomew mcpx prompt arcade # list prompts for a server (or render one by name) botholomew mcpx task <action> <server> [taskId] # list/get/result/cancel async tool tasks ``` Every subcommand is a thin passthrough to the `mcpx` CLI, so `botholomew mcpx <cmd> --help` shows the upstream reference — including every option and argument for that command. The only exception is `import-global`, which is Botholomew-specific. `mcpx exec` is the fastest way to confirm a server is wired up before handing it to the agent. 
`mcpx auth` runs the OAuth flow for HTTP servers that need it (most Arcade gateways do), and `mcpx import-global` is the usual way to bootstrap a new project from your global `~/.mcpx/` configuration. Note that `--args` and `--env` take **comma-separated** values — quote them so your shell doesn't split them (e.g. `--args "-y,@scope/pkg"`). *** ## Lifecycle `createMcpxClient(projectDir)` in `src/mcpx/client.ts`: 1. Reads `servers.json`. 2. Connects to every server. 3. Returns an `McpxClient | null` (`null` if no servers are configured). Each worker (and the chat session) holds the client for its lifetime and calls `client.close()` on SIGTERM/SIGINT. CLI commands like `botholomew mcpx exec` open a client, do their work, and close it. *** ## How the agent sees MCP tools Rather than flood the model's tool list with every MCP tool from every server — which can easily be hundreds — Botholomew exposes a small set of **meta-tools** the agent uses to discover and invoke MCP tools dynamically: | Tool | Purpose | |---|---| | `mcp_list_tools` | List MCP servers and the tools they provide | | `mcp_search` | Semantic search across all MCP tool names + descriptions | | `mcp_info` | Get the JSON Schema for a specific tool's input | | `mcp_exec` | Execute a tool with validated input | So the agent's flow to "check my email" looks like: ``` mcp_search("read email") ────► returns gmail.list_messages, gmail.get_message, ... mcp_info("gmail.list_messages") ──► returns input schema mcp_exec("gmail.list_messages", { maxResults: 10 }) ──► actual email ``` This keeps the primary tool list small and lets you plug in dozens of MCP servers without blowing the context window. See `src/tools/mcp/*.ts`. *** ## Logging Every MCP call is logged to the current thread as a `tool_use` / `tool_result` interaction pair — identical to how built-in tools are logged. Duration and token counts are captured. 
Query the `interactions` table (or run `botholomew thread view`) to see exactly what the agent sent and got back. *** ## When to add a server You want an MCP server when: * The agent needs to reach a specific service (Gmail, Slack, GitHub, Linear, Notion). * You want to give the agent *write* access somewhere — sending messages, creating issues, editing docs. * You're ingesting remote content into context — Firecrawl for web pages, Google Docs MCP for docs, etc. You don't need a server when: * The work happens entirely in `.botholomew/` (the virtual filesystem, embeddings, tasks, schedules). * You just want Claude to *read* something you already put in context — `context_read` / `search_semantic` are enough. --- --- url: 'https://www.botholomew.com/persistent-context.md' --- # Persistent context & agent self-modification The `.botholomew/` directory contains a handful of markdown files that shape how the agent thinks. Some the agent can rewrite; some it can't. Every one is versioned by frontmatter. *** ## The default files `botholomew init` creates: | File | Loading | Agent-editable? | Purpose | |---|---|---|---| | `soul.md` | `always` | **no** | Identity — who the agent is, how it behaves | | `beliefs.md` | `always` | yes | Priors the agent has learned about the world/project | | `goals.md` | `always` | yes | Current goals; updated as goals complete or change | | `capabilities.md` | `always` | yes | LLM-summarized, thematic inventory of what the agent can do (built-in + MCPX); no specific tool names | Each uses YAML frontmatter to declare its behavior: ```yaml --- loading: always # or "contextual" agent-modification: true # or false --- # Beliefs - I should be concise and clear in my work products. - I should ask for help when I'm stuck rather than guessing. ``` *** ## Loading modes **`loading: always`** — the file is concatenated into every system prompt, verbatim. Use sparingly. `soul.md`, `beliefs.md`, and `goals.md` are always-loaded. 
**`loading: contextual`** — the file is included only if its content shares keywords with the caller's current intent. The worker derives keywords from the running task's name and description; the chat agent derives them from your most recent message. Use this for topic-specific notes ("Everything I know about our invoicing system") that shouldn't pollute the prompt on unrelated tasks. See `loadPersistentContext()` and `extractKeywords()` in `src/worker/prompt.ts`. *** ## The hardcoded `## Style` block After the persistent-context files (and after the optional MCP section), every system prompt — worker and chat alike — appends a short `## Style` block defined as `STYLE_RULES` in `src/worker/prompt.ts`. It tells the model to skip sycophantic preambles ("You're absolutely right!", "Great question!"), push back when the user is wrong, and report failures and uncertainty directly. This is hardcoded so it applies to every install without needing to re-run `botholomew init`. Anything you put in `soul.md` or `beliefs.md` still loads above it and can layer on top — if you'd rather have a warm, chatty agent, say so there. *** ## Agent self-modification When `agent-modification: true`, the agent can rewrite the file using the `update_beliefs` or `update_goals` tools (`src/tools/context/`). The flow: 1. Agent calls `update_beliefs` with the new full file content. 2. The tool reads the existing file, parses frontmatter with `gray-matter`, preserves the frontmatter block, and writes back the new body. 3. A `context_update` interaction is logged to the current thread, so you can see — and audit — every time the agent changed its own priors. Files without `agent-modification: true` are read-only to the agent, even if the tool is called — the tool checks the frontmatter and refuses. 
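The preserve-frontmatter step is the load-bearing one: the agent supplies a new body, never new frontmatter, so flags like `agent-modification` can't be edited away. A minimal sketch — the real tool parses with `gray-matter`; this naive splitter is illustrative only:

```typescript
// Sketch of a frontmatter-preserving body rewrite. The regex-based
// frontmatter grab is a stand-in for gray-matter's parser.
function replaceBody(file: string, newBody: string): string {
  const m = file.match(/^(---\n[\s\S]*?\n---\n)/);
  if (!m) throw new Error("missing frontmatter");
  const frontmatter = m[1];
  // The tool refuses unless the file opts in to agent edits.
  if (!/agent-modification:\s*true/.test(frontmatter)) {
    throw new Error("file is not agent-editable");
  }
  return frontmatter + newBody; // frontmatter preserved verbatim
}
```

A rejected call surfaces as a tool error in the thread log, so even refused modification attempts are auditable.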
*** ## `capabilities.md` — high-level tool inventory `capabilities.md` is the same shape as `beliefs.md` / `goals.md` (always-loaded, agent-editable), but its body is machine-generated rather than hand-written. It's a **thematic summary** of what the agent can do — built-in capabilities grouped into coarse themes (task management, virtual filesystem, search, threads, …) and one theme per external service reachable through MCPX (Gmail, GitHub, Linear, …). Specific tool names are intentionally **omitted** from the rendered file; the agent uses `mcp_list_tools`, `mcp_search`, or `mcp_info` to look up exact names when it actually needs to invoke a tool. This keeps the always-loaded context small (tens of lines instead of hundreds). Summarization uses Claude (the `chunker_model` from config) on every refresh. When no Anthropic API key is configured, a static fallback listing is rendered with internal themes + MCPX server names and tool counts. It's seeded at `botholomew init` with the built-in tools already populated. Regenerate it any time via: * `botholomew capabilities` — CLI refresh (honors `--no-mcp`) * `capabilities_refresh` — the agent calls this tool itself when it suspects the inventory has drifted (new MCPX servers added, tools renamed, file deleted) * `/capabilities` — the matching slash command in chat Frontmatter is preserved on regeneration, so you can safely flip `loading` to `contextual` if you'd rather only surface the file when the task mentions tools. *** ## What this actually looks like A typical `beliefs.md` after a few weeks of use: ```yaml --- loading: always agent-modification: true --- # Beliefs - Evan prefers bullet-point summaries over paragraphs. - The "Q4 planning" doc in /notes is the canonical source for revenue targets. - The worker should escalate to a "waiting" status if a task needs access to a tool that isn't configured, instead of failing outright. - When summarizing email, strip quoted replies — they add tokens without value. 
``` None of those were in the seed template — they accumulated as the agent worked tasks and the chat user confirmed them. That's the whole point: the agent gets smarter about *your* workflow over time, and you can read (and edit) exactly what it believes. *** ## Why not put this all in a vector store? Beliefs and goals are high-priority, always-loaded text — they're what the agent uses to decide *what to do*, not raw reference material. Burying them in a vector index means the agent might not retrieve them when it matters. Keeping them as flat markdown with a hard `always` flag makes them impossible to miss. Long-form reference material (ingested PDFs, web pages, meeting notes) lives in the [context & embeddings system](context-and-search.md) instead. The two are complementary: * **Persistent context** = how the agent thinks. * **Context items / embeddings** = what the agent knows. *** ## Adding your own Drop any `.md` file into `.botholomew/` with frontmatter: ```yaml --- loading: contextual agent-modification: false --- # Our deployment checklist 1. Bump version in package.json 2. Run bun test && bun run lint 3. ... ``` Tasks mentioning "deploy", "release", or "version" — and chat messages mentioning the same — will now include this file in the system prompt automatically. You didn't have to register it anywhere. On every tick the worker reads every `.md` file in `.botholomew/`, extracts words longer than three characters from the current task's name and description, and includes any `loading: contextual` file whose content contains at least one of those words. The chat agent does the same on every turn, using your most recent message as the keyword source. See `loadPersistentContext()` in `src/worker/prompt.ts` for the exact logic. --- --- url: 'https://www.botholomew.com/skills.md' --- # Skills (slash commands) Skills are user-defined slash commands for the chat TUI. 
A skill is a markdown file with frontmatter and a prompt template; when you type `/<name>` in chat, the template is rendered and sent as a user message. Think of them as reusable prompts — "summarize this conversation", "review this file", "give me a standup update" — parameterized and version-controlled alongside the project. *** ## File format Skills live in `.botholomew/skills/<name>.md`: ```yaml --- name: review description: "Review a file for quality and issues" arguments: - name: file description: "Path to the file to review" required: true - name: focus description: "What to focus on (security, performance, etc.)" required: false default: "general quality" --- Please review the file at `$1`. Read it with the available tools, then provide: 1. A brief summary of what the file does 2. Any issues or concerns (bugs, security, performance) 3. Suggestions for improvement 4. An overall assessment (focus: $2) ``` **Frontmatter fields:** | Field | Required? | Purpose | |---|---|---| | `name` | no | Defaults to filename. Determines the slash command name | | `description` | yes | Shown in `/skills` listing and `/help` | | `arguments` | no | Array of argument definitions (name, description, required, default) | **Body:** Markdown prompt template with variable substitution. *** ## Variable substitution | Placeholder | Meaning | |---|---| | `$ARGUMENTS` | The entire argument string as typed | | `$1`, `$2`, … | Positional arguments (split on whitespace, quoted strings respected) | | `$<name>` | Named placeholder bound to the declared argument with that `name` (same value as the matching positional `$N`) | Missing optional arguments fall back to their `default`. Missing required arguments cause a validation error before the skill is sent — the TUI prints a `Usage:` line and never calls the LLM. Named and positional placeholders refer to the same underlying value. 
The `arguments[]` order in frontmatter sets each name's slot: the first declared argument is `$1` and `$<first-name>`, the second is `$2` and `$<second-name>`, and so on. Named placeholders are word-boundary matched (so `$start` won't clip `$start_date`), and longer names are substituted first when both exist. Example: ``` > /review src/cli.ts security ``` becomes `$1 = $file = "src/cli.ts"`, `$2 = $focus = "security"`, and `$ARGUMENTS = "src/cli.ts security"`. *** ## Built-in defaults `botholomew init` ships three skills out of the box: **`summarize.md`** — summarize the current chat conversation. **`standup.md`** — generate a standup update from recent tasks (completed in the last 24h + in progress). **`capabilities.md`** — rescan every built-in and MCPX tool and rewrite `.botholomew/capabilities.md` (see [persistent-context.md](persistent-context.md#capabilitiesmd--high-level-tool-inventory)). More are easy to add; see the quickstart below. *** ## Invoking skills From inside `botholomew chat`: ``` > /skills # list all available skills > /summarize # run the summarize skill > /review src/cli.ts # positional argument becomes $1 ``` ### Autocomplete popup Typing `/` at the start of the input pops up a menu of matching commands (built-ins `/help`, `/skills`, `/clear`, `/exit` plus every skill loaded from `.botholomew/skills/`). Each row shows the command name and its description. | Key | Action | |---|---| | `↑` / `↓` | Move the highlight | | `Tab` or `Return` | Accept the highlighted command (fills in `/<name> ` so you can type arguments) | | `Esc` | Close the popup without changing the input | The popup filters as you keep typing, and it disappears once you type a space — so a second `Return` submits the message as usual. When a skill runs, the TUI's user bubble shows the literal slash command you typed (e.g. `/review src/cli.ts security`), not the rendered prompt body — that keeps the chat transcript readable. 
The agent still receives the fully-rendered prompt as its user message.

***

## Managing skills from chat

Skills aren't write-once-via-CLI: the chat agent can list, read, create, edit, search, and delete them on demand. Six tools are exposed to the chat agent:

| Tool | What it does |
|---|---|
| `skill_list` | List skills (name, description, args, file path) |
| `skill_read` | Read a skill's raw file contents and parsed fields |
| `skill_search` | Keyword search across name, description, body, and arg metadata |
| `skill_write` | Create or overwrite a skill (`on_conflict: 'error' \| 'overwrite'`) |
| `skill_edit` | Apply git-style line-range patches to an existing skill |
| `skill_delete` | Delete a skill file by name |

Newly written or edited skills are picked up at the start of the *next* user message — `ChatSession.skills` is reloaded from disk in `sendMessage`. So a typical flow looks like:

```
> save this prompt as a skill called daily-log so I can run it tomorrow
[agent calls skill_write]
> /daily-log    # works immediately, no chat restart needed
```

`skill_write` rejects the reserved built-in names (`help`, `skills`, `clear`, `exit`) with `error_type: "reserved_name"`. It also normalizes names to `[a-z0-9-]`, sets the frontmatter `name` to match the filename, and re-parses the generated file before writing — so an invalid skill never lands on disk. `skill_edit` re-parses after applying patches and refuses to write if the result fails validation, so you can't break a skill from chat.

**Editing skills outside the chat** (e.g., with your text editor) is picked up the same way: the cache is reloaded from disk at the top of `sendMessage`, so external edits take effect on your next message — no chat restart needed.
*** ## CLI management ```bash botholomew skill list # table of all skills (supports --limit / --offset) botholomew skill show review # print the full skill file botholomew skill create daily-log # scaffold a new skill botholomew skill validate # parse every .botholomew/skills/*.md and report errors botholomew skill validate path.md # validate a single file (handy before committing) ``` `skill show` exits non-zero if the name doesn't match a loaded skill, and prints the available skill names to stderr. `skill validate` exits non-zero if any file fails to parse, so it fits naturally into a pre-commit hook or CI check. Skills are parsed by `src/skills/parser.ts` and loaded from disk by `src/skills/loader.ts`. The `ChatSession` caches them on session start and reloads them at the top of every `sendMessage` — so skills the chat agent creates or edits via the `skill_*` tools are usable on the next user message. Direct file edits made outside the running chat (e.g., from your editor) take effect on the next user message in any active session, but won't appear retroactively in history. *** ## Writing a good skill * **Be explicit about what you want.** The model doesn't know the shape of the output unless you describe it. * **Use positional args, not free-form.** `/review src/cli.ts` is easier to tab-complete than `/review --file=src/cli.ts`. In the body, you can reference each argument either positionally (`$1`, `$2`) or by the name you gave it (`$file`, `$focus`) — pick whichever reads better. * **Reference tools by name.** "Read the file with `context_read`" nudges the agent toward the right tool and keeps token counts down. * **Keep them short.** A skill is a prompt, not a program. If your skill is 200 lines of conditional logic, it probably wants to be a real tool. *** ## Why not just type the prompt? Because you'll type it a hundred times. 
Skills are pure convenience — but they're also version-controllable, shareable (copy a `skills/` directory between projects), and discoverable (`/skills` shows them all). They turn "the prompt I always use for standup updates" into a durable project asset. --- --- url: 'https://www.botholomew.com/tasks-and-schedules.md' --- # Tasks & schedules The task queue is Botholomew's execution substrate. Humans and agents both write to it; workers are the readers. *** ## Tasks A task is a unit of work with a lifecycle: ``` pending ──► in_progress ──► complete │ │ │ failed │ │ │ waiting │ └── (reset by timeout) ▼ blocked (via blocked_by) ``` **Columns** (`src/db/sql/1-core_tables.sql`): | Field | Type | Notes | |---|---|---| | `id` | TEXT | UUIDv7 | | `name` | TEXT | Short title | | `description` | TEXT | Full description for the LLM | | `priority` | ENUM | `low` / `medium` / `high` | | `status` | ENUM | `pending` / `in_progress` / `failed` / `complete` / `waiting` | | `waiting_reason` | TEXT | Set when the agent calls `wait_task` | | `claimed_by` | TEXT | Worker id (`workers.id`) that claimed it | | `claimed_at` | TEXT | ISO timestamp | | `blocked_by` | JSON\[] | Array of task IDs that must complete first | | `context_ids` | JSON\[] | Context items referenced by this task | | `output` | TEXT | The `summary` from `complete_task` (added in migration 8) | *** ## The claim loop `claimNextTask(conn, workerId)` in `src/db/tasks.ts`: 1. Select `pending` tasks where every `blocked_by` ID is in status `complete`. 2. Order by priority, then `created_at`. 3. Atomically `UPDATE ... WHERE status='pending' RETURNING *`, stamping the calling worker's id on `claimed_by`. If `RETURNING` comes back empty, another worker claimed it first — the loop tries the next candidate. Multiple workers can race on the same queue safely because the atomic UPDATE is serialized at the DuckDB instance level. A worker holds its claimed task for the duration of the tick. 
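The claim loop can be modeled in miniature. The following is an illustrative in-memory stand-in — a plain check-and-set plays the role of the atomic `UPDATE ... WHERE status='pending' RETURNING *`; the types and names are assumptions, not the `src/db/tasks.ts` code:

```typescript
// In-memory model of claimNextTask: blocked_by gating, priority-then-age
// ordering, and a check-and-set standing in for the atomic UPDATE.
type Priority = "low" | "medium" | "high";
interface Task {
  id: string;
  priority: Priority;
  createdAt: number;
  status: "pending" | "in_progress" | "complete" | "failed" | "waiting";
  blockedBy: string[];
  claimedBy?: string;
}
const rank: Record<Priority, number> = { high: 0, medium: 1, low: 2 };

function claimNextTask(tasks: Task[], workerId: string): Task | null {
  const complete = new Set(
    tasks.filter((t) => t.status === "complete").map((t) => t.id),
  );
  const candidates = tasks
    .filter((t) => t.status === "pending" && t.blockedBy.every((id) => complete.has(id)))
    .sort((a, b) => rank[a.priority] - rank[b.priority] || a.createdAt - b.createdAt);
  for (const t of candidates) {
    if (t.status !== "pending") continue; // lost the race to another worker
    t.status = "in_progress";             // the DB does this step atomically
    t.claimedBy = workerId;
    return t;
  }
  return null;
}
```

In the real system the check-and-set is serialized by DuckDB's instance-level lock, which is what makes the multi-worker race safe.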
Two cleanup paths release stuck tasks: * **Timeout**: `resetStaleTasks()` (called at the top of every tick) reclaims rows whose `claimed_at` is older than `max_tick_duration_seconds * 3` and sets them back to `pending`. * **Dead worker**: `reapDeadWorkers()` flips any worker whose `last_heartbeat_at` is older than `worker_dead_after_seconds` to `dead` and releases every task and schedule claim held by that worker. See [architecture.md](architecture.md#registration-heartbeat-reaping). A single worker can also target a specific task via `claimSpecificTask(conn, taskId, workerId)` — used by `botholomew worker run --task-id <id>` and the chat `spawn_worker` tool. *** ## DAG validation `blocked_by` defines a dependency DAG. Cycles would deadlock the claim loop, so `validateBlockedBy()` rejects them at insert time: * DFS from each blocker, looking for a path back to the task being created. * If any path exists, `createTask()` throws. This is cheap because the graph is almost always shallow — the common pattern is "produce N subtasks from a schedule" which is a flat one-level fan-out. *** ## Predecessor outputs When the agent works a task that was blocked by others, it doesn't start from zero. `runAgentLoop()` (`src/worker/llm.ts`) fetches each blocker's `output` (the summary passed to `complete_task`) and injects it into the user message: ``` Task: Name: Produce weekly summary Description: ... Priority: medium Predecessor Task Outputs: ### Read email (01JE...) - 3 urgent threads from customers about Q4 rollover... ### Check calendar (01JE...) - 5 meetings this week, 2 with external stakeholders... ``` This is how multi-step workflows chain without a dedicated orchestrator. 
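The DFS described under "DAG validation" is small enough to sketch. Assuming a simplified, hypothetical shape — a map from existing task id to its `blocked_by` list:

```typescript
// Reject a blocked_by edge set if any blocker can reach the task itself —
// the cycle check that createTask() would enforce (illustrative names).
function wouldCreateCycle(
  taskId: string,
  blockedBy: string[],
  blockers: Map<string, string[]>, // existing task id -> its blocked_by
): boolean {
  const stack = [...blockedBy];
  const seen = new Set<string>();
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (id === taskId) return true; // found a path back: cycle
    if (seen.has(id)) continue;
    seen.add(id);
    for (const dep of blockers.get(id) ?? []) stack.push(dep);
  }
  return false;
}
```

`createTask()` throws when a check like this returns `true`; the shallow one-level fan-outs described above keep the traversal cheap in practice.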
*** ## Schedules A schedule is a recurring task template described in natural language: ```bash botholomew schedule add "Morning review" \ --frequency "every weekday at 7am" \ --description "Read my email, check my calendar, draft a morning summary" ``` **Columns:** | Field | Notes | |---|---| | `frequency` | Plain text — "every morning", "weekly on Mondays", "every 2 hours" | | `last_run_at` | ISO timestamp of last evaluation that created tasks | | `enabled` | Boolean | | `claimed_by` | Worker id currently evaluating this schedule (or null) | | `claimed_at` | ISO timestamp when the current claim was taken | *** ## LLM-evaluated "is it due?" Instead of parsing cron expressions, `processSchedules(dbPath, config, workerId)` (`src/worker/schedules.ts`) first **claims** each enabled schedule via an atomic `UPDATE schedules SET claimed_by=?1 WHERE id=?2 AND (claimed_at IS NULL OR claimed_at < stale_cutoff) AND (last_run_at IS NULL OR last_run_at < now - min_interval) RETURNING *`. Only the worker that wins the claim evaluates that schedule — so two concurrent workers evaluating the same schedule never produce duplicate task batches. Once a worker holds the claim, it asks the model: > Given the frequency `"every weekday at 7am"`, `last_run_at` > \= 2025-04-16T07:03:12Z, and now = 2025-04-17T07:41:05Z — is this > schedule due? If yes, what task(s) should be created? The LLM returns structured output: `{ isDue: boolean, tasksToCreate: Array<{ name, description, priority }> }`. If the schedule describes a multi-step workflow ("read email and summarize"), the model can return multiple tasks with `blocked_by` linking them — so a schedule naturally expands into a chained DAG. Trade-offs: * **Flexibility.** "Every weekday at 7am, except US holidays, unless I'm on vacation (check calendar)" is specifiable in English and evaluable by the model. * **Cost.** One (cheap) model call per enabled schedule per tick. 
For dozens of schedules this is negligible; for thousands, you'd want a parser. * **Drift.** The model's idea of "morning" might not match yours. Tighten the frequency text if you see misfires. `botholomew schedule trigger <id>` runs the same evaluation loop on demand and creates the task(s) immediately — handy for verifying that a new schedule produces the tasks you expect without waiting for the next tick. *** ## Running the queue by hand ```bash # Add work botholomew task add "Draft Q4 retro" --priority high # Inspect (newest first; supports --status, --priority, --limit, --offset) botholomew task list --status pending botholomew task list --limit 20 --offset 20 botholomew task view <id> # Run a worker now (foreground, one-shot by default) botholomew worker run botholomew worker run --persist # long-running tick loop botholomew worker run --task-id <id> # target a specific task # Unstick a task botholomew task reset <id> botholomew task delete <id> # Manually fire a schedule botholomew schedule trigger <id> ``` All of the same operations are available to the chat agent (`create_task`, `list_tasks`, `view_task`, `update_task`, `delete_task`, `create_schedule`, `list_schedules`) so you can drive the queue conversationally too. `delete_task` refuses tasks in `in_progress` — the worker has no mid-execution interrupt, so wait for it to finish or run `botholomew task reset <id>` from the CLI first. --- --- url: 'https://www.botholomew.com/tools.md' --- # The Tool class Every tool the agent can call — and every matching CLI subcommand you can run yourself — is defined once as a `ToolDefinition`. A single definition drives three consumers: 1. **The Anthropic SDK** (via `input_schema: JSONSchema`) so the model can call it. 2. **Commander.js** via an auto-generated subcommand. 3. **Tests**, which import the tool directly and call `execute()`. This lives in `src/tools/tool.ts`. 
*** ## Shape of a tool ```ts import { z } from "zod"; import type { ToolDefinition } from "../tool.ts"; const inputSchema = z.object({ summary: z.string().describe("Summary of work done"), }); const outputSchema = z.object({ message: z.string(), is_error: z.boolean(), }); export const completeTaskTool = { name: "complete_task", description: "Mark the current task as complete with a summary of what was accomplished.", group: "task", terminal: true, inputSchema, outputSchema, execute: async (input, ctx) => ({ message: `Task completed: ${input.summary}`, is_error: false, }), } satisfies ToolDefinition<typeof inputSchema, typeof outputSchema>; ``` **Fields:** | Field | Purpose | |---|---| | `name` | Snake-case identifier; also the CLI subcommand name | | `description` | Used for both the LLM tool definition and CLI help text | | `group` | Groups tools into CLI namespaces (`task`, `file`, `dir`, …) | | `terminal` | If `true`, the agent loop ends when this tool is called (e.g., `complete_task`, `fail_task`, `wait_task`) | | `inputSchema` | Zod schema with `.describe()` per field — becomes JSON Schema for the model and Commander flags for the CLI | | `outputSchema` | Zod schema guaranteeing the shape of the response | | `execute` | The actual implementation, receiving validated input and a `ToolContext` | *** ## ToolContext Every tool receives a `ToolContext`: ```ts interface ToolContext { conn: DbConnection; // short-lived connection, scoped to this tool call dbPath: string; // for long-running tools that manage their own withDb projectDir: string; // absolute path to the project config: Required<BotholomewConfig>; // resolved config (API keys, model, …) mcpxClient: McpxClient | null; // external MCP tools (may be null) } ``` This is the only capability surface. A tool that isn't handed an `mcpxClient` can't reach the network; a tool that doesn't use `conn` or `dbPath` can't touch the database. 
### `conn` vs `dbPath` The executor (`runAgentLoop` / `runChatTurn`) wraps each tool call in `withDb(dbPath, async (conn) => tool.execute(input, { ...ctx, conn }))`. That means: * `ctx.conn` is **already open** for the duration of one `execute()` call and will be closed immediately after. Use it for ordinary tools that do one or two quick queries. * `ctx.dbPath` is for tools that run long enough that holding the file lock would block the worker or CLI (e.g., `context_refresh` re-fetching many URLs). Wrap each DB touch in `await withDb(ctx.dbPath, async (conn) => { … })` so the lock is released between items. DuckDB holds the file lock at the instance level. A tool that hangs on `ctx.conn` through a long network round-trip keeps that lock held. When in doubt, prefer granular `ctx.dbPath` wrapping. *** ## Anthropic adapter `toAnthropicTools()` walks the registry and converts each Zod input schema to the Anthropic SDK's `Tool` type using `z.toJSONSchema()`: ```ts { name: "context_write", description: "Write content to a context item. By default, fails if the (drive, path) already exists — pass on_conflict='overwrite' to replace.", input_schema: { type: "object", properties: { /* derived from Zod */ }, required: ["drive", "path", "content"], } } ``` `context_write` accepts an optional `on_conflict: "error" | "overwrite"` input (default `"error"`). A collision returns `is_error: true`, `error_type: "path_conflict"`, and a `next_action_hint` that steers the model back to `context_read` or a retry with `on_conflict='overwrite'`. `runAgentLoop()` feeds this array into `client.messages.create({ tools: ... })`. When the model emits a `tool_use` block, the loop looks up the tool by name via `getTool(name)`, validates the input against `inputSchema`, calls `execute()`, and returns the result as a `tool_result` block. Terminal tools (the ones with `terminal: true`) tell the loop to stop. 
For workers, those are `complete_task`, `fail_task`, and `wait_task` — any of which transitions the task out of `in_progress`. *** ## CLI adapter `registerToolsAsCLI(program)` iterates the registry and generates a Commander subcommand per tool, grouped by `group`: ```bash botholomew context read disk:/Users/evan/notes/meeting.md --offset 10 --limit 20 botholomew context tree disk:/Users/evan/notes --max-depth 3 botholomew search semantic "quarterly revenue" ``` Positional args and `--options` are derived from the Zod schema shape. The same validation that runs for the LLM runs here, so you get the same error messages. *** ## Registry Tools register themselves on import, so adding a tool is a one-file change: 1. Create `src/tools/<group>/<name>.ts` exporting a `ToolDefinition`. 2. Add `registerTool(myTool);` to `src/tools/registry.ts`. 3. Write a test in `test/tools/<group>/<name>.test.ts`. No central dispatch table to edit, no LLM tool list to update, no CLI command to wire. The Zod schema is the source of truth. *** ## `capabilities_refresh` — the meta-tool The `capabilities`-group tool `capabilities_refresh` exists so the agent can keep its own tool inventory fresh. It walks `getAllTools()` and `mcpxClient.listTools()`, then asks Claude (via `chunker_model`) to produce a **thematic summary** — one line per theme (e.g. "Gmail — read, send, draft, search, and reply to emails") rather than a line per tool. The result is written to `.botholomew/capabilities.md` (preserving frontmatter). Because that file is loaded into every system prompt, the next boot picks up the new inventory without another round-trip. Specific tool names are intentionally absent from the rendered file; the agent uses `mcp_list_tools` / `mcp_search` / `mcp_info` to look them up at call-time. See [persistent-context.md](persistent-context.md#capabilitiesmd--high-level-tool-inventory) for when the agent should call it. 
The matching CLI surface is `botholomew capabilities`, and the slash command is `/capabilities`. *** ## Why Zod for the schema? Zod gives us three things at once: * **Runtime validation.** Untrusted inputs (from the model, from the CLI) are validated before `execute()` runs. A malformed tool call becomes a clear `tool_result` error the model can recover from, not a crash. * **TypeScript inference.** `z.infer<typeof inputSchema>` gives `execute()` a statically-typed `input` parameter. * **JSON Schema export.** `z.toJSONSchema()` produces the schema the Anthropic API needs without a separate definition. The entire adapter layer is ~80 lines (`src/tools/tool.ts`) because Zod does the heavy lifting. --- --- url: 'https://www.botholomew.com/tui.md' --- # The TUI (`botholomew chat`) ![Tour of every tab in the chat TUI](./assets/full-tour.gif) Botholomew ships with an interactive terminal UI for talking to the agent, inspecting its work, and managing the local database. It's built on [Ink 6](https://github.com/vadimdemedes/ink) + React 19 — a real React tree, just rendered to ANSI characters instead of DOM nodes. The TUI is not a thin wrapper around the CLI. It's an 8-tab dashboard that runs against the same DuckDB workers use, so you can watch tasks claim and complete, browse the agent's memory, edit schedules, and monitor workers in real time. *** ## Launching ```bash botholomew chat # new thread botholomew chat --thread-id <id> # resume a previous thread botholomew chat -p "summarize inbox" # one-shot: send prompt, then chat ``` The TUI does not auto-spawn workers — dispatch them explicitly via the CLI (`botholomew worker start --persist`) or have the chat agent call the `spawn_worker` tool. Exiting the TUI prints the thread ID and the exact command to resume it. Thread titles are auto-generated by the LLM from the first user message and updated in the status bar every 5s. *** ## The eight tabs The TUI is organized as eight sibling panels. 
Only one is visible at a time. All panels stay mounted — switching tabs hides them with CSS (`display="none"`) rather than unmounting, so scroll position and filter state survive a round trip. | # | Tab | What it's for | |---|---|---| | 1 | **Chat** | Talk to the agent. Streamed responses, tool-call boxes, slash commands, message queue. | | 2 | **Tools** | Scrollable log of every tool call in the current session, with full input/output. | | 3 | **Context** | Browse the agent's "virtual filesystem" (DuckDB-backed). Preview, search, delete. | | 4 | **Tasks** | Task queue with status + priority filters. View details, payloads, and predecessor outputs. | | 5 | **Threads** | Browse past chat and worker threads. View interactions, delete with confirmation. | | 6 | **Schedules** | Recurring work. Toggle enabled/disabled, delete, inspect last run. | | 7 | **Workers** | Live view of registered workers (running / stopped / dead), pid, mode, heartbeat age. `f` cycles the status filter. | | 8 | **Help** | System info, worker status, keyboard reference. | ### 1. Chat The default view. User and assistant messages render as bubbles, tool calls render as compact boxes beneath the message that triggered them, and the input bar sits at the bottom. Completed messages are printed via Ink's `<Static>` component so they live in real terminal scrollback — you can select and copy them with your terminal's native tools, and they survive tab switches without re-layout. While the agent is streaming, text flushes to the screen on a ~50 ms timer (~20 fps) to keep the terminal from flickering. A spinner marks in-flight tool calls. While the model is assembling a tool-call input (streaming a large JSON args block), a `Preparing tool call: <name>...` spinner is shown so the UI doesn't appear frozen. ### 2. Tools Every tool call the agent has made in the current session, in order. `↑`/`↓` selects a row; the detail pane shows the full input JSON and the full (untruncated) output. 
`Shift+↑`/`↓` or `j`/`k` scroll the detail pane. Tool calls from **MCP** (`mcp_exec`) are displayed as `<server> / <tool>` — e.g. `Linear / CreateIssue` — with the `server` and `tool` fields extracted from the input JSON so the name stays readable. See [tools.md](tools.md) for the underlying `ToolDefinition` pattern and [mcpx.md](mcpx.md) for how MCP tools are merged into the agent's toolset. ### 3. Context Interactive browser for context items (the agent's "virtual filesystem" — see [virtual-filesystem.md](virtual-filesystem.md) and [context-and-search.md](context-and-search.md)). * `↑`/`↓` navigate. * `Enter` picks a drive (at the top level), expands a directory, or previews a file. * `Backspace` goes up one directory; at the root of a drive, it returns to the drive picker. * `/` opens a hybrid (keyword + vector) search across all drives. * `d` deletes the selected item. Markdown files (detected by `mime_type === "text/markdown"` or a `.md` extension on the path) are rendered through `Bun.markdown.ansi` so headers, emphasis, lists, and fenced code blocks show with terminal formatting. Other file types render as plain text. ### 4. Tasks The task queue, with filters for status (pending / in\_progress / completed / failed) and priority. Select a row to see the full task body, its payload, predecessor outputs (for DAG tasks), and the log of attempts. `r` refreshes. See [tasks-and-schedules.md](tasks-and-schedules.md). ### 5. Threads Every thread ever persisted to the project DB, with a type filter (chat vs. worker). Threads store the full interaction history (messages, tool calls, tool results) — the same data the agent uses to reconstruct context on resume. `d` deletes a thread, with a yes/no confirmation. You can't delete the thread you're currently attached to. ### 6. Schedules Recurring tasks. Toggle `e`nabled, `d`elete, or `r`efresh. 
Schedules are evaluated by an LLM pass during the tick loop against natural-language rules like "every weekday at 9am" — see [tasks-and-schedules.md](tasks-and-schedules.md). ### 7. Workers Live view of every worker registered against this project (status filter cycles with `f`: all → running → stopped → dead → all). Each row shows status, short id, mode, and heartbeat age. The detail pane has full id, pid, hostname, started time, heartbeat time, stopped time (if any), pinned task id (if any), and the per-worker log path. Press `l` to swap the detail pane into a **log view** that tails the selected worker's log file (`.botholomew/logs/<id>.log`). The log auto-refreshes every ~1.5 s and follows the bottom by default — scroll up with `Shift+↑`, `k`, or `K` to pause following; `G` (or scrolling back to the bottom) resumes it. Press `l` again to return to the detail view. Foreground workers (`worker run`) have no log file, so the log view shows an empty-state message instead. The panel polls the DB every ~3s. Workers heartbeat every `worker_heartbeat_interval_seconds` (default 15s); ones older than `worker_dead_after_seconds` (default 60s) get flagged `dead` by a peer's reaper. Start new workers from the CLI (`botholomew worker start --persist`) or have the chat agent call `spawn_worker`. ### 8. Help Project directory, active thread ID, worker status summary, and the full keyboard reference. *** ## The input bar The bar at the bottom of the Chat tab is a custom multi-line input (not `ink-text-input`). It supports: * **Multi-line editing** — `⌥+Enter` (Alt+Enter) inserts a newline; plain `Enter` submits. * **History** — `↑`/`↓` walks through previously submitted messages. Works across skills and plain messages. * **Slash autocomplete** — type `/` at the start of the line to open a popup (see below). * **Blinking cursor** — visual cue for focus. 
* **Stable input handlers** — keypresses are handled via `useRef`-stable callbacks, so Ink's `useInput` doesn't re-register stdin listeners on every render. Historically this was the difference between a smooth typing experience and one that pegged a CPU core under fast input. ### Slash-command popup Typing `/` with nothing before it opens the autocomplete popup. | Key | Action | |---|---| | `↑` / `↓` | Move the highlight | | `Return` | Submit the highlighted command if it takes no arguments; otherwise insert `/<name> ` so you can type args | | `Tab` | Insert the highlighted completion as `/<name> ` without submitting (lets you edit before sending) | | `Esc` | Close the popup (keeps what you typed) | Built-in commands are `/help`, `/skills`, `/clear`, and `/exit`. `/clear` ends the current chat thread (persisted, still resumable via `botholomew chat --thread-id <id>`) and starts a fresh one on the same session, so you can reset context without losing the conversation. Every file in `.botholomew/skills/` is also surfaced in the popup with its description. See [skills.md](skills.md) for the file format and how skills are invoked with positional arguments. Skills that reference `$1` / `$ARGUMENTS` (or declare `arguments` in frontmatter) are treated as argument-taking: `Return` inserts `/<name> ` and waits for your input. Skills without placeholders, like the built-ins, submit in a single `Return`. *** ## The message queue You can keep typing while the agent is working. Each submitted message is appended to a queue that drains sequentially — when the current turn finishes, the next queued message is sent automatically. 
On the Chat tab, when at least one message is queued: | Key | Action | |---|---| | `Ctrl+J` | Select the next queued message | | `Ctrl+K` | Select the previous queued message | | `Ctrl+E` | Edit the selected message (moves it back into the input bar) | | `Ctrl+X` | Delete the selected message from the queue | The queue is ephemeral (in-memory, not persisted) — it's a way to batch follow-ups without interrupting a tool loop mid-flight. *** ## Tool-call visualization Every tool call the agent makes renders as a small box under the assistant message that triggered it: ``` ⟳ Linear / CreateIssue (exec) ({"team":"..."}) ✔ Linear / CreateIssue (exec) ({"team":"..."}) → {"id":"...","url":"https://..."} ✘ Linear / CreateIssue (exec) ({"team":"..."}) → {"is_error":true,"error":"Team not found..."} ``` | Marker | State | |---|---| | `⟳` | Running | | `✔` | Succeeded | | `✘` | Errored (`is_error: true` in the tool result) | The input preview is truncated to 60 chars and the output preview to 120 — head over to the **Tools** tab for the full payload. When a tool returns more than `MAX_INLINE_CHARS` (see `src/worker/large-results.ts`), Botholomew routes the full payload through the large-results cache and shows a stub instead: ``` ✔ context_read ({"path":"big.md"}) ⚡ Paginated for LLM [42K, 8pg] ``` The agent sees paged access to the result via dedicated tools; the TUI just shows the summary to keep the chat view compact. 
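The marker-and-truncation rendering above can be sketched as follows. The 60/120-character limits come from the text; the function and constant names are illustrative, not the TUI source:

```typescript
// Render one tool-call line for the chat view: a state marker, a truncated
// input preview, and (once the call finishes) a truncated output preview.
const MAX_INPUT_PREVIEW = 60;
const MAX_OUTPUT_PREVIEW = 120;

const truncate = (s: string, max: number): string =>
  s.length > max ? s.slice(0, max) + "…" : s;

function toolCallLine(
  name: string,
  inputJson: string,
  result?: { outputJson: string; isError: boolean },
): string {
  const marker = result === undefined ? "⟳" : result.isError ? "✘" : "✔";
  let line = `${marker} ${name} (${truncate(inputJson, MAX_INPUT_PREVIEW)})`;
  if (result !== undefined) {
    line += ` → ${truncate(result.outputJson, MAX_OUTPUT_PREVIEW)}`;
  }
  return line;
}
```

A large result would short-circuit before this point — the full payload goes to the large-results cache and the line shows the `⚡ Paginated for LLM` stub instead of an output preview.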
*** ## Keyboard reference (consolidated) ### Global (any tab) | Key | Action | |---|---| | `Tab` | Cycle to the next tab | | `1`–`8` | Jump to tab N (not on Chat — the Chat input consumes digits) | | `Esc` | Return to Chat from any other tab | | `Ctrl+C` | Exit the TUI | ### Chat tab | Key | Action | |---|---| | `Enter` | Send / queue message | | `⌥+Enter` | Insert newline | | `↑` / `↓` | Browse input history | | `/` | Open slash-command popup | | `Return` | Run highlighted command (popup open, no-arg) / insert `/<name> ` if args needed | | `Tab` | Insert highlighted command as `/<name> ` without submitting | | `Esc` | Close popup | | `Ctrl+J` / `Ctrl+K` | Select queued message | | `Ctrl+E` | Edit queued message | | `Ctrl+X` | Delete queued message | ### List panels (Tools / Tasks / Threads / Schedules) | Key | Action | |---|---| | `↑` / `↓` | Move selection | | `Shift+↑` / `Shift+↓` | Scroll detail pane | | `j` / `k` | Scroll detail pane (alternate) | | `f` | Cycle filter (status, type, enabled — per panel) | | `p` | Cycle priority filter (Tasks only) | | `r` | Refresh from DB | | `d` | Delete with confirmation (Threads, Schedules, Context) | | `e` | Toggle enable/disable (Schedules only) | ### Workers tab | Key | Action | |---|---| | `↑` / `↓` | Select worker | | `f` | Cycle status filter (all → running → stopped → dead) | | `l` | Toggle between detail and log-tail view | | `Shift+↑` / `Shift+↓` | Scroll log up/down (log view) | | `j` / `k` | Scroll log down/up by one line (log view) | | `J` / `K` | Page scroll the log (log view) | | `g` / `G` | Jump to top / bottom of log (log view, `G` resumes follow) | ### Context tab | Key | Action | |---|---| | `↑` / `↓` | Navigate | | `Enter` | Expand directory / preview file | | `Backspace` | Go up one directory | | `/` | Search | | `d` | Delete selected item | *** ## Theming The TUI detects whether your terminal has a dark or light background and picks colors accordingly. Detection order: 1. 
`COLORFGBG` environment variable (set by Terminal.app, iTerm2, xterm). 2. On macOS, `defaults read -g AppleInterfaceStyle`. 3. Fallback: dark. All colors and ANSI codes live in `src/tui/theme.ts`, so one import is the single source of truth for hue choices across every panel. *** ## Architecture notes A few choices worth knowing if you're reading or modifying the TUI: * **Streaming is throttled.** `App.tsx` flushes `streamingText` every 50 ms max during a response, not per-token. Per-token flushing caused visible flicker and ~30× the React commits. * **Scroll state lives at the root.** Each list panel (`TaskPanel`, `ThreadPanel`, etc.) keeps its scroll offset lifted up so that switching tabs doesn't reset your position. * **All tabs stay mounted.** Inactive panels are hidden with `display="none"` instead of being unmounted. Remounting the Context/Tasks panels would re-query DuckDB and lose filter state. * **Completed messages render via `<Static>`.** They're written to terminal scrollback once and never re-rendered — essential for performance in long sessions, and it means the chat history survives in your terminal buffer after you exit. * **Input handlers are ref-stable.** `InputBar` and `App` both install a single `useInput` handler wrapped in `useCallback` with `useRef`-backed state reads. A prior bug caused 100 % CPU under fast typing because every keystroke re-registered the stdin listener. * **Kitty keyboard protocol.** The TUI enables Kitty's `disambiguateEscapeCodes` flag when starting Ink so modifiers (`Shift+↑`, `⌥+Enter`, etc.) are distinguishable from plain arrow / Enter presses on supporting terminals. *** ## Troubleshooting * **Colors look off / no colors at all.** Your terminal may not be reporting `COLORFGBG`. Export it manually (`export COLORFGBG="15;0"` for light, `"15;16"` for dark) or rely on the macOS fallback. * **No workers running.** The TUI does not auto-spawn workers — check `botholomew worker list --status running`. 
If nothing is alive, start one with `botholomew worker start --persist` or have the chat agent call `spawn_worker`. * **Weird layout in tmux / split panes.** Ink needs a stable terminal width; if the pane resizes mid-render, large tool-call boxes can wrap oddly. A fresh `Ctrl+L` usually sorts it. * **`⌥+Enter` inserts the literal character `¬` or similar.** Your terminal is sending Option as Meta. Enable "Use Option as Meta key" in your terminal profile (Terminal.app, iTerm2, Ghostty all support this). *** ## Related docs * [Skills (slash commands)](skills.md) — the `/<name>` commands the popup surfaces. * [Architecture](architecture.md) — how the TUI, workers, and CLI share one DuckDB. * [Tasks & schedules](tasks-and-schedules.md) — what the Tasks and Schedules tabs are actually managing. * [Context & hybrid search](context-and-search.md) — backs the Context tab's search. * [Virtual filesystem](virtual-filesystem.md) — the `context_*` tools visible in the Tools tab. * [MCPX](mcpx.md) — how `mcp_exec` calls get routed to external servers. --- --- url: 'https://www.botholomew.com/virtual-filesystem.md' --- # The virtual filesystem Botholomew's agent has no access to your real filesystem. Every piece of content the agent can touch lives in the `context_items` table as a row identified by a `(drive, path)` pair. When the agent calls `context_read({ drive: "disk", path: "/Users/evan/notes/meeting.md" })`, it is **not** opening that file on disk — it's reading the row that was ingested from it. This is deliberate, and it's the single most important safety property of the system: * **Safety.** The agent cannot read your home directory, cannot overwrite your SSH keys, cannot `rm -rf` anything, cannot exfiltrate files it wasn't handed. A prompt-injected instruction telling it to "read `~/.ssh/id_rsa`" has nothing to act on — that path doesn't exist in its world unless you ingested it. 
The worst a rogue agent can do is corrupt rows inside `.botholomew/data.duckdb`, which you can recover from a backup of a single file. * **Portability.** The entire "filesystem" is a single DuckDB file you can copy, share, or back up. * **Searchability.** Every "file" is already indexed, chunked, embedded, and queryable. * **History.** Everything the agent writes is recorded in `threads`/`interactions`, so you can audit every change. *** ## Drives Every context item lives under a **drive**. The drive names the origin of the content; the path is whatever that origin natively uses. | Drive | Path shape | Example ref | |---|---|---| | `disk` | absolute filesystem path | `disk:/Users/evan/notes/meeting.md` | | `url` | full URL (with scheme) | `url:/https://example.com/post` | | `agent` | arbitrary agent-chosen path | `agent:/notes/scratch.md` | | `google-docs` | Google Docs document id | `google-docs:/1AbCDEFGhij` | | `github` | `/<owner>/<repo>/<rest>` | `github:/evantahler/botholomew/README.md` | The `drive:/path` string form is the display and CLI convention. Internally, `context_items` has two columns — `drive TEXT` and `path TEXT` — with a `UNIQUE(drive, path)` index. That index is the identity key: an ingest that hits an existing `(drive, path)` is a refresh, never a duplicate. New drives (additional MCP services) can be added by teaching `src/context/drives.ts:detectDriveFromUrl` to recognize their URLs and extract the right path shape. ### The `agent` drive Content written by the agent itself (via `context_write`) defaults to the `agent` drive. It has no external origin, so it's never a candidate for `context_refresh`. *** ## The mapping | Filesystem concept | DuckDB representation | |---|---| | Identity | `(context_items.drive, context_items.path)` — unique together | | Display form | `drive:/path` (e.g. 
`disk:/Users/x/foo.md`) | | File contents | `context_items.content` (TEXT) or `content_blob` (BLOB) | | MIME type | `context_items.mime_type` | | Directory | A row with `mime_type = 'inode/directory'` | | Directory listing | Items filtered by `drive` and a path prefix, with intermediate directory segments derived from the matching paths | | Binary file | `is_textual = false`, content in `content_blob` | | Ingestion time | `indexed_at`, `created_at`, `updated_at` | *** ## The agent's tools All tools that operate on context items take `(drive, path)` together. For `context_read`, `context_info`, and `context_exists`, `path` can also be a bare UUID or a `drive:/path` string — in those cases `drive` is ignored. **Discovery:** | Tool | What it does | |---|---| | `context_list_drives` | List every drive that has content, with counts — a good first call when you don't know what's ingested | | `context_tree` | With no `drive`: list drives. With a drive: render a tree of that drive — the agent's bird's-eye view | **Directory operations:** | Tool | What it does | |---|---| | `context_create_dir` | Create a directory placeholder row (defaults to `drive: "agent"`) | | `context_dir_size` | Sum `length(content)` for items under a drive/prefix | **File operations:** | Tool | What it does | |---|---| | `context_read` | Read an item's content; slice by line (`offset`/`limit`) | | `context_write` | Upsert a row, trigger re-chunk + re-embed, return a tree snapshot (defaults to `drive: "agent"`) | | `context_edit` | Apply git-style line-range patches | | `context_delete` | Remove by (drive, path) or recursively by prefix | | `context_copy` | Duplicate a row to a new (drive, path) | | `context_move` | Rename or relocate a row — can move between drives | | `context_info` | Return metadata (size, lines, mime, indexed\_at, drive, path, ref) | | `context_exists` | (drive, path) existence check | | `context_count_lines` | Count `\n` in content | These are also exposed from the host CLI: 
```bash botholomew context add ~/notes/meeting.md # ingests as disk:/Users/.../meeting.md botholomew context add https://github.com/evantahler/botholomew/blob/main/README.md # ingests as github:/evantahler/botholomew/README.md botholomew context list botholomew context read disk:/Users/evan/notes/meeting.md botholomew context tree disk:/Users/evan/notes ``` *** ## Structured errors from `context_read` / `context_info` When the agent passes a path that doesn't resolve, these tools return a structured `is_error: true` response (they do **not** throw) so the model can recover inside the same tool loop: ```json { "is_error": true, "error_type": "not_found", "message": "No context item at disk:/Users/evan/notes/architecture.md", "next_action_hint": "Nearby items under disk:/Users/evan/notes: disk:/Users/evan/notes/readme.md, disk:/Users/evan/notes/guide.md. Call context_tree({drive:\"disk\",path:\"/Users/evan/notes\"}) to see more." } ``` The hint is built from `findNearbyContextPaths` — up to five siblings of the requested path's parent directory within the same drive, walking up until it finds a populated ancestor. `context_read` also returns `error_type: "no_text_content"` when the target exists but is binary (e.g. an image row). *** ## Patch format for `context_edit` ```ts { start_line: number, end_line: number, content: string } ``` * `start_line` / `end_line` are 1-based inclusive. * `end_line: 0` means **insert** without replacing. * `content: ""` means **delete** the line range. * Patches are applied bottom-up (descending `start_line`) so earlier line numbers remain stable. *** ## Embedding cascade Every mutation cascades into the embeddings table: * `context_write` → delete old chunks, re-chunk, re-embed, insert. * `context_edit` → same. * `context_move` → no embedding changes (embeddings reference the item id, not the path). * `context_delete` → cascade delete embedding rows. 
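To make the `context_edit` patch semantics described above concrete, here is a minimal TypeScript sketch of bottom-up patch application. The names (`Patch`, `applyPatches`) and the choice to insert *before* `start_line` when `end_line` is 0 are illustrative assumptions, not the actual `src/` implementation:

```typescript
// Hypothetical sketch of context_edit's patch application.
// Assumption: end_line 0 inserts before start_line (the doc only says
// "insert without replacing"); everything else follows the stated rules.
type Patch = { start_line: number; end_line: number; content: string };

function applyPatches(text: string, patches: Patch[]): string {
  const lines = text.split("\n");
  // Apply bottom-up (descending start_line) so earlier line numbers stay stable.
  const ordered = [...patches].sort((a, b) => b.start_line - a.start_line);
  for (const p of ordered) {
    // content "" means delete the range, so it contributes zero lines.
    const replacement = p.content === "" ? [] : p.content.split("\n");
    if (p.end_line === 0) {
      // Insert before start_line without removing anything.
      lines.splice(p.start_line - 1, 0, ...replacement);
    } else {
      // 1-based inclusive replace of start_line..end_line.
      lines.splice(p.start_line - 1, p.end_line - p.start_line + 1, ...replacement);
    }
  }
  return lines.join("\n");
}
```

Because patches land in descending `start_line` order, a patch low in the file never shifts the line numbers that a patch higher up refers to.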
Embeddings are stored as `FLOAT[384]` and queried by linear scan via `array_cosine_distance()` — no HNSW index, no VSS extension. The FTS index over `chunk_content` and `title` is rebuilt by `rebuildSearchIndex()` after every ingest write. See [context-and-search.md](context-and-search.md) for the full pipeline. *** ## Why not just use files on disk? A real filesystem would require: * path escaping, sandboxing, symlink resolution; * a separate indexer that must stay consistent with the files; * backup/versioning/synchronization logic. A DuckDB row is already all of those things at once — transactional, searchable, and backed by a single file you can `cp` or open with `sqlite3` (well, `duckdb`). And the biggest reason: **safety**. A filesystem abstraction that happens to be a database is a filesystem the agent cannot escape. There is no `..`, no symlink, no `/etc/passwd` — just `(drive, path)` columns with a `UNIQUE` constraint. If you're comfortable letting a model make decisions on your behalf but not comfortable letting it touch your disk, that's exactly the trade Botholomew makes.
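Returning to the search mechanics from the embedding section above: the linear scan is easy to picture because cosine distance itself is tiny. A plain-TypeScript sketch of what `array_cosine_distance()` computes per row (illustrative only; the real query runs inside DuckDB over `FLOAT[384]` columns, and `nearest` here is a hypothetical helper, not a Botholomew API):

```typescript
// Cosine distance = 1 - cosine similarity; this is the per-row computation
// behind DuckDB's array_cosine_distance() in the linear scan.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Linear scan: score every chunk against the query embedding, keep the top k.
function nearest(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k: number,
): { id: string; distance: number }[] {
  return chunks
    .map((c) => ({ id: c.id, distance: cosineDistance(query, c.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```

At the scale of a personal context store, a full scan like this is fast enough that skipping an approximate-nearest-neighbor index is a plausible simplicity trade, which matches the doc's "no HNSW index, no VSS extension" stance.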