---
url: 'https://www.botholomew.com/architecture.md'
---

# Architecture

Botholomew is three cooperating process roles that share a single DuckDB database:

1. **Workers** — short-lived or long-running `bun` processes that claim tasks from the queue, evaluate schedules, and run LLM tool loops. Multiple workers can run at once; each registers itself in the DB and heartbeats so dead ones are reaped.
2. **The chat TUI** — an Ink/React terminal UI you run on demand; it enqueues tasks, browses history, and can dispatch workers via the `spawn_worker` tool.
3. **The CLI** — everything else (`task add`, `schedule list`, `context search`, `worker list`, …). Each invocation opens its own DuckDB connection.

All share `.botholomew/data.duckdb`. DuckDB holds the file lock at the instance level (not the connection), so **no process holds a DB connection longer than a single logical operation**. Each CRUD call runs inside a short-lived `withDb(dbPath, fn)` from `src/db/connection.ts`, which acquires a connection, executes, and releases the instance when the last overlapping caller in the process is done. `withRetry` wraps the acquire path and retries with exponential backoff if another process is holding the lock.

**Safety note.** None of these processes give the agent direct access to your machine. Workers are the only things executing LLM tool calls, and the only tools they see are the ones registered in `src/tools/` (all operating inside `.botholomew/`) plus whichever MCP servers you explicitly configured. There is no "just read the file system" escape hatch. See [the virtual filesystem doc](virtual-filesystem.md) for the full argument.

***

## The worker tick

A worker executes one `tick()` per cycle. In `--once` mode (the default), a single tick runs and then the worker exits. In `--persist` mode, the worker loops over ticks until it receives SIGTERM/SIGINT.
```
tick() ─┐
        ├─► reset stale in_progress tasks (claimed > 3× max_tick_duration)
        ├─► processSchedules(workerId)  — atomically claim each due schedule,
        │                                 ask the LLM which are "due",
        │                                 enqueue their tasks
        ├─► claimNextTask(workerId)     — highest-priority unblocked pending task;
        │                                 worker id is stamped on the `claimed_by` column
        ├─► createThread("worker_tick") — one thread per tick for logging
        ├─► buildSystemPrompt()         — always-context + task-relevant context
        ├─► runAgentLoop()              — multi-turn Anthropic tool-use loop;
        │                                 every message, thinking block, tool call,
        │                                 and tool result is logged as an `interaction` row
        ├─► updateTaskStatus()          — complete / failed / waiting
        └─► endThread()
```

If no task is claimable and no schedule is due, `tick()` returns `false`. A `--persist` worker then sleeps `tick_interval_seconds` before trying again; a `--once` worker exits immediately. See `src/worker/tick.ts`.

### Log format

Worker logs prefix every line with a local `HH:MM:SS` timestamp. Lifecycle phases render as `[[phase-name]]` in bold magenta so they're easy to scan and grep (`grep '\[\[' .botholomew/logs/.log`).

Phases emitted each tick:

* `[[tick-start]] #N`
* `[[evaluating-schedules]]` (only when any are enabled)
* `[[claiming-task]]`
* `[[tick-end]] #N Xs didWork=true|false`
* `[[sleeping]] Ns` (only when there was no work in a persist worker)

Background workers (spawned without a TTY) also mirror the conversation thread to the log between `[[claiming-task]]` and `Task ... -> complete`, so a `tail -f` shows what the LLM is actually doing:

* `[[assistant]] ` — assistant message blocks
* `[[tool-call]] ` — each tool invocation
* `[[tool-result]] ok|err in Ns` — tool outcome and duration

Full content (untruncated input, tool output, tokens) stays in the `interactions` table; the log mirrors enough to follow the trace without opening the DB.
Foreground workers (`worker run`) keep their existing streaming UX (per-token output and `▶`/`✓` markers) — these phase lines are suppressed there to avoid duplication.

***

## Registration, heartbeat, reaping

Every worker writes a row into the `workers` table on start (`registerWorker` in `src/db/workers.ts`) with its id (uuidv7), pid, hostname, mode, optional pinned task id, optional `log_path`, and `status='running'`. Detached workers (spawned via `worker start` or `spawn_worker`) get a per-worker log file at `.botholomew/logs/.log` — the spawn parent generates the id and opens that file before launching the child, so the path is recorded on the row from registration onward. Foreground workers (`worker run`) have `log_path = null` and write to stdout instead.

From that moment, a non-blocking `setInterval` in `src/worker/heartbeat.ts` bumps `last_heartbeat_at` every `worker_heartbeat_interval_seconds` (default 15s) — independent of the tick loop, so a worker mid-LLM-call still heartbeats reliably.

Persist workers also run a reaper interval (`worker_reap_interval_seconds`, default 30s) that does two things:

1. Flips any worker whose heartbeat is older than `worker_dead_after_seconds` (default 60s) to `status='dead'` and releases every task and schedule claim it held. This is the failure-recovery path: anything from a terminal crash to a `kill -9` ends with the work reclaimable by another worker.
2. Deletes cleanly-stopped workers whose `stopped_at` is older than `worker_stopped_retention_seconds` (default 3600s). Dead workers are kept as forensic evidence; only the clean exits get auto-pruned so the `workers` table doesn't grow unbounded.

***

## The chat TUI

`botholomew chat` is a separate agent with its own system prompt and tool set — it does **not** execute long-running work itself.
Instead, it:

* answers questions about tasks, threads, and context,
* creates tasks (via `create_task`) that workers will pick up,
* spawns workers on demand (via `spawn_worker`) when the user wants work run right now,
* reads worker activity (`list_threads`, `view_thread`),
* looks up context items by path or UUID (`context_info`, `context_search`) and can refresh them in place (`context_refresh`),
* invokes **skills** (`/review`, `/standup`, …) defined in `.botholomew/skills/`,
* edits `beliefs.md` and `goals.md` via `update_beliefs` / `update_goals`.

It uses Anthropic's streaming API so tokens render in the TUI as they arrive. Every session is itself a `chat_session` thread with the same interaction log as a worker tick. See `src/chat/` and `src/tui/`.

***

## Why two agents?

A single-agent design would force the chat loop to wait on whatever the user asked — "summarize this 200-page PDF" blocks the UI for minutes. The split:

* **Chat** is fast, streaming, and interactive. It understands the world via the database but doesn't touch it much.
* **Workers** are slow, autonomous, and batch-oriented. Each tick can take as long as it needs.

Both speak to the same database, so a worker's results are immediately visible to the chat agent — and the chat agent can dispatch workers without blocking.

***

## Automation without a resident daemon

Earlier versions of Botholomew shipped an OS-level watchdog (launchd on macOS, systemd on Linux) to keep a single daemon alive. That's been replaced: users now run workers directly, and there is no installed background service. See [automation.md](automation.md) for cron-based recipes and optional launchd/systemd examples if you want Botholomew to advance on its own.

***

## Thread logging

Every interaction is persisted. A **thread** is one tick or one chat session; an **interaction** is a single event within it (user message, assistant message, tool call, tool result, thinking block, status change).
This gives Botholomew total observability without a separate tracing stack — `botholomew thread view ` reads the same rows that produced the work. Threads are also the chat agent's way of reporting on what workers have been doing.

Schema lives in `src/db/sql/2-logging_tables.sql`; thread types are `worker_tick` and `chat_session`.

***

## Connection model

Every process uses the same policy: **open a DuckDB connection for one logical operation, then close it.**

* **Workers**: `tick()` takes a `dbPath`, not a held connection. Each call into `src/db/*` is wrapped in `withDb` — stale-task reset, task claim, thread create, every `logInteraction`, the status update. The LLM network round-trip holds no connection.
* **Heartbeat**: a separate `setInterval` opens its own short `withDb` every ~15s (`src/worker/heartbeat.ts`). This is deliberately decoupled from the tick loop so a long LLM call doesn't stall the heartbeat.
* **Chat**: `ChatSession` carries `dbPath`. Each write (user message log, tool-use log, tool-result log, title update, thread end) is its own `withDb`. Tool execution wraps each call in `withDb` so `ctx.conn` is scoped to that tool call only.
* **CLI invocations**: `withDb` in `src/commands/with-db.ts` opens a connection for the command, applies migrations, and closes when the callback returns.
* **TUI panels**: take `dbPath`, not `conn`, and wrap each refresh poll in `withDb`.

DuckDB's file lock is process-wide and held by the *instance*, not individual connections. Within one process we refcount a shared instance so overlapping `withDb` calls (e.g., parallel tool execution via `Promise.all`, or the heartbeat firing alongside a tick) don't trip DuckDB's "don't open the same DB twice" rule; when the last caller in the process releases, we close the instance and free the OS-level lock so another process can claim it.
Vector search uses `array_cosine_distance()` (core DuckDB, no extension) over a linear scan of the `embeddings.embedding` column; the FTS extension (`INSTALL fts; LOAD fts;`) is loaded at connect time for BM25 keyword search. See `src/db/connection.ts`.

***

## Multi-worker safety

Any number of workers can run against the same project concurrently (spawned by CLI, the chat tool, cron, or `--persist`). Concurrency is handled at the DB level:

* **Task claim** — `claimNextTask(conn, workerId)` issues an atomic `UPDATE tasks SET status='in_progress', claimed_by=?1 WHERE id=?2 AND status='pending' RETURNING *`. If another worker claimed the row first, `RETURNING` comes back empty and the loop tries the next candidate.
* **Schedule claim** — `claimSchedule(conn, id, workerId, opts)` is the same atomic UPDATE pattern, gated by both a `schedule_claim_stale_seconds` (default 300s) window on the existing claim and a `schedule_min_interval_seconds` (default 60s) window on `last_run_at`. Only one worker per schedule per window evaluates and enqueues tasks.
* **Stale release** — If a worker crashes mid-task, its claim is released when the reaper flips its row to `dead`. Existing `claim_at` staleness also catches tasks claimed for longer than 3× the tick duration, independent of the worker's heartbeat.

***

## Nuke: bulk database resets

During development and when reusing a project, you often want to wipe part of the database without blowing away the whole `.botholomew/` directory (which would also erase `soul.md`, `beliefs.md`, `goals.md`, `config.json`, and your skills).
`botholomew nuke` covers that:

| Scope | Clears |
|---|---|
| `nuke context` | `context_items`, `embeddings` |
| `nuke tasks` | `tasks` |
| `nuke schedules` | `schedules` |
| `nuke threads` | `threads`, `interactions` (both worker ticks and chat sessions) |
| `nuke all` | everything above plus `daemon_state` |

Each subcommand requires `-y`/`--yes` to actually delete — running without the flag prints per-table row counts and exits, so it doubles as a dry run. Nothing on disk (soul, beliefs, goals, config, skills) is ever touched.

For safety, `nuke` refuses to run while any worker is in `status='running'` — stop them first with `botholomew worker stop `. The schema itself (tables, `_migrations`) is always preserved. See `src/commands/nuke.ts`.

***

## DB doctor: detect and repair index corruption

Under rare circumstances — typically after a hard crash or interrupted write — DuckDB's primary-key index can fall out of sync with the row data. The symptom is that `UPDATE`/`DELETE` against the affected rows fails with `Invalid Input Error: Failed to delete all rows from index`. Inside Bun, that FATAL error unwinds past the NAPI boundary as a C++ exception, surfacing as `panic: A C++ exception occurred` from `Zig__GlobalObject__onCrash`. The CLI command, the worker tick loop, and anything else that touches a corrupted row die immediately.

`botholomew db doctor` exists to detect and recover from this:

| Mode | What it does |
|---|---|
| `db doctor` (default) | For each user table, spawns a child Bun process that runs a self-update touching the PK index. Reports `ok` / `empty` / `missing` / `corrupt` per table. The child-process isolation is essential — a panic in the probe stays out of the doctor itself. |
| `db doctor --repair` | Refuses if any worker is **actually running** (PID alive). Stale `status='running'` rows whose PIDs are dead — the case that tends to coexist with workers-table corruption — are warned about but do not block repair, because flipping them to `stopped` would just trip the same corruption. Runs `CHECKPOINT`, `EXPORT DATABASE` to a timestamped directory under `.botholomew/`, renames the original `data.duckdb` (and `.wal`) to `data.duckdb.bak-`, opens a fresh DB at the original path, and `IMPORT DATABASE`s back. Indexes are rebuilt from data, which restores write integrity. After repair, `botholomew worker reap` cleans up the stale rows. |

Repair is idempotent and non-destructive: the original DB is preserved as a `.bak-` file next to the new one. Delete the backup once you've confirmed the rebuilt DB looks right.

See `src/db/doctor.ts` and `src/commands/db.ts`.

---

---
url: 'https://www.botholomew.com/automation.md'
---

# Automation

Botholomew no longer ships an OS-level watchdog. Earlier versions installed a `launchd` plist or a `systemd` user service that kept a single daemon alive; we dropped that because the install was heavy and opaque. Instead, you choose how and when workers run. This doc covers the common patterns. None of them are installed for you — you copy the recipe that matches your needs.

***

## The shape of a scheduled run

`botholomew worker run` (one-shot, default mode) does one thing and exits:

1. Register a worker row in the DB.
2. Start a heartbeat `setInterval` so other workers know it's alive.
3. Evaluate any due schedules and enqueue their tasks.
4. Claim the next eligible pending task.
5. Run the LLM tool loop until the task is complete / failed / waiting.
6. Mark the worker `stopped` and exit.

If there's no eligible task, the worker exits immediately — safe to run on a tight cron without overlapping concerns.

Two things make this safe to run concurrently with other workers:

* Task claims are atomic (`UPDATE ... WHERE status='pending' RETURNING *`).
* Schedule evaluation is gated by an atomic claim + a minimum-interval window, so two workers can't enqueue duplicate task batches from the same schedule. See [architecture.md](architecture.md#multi-worker-safety).

***

## Pattern: cron (recommended)

One line. Put this in `crontab -e`:

```cron
# Every 5 minutes, advance one task in ~/projects/inbox-bot
*/5 * * * * cd ~/projects/inbox-bot && /usr/local/bin/botholomew worker run >> .botholomew/cron.log 2>&1
```

* Fire as often as you like; each fire is one task at most.
* Overlap is fine. If two fires start close together, one will claim the task and the other will exit without work.
* Resolve `botholomew` with a full path. cron's `PATH` is minimal; `which botholomew` from your shell gives you the right answer.
* Redirect to `.botholomew/cron.log` (or anywhere you like) so you can see what happened if a run misbehaves.

### More aggressive variants

If you have a backlog you want drained quickly, spawn background workers every minute:

```cron
* * * * * cd ~/projects/inbox-bot && botholomew worker start >> .botholomew/cron.log 2>&1
```

Each worker still exits after one task; they just overlap freely. A crashed worker is reaped within ~60s and its task goes back into the queue.

***

## Pattern: a single long-running worker

Simplest UX for a workstation that's on most of the day: open a tmux or screen pane and run a persist worker in it.

```bash
tmux new -s botholomew
botholomew worker run --persist
# Ctrl+B, D to detach
```

It'll tick every `tick_interval_seconds` (default 300) when the queue is empty and back-to-back while there's work. Ctrl+C to stop cleanly (the shutdown handler marks the worker `stopped`). No cron, no watchdog, no systemd — and when you want to upgrade, you stop the pane and start it again.
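The two concurrency gates above — the atomic task claim and the schedule claim with its minimum-interval window — reduce to simple predicates. This in-memory TypeScript sketch is illustrative only (the real checks are single SQL statements inside the claim functions; all names here are stand-ins):

```typescript
// Task claim — mirrors: UPDATE tasks SET status='in_progress', claimed_by=?1
//                       WHERE id=?2 AND status='pending' RETURNING *
type Task = { id: string; status: "pending" | "in_progress"; claimedBy: string | null };

function tryClaim(tasks: Task[], id: string, workerId: string): Task | null {
  const t = tasks.find((x) => x.id === id);
  if (!t || t.status !== "pending") return null; // another worker won; RETURNING is empty
  t.status = "in_progress";
  t.claimedBy = workerId;
  return t;
}

// Schedule claim — gated by a minimum-interval window on last_run_at and a
// staleness window on any existing claim (so a crashed worker's claim can be stolen).
function canClaimSchedule(
  s: { claimedAt: number | null; lastRunAt: number | null }, // epoch ms
  nowMs: number,
  minIntervalSeconds = 60,  // schedule_min_interval_seconds
  claimStaleSeconds = 300,  // schedule_claim_stale_seconds
): boolean {
  if (s.lastRunAt !== null && nowMs - s.lastRunAt < minIntervalSeconds * 1000) return false;
  if (s.claimedAt !== null && nowMs - s.claimedAt < claimStaleSeconds * 1000) return false;
  return true;
}
```

Two workers racing `tryClaim` on the same task id can only produce one winner, which is exactly why overlapping cron fires are harmless.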
***

## Pattern: launchd (macOS, optional)

If you want Botholomew to survive logouts and start on boot without cron or tmux, a minimal `~/Library/LaunchAgents/com.example.botholomew.plist` looks like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.botholomew</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/botholomew</string>
    <string>--dir</string>
    <string>/Users/you/projects/inbox-bot</string>
    <string>worker</string>
    <string>run</string>
  </array>
  <key>StartInterval</key>
  <integer>300</integer>
  <key>StandardOutPath</key>
  <string>/Users/you/projects/inbox-bot/.botholomew/launchd.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/you/projects/inbox-bot/.botholomew/launchd.log</string>
</dict>
</plist>
```

Then:

```bash
launchctl load ~/Library/LaunchAgents/com.example.botholomew.plist
```

This runs `worker run` every 300s. You own the plist; Botholomew doesn't touch it. If Botholomew lives in a folder launchd can't read (e.g., under `~/Desktop` on newer macOS), grant Full Disk Access to whichever program invokes the binary.

***

## Pattern: systemd user timer (Linux, optional)

Two files in `~/.config/systemd/user/`:

`botholomew-inbox.service`:

```ini
[Unit]
Description=Run one Botholomew worker tick

[Service]
Type=oneshot
WorkingDirectory=/home/you/projects/inbox-bot
ExecStart=/usr/local/bin/botholomew worker run
StandardOutput=append:/home/you/projects/inbox-bot/.botholomew/systemd.log
StandardError=append:/home/you/projects/inbox-bot/.botholomew/systemd.log
```

`botholomew-inbox.timer`:

```ini
[Unit]
Description=Run Botholomew every 5 minutes

[Timer]
OnBootSec=60
OnUnitActiveSec=5min
Unit=botholomew-inbox.service

[Install]
WantedBy=timers.target
```

Enable with:

```bash
systemctl --user daemon-reload
systemctl --user enable --now botholomew-inbox.timer
```

Same concurrency story as cron: each fire is one task at most.

***

## Troubleshooting

* **"Nothing's happening."** `botholomew worker list` shows every worker the DB has ever seen. Filter with `--status running` to see who's alive right now. If you see zero running and a non-empty queue, spawn one: `botholomew worker start --persist`.
* **"I see dead workers piling up."** Reaped crashes stay in the table as forensic evidence; only clean exits (`status='stopped'`) get auto-pruned (after `worker_stopped_retention_seconds`, default 1 hour). If dead rows are bothering you, `DELETE FROM workers WHERE status='dead'` clears them safely. `botholomew worker list --status dead` shows the list first.
* **"Cron runs aren't firing."** Check `grep CRON /var/log/syslog` (Linux) or `log show --predicate 'process == "cron"'` (macOS). Common causes: minimal `PATH`, or a relative path to `botholomew`.
* **"Two workers keep claiming the same task."** They don't — by design. The `claimed_by` column is stamped by the atomic UPDATE, so only one wins. If you're seeing duplicate **output**, it's because the task was re-run after its worker was reaped — check `worker list --status dead`.
* **"The log is getting huge."** Rotate it yourself (logrotate, newsyslog). Botholomew used to do this inside the old watchdog; it no longer does.

***

## Why no built-in watchdog?

Feedback from early users: installing `launchctl`/`systemctl` entries was heavy, platform-specific, and opaque — and because it was installed per project, it accumulated in `~/Library/LaunchAgents/` faster than users expected. Replacing it with "run `worker run` however you already run things" makes the footprint predictable and the failure modes familiar. If you do want boot-time survival, the templates above give you what the old watchdog provided, without the magic.

---

---
url: 'https://www.botholomew.com/owl-character-sheet.md'
---

# Botholomew Owl — Character Sheet

The Botholomew mascot is a small ASCII owl. All poses are 3 lines tall and roughly 7 characters wide so they can be swapped frame-by-frame in the TUI.
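Frame-by-frame swapping needs nothing more than a clock-to-index mapping. A hypothetical helper (illustrative only — the real TUI code may differ):

```typescript
// Pick the frame to render for the current moment from a looping sequence.
// The character sheet below uses ~400ms per frame for the idle blink cycle.
function frameAt<T>(frames: T[], elapsedMs: number, msPerFrame = 400): T {
  return frames[Math.floor(elapsedMs / msPerFrame) % frames.length];
}
```

A "play once" sequence like Startup would clamp the index instead of wrapping it.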
***

## Base / Neutral

```
{o,o}
/)_)
 " "
```

***

## Emotions

### Happy

```
{^,^}
/)_)
 " "
```

### Excited

```
{*,*}
/)_)
 " "
```

### Sad

```
{;,;}
/)_)
 " "
```

### Surprised

```
{O,O}
/)_)
 " "
```

### Sleeping

```
{-,-}
/)_)
 " "
```

### Thinking

```
{o,o}
/)_) ?
 " "
```

### Confused

```
{o,o}
/)_) ~
 " "
```

### Dizzy

```
{@,@}
/)_)
 " "
```

### Alert / Error

```
{!,!}
/)_)
 " "
```

***

## Directional

### Wink

```
{-,o}
/)_)
 " "
```

### Looking Left

```
{o,o}
(_(\
 " "
```

### Looking Right

```
{o,o}
/)_)
 " "
```

***

## Poses

### Wings Up (celebrating)

```
{^,^}
/) (\
 " "
```

### Wings Out (presenting)

```
{o,o}
/)_)/>
 " "
```

### Reading

```
{o,o}
/)_)
_|"|_
```

### Typing

```
{o,o}
/)_)
_|||_
```

***

## Animation Sequences

### Idle (looping)

Slow blink cycle, ~400ms per frame:

```
Frame 0:   Frame 1:   Frame 2:   Frame 3:
{o,o}      {o,o}      {-,-}      {o,o}
/)_)       /)_)       /)_)       /)_)
 " "        " "        " "        " "
```

### Thinking (looping)

Eyes shift side to side while thinking:

```
Frame 0:    Frame 1:    Frame 2:    Frame 3:
{o,o}       {o,o}       {o,o}       {o,o}
/)_) ?      /)_) .      /)_) ..     /)_) ...
 " "         " "         " "         " "
```

### Working (looping)

Typing animation:

```
Frame 0:   Frame 1:   Frame 2:   Frame 3:
{o,o}      {-,o}      {o,o}      {o,-}
/)_)       /)_)       /)_)       /)_)
_|||_      _|||_      _|||_      _|||_
```

### Success

Quick celebration:

```
Frame 0:   Frame 1:   Frame 2:
{o,o}      {^,^}      {^,^}
/)_)       /) (\      /) (\
 " "        " "        " "
```

### Error

Surprise then alert:

```
Frame 0:   Frame 1:   Frame 2:
{o,o}      {O,O}      {!,!}
/)_)       /)_)       /)_)
 " "        " "        " "
```

### Startup

Wake-up sequence (play once):

```
Frame 0:   Frame 1:   Frame 2:   Frame 3:   Frame 4:
{-,-}      {-,-}      {o,-}      {o,o}      {^,^}
/)_)       /)_)       /)_)       /)_)       /)_)
 " "        " "        " "        " "        " "
```

---

---
url: 'https://www.botholomew.com/changelog.md'
description: 'Release history for Botholomew, pulled from GitHub releases.'
---

# Changelog

All releases are published to [GitHub](https://github.com/evantahler/botholomew/releases) and [npm](https://www.npmjs.com/package/botholomew).
---

---
url: 'https://www.botholomew.com/configuration.md'
---

# Configuration

Botholomew reads its settings from `.botholomew/config.json`. The full schema lives in `src/config/schemas.ts`.

```json
{
  "anthropic_api_key": "",
  "model": "claude-opus-4-6",
  "chunker_model": "claude-haiku-4-5-20251001",
  "embedding_model": "Xenova/bge-small-en-v1.5",
  "embedding_dimension": 384,
  "tick_interval_seconds": 300,
  "max_tick_duration_seconds": 120,
  "system_prompt_override": "",
  "max_turns": 0,
  "worker_heartbeat_interval_seconds": 15,
  "worker_dead_after_seconds": 60,
  "worker_reap_interval_seconds": 30,
  "worker_stopped_retention_seconds": 3600,
  "schedule_min_interval_seconds": 60,
  "schedule_claim_stale_seconds": 300,
  "log_level": ""
}
```

***

## Keys

| Key | Default | Purpose |
|---|---|---|
| `anthropic_api_key` | `""` | Anthropic key. `ANTHROPIC_API_KEY` env var overrides. |
| `model` | `claude-opus-4-6` | Claude model for the main agent loop (workers + chat). |
| `chunker_model` | `claude-haiku-4-5-20251001` | Smaller/cheaper model used to propose chunk boundaries during ingestion and evaluate schedules. |
| `embedding_model` | `Xenova/bge-small-en-v1.5` | A local [`@huggingface/transformers`](https://huggingface.co/docs/transformers.js) feature-extraction model. Weights are downloaded on first use and cached under `.botholomew/models/`. Any feature-extraction model in the Xenova/\* namespace works — e.g. `Xenova/multilingual-e5-small` (also 384-dim) for non-English content. |
| `embedding_dimension` | `384` | Vector dimension. Must match the model. Changing model + dimension requires running `botholomew context reembed` to recompute every stored vector — old and new vectors aren't comparable. |
| `tick_interval_seconds` | `300` | Seconds a `--persist` worker sleeps between ticks **when there's no work**. It ticks back-to-back while a backlog exists. |
| `max_tick_duration_seconds` | `120` | Soft cap per tick. Stale-task reset fires at `3×` this value. |
| `system_prompt_override` | `""` | Appended to the built-in system prompt. Use this for project-specific instructions that should be always-loaded without editing `soul.md`. |
| `max_turns` | `0` | Maximum tool-use turns per agent loop (0 = unlimited). Safety net against runaway loops. |
| `worker_heartbeat_interval_seconds` | `15` | How often a running worker writes to `workers.last_heartbeat_at`. Runs on its own `setInterval`, independent of the tick loop, so long LLM calls don't starve the heartbeat. |
| `worker_dead_after_seconds` | `60` | A worker whose heartbeat is older than this is considered dead. The reaper flips its status to `dead` and releases every task/schedule claim it held. |
| `worker_reap_interval_seconds` | `30` | How often a `--persist` worker scans for dead peers to reap and prunes old cleanly-stopped workers. One-shot workers don't run the reaper. |
| `worker_stopped_retention_seconds` | `3600` | Cleanly-stopped workers older than this are deleted from the `workers` table. Dead workers are kept as forensic evidence and not auto-pruned. |
| `schedule_min_interval_seconds` | `60` | Minimum gap between successive evaluations of the same schedule. A schedule that ran less than this many seconds ago is skipped. |
| `schedule_claim_stale_seconds` | `300` | If a worker claimed a schedule but never released it (crash), another worker may steal the claim after this many seconds. |
| `log_level` | `""` | Verbosity for `botholomew` CLI logs. One of `silent`, `error`, `warn`, `info`, `debug`. Empty string falls back to the runtime default (`info` normally, `error` under `NODE_ENV=test`). `BOTHOLOMEW_LOG_LEVEL` env var overrides this. |

***

## Environment variables

| Var | Effect |
|---|---|
| `ANTHROPIC_API_KEY` | Overrides `anthropic_api_key` in config. |
| `BOTHOLOMEW_LOG_LEVEL` | Overrides `log_level` in config. One of `silent`, `error`, `warn`, `info`, `debug`. |
| `BOTHOLOMEW_NO_UPDATE_CHECK` | Disable the background "new version available" check. |

***

## Tuning guidance

**For personal/low-volume use:** defaults are fine. One tick every five minutes is plenty when tasks are mostly "every morning, summarize my email".

**For bursty workloads:** lower `tick_interval_seconds` to 30–60. A persist worker only sleeps when the queue is empty, so this is safe — it just reduces latency between the last item landing and the next tick firing. Alternatively, spawn more one-shot workers (via cron or chat) and leave the interval alone.

**For multi-worker setups:** if you routinely run more than a handful of workers, consider lowering `worker_reap_interval_seconds` (so dead ones are cleaned quickly) and raising `worker_dead_after_seconds` (so a temporary DB-lock hiccup doesn't flip a live worker to dead). The defaults (30s reap, 60s threshold) are conservative.

**For model-cost sensitivity:**

* Switch `model` to `claude-sonnet-4-*` or `claude-haiku-*`. Opus is the default because quality on complex knowledge work matters more than per-token cost for most users, but Sonnet handles the majority of tasks well.
* The `chunker_model` is already Haiku — leave it there.
* Lower `max_turns` (e.g., 15) to hard-cap tool-use budgets.

**For prompt-sensitive workflows:** use `system_prompt_override` to add instructions without touching `soul.md`. This keeps the default personality intact while layering on project-specific rules ("always respond in British English", "never call `mcp_exec` on the slack server without confirmation", …).

***

## Per-project vs. global

There is no global config — everything is per-project. This is deliberate: different projects have different goals, different MCP servers, different beliefs. One Botholomew project's config shouldn't leak into another's.
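The override order implied by the tables above — built-in defaults, then `config.json`, then environment variables — can be sketched as follows. `resolveConfig` and `DEFAULTS` are illustrative names, not the real `src/config/schemas.ts` (which validates the full schema); only a few keys are shown:

```typescript
// Sketch of config resolution: file values override defaults, env vars override both.
const DEFAULTS = {
  anthropic_api_key: "",
  model: "claude-opus-4-6",
  tick_interval_seconds: 300,
  log_level: "",
};

type Config = typeof DEFAULTS;

function resolveConfig(
  file: Partial<Config>,
  env: Record<string, string | undefined>,
): Config {
  const merged = { ...DEFAULTS, ...file };
  // The two documented env overrides:
  if (env.ANTHROPIC_API_KEY) merged.anthropic_api_key = env.ANTHROPIC_API_KEY;
  if (env.BOTHOLOMEW_LOG_LEVEL) merged.log_level = env.BOTHOLOMEW_LOG_LEVEL;
  return merged;
}
```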
---

---
url: 'https://www.botholomew.com/context-and-search.md'
---

# Context & hybrid search

Botholomew's knowledge layer is a hybrid keyword + vector search system backed entirely by DuckDB. It's how the agent finds "that thing I mentioned last week" across thousands of ingested documents without calling out to a vector DB service.

***

## The pipeline

When you add a document (`botholomew context add ./report.pdf` or the agent writes via `context_write`), this happens:

```
content ─► create context_item row (drive, path)
        ─► LLM-driven chunker (claude-haiku-4-5 by default)
        ─► embedder (local @huggingface/transformers,
                     default Xenova/bge-small-en-v1.5, 384-dim)
        ─► embeddings table (FLOAT[384])
        ─► rebuild FTS index (BM25 over chunk_content + title)
        ─► indexed_at set on the context_item
```

See `src/context/ingest.ts`, `src/context/chunker.ts`, and `src/context/embedder.ts`.

***

## LLM-driven chunking

Fixed-size sliding-window chunking shreds structure: a heading lands in one chunk, its bullets in another, and semantic search returns incoherent fragments. Botholomew instead asks a **small, fast** model (Haiku by default) to propose chunk boundaries for each document:

```json
{
  "chunks": [
    { "start_line": 1, "end_line": 42 },
    { "start_line": 43, "end_line": 98 }
  ]
}
```

The chunker only returns line ranges (1-based, inclusive) — see `CHUNKER_TOOL` in `src/context/chunker.ts`. Each chunk is embedded separately; the `title` and `description` come from the parent `context_item` (set at ingestion time), are prepended to the chunk's text at embed time (along with a `Source: drive:/path` line), and surface in search results as the snippet.

If the chunker errors or times out, ingestion falls back to a deterministic paragraph/line splitter (`chunkByTextSplit` in `src/context/chunker.ts`) — semantic quality suffers, but the item still gets embedded.
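The fallback's shape — deterministic and structure-blind — can be sketched like this. `chunkByBlankLines` is an illustrative stand-in for `chunkByTextSplit` (whose exact splitting rules aren't documented here), assuming a split on blank lines; it returns the same 1-based inclusive ranges the LLM chunker does:

```typescript
type Chunk = { start_line: number; end_line: number };

// Split text into chunks at blank lines, emitting 1-based inclusive line ranges.
function chunkByBlankLines(text: string): Chunk[] {
  const lines = text.split("\n");
  const chunks: Chunk[] = [];
  let start: number | null = null; // 1-based line where the current chunk began
  lines.forEach((line, idx) => {
    const n = idx + 1;
    if (line.trim() === "") {
      // Blank line closes the open chunk, if any.
      if (start !== null) chunks.push({ start_line: start, end_line: n - 1 });
      start = null;
    } else if (start === null) {
      start = n; // first non-blank line opens a new chunk
    }
  });
  if (start !== null) chunks.push({ start_line: start, end_line: lines.length });
  return chunks;
}
```

A heading and the paragraph under it still land in one chunk here only if no blank line separates them — exactly the structural blindness the LLM chunker exists to avoid.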
*** ## Storage Context items live in `context_items` with a single identity key: ```sql CREATE TABLE context_items ( id TEXT PRIMARY KEY, title TEXT NOT NULL, content TEXT, mime_type TEXT NOT NULL DEFAULT 'text/plain', drive TEXT NOT NULL, path TEXT NOT NULL, indexed_at TEXT, ... ); CREATE UNIQUE INDEX idx_context_items_drive_path ON context_items(drive, path); ``` That unique index is load-bearing: `context add` looks up `(drive, path)` on every input to decide whether the ingest is a new insert or a refresh of an existing row. Embeddings live in `embeddings`: ```sql CREATE TABLE embeddings ( id TEXT PRIMARY KEY, context_item_id TEXT NOT NULL, chunk_index INTEGER NOT NULL, chunk_content TEXT, title TEXT NOT NULL, description TEXT NOT NULL DEFAULT '', embedding FLOAT[384], created_at TEXT NOT NULL DEFAULT (current_timestamp::VARCHAR), UNIQUE(context_item_id, chunk_index) ); ``` Vector similarity uses `array_cosine_distance` — a core DuckDB function, no extension required. There is no HNSW index: at our scale (hundreds to low thousands of rows) a linear scan beats the operational cost of the experimental-persistence HNSW path, which has bitten us with intermittent corruption more than once. Revisit when row counts reach the millions. Keyword search uses the **DuckDB FTS extension** (`INSTALL fts; LOAD fts;`) for BM25 ranking over `chunk_content` and `title`. The FTS index is a **snapshot** — it does not update incrementally on INSERT / DELETE. Every writer must call `rebuildSearchIndex(conn)` from `src/db/embeddings.ts` after its transaction commits. The ingest pipeline (`src/context/ingest.ts`) is the only writer today and does this automatically. *** ## Hybrid search `hybridSearch()` in `src/db/embeddings.ts` combines two signals: 1. **Keyword** — `fts_main_embeddings.match_bm25(id, query)` over `chunk_content` and `title`. 
BM25 handles tokenization, stemming, stopwords, and length-normalized scoring, so multi-term queries strictly *increase* recall over single-term queries. 2. **Vector** — `array_cosine_distance(embedding, $query_embedding)` via a linear scan over the `embeddings` table. Results are merged with reciprocal rank fusion (k=60), joined back to `context_items` to pick up each hit's `drive` and `path`, and returned as `(ref, title, score, snippet)`. Exposed to the agent as `search_semantic` and `search_grep`, and to you as `botholomew context search "..."`. *** ## Drives Every context item lives under a **drive** — the name of its origin. The built-in drives are: | Drive | What lives there | Refreshable? | |---|---|---| | `disk` | Local files (path = absolute filesystem path) | yes (re-reads from disk) | | `url` | Generic HTTP(S) pages (path = full URL) | yes (re-fetches) | | `agent` | Agent-authored scratch content | no (no external origin) | | `google-docs` | Google Docs documents (path = doc id) | not yet | | `github` | GitHub repo content (path = /owner/repo/...) | not yet | Drive detection lives in `src/context/drives.ts`. `detectDriveFromUrl` inspects the URL (and optionally the MCP server name that served the content) and returns the right `(drive, path)` pair. To add a new drive, extend that function with a new pattern. A refresh dispatch that isn't yet implemented (`google-docs`, `github`) returns a per-item `error` so the user knows to re-add the URL explicitly. Those items are still fully searchable — they just aren't auto-refreshable yet. *** ## Contextual loading When a worker picks up a task, `buildSystemPrompt()` (`src/worker/prompt.ts`) doesn't just dump every context file into the prompt — that would blow the context window. Instead: 1. All markdown files with frontmatter `loading: always` are included verbatim (e.g., `soul.md`, `beliefs.md`, `goals.md`). 2. The task name + description is embedded. 3. 
`hybridSearch()` finds top-N relevant chunks from the database.
4. Those chunks are appended to the system prompt as task-specific context, labelled with their `drive:/path` ref so the agent can jump to the full item via `context_read`.
5. Markdown files with `loading: contextual` are included only if their content shares keywords with the task.

***

## Loading context

Context gets into Botholomew two ways: local ingestion, and an LLM-driven loading agent that handles URLs. There is **no LLM placement** — the origin of the content determines its (drive, path) directly.

### Local files and folders

```bash
botholomew context add ./notes              # walks the directory
botholomew context add ./report.pdf         # single file
botholomew context add ~/Documents/strategy
```

`context add` walks directories recursively, detects mime types, and feeds every file through the ingestion pipeline (item → chunks → embeddings). Every local file is stored with:

* `drive = "disk"`
* `path = <absolute filesystem path>`

Binary files (PDFs, images) are stored in `content_blob` with `is_textual = false`; textual files are indexed for hybrid search.

### Remote content via a loading agent

URLs aren't `fetch()`d directly. Botholomew runs a focused LLM agent (`src/context/fetcher.ts`) whose only job is to retrieve the content at a URL using the MCP tools you have configured:

```bash
botholomew context add https://docs.google.com/document/d/abc123/edit
botholomew context add https://github.com/evantahler/botholomew/blob/main/README.md
botholomew context add https://example.com/blog/post

# Hand the fetcher extra guidance (auth notes, tool hints, etc.)
botholomew context add https://internal.corp/doc \
  --prompt-addition "Use the corp-wiki MCP server, not Firecrawl"
```

The fetcher runs a tool-use loop (up to 10 turns) with a small tool set:

* `mcp_list_tools` / `mcp_search` — discover which MCP tools are available and which might handle this URL.
* `mcp_info` — read a tool's input schema before calling it.
* `mcp_exec` — execute an MCP tool. The harness captures the full result and sends the LLM **only a 2,000-char preview**, keyed by the call's `tool_use_id`. Large pages don't explode the context window.
* `accept_content(exec_call_id, title, mime_type?)` — terminal. The agent picks which captured exec result to save by its id; the harness stores the full content it already has in memory. At save time the harness consults `detectDriveFromUrl(url, serverName)` to assign the right drive (e.g. `google-docs:/<doc id>` when the Google Docs MCP served the content).
* `request_http_fallback()` — terminal. Explicit signal that no MCP tool fits; the harness then runs a plain `fetch()` + HTML strip.
* `report_failure(message)` — terminal. Surfaces an actionable message back to you ("this Google Doc is private — share it with your service account") instead of a silent failure.

If no MCPX client is configured at all, or if the loop exceeds its turn budget, the fetcher falls back to plain HTTP with a 30s timeout and strips the HTML down to plain text for textual content. HTTP-fallback items live under drive `url`.

### Collision handling

Before doing anything expensive, `context add` checks each input's `(drive, path)` against what's already in context. If the same `(drive, path)` is already ingested, the item is routed per `--on-conflict`:

| Policy | Behavior |
| ----------- | ------------------------------------------------------------------------ |
| `error` | Fast-fail if any input is already in context. |
| `overwrite` | Refresh content from the origin (diff + selective re-embed). |
| `skip` *(default)* | Log and move on — no write, no error. |

Re-running `context add` on already-ingested items is a no-op by default. Use `--on-conflict=overwrite` when you want to refresh stored content (or `botholomew context refresh` for the idiomatic flow), and `--on-conflict=error` when you want a hard failure on collisions.
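The per-item routing can be sketched as a small pure function. This is a minimal sketch with hypothetical names — the real decision lives inside the ingest pipeline, not in a standalone helper:

```typescript
// Sketch of --on-conflict routing. "routeItem" and "Decision" are
// hypothetical names for illustration only.
type ConflictPolicy = "error" | "overwrite" | "skip";

interface Decision {
  action: "insert" | "refresh" | "skip";
  error?: string;
}

// `existing` holds the (drive, path) pairs already in context_items,
// encoded as "drive:path" strings.
function routeItem(
  drive: string,
  path: string,
  existing: Set<string>,
  policy: ConflictPolicy = "skip",
): Decision {
  const key = `${drive}:${path}`;
  if (!existing.has(key)) return { action: "insert" }; // new item: always ingest
  switch (policy) {
    case "error":
      return { action: "skip", error: `already in context: ${key}` }; // fast-fail
    case "overwrite":
      return { action: "refresh" }; // diff + selective re-embed
    case "skip":
      return { action: "skip" }; // log and move on
  }
}
```

Note that the lookup key is exactly the unique `(drive, path)` index described under Storage — collision handling is cheap because it's a single indexed probe per input.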
The agent-side `context_write` tool follows the same convention: defaults to `on_conflict='error'` and returns a PATs-style `error_type: "path_conflict"` with a `next_action_hint` that guides the agent to `context_read` first or pass `on_conflict='overwrite'`. On success, `context_write` also returns a `tree` field — a `context_tree` snapshot of the current drive — so the agent can see what else is nearby without a follow-up call. ### Refreshing stale content ```bash botholomew context refresh disk:/Users/evan/notes/strategy.md botholomew context refresh README.md # bare path → resolves to disk:/<abs> botholomew context refresh docs/*.md # multiple paths (shell glob) botholomew context refresh --all # every non-agent item ``` `refresh` dispatches on the drive: * `disk` → re-reads from the filesystem. * `agent` → skipped (no external origin). * Everything else → re-runs the loading agent against `context_items.source_url`, which is captured at ingest time. The built-in `url` drive also accepts its own path as a fallback (the path is the URL). Items without `source_url` — legacy rows created before that column landed, or rows from a drive whose origin isn't a URL — surface a per-item error and the user must re-add from URL. Refresh has no knowledge of any specific remote service; everything goes through `source_url`. In all cases it compares the new content against what's stored, updates only when they differ, and re-embeds only the changed items. Missing files are reported, not silently dropped. The same logic is exposed to the agent as the `context_refresh` tool, which takes `ref` (a UUID, `drive:/path`, or `drive:/prefix` for a subtree) or `all: true` and returns a structured summary along with a post-refresh `tree` snapshot. *** ## Local embeddings Botholomew runs embeddings locally via [`@huggingface/transformers`](https://huggingface.co/docs/transformers.js). The default model is `Xenova/bge-small-en-v1.5` (384-dim, ~33 MB). 
Weights are downloaded the first time the model is used and cached under `.botholomew/models/` — subsequent runs load from disk in milliseconds. No API key, no per-token cost, no network dependency at query time. The model loads lazily on the first embed call, so CLI startup stays fast. ONNX Runtime runs in **WASM** mode (`onnxruntime-web`) rather than the default native `onnxruntime-node` bindings, because the native bindings segfault under Bun when another native module (DuckDB) is loaded in the same process — see [oven-sh/bun#26081](https://github.com/oven-sh/bun/issues/26081). The switch is implemented as a small `bun patch` against `@huggingface/transformers` (see `patches/`) plus a `wasmPaths` override in `src/context/embedder-impl.ts` that points the WASM loader at the `onnxruntime-web/dist/` files already on disk — no CDN fetch at runtime. > **Maintaining the patch.** When bumping `@huggingface/transformers`, > re-run `bun patch '@huggingface/transformers@<version>'`, reapply the > three edits in `src/backends/onnx.js` (drop the static > `onnxruntime-node` import; route the `IS_NODE_ENV` branch to `ONNX_WEB` > with `wasm` defaults), and run `bun patch --commit`. If the patch ever > stops applying cleanly, the new `embedder.test.ts` regression case > (DuckDB + embedder in the same process) will catch it. To use a different model, set `embedding_model` and `embedding_dimension` in `.botholomew/config.json`. Any feature-extraction model from the Xenova/\* namespace works — for example, `Xenova/multilingual-e5-small` (also 384-dim) handles mixed-language content much better than the default. Changing models means old vectors and new vectors live in different embedding spaces and aren't comparable. Run `botholomew context reembed` to rebuild every vector with the new model. History: an older milestone shipped with OpenAI `text-embedding-3-small` (1536-dim) for quality reasons. 
Migration 18 (`18-reset_embeddings_for_local.sql`) reverts that decision — modern small open-source models close the quality gap, and "no API key required" is more in line with Botholomew's local-first stance. --- --- url: 'https://www.botholomew.com/captures.md' --- # Doc captures (screenshots & GIFs) Screenshots and GIFs of the chat TUI are **generated**, not hand-taken, so they stay current as the TUI evolves. One command regenerates every asset; the diff of `docs/assets/` tells reviewers what changed. ## How it works Two pieces: 1. **[VHS](https://github.com/charmbracelet/vhs)** drives a real PTY and renders a declarative `.tape` script (typed keystrokes + sleeps) into a GIF, MP4, or PNG. 2. **Fake LLM mode** — when `BOTHOLOMEW_FAKE_LLM=1` is set, every Anthropic client in the codebase is swapped for a scripted stub that streams fixture-defined replies (see `src/worker/fake-llm.ts`). This makes captures hermetic: no API key required, no network, and every run produces the same output. ## Install once ```bash brew install vhs ttyd ffmpeg ``` (Linux: `apt install ttyd ffmpeg` plus VHS from its [releases page](https://github.com/charmbracelet/vhs/releases).) ## Regenerate all assets ```bash bun run capture ``` The script creates an ephemeral project directory under `$TMPDIR`, runs `botholomew init` in it, then runs VHS once per tape in `docs/tapes/` — serially, since VHS contends for the tty. Output GIFs land in `docs/assets/`. Commit those changes alongside the TUI change that prompted them. Run a single tape: ```bash bun run capture chat-happy-path ``` ## Adding a new capture 1. **Write a fixture** under `docs/tapes/fixtures/<name>.json`: ```json { "turns": [ { "match": "optional regex against the user's message", "text": "The reply to stream back.", "chunkSize": 5, "delayMs": 30 } ] } ``` Turns without a `match` are consumed in order. Add `toolCalls` if the capture needs to show tool use. 
An optional top-level `env` object is merged into the VHS process env — handy for enabling capture-only hooks like `BOTHOLOMEW_CAPTURE_TAB_CYCLE` (see *Capture-only hooks* below). Fixtures are optional: a tape that doesn't invoke `botholomew chat` (e.g. a CLI demo) can skip the fixture file entirely.

2. **Write a tape** at `docs/tapes/<name>.tape`:

   ```tape
   Source docs/tapes/_common.tape
   Output docs/assets/<name>.gif

   Sleep 1s
   Type "botholomew chat"
   Sleep 600ms
   Enter
   Sleep 4s
   Type `whats on my schedule today`
   Sleep 600ms
   Enter
   Sleep 10s
   ```

   Note: `Type "..."` (double-quoted) for the shell command, `` Type `...` `` (backticked) for anything typed into the TUI — see the limitations section below. The fixture file must share the tape's base name. `_common.tape` pins terminal dimensions, theme, font, and typing speed — source it from every tape for a consistent look.

3. **Run** `bun run capture <name>` and review the output in `docs/assets/`.

4. **Embed** the GIF from the relevant doc with `![alt](./assets/<name>.gif)`.

## Why this approach

* **Deterministic.** Fake replies + pinned VHS settings mean byte-stable GIFs (modulo VHS upgrades). `git diff docs/assets/` is meaningful.
* **Hermetic.** No API key needed, so CI can regenerate captures on merge.
* **Decoupled.** The TUI itself is unchanged — the fake swap lives at the worker LLM boundary (`src/worker/llm-client.ts`), so the same stub can be reused for deterministic agent-loop tests.

## Known VHS/ttyd limitations

A few real sharp edges surfaced while building this; they're all worth knowing before you write a new tape.

* **Always use backticks for `Type` content, not double-quotes.** VHS's tape parser drops characters from double-quoted strings when they're piped through ttyd into an Ink raw-mode TUI — you'll see only some of what you typed, or nothing at all.
The correct form is: ```tape Type `whats on my schedule today` ``` Double-quoted `Type "..."` is fine at the shell level (before the TUI launches), but use backticks for anything typed into the TUI input bar. * **`Sleep N` is seconds.** `Sleep 500` is 8 minutes and 20 seconds. Always suffix: `Sleep 500ms`, `Sleep 2s`. * **Non-text keystrokes (`Tab`, `Escape`) don't reliably reach Ink.** VHS's `Tab` / `Escape` commands send escape sequences that Ink's legacy parser under ttyd doesn't recognize. `Enter` works (it's just `\r`). If you need to drive tab navigation in a capture, use `-p "<prompt>"` to auto-submit an initial message, or add a CLI flag that lets the capture land on a specific tab. * **Under `BOTHOLOMEW_FAKE_LLM=1` the chat command forces Ink's kitty-keyboard mode to `"disabled"`** (see `src/commands/chat.ts`), because ttyd can't negotiate the Kitty Keyboard protocol. Without that, even plain-text typing is dropped. Don't remove that guard without re-running `bun run capture`. * **`Hide` … `Show` hides keystrokes from the recording.** If you want viewers to see the command being typed out, just don't use `Hide` — start the tape with the shell prompt visible and let the typing animation play. ## Capture-only hooks The TUI has one env-var-gated affordance that exists purely for captures, because VHS can't keystroke its way through the tab bar: * **`BOTHOLOMEW_CAPTURE_TAB_CYCLE=<dwell-ms>`** (default `2500`) — when set, `src/tui/App.tsx` schedules timers that walk `activeTab` through 2 → 3 → 4 → 5 → 6 → 7 → 1 with the given dwell between tabs. The hook is a no-op unless the env var is defined, so it doesn't affect normal use. `docs/tapes/full-tour.tape` enables this via its fixture's `env` block. Seeded capture data (one task, one high-priority task, one schedule, one context file) is added to every capture's ephemeral project directory by `scripts/capture.ts`, so Tasks / Schedules / Context panels have visible rows from the first frame. 
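The tab-cycle hook amounts to scheduling one timer per hop. A minimal sketch, assuming a `setActiveTab` setter like the one Ink component state provides (the function name and shape here are hypothetical; the real code lives in `src/tui/App.tsx`):

```typescript
// Sketch of the BOTHOLOMEW_CAPTURE_TAB_CYCLE hook. Visit order ends on
// Chat (tab 1) so the capture closes where a viewer expects to land.
const ORDER = [2, 3, 4, 5, 6, 7, 1];

function scheduleTabCycle(
  setActiveTab: (tab: number) => void,
): ReturnType<typeof setTimeout>[] {
  const raw = process.env.BOTHOLOMEW_CAPTURE_TAB_CYCLE;
  if (raw === undefined) return []; // no-op unless the env var is defined
  const dwellMs = Number(raw) || 2500; // documented default dwell
  // One timer per hop: tab 2 after one dwell, tab 3 after two, ...
  return ORDER.map((tab, i) =>
    setTimeout(() => setActiveTab(tab), dwellMs * (i + 1)),
  );
}
```

Because the timers are all scheduled up front, the walk is immune to render hiccups — each tab switch lands at a fixed offset from capture start, which keeps the GIF timing stable across runs.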
## Keybinding reference (for the real TUI — not for tapes) * `Tab` cycles tabs; `Shift+Tab` is not wired up. * `1`–`7` jump to a tab **only when not on the Chat tab** (on Chat those keys are input). * `Escape` returns to Chat from any other tab. * `/` opens the slash-command popup; type to filter; `Escape` dismisses. * `Ctrl+C` exits the TUI. --- --- url: 'https://www.botholomew.com/getting-started.md' --- # Get started This page walks you from a clean machine to a running Botholomew worker processing tasks. For deeper background, see [Architecture](./architecture.md). ## Prerequisites * **[Bun](https://bun.sh) 1.1+** — Botholomew is a Bun-native CLI. * **An Anthropic API key** — Claude is the reasoning model. * Embeddings run locally via `@huggingface/transformers` (default `Xenova/bge-small-en-v1.5`, 384-dim). The first call downloads ~33 MB of weights to `.botholomew/models/`; no API key is required. * Optional: any [MCP servers](./mcpx.md) you want to expose to the agent (Gmail, Slack, GitHub, etc.) — managed through [MCPX](https://github.com/evantahler/mcpx). 
## Install ```bash bun install -g botholomew ``` Or run from a checkout: ```bash git clone https://github.com/evantahler/botholomew cd botholomew bun install bun run dev -- --help ``` ## Initialize a project In any directory you want Botholomew to operate inside: ```bash botholomew init ``` This creates a `.botholomew/` directory with templates and a fresh DuckDB database: ``` my-project/ .botholomew/ soul.md # always-loaded identity (not agent-editable) beliefs.md # always-loaded, agent-editable priors goals.md # always-loaded, agent-editable goals capabilities.md # always-loaded, agent-editable tool inventory config.json # models, tick interval, API keys data.duckdb # tasks, schedules, context, embeddings, logs mcpx/servers.json # external MCP servers (Gmail, Slack, …) skills/ # slash commands (built-ins + user-defined) logs/ # per-worker log files ``` Everything the agent can touch is here — see [The virtual filesystem](./virtual-filesystem.md) for why. ## Configure API keys Either export the environment variable: ```bash export ANTHROPIC_API_KEY=sk-ant-... ``` …or set it in `.botholomew/config.json`. See [Configuration](./configuration.md) for every key and its default. ## Queue work and run a worker ```bash # Add a task to the queue botholomew task add "Summarize every markdown file in ~/notes" # Process it botholomew worker run # one-shot: claim and run one task botholomew worker run --persist # long-running: loop until you stop it ``` Want it to run on its own? See [Automation](./automation.md) for cron, tmux, launchd, and systemd recipes. ## Chat interactively ```bash botholomew chat ``` The chat command opens an [Ink/React TUI](./tui.md) with eight tabs — chat, tasks, workers, context, schedules, threads, history, and logs — plus slash-command autocomplete, a message queue, tool-call visualization, and a live workers panel. 
## What's next * [The CLI reference](https://github.com/evantahler/botholomew#the-cli) on GitHub * [Architecture](./architecture.md) — workers, chat, shared DB * [Tasks & schedules](./tasks-and-schedules.md) — the claim loop and recurring schedules * [Context & hybrid search](./context-and-search.md) — ingest files, folders, and URLs * [MCPX integration](./mcpx.md) — wire up external services * [Skills](./skills.md) — slash-command templates the agent can also author at runtime --- --- url: 'https://www.botholomew.com/mcpx.md' --- # MCPX integration Botholomew has no network, no shell, and no filesystem access on its own. Everything external — reading email, searching the web, talking to GitHub — comes from MCP servers, managed per project via [**MCPX**](https://github.com/evantahler/mcpx). Think of MCPX as the `package.json` of the agent's tools: a project-local manifest (`.botholomew/mcpx/servers.json`) lists the MCP servers this project can use, and workers and the chat session connect to them at startup. You have two options for *how* those servers run: * **Run individual servers yourself.** Point MCPX at a stdio command (`npx ...`) or a remote HTTP endpoint. Good for a handful of well-known integrations. * **Use an MCP gateway.** A gateway like [Arcade.dev](https://www.arcade.dev/) exposes hundreds of authenticated tools (Gmail, Google Drive, Slack, GitHub, Notion, Linear, …) behind one endpoint, handles OAuth for you, and is maintained centrally. Configure it once and Botholomew sees the full tool surface. *** ## Configuration `.botholomew/mcpx/servers.json` uses the standard MCP client config format: ```json { "mcpServers": { "gmail": { "type": "stdio", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-gmail"], "env": {} }, "arcade": { "url": "https://api.arcade.dev/mcp/engineering", "headers": { "Authorization": "Bearer arc_xxxxxxx", "Arcade-User-ID": "you@example.com" } } } } ``` Stdio entries launch a subprocess and speak MCP over pipes. 
Entries with a `url` connect to a remote MCP server — this is the shape Arcade (and most hosted MCP gateways) expect: a gateway endpoint plus `headers` for auth. See [Arcade's docs](https://docs.arcade.dev/mcp-servers) for the list of gateway URLs and how `Arcade-User-ID` scopes tool access per user. MCPX accepts both shapes. *** ## Managing servers from the CLI ```bash botholomew mcpx servers # list configured server names botholomew mcpx list # every tool / resource / prompt across all configured servers botholomew mcpx ping # check connectivity to all servers (or pass names to filter) botholomew mcpx add gmail --command npx --args "-y,@modelcontextprotocol/server-gmail" botholomew mcpx add arcade --url https://api.arcade.dev/mcp/engineering botholomew mcpx remove gmail # --dry-run to preview, --keep-auth to keep stored tokens botholomew mcpx auth arcade # OAuth / token flow for HTTP servers botholomew mcpx deauth arcade # clear stored OAuth tokens for a server botholomew mcpx search "read email" # keyword + semantic search over all tools botholomew mcpx info gmail # server overview botholomew mcpx info gmail list_messages # input schema for one tool botholomew mcpx exec gmail list_messages '{"maxResults":10}' # dry-run a tool call botholomew mcpx import-global # copy ~/.mcpx into this project botholomew mcpx index # rebuild the tool search index botholomew mcpx resource arcade # list resources for a server (or read one by URI) botholomew mcpx prompt arcade # list prompts for a server (or render one by name) botholomew mcpx task <action> <server> [taskId] # list/get/result/cancel async tool tasks ``` Every subcommand is a thin passthrough to the `mcpx` CLI, so `botholomew mcpx <cmd> --help` shows the upstream reference — including every option and argument for that command. The only exception is `import-global`, which is Botholomew-specific. `mcpx exec` is the fastest way to confirm a server is wired up before handing it to the agent. 
`mcpx auth` runs the OAuth flow for HTTP servers that need it (most Arcade gateways do), and `mcpx import-global` is the usual way to bootstrap a new project from your global `~/.mcpx/` configuration. Note that `--args` and `--env` take **comma-separated** values — quote them so your shell doesn't split them (e.g. `--args "-y,@scope/pkg"`). *** ## Lifecycle `createMcpxClient(projectDir)` in `src/mcpx/client.ts`: 1. Reads `servers.json`. 2. Connects to every server. 3. Returns an `McpxClient | null` (`null` if no servers are configured). Each worker (and the chat session) holds the client for its lifetime and calls `client.close()` on SIGTERM/SIGINT. CLI commands like `botholomew mcpx exec` open a client, do their work, and close it. *** ## How the agent sees MCP tools Rather than flood the model's tool list with every MCP tool from every server — which can easily be hundreds — Botholomew exposes a small set of **meta-tools** the agent uses to discover and invoke MCP tools dynamically: | Tool | Purpose | |---|---| | `mcp_list_tools` | List MCP servers and the tools they provide | | `mcp_search` | Semantic search across all MCP tool names + descriptions | | `mcp_info` | Get the JSON Schema for a specific tool's input | | `mcp_exec` | Execute a tool with validated input | So the agent's flow to "check my email" looks like: ``` mcp_search("read email") ────► returns gmail.list_messages, gmail.get_message, ... mcp_info("gmail.list_messages") ──► returns input schema mcp_exec("gmail.list_messages", { maxResults: 10 }) ──► actual email ``` This keeps the primary tool list small and lets you plug in dozens of MCP servers without blowing the context window. See `src/tools/mcp/*.ts`. *** ## Logging Every MCP call is logged to the current thread as a `tool_use` / `tool_result` interaction pair — identical to how built-in tools are logged. Duration and token counts are captured. 
Query the `interactions` table (or run `botholomew thread view`) to see exactly what the agent sent and got back. *** ## When to add a server You want an MCP server when: * The agent needs to reach a specific service (Gmail, Slack, GitHub, Linear, Notion). * You want to give the agent *write* access somewhere — sending messages, creating issues, editing docs. * You're ingesting remote content into context — Firecrawl for web pages, Google Docs MCP for docs, etc. You don't need a server when: * The work happens entirely in `.botholomew/` (the virtual filesystem, embeddings, tasks, schedules). * You just want Claude to *read* something you already put in context — `context_read` / `search_semantic` are enough. --- --- url: 'https://www.botholomew.com/persistent-context.md' --- # Persistent context & agent self-modification The `.botholomew/` directory contains a handful of markdown files that shape how the agent thinks. Some the agent can rewrite; some it can't. Every one is versioned by frontmatter. *** ## The default files `botholomew init` creates: | File | Loading | Agent-editable? | Purpose | |---|---|---|---| | `soul.md` | `always` | **no** | Identity — who the agent is, how it behaves | | `beliefs.md` | `always` | yes | Priors the agent has learned about the world/project | | `goals.md` | `always` | yes | Current goals; updated as goals complete or change | | `capabilities.md` | `always` | yes | LLM-summarized, thematic inventory of what the agent can do (built-in + MCPX); no specific tool names | Each uses YAML frontmatter to declare its behavior: ```yaml --- loading: always # or "contextual" agent-modification: true # or false --- # Beliefs - I should be concise and clear in my work products. - I should ask for help when I'm stuck rather than guessing. ``` *** ## Loading modes **`loading: always`** — the file is concatenated into every system prompt, verbatim. Use sparingly. `soul.md`, `beliefs.md`, and `goals.md` are always-loaded. 
**`loading: contextual`** — the file is included only if its content shares keywords with the caller's current intent. The worker derives keywords from the running task's name and description; the chat agent derives them from your most recent message. Use this for topic-specific notes ("Everything I know about our invoicing system") that shouldn't pollute the prompt on unrelated tasks. See `loadPersistentContext()` and `extractKeywords()` in `src/worker/prompt.ts`. *** ## The hardcoded `## Style` block After the persistent-context files (and after the optional MCP section), every system prompt — worker and chat alike — appends a short `## Style` block defined as `STYLE_RULES` in `src/worker/prompt.ts`. It tells the model to skip sycophantic preambles ("You're absolutely right!", "Great question!"), push back when the user is wrong, and report failures and uncertainty directly. This is hardcoded so it applies to every install without needing to re-run `botholomew init`. Anything you put in `soul.md` or `beliefs.md` still loads above it and can layer on top — if you'd rather have a warm, chatty agent, say so there. *** ## Agent self-modification When `agent-modification: true`, the agent can rewrite the file using the `update_beliefs` or `update_goals` tools (`src/tools/context/`). The flow: 1. Agent calls `update_beliefs` with the new full file content. 2. The tool reads the existing file, parses frontmatter with `gray-matter`, preserves the frontmatter block, and writes back the new body. 3. A `context_update` interaction is logged to the current thread, so you can see — and audit — every time the agent changed its own priors. Files without `agent-modification: true` are read-only to the agent, even if the tool is called — the tool checks the frontmatter and refuses. 
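The preserve-frontmatter step is the load-bearing one: the agent supplies a new body, never new frontmatter, so flags like `agent-modification` can't be edited away. A minimal sketch — the real tool parses with `gray-matter`; this naive splitter is illustrative only:

```typescript
// Sketch of a frontmatter-preserving body rewrite. The regex-based
// frontmatter grab is a stand-in for gray-matter's parser.
function replaceBody(file: string, newBody: string): string {
  const m = file.match(/^(---\n[\s\S]*?\n---\n)/);
  if (!m) throw new Error("missing frontmatter");
  const frontmatter = m[1];
  // The tool refuses unless the file opts in to agent edits.
  if (!/agent-modification:\s*true/.test(frontmatter)) {
    throw new Error("file is not agent-editable");
  }
  return frontmatter + newBody; // frontmatter preserved verbatim
}
```

A rejected call surfaces as a tool error in the thread log, so even refused modification attempts are auditable.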
*** ## `capabilities.md` — high-level tool inventory `capabilities.md` is the same shape as `beliefs.md` / `goals.md` (always-loaded, agent-editable), but its body is machine-generated rather than hand-written. It's a **thematic summary** of what the agent can do — built-in capabilities grouped into coarse themes (task management, virtual filesystem, search, threads, …) and one theme per external service reachable through MCPX (Gmail, GitHub, Linear, …). Specific tool names are intentionally **omitted** from the rendered file; the agent uses `mcp_list_tools`, `mcp_search`, or `mcp_info` to look up exact names when it actually needs to invoke a tool. This keeps the always-loaded context small (tens of lines instead of hundreds). Summarization uses Claude (the `chunker_model` from config) on every refresh. When no Anthropic API key is configured, a static fallback listing is rendered with internal themes + MCPX server names and tool counts. It's seeded at `botholomew init` with the built-in tools already populated. Regenerate it any time via: * `botholomew capabilities` — CLI refresh (honors `--no-mcp`) * `capabilities_refresh` — the agent calls this tool itself when it suspects the inventory has drifted (new MCPX servers added, tools renamed, file deleted) * `/capabilities` — the matching slash command in chat Frontmatter is preserved on regeneration, so you can safely flip `loading` to `contextual` if you'd rather only surface the file when the task mentions tools. *** ## What this actually looks like A typical `beliefs.md` after a few weeks of use: ```yaml --- loading: always agent-modification: true --- # Beliefs - Evan prefers bullet-point summaries over paragraphs. - The "Q4 planning" doc in /notes is the canonical source for revenue targets. - The worker should escalate to a "waiting" status if a task needs access to a tool that isn't configured, instead of failing outright. - When summarizing email, strip quoted replies — they add tokens without value. 
``` None of those were in the seed template — they accumulated as the agent worked tasks and the chat user confirmed them. That's the whole point: the agent gets smarter about *your* workflow over time, and you can read (and edit) exactly what it believes. *** ## Why not put this all in a vector store? Beliefs and goals are high-priority, always-loaded text — they're what the agent uses to decide *what to do*, not raw reference material. Burying them in a vector index means the agent might not retrieve them when it matters. Keeping them as flat markdown with a hard `always` flag makes them impossible to miss. Long-form reference material (ingested PDFs, web pages, meeting notes) lives in the [context & embeddings system](context-and-search.md) instead. The two are complementary: * **Persistent context** = how the agent thinks. * **Context items / embeddings** = what the agent knows. *** ## Adding your own Drop any `.md` file into `.botholomew/` with frontmatter: ```yaml --- loading: contextual agent-modification: false --- # Our deployment checklist 1. Bump version in package.json 2. Run bun test && bun run lint 3. ... ``` Tasks mentioning "deploy", "release", or "version" — and chat messages mentioning the same — will now include this file in the system prompt automatically. You didn't have to register it anywhere. On every tick the worker reads every `.md` file in `.botholomew/`, extracts words longer than three characters from the current task's name and description, and includes any `loading: contextual` file whose content contains at least one of those words. The chat agent does the same on every turn, using your most recent message as the keyword source. See `loadPersistentContext()` in `src/worker/prompt.ts` for the exact logic. --- --- url: 'https://www.botholomew.com/skills.md' --- # Skills (slash commands) Skills are user-defined slash commands for the chat TUI. 
A skill is a markdown file with frontmatter and a prompt template; when you type `/<name>` in chat, the template is rendered and sent as a user message. Think of them as reusable prompts — "summarize this conversation", "review this file", "give me a standup update" — parameterized and version-controlled alongside the project. *** ## File format Skills live in `.botholomew/skills/<name>.md`: ```yaml --- name: review description: "Review a file for quality and issues" arguments: - name: file description: "Path to the file to review" required: true - name: focus description: "What to focus on (security, performance, etc.)" required: false default: "general quality" --- Please review the file at `$1`. Read it with the available tools, then provide: 1. A brief summary of what the file does 2. Any issues or concerns (bugs, security, performance) 3. Suggestions for improvement 4. An overall assessment (focus: $2) ``` **Frontmatter fields:** | Field | Required? | Purpose | |---|---|---| | `name` | no | Defaults to filename. Determines the slash command name | | `description` | yes | Shown in `/skills` listing and `/help` | | `arguments` | no | Array of argument definitions (name, description, required, default) | **Body:** Markdown prompt template with variable substitution. *** ## Variable substitution | Placeholder | Meaning | |---|---| | `$ARGUMENTS` | The entire argument string as typed | | `$1`, `$2`, … | Positional arguments (split on whitespace, quoted strings respected) | | `$<name>` | Named placeholder bound to the declared argument with that `name` (same value as the matching positional `$N`) | Missing optional arguments fall back to their `default`. Missing required arguments cause a validation error before the skill is sent — the TUI prints a `Usage:` line and never calls the LLM. Named and positional placeholders refer to the same underlying value. 
The `arguments[]` order in frontmatter sets each name's slot: the first declared argument is `$1` and `$<first-name>`, the second is `$2` and `$<second-name>`, and so on. Named placeholders are word-boundary matched (so `$start` won't clip `$start_date`), and longer names are substituted first when both exist. Example: ``` > /review src/cli.ts security ``` becomes `$1 = $file = "src/cli.ts"`, `$2 = $focus = "security"`, and `$ARGUMENTS = "src/cli.ts security"`. *** ## Built-in defaults `botholomew init` ships three skills out of the box: **`summarize.md`** — summarize the current chat conversation. **`standup.md`** — generate a standup update from recent tasks (completed in the last 24h + in progress). **`capabilities.md`** — rescan every built-in and MCPX tool and rewrite `.botholomew/capabilities.md` (see [persistent-context.md](persistent-context.md#capabilitiesmd--high-level-tool-inventory)). More are easy to add; see the quickstart below. *** ## Invoking skills From inside `botholomew chat`: ``` > /skills # list all available skills > /summarize # run the summarize skill > /review src/cli.ts # positional argument becomes $1 ``` ### Autocomplete popup Typing `/` at the start of the input pops up a menu of matching commands (built-ins `/help`, `/skills`, `/clear`, `/exit` plus every skill loaded from `.botholomew/skills/`). Each row shows the command name and its description. | Key | Action | |---|---| | `↑` / `↓` | Move the highlight | | `Tab` or `Return` | Accept the highlighted command (fills in `/<name> ` so you can type arguments) | | `Esc` | Close the popup without changing the input | The popup filters as you keep typing, and it disappears once you type a space — so a second `Return` submits the message as usual. When a skill runs, the TUI's user bubble shows the literal slash command you typed (e.g. `/review src/cli.ts security`), not the rendered prompt body — that keeps the chat transcript readable. 
The agent still receives the fully-rendered prompt as its user message.

***

## Managing skills from chat

Skills aren't write-once-via-CLI: the chat agent can list, read, create, edit, search, and delete them on demand. Six tools are exposed to the chat agent:

| Tool | What it does |
|---|---|
| `skill_list` | List skills (name, description, args, file path) |
| `skill_read` | Read a skill's raw file contents and parsed fields |
| `skill_search` | Keyword search across name, description, body, and arg metadata |
| `skill_write` | Create or overwrite a skill (`on_conflict: 'error' \| 'overwrite'`) |
| `skill_edit` | Apply git-style line-range patches to an existing skill |
| `skill_delete` | Delete a skill file by name |

Newly written or edited skills are picked up at the start of the *next* user message — `ChatSession.skills` is reloaded from disk in `sendMessage`. So a typical flow looks like:

```
> save this prompt as a skill called daily-log so I can run it tomorrow
[agent calls skill_write]
> /daily-log    # works immediately, no chat restart needed
```

`skill_write` rejects the reserved built-in names (`help`, `skills`, `clear`, `exit`) with `error_type: "reserved_name"`. It also normalizes names to `[a-z0-9-]`, sets the frontmatter `name` to match the filename, and re-parses the generated file before writing — so an invalid skill never lands on disk. `skill_edit` re-parses after applying patches and refuses to write if the result fails validation, so you can't break a skill from chat.

**Editing skills outside the chat** (e.g., with your text editor) is picked up the same way: the cache is reloaded from disk at the top of `sendMessage`, so external edits take effect on your next message — no chat restart needed.
*** ## CLI management ```bash botholomew skill list # table of all skills (supports --limit / --offset) botholomew skill show review # print the full skill file botholomew skill create daily-log # scaffold a new skill botholomew skill validate # parse every .botholomew/skills/*.md and report errors botholomew skill validate path.md # validate a single file (handy before committing) ``` `skill show` exits non-zero if the name doesn't match a loaded skill, and prints the available skill names to stderr. `skill validate` exits non-zero if any file fails to parse, so it fits naturally into a pre-commit hook or CI check. Skills are parsed by `src/skills/parser.ts` and loaded from disk by `src/skills/loader.ts`. The `ChatSession` caches them on session start and reloads them at the top of every `sendMessage` — so skills the chat agent creates or edits via the `skill_*` tools are usable on the next user message. Direct file edits made outside the running chat (e.g., from your editor) take effect on the next user message in any active session, but won't appear retroactively in history. *** ## Writing a good skill * **Be explicit about what you want.** The model doesn't know the shape of the output unless you describe it. * **Use positional args, not free-form.** `/review src/cli.ts` is easier to tab-complete than `/review --file=src/cli.ts`. In the body, you can reference each argument either positionally (`$1`, `$2`) or by the name you gave it (`$file`, `$focus`) — pick whichever reads better. * **Reference tools by name.** "Read the file with `context_read`" nudges the agent toward the right tool and keeps token counts down. * **Keep them short.** A skill is a prompt, not a program. If your skill is 200 lines of conditional logic, it probably wants to be a real tool. *** ## Why not just type the prompt? Because you'll type it a hundred times. 
Skills are pure convenience — but they're also version-controllable, shareable (copy a `skills/` directory between projects), and discoverable (`/skills` shows them all). They turn "the prompt I always use for standup updates" into a durable project asset. --- --- url: 'https://www.botholomew.com/tasks-and-schedules.md' --- # Tasks & schedules The task queue is Botholomew's execution substrate. Humans and agents both write to it; workers are the readers. *** ## Tasks A task is a unit of work with a lifecycle: ``` pending ──► in_progress ──► complete │ │ │ failed │ │ │ waiting │ └── (reset by timeout) ▼ blocked (via blocked_by) ``` **Columns** (`src/db/sql/1-core_tables.sql`): | Field | Type | Notes | |---|---|---| | `id` | TEXT | UUIDv7 | | `name` | TEXT | Short title | | `description` | TEXT | Full description for the LLM | | `priority` | ENUM | `low` / `medium` / `high` | | `status` | ENUM | `pending` / `in_progress` / `failed` / `complete` / `waiting` | | `waiting_reason` | TEXT | Set when the agent calls `wait_task` | | `claimed_by` | TEXT | Worker id (`workers.id`) that claimed it | | `claimed_at` | TEXT | ISO timestamp | | `blocked_by` | JSON\[] | Array of task IDs that must complete first | | `context_ids` | JSON\[] | Context items referenced by this task | | `output` | TEXT | The `summary` from `complete_task` (added in migration 8) | *** ## The claim loop `claimNextTask(conn, workerId)` in `src/db/tasks.ts`: 1. Select `pending` tasks where every `blocked_by` ID is in status `complete`. 2. Order by priority, then `created_at`. 3. Atomically `UPDATE ... WHERE status='pending' RETURNING *`, stamping the calling worker's id on `claimed_by`. If `RETURNING` comes back empty, another worker claimed it first — the loop tries the next candidate. Multiple workers can race on the same queue safely because the atomic UPDATE is serialized at the DuckDB instance level. A worker holds its claimed task for the duration of the tick. 
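The claim loop can be modeled in miniature. The following is an illustrative in-memory stand-in — a plain check-and-set plays the role of the atomic `UPDATE ... WHERE status='pending' RETURNING *`; the types and names are assumptions, not the `src/db/tasks.ts` code:

```typescript
// In-memory model of claimNextTask: blocked_by gating, priority-then-age
// ordering, and a check-and-set standing in for the atomic UPDATE.
type Priority = "low" | "medium" | "high";
interface Task {
  id: string;
  priority: Priority;
  createdAt: number;
  status: "pending" | "in_progress" | "complete" | "failed" | "waiting";
  blockedBy: string[];
  claimedBy?: string;
}
const rank: Record<Priority, number> = { high: 0, medium: 1, low: 2 };

function claimNextTask(tasks: Task[], workerId: string): Task | null {
  const complete = new Set(
    tasks.filter((t) => t.status === "complete").map((t) => t.id),
  );
  const candidates = tasks
    .filter((t) => t.status === "pending" && t.blockedBy.every((id) => complete.has(id)))
    .sort((a, b) => rank[a.priority] - rank[b.priority] || a.createdAt - b.createdAt);
  for (const t of candidates) {
    if (t.status !== "pending") continue; // lost the race to another worker
    t.status = "in_progress";             // the DB does this step atomically
    t.claimedBy = workerId;
    return t;
  }
  return null;
}
```

In the real system the check-and-set is serialized by DuckDB's instance-level lock, which is what makes the multi-worker race safe.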
Two cleanup paths release stuck tasks: * **Timeout**: `resetStaleTasks()` (called at the top of every tick) reclaims rows whose `claimed_at` is older than `max_tick_duration_seconds * 3` and sets them back to `pending`. * **Dead worker**: `reapDeadWorkers()` flips any worker whose `last_heartbeat_at` is older than `worker_dead_after_seconds` to `dead` and releases every task and schedule claim held by that worker. See [architecture.md](architecture.md#registration-heartbeat-reaping). A single worker can also target a specific task via `claimSpecificTask(conn, taskId, workerId)` — used by `botholomew worker run --task-id <id>` and the chat `spawn_worker` tool. *** ## DAG validation `blocked_by` defines a dependency DAG. Cycles would deadlock the claim loop, so `validateBlockedBy()` rejects them at insert time: * DFS from each blocker, looking for a path back to the task being created. * If any path exists, `createTask()` throws. This is cheap because the graph is almost always shallow — the common pattern is "produce N subtasks from a schedule" which is a flat one-level fan-out. *** ## Predecessor outputs When the agent works a task that was blocked by others, it doesn't start from zero. `runAgentLoop()` (`src/worker/llm.ts`) fetches each blocker's `output` (the summary passed to `complete_task`) and injects it into the user message: ``` Task: Name: Produce weekly summary Description: ... Priority: medium Predecessor Task Outputs: ### Read email (01JE...) - 3 urgent threads from customers about Q4 rollover... ### Check calendar (01JE...) - 5 meetings this week, 2 with external stakeholders... ``` This is how multi-step workflows chain without a dedicated orchestrator. 
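The DFS described under "DAG validation" is small enough to sketch. Assuming a simplified, hypothetical shape — a map from existing task id to its `blocked_by` list:

```typescript
// Reject a blocked_by edge set if any blocker can reach the task itself —
// the cycle check that createTask() would enforce (illustrative names).
function wouldCreateCycle(
  taskId: string,
  blockedBy: string[],
  blockers: Map<string, string[]>, // existing task id -> its blocked_by
): boolean {
  const stack = [...blockedBy];
  const seen = new Set<string>();
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (id === taskId) return true; // found a path back: cycle
    if (seen.has(id)) continue;
    seen.add(id);
    for (const dep of blockers.get(id) ?? []) stack.push(dep);
  }
  return false;
}
```

`createTask()` throws when a check like this returns `true`; the shallow one-level fan-outs described above keep the traversal cheap in practice.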
*** ## Schedules A schedule is a recurring task template described in natural language: ```bash botholomew schedule add "Morning review" \ --frequency "every weekday at 7am" \ --description "Read my email, check my calendar, draft a morning summary" ``` **Columns:** | Field | Notes | |---|---| | `frequency` | Plain text — "every morning", "weekly on Mondays", "every 2 hours" | | `last_run_at` | ISO timestamp of last evaluation that created tasks | | `enabled` | Boolean | | `claimed_by` | Worker id currently evaluating this schedule (or null) | | `claimed_at` | ISO timestamp when the current claim was taken | *** ## LLM-evaluated "is it due?" Instead of parsing cron expressions, `processSchedules(dbPath, config, workerId)` (`src/worker/schedules.ts`) first **claims** each enabled schedule via an atomic `UPDATE schedules SET claimed_by=?1 WHERE id=?2 AND (claimed_at IS NULL OR claimed_at < stale_cutoff) AND (last_run_at IS NULL OR last_run_at < now - min_interval) RETURNING *`. Only the worker that wins the claim evaluates that schedule — so two concurrent workers evaluating the same schedule never produce duplicate task batches. Once a worker holds the claim, it asks the model: > Given the frequency `"every weekday at 7am"`, `last_run_at` > \= 2025-04-16T07:03:12Z, and now = 2025-04-17T07:41:05Z — is this > schedule due? If yes, what task(s) should be created? The LLM returns structured output: `{ isDue: boolean, tasksToCreate: Array<{ name, description, priority }> }`. If the schedule describes a multi-step workflow ("read email and summarize"), the model can return multiple tasks with `blocked_by` linking them — so a schedule naturally expands into a chained DAG. Trade-offs: * **Flexibility.** "Every weekday at 7am, except US holidays, unless I'm on vacation (check calendar)" is specifiable in English and evaluable by the model. * **Cost.** One (cheap) model call per enabled schedule per tick. 
For dozens of schedules this is negligible; for thousands, you'd want a parser. * **Drift.** The model's idea of "morning" might not match yours. Tighten the frequency text if you see misfires. `botholomew schedule trigger <id>` runs the same evaluation loop on demand and creates the task(s) immediately — handy for verifying that a new schedule produces the tasks you expect without waiting for the next tick. *** ## Running the queue by hand ```bash # Add work botholomew task add "Draft Q4 retro" --priority high # Inspect (newest first; supports --status, --priority, --limit, --offset) botholomew task list --status pending botholomew task list --limit 20 --offset 20 botholomew task view <id> # Run a worker now (foreground, one-shot by default) botholomew worker run botholomew worker run --persist # long-running tick loop botholomew worker run --task-id <id> # target a specific task # Unstick a task botholomew task reset <id> botholomew task delete <id> # Manually fire a schedule botholomew schedule trigger <id> ``` All of the same operations are available to the chat agent (`create_task`, `list_tasks`, `view_task`, `update_task`, `delete_task`, `create_schedule`, `list_schedules`) so you can drive the queue conversationally too. `delete_task` refuses tasks in `in_progress` — the worker has no mid-execution interrupt, so wait for it to finish or run `botholomew task reset <id>` from the CLI first. --- --- url: 'https://www.botholomew.com/tools.md' --- # The Tool class Every tool the agent can call — and every matching CLI subcommand you can run yourself — is defined once as a `ToolDefinition`. A single definition drives three consumers: 1. **The Anthropic SDK** (via `input_schema: JSONSchema`) so the model can call it. 2. **Commander.js** via an auto-generated subcommand. 3. **Tests**, which import the tool directly and call `execute()`. This lives in `src/tools/tool.ts`. 
*** ## Shape of a tool ```ts import { z } from "zod"; import type { ToolDefinition } from "../tool.ts"; const inputSchema = z.object({ summary: z.string().describe("Summary of work done"), }); const outputSchema = z.object({ message: z.string(), is_error: z.boolean(), }); export const completeTaskTool = { name: "complete_task", description: "Mark the current task as complete with a summary of what was accomplished.", group: "task", terminal: true, inputSchema, outputSchema, execute: async (input, ctx) => ({ message: `Task completed: ${input.summary}`, is_error: false, }), } satisfies ToolDefinition<typeof inputSchema, typeof outputSchema>; ``` **Fields:** | Field | Purpose | |---|---| | `name` | Snake-case identifier; also the CLI subcommand name | | `description` | Used for both the LLM tool definition and CLI help text | | `group` | Groups tools into CLI namespaces (`task`, `file`, `dir`, …) | | `terminal` | If `true`, the agent loop ends when this tool is called (e.g., `complete_task`, `fail_task`, `wait_task`) | | `inputSchema` | Zod schema with `.describe()` per field — becomes JSON Schema for the model and Commander flags for the CLI | | `outputSchema` | Zod schema guaranteeing the shape of the response | | `execute` | The actual implementation, receiving validated input and a `ToolContext` | *** ## ToolContext Every tool receives a `ToolContext`: ```ts interface ToolContext { conn: DbConnection; // short-lived connection, scoped to this tool call dbPath: string; // for long-running tools that manage their own withDb projectDir: string; // absolute path to the project config: Required<BotholomewConfig>; // resolved config (API keys, model, …) mcpxClient: McpxClient | null; // external MCP tools (may be null) } ``` This is the only capability surface. A tool that isn't handed an `mcpxClient` can't reach the network; a tool that doesn't use `conn` or `dbPath` can't touch the database. 
### `conn` vs `dbPath` The executor (`runAgentLoop` / `runChatTurn`) wraps each tool call in `withDb(dbPath, async (conn) => tool.execute(input, { ...ctx, conn }))`. That means: * `ctx.conn` is **already open** for the duration of one `execute()` call and will be closed immediately after. Use it for ordinary tools that do one or two quick queries. * `ctx.dbPath` is for tools that run long enough that holding the file lock would block the worker or CLI (e.g., `context_refresh` re-fetching many URLs). Wrap each DB touch in `await withDb(ctx.dbPath, async (conn) => { … })` so the lock is released between items. DuckDB holds the file lock at the instance level. A tool that hangs on `ctx.conn` through a long network round-trip keeps that lock held. When in doubt, prefer granular `ctx.dbPath` wrapping. *** ## Anthropic adapter `toAnthropicTools()` walks the registry and converts each Zod input schema to the Anthropic SDK's `Tool` type using `z.toJSONSchema()`: ```ts { name: "context_write", description: "Write content to a context item. By default, fails if the (drive, path) already exists — pass on_conflict='overwrite' to replace.", input_schema: { type: "object", properties: { /* derived from Zod */ }, required: ["drive", "path", "content"], } } ``` `context_write` accepts an optional `on_conflict: "error" | "overwrite"` input (default `"error"`). A collision returns `is_error: true`, `error_type: "path_conflict"`, and a `next_action_hint` that steers the model back to `context_read` or a retry with `on_conflict='overwrite'`. `runAgentLoop()` feeds this array into `client.messages.create({ tools: ... })`. When the model emits a `tool_use` block, the loop looks up the tool by name via `getTool(name)`, validates the input against `inputSchema`, calls `execute()`, and returns the result as a `tool_result` block. Terminal tools (the ones with `terminal: true`) tell the loop to stop. 
For workers, those are `complete_task`, `fail_task`, and `wait_task` — any of which transitions the task out of `in_progress`. *** ## CLI adapter `registerToolsAsCLI(program)` iterates the registry and generates a Commander subcommand per tool, grouped by `group`: ```bash botholomew context read disk:/Users/evan/notes/meeting.md --offset 10 --limit 20 botholomew context tree disk:/Users/evan/notes --max-depth 3 botholomew search semantic "quarterly revenue" ``` Positional args and `--options` are derived from the Zod schema shape. The same validation that runs for the LLM runs here, so you get the same error messages. *** ## Registry Tools register themselves on import, so adding a tool is a one-file change: 1. Create `src/tools/<group>/<name>.ts` exporting a `ToolDefinition`. 2. Add `registerTool(myTool);` to `src/tools/registry.ts`. 3. Write a test in `test/tools/<group>/<name>.test.ts`. No central dispatch table to edit, no LLM tool list to update, no CLI command to wire. The Zod schema is the source of truth. *** ## `capabilities_refresh` — the meta-tool The `capabilities`-group tool `capabilities_refresh` exists so the agent can keep its own tool inventory fresh. It walks `getAllTools()` and `mcpxClient.listTools()`, then asks Claude (via `chunker_model`) to produce a **thematic summary** — one line per theme (e.g. "Gmail — read, send, draft, search, and reply to emails") rather than a line per tool. The result is written to `.botholomew/capabilities.md` (preserving frontmatter). Because that file is loaded into every system prompt, the next boot picks up the new inventory without another round-trip. Specific tool names are intentionally absent from the rendered file; the agent uses `mcp_list_tools` / `mcp_search` / `mcp_info` to look them up at call-time. See [persistent-context.md](persistent-context.md#capabilitiesmd--high-level-tool-inventory) for when the agent should call it. 
The matching CLI surface is `botholomew capabilities`, and the slash command is `/capabilities`. *** ## Why Zod for the schema? Zod gives us three things at once: * **Runtime validation.** Untrusted inputs (from the model, from the CLI) are validated before `execute()` runs. A malformed tool call becomes a clear `tool_result` error the model can recover from, not a crash. * **TypeScript inference.** `z.infer<typeof inputSchema>` gives `execute()` a statically-typed `input` parameter. * **JSON Schema export.** `z.toJSONSchema()` produces the schema the Anthropic API needs without a separate definition. The entire adapter layer is ~80 lines (`src/tools/tool.ts`) because Zod does the heavy lifting. --- --- url: 'https://www.botholomew.com/tui.md' --- # The TUI (`botholomew chat`) ![Tour of every tab in the chat TUI](./assets/full-tour.gif) Botholomew ships with an interactive terminal UI for talking to the agent, inspecting its work, and managing the local database. It's built on [Ink 6](https://github.com/vadimdemedes/ink) + React 19 — a real React tree, just rendered to ANSI characters instead of DOM nodes. The TUI is not a thin wrapper around the CLI. It's an 8-tab dashboard that runs against the same DuckDB workers use, so you can watch tasks claim and complete, browse the agent's memory, edit schedules, and monitor workers in real time. *** ## Launching ```bash botholomew chat # new thread botholomew chat --thread-id <id> # resume a previous thread botholomew chat -p "summarize inbox" # one-shot: send prompt, then chat ``` The TUI does not auto-spawn workers — dispatch them explicitly via the CLI (`botholomew worker start --persist`) or have the chat agent call the `spawn_worker` tool. Exiting the TUI prints the thread ID and the exact command to resume it. Thread titles are auto-generated by the LLM from the first user message and updated in the status bar every 5s. *** ## The eight tabs The TUI is organized as eight sibling panels. 
Only one is visible at a time. All panels stay mounted — switching tabs hides them with CSS (`display="none"`) rather than unmounting, so scroll position and filter state survive a round trip. | # | Tab | What it's for | |---|---|---| | 1 | **Chat** | Talk to the agent. Streamed responses, tool-call boxes, slash commands, message queue. | | 2 | **Tools** | Scrollable log of every tool call in the current session, with full input/output. | | 3 | **Context** | Browse the agent's "virtual filesystem" (DuckDB-backed). Preview, search, delete. | | 4 | **Tasks** | Task queue with status + priority filters. View details, payloads, and predecessor outputs. | | 5 | **Threads** | Browse past chat and worker threads. View interactions, delete with confirmation. | | 6 | **Schedules** | Recurring work. Toggle enabled/disabled, delete, inspect last run. | | 7 | **Workers** | Live view of registered workers (running / stopped / dead), pid, mode, heartbeat age. `f` cycles the status filter. | | 8 | **Help** | System info, worker status, keyboard reference. | ### 1. Chat The default view. User and assistant messages render as bubbles, tool calls render as compact boxes beneath the message that triggered them, and the input bar sits at the bottom. Completed messages are printed via Ink's `<Static>` component so they live in real terminal scrollback — you can select and copy them with your terminal's native tools, and they survive tab switches without re-layout. While the agent is streaming, text flushes to the screen on a ~50 ms timer (~20 fps) to keep the terminal from flickering. A spinner marks in-flight tool calls. While the model is assembling a tool-call input (streaming a large JSON args block), a `Preparing tool call: <name>...` spinner is shown so the UI doesn't appear frozen. ### 2. Tools Every tool call the agent has made in the current session, in order. `↑`/`↓` selects a row; the detail pane shows the full input JSON and the full (untruncated) output. 
`Shift+↑`/`↓` or `j`/`k` scroll the detail pane. Tool calls from **MCP** (`mcp_exec`) are displayed as `<server> / <tool>` — e.g. `Linear / CreateIssue` — with the `server` and `tool` fields extracted from the input JSON so the name stays readable. See [tools.md](tools.md) for the underlying `ToolDefinition` pattern and [mcpx.md](mcpx.md) for how MCP tools are merged into the agent's toolset. ### 3. Context Interactive browser for context items (the agent's "virtual filesystem" — see [virtual-filesystem.md](virtual-filesystem.md) and [context-and-search.md](context-and-search.md)). * `↑`/`↓` navigate. * `Enter` picks a drive (at the top level), expands a directory, or previews a file. * `Backspace` goes up one directory; at the root of a drive, it returns to the drive picker. * `/` opens a hybrid (keyword + vector) search across all drives. * `d` deletes the selected item. Markdown files (detected by `mime_type === "text/markdown"` or a `.md` extension on the path) are rendered through `Bun.markdown.ansi` so headers, emphasis, lists, and fenced code blocks show with terminal formatting. Other file types render as plain text. ### 4. Tasks The task queue, with filters for status (pending / in\_progress / completed / failed) and priority. Select a row to see the full task body, its payload, predecessor outputs (for DAG tasks), and the log of attempts. `r` refreshes. See [tasks-and-schedules.md](tasks-and-schedules.md). ### 5. Threads Every thread ever persisted to the project DB, with a type filter (chat vs. worker). Threads store the full interaction history (messages, tool calls, tool results) — the same data the agent uses to reconstruct context on resume. `d` deletes a thread, with a yes/no confirmation. You can't delete the thread you're currently attached to. ### 6. Schedules Recurring tasks. Toggle `e`nabled, `d`elete, or `r`efresh. 
Schedules are evaluated by an LLM pass during the tick loop against natural-language rules like "every weekday at 9am" — see [tasks-and-schedules.md](tasks-and-schedules.md). ### 7. Workers Live view of every worker registered against this project (status filter cycles with `f`: all → running → stopped → dead → all). Each row shows status, short id, mode, and heartbeat age. The detail pane has full id, pid, hostname, started time, heartbeat time, stopped time (if any), pinned task id (if any), and the per-worker log path. Press `l` to swap the detail pane into a **log view** that tails the selected worker's log file (`.botholomew/logs/<id>.log`). The log auto-refreshes every ~1.5 s and follows the bottom by default — scroll up with `Shift+↑`, `k`, or `K` to pause following; `G` (or scrolling back to the bottom) resumes it. Press `l` again to return to the detail view. Foreground workers (`worker run`) have no log file, so the log view shows an empty-state message instead. The panel polls the DB every ~3s. Workers heartbeat every `worker_heartbeat_interval_seconds` (default 15s); ones older than `worker_dead_after_seconds` (default 60s) get flagged `dead` by a peer's reaper. Start new workers from the CLI (`botholomew worker start --persist`) or have the chat agent call `spawn_worker`. ### 8. Help Project directory, active thread ID, worker status summary, and the full keyboard reference. *** ## The input bar The bar at the bottom of the Chat tab is a custom multi-line input (not `ink-text-input`). It supports: * **Multi-line editing** — `⌥+Enter` (Alt+Enter) inserts a newline; plain `Enter` submits. * **History** — `↑`/`↓` walks through previously submitted messages. Works across skills and plain messages. * **Slash autocomplete** — type `/` at the start of the line to open a popup (see below). * **Blinking cursor** — visual cue for focus. 
* **Stable input handlers** — keypresses are handled via `useRef`-stable callbacks, so Ink's `useInput` doesn't re-register stdin listeners on every render. Historically this was the difference between a smooth typing experience and one that pegged a CPU core under fast input. ### Slash-command popup Typing `/` with nothing before it opens the autocomplete popup. | Key | Action | |---|---| | `↑` / `↓` | Move the highlight | | `Return` | Submit the highlighted command if it takes no arguments; otherwise insert `/<name> ` so you can type args | | `Tab` | Insert the highlighted completion as `/<name> ` without submitting (lets you edit before sending) | | `Esc` | Close the popup (keeps what you typed) | Built-in commands are `/help`, `/skills`, `/clear`, and `/exit`. `/clear` ends the current chat thread (persisted, still resumable via `botholomew chat --thread-id <id>`) and starts a fresh one on the same session, so you can reset context without losing the conversation. Every file in `.botholomew/skills/` is also surfaced in the popup with its description. See [skills.md](skills.md) for the file format and how skills are invoked with positional arguments. Skills that reference `$1` / `$ARGUMENTS` (or declare `arguments` in frontmatter) are treated as argument-taking: `Return` inserts `/<name> ` and waits for your input. Skills without placeholders, like the built-ins, submit in a single `Return`. *** ## The message queue You can keep typing while the agent is working. Each submitted message is appended to a queue that drains sequentially — when the current turn finishes, the next queued message is sent automatically. 
On the Chat tab, when at least one message is queued: | Key | Action | |---|---| | `Ctrl+J` | Select the next queued message | | `Ctrl+K` | Select the previous queued message | | `Ctrl+E` | Edit the selected message (moves it back into the input bar) | | `Ctrl+X` | Delete the selected message from the queue | The queue is ephemeral (in-memory, not persisted) — it's a way to batch follow-ups without interrupting a tool loop mid-flight. *** ## Tool-call visualization Every tool call the agent makes renders as a small box under the assistant message that triggered it: ``` ⟳ Linear / CreateIssue (exec) ({"team":"..."}) ✔ Linear / CreateIssue (exec) ({"team":"..."}) → {"id":"...","url":"https://..."} ✘ Linear / CreateIssue (exec) ({"team":"..."}) → {"is_error":true,"error":"Team not found..."} ``` | Marker | State | |---|---| | `⟳` | Running | | `✔` | Succeeded | | `✘` | Errored (`is_error: true` in the tool result) | The input preview is truncated to 60 chars and the output preview to 120 — head over to the **Tools** tab for the full payload. When a tool returns more than `MAX_INLINE_CHARS` (see `src/worker/large-results.ts`), Botholomew routes the full payload through the large-results cache and shows a stub instead: ``` ✔ context_read ({"path":"big.md"}) ⚡ Paginated for LLM [42K, 8pg] ``` The agent sees paged access to the result via dedicated tools; the TUI just shows the summary to keep the chat view compact. 
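The marker-and-truncation rendering above can be sketched as follows. The 60/120-character limits come from the text; the function and constant names are illustrative, not the TUI source:

```typescript
// Render one tool-call line for the chat view: a state marker, a truncated
// input preview, and (once the call finishes) a truncated output preview.
const MAX_INPUT_PREVIEW = 60;
const MAX_OUTPUT_PREVIEW = 120;

const truncate = (s: string, max: number): string =>
  s.length > max ? s.slice(0, max) + "…" : s;

function toolCallLine(
  name: string,
  inputJson: string,
  result?: { outputJson: string; isError: boolean },
): string {
  const marker = result === undefined ? "⟳" : result.isError ? "✘" : "✔";
  let line = `${marker} ${name} (${truncate(inputJson, MAX_INPUT_PREVIEW)})`;
  if (result !== undefined) {
    line += ` → ${truncate(result.outputJson, MAX_OUTPUT_PREVIEW)}`;
  }
  return line;
}
```

A large result would short-circuit before this point — the full payload goes to the large-results cache and the line shows the `⚡ Paginated for LLM` stub instead of an output preview.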
*** ## Keyboard reference (consolidated) ### Global (any tab) | Key | Action | |---|---| | `Tab` | Cycle to the next tab | | `1`–`8` | Jump to tab N (not on Chat — the Chat input consumes digits) | | `Esc` | Return to Chat from any other tab | | `Ctrl+C` | Exit the TUI | ### Chat tab | Key | Action | |---|---| | `Enter` | Send / queue message | | `⌥+Enter` | Insert newline | | `↑` / `↓` | Browse input history | | `/` | Open slash-command popup | | `Return` | Run highlighted command (popup open, no-arg) / insert `/<name> ` if args needed | | `Tab` | Insert highlighted command as `/<name> ` without submitting | | `Esc` | Close popup | | `Ctrl+J` / `Ctrl+K` | Select queued message | | `Ctrl+E` | Edit queued message | | `Ctrl+X` | Delete queued message | ### List panels (Tools / Tasks / Threads / Schedules) | Key | Action | |---|---| | `↑` / `↓` | Move selection | | `Shift+↑` / `Shift+↓` | Scroll detail pane | | `j` / `k` | Scroll detail pane (alternate) | | `f` | Cycle filter (status, type, enabled — per panel) | | `p` | Cycle priority filter (Tasks only) | | `r` | Refresh from DB | | `d` | Delete with confirmation (Threads, Schedules, Context) | | `e` | Toggle enable/disable (Schedules only) | ### Workers tab | Key | Action | |---|---| | `↑` / `↓` | Select worker | | `f` | Cycle status filter (all → running → stopped → dead) | | `l` | Toggle between detail and log-tail view | | `Shift+↑` / `Shift+↓` | Scroll log up/down (log view) | | `j` / `k` | Scroll log down/up by one line (log view) | | `J` / `K` | Page scroll the log (log view) | | `g` / `G` | Jump to top / bottom of log (log view, `G` resumes follow) | ### Context tab | Key | Action | |---|---| | `↑` / `↓` | Navigate | | `Enter` | Expand directory / preview file | | `Backspace` | Go up one directory | | `/` | Search | | `d` | Delete selected item | *** ## Theming The TUI detects whether your terminal has a dark or light background and picks colors accordingly. Detection order: 1. 
`COLORFGBG` environment variable (set by Terminal.app, iTerm2, xterm). 2. On macOS, `defaults read -g AppleInterfaceStyle`. 3. Fallback: dark. All colors and ANSI codes live in `src/tui/theme.ts`, so one import is the single source of truth for hue choices across every panel. *** ## Architecture notes A few choices worth knowing if you're reading or modifying the TUI: * **Streaming is throttled.** `App.tsx` flushes `streamingText` every 50 ms max during a response, not per-token. Per-token flushing caused visible flicker and ~30× the React commits. * **Scroll state lives at the root.** Each list panel (`TaskPanel`, `ThreadPanel`, etc.) keeps its scroll offset lifted up so that switching tabs doesn't reset your position. * **All tabs stay mounted.** Inactive panels are hidden with `display="none"` instead of being unmounted. Remounting the Context/Tasks panels would re-query DuckDB and lose filter state. * **Completed messages render via `<Static>`.** They're written to terminal scrollback once and never re-rendered — essential for performance in long sessions, and it means the chat history survives in your terminal buffer after you exit. * **Input handlers are ref-stable.** `InputBar` and `App` both install a single `useInput` handler wrapped in `useCallback` with `useRef`-backed state reads. A prior bug caused 100 % CPU under fast typing because every keystroke re-registered the stdin listener. * **Kitty keyboard protocol.** The TUI enables Kitty's `disambiguateEscapeCodes` flag when starting Ink so modifiers (`Shift+↑`, `⌥+Enter`, etc.) are distinguishable from plain arrow / Enter presses on supporting terminals. *** ## Troubleshooting * **Colors look off / no colors at all.** Your terminal may not be reporting `COLORFGBG`. Export it manually (`export COLORFGBG="15;0"` for light, `"15;16"` for dark) or rely on the macOS fallback. * **No workers running.** The TUI does not auto-spawn workers — check `botholomew worker list --status running`. 
If nothing is alive, start one with `botholomew worker start --persist` or have the chat agent call `spawn_worker`. * **Weird layout in tmux / split panes.** Ink needs a stable terminal width; if the pane resizes mid-render, large tool-call boxes can wrap oddly. A fresh `Ctrl+L` usually sorts it. * **`⌥+Enter` inserts the literal character `¬` or similar.** Your terminal is sending Option as Meta. Enable "Use Option as Meta key" in your terminal profile (Terminal.app, iTerm2, Ghostty all support this). *** ## Related docs * [Skills (slash commands)](skills.md) — the `/<name>` commands the popup surfaces. * [Architecture](architecture.md) — how the TUI, workers, and CLI share one DuckDB. * [Tasks & schedules](tasks-and-schedules.md) — what the Tasks and Schedules tabs are actually managing. * [Context & hybrid search](context-and-search.md) — backs the Context tab's search. * [Virtual filesystem](virtual-filesystem.md) — the `context_*` tools visible in the Tools tab. * [MCPX](mcpx.md) — how `mcp_exec` calls get routed to external servers. --- --- url: 'https://www.botholomew.com/virtual-filesystem.md' --- # The virtual filesystem Botholomew's agent has no access to your real filesystem. Every piece of content the agent can touch lives in the `context_items` table as a row identified by a `(drive, path)` pair. When the agent calls `context_read({ drive: "disk", path: "/Users/evan/notes/meeting.md" })`, it is **not** opening that file on disk — it's reading the row that was ingested from it. This is deliberate, and it's the single most important safety property of the system: * **Safety.** The agent cannot read your home directory, cannot overwrite your SSH keys, cannot `rm -rf` anything, cannot exfiltrate files it wasn't handed. A prompt-injected instruction telling it to "read `~/.ssh/id_rsa`" has nothing to act on — that path doesn't exist in its world unless you ingested it. 
The worst a rogue agent can do is corrupt rows inside `.botholomew/data.duckdb`, which you can recover from a backup of a single file. * **Portability.** The entire "filesystem" is a single DuckDB file you can copy, share, or back up. * **Searchability.** Every "file" is already indexed, chunked, embedded, and queryable. * **History.** Everything the agent writes is recorded in `threads`/`interactions`, so you can audit every change. *** ## Drives Every context item lives under a **drive**. The drive names the origin of the content; the path is whatever that origin natively uses. | Drive | Path shape | Example ref | |---|---|---| | `disk` | absolute filesystem path | `disk:/Users/evan/notes/meeting.md` | | `url` | full URL (with scheme) | `url:/https://example.com/post` | | `agent` | arbitrary agent-chosen path | `agent:/notes/scratch.md` | | `google-docs` | Google Docs document id | `google-docs:/1AbCDEFGhij` | | `github` | `/<owner>/<repo>/<rest>` | `github:/evantahler/botholomew/README.md` | The `drive:/path` string form is the display and CLI convention. Internally, `context_items` has two columns — `drive TEXT` and `path TEXT` — with a `UNIQUE(drive, path)` index. That index is the identity key: an ingest that hits an existing `(drive, path)` is a refresh, never a duplicate. New drives (additional MCP services) can be added by teaching `src/context/drives.ts:detectDriveFromUrl` to recognize their URLs and extract the right path shape. ### The `agent` drive Content written by the agent itself (via `context_write`) defaults to the `agent` drive. It has no external origin, so it's never a candidate for `context_refresh`. *** ## The mapping | Filesystem concept | DuckDB representation | |---|---| | Identity | `(context_items.drive, context_items.path)` — unique together | | Display form | `drive:/path` (e.g. 
`disk:/Users/x/foo.md`) | | File contents | `context_items.content` (TEXT) or `content_blob` (BLOB) | | MIME type | `context_items.mime_type` | | Directory | A row with `mime_type = 'inode/directory'` | | Directory listing | Items filtered by `drive` and a path prefix, with intermediate directory segments derived from the matching paths | | Binary file | `is_textual = false`, content in `content_blob` | | Ingestion time | `indexed_at`, `created_at`, `updated_at` | *** ## The agent's tools All tools that operate on context items take `(drive, path)` together. For `context_read`, `context_info`, and `context_exists`, `path` can also be a bare UUID or a `drive:/path` string — in those cases `drive` is ignored. **Discovery:** | Tool | What it does | |---|---| | `context_list_drives` | List every drive that has content, with counts — a good first call when you don't know what's ingested | | `context_tree` | With no `drive`: list drives. With a drive: render a tree of that drive — the agent's bird's-eye view | **Directory operations:** | Tool | What it does | |---|---| | `context_create_dir` | Create a directory placeholder row (defaults to `drive: "agent"`) | | `context_dir_size` | Sum `length(content)` for items under a drive/prefix | **File operations:** | Tool | What it does | |---|---| | `context_read` | Read an item's content; slice by line (`offset`/`limit`) | | `context_write` | Upsert a row, trigger re-chunk + re-embed, return a tree snapshot (defaults to `drive: "agent"`) | | `context_edit` | Apply git-style line-range patches | | `context_delete` | Remove by (drive, path) or recursively by prefix | | `context_copy` | Duplicate a row to a new (drive, path) | | `context_move` | Rename or relocate a row — can move between drives | | `context_info` | Return metadata (size, lines, mime, indexed\_at, drive, path, ref) | | `context_exists` | (drive, path) existence check | | `context_count_lines` | Count `\n` in content | These are also exposed from the host CLI: 
```bash botholomew context add ~/notes/meeting.md # ingests as disk:/Users/.../meeting.md botholomew context add https://github.com/evantahler/botholomew/blob/main/README.md # ingests as github:/evantahler/botholomew/README.md botholomew context list botholomew context read disk:/Users/evan/notes/meeting.md botholomew context tree disk:/Users/evan/notes ``` *** ## Structured errors from `context_read` / `context_info` When the agent passes a path that doesn't resolve, these tools return a structured `is_error: true` response (they do **not** throw) so the model can recover inside the same tool loop: ```json { "is_error": true, "error_type": "not_found", "message": "No context item at disk:/Users/evan/notes/architecture.md", "next_action_hint": "Nearby items under disk:/Users/evan/notes: disk:/Users/evan/notes/readme.md, disk:/Users/evan/notes/guide.md. Call context_tree({drive:\"disk\",path:\"/Users/evan/notes\"}) to see more." } ``` The hint is built from `findNearbyContextPaths` — up to five siblings of the requested path's parent directory within the same drive, walking up until it finds a populated ancestor. `context_read` also returns `error_type: "no_text_content"` when the target exists but is binary (e.g. an image row). *** ## Patch format for `context_edit` ```ts { start_line: number, end_line: number, content: string } ``` * `start_line` / `end_line` are 1-based inclusive. * `end_line: 0` means **insert** without replacing. * `content: ""` means **delete** the line range. * Patches are applied bottom-up (descending `start_line`) so earlier line numbers remain stable. *** ## Embedding cascade Every mutation cascades into the embeddings table: * `context_write` → delete old chunks, re-chunk, re-embed, insert. * `context_edit` → same. * `context_move` → no embedding changes (embeddings reference the item id, not the path). * `context_delete` → cascade delete embedding rows. 
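To make the `context_edit` patch semantics described above concrete, here is a minimal TypeScript sketch of bottom-up patch application. The names (`Patch`, `applyPatches`) and the choice to insert *before* `start_line` when `end_line` is 0 are illustrative assumptions, not the actual `src/` implementation:

```typescript
// Hypothetical sketch of context_edit's patch application.
// Assumption: end_line 0 inserts before start_line (the doc only says
// "insert without replacing"); everything else follows the stated rules.
type Patch = { start_line: number; end_line: number; content: string };

function applyPatches(text: string, patches: Patch[]): string {
  const lines = text.split("\n");
  // Apply bottom-up (descending start_line) so earlier line numbers stay stable.
  const ordered = [...patches].sort((a, b) => b.start_line - a.start_line);
  for (const p of ordered) {
    // content "" means delete the range, so it contributes zero lines.
    const replacement = p.content === "" ? [] : p.content.split("\n");
    if (p.end_line === 0) {
      // Insert before start_line without removing anything.
      lines.splice(p.start_line - 1, 0, ...replacement);
    } else {
      // 1-based inclusive replace of start_line..end_line.
      lines.splice(p.start_line - 1, p.end_line - p.start_line + 1, ...replacement);
    }
  }
  return lines.join("\n");
}
```

Because patches land in descending `start_line` order, a patch low in the file never shifts the line numbers that a patch higher up refers to.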
Embeddings are stored as `FLOAT[384]` and queried by linear scan via `array_cosine_distance()` — no HNSW index, no VSS extension. The FTS index over `chunk_content` and `title` is rebuilt by `rebuildSearchIndex()` after every ingest write. See [context-and-search.md](context-and-search.md) for the full pipeline. *** ## Why not just use files on disk? A real filesystem would require: * path escaping, sandboxing, symlink resolution; * a separate indexer that must stay consistent with the files; * backup/versioning/synchronization logic. A DuckDB row is already all of those things at once — transactional, searchable, and backed by a single file you can `cp` or open with `sqlite3` (well, `duckdb`). And the biggest reason: **safety**. A filesystem abstraction that happens to be a database is a filesystem the agent cannot escape. There is no `..`, no symlink, no `/etc/passwd` — just `(drive, path)` columns with a `UNIQUE` constraint. If you're comfortable letting a model make decisions on your behalf but not comfortable letting it touch your disk, that's exactly the trade Botholomew makes.
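Returning to the search mechanics from the embedding section above: the linear scan is easy to picture because cosine distance itself is tiny. A plain-TypeScript sketch of what `array_cosine_distance()` computes per row (illustrative only; the real query runs inside DuckDB over `FLOAT[384]` columns, and `nearest` here is a hypothetical helper, not a Botholomew API):

```typescript
// Cosine distance = 1 - cosine similarity; this is the per-row computation
// behind DuckDB's array_cosine_distance() in the linear scan.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Linear scan: score every chunk against the query embedding, keep the top k.
function nearest(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k: number,
): { id: string; distance: number }[] {
  return chunks
    .map((c) => ({ id: c.id, distance: cosineDistance(query, c.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```

At the scale of a personal context store, a full scan like this is fast enough that skipping an approximate-nearest-neighbor index is a plausible simplicity trade, which matches the doc's "no HNSW index, no VSS extension" stance.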