Architecture
Botholomew consists of three cooperating process roles that share a single DuckDB database:
- **Workers** — short-lived or long-running Bun processes that claim tasks from the queue, evaluate schedules, and run LLM tool loops. Multiple workers can run at once; each registers itself in the DB and heartbeats so dead ones are reaped.
- **The chat TUI** — an Ink/React terminal UI you run on demand; it enqueues tasks, browses history, and can dispatch workers via the `spawn_worker` tool.
- **The CLI** — everything else (`task add`, `schedule list`, `context search`, `worker list`, …). Each invocation opens its own DuckDB connection.
All share `.botholomew/data.duckdb`. DuckDB holds the file lock at the instance level (not the connection), so no process holds a DB connection longer than a single logical operation. Each CRUD call runs inside a short-lived `withDb(dbPath, fn)` from `src/db/connection.ts`, which acquires a connection, executes, and releases the instance when the last overlapping caller in the process is done. `withRetry` wraps the acquire path and retries with exponential backoff if another process is holding the lock.
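The retry behavior can be pictured with a small sketch. This `withRetry` is an illustrative stand-in for the helper in `src/db/connection.ts`; the attempt count and delays are assumptions, not the project's actual defaults, and the real helper retries only on lock-contention errors rather than on everything:

```typescript
// Hypothetical sketch of the acquire-path retry: try the operation,
// and on failure back off exponentially before trying again.
async function withRetry<T>(
  fn: () => Promise<T>,
  { attempts = 5, baseMs = 50 }: { attempts?: number; baseMs?: number } = {},
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // 50ms, 100ms, 200ms, ... between attempts (skip the final sleep)
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```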
Safety note. None of these processes give the agent direct access to your machine. Workers are the only things executing LLM tool calls, and the only tools they see are the ones registered in src/tools/ (all operating inside .botholomew/) plus whichever MCP servers you explicitly configured. There is no "just read the file system" escape hatch. See the virtual filesystem doc for the full argument.
The worker tick
A worker executes one tick() per cycle. In --once mode (the default), a single tick runs and then the worker exits. In --persist mode, the worker loops over ticks until it receives SIGTERM/SIGINT.
```
tick() ─┐
        ├─► reset stale in_progress tasks (claimed > 3× max_tick_duration)
        ├─► processSchedules(workerId)  — atomically claim each due
        │                                 schedule, ask the LLM which are
        │                                 "due", enqueue their tasks
        ├─► claimNextTask(workerId)     — highest-priority unblocked pending
        │                                 task; worker id is stamped on the
        │                                 `claimed_by` column
        ├─► createThread("worker_tick") — one thread per tick for logging
        ├─► buildSystemPrompt()         — always-context + task-relevant
        │                                 context
        ├─► runAgentLoop()              — multi-turn Anthropic tool-use loop;
        │                                 every message, thinking block, tool
        │                                 call, and tool result is logged as
        │                                 an `interaction` row
        ├─► updateTaskStatus()          — complete / failed / waiting
        └─► endThread()
```

If no task is claimable and no schedule is due, `tick()` returns `false`. A `--persist` worker then sleeps `tick_interval_seconds` before trying again; a `--once` worker exits immediately.
See src/worker/tick.ts.
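The `--persist` control flow can be sketched as a plain loop. The names below (`persistLoop`, `shouldStop`, the millisecond option) are illustrative, not the real `src/worker/tick.ts` signatures:

```typescript
// Illustrative shape of the persist loop: run ticks until asked to stop,
// sleeping between ticks only when a tick found no work.
type Tick = () => Promise<boolean>; // resolves true when the tick did work

async function persistLoop(
  tick: Tick,
  opts: { tickIntervalMs: number; shouldStop: () => boolean },
): Promise<number> {
  let ticks = 0;
  while (!opts.shouldStop()) {
    ticks++;
    const didWork = await tick();
    if (!didWork) {
      // no claimable task and no due schedule: sleep before retrying
      await new Promise((r) => setTimeout(r, opts.tickIntervalMs));
    }
  }
  return ticks;
}
```

A `--once` worker is the degenerate case: one `tick()`, then exit.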
Log format
Worker logs prefix every line with a local HH:MM:SS timestamp. Lifecycle phases render as [[phase-name]] in bold magenta so they're easy to scan and grep (grep '\[\[' .botholomew/logs/<id>.log). Phases emitted each tick:
- `[[tick-start]] #N`
- `[[evaluating-schedules]]` (only when any are enabled)
- `[[claiming-task]]`
- `[[tick-end]] #N Xs didWork=true|false`
- `[[sleeping]] Ns` (only when there was no work in a persist worker)
Background workers (spawned without a TTY) also mirror the conversation thread to the log between [[claiming-task]] and Task ... -> complete, so a tail -f shows what the LLM is actually doing:
- `[[assistant]] <full text response>` — assistant message blocks
- `[[tool-call]] <tool> <truncated JSON input>` — each tool invocation
- `[[tool-result]] <tool> ok|err in Ns` — tool outcome and duration
Full content (untruncated input, tool output, tokens) stays in the interactions table; the log mirrors enough to follow the trace without opening the DB. Foreground workers (worker run) keep their existing streaming UX (per-token output and ▶/✓ markers) — these phase lines are suppressed there to avoid duplication.
Registration, heartbeat, reaping
Every worker writes a row into the workers table on start (registerWorker in src/db/workers.ts) with its id (uuidv7), pid, hostname, mode, optional pinned task id, optional log_path, and status='running'. Detached workers (spawned via worker start or spawn_worker) get a per-worker log file at .botholomew/logs/<worker-id>.log — the spawn parent generates the id and opens that file before launching the child, so the path is recorded on the row from registration onward. Foreground workers (worker run) have log_path = null and write to stdout instead.
From that moment, a non-blocking setInterval in src/worker/heartbeat.ts bumps last_heartbeat_at every worker_heartbeat_interval_seconds (default 15s) — independent of the tick loop, so a worker mid-LLM-call still heartbeats reliably.
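The decoupling is just an independent timer. A minimal sketch, with names assumed rather than taken from `src/worker/heartbeat.ts`:

```typescript
// Sketch of a heartbeat decoupled from the tick loop: an interval timer
// bumps the heartbeat on its own cadence, so a long-running tick
// (e.g. an LLM call in progress) does not delay the next beat.
function startHeartbeat(
  bump: () => void, // in the real code: a short withDb that UPDATEs last_heartbeat_at
  intervalMs: number,
): () => void {
  const timer = setInterval(bump, intervalMs);
  // return a stop function for clean shutdown
  return () => clearInterval(timer);
}
```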
Persist workers also run a reaper interval (worker_reap_interval_seconds, default 30s) that does two things:
- Flips any worker whose heartbeat is older than `worker_dead_after_seconds` (default 60s) to `status='dead'` and releases every task and schedule claim it held. This is the failure-recovery path: anything from a terminal crash to a `kill -9` ends with the work reclaimable by another worker.
- Deletes cleanly-stopped workers whose `stopped_at` is older than `worker_stopped_retention_seconds` (default 3600s). Dead workers are kept as forensic evidence; only the clean exits get auto-pruned so the `workers` table doesn't grow unbounded.
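The two rules reduce to a pure classification over worker rows. The row shape below is illustrative, not the actual `workers` table schema:

```typescript
// Pure sketch of the reaper's two rules: mark stale-heartbeat runners dead,
// prune old clean exits, leave dead rows alone as forensic evidence.
interface WorkerRow {
  id: string;
  status: "running" | "dead" | "stopped";
  lastHeartbeatAt: number;  // epoch ms of last heartbeat
  stoppedAt: number | null; // epoch ms, set on clean exit
}

function reap(
  rows: WorkerRow[],
  now: number,
  deadAfterMs: number,        // worker_dead_after_seconds, in ms
  stoppedRetentionMs: number, // worker_stopped_retention_seconds, in ms
): { markDead: string[]; prune: string[] } {
  const markDead = rows
    .filter((w) => w.status === "running" && now - w.lastHeartbeatAt > deadAfterMs)
    .map((w) => w.id);
  const prune = rows
    .filter((w) => w.status === "stopped" && w.stoppedAt !== null
      && now - w.stoppedAt > stoppedRetentionMs)
    .map((w) => w.id);
  return { markDead, prune };
}
```

In the real reaper, marking a worker dead also releases its task and schedule claims.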
The chat TUI
botholomew chat is a separate agent with its own system prompt and tool set — it does not execute long-running work itself. Instead, it:
- answers questions about tasks, threads, and context,
- creates tasks (via `create_task`) that workers will pick up,
- spawns workers on demand (via `spawn_worker`) when the user wants work run right now,
- reads worker activity (`list_threads`, `view_thread`),
- looks up context items by path or UUID (`context_info`, `context_search`) and can refresh them in place (`context_refresh`),
- invokes skills (`/review`, `/standup`, …) defined in `.botholomew/skills/`,
- edits `beliefs.md` and `goals.md` via `update_beliefs`/`update_goals`.
It uses Anthropic's streaming API so tokens render in the TUI as they arrive. Every session is itself a chat_session thread with the same interaction log as a worker tick.
See src/chat/ and src/tui/.
Why two agents?
A single-agent design would force the chat loop to wait on whatever the user asked — "summarize this 200-page PDF" blocks the UI for minutes. The split:
- Chat is fast, streaming, and interactive. It understands the world via the database but doesn't touch it much.
- Workers are slow, autonomous, and batch-oriented. Each tick can take as long as it needs.
Both speak to the same database, so a worker's results are immediately visible to the chat agent — and the chat agent can dispatch workers without blocking.
Automation without a resident daemon
Earlier versions of Botholomew shipped an OS-level watchdog (launchd on macOS, systemd on Linux) to keep a single daemon alive. That's been replaced: users now run workers directly, and there is no installed background service. See automation.md for cron-based recipes and optional launchd/systemd examples if you want Botholomew to advance on its own.
Thread logging
Every interaction is persisted. A thread is one tick or one chat session; an interaction is a single event within it (user message, assistant message, tool call, tool result, thinking block, status change).
This gives Botholomew total observability without a separate tracing stack — botholomew thread view <id> reads the same rows that produced the work. Threads are also the chat agent's way of reporting on what workers have been doing.
Schema lives in src/db/sql/2-logging_tables.sql; thread types are worker_tick and chat_session.
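As a rough mental model of that schema, the types below are a hypothetical TypeScript mirror; the authoritative definition is the SQL file, which may differ:

```typescript
// Hypothetical mirror of the logging model: one thread per tick or
// chat session, many interactions per thread.
type ThreadType = "worker_tick" | "chat_session";
type InteractionKind =
  | "user_message" | "assistant_message" | "tool_call"
  | "tool_result" | "thinking" | "status_change";

interface Interaction {
  threadId: string;
  kind: InteractionKind;
  content: string;
  at: number; // epoch ms
}

const interactions: Interaction[] = [];

// In the real code each write is its own short withDb call; here we
// just append to an in-memory array to show the shape.
function logInteraction(threadId: string, kind: InteractionKind, content: string): void {
  interactions.push({ threadId, kind, content, at: Date.now() });
}
```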
Connection model
Every process uses the same policy: open a DuckDB connection for one logical operation, then close it.
- Workers: `tick()` takes a `dbPath`, not a held connection. Each call into `src/db/*` is wrapped in `withDb` — stale-task reset, task claim, thread create, every `logInteraction`, the status update. The LLM network round-trip holds no connection.
- Heartbeat: a separate `setInterval` opens its own short `withDb` every ~15s (`src/worker/heartbeat.ts`). This is deliberately decoupled from the tick loop so a long LLM call doesn't stall the heartbeat.
- Chat: `ChatSession` carries `dbPath`. Each write (user message log, tool-use log, tool-result log, title update, thread end) is its own `withDb`. Tool execution wraps each call in `withDb` so `ctx.conn` is scoped to that tool call only.
- CLI invocations: `withDb` in `src/commands/with-db.ts` opens a connection for the command, applies migrations, and closes when the callback returns.
- TUI panels: take `dbPath`, not `conn`, and wrap each refresh poll in `withDb`.
DuckDB's file lock is process-wide and held by the instance, not individual connections. Within one process we refcount a shared instance so overlapping withDb calls (e.g., parallel tool execution via Promise.all, or the heartbeat firing alongside a tick) don't trip DuckDB's "don't open the same DB twice" rule; when the last caller in the process releases, we close the instance and free the OS-level lock so another process can claim it.
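The refcounting can be sketched in a few lines. This is a simplified model of the idea, not the actual `src/db/connection.ts` code; the "instance" here is a plain object standing in for a DuckDB instance handle:

```typescript
// Minimal sketch of per-process instance refcounting: overlapping calls
// share one instance; the last release closes it, freeing the file lock
// so another process can claim it.
let instance: { open: boolean } | null = null;
let refs = 0;

function acquire(): { open: boolean } {
  if (!instance) instance = { open: true }; // first caller opens the DB
  refs++;
  return instance;
}

function release(): void {
  refs--;
  if (refs === 0 && instance) {
    instance.open = false; // last caller closes, releasing the OS-level lock
    instance = null;
  }
}

async function withDbSketch<T>(fn: (db: { open: boolean }) => Promise<T>): Promise<T> {
  const db = acquire();
  try {
    return await fn(db);
  } finally {
    release();
  }
}
```

Because JavaScript runs the acquire/release bookkeeping on one thread, overlapping `Promise.all` callers see the same instance without extra locking inside the process.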
Vector search uses array_cosine_distance() (core DuckDB, no extension) over a linear scan of the embeddings.embedding column; the FTS extension (INSTALL fts; LOAD fts;) is loaded at connect time for BM25 keyword search. See src/db/connection.ts.
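For intuition, here is what that linear scan computes, written out in plain TypeScript (cosine distance is 1 minus cosine similarity); this is a back-of-envelope model, not the SQL DuckDB actually runs:

```typescript
// Cosine distance between two equal-length vectors, as
// array_cosine_distance() computes it: 1 - (a·b / (|a||b|)).
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Linear scan over stored embeddings, keeping the k nearest rows.
function nearest(
  query: number[],
  rows: { id: string; embedding: number[] }[],
  k: number,
): { id: string; dist: number }[] {
  return rows
    .map((r) => ({ id: r.id, dist: cosineDistance(query, r.embedding) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k);
}
```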
Multi-worker safety
Any number of workers can run against the same project concurrently (spawned by CLI, the chat tool, cron, or --persist). Concurrency is handled at the DB level:
- Task claim — `claimNextTask(conn, workerId)` issues an atomic `UPDATE tasks SET status='in_progress', claimed_by=?1 WHERE id=?2 AND status='pending' RETURNING *`. If another worker claimed the row first, `RETURNING` comes back empty and the loop tries the next candidate.
- Schedule claim — `claimSchedule(conn, id, workerId, opts)` is the same atomic UPDATE pattern, gated by both a `schedule_claim_stale_seconds` (default 300s) window on the existing claim and a `schedule_min_interval_seconds` (default 60s) window on `last_run_at`. Only one worker per schedule per window evaluates and enqueues tasks.
- Stale release — If a worker crashes mid-task, its claim is released when the reaper flips its row to `dead`. Existing `claim_at` staleness also catches tasks claimed for longer than 3× the tick duration, independent of the worker's heartbeat.
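The claim UPDATE is a compare-and-set: the status guard and the write happen as one step, so only one claimer can win a given row. An in-memory model of those semantics (the `Task` shape is illustrative):

```typescript
// Mirrors UPDATE ... WHERE status='pending' RETURNING *: the guard and
// the mutation are a single atomic step, so a second claimer sees an
// empty RETURNING and moves on to the next candidate.
interface Task {
  id: string;
  status: "pending" | "in_progress";
  claimedBy?: string;
}

function tryClaim(task: Task, workerId: string): boolean {
  if (task.status !== "pending") return false; // another worker won the race
  task.status = "in_progress";
  task.claimedBy = workerId;
  return true;
}
```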
Nuke: bulk database resets
During development and when reusing a project, you often want to wipe part of the database without blowing away the whole .botholomew/ directory (which would also erase soul.md, beliefs.md, goals.md, config.json, and your skills). botholomew nuke covers that:
| Scope | Clears |
|---|---|
| `nuke context` | `context_items`, `embeddings` |
| `nuke tasks` | `tasks` |
| `nuke schedules` | `schedules` |
| `nuke threads` | `threads`, `interactions` (both worker ticks and chat sessions) |
| `nuke all` | everything above plus `daemon_state` |
Each subcommand requires -y/--yes to actually delete — running without the flag prints per-table row counts and exits, so it doubles as a dry run. Nothing on disk (soul, beliefs, goals, config, skills) is ever touched.
For safety, nuke refuses to run while any worker is in status='running' — stop them first with botholomew worker stop <id>. The schema itself (tables, _migrations) is always preserved.
See src/commands/nuke.ts.
DB doctor: detect and repair index corruption
Under rare circumstances — typically after a hard crash or interrupted write — DuckDB's primary-key index can fall out of sync with the row data. The symptom is that UPDATE/DELETE against the affected rows fails with Invalid Input Error: Failed to delete all rows from index. Inside Bun, that FATAL error unwinds past the NAPI boundary as a C++ exception, surfacing as panic: A C++ exception occurred from Zig__GlobalObject__onCrash. The CLI command, the worker tick loop, and anything else that touches a corrupted row die immediately.
botholomew db doctor exists to detect and recover from this:
| Mode | What it does |
|---|---|
| `db doctor` (default) | For each user table, spawns a child Bun process that runs a self-update touching the PK index. Reports ok / empty / missing / corrupt per table. The child-process isolation is essential — a panic in the probe stays out of the doctor itself. |
| `db doctor --repair` | Refuses if any worker is actually running (PID alive). Stale `status='running'` rows whose PIDs are dead — the case that tends to coexist with workers-table corruption — are warned about but do not block repair, because flipping them to stopped would just trip the same corruption. Runs `CHECKPOINT`, `EXPORT DATABASE` to a timestamped directory under `.botholomew/`, renames the original `data.duckdb` (and `.wal`) to `data.duckdb.bak-<timestamp>`, opens a fresh DB at the original path, and `IMPORT DATABASE`s back. Indexes are rebuilt from data, which restores write integrity. After repair, `botholomew worker reap` cleans up the stale rows. |
Repair is idempotent and non-destructive: the original DB is preserved as a .bak-<timestamp> file next to the new one. Delete the backup once you've confirmed the rebuilt DB looks right.
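The per-table probe verdicts reduce to a small decision function. This mapping is hypothetical; the actual logic in `src/db/doctor.ts` may use different signals:

```typescript
// Hypothetical classification of one table's probe outcome, given what
// the doctor can observe: whether the table exists, whether it has rows,
// and whether the isolated child process crashed on the PK self-update.
type ProbeState = "ok" | "empty" | "missing" | "corrupt";

function classifyProbe(p: {
  tableExists: boolean;
  rowCount: number;    // rows seen before probing
  probeCrashed: boolean; // child Bun process panicked on the self-update
}): ProbeState {
  if (!p.tableExists) return "missing";
  if (p.probeCrashed) return "corrupt"; // the panic stayed in the child
  if (p.rowCount === 0) return "empty";
  return "ok";
}
```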
See src/db/doctor.ts and src/commands/db.ts.