Investigation Pattern

Status: Design landing
Reference epic: INK-846
ADRs: ADR-016, ADR-021

An investigation is a composition pattern, not a new primitive. It combines the Wasmtime CPython sandbox (ADR-021), the MCP back-channel, and LangGraph subgraph dispatch through task() (see Task Primitive) to give the World Agent a context-isolated environment for synthesis work — one where it can pull wide, discard freely, and return only what it concludes.

Synthesis tasks have a context problem. The World Agent may need to read dozens of pages, execute analytical code against retrieved data, and discard most of what it touched. If all of that happens in the main conversation thread, the thread accumulates every candidate, every discarded hit, every intermediate computation. By the time the agent has an answer, the conversation history is bloated with material that is not part of the answer.

An investigation solves this by running the synthesis work inside a persistent sandbox session. The session has a namespace that persists across turns within the investigation but is torn down when the investigation ends. Everything the agent touched — all retrieval results, all intermediate code runs, all discarded hypotheses — stays inside the bubble. The calling agent sees exactly one thing: the FINAL answer.
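The session lifetime can be pictured with a small stand-in. This is a sketch under stated assumptions, not the sandbox implementation. `InvestigationSession` is a hypothetical name, and `exec()` on a plain dict only models how the namespace persists and is torn down. It provides none of Wasmtime's isolation.

```python
from typing import Any


class InvestigationSession:
    """Hypothetical stand-in for a persistent sandbox session.

    Real code runs inside Wasmtime CPython; exec() on a plain dict here
    only illustrates namespace lifetime, not isolation.
    """

    def __init__(self) -> None:
        self.namespace: dict[str, Any] = {}
        self.open = True

    def run(self, code: str) -> None:
        if not self.open:
            raise RuntimeError("session already torn down")
        # The same dict is reused on every turn, so names defined in one
        # run are visible to the next.
        exec(code, self.namespace)

    def close(self) -> None:
        # Teardown discards everything the investigation touched.
        self.namespace.clear()
        self.open = False

    def __enter__(self) -> "InvestigationSession":
        return self

    def __exit__(self, *exc: object) -> None:
        self.close()


with InvestigationSession() as session:
    session.run("hits = ['page-a', 'page-b', 'page-c']")
    session.run("kept = [h for h in hits if h != 'page-b']")  # sees `hits`
    answer = session.namespace["kept"]

print(answer)             # ['page-a', 'page-c']
print(session.namespace)  # {}
```

Only `answer`, copied out before teardown, survives the session. The intermediate `hits` list is gone.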

An investigation is dispatched by passing shape="investigation" to task(). The caller provides a prompt describing the synthesis task and, optionally, a seeded context slice. task() opens a persistent Wasmtime CPython sandbox session and starts the investigation subgraph.

From the calling agent’s view, the investigation is one task() call that returns a typed result. Inside the bubble, the investigation is a loop of many turns: the LLM chooses what code to run, the sandbox executes it, the result comes back into the namespace, and the loop continues until the LLM calls FINAL(answer) or the session budget is exhausted.
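Seen from inside the bubble, the loop can be sketched as follows. Several things are assumptions for illustration: `run_investigation`, the `next_code` planner callback, implementing FINAL as an exception, and a simple turn limit standing in for the real token, wall-time, and fuel budget. The shape of the budget-exhaustion result is also hypothetical.

```python
from typing import Any, Callable


class Final(Exception):
    """Raised by FINAL() to end the loop with an answer."""

    def __init__(self, answer: Any) -> None:
        super().__init__("FINAL")
        self.answer = answer


def run_investigation(
    next_code: Callable[[dict], str],  # hypothetical planner: namespace -> code
    max_turns: int,                    # stands in for the session budget
) -> Any:
    namespace: dict[str, Any] = {}

    def FINAL(answer: Any) -> None:
        raise Final(answer)

    namespace["FINAL"] = FINAL
    for _ in range(max_turns):
        code = next_code(namespace)     # the LLM chooses what to run
        try:
            exec(code, namespace)       # results land in the shared namespace
        except Final as done:
            return done.answer          # the only value that crosses the boundary
    return {"status": "budget_exhausted"}


# A scripted "planner" in place of an LLM.
script = iter([
    "scores = {'p1': 0.9, 'p2': 0.2, 'p3': 0.7}",
    "keep = sorted(k for k, v in scores.items() if v > 0.5)",
    "FINAL(keep)",
])
result = run_investigation(lambda ns: next(script), max_turns=10)
print(result)  # ['p1', 'p3']
```

The caller receives `['p1', 'p3']`. It never sees `scores` or the discarded `p2`.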

This is consistent with ADR-021: “sandbox is a capability, not a runtime.” The lifetime of the sandbox instance is an implementation detail of the capability. The calling agent does not manage sandbox instances; it dispatches an investigation and gets an answer. Keeping one instance alive for the whole session is what lets variables defined in one turn remain available to later turns. From outside the session boundary, the investigation is still one tool call.

Inside the sandbox, the investigation has access to a pre-loaded Python helper library. These callables proxy through the existing MCP back-channel to reach workspace capabilities:

  • kb_search(query, k=…, scope=…) — workspace search via the SearchRouter. Returns up to a configurable number of candidates; wide-seed mode returns hundreds.
  • kb_read(path) — reads a workspace page or block by path.
  • memory_get(tier, …) — reads a memory tier. Capability-gated; the investigation inherits the calling agent’s capability set.
  • llm_query(prompt, model=…) — a single LLM call. Defaults to the cheap analyst model. Brokers out of the sandbox to the sidecar’s DSPy layer.
  • llm_query_batched(prompts, model=…) — parallel cheap-model fan-out, capped at a configurable limit (default 16). Each prompt resolves to a DSPy call.
  • investigate(prompt, context=…) — opens a nested investigation. Nesting depth is capped at a configurable default.
  • investigate_batched(prompts, contexts=…) — parallel nested investigations, capped (default 4).
  • FINAL(answer) — terminates the investigation and returns the answer to the caller.
  • FINAL_VAR(name) — terminates and returns a named namespace variable as the answer.
  • SHOW_VARS() — inspects the current namespace; useful for self-directed synthesis steps.
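The three terminal and inspection helpers at the end of the list could plausibly behave as sketched below. This is an assumption-laden illustration, not the real helper library. `make_terminal_helpers` is hypothetical. So is the idea that FINAL and FINAL_VAR raise an exception, and that SHOW_VARS returns a name-to-type summary.

```python
from typing import Any


class Final(Exception):
    def __init__(self, answer: Any) -> None:
        super().__init__("FINAL")
        self.answer = answer


def make_terminal_helpers(namespace: dict[str, Any]) -> None:
    """Install hypothetical FINAL / FINAL_VAR / SHOW_VARS into a namespace."""

    def FINAL(answer: Any) -> None:
        raise Final(answer)

    def FINAL_VAR(name: str) -> None:
        # Return a namespace variable by name instead of restating its value.
        if name not in namespace:
            raise NameError(f"FINAL_VAR: no variable named {name!r}")
        raise Final(namespace[name])

    def SHOW_VARS() -> dict[str, str]:
        # Summarize user-defined names and their types; skip the helpers
        # themselves and dunder entries such as __builtins__.
        helpers = {"FINAL", "FINAL_VAR", "SHOW_VARS"}
        return {
            k: type(v).__name__
            for k, v in namespace.items()
            if not k.startswith("_") and k not in helpers
        }

    namespace.update(FINAL=FINAL, FINAL_VAR=FINAL_VAR, SHOW_VARS=SHOW_VARS)


ns: dict[str, Any] = {}
make_terminal_helpers(ns)
exec("summary = {'theme': 'succession', 'sources': 2}", ns)
print(ns["SHOW_VARS"]())  # {'summary': 'dict'}
try:
    exec("FINAL_VAR('summary')", ns)
except Final as done:
    returned = done.answer
print(returned)  # {'theme': 'succession', 'sources': 2}
```

FINAL_VAR is convenient when the answer is a large structure that already lives in the namespace. The planner can name it instead of re-emitting it token by token.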

All helpers run under the calling agent’s capability set. Writes that flow through helpers cross the submit boundary like any other agent write — they carry Origin::AgentProduced, Lifecycle::Candidate, and derivation links (see Submit Boundary).

The LLM-calling helpers (llm_query, llm_query_batched) do not invoke provider SDKs inside the sandbox. They broker out to the sidecar’s DSPy layer over the same MCP back-channel the other helpers use. LLM calls inside an investigation therefore follow the same DSPy/LiteLLM path as any other sidecar LLM call, and the sandbox never holds provider credentials.
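The brokering split can be illustrated with a sketch. Here the sandbox side only serializes a request, and the host side is the only place a model call would happen. Treat the details as assumptions. The JSON-RPC-style message shape, the `"llm_query"` method name, `make_llm_helpers`, and `fake_sidecar` are all hypothetical. The sketch also reads the default cap of 16 as a limit on concurrent calls, not a limit on batch size.

```python
import itertools
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

BATCH_CAP = 16  # documented default cap, treated here as max concurrency


def make_llm_helpers(send: Callable[[str], str]):
    """Build sandbox-side LLM helpers that only serialize requests.

    `send` stands in for the back-channel: it takes a request string and
    returns a response string. No provider SDK or API key is referenced here.
    """
    ids = itertools.count(1)

    def llm_query(prompt: str, model: str = "analyst") -> str:
        request = {
            "jsonrpc": "2.0",
            "id": next(ids),
            "method": "llm_query",  # hypothetical method name
            "params": {"prompt": prompt, "model": model},
        }
        response = json.loads(send(json.dumps(request)))
        return response["result"]

    def llm_query_batched(prompts: list[str], model: str = "analyst") -> list[str]:
        # At most BATCH_CAP calls in flight; map() keeps results in prompt order.
        with ThreadPoolExecutor(max_workers=BATCH_CAP) as pool:
            return list(pool.map(lambda p: llm_query(p, model), prompts))

    return llm_query, llm_query_batched


def fake_sidecar(raw: str) -> str:
    # Host side: the real sidecar would run a DSPy/LiteLLM call here.
    request = json.loads(raw)
    answer = request["params"]["prompt"].upper()
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": answer})


llm_query, llm_query_batched = make_llm_helpers(fake_sidecar)
print(llm_query("classify: dragon"))       # CLASSIFY: DRAGON
print(llm_query_batched(["a", "b", "c"]))  # ['A', 'B', 'C']
```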

Investigations use a wide-seed mode on the SearchRouter. Where a normal search returns RRF top-K results (a handful of high-scoring candidates), wide-seed mode returns hundreds of candidates ranked but not truncated. The investigation filters and discards in code.

This is the pattern’s retrieval contribution: instead of asking the SearchRouter to decide relevance, the investigation agent decides relevance through code runs on the full candidate set. Wide-seed mode is a knob on the existing SearchRouter; no new retrieval substrate is required.
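The difference between top-K truncation and wide-seed mode can be shown on a toy fused ranking. The sketch uses textbook Reciprocal Rank Fusion with the common constant k = 60. It is an assumption that the SearchRouter uses that constant or fuses exactly this way. The `search` function, its `top_k=None` knob, and the `tags` metadata are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Python's sort is stable, so ties keep first-seen order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def search(rankings: list[list[str]], top_k: Optional[int]) -> list[tuple[str, float]]:
    """Hypothetical knob: top_k=None models wide-seed mode (ranked, not truncated)."""
    fused = rrf(rankings)
    return fused if top_k is None else fused[:top_k]


lexical = ["p1", "p2", "p3", "p4"]
vector = ["p3", "p1", "p5", "p6"]
tags = {"p2": "harbor", "p6": "harbor"}  # metadata the investigation checks in code

normal = [doc for doc, _ in search([lexical, vector], top_k=2)]
wide = search([lexical, vector], top_k=None)
kept = [doc for doc, _ in wide if tags.get(doc) == "harbor"]

print(normal)     # ['p1', 'p3']
print(len(wide))  # 6
print(kept)       # ['p2', 'p6']
```

Neither harbor page survives a top-2 cut. A code-level filter over the untruncated list recovers both, and this is the relevance decision the investigation takes over from the router.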

Context discipline at the synthesis boundary

The pattern’s primary value is context discipline. Every page the investigation reads, every intermediate computation it runs, every candidate it discards — none of it enters the calling agent’s thread history. The calling agent sees the FINAL answer and nothing else.

This matters because the calling agent’s context budget is finite. An investigation that reads fifty pages and discards forty-eight of them should not cost fifty pages of context in the conversation. With checkpoint-rewind compaction (see Checkpoint Rewind and Compaction) applied at the outer level, even the task() call itself can be summarized rather than kept verbatim, but that is the outer pattern’s responsibility. The investigation’s own guarantee is narrower: nothing inside the bubble leaves it except the FINAL answer.

The investigation’s outer LLM — the one that decides what code to run and when to call FINAL — uses the calling agent’s planner model. Helper calls (llm_query_batched and similar) default to a cheaper analyst model. Per-investigation override is available via params.

This tiering is deliberate: the expensive planner model makes strategic decisions about the synthesis; the cheap analyst model does the bulk scoring, classification, and extraction work inside batched calls. The two model tiers have distinct roles within one investigation.
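The tier defaults can be sketched as a small resolution step. Everything named here is hypothetical: the model identifiers, the `"models"` key inside params, `resolve_tiers`, and `helper_model`. The sketch also assumes that an explicit `model=` argument on a helper call wins over the analyst default.

```python
from dataclasses import dataclass, replace
from typing import Optional

DEFAULT_ANALYST = "analyst-small"  # hypothetical model id


@dataclass(frozen=True)
class ModelTiers:
    planner: str  # outer LLM: picks the next code step and decides when to FINAL
    analyst: str  # default model for llm_query / llm_query_batched


def resolve_tiers(caller_planner: str, params: dict) -> ModelTiers:
    """Start from the calling agent's planner and the cheap default,
    then apply any per-investigation overrides found in params."""
    tiers = ModelTiers(planner=caller_planner, analyst=DEFAULT_ANALYST)
    overrides = {
        key: value
        for key, value in params.get("models", {}).items()
        if key in ("planner", "analyst")
    }
    return replace(tiers, **overrides)


def helper_model(tiers: ModelTiers, requested: Optional[str] = None) -> str:
    # An explicit model= on a helper call wins; otherwise use the analyst tier.
    return requested or tiers.analyst


tiers = resolve_tiers("planner-large", {"models": {"analyst": "analyst-medium"}})
print(tiers.planner, helper_model(tiers))  # planner-large analyst-medium
```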

Nested investigations (investigate, investigate_batched) share a single budget pool with their parent, and recursion depth is capped at a configurable default. The pool covers token consumption, wall time, and sandbox fuel combined. When it is exhausted, the innermost running investigation is terminated with a budget-exhaustion FINAL, and control propagates up to its parent.

An investigation that spawns four nested investigations therefore cannot spend five budgets' worth of resources: all five draw from the same pool. The worldbuilder’s cost exposure is bounded by the pool, not by nesting depth or fan-out.
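The shared-pool behavior can be sketched as follows. A single integer stands in for the combined token, wall-time, and fuel budget. `BudgetPool`, `investigate`, and the result dictionaries are hypothetical. The sketch also assumes that "propagates up" means the parent stops spawning further children once one of them reports exhaustion.

```python
from dataclasses import dataclass


class BudgetExhausted(Exception):
    pass


@dataclass
class BudgetPool:
    # One number stands in for the combined token / wall-time / fuel budget.
    remaining: int

    def charge(self, amount: int) -> None:
        if amount > self.remaining:
            self.remaining = 0
            raise BudgetExhausted
        self.remaining -= amount


def investigate(pool: BudgetPool, cost: int, child_costs: tuple = (), depth: int = 0) -> dict:
    """Run an investigation and its nested children against ONE shared pool."""
    try:
        pool.charge(cost)
    except BudgetExhausted:
        # The innermost running investigation ends with a budget-exhaustion FINAL.
        return {"final": "budget_exhausted", "depth": depth}
    children = []
    for child_cost in child_costs:
        result = investigate(pool, child_cost, (), depth + 1)  # same pool object
        children.append(result)
        if result["final"] == "budget_exhausted":
            # Propagate upward: the parent stops spawning and reports exhaustion.
            return {"final": "budget_exhausted", "depth": depth, "children": children}
    return {"final": "ok", "depth": depth, "children": children}


pool = BudgetPool(remaining=1000)
outcome = investigate(pool, cost=200, child_costs=(300, 300, 300, 300))
print(outcome["final"], len(outcome["children"]), pool.remaining)
# budget_exhausted 3 0
```

The parent and its four children would request 1400 units in total. Spending stops at the pool's 1000, the third child hits the limit, and the fourth never starts.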

When an investigation’s FINAL answer includes proposed world writes, the submit-boundary adapter constructs derivation links from sources actually read during the investigation. This is distinct from the seeded context slice passed at dispatch time: seeding a thousand pages does not make all thousand pages derivation sources for every write. The helper library tracks which pages kb_read() was called on inside the session; only those appear in the derivation trail.
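The read-trail rule can be sketched as a wrapper that records each page `kb_read()` touches, so the derivation list comes from that record and not from the seeded slice. `TrackedKB` and `ProposedWrite` are hypothetical names. The page writes the provenance values as `Origin::AgentProduced` and `Lifecycle::Candidate`; the sketch uses plain strings for them.

```python
from dataclasses import dataclass, field


class TrackedKB:
    """Hypothetical kb_read wrapper that records the session's read trail."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages
        self.read_trail: list[str] = []

    def kb_read(self, path: str) -> str:
        text = self._pages[path]         # KeyError for unknown paths
        if path not in self.read_trail:  # each page recorded once, in first-read order
            self.read_trail.append(path)
        return text


@dataclass
class ProposedWrite:
    body: str
    # Written as Origin::AgentProduced / Lifecycle::Candidate on this page;
    # plain strings are used here.
    origin: str = "AgentProduced"
    lifecycle: str = "Candidate"
    derived_from: list[str] = field(default_factory=list)


seeded = {f"lore/page-{i}": f"text {i}" for i in range(1000)}  # seeded context slice
kb = TrackedKB(seeded)
kb.kb_read("lore/page-7")
kb.kb_read("lore/page-42")
kb.kb_read("lore/page-7")  # re-read: not duplicated

write = ProposedWrite(body="The harbor guild predates the crown.",
                      derived_from=list(kb.read_trail))
print(len(seeded), write.derived_from)
# 1000 ['lore/page-7', 'lore/page-42']
```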

An investigation is not a new agent runtime. The main agent does not live in the sandbox. The investigation’s LLM calls broker out to the sidecar; the sidecar is still the agent host. The sandbox holds the namespace and executes code; it does not host a graph.

It is not an escape from the submit boundary. Investigation-produced writes are agent writes. They cross the submit boundary with full provenance. The investigation cannot bypass the permission system by virtue of running in a sandbox.

It is not a cross-investigation persistent environment. Each investigation gets a fresh sandbox. Namespace from one investigation does not carry to the next. The only persistence across investigations is what flows through normal world writes (candidate content, memory tier entries via memory tools).

It is not replacing the SearchRouter with late-interaction retrieval. Wide-seed mode is a configuration change on the existing SearchRouter. Adding late-interaction reranking (ColBERT, pylate) is a separate decision, not gated on this pattern.

It is not unsandboxed code execution. Wasmtime stays. The investigation pattern is built on top of the sandbox capability, not around it. Code runs inside Wasmtime; the MCP back-channel is the only way out.

The investigation pattern touches several existing systems:

  • Wasmtime sandbox — the execution environment. The sandbox lifetime model is extended to support persistent sessions (see ADR-021); one-shot calls remain one-shot.
  • SearchRouter — wide-seed retrieval mode is an addition to the existing interface.
  • MCP server (sidecar) — the helper library surface exposes workspace capabilities to code running inside the sandbox via the dual-channel comms model.
  • Submit boundary — investigation writes route through the adapter with derivation links from the read trail.
  • task() primitive — investigations are one of the registered execution shapes; see Task Primitive.
  • Checkpoint Rewind and Compaction — the outer pattern for keeping the calling agent’s history compact after an investigation completes.

This page does not describe:

  • the Wasmtime sandbox domain semantics or the CPython-in-Wasmtime capability. See Sandbox Execution and Process Model.
  • the task() primitive or the shape registry. See Task Primitive.
  • the submit boundary or derivation links. See Submit Boundary.
  • the MCP tool surface or the MCP back-channel. See MCP System.
  • how the calling agent’s history is kept compact after an investigation. See Checkpoint Rewind and Compaction.
  • the memory tier system that memory_get proxies to. See Agent Memory System.
  • how DSPy LLM calls work inside the sidecar. See LLM System.