Agent Tool Execution
Covers the full tool execution lifecycle within the agent harness: round-trip dispatch from user message through LLM tool call to result display; JSON Schema validation; availability gating; permission denial; sequential multi-tool turns; empty-result handling; invalid tool names; and the tool registry catalog. This spec is P1 because tools are the primary mechanism through which the agent interacts with the workspace — a broken tool dispatch silently prevents the agent from doing meaningful work, even when the LLM and UI appear healthy.
The agent harness uses a custom AgentLoop (not Rig’s) that dispatches tool calls returned by the LLM through a
ToolRegistry. Each tool declares a JSON Schema for its parameters, an availability() method checked before
execution, and a source (Workspace, External, or System) that controls boundary framing. Tool results are capped at
100 KB (DEFAULT_MAX_TOOL_RESULT_BYTES). The frontend receives agent:tool-call Tauri events with status "started",
"completed", or "failed".
Preconditions
- HTTP bridge running on port 9990
- A workspace initialized via
initialize_workspace before each scenario
- Agent harness started via start_agent with at least one LLM provider configured (Anthropic, OpenAI, xAI, or Ollama) — scenarios that require real LLM responses are marked; others use a stub/mock provider and validate at the bridge API level
- Bridge shim injected via playwright.config.ts
Scenarios
Seed: seed.spec.ts
1. Tool execution round-trip — agent calls search_pages and result is displayed
The most fundamental path: agent receives a user message, decides to call a tool, the tool executes against the workspace, and the result is surfaced back to the user.
Steps:
- Create a page titled “Dragon Lore” with body text “Dragons breathe fire and hoard treasure.”
- Open the agent conversation panel.
- Send the message: “What pages mention dragons?”
- Observe the agent panel while the response streams.
Expected: The agent calls search_pages with a query related to “dragons”. A tool-call indicator appears in the
conversation panel showing the tool is in progress (status "started"), then transitions to completed (status
"completed"). The agent’s final text response references “Dragon Lore” or its content. No error state is shown.
2. JSON Schema validation — missing required argument is rejected
The Tool trait requires each tool to declare a JSON Schema. When the LLM returns a tool call with missing required
fields, the harness must surface an error rather than forwarding a malformed request to the repository.
Steps:
- Attempt to call
search_pages via the bridge API without the required query parameter (e.g., POST /agent/tool/execute with { "tool": "search_pages", "arguments": {} }).
- Observe the response.
Expected: A 400 Bad Request or equivalent validation error is returned. The error message indicates the missing or
invalid argument (e.g., “missing ‘query’ field”). The workspace state is unchanged — no search was performed. Note: if a
direct tool-execute endpoint is not exposed, this scenario validates via the agent conversation: send a message that
produces a known-bad tool invocation pattern and confirm the agent’s fallback response.
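A minimal sketch of the required-field check this scenario exercises. This is hand-rolled TypeScript standing in for the harness's actual JSON Schema validation, and the error wording mirrors the "missing 'query' field" example above rather than any real message.

```typescript
// Hand-rolled required-field check, not the harness's real validator.
interface ParamSchema {
  required?: string[];
}

function validateArgs(
  schema: ParamSchema,
  args: Record<string, unknown>,
): { ok: true } | { ok: false; error: string } {
  for (const field of schema.required ?? []) {
    if (!(field in args)) {
      // A malformed call never reaches the repository; it fails here.
      return { ok: false, error: `missing '${field}' field` };
    }
  }
  return { ok: true };
}

const searchPagesSchema: ParamSchema = { required: ["query"] };
console.log(validateArgs(searchPagesSchema, {}));
// → { ok: false, error: "missing 'query' field" }
console.log(validateArgs(searchPagesSchema, { query: "dragons" }));
// → { ok: true }
```

The key property under test is that validation rejects before any workspace operation runs, which is why the scenario asserts the workspace is unchanged.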
3. Tool result display — successful execution renders content in the conversation
A successfully executed tool result must be rendered in the agent conversation in a way the user can read.
Steps:
- Create three pages: “History of Rome”, “History of Greece”, “History of Egypt”.
- Send the agent message: “List all pages about history.”
- Wait for the agent response to complete.
Expected: The conversation panel shows a response that references at least two of the three history pages. The
tool-call indicator for search_pages shows as completed (not failed). The text result is not a raw JSON blob — it is
formatted as prose or a readable list. No [UNAVAILABLE: ...] annotation is present in the tool’s name.
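A sketch of the "not a raw JSON blob" expectation: turning a structured tool result into a readable list. The { title, snippet } result shape is an assumption for illustration, not the actual search_pages output schema.

```typescript
// Assumed result shape for illustration only.
interface SearchHit {
  title: string;
  snippet?: string;
}

// Render hits as a readable list instead of dumping raw JSON.
function formatSearchResult(hits: SearchHit[]): string {
  if (hits.length === 0) return "No matching pages were found.";
  return hits
    .map((h) => `- ${h.title}${h.snippet ? `: ${h.snippet}` : ""}`)
    .join("\n");
}

console.log(formatSearchResult([{ title: "History of Rome" }]));
// → "- History of Rome"
```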
4. Tool result display — execution error renders clearly without crashing the panel
When a tool execution fails (e.g., ToolError::ExecutionFailed), the agent must surface the failure gracefully rather
than entering an error state that breaks the conversation.
Steps:
- Send the agent message: “Get the page with slug ‘this-page-does-not-exist-xyz-404’.”
- Observe the agent panel response.
Expected: The agent acknowledges the page was not found (either via a tool error result surfaced to the LLM, or via
the LLM’s natural response). The conversation panel does not crash, freeze, or show a blank state. The agent:tool-call
event received by the frontend has status "failed" with an error field. The agent may retry or offer to help another
way, but the UI remains operational.
5. Tool timeout handling — long-running tool calls do not block the UI indefinitely
The agent loop must not hang the frontend when a tool call takes unexpectedly long.
Steps:
- If the bridge exposes a configurable delay endpoint, trigger a tool call that is artificially delayed beyond the expected timeout threshold.
- Alternatively, simulate by sending a message that causes the agent to call a tool on a very large result set.
- Observe the conversation panel during and after the call.
Expected: The conversation panel remains responsive during the tool call. The tool-call indicator shows a
loading/in-progress state. If the call times out, the agent loop surfaces an error and the panel does not freeze.
Clicking “Interrupt” (if visible) cancels the session. The agent:session-interrupted event fires.
6. Available tools listed in the agent context — system prompt reflects tool inventory
The general_assistant skill’s system prompt includes an ## Available Tools section generated from PromptEngine
rendering. Tools that are available appear without restriction markers; unavailable tools appear with
[UNAVAILABLE: ...].
Steps:
- Start the agent harness with full capabilities granted (owner context).
- Send the message: “What tools do you have access to?”
- Observe the agent’s response.
Expected: The agent describes its tool set. The response mentions workspace tools such as search_pages,
read_page, get_page_tree, create_page, and related operations. No tool appears with an [UNAVAILABLE: ...]
annotation when the owner’s full capability guard is active. The list is complete and matches the tools registered by
ToolRegistryBuilder.
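A sketch of how an ## Available Tools section with [UNAVAILABLE: ...] markers might be rendered. The availability model here loosely mirrors ToolAvailability from the spec; it is illustrative TypeScript, not the PromptEngine's actual rendering code.

```typescript
// Illustrative availability model, loosely mirroring ToolAvailability.
type Availability =
  | { kind: "available" }
  | { kind: "unavailable"; reason: string };

interface ToolEntry {
  name: string;
  availability: Availability;
}

// Available tools render bare; unavailable ones carry the marker.
function renderAvailableTools(tools: ToolEntry[]): string {
  const lines = tools.map((t) =>
    t.availability.kind === "available"
      ? `- ${t.name}`
      : `- ${t.name} [UNAVAILABLE: ${t.availability.reason}]`,
  );
  return ["## Available Tools", ...lines].join("\n");
}

console.log(
  renderAvailableTools([
    { name: "search_pages", availability: { kind: "available" } },
    {
      name: "create_page",
      availability: { kind: "unavailable", reason: "Requires PagesWrite capability" },
    },
  ]),
);
```

Under the owner's full capability guard (this scenario), every entry should render without the marker; scenario 7 covers the restricted case.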
7. Tool execution respects capability restrictions — unavailable tool returns permission denied
The ToolAvailability::Unavailable variant causes ToolRegistry::execute to return AgentError::PermissionDenied
without invoking the tool. This must propagate correctly to the conversation.
Steps:
- Configure the agent with a restricted permission set that excludes
PagesWrite (e.g., start agent as a non-owner user or use a test guard with only PagesRead).
- Send the message: “Create a page titled ‘Test Page’.”
- Observe the response.
Expected: The agent’s tool list shows create_page as [UNAVAILABLE: Requires PagesWrite capability]. When the
agent attempts to call create_page, the call is rejected at the availability check without reaching the repository.
The conversation shows the agent explaining it cannot create pages due to permission restrictions. No page is created in
the workspace.
8. Multiple sequential tool calls in one agent response — multi-turn chaining works
The agent loop supports multiple tool calls within a single agent session (up to max_turns). When the LLM chains tool
calls across turns, all must complete and feed into the final response.
Steps:
- Create pages “Chapter 1”, “Chapter 2”, “Chapter 3” in a workspace.
- Send the agent message: “Read the content of all three chapter pages and summarize them together.”
- Observe the conversation panel as the agent processes.
Expected: The conversation panel shows multiple tool-call indicators in sequence (e.g., read_page called three
times or search_pages followed by multiple read_page calls). Each indicator transitions from "started" to
"completed". The agent’s final text response synthesizes content from all three pages. The loop does not abort early
with a MaxTurnsExceeded error for this reasonable scope.
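The turn-budget bookkeeping this scenario exercises can be sketched with a canned "LLM" script. Only the max_turns accounting is the point; the Turn shape and error wording are illustrative assumptions.

```typescript
// Canned turn script standing in for LLM responses; illustrative only.
type Turn = { toolCalls: string[] } | { finalText: string };

function runLoop(script: Turn[], maxTurns: number): { text: string; turns: number } {
  let turns = 0;
  for (const turn of script) {
    turns += 1;
    if (turns > maxTurns) throw new Error("MaxTurnsExceeded");
    if ("finalText" in turn) return { text: turn.finalText, turns };
    // In the real loop, tool calls would be dispatched here and their
    // results appended to the conversation before the next turn.
  }
  throw new Error("script ended without a final response");
}

const script: Turn[] = [
  { toolCalls: ["read_page"] },
  { toolCalls: ["read_page"] },
  { toolCalls: ["read_page"] },
  { finalText: "Summary of chapters 1-3." },
];
console.log(runLoop(script, 10)); // completes in 4 turns, well under the cap
```

Three reads plus a final synthesis is four turns, which is why a reasonable max_turns must not abort this scenario early.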
9. Tool with no meaningful return value — counter-style tools report cleanly
Some tools return a simple string result (such as an acknowledgement or counter value). The agent must display such results without treating an empty or minimal response as an error.
Steps:
- Send the agent a message that results in a tool call producing a minimal string response — for example: “Move the
page ‘Chapter 1’ to be a child of ‘Chapter 2’.” (The
move_page tool returns a success acknowledgement, not rich data.)
- Observe the conversation response.
Expected: The tool-call indicator shows "completed" (not "failed"). The agent’s response confirms the action was
performed. The UI does not show an empty result as an error. The page hierarchy is updated in the workspace (verifiable
by inspecting the sidebar or get_page_tree).
10. Tool registry lists all available tools in registration order
The ToolRegistry::definitions() method returns all registered tools in insertion order. This determines what appears
in the LLM’s tool schema.
Steps:
- Start the agent harness.
- Query the agent status or inspect the agent initialization log.
- Send the message: “How many tools do you have available?”
Expected: The agent correctly enumerates its tools. The count matches the number of native workspace tools
registered by ToolRegistryBuilder (at minimum: search_pages, read_page, get_page_tree, get_backlinks,
get_outgoing_links, read_page_history, create_page, update_page_content, update_page_metadata, move_page,
rename_page, delete_page). No duplicate tool names appear in the schema.
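Insertion-ordered registration can be sketched with a Map, which preserves insertion order and so models the definitions() ordering described above. The duplicate check is an assumption about reasonable builder behavior, not a documented ToolRegistryBuilder feature.

```typescript
// Sketch of insertion-ordered registration; the duplicate check is an
// assumption, not a documented ToolRegistryBuilder behavior.
class ToolRegistrySketch {
  private tools = new Map<string, { name: string }>();

  register(name: string): this {
    if (this.tools.has(name)) throw new Error(`duplicate tool: ${name}`);
    this.tools.set(name, { name });
    return this;
  }

  definitions(): string[] {
    return [...this.tools.keys()]; // Map iteration preserves insertion order
  }
}

const names = new ToolRegistrySketch()
  .register("search_pages")
  .register("read_page")
  .register("create_page")
  .definitions();
console.log(names); // → ["search_pages", "read_page", "create_page"]
```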
11. Invalid tool name handling — calling an unregistered tool returns ToolNotFound
When the LLM hallucinates a tool name that is not registered, the registry must return AgentError::ToolNotFound rather
than panicking or silently no-opping.
Steps:
- Via the bridge API or a crafted agent session, attempt to call a tool named
"destroy_all_data"(which does not exist in the registry). - Observe the result.
Expected: The call returns a ToolNotFound error. The error message identifies the tool name. No workspace
operation is performed. If this error propagates to the conversation, the agent acknowledges the tool is unavailable.
The agent loop continues (the error is surfaced as a tool result, not a fatal crash).
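A sketch of that failure path: an unknown tool name produces an error result that identifies the tool and is fed back to the LLM, rather than crashing the loop. The message wording is illustrative, not the harness's exact text.

```typescript
// Unknown tool names become error tool-results, not fatal crashes.
// Message wording is illustrative only.
function lookupOrError(
  known: Set<string>,
  name: string,
): { found: boolean; message: string } {
  if (known.has(name)) return { found: true, message: "" };
  return {
    found: false,
    message: `ToolNotFound: no tool named '${name}' is registered`,
  };
}

const known = new Set(["search_pages", "read_page"]);
console.log(lookupOrError(known, "destroy_all_data").message);
// → "ToolNotFound: no tool named 'destroy_all_data' is registered"
```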
12. Tool result size cap — oversized results are truncated to 100 KB
DEFAULT_MAX_TOOL_RESULT_BYTES is 100 KB (102,400 bytes). Results exceeding this threshold are truncated before being
sent back to the LLM. The truncation must not break the conversation.
Steps:
- Create a page with very large content (e.g., 500+ KB of repeated text).
- Ask the agent to read that page by sending: “Read the content of ‘Large Page’.”
- Observe whether the response arrives without error.
Expected: The read_page tool result is truncated at the 100 KB boundary. The agent receives a partial result (with
a truncation indicator if the implementation adds one) and generates a response — even if the response is based on
partial content. No ToolExecutionFailed error is raised due to size alone. The conversation panel does not hang or
show a timeout.
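Byte-accurate truncation at the 102,400-byte cap can be sketched as below. The cap matches DEFAULT_MAX_TOOL_RESULT_BYTES from this spec; the "[truncated]" marker is an assumption, since the spec only says the implementation may add an indicator.

```typescript
const MAX_TOOL_RESULT_BYTES = 102_400; // mirrors DEFAULT_MAX_TOOL_RESULT_BYTES

// Byte-accurate truncation sketch; the marker is an assumption, and the
// real implementation may cut without one.
function truncateToolResult(result: string, maxBytes = MAX_TOOL_RESULT_BYTES): string {
  const bytes = new TextEncoder().encode(result);
  if (bytes.length <= maxBytes) return result;
  const marker = "\n[truncated]"; // ASCII, so 12 bytes
  const cut = new TextDecoder("utf-8")
    .decode(bytes.slice(0, maxBytes - marker.length))
    // Drop a multi-byte character split at the boundary instead of
    // emitting a U+FFFD replacement character.
    .replace(/\uFFFD+$/, "");
  return cut + marker;
}

const big = "x".repeat(500_000); // ~500 KB of ASCII, like the test page above
console.log(new TextEncoder().encode(truncateToolResult(big)).length <= MAX_TOOL_RESULT_BYTES); // → true
```

Budgeting the marker inside the cap keeps the forwarded result at or under 102,400 bytes, matching the expectation that size alone never raises a tool error.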
Test Data
| Key | Value | Notes |
|---|---|---|
| search_tool_name | search_pages | Primary tool for content retrieval tests |
| read_tool_name | read_page | Used in sequential multi-tool chaining |
| create_tool_name | create_page | Requires PagesWrite capability |
| nonexistent_tool_name | destroy_all_data | Tool name guaranteed not to exist in any registry |
| max_tool_result_bytes | 102400 | 100 KB cap enforced by DEFAULT_MAX_TOOL_RESULT_BYTES |
| decay_rate | 0.995^hours | Not used in tool tests but referenced for system context |
| tool_sources | Workspace, External, System | Trust levels for boundary framing in LLM requests |
| process_types | Orchestrator, Researcher, Worker, SkillComposer | Researcher receives only read-only tools (get_, list_, search_, etc.) |
| read_only_prefixes | get_, list_, search_, read_, describe_ | Conventional prefixes recognized by the is_read_only() default impl |
| capability_denied_msg | [UNAVAILABLE: Requires … capability] | Format shown in LLM schema for unavailable tools |
Notes
- The HTTP bridge exposes agent interaction routes including send_agent_message. Tool-level assertions can be made by inspecting the event stream (agent:tool-call) or the agent's final text response. Tool execution scenarios that verify real LLM-driven tool output (scenarios 1–9) require a configured LLM provider; tool listing and schema scenarios (scenarios 10–12) are verifiable without a live provider.
- Scenarios that require a real LLM response depend on the test environment having an API key configured. In CI without keys, the StubLlmProvider is used, which returns “not configured” errors. These scenarios are marked as requiring a live provider and may be skipped in keyless CI runs.
- The Researcher process type receives only read-only tools (those whose is_read_only() returns true). The ToolFilter::ReadOnly filter uses Tool::is_read_only() — not name prefixes alone — so tools that override is_read_only() explicitly are gated correctly regardless of their name.
- ToolAvailability::Unavailable is checked at dispatch time in ToolRegistry::execute. The tool's execute() method is never called for unavailable tools; AgentError::PermissionDenied is returned directly.
- Tool results carry a ToolSource (Workspace, External, or System) that controls boundary framing in LLM requests. Native workspace tools default to Workspace; external MCP tools use External. This framing is applied by the agent loop and is not directly visible in the UI.
- The DEFAULT_MAX_TOOL_RESULT_BYTES constant (100 KB) is defined in crates/infrastructure/agent-core/src/agent_loop.rs. Truncation happens before the result is added to the conversation history, so the stored ConversationState never contains oversized content.