Agent Tool Execution

Covers the full tool execution lifecycle within the agent harness: round-trip dispatch from user message through LLM tool call to result display; JSON Schema validation; availability gating; permission denial; sequential multi-tool turns; empty-result handling; invalid tool names; and the tool registry catalog. This spec is P1 because tools are the primary mechanism through which the agent interacts with the workspace — a broken tool dispatch silently prevents the agent from doing meaningful work, even when the LLM and UI appear healthy.

The agent harness uses a custom AgentLoop (not Rig’s) that dispatches tool calls returned by the LLM through a ToolRegistry. Each tool declares a JSON Schema for its parameters, an availability() method checked before execution, and a source (Workspace, External, or System) that controls boundary framing. Tool results are capped at 100 KB (DEFAULT_MAX_TOOL_RESULT_BYTES). The frontend receives agent:tool-call Tauri events with status "started", "completed", or "failed".
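The declaration shape described above can be sketched as follows. This is a hypothetical TypeScript mirror of the Rust Tool trait, for orientation only — the field and method names here are illustrative, not the actual API.

```typescript
// Hypothetical TypeScript mirror of what each tool declares, per the
// overview above. The real trait lives in Rust; shapes are illustrative.
type ToolSource = "Workspace" | "External" | "System";

interface ToolDeclaration {
  name: string;
  parametersSchema: object; // JSON Schema for the tool's arguments
  availability(): true | { unavailableReason: string }; // checked before execution
  source: ToolSource; // controls boundary framing in LLM requests
}

// Example declaration for the search_pages tool used throughout this spec:
const searchPages: ToolDeclaration = {
  name: "search_pages",
  parametersSchema: { type: "object", required: ["query"] },
  availability: () => true,
  source: "Workspace",
};
```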

Preconditions

  • HTTP bridge running on port 9990
  • A workspace initialized via initialize_workspace before each scenario
  • Agent harness started via start_agent with at least one LLM provider configured (Anthropic, OpenAI, xAI, or Ollama) — scenarios that require real LLM responses are marked; others use a stub/mock provider and validate at the bridge API level
  • Bridge shim injected via playwright.config.ts

Scenarios

Seed: seed.spec.ts

1. Tool execution round-trip — agent calls search_pages and result is displayed

The most fundamental path: agent receives a user message, decides to call a tool, the tool executes against the workspace, and the result is surfaced back to the user.

Steps:

  1. Create a page titled “Dragon Lore” with body text “Dragons breathe fire and hoard treasure.”
  2. Open the agent conversation panel.
  3. Send the message: “What pages mention dragons?”
  4. Observe the agent panel while the response streams.

Expected: The agent calls search_pages with a query related to “dragons”. A tool-call indicator appears in the conversation panel showing the tool is in progress (status "started"), then transitions to completed (status "completed"). The agent’s final text response references “Dragon Lore” or its content. No error state is shown.

2. JSON Schema validation — missing required argument is rejected

The Tool trait requires each tool to declare a JSON Schema. When the LLM returns a tool call with missing required fields, the harness must surface an error rather than forwarding a malformed request to the repository.

Steps:

  1. Attempt to call search_pages via the bridge API without the required query parameter (e.g., POST /agent/tool/execute with { "tool": "search_pages", "arguments": {} }).
  2. Observe the response.

Expected: A 400 Bad Request or equivalent validation error is returned. The error message indicates the missing or invalid argument (e.g., “missing ‘query’ field”). The workspace state is unchanged — no search was performed. Note: if a direct tool-execute endpoint is not exposed, this scenario validates via the agent conversation: send a message that produces a known-bad tool invocation pattern and confirm the agent’s fallback response.
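The check this scenario exercises can be sketched as a minimal required-argument validator. The harness validates against each tool's full JSON Schema; only the `required` keyword is modeled here, and the function name is hypothetical.

```typescript
// Minimal sketch of required-argument validation (scenario 2). Only the
// "required" keyword of JSON Schema is modeled; the real harness validates
// the full schema. Returns an error message, or null when args are valid.
function validateRequired(
  schema: { required?: string[] },
  args: Record<string, unknown>
): string | null {
  for (const field of schema.required ?? []) {
    if (!(field in args)) return `missing '${field}' field`;
  }
  return null;
}
```

For example, search_pages declares "query" as required, so `validateRequired({ required: ["query"] }, {})` yields an error while supplying a query passes.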

3. Tool result display — successful execution renders content in the conversation

A successfully executed tool result must be rendered in the agent conversation in a way the user can read.

Steps:

  1. Create three pages: “History of Rome”, “History of Greece”, “History of Egypt”.
  2. Send the agent message: “List all pages about history.”
  3. Wait for the agent response to complete.

Expected: The conversation panel shows a response that references at least two of the three history pages. The tool-call indicator for search_pages shows as completed (not failed). The text result is not a raw JSON blob — it is formatted as prose or a readable list. No [UNAVAILABLE: ...] annotation appears next to the tool’s name.

4. Tool result display — execution error renders clearly without crashing the panel

When a tool execution fails (e.g., ToolError::ExecutionFailed), the agent must surface the failure gracefully rather than entering an error state that breaks the conversation.

Steps:

  1. Send the agent message: “Get the page with slug ‘this-page-does-not-exist-xyz-404’.”
  2. Observe the agent panel response.

Expected: The agent acknowledges the page was not found (either via a tool error result surfaced to the LLM, or via the LLM’s natural response). The conversation panel does not crash, freeze, or show a blank state. The agent:tool-call event received by the frontend has status "failed" with an error field. The agent may retry or offer to help another way, but the UI remains operational.

5. Tool timeout handling — long-running tool calls do not block the UI indefinitely

The agent loop must not hang the frontend when a tool call takes unexpectedly long.

Steps:

  1. If the bridge exposes a configurable delay endpoint, trigger a tool call that is artificially delayed beyond the expected timeout threshold.
  2. Alternatively, simulate by sending a message that causes the agent to call a tool on a very large result set.
  3. Observe the conversation panel during and after the call.

Expected: The conversation panel remains responsive during the tool call. The tool-call indicator shows a loading/in-progress state. If the call times out, the agent loop surfaces an error and the panel does not freeze. Clicking “Interrupt” (if visible) cancels the session. The agent:session-interrupted event fires.
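The non-blocking behavior this scenario checks can be sketched with the standard Promise.race timeout pattern. This is not the harness's actual mechanism — just a generic illustration of how a slow tool call can be bounded without freezing the caller.

```typescript
// Generic timeout-guard sketch (scenario 5), not the harness's actual
// implementation. Races the tool call against a timer and clears the timer
// afterward so no dangling rejection is left behind.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("tool call timed out")), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // avoid an unhandled rejection from the losing timer
  }
}
```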

6. Available tools listed in the agent context — system prompt reflects tool inventory

The general_assistant skill’s system prompt includes an ## Available Tools section rendered by the PromptEngine. Available tools appear without restriction markers; unavailable tools are annotated with [UNAVAILABLE: ...].

Steps:

  1. Start the agent harness with full capabilities granted (owner context).
  2. Send the message: “What tools do you have access to?”
  3. Observe the agent’s response.

Expected: The agent describes its tool set. The response mentions workspace tools such as search_pages, read_page, get_page_tree, create_page, and related operations. No tool appears with an [UNAVAILABLE: ...] annotation when the owner’s full capability guard is active. The list is complete and matches the tools registered by ToolRegistryBuilder.
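The annotation format asserted above can be sketched as below. The [UNAVAILABLE: ...] format comes from this spec; the helper function itself is hypothetical, not the PromptEngine’s actual API.

```typescript
// Hypothetical helper illustrating the [UNAVAILABLE: ...] annotation format
// used in the rendered tool list; not the PromptEngine's real API.
function renderToolLine(name: string, unavailableReason?: string): string {
  return unavailableReason === undefined
    ? name
    : `${name} [UNAVAILABLE: ${unavailableReason}]`;
}
```

For example, `renderToolLine("create_page", "Requires PagesWrite capability")` yields `"create_page [UNAVAILABLE: Requires PagesWrite capability]"`, while an available tool renders as its bare name.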

7. Tool execution respects capability restrictions — unavailable tool returns permission denied

The ToolAvailability::Unavailable variant causes ToolRegistry::execute to return AgentError::PermissionDenied without invoking the tool. This must propagate correctly to the conversation.

Steps:

  1. Configure the agent with a restricted permission set that excludes PagesWrite (e.g., start agent as a non-owner user or use a test guard with only PagesRead).
  2. Send the message: “Create a page titled ‘Test Page’.”
  3. Observe the response.

Expected: The agent’s tool list shows create_page as [UNAVAILABLE: Requires PagesWrite capability]. When the agent attempts to call create_page, the call is rejected at the availability check without reaching the repository. The conversation shows the agent explaining it cannot create pages due to permission restrictions. No page is created in the workspace.
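The dispatch-time gate this scenario verifies can be sketched as follows. Shapes and names are illustrative; the real check lives in ToolRegistry::execute in Rust.

```typescript
// Sketch of the dispatch-time availability gate (scenario 7). Illustrative
// shapes only; the real logic is Rust's ToolRegistry::execute returning
// AgentError::PermissionDenied.
type Availability =
  | { kind: "Available" }
  | { kind: "Unavailable"; reason: string };

function gate(availability: Availability, execute: () => string): string {
  if (availability.kind === "Unavailable") {
    // execute() is never invoked for unavailable tools.
    throw new Error(`PermissionDenied: ${availability.reason}`);
  }
  return execute();
}
```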

8. Multiple sequential tool calls in one agent response — multi-turn chaining works

The agent loop supports multiple tool calls within a single agent session (up to max_turns). When the LLM chains tool calls across turns, all must complete and feed into the final response.

Steps:

  1. Create pages “Chapter 1”, “Chapter 2”, “Chapter 3” in a workspace.
  2. Send the agent message: “Read the content of all three chapter pages and summarize them together.”
  3. Observe the conversation panel as the agent processes.

Expected: The conversation panel shows multiple tool-call indicators in sequence (e.g., read_page called three times or search_pages followed by multiple read_page calls). Each indicator transitions from "started" to "completed". The agent’s final text response synthesizes content from all three pages. The loop does not abort early with a MaxTurnsExceeded error for this reasonable scope.
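The turn-loop shape behind this scenario can be sketched as below. This is entirely illustrative — the real loop is Rust's AgentLoop — but it shows how tool-call turns feed forward until a text answer or the max_turns cap.

```typescript
// Illustrative sketch of the turn loop (scenario 8): each LLM turn yields
// either tool calls (executed, results fed back) or a final text response.
// The real loop is Rust's AgentLoop with a max_turns cap.
type Turn = { toolCalls: string[] } | { text: string };

function runTurns(turns: Turn[], maxTurns: number): string {
  for (let i = 0; i < Math.min(turns.length, maxTurns); i++) {
    const t = turns[i];
    if ("text" in t) return t.text; // final response ends the session
    // Tool calls would execute here; their results join the history.
  }
  throw new Error("MaxTurnsExceeded");
}
```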

9. Tool with no meaningful return value — counter-style tools report cleanly

Some tools return a simple string result (such as an acknowledgement or counter value). The agent must display such results without treating an empty or minimal response as an error.

Steps:

  1. Send the agent a message that results in a tool call producing a minimal string response — for example: “Move the page ‘Chapter 1’ to be a child of ‘Chapter 2’.” (The move_page tool returns a success acknowledgement, not rich data.)
  2. Observe the conversation response.

Expected: The tool-call indicator shows "completed" (not "failed"). The agent’s response confirms the action was performed. The UI does not show an empty result as an error. The page hierarchy is updated in the workspace (verifiable by inspecting the sidebar or get_page_tree).

10. Tool registry lists all available tools in registration order

The ToolRegistry::definitions() method returns all registered tools in insertion order. This determines what appears in the LLM’s tool schema.

Steps:

  1. Start the agent harness.
  2. Query the agent status or inspect the agent initialization log.
  3. Send the message: “How many tools do you have available?”

Expected: The agent correctly enumerates its tools. The count matches the number of native workspace tools registered by ToolRegistryBuilder (at minimum: search_pages, read_page, get_page_tree, get_backlinks, get_outgoing_links, read_page_history, create_page, update_page_content, update_page_metadata, move_page, rename_page, delete_page). No duplicate tool names appear in the schema.

11. Invalid tool name handling — calling an unregistered tool returns ToolNotFound

When the LLM hallucinates a tool name that is not registered, the registry must return AgentError::ToolNotFound rather than panicking or silently no-opping.

Steps:

  1. Via the bridge API or a crafted agent session, attempt to call a tool named "destroy_all_data" (which does not exist in the registry).
  2. Observe the result.

Expected: The call returns a ToolNotFound error. The error message identifies the tool name. No workspace operation is performed. If this error propagates to the conversation, the agent acknowledges the tool is unavailable. The agent loop continues (the error is surfaced as a tool result, not a fatal crash).
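The lookup path this scenario exercises can be sketched as a typed error on a missing registry entry. Generic shapes here are illustrative; the real error is Rust's AgentError::ToolNotFound.

```typescript
// Sketch of the registry lookup (scenario 11): an unregistered name yields
// a ToolNotFound-style error that names the tool, rather than a panic or a
// silent no-op. Illustrative shapes only.
function lookupTool<T>(registry: Map<string, T>, name: string): T {
  const tool = registry.get(name);
  if (tool === undefined) {
    throw new Error(`ToolNotFound: '${name}'`);
  }
  return tool;
}
```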

12. Tool result size cap — oversized results are truncated to 100 KB

DEFAULT_MAX_TOOL_RESULT_BYTES is 100 KB (102,400 bytes). Results exceeding this threshold are truncated before being sent back to the LLM. The truncation must not break the conversation.

Steps:

  1. Create a page with very large content (e.g., 500+ KB of repeated text).
  2. Ask the agent to read that page by sending: “Read the content of ‘Large Page’.”
  3. Observe whether the response arrives without error.

Expected: The read_page tool result is truncated at the 100 KB boundary. The agent receives a partial result (with a truncation indicator if the implementation adds one) and generates a response — even if the response is based on partial content. No ToolExecutionFailed error is raised due to size alone. The conversation panel does not hang or show a timeout.
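The cap can be sketched as a byte-budget truncation. One subtlety worth modeling: truncating on a byte boundary must not split a multi-byte UTF-8 code point. This is illustrative only — the real constant and logic live in agent_loop.rs.

```typescript
// Sketch of the 100 KB result cap (scenario 12). Illustrative only; the
// real logic lives in agent_loop.rs. Trims any trailing partial UTF-8
// sequence so the truncated result is still valid text.
const MAX_TOOL_RESULT_BYTES = 102_400;

function capResult(result: string): string {
  const bytes = Buffer.from(result, "utf8");
  if (bytes.length <= MAX_TOOL_RESULT_BYTES) return result;
  let end = MAX_TOOL_RESULT_BYTES;
  // Back up past UTF-8 continuation bytes (0b10xxxxxx) to a char boundary.
  while (end > 0 && (bytes[end] & 0b1100_0000) === 0b1000_0000) end--;
  return bytes.subarray(0, end).toString("utf8");
}
```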

Test Data

| Key | Value | Notes |
| --- | --- | --- |
| search_tool_name | search_pages | Primary tool for content retrieval tests |
| read_tool_name | read_page | Used in sequential multi-tool chaining |
| create_tool_name | create_page | Requires PagesWrite capability |
| nonexistent_tool_name | destroy_all_data | Tool name guaranteed not to exist in any registry |
| max_tool_result_bytes | 102400 | 100 KB cap enforced by DEFAULT_MAX_TOOL_RESULT_BYTES |
| decay_rate | 0.995^hours | Not used in tool tests but referenced for system context |
| tool_sources | Workspace, External, System | Trust levels for boundary framing in LLM requests |
| process_types | Orchestrator, Researcher, Worker, SkillComposer | Researcher receives only read-only tools (get_, list_, search_, etc.) |
| read_only_prefixes | get_, list_, search_, read_, describe_ | Conventional prefixes for the is_read_only() default impl |
| capability_denied_msg | [UNAVAILABLE: Requires … capability] | Format shown in LLM schema for unavailable tools |
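The read_only_prefixes convention from the test data can be sketched as a simple prefix check. Note this models only the default behavior: real tools can override is_read_only() explicitly, and ToolFilter::ReadOnly honors the override regardless of name.

```typescript
// Sketch of the is_read_only() *default* convention: a prefix check over the
// conventional read-only prefixes listed in the test data. Tools may
// override this explicitly, which takes precedence over the name.
const READ_ONLY_PREFIXES = ["get_", "list_", "search_", "read_", "describe_"];

function isReadOnlyByPrefix(toolName: string): boolean {
  return READ_ONLY_PREFIXES.some((p) => toolName.startsWith(p));
}
```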

Notes

  • The HTTP bridge exposes agent interaction routes including send_agent_message. Tool-level assertions can be made by inspecting the event stream (agent:tool-call) or the agent’s final text response. Tool listing and schema scenarios (10–12) are verifiable without a live provider.
  • Scenarios that verify real LLM-driven tool output (1–9) require an API key for a configured provider (Anthropic, OpenAI, xAI, or Ollama). In keyless CI runs the StubLlmProvider is used, which returns “not configured” errors; these scenarios are marked as requiring a live provider and may be skipped.
  • The Researcher process type receives only read-only tools (those whose is_read_only() returns true). The ToolFilter::ReadOnly filter uses Tool::is_read_only() — not name prefix alone — so tools that override is_read_only() explicitly are gated correctly regardless of their name.
  • ToolAvailability::Unavailable is checked at dispatch time in ToolRegistry::execute. The tool’s execute() method is never called for unavailable tools; AgentError::PermissionDenied is returned directly.
  • Tool results carry a ToolSource (Workspace, External, System) that controls boundary framing in LLM requests. Native workspace tools default to Workspace. External MCP tools use External. This framing is applied by the agent loop and is not directly visible in the UI.
  • The DEFAULT_MAX_TOOL_RESULT_BYTES constant (100 KB) is defined in crates/infrastructure/agent-core/src/agent_loop.rs. Truncation happens before the result is added to the conversation history, so the stored ConversationState never contains oversized content.
