
Agent Skills & Execution Modes

Covers the full skill lifecycle in the agent harness: loading system skills from the embedded registry; FreeForm, Templated, and Blueprint execution modes; skill parameter passing; result display; catalog listing; invalid skill name handling; execution errors; capability-gated skills; and the distinction between system skills and user-authored skills. This spec is P1 because skills are the primary surface through which the agent’s behavior is customized: when skill dispatch breaks, the agent either silently falls back to general_assistant or returns an error, and the user cannot tell why their specialized skill is not active.

Skills are defined using TOML frontmatter and a MiniJinja template body, stored as .skill files. System skills are embedded at compile time via include_str!() in SkillRegistry::new(). The registry holds 6 system skills: general_assistant, workspace_researcher, content_editor, proactive_organization_audit, proactive_consistency_check, and proactive_relationship_discovery. Each skill declares an execution_mode (FreeForm, Templated, or Blueprint), required_capabilities, and optionally typed parameters. Skills are selected via the skill parameter on send_agent_message (maps to HarnessConfig::skill_id). The PromptEngine renders the MiniJinja template with workspace context before the AgentLoop runs.
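As a rough illustration of the file layout described above, the sketch below splits a .skill source into its frontmatter and template body. It assumes the frontmatter sits between two `---` lines, as the Test Data table states. The `splitSkillFile` helper and the sample skill text are hypothetical; the real parsing happens in Rust inside SkillRegistry and may differ.

```typescript
// Hypothetical helper, not part of the harness: separates the raw TOML
// frontmatter from the MiniJinja template body of a .skill file.
interface SkillFileParts {
  frontmatter: string; // raw TOML text, left unparsed in this sketch
  body: string;        // MiniJinja template source
}

function splitSkillFile(source: string): SkillFileParts | null {
  const lines = source.split("\n");
  if (lines.length === 0 || lines[0].trim() !== "---") {
    return null; // no opening delimiter
  }
  for (let i = 1; i < lines.length; i++) {
    if (lines[i].trim() === "---") {
      return {
        frontmatter: lines.slice(1, i).join("\n"),
        body: lines.slice(i + 1).join("\n"),
      };
    }
  }
  return null; // opening delimiter was never closed
}

// Illustrative skill file; the field values are made up for this example.
const sampleSkill = [
  "---",
  'id = "example_skill"',
  'name = "Example Skill"',
  'execution_mode = "FreeForm"',
  'required_capabilities = ["PagesRead"]',
  "---",
  "You are working in {{ workspace_name }}.",
].join("\n");

const sampleParts = splitSkillFile(sampleSkill);
```

A test fixture could use a helper like this to sanity-check hand-written skill files before handing them to the harness.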

Preconditions

HTTP bridge running on port 9990
A workspace initialized via initialize_workspace before each scenario
Agent harness started via start_agent
Bridge shim injected via playwright.config.ts

Scenarios

Seed: seed.spec.ts

1. Skill loading from registry — all six system skills are available at startup

The SkillRegistry::new() constructor parses all embedded .skill files and caches them in memory. All six system skills must be present immediately after harness construction, before any user interaction.

Steps:

Start the agent harness via start_agent.
Send the message: “What skills do you have available?” (or invoke a bridge endpoint that lists skills if available).
Observe the response or skill list.

Expected: The agent or API response confirms at least 6 skills are available. The IDs general_assistant, workspace_researcher, content_editor, proactive_organization_audit, proactive_consistency_check, and proactive_relationship_discovery are all present. No skill parse errors appear in the logs. Skills with unknown capability strings are still loadable (unknown capability strings are skipped with a warning, not a fatal error).
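The presence check in this scenario can be expressed as a small pure function. This is a minimal sketch: `missingSystemSkills` is a hypothetical helper, and how the list of IDs is obtained (agent reply, bridge endpoint) is left to the test.

```typescript
// The six system skill IDs named in this spec.
const SYSTEM_SKILL_IDS: string[] = [
  "general_assistant",
  "workspace_researcher",
  "content_editor",
  "proactive_organization_audit",
  "proactive_consistency_check",
  "proactive_relationship_discovery",
];

// Hypothetical helper: returns the system skill IDs absent from a listing.
// Extra IDs (for example user skills) are ignored, which matches the
// "at least 6 skills" wording of the expectation.
function missingSystemSkills(listedIds: string[]): string[] {
  return SYSTEM_SKILL_IDS.filter(function (id) {
    return listedIds.indexOf(id) === -1;
  });
}
```

A test would then assert that the returned array is empty, which also yields a readable failure message naming the absent IDs.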

2. FreeForm execution mode — agent uses system template as system prompt with full LLM latitude

The FreeFormExecutor passes the skill’s system_template directly to AgentLoop as the system prompt and lets the LLM respond freely within the tool and turn budget. general_assistant is a FreeForm skill with execution_mode = "FreeForm".

Steps:

Send a message to the agent using the general_assistant skill (default, no explicit skill parameter): “Help me brainstorm names for a dragon character.”
Observe the response.

Expected: The agent responds with creative dragon name suggestions — demonstrating full LLM latitude (FreeForm mode applies no output template or structural constraints). The agent:session-started Tauri event fires with skillId: "general_assistant". The response is conversational prose, not a structured template output. Token usage is reported in the session completion event.

3. FreeForm execution mode — skill system_template is rendered with workspace context

The PromptEngine renders the skill’s system_template as a MiniJinja template with context variables including workspace_name, available_tools, and unavailable_tools. The rendered prompt must reflect the actual workspace name.

Steps:

Create a workspace named “The Thornwood Chronicles”.
Start the agent and send any message using general_assistant.
Ask the agent: “What workspace are you working in?”

Expected: The agent responds with “The Thornwood Chronicles” (or references it in context). The MiniJinja {{ workspace_name }} variable was correctly substituted during prompt rendering. The available tools list is populated from the registry — no empty tool list.
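If the rendered system prompt is available through log inspection, the substitution can be checked directly rather than inferred from the agent’s reply. The sketch below assumes the test already holds the rendered prompt as a string; `checkRenderedPrompt` is a hypothetical helper, and the regular expression only detects leftover `{{ ... }}` expressions, not other MiniJinja syntax.

```typescript
interface PromptCheck {
  hasWorkspaceName: boolean;
  unrenderedPlaceholders: string[];
}

// Hypothetical helper for assertions on a rendered system prompt.
function checkRenderedPrompt(renderedPrompt: string, workspaceName: string): PromptCheck {
  // Matches leftover expression tags such as `{{ workspace_name }}`.
  const leftovers = renderedPrompt.match(/\{\{[^}]*\}\}/g);
  return {
    hasWorkspaceName: renderedPrompt.indexOf(workspaceName) !== -1,
    unrenderedPlaceholders: leftovers === null ? [] : leftovers,
  };
}
```

A passing render has `hasWorkspaceName` set to true and an empty `unrenderedPlaceholders` array.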

4. Templated execution mode — skill requires RLM feature flag; graceful degradation when unavailable

The Templated execution mode requires the rlm Cargo feature to be enabled. When rlm is not compiled in, the RlmUnavailableExecutor returns an error with a clear message rather than silently falling back to FreeForm.

Steps:

Attempt to invoke a skill with execution_mode = "Templated" in a build where the rlm feature is not enabled.
Observe the agent’s response.

Expected: The agent (or harness) surfaces the error: “skill requires Templated execution mode, which needs the ‘rlm’ feature to be enabled”. The stop_reason is RlmUnavailable. The conversation panel shows an error state or the agent apologizes that the skill is currently unavailable. No silent fallback to FreeForm behavior occurs — the mode mismatch is reported explicitly.

5. Blueprint execution mode — skill requires RLM feature flag; graceful degradation when unavailable

The Blueprint execution mode also requires the rlm Cargo feature. When unavailable, the same RlmUnavailableExecutor path applies.

Steps:

Attempt to invoke a skill with execution_mode = "Blueprint" in a build where the rlm feature is not enabled.
Observe the agent’s response.

Expected: The harness returns a StopReason::RlmUnavailable error. The error message identifies the skill name and states that Blueprint mode requires the RLM feature. The response is a clear error, not a garbled partial execution. This mirrors the Templated scenario — both non-FreeForm modes require the same RLM infrastructure.
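Scenarios 4 and 5 share one rule: a non-FreeForm mode without the rlm feature must report RlmUnavailable. The sketch below encodes that rule so a test can derive its expectation from the feature flag state. `expectedOutcome` is a hypothetical helper, and how a test detects whether rlm was compiled in is outside this sketch.

```typescript
type ExecutionMode = "FreeForm" | "Templated" | "Blueprint";

interface ExpectedOutcome {
  // True when the run should end with stop reason RlmUnavailable.
  expectRlmUnavailable: boolean;
}

// Hypothetical helper: rlmEnabled must be supplied by the test setup.
function expectedOutcome(mode: ExecutionMode, rlmEnabled: boolean): ExpectedOutcome {
  const needsRlm = mode === "Templated" || mode === "Blueprint";
  return { expectRlmUnavailable: needsRlm && !rlmEnabled };
}
```

FreeForm never expects RlmUnavailable, regardless of the flag, so the same helper can drive all three modes in a parameterized test.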

6. Skill parameter passing — parameters declared in frontmatter are substituted in the template

Skills can declare typed parameters in their TOML frontmatter. Required parameters without defaults must be provided at invocation time. Parameters are injected into the MiniJinja template context.

Steps:

Define a test skill (or use an existing skill that declares a parameter, such as a search query parameter).
Invoke the skill via send_agent_message with the skill parameter set and a message that maps to the declared parameter.
Observe the rendered system prompt (via log inspection or the agent’s response phrasing).

Expected: The agent’s behavior reflects the injected parameter. For example, if a skill declares name = "query", type = "string", required = true, the template renders the supplied query value in the system prompt. Omitting a required parameter that has no default results in a template render error (HarnessError::TemplateRenderFailed). Optional parameters with defaults fall back to their declared default values when not provided.
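The three cases in this expectation (provided value, default fallback, missing required parameter) can be modeled as follows. This is a sketch of the rule only, not of the harness code: `ParamDecl`, `resolveParams`, and the `depth` parameter are hypothetical, and a non-empty `missing` list stands in for the HarnessError::TemplateRenderFailed outcome.

```typescript
interface ParamDecl {
  name: string;
  required: boolean;
  defaultValue?: string; // absent when the frontmatter declares no default
}

interface ResolvedParams {
  values: { [key: string]: string };
  missing: string[]; // non-empty models the template render failure case
}

// Hypothetical helper: resolves declared parameters against provided values.
function resolveParams(
  decls: ParamDecl[],
  provided: { [key: string]: string }
): ResolvedParams {
  const values: { [key: string]: string } = {};
  const missing: string[] = [];
  decls.forEach(function (decl) {
    if (Object.prototype.hasOwnProperty.call(provided, decl.name)) {
      values[decl.name] = provided[decl.name]; // explicit value wins
    } else if (decl.defaultValue !== undefined) {
      values[decl.name] = decl.defaultValue;   // fall back to the default
    } else if (decl.required) {
      missing.push(decl.name);                 // required, no value, no default
    }
  });
  return { values: values, missing: missing };
}
```

Optional parameters with neither a value nor a default are simply left out of `values` in this model; whether the real template context does the same is not stated in this spec.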

7. Skill result display — execution result is shown in the conversation panel

The ExecutionResult from a skill run includes output (final text), stop_reason, execution_time_ms, and usage (token counts). These must be surfaced in the conversation panel in a readable form.

Steps:

Send a message to the agent using the workspace_researcher skill: “Research all pages that mention kings and kingdoms.”
Wait for the agent session to complete.

Expected: The conversation panel shows the agent’s research findings (based on ExecutionResult::output). The agent:session-completed Tauri event fires with stopReason: "end_turn" (or "max_turns" if the turn budget was exhausted). No raw JSON is shown to the user — the output is rendered as human-readable text. Token usage is logged internally.

8. Skill catalog listing — all available skills are enumerable

The SkillRegistry::list() method returns metadata for all cached skills. A UI that surfaces skill selection must be able to enumerate all available skills.

Steps:

Open the agent panel and navigate to the skill selector (if exposed in the UI) or query the available skills via the agent API.
Observe the listed skills.

Expected: At least the 6 system skill names appear: General Assistant, Workspace Researcher, Content Editor, Proactive Organization Audit, Proactive Consistency Check, Proactive Relationship Discovery. Each entry shows the skill’s name, description, and version from the TOML frontmatter. Skills sourced from SkillSource::System are visually distinguished from user-created (SkillSource::User) skills if both types are present.

9. Invalid skill name handling — requesting a non-existent skill fails gracefully

When send_agent_message is called with a skill ID that does not exist in the SkillRegistry, the harness must return a meaningful error rather than proceeding with an empty system prompt.

Steps:

Send a message to the agent with an invalid skill ID: send_agent_message("Do something", skill: "does_not_exist_skill_xyz").
Observe the agent panel or API response.

Expected: The harness returns a HarnessError::SkillFetchFailed error (or equivalent). The error message identifies the unknown skill ID. The conversation panel shows an error state or a message explaining the skill is unavailable. No session starts with an empty system prompt — the failure is caught before AgentLoop is invoked.
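The fail-before-start behavior can be pictured as a lookup that runs ahead of any session setup. The sketch below is a model for test reasoning, not the harness implementation: `resolveSkillId` and the error string format are hypothetical, while the default of general_assistant for an omitted skill follows the Notes section.

```typescript
const DEFAULT_SKILL_ID = "general_assistant";

interface SkillLookup {
  skillId: string;
  error: string | null; // non-null means the session must not start
}

// Hypothetical helper: resolves the requested skill before any agent loop runs.
function resolveSkillId(requested: string | undefined, knownIds: string[]): SkillLookup {
  const skillId = requested === undefined ? DEFAULT_SKILL_ID : requested;
  if (knownIds.indexOf(skillId) === -1) {
    // The message text is illustrative; the spec only requires that the
    // unknown skill ID is identified.
    return { skillId: skillId, error: "SkillFetchFailed: unknown skill '" + skillId + "'" };
  }
  return { skillId: skillId, error: null };
}
```

Because an omitted skill resolves to the default, the invalid-name scenario only exercises the error path when an explicit, non-existent ID is passed.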

10. Skill execution error handling — LLM error during FreeForm execution surfaces cleanly

When the underlying LLM call fails during FreeForm skill execution (e.g., provider not configured, API key expired), the FreeFormExecutor surfaces the error as a HarnessError::AgentError(AgentError::LlmError(...)).

Steps:

Configure the agent with a stub LLM provider that always fails (StubLlmProvider with no API key configured).
Send any message using any skill.
Observe the conversation panel response.

Expected: The agent panel shows an error state (not a silent hang). The error message indicates the LLM provider is not configured or the API call failed. The agent:status-changed event fires with status "error". No partial or garbled response is displayed. The harness remains operable — a subsequent start_agent call (after configuring a valid provider) can recover the session.

11. Skill with required capabilities — skill is gated by the agent’s permission guard

Skills declare required_capabilities in their TOML frontmatter (e.g., ["PagesRead", "SearchUse"]). When the active permission guard does not include a required capability, the skill’s tools are marked unavailable and the skill may decline to execute certain operations.

Steps:

Start the agent with a restricted permission guard that excludes PagesWrite.
Invoke the content_editor skill (which requires write capability) with the message: “Edit the introduction of the ‘Dragon Lore’ page to add a new paragraph.”
Observe the agent’s response.

Expected: The skill loads and runs (the harness does not block skill activation based on capabilities — only individual tool calls are gated). However, tools that require PagesWrite are marked [UNAVAILABLE: Requires PagesWrite capability] in the LLM schema. The agent responds that it cannot make the edit due to permission restrictions. The page content is not modified.
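The gating rule here applies per tool, not per skill. A minimal sketch of that rule follows, assuming each tool declares the capabilities it needs. `gateTools` and the tool names `read_page` and `update_page` are hypothetical; only the marker text format is taken from the expectation above.

```typescript
interface ToolDecl {
  name: string;
  requires: string[]; // capability names, e.g. "PagesWrite"
}

interface ToolView {
  name: string;
  available: boolean;
  marker: string | null; // annotation shown in the LLM schema when unavailable
}

// Hypothetical helper: marks each tool by comparing its requirements
// against the capabilities granted by the permission guard.
function gateTools(tools: ToolDecl[], granted: string[]): ToolView[] {
  return tools.map(function (tool): ToolView {
    const missing = tool.requires.filter(function (cap) {
      return granted.indexOf(cap) === -1;
    });
    if (missing.length === 0) {
      return { name: tool.name, available: true, marker: null };
    }
    return {
      name: tool.name,
      available: false,
      marker: "[UNAVAILABLE: Requires " + missing.join(", ") + " capability]",
    };
  });
}
```

With a guard that grants only PagesRead, a read tool stays available while a write tool carries the PagesWrite marker, which is the state this scenario expects the LLM to see.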

12. System skills vs user skills — distinction is maintained in storage and listing

SkillSource::System skills are embedded in the binary and never written to the SkillStorageRepository. SkillSource::User skills are created by users and persisted in agents.db. The listing must correctly report the source for each skill.

Steps:

Check that general_assistant is listed as a system skill (source: system).
Create a user skill via the UI or API with a custom ID (e.g., my_custom_skill) and a FreeForm template.
List all skills and filter by source.

Expected: general_assistant (and the other 5 system skills) appear with source: "system". The newly created skill appears with source: "user". System skills and user skills are independently filterable via list_skills(source: Some(SkillSource::User)). Deleting the user skill via delete_user_skill removes it from the user list without affecting system skills. System skills cannot be deleted (they are binary-embedded, not stored in the DB).
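The listing and deletion rules can be modeled on an in-memory catalog. This is a sketch under stated assumptions: `listBySource` and `deleteUserSkill` are hypothetical TypeScript stand-ins for the list_skills and delete_user_skill operations named above, and the real operations are backed by the binary and agents.db rather than by an array.

```typescript
type SkillSource = "system" | "user" | "marketplace";

interface SkillEntry {
  id: string;
  source: SkillSource;
}

// Hypothetical stand-in for list_skills with an optional source filter.
function listBySource(skills: SkillEntry[], source?: SkillSource): SkillEntry[] {
  if (source === undefined) {
    return skills.slice();
  }
  return skills.filter(function (s) {
    return s.source === source;
  });
}

// Hypothetical stand-in for delete_user_skill: only user entries are removed,
// so a request naming a system skill leaves the catalog unchanged.
function deleteUserSkill(skills: SkillEntry[], id: string): SkillEntry[] {
  return skills.filter(function (s) {
    return !(s.id === id && s.source === "user");
  });
}
```

Both functions return new arrays, which keeps before-and-after comparisons in a test straightforward.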

Test Data

Key	Value	Notes
system_skill_general_assistant	general_assistant	Default skill; FreeForm; no required_capabilities
system_skill_workspace_researcher	workspace_researcher	FreeForm; search/read-focused
system_skill_content_editor	content_editor	FreeForm; requires PagesRead + PagesWrite for full operation
system_skill_proactive_org	proactive_organization_audit	Used by proactive suggestion engine
system_skill_proactive_consist	proactive_consistency_check	Used by proactive suggestion engine
system_skill_proactive_rel	proactive_relationship_discovery	Used by proactive suggestion engine
total_system_skills	6	Minimum count enforced by registry tests
freeform_executor_default_model	claude-sonnet-4-6	Default in FreeFormConfig::default()
freeform_executor_default_turns	20	Default max_turns in FreeFormConfig::default()
harness_default_turns	20	HarnessConfig::default() max_turns
skill_file_format	TOML frontmatter between --- delimiters	Frontmatter fields include id, name, description, version, execution_mode, required_capabilities
rlm_feature_flag	rlm (Cargo feature)	Required for Templated and Blueprint modes; absent = RlmUnavailableExecutor
stop_reason_rlm_unavailable	RlmUnavailable	Returned when Templated/Blueprint skill runs without rlm feature
skill_source_system	system	Binary-embedded; not stored in DB; not deletable
skill_source_user	user	Created by user; persisted in agents.db via SqliteSkillRepository
skill_source_marketplace	marketplace	Downloaded from marketplace; cached in agents.db
invalid_skill_error	HarnessError::SkillFetchFailed	Returned when skill_id not found in registry or DB

Notes

Skills are selected via the optional skill parameter of send_agent_message. When None is provided, the harness defaults to "general_assistant" (see send_message() in apps/desktop/src-tauri/src/agent.rs). The test for invalid skill names must pass a non-None skill ID.
The PromptEngine renders the skill’s system_template using MiniJinja. Template variables include workspace_name, available_tools (list of available tools), and unavailable_tools (list of (tool, reason) pairs). The {% block %} syntax in system skills allows sub-skills or overrides.
FreeForm skills use FreeFormExecutor, which wraps AgentLoop from agent-core. The FreeFormConfig::default() sets model = "claude-sonnet-4-6" and max_turns = 20. These can be overridden by the HarnessConfig passed to send_message.
Templated and Blueprint modes are gated behind the rlm Cargo feature, so they are functional only in desktop builds compiled with rlm enabled. E2e tests for Templated/Blueprint scenarios should detect the feature flag state and adjust expectations accordingly.
The SkillRegistry uses a cache-first lookup: system skills loaded at construction time are never replaced by cloud fetches for the same ID (cloud fetcher is only called on cache miss). This prevents cloud skills from shadowing system skills.
required_capabilities in the skill frontmatter are parsed as domain::Capability enums (e.g., "PagesRead" → Capability::PagesRead). Unknown capability strings are skipped with a warn! log rather than failing the skill parse. Skills with empty required_capabilities run against any permission guard.
The HTTP bridge exposes skill management routes including skill catalog listing. Skill listing is also accessible via the agent conversation itself (“what skills do you have?”). Direct DB queries against agents.db’s skills table can verify SkillSource for user/marketplace skills.
ExecutionTrace (from skill/traces.rs) records the full execution trace for Blueprint mode and is used by DSPy-style assertion checking. FreeForm mode does not populate fuel_consumed (that field is RLM-only). E2e tests for FreeForm can assert fuel_consumed: null in completion events if the event payload exposes this field.
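The cache-first lookup described in the notes above can be sketched as follows. This is a simplified model, not the SkillRegistry code: `makeRegistry` and `CachedSkill` are hypothetical, the cloud fetcher is synchronous here for brevity, and caching a fetched skill after a miss is an assumption of the sketch.

```typescript
interface CachedSkill {
  id: string;
  origin: string; // "system" or "cloud" in this sketch
}

// Hypothetical model of a cache-first registry with a cloud fallback.
function makeRegistry(
  systemSkills: CachedSkill[],
  cloudFetch: (id: string) => CachedSkill | null
) {
  const cache: { [id: string]: CachedSkill } = {};
  systemSkills.forEach(function (s) {
    cache[s.id] = s; // system skills are loaded at construction time
  });
  let cloudCalls = 0;
  return {
    get: function (id: string): CachedSkill | null {
      if (Object.prototype.hasOwnProperty.call(cache, id)) {
        return cache[id]; // cache hit: the cloud fetcher is never consulted
      }
      cloudCalls += 1;
      const fetched = cloudFetch(id);
      if (fetched !== null) {
        cache[id] = fetched; // assumption: fetched skills are cached
      }
      return fetched;
    },
    cloudCallCount: function (): number {
      return cloudCalls;
    },
  };
}
```

Because a system skill ID always hits the cache, a cloud skill with the same ID can never be returned in its place, which is the shadowing guarantee the note describes.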
