
# Skill System

Crate: `crates/infrastructure/agent-harness/`


Skills are multi-artifact packages, not single-template prompts. A skill packages together multiple artifacts of different kinds — descriptions, prompt templates, code templates, DSPy modules — that collectively define a reusable agent capability. The system dispatches to different handlers based on artifact kind, replacing the previous ExecutionMode enum (FreeForm/Templated/Blueprint).

  1. Skills are prompt content, not compiled code. A new skill is a new set of prompt documents, not a new crate version. Agent capabilities evolve independently of binary releases.
  2. Artifact-kind dispatch replaces execution modes. Instead of a skill declaring itself as “FreeForm” or “Blueprint”, each artifact declares its kind. The handler is selected per-artifact.
  3. Two-phase activation. Phase 1 (Context Pipeline) matches skill metadata cheaply via deterministic search. Phase 2 (Orchestrator) selects and loads specific artifacts. Full artifact content is never loaded until needed.
  4. Dual-scope storage. Skills live in agents.db at both account and workspace levels. Workspace skills override account skills by name.
  5. Dual execution contexts. code_template artifacts run in the WASM sandbox (CPython-in-Wasmtime). dspy_module artifacts are proxied to the Python sidecar (apps/python-sidecar/).
  6. Multi-artifact graceful degradation. dspy_module skills must include a prompt_template fallback for environments where the sidecar is unavailable.
  7. Template-only DSPy execution. dspy_module artifacts reference known shipped templates by ID and store serialized state (JSON) in state_blob — no user-supplied Python code enters the sidecar.

Skills are stored across two tables in agents.db:

```sql
-- Skill packages
CREATE TABLE skills (
    id TEXT PRIMARY KEY,                    -- UUID
    name TEXT NOT NULL UNIQUE,              -- Human-readable identifier
    description TEXT NOT NULL,              -- Short description for catalog display
    version TEXT NOT NULL DEFAULT '1.0.0',
    tags TEXT,                              -- JSON array of string tags
    source TEXT NOT NULL DEFAULT 'user',    -- 'system' | 'community' | 'user'
    author TEXT,                            -- Display name of author
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),

    -- Metadata for pipeline matching (Phase 1)
    intent_patterns TEXT,                   -- JSON array of intent pattern strings
    trigger_phrases TEXT,                   -- JSON array of trigger phrase strings

    -- Sync and distribution
    marketplace_id TEXT,                    -- NULL for local-only skills
    sync_etag TEXT                          -- For marketplace version checking
);

CREATE INDEX idx_skills_source ON skills(source);
CREATE INDEX idx_skills_name ON skills(name);

-- Skill artifacts (one-to-many with skills)
CREATE TABLE skill_artifacts (
    id TEXT PRIMARY KEY,                    -- UUID
    skill_id TEXT NOT NULL REFERENCES skills(id) ON DELETE CASCADE,
    kind TEXT NOT NULL,                     -- Artifact kind enum (see below)
    name TEXT NOT NULL,                     -- Human-readable artifact name
    ordinal INTEGER NOT NULL DEFAULT 0,     -- Display/execution ordering
    content TEXT NOT NULL,                  -- The artifact content (template, code, etc.)
    model_variant TEXT,                     -- NULL = default; else model family key
    metadata TEXT,                          -- JSON: kind-specific config
    state_blob BLOB,                        -- Serialized DSPy module state (dspy_module only)
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(skill_id, name, model_variant)
);

CREATE INDEX idx_skill_artifacts_skill_id ON skill_artifacts(skill_id);
CREATE INDEX idx_skill_artifacts_kind ON skill_artifacts(kind);
```
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum ArtifactKind {
    /// High-level description of the skill's purpose and approach.
    Description,
    /// MiniJinja template rendered with context variables before LLM call.
    PromptTemplate,
    /// Python code template executed in the RLM sandbox (WASM).
    CodeTemplate,
    /// Structured DSPy module executed in the Python sidecar.
    DspyModule,

    // -- Deferred kinds (schema supports, handlers not yet implemented) --
    // /// Strategy or methodology guidance (injected into system prompt).
    // Approach,
    // /// Concrete input/output examples for few-shot prompting.
    // Example,
}
```

V1 artifact kinds: description, prompt_template, code_template, dspy_module. The approach and example kinds are supported in the schema but their handlers are deferred.

Example skill: “Consistency Checker” — analyzes workspace content for contradictions. Its artifact package:

| Artifact | Kind | Purpose |
| --- | --- | --- |
| Overview | description | Explains what the skill does and when to use it |
| Check Prompt | prompt_template | MiniJinja template for per-page-pair contradiction analysis |
| Cross-Reference Script | code_template | Python script that traverses workspace pages, extracts claims, and clusters them for LLM review |
| DSPy Checker | dspy_module | References shipped chain_of_thought template with optimized state for contradiction detection |
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Consistency Checker",
  "description": "Analyzes workspace content for contradictions and inconsistencies across pages.",
  "version": "1.0.0",
  "tags": ["analysis", "worldbuilding", "quality"],
  "source": "system",
  "intent_patterns": ["check consistency", "find contradictions", "verify facts"],
  "trigger_phrases": ["are there any contradictions", "check for inconsistencies"],
  "artifacts": [
    {
      "kind": "description",
      "name": "Overview",
      "ordinal": 0,
      "content": "The Consistency Checker skill analyzes content across your workspace to identify contradictions, inconsistencies, and conflicting statements. It is most useful for worldbuilding projects with many interrelated entities (characters, locations, timelines)."
    },
    {
      "kind": "prompt_template",
      "name": "Check Prompt",
      "ordinal": 1,
      "content": "Given the following claims about \"{{ entity_name }}\":\n\n{% for claim in claims %}Claim {{ loop.index }} (from [[{{ claim.page_name }}]]): {{ claim.text }}\n{% endfor %}\n\nIdentify any contradictions between these claims. For each contradiction, cite the specific claims by number and explain the conflict."
    },
    {
      "kind": "code_template",
      "name": "Cross-Reference Script",
      "ordinal": 2,
      "content": "import inklings\n\nentity_type = '{{ entity_type }}'\npages = inklings.search(entity_type)\nclaims = []\nfor page in pages:\n    content = inklings.get_page(page['slug'])\n    extracted = inklings.llm_query(f'Extract factual claims about {entity_type} entities from: {content}')\n    claims.extend(extracted)\ninklings.submit({'entity_type': entity_type, 'claims': claims})"
    },
    {
      "kind": "dspy_module",
      "name": "DSPy Checker",
      "ordinal": 3,
      "content": "{\"template_id\": \"chain_of_thought\", \"signature\": \"claims -> contradictions\"}",
      "state_blob": "<base64-encoded JSON from DSPy module.save(path, save_program=False)>",
      "metadata": {
        "fallback_artifact": "Check Prompt"
      }
    }
  ]
}
```

### description

High-level explanation of the skill’s purpose, target audience, and when it should be activated. Loaded during Phase 2 (Orchestrator review) to confirm skill relevance before loading heavier artifacts.

| Property | Value |
| --- | --- |
| Handler | None (informational; injected into Orchestrator context) |
| Context cost | Low (~100-300 tokens) |
| When used | Always loaded when skill is activated |

### prompt_template

A MiniJinja template that is rendered with context variables (entity names, page content, user query, etc.) and sent as a user message to the LLM. This is the primary mechanism for structured LLM interactions.

| Property | Value |
| --- | --- |
| Handler | PromptEngine::render_str() -> LLM call |
| Context cost | Variable (depends on template + rendered context) |
| When used | Per-step during skill execution |

Model variants: A prompt_template artifact may have multiple rows with different model_variant values (e.g., "anthropic", "openai", "default"). The PromptEngine selects the best match for the active model. See PromptEngine Simplification.

### code_template

A Python script template executed in the RLM sandbox (CPython-in-Wasmtime). Template variables are rendered before execution. The script has access to the inklings host module for workspace queries and llm_query for sub-LM calls.

| Property | Value |
| --- | --- |
| Handler | Template render -> RlmExecutor::execute() |
| Execution context | WASM sandbox (Wasmtime) |
| Context cost | Zero (runs outside LLM context window) |
| When used | For workspace-scale analysis requiring iteration |

Security: Code templates run inside the WASM sandbox with fuel metering, memory limits, and no filesystem/network access. The inklings Python module is the only bridge to workspace data.

Host functions available: workspace_search, get_page, get_pages, get_pages_by_type, get_tags, get_references, get_history, llm_query, llm_query_batched, checkpoint, submit. See Agent Core System — Host Functions for the complete list.

### dspy_module

A structured DSPy module that defines an LLM program with declarative signatures, chain-of-thought reasoning, and optimizable prompts. Executed in the Python sidecar (apps/python-sidecar/), not the WASM sandbox, because DSPy requires full Python ecosystem access (native C extensions like pydantic-core, numpy).

| Property | Value |
| --- | --- |
| Handler | run_skill host function -> DSPy sidecar |
| Execution context | Python sidecar (PyInstaller binary) |
| Context cost | Zero (runs outside LLM context window) |
| When used | For structured LLM programs with DSPy optimization |

Template-only execution model: The dspy_module artifact does NOT contain Python source code. Instead, it references a known execution template by template_id and stores serialized DSPy state (JSON) in state_blob. The sidecar ships with a fixed set of DSPy module class definitions (e.g., Predict, ChainOfThought, ReAct, custom pipelines). At execution time, the sidecar instantiates the known class, calls .load() with the state blob, and executes.

This eliminates the untrusted code execution path entirely — no user-supplied Python code enters the sidecar runtime.

Artifact schema:

  • content: JSON containing template_id (references a shipped template) and execution parameters
  • state_blob: Serialized DSPy state (JSON from module.save(path, save_program=False)) — contains optimized few-shot demos, signature customizations, and LM settings
  • metadata.fallback_artifact: Points to a prompt_template artifact for graceful degradation

Execution architecture: A code_template script or the Orchestrator calls run_skill(skill_id, params) as a host function. This proxies execution to the Python sidecar process. The sidecar receives the template_id, state_blob, and parameters, instantiates the known template class, loads the optimized state, executes, and returns structured output.

LLM access: The sidecar uses InklingsLM, a custom DSPy LM subclass that routes all LLM completions through bidirectional IPC back to Rust, where the ProviderRegistry (Rig-based) handles the actual API call. Zero API keys exist in the sidecar environment.

Graceful degradation: Every dspy_module artifact must specify a fallback_artifact in its metadata field, pointing to a prompt_template artifact in the same skill package. If the sidecar is unavailable (not installed, failed to start), the system falls back to the prompt_template artifact transparently.

State management: DSPy modules persist optimized state in the state_blob column of skill_artifacts. The state is portable JSON produced by module.save(path, save_program=False) — containing optimized few-shot demos, signature definitions, and LM settings. Re-optimization is triggered by version upgrades (DSPy version bump in sidecar) via the Skill Composer.

Version upgrade: When the sidecar ships a new DSPy version, the app detects the version change and re-runs optimization for all skills with existing state_blob data via the Skill Composer. This is a background task on first launch after update.

DSPy sidecar process lifecycle:

```text
App startup
  |
  v
[Sidecar not started - lazy]
  |
  v  (first dspy_module execution or Skill Composer invocation)
[Start DSPy sidecar]
  |-- PyInstaller binary ships with app as Tauri externalBin
  |-- No setup required — self-contained
  |
  v
[Sidecar running - long-lived]
  |-- Service modes:
  |     |-- Execute: template_id + state_blob + params -> result
  |     |-- Optimize: template_id + training_data -> state_blob (JSON)
  |-- All LLM calls routed through IPC -> Rust -> ProviderRegistry
  |-- Template manifest queryable via IPC
  |
  v  (idle timeout or app shutdown)
[Sidecar stopped]
```

Python sidecar distribution:

The app ships a PyInstaller binary as a Tauri externalBin. This binary contains Python + DSPy + all dependencies (pydantic-core, litellm, numpy) in a single executable. No Python installation, venv setup, or package management required on the user’s machine.
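For illustration, spawning the bundled binary might look like the following sketch under Tauri v1's process API (Tauri v2 moves this into tauri-plugin-shell). The binary name `python-sidecar` is an assumption:

```rust
use tauri::api::process::{Command, CommandEvent};

// Spawn the bundled externalBin lazily, on first dspy_module execution.
// "python-sidecar" is a hypothetical sidecar name from tauri.conf.json.
let (mut rx, _child) = Command::new_sidecar("python-sidecar")
    .expect("sidecar binary not bundled")
    .spawn()
    .expect("failed to spawn DSPy sidecar");

// Forward sidecar stdout to the app log while the process runs.
tauri::async_runtime::spawn(async move {
    while let Some(event) = rx.recv().await {
        if let CommandEvent::Stdout(line) = event {
            log::debug!("sidecar: {line}");
        }
    }
});
```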

| Property | Value |
| --- | --- |
| Distribution | PyInstaller single-file executable (~70 MB per platform) |
| Tauri config | externalBin in tauri.conf.json |
| Dependencies | Python 3.12+ runtime, DSPy, pydantic, litellm, numpy |
| Setup required | None — self-contained binary |
| Update mechanism | Ships with app updates; version bump triggers re-optimization |

run_skill host function:

The bridge between the WASM sandbox (or Orchestrator) and the Python sidecar:

```text
Worker (in WASM or agent loop)
  |
  v
run_skill(skill_id, params)
  |
  v
[Load artifact: template_id + state_blob]
  |
  v
[IPC to DSPy sidecar process]
  |
  v
[Sidecar instantiates known template class by template_id]
[Sidecar loads optimized state from state_blob JSON]
[Sidecar executes module with params]
[LLM calls routed back through IPC -> Rust -> ProviderRegistry]
  |
  v
[Return structured result via IPC]
  |
  v
Worker receives result
```

Sidecar service modes:

  • Execute: `{type: "execute", template_id, state_blob, params}` -> instantiate template, load state, run, return result
  • Optimize: `{type: "optimize", template_id, training_data}` -> run DSPy compile(), return new state_blob JSON
  • Manifest: `{type: "manifest"}` -> return list of available templates with their signatures
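On the Rust side, these message shapes could be modeled as a tagged serde enum. A sketch, with field types assumed (the wire format above only names the fields):

```rust
use serde::{Deserialize, Serialize};

/// Sketch of the sidecar IPC request envelope for the three service modes.
/// Only the shape comes from the modes above; types are illustrative.
#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum SidecarRequest {
    Execute {
        template_id: String,
        state_blob: String,            // serialized DSPy state (JSON)
        params: serde_json::Value,
    },
    Optimize {
        template_id: String,
        training_data: serde_json::Value,
    },
    Manifest,
}
```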

Skill activation is split into two phases to minimize latency and token cost on every message.

### Phase 1: Deterministic Metadata Match (Context Pipeline)


The Context Pipeline performs embedding/keyword match against the user message and the skill catalog — a lightweight index of skill names, descriptions, tags, intent patterns, and trigger phrases. Full artifact content is NOT loaded.

The pipeline returns ranked skill recommendations as part of the ContextPackage. This phase is deterministic (no LLM call) and completes in under 100ms.

  • Input: user_message + skill_catalog_metadata (embedding/keyword index)
  • Output: SkillRecommendation[] (ranked by relevance score)
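The recommendation entry might look like this sketch; field names are assumptions consistent with the ranked output described above:

```rust
/// Sketch of a Phase 1 recommendation. Names are illustrative, not the
/// actual type in the crate.
pub struct SkillRecommendation {
    pub skill_id: String,
    pub name: String,
    /// Combined embedding/keyword relevance score used for ranking.
    pub relevance_score: f32,
    /// Which trigger phrase or intent pattern matched, if any.
    pub matched_pattern: Option<String>,
}
```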

### Phase 2: Artifact Selection (Orchestrator)


The Orchestrator receives the pipeline’s recommendations in its ContextPackage. It decides whether to activate a skill and, if so, which artifacts to load.

Artifact loading is lazy: only the artifacts needed for the current execution step are loaded from agents.db. A multi-artifact skill may load its description first, then load prompt_template or code_template or dspy_module artifacts one at a time during execution.

  • Input: ContextPackage (with skill_recommendations) + conversation history
  • Output: Decision (direct response | activate skill S with artifacts [A1, A2, ...] | research | clarify)
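The decision could be modeled as an enum mirroring the output shape above; variant and field names here are assumptions:

```rust
/// Sketch of the Orchestrator's Phase 2 decision.
pub enum OrchestratorDecision {
    DirectResponse,
    ActivateSkill {
        skill_id: String,
        artifact_ids: Vec<String>, // loaded lazily, one step at a time
    },
    Research,
    Clarify,
}
```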

There is no ExecutionMode enum. The skill system dispatches to the correct handler based on each artifact’s kind field. A single skill execution may invoke multiple handlers in sequence (e.g., load description, render prompt_template, execute code_template).

| Artifact Kind | Handler | Executor | Output |
| --- | --- | --- | --- |
| description | Context injection | N/A | Injected into Orchestrator/Worker context |
| prompt_template | PromptEngine::render_str() | LLM provider | LLM response text |
| code_template | Template render + RlmExecutor::execute() | Wasmtime (CPython) | Structured result from inklings.submit() |
| dspy_module | run_skill host function | Python sidecar | Structured result from inklings.submit() |

When a skill contains a dspy_module artifact, the dispatch logic checks whether the DSPy sidecar is available:

  1. Sidecar available: Execute the dspy_module artifact directly.
  2. Sidecar unavailable: Look up fallback_artifact from the artifact’s metadata. Execute the referenced prompt_template artifact instead.
  3. No fallback specified: Return an error indicating the skill requires the Python sidecar.

This ensures skills work across all deployment environments, with DSPy-optimized execution where available and prompt-based fallback where not.
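A minimal sketch of this decision path, assuming hypothetical helpers (`sidecar_available`, `fallback_artifact_name`, `find_artifact`) and illustrative error variants:

```rust
/// Sketch of the three-step fallback logic above; not the actual API.
async fn dispatch_with_fallback(
    dispatcher: &ArtifactDispatcher,
    skill: &Skill,
    artifact: &SkillArtifact,
    ctx: &ExecutionContext,
) -> Result<ArtifactOutput, SkillError> {
    // 1. Sidecar available (or not a dspy_module): dispatch directly.
    if artifact.kind != ArtifactKind::DspyModule || ctx.sidecar_available() {
        return dispatcher.dispatch(artifact, ctx).await;
    }
    match artifact.fallback_artifact_name() {
        // 2. Execute the referenced prompt_template artifact instead.
        Some(name) => {
            let fallback = skill
                .find_artifact(&name)
                .ok_or(SkillError::FallbackNotFound)?;
            dispatcher.dispatch(fallback, ctx).await
        }
        // 3. No fallback specified: the skill requires the sidecar.
        None => Err(SkillError::SidecarRequired),
    }
}
```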

```rust
#[async_trait]
pub trait ArtifactHandler: Send + Sync {
    /// The artifact kind this handler processes.
    fn kind(&self) -> ArtifactKind;

    /// Execute the artifact with the given context.
    /// Returns the handler's output (LLM response, RLM result, etc.).
    async fn execute(
        &self,
        artifact: &SkillArtifact,
        context: &ExecutionContext,
    ) -> Result<ArtifactOutput, SkillError>;
}

pub struct ArtifactDispatcher {
    handlers: HashMap<ArtifactKind, Arc<dyn ArtifactHandler>>,
}

impl ArtifactDispatcher {
    /// Dispatch an artifact to its registered handler.
    /// For dspy_module, checks sidecar availability and falls back if needed.
    pub async fn dispatch(
        &self,
        artifact: &SkillArtifact,
        context: &ExecutionContext,
    ) -> Result<ArtifactOutput, SkillError> {
        let handler = self
            .handlers
            .get(&artifact.kind)
            .ok_or(SkillError::NoHandler(artifact.kind))?;
        handler.execute(artifact, context).await
    }
}
```

Skills are stored in agents.db at two levels, mirroring the memory architecture:

| Scope | Database | Contains |
| --- | --- | --- |
| Account | ~/.inklings/agents.db | System skills, community skills, user account-level skills |
| Workspace | {workspace}/agents.db | Workspace-specific user skills, workspace overrides |

Both databases use the identical skills + skill_artifacts schema.

When the Context Pipeline queries the skill catalog or the Orchestrator loads artifacts, skills are resolved in this order:

  1. Workspace agents.db — workspace-specific skills take highest priority.
  2. Account agents.db — account-level skills (system, community, user).

If a skill with the same name exists at both levels, the workspace version wins. This allows users to fork a system skill into their workspace for customization without affecting other workspaces.

```rust
#[async_trait]
pub trait SkillStorageRepository: Send + Sync {
    /// List all skills visible at this scope (metadata only, no artifacts).
    async fn list_skills(&self) -> Result<Vec<SkillSummary>, SkillStorageError>;

    /// Get a skill by ID, including all its artifacts.
    async fn get_skill(&self, skill_id: &str) -> Result<Option<Skill>, SkillStorageError>;

    /// Get a skill by name (for resolution-order lookups).
    async fn get_skill_by_name(&self, name: &str) -> Result<Option<Skill>, SkillStorageError>;

    /// Get specific artifacts for a skill, filtered by kind.
    async fn get_artifacts(
        &self,
        skill_id: &str,
        kinds: &[ArtifactKind],
    ) -> Result<Vec<SkillArtifact>, SkillStorageError>;

    /// Get a specific artifact by ID.
    async fn get_artifact(&self, artifact_id: &str) -> Result<Option<SkillArtifact>, SkillStorageError>;

    /// Create a new skill (without artifacts).
    async fn create_skill(&self, skill: &Skill) -> Result<(), SkillStorageError>;

    /// Update skill metadata.
    async fn update_skill(&self, skill: &Skill) -> Result<(), SkillStorageError>;

    /// Delete a skill and all its artifacts (CASCADE).
    async fn delete_skill(&self, skill_id: &str) -> Result<(), SkillStorageError>;

    /// Add an artifact to a skill.
    async fn add_artifact(&self, artifact: &SkillArtifact) -> Result<(), SkillStorageError>;

    /// Update an artifact.
    async fn update_artifact(&self, artifact: &SkillArtifact) -> Result<(), SkillStorageError>;

    /// Delete an artifact.
    async fn delete_artifact(&self, artifact_id: &str) -> Result<(), SkillStorageError>;
}
```

The SkillCatalog aggregates both scopes and presents a unified view:

```rust
pub struct SkillCatalog {
    workspace_repo: Arc<dyn SkillStorageRepository>,
    account_repo: Arc<dyn SkillStorageRepository>,
}

impl SkillCatalog {
    /// Build the merged catalog with workspace-wins resolution.
    pub async fn list_all(&self) -> Result<Vec<SkillSummary>, SkillStorageError> {
        let mut workspace_skills = self.workspace_repo.list_skills().await?;
        let account_skills = self.account_repo.list_skills().await?;

        let workspace_names: HashSet<_> = workspace_skills
            .iter()
            .map(|s| s.name.clone())
            .collect();

        // Add account skills that don't have a workspace override.
        for skill in account_skills {
            if !workspace_names.contains(&skill.name) {
                workspace_skills.push(skill);
            }
        }

        Ok(workspace_skills)
    }
}
```
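A hypothetical call site, assuming the two repositories are constructed in the same module (the fields are private) and that SkillSummary exposes name and version:

```rust
// Merged view: a workspace skill shadows an account skill of the same name.
let catalog = SkillCatalog { workspace_repo, account_repo };
let skills = catalog.list_all().await?;
for skill in &skills {
    println!("{} (v{})", skill.name, skill.version);
}
```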

System skills do not ship as compiled-in strings. They ship as seed data in the initial agents.db schema migration and are refreshed from the cloud when connected. This avoids binary size bloat and enables skill updates without app releases.

| Channel | Mechanism | Offline Behavior |
| --- | --- | --- |
| System skills | Seeded in agents.db migration + cloud refresh | Available from seed data |
| Community skills | Marketplace catalog (cloud) | Cached locally after first download |
| User skills | Local only (created by Skill Composer or manual import) | Always available |

Community and system skills sync from the marketplace using ETags for efficient version checking:

  1. On app launch (or periodic background check), query marketplace for updated skills.
  2. Compare sync_etag with server response.
  3. If changed, download updated skill package and upsert into account-level agents.db.
  4. Workspace-level overrides are never modified by marketplace sync.
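A sketch of the per-skill check in steps 2-3. `MarketplaceClient`, its methods, and `SyncError` are hypothetical stand-ins; only the ETag comparison and account-scope upsert follow the text (assuming SkillSummary carries the marketplace_id and sync_etag columns from the schema):

```rust
/// Sketch of steps 2-3 of the marketplace sync loop above.
async fn refresh_if_changed(
    summary: &SkillSummary,
    marketplace: &MarketplaceClient,
    account_repo: &dyn SkillStorageRepository,
) -> Result<(), SyncError> {
    // Local-only skills (marketplace_id IS NULL) are never synced.
    let Some(marketplace_id) = &summary.marketplace_id else {
        return Ok(());
    };
    let server_etag = marketplace.fetch_etag(marketplace_id).await?;
    if summary.sync_etag.as_deref() != Some(server_etag.as_str()) {
        // Changed upstream: download and upsert into the account scope only;
        // workspace-level overrides are never touched.
        let package = marketplace.download(marketplace_id).await?;
        account_repo.update_skill(&package.skill).await?;
    }
    Ok(())
}
```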

Skill sync between devices uses the existing Supabase sync infrastructure. Only the workspace owner syncs skills to the cloud — this prevents skill duplication across collaborators. Non-owners receive system and community skills through the marketplace channel.

The PromptEngine is simplified from a struct-heavy rendering pipeline to a single render_str function that accepts flexible context.

```rust
pub struct PromptEngine {
    engine: minijinja::Environment<'static>,
}

impl PromptEngine {
    /// Render a template string with the given context values.
    /// Context is a flat key-value map -- the caller assembles it from whatever
    /// sources are relevant (user query, page content, memory, etc.).
    pub fn render_str(
        &self,
        template: &str,
        context: &HashMap<String, serde_json::Value>,
    ) -> Result<String, PromptEngineError>;
}
```

Key change: There is no rigid PromptContext struct. The caller assembles a HashMap<String, Value> from whatever context is relevant to the current execution step. This eliminates the need to define and maintain a struct that anticipates every possible context shape.
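For instance, a caller rendering the Check Prompt from the example skill might assemble context like this (`prompt_engine` and `template` are assumed to be in scope):

```rust
use std::collections::HashMap;
use serde_json::json;

// Hypothetical caller-side context assembly for a prompt_template render.
let mut ctx: HashMap<String, serde_json::Value> = HashMap::new();
ctx.insert("entity_name".into(), json!("Elara"));
ctx.insert("claims".into(), json!([
    { "page_name": "Elara", "text": "Born in the year 412." },
    { "page_name": "Timeline", "text": "Elara was born in 415." },
]));

let prompt = prompt_engine.render_str(template, &ctx)?;
```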

When a prompt_template artifact has multiple rows with different model_variant values, the PromptEngine selects the best match:

  1. Exact match on model family (e.g., "anthropic" for Claude models).
  2. Fall back to model_variant = NULL (the default variant).
  3. If no default exists, use the first variant by ordinal.

This enables per-model prompt optimization without branching logic in the skill definition.
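The selection order reduces to a short chain of lookups. A sketch, assuming `variants` holds all rows for one artifact name, already sorted by ordinal:

```rust
/// Sketch of the three-step variant selection described above.
fn select_variant<'a>(
    variants: &'a [SkillArtifact],
    model_family: &str,
) -> Option<&'a SkillArtifact> {
    variants
        .iter()
        .find(|a| a.model_variant.as_deref() == Some(model_family)) // 1. exact family match
        .or_else(|| variants.iter().find(|a| a.model_variant.is_none())) // 2. NULL default
        .or_else(|| variants.first()) // 3. first variant by ordinal
}
```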

Every artifact execution is recorded as a trace entry for debugging, cost tracking, and optimization feedback.

```sql
CREATE TABLE skill_execution_traces (
    id TEXT PRIMARY KEY,            -- UUID
    session_id TEXT NOT NULL,       -- Conversation session ID
    skill_id TEXT NOT NULL,         -- Which skill was activated
    artifact_id TEXT NOT NULL,      -- Which specific artifact was executed
    artifact_kind TEXT NOT NULL,    -- Artifact kind for fast filtering

    -- Execution details
    rendered_input TEXT,            -- Rendered template (after variable substitution)
    raw_output TEXT,                -- Raw LLM response or RLM result
    error TEXT,                     -- NULL on success; error message on failure

    -- Timing
    started_at TEXT NOT NULL,
    completed_at TEXT,
    duration_ms INTEGER,

    -- Cost
    input_tokens INTEGER,
    output_tokens INTEGER,
    model_id TEXT,                  -- Which model was used
    estimated_cost REAL,            -- Estimated cost in USD

    -- Assertions
    assertions_run INTEGER DEFAULT 0,
    assertions_passed INTEGER DEFAULT 0,

    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_traces_session_id ON skill_execution_traces(session_id);
CREATE INDEX idx_traces_skill_id ON skill_execution_traces(skill_id);
CREATE INDEX idx_traces_artifact_id ON skill_execution_traces(artifact_id);
```

Each artifact execution produces its own trace row. A single skill activation that executes 3 artifacts (description + prompt_template + code_template) produces 3 trace rows, all sharing the same session_id and skill_id but with different artifact_id values.

This enables:

  • Per-artifact cost tracking — which artifacts consume the most tokens?
  • Per-artifact timing — which artifacts are bottlenecks?
  • Per-artifact assertion results — which artifacts fail validation?
  • DSPy optimization feedback — traces feed back into offline prompt optimization.
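Per-artifact cost tracking, for example, reduces to a GROUP BY over the trace table. A sketch using rusqlite, where only the table and column names come from the schema above:

```rust
use rusqlite::Connection;

/// Total estimated cost per artifact for one skill, most expensive first.
fn cost_by_artifact(conn: &Connection, skill_id: &str) -> rusqlite::Result<Vec<(String, f64)>> {
    let mut stmt = conn.prepare(
        "SELECT artifact_id, COALESCE(SUM(estimated_cost), 0.0) AS cost
         FROM skill_execution_traces
         WHERE skill_id = ?1
         GROUP BY artifact_id
         ORDER BY cost DESC",
    )?;
    let rows = stmt.query_map([skill_id], |row| Ok((row.get(0)?, row.get(1)?)))?;
    rows.collect()
}
```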

Assertions validate artifact outputs at execution time. They are defined per-artifact and checked after each artifact execution.

| Type | Check | Example |
| --- | --- | --- |
| Structural | JSON schema, required fields, format constraints | “Output must contain a contradictions array” |
| Semantic | LLM-evaluated quality checks | “Output must reference specific page names from the workspace” |
| Length | Token/character bounds | “Output must be between 100 and 2000 characters” |
| Pattern | Regex match/no-match | “Output must not contain markdown code fences” |

Assertions are stored in the artifact’s metadata JSON field:

```json
{
  "assertions": [
    {
      "type": "structural",
      "check": "json_schema",
      "schema": {
        "type": "object",
        "required": ["contradictions"],
        "properties": {
          "contradictions": { "type": "array" }
        }
      }
    },
    {
      "type": "length",
      "check": "char_range",
      "min": 100,
      "max": 5000
    },
    {
      "type": "semantic",
      "check": "llm_eval",
      "prompt": "Does this output reference specific page names from the workspace? Answer yes or no.",
      "expected": "yes"
    }
  ]
}
```
```rust
#[async_trait]
pub trait Assertion: Send + Sync {
    /// Human-readable description of what this assertion checks.
    fn description(&self) -> &str;

    /// Evaluate the assertion against an artifact output.
    /// Returns Ok(()) on pass, Err with explanation on failure.
    async fn evaluate(
        &self,
        output: &ArtifactOutput,
        context: &ExecutionContext,
    ) -> Result<(), AssertionFailure>;
}

pub struct AssertionFailure {
    pub assertion_description: String,
    pub explanation: String,
    /// Whether this is a hard failure (abort) or soft failure (warn + continue).
    pub severity: AssertionSeverity,
}

#[derive(Debug, Clone, Copy)]
pub enum AssertionSeverity {
    /// Abort skill execution on failure.
    Hard,
    /// Log warning but continue execution.
    Soft,
}
```
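As an illustration, the `char_range` check from the JSON above could be implemented against this trait. A sketch, assuming a hypothetical `ArtifactOutput::as_text` accessor:

```rust
use async_trait::async_trait;

/// Sketch of a `length` assertion (`char_range` above).
pub struct CharRangeAssertion {
    pub min: usize,
    pub max: usize,
}

#[async_trait]
impl Assertion for CharRangeAssertion {
    fn description(&self) -> &str {
        "output length is within the configured character range"
    }

    async fn evaluate(
        &self,
        output: &ArtifactOutput,
        _context: &ExecutionContext,
    ) -> Result<(), AssertionFailure> {
        let len = output.as_text().chars().count();
        if (self.min..=self.max).contains(&len) {
            Ok(())
        } else {
            Err(AssertionFailure {
                assertion_description: self.description().to_string(),
                explanation: format!("output is {len} chars; expected {}..={}", self.min, self.max),
                severity: AssertionSeverity::Soft, // warn + continue
            })
        }
    }
}
```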

The assertion framework produces feedback compatible with DSPy’s offline optimization pipeline:

  • Each assertion evaluation produces a binary pass/fail signal.
  • Trace rows record assertions_run and assertions_passed counts.
  • Exported traces can be formatted as DSPy evaluation datasets for prompt optimization.

The Skill Composer agent (see Process Model) is the primary interface for creating and modifying skills.

  1. User initiates: “Create a skill that checks timeline consistency.”
  2. Orchestrator dispatches Skill Composer with the creation request.
  3. Skill Composer generates an initial skill package:
    • description artifact (what the skill does)
    • prompt_template artifact (the main analysis prompt)
    • Optional code_template artifact (batch analysis script)
  4. Skill Composer validates the package (schema compliance, template syntax).
  5. Orchestrator presents the proposed skill to the user.
  6. User provides feedback (“add a dspy_module for optimized execution”).
  7. Orchestrator routes back to Skill Composer for iterative refinement.
  8. Skill Composer adds/modifies artifacts, re-validates.
  9. Repeat until the user approves.
  10. Skill is saved to the appropriate scope (workspace or account agents.db).

When the Python sidecar is available, the Skill Composer can optimize prompt_template artifacts using DSPy:

  1. Skill Composer generates a dspy_module artifact wrapping the prompt logic.
  2. DSPy compiles the module against assertion-based evaluation metrics.
  3. Optimized state is stored in the state_blob column.
  4. The prompt_template artifact remains as fallback.

Users can also create skills by importing JSON packages directly (e.g., shared by another user or exported from a different workspace). The import path validates the package schema before persisting.
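A sketch of that import path, where `SkillPackage`, `validate`, and `SkillImportError` are hypothetical; only the order of operations (validate before persisting) comes from the text:

```rust
/// Parse, validate, then persist an imported skill package.
pub async fn import_skill_package(
    json: &str,
    repo: &dyn SkillStorageRepository,
) -> Result<(), SkillImportError> {
    let package: SkillPackage = serde_json::from_str(json)?; // schema-shaped parse
    package.validate()?;                                     // reject malformed packages early
    repo.create_skill(&package.skill).await?;
    for artifact in &package.artifacts {
        repo.add_artifact(artifact).await?;
    }
    Ok(())
}
```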
