
# Skill System

Crate: `crates/infrastructure/agent-harness/`


Skills are multi-artifact packages, not single-template prompts. A skill packages together multiple artifacts of different kinds — descriptions, prompt templates, code templates, DSPy modules — that collectively define a reusable agent capability. The system dispatches to different handlers based on artifact kind, replacing the previous ExecutionMode enum (FreeForm/Templated/Blueprint).

  1. Skills are prompt content, not compiled code. A new skill is a new set of prompt documents, not a new crate version. Agent capabilities evolve independently of binary releases.
  2. Artifact-kind dispatch replaces execution modes. Instead of a skill declaring itself as “FreeForm” or “Blueprint”, each artifact declares its kind. The handler is selected per-artifact.
  3. Two-phase activation. Phase 1 (Context Pipeline) matches skill metadata cheaply via deterministic search. Phase 2 (Orchestrator) selects and loads specific artifacts. Full artifact content is never loaded until needed.
  4. Dual-scope storage. Skills live in agents.db at both account and workspace levels. Workspace skills override account skills by name.
  5. Dual execution contexts. code_template artifacts run in the WASM sandbox (CPython-in-Wasmtime). dspy_module artifacts are proxied to the Python sidecar (apps/python-sidecar/).
  6. Multi-artifact graceful degradation. dspy_module skills must include a prompt_template fallback for environments where the sidecar is unavailable.
  7. Template-only DSPy execution. dspy_module artifacts reference known shipped templates by ID and store serialized state (JSON) in state_blob — no user-supplied Python code enters the sidecar.

Skills are stored across two tables in agents.db:

```sql
-- Skill packages
CREATE TABLE skills (
    id TEXT PRIMARY KEY,                    -- UUID
    name TEXT NOT NULL UNIQUE,              -- Human-readable identifier
    description TEXT NOT NULL,              -- Short description for catalog display
    version TEXT NOT NULL DEFAULT '1.0.0',
    tags TEXT,                              -- JSON array of string tags
    source TEXT NOT NULL DEFAULT 'user',    -- 'system' | 'community' | 'user'
    author TEXT,                            -- Display name of author
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),

    -- Metadata for pipeline matching (Phase 1)
    intent_patterns TEXT,                   -- JSON array of intent pattern strings
    trigger_phrases TEXT,                   -- JSON array of trigger phrase strings

    -- Sync and distribution
    marketplace_id TEXT,                    -- NULL for local-only skills
    sync_etag TEXT                          -- For marketplace version checking
);

CREATE INDEX idx_skills_source ON skills(source);
CREATE INDEX idx_skills_name ON skills(name);

-- Skill artifacts (one-to-many with skills)
CREATE TABLE skill_artifacts (
    id TEXT PRIMARY KEY,                    -- UUID
    skill_id TEXT NOT NULL REFERENCES skills(id) ON DELETE CASCADE,
    kind TEXT NOT NULL,                     -- Artifact kind enum (see below)
    name TEXT NOT NULL,                     -- Human-readable artifact name
    ordinal INTEGER NOT NULL DEFAULT 0,     -- Display/execution ordering
    content TEXT NOT NULL,                  -- The artifact content (template, code, etc.)
    model_variant TEXT,                     -- NULL = default; else model family key
    metadata TEXT,                          -- JSON: kind-specific config
    state_blob BLOB,                        -- Serialized DSPy module state (dspy_module only)
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(skill_id, name, model_variant)
);

CREATE INDEX idx_skill_artifacts_skill_id ON skill_artifacts(skill_id);
CREATE INDEX idx_skill_artifacts_kind ON skill_artifacts(kind);
```
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum ArtifactKind {
    /// High-level description of the skill's purpose and approach.
    Description,
    /// MiniJinja template rendered with context variables before LLM call.
    PromptTemplate,
    /// Python code template executed in the RLM sandbox (WASM).
    CodeTemplate,
    /// Structured DSPy module executed in the Python sidecar.
    DspyModule,

    // -- Deferred kinds (schema supports, handlers not yet implemented) --
    // /// Strategy or methodology guidance (injected into system prompt).
    // Approach,
    // /// Concrete input/output examples for few-shot prompting.
    // Example,
}
```

V1 artifact kinds: description, prompt_template, code_template, dspy_module. The approach and example kinds are supported in the schema but their handlers are deferred.

Example skill: “Consistency Checker” — analyzes workspace content for contradictions. Its artifact package:

| Artifact | Kind | Purpose |
| --- | --- | --- |
| Overview | description | Explains what the skill does and when to use it |
| Check Prompt | prompt_template | MiniJinja template for per-page-pair contradiction analysis |
| Cross-Reference Script | code_template | Python script that traverses workspace pages, extracts claims, and clusters them for LLM review |
| DSPy Checker | dspy_module | References shipped chain_of_thought template with optimized state for contradiction detection |
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Consistency Checker",
  "description": "Analyzes workspace content for contradictions and inconsistencies across pages.",
  "version": "1.0.0",
  "tags": ["analysis", "worldbuilding", "quality"],
  "source": "system",
  "intent_patterns": ["check consistency", "find contradictions", "verify facts"],
  "trigger_phrases": ["are there any contradictions", "check for inconsistencies"],
  "artifacts": [
    {
      "kind": "description",
      "name": "Overview",
      "ordinal": 0,
      "content": "The Consistency Checker skill analyzes content across your workspace to identify contradictions, inconsistencies, and conflicting statements. It is most useful for worldbuilding projects with many interrelated entities (characters, locations, timelines)."
    },
    {
      "kind": "prompt_template",
      "name": "Check Prompt",
      "ordinal": 1,
      "content": "Given the following claims about \"{{ entity_name }}\":\n\n{% for claim in claims %}Claim {{ loop.index }} (from [[{{ claim.page_name }}]]): {{ claim.text }}\n{% endfor %}\n\nIdentify any contradictions between these claims. For each contradiction, cite the specific claims by number and explain the conflict."
    },
    {
      "kind": "code_template",
      "name": "Cross-Reference Script",
      "ordinal": 2,
      "content": "import inklings\n\nentity_type = '{{ entity_type }}'\npages = inklings.search(entity_type)\nclaims = []\nfor page in pages:\n    content = inklings.get_page(page['slug'])\n    extracted = inklings.llm_query(f'Extract factual claims about {entity_type} entities from: {content}')\n    claims.extend(extracted)\ninklings.submit({'entity_type': entity_type, 'claims': claims})"
    },
    {
      "kind": "dspy_module",
      "name": "DSPy Checker",
      "ordinal": 3,
      "content": "{\"template_id\": \"chain_of_thought\", \"signature\": \"claims -> contradictions\"}",
      "state_blob": "<base64-encoded JSON from DSPy module.save(path, save_program=False)>",
      "metadata": {
        "fallback_artifact": "Check Prompt"
      }
    }
  ]
}
```

### description

High-level explanation of the skill’s purpose, target audience, and when it should be activated. Loaded during Phase 2 (Orchestrator review) to confirm skill relevance before loading heavier artifacts.

| Property | Value |
| --- | --- |
| Handler | None (informational; injected into Orchestrator context) |
| Context cost | Low (~100-300 tokens) |
| When used | Always loaded when skill is activated |

### prompt_template

A MiniJinja template that is rendered with context variables (entity names, page content, user query, etc.) and sent as a user message to the LLM. This is the primary mechanism for structured LLM interactions.

| Property | Value |
| --- | --- |
| Handler | PromptEngine::render_str() -> LLM call |
| Context cost | Variable (depends on template + rendered context) |
| When used | Per-step during skill execution |

Model variants: A prompt_template artifact may have multiple rows with different model_variant values (e.g., "anthropic", "openai", "default"). The PromptEngine selects the best match for the active model. See PromptEngine Simplification.

### code_template

A Python script template executed in the RLM sandbox (CPython-in-Wasmtime). Template variables are rendered before execution. The script has access to the inklings host module for workspace queries and llm_query for sub-LM calls.

| Property | Value |
| --- | --- |
| Handler | Template render -> RlmExecutor::execute() |
| Execution context | WASM sandbox (Wasmtime) |
| Context cost | Zero (runs outside LLM context window) |
| When used | For workspace-scale analysis requiring iteration |

Security: Code templates run inside the WASM sandbox with fuel metering, memory limits, and no filesystem/network access. The inklings Python module is the only bridge to workspace data.

Host functions available: workspace_search, get_page, get_pages, get_pages_by_type, get_tags, get_references, get_history, llm_query, llm_query_batched, checkpoint, submit. See Agent Core System — Host Functions for the complete list.

### dspy_module

A structured DSPy module that defines an LLM program with declarative signatures, chain-of-thought reasoning, and optimizable prompts. Executed in the Python sidecar (apps/python-sidecar/), not the WASM sandbox, because DSPy requires full Python ecosystem access (native C extensions like pydantic-core, numpy).

| Property | Value |
| --- | --- |
| Handler | run_skill host function -> DSPy sidecar |
| Execution context | Python sidecar (PyInstaller binary) |
| Context cost | Zero (runs outside LLM context window) |
| When used | For structured LLM programs with DSPy optimization |

Template-only execution model: The dspy_module artifact does NOT contain Python source code. Instead, it references a known execution template by template_id and stores serialized DSPy state (JSON) in state_blob. The sidecar ships with a fixed set of DSPy module class definitions (e.g., Predict, ChainOfThought, ReAct, custom pipelines). At execution time, the sidecar instantiates the known class, calls .load() with the state blob, and executes.

This eliminates the untrusted code execution path entirely — no user-supplied Python code enters the sidecar runtime.

Artifact schema:

  • content: JSON containing template_id (references a shipped template) and execution parameters
  • state_blob: Serialized DSPy state (JSON from module.save(path, save_program=False)) — contains optimized few-shot demos, signature customizations, and LM settings
  • metadata.fallback_artifact: Points to a prompt_template artifact for graceful degradation

Execution architecture: A code_template script or the Orchestrator calls run_skill(skill_id, params) as a host function. This proxies execution to the Python sidecar process. The sidecar receives the template_id, state_blob, and parameters, instantiates the known template class, loads the optimized state, executes, and returns structured output.

LLM access: The sidecar uses InklingsLM, a custom DSPy LM subclass that routes all LLM completions through bidirectional IPC back to Rust, where the ProviderRegistry (Rig-based) handles the actual API call. Zero API keys exist in the sidecar environment.

Graceful degradation: Every dspy_module artifact must specify a fallback_artifact in its metadata field, pointing to a prompt_template artifact in the same skill package. If the sidecar is unavailable (not installed, failed to start), the system falls back to the prompt_template artifact transparently.

State management: DSPy modules persist optimized state in the state_blob column of skill_artifacts. The state is portable JSON produced by module.save(path, save_program=False) — containing optimized few-shot demos, signature definitions, and LM settings. Re-optimization is triggered by version upgrades (DSPy version bump in sidecar) via the Skill Composer.

Version upgrade: When the sidecar ships a new DSPy version, the app detects the version change and re-runs optimization for all skills with existing state_blob data via the Skill Composer. This is a background task on first launch after update.

DSPy sidecar process lifecycle:

```text
App startup
  |
  v
[Sidecar not started - lazy]
  |
  v  (first dspy_module execution or Skill Composer invocation)
[Start DSPy sidecar]
  |-- PyInstaller binary ships with app as Tauri externalBin
  |-- No setup required — self-contained
  |
  v
[Sidecar running - long-lived]
  |-- Service modes:
  |     |-- Execute: template_id + state_blob + params -> result
  |     |-- Optimize: template_id + training_data -> state_blob (JSON)
  |-- All LLM calls routed through IPC -> Rust -> ProviderRegistry
  |-- Template manifest queryable via IPC
  |
  v  (idle timeout or app shutdown)
[Sidecar stopped]
```

Python sidecar distribution:

The app ships a PyInstaller binary as a Tauri externalBin. This binary contains Python + DSPy + all dependencies (pydantic-core, litellm, numpy) in a single executable. No Python installation, venv setup, or package management required on the user’s machine.
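For illustration, spawning the bundled binary might look like the following sketch under Tauri v1's process API (Tauri v2 moves this into tauri-plugin-shell). The binary name `python-sidecar` is an assumption:

```rust
use tauri::api::process::{Command, CommandEvent};

// Spawn the bundled externalBin lazily, on first dspy_module execution.
// "python-sidecar" is a hypothetical sidecar name from tauri.conf.json.
let (mut rx, _child) = Command::new_sidecar("python-sidecar")
    .expect("sidecar binary not bundled")
    .spawn()
    .expect("failed to spawn DSPy sidecar");

// Forward sidecar stdout to the app log while the process runs.
tauri::async_runtime::spawn(async move {
    while let Some(event) = rx.recv().await {
        if let CommandEvent::Stdout(line) = event {
            log::debug!("sidecar: {line}");
        }
    }
});
```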

| Property | Value |
| --- | --- |
| Distribution | PyInstaller single-file executable (~70 MB per platform) |
| Tauri config | externalBin in tauri.conf.json |
| Dependencies | Python 3.12+ runtime, DSPy, pydantic, litellm, numpy |
| Setup required | None — self-contained binary |
| Update mechanism | Ships with app updates; version bump triggers re-optimization |

run_skill host function:

The bridge between the WASM sandbox (or Orchestrator) and the Python sidecar:

```text
Worker (in WASM or agent loop)
  |
  v
run_skill(skill_id, params)
  |
  v
[Load artifact: template_id + state_blob]
  |
  v
[IPC to DSPy sidecar process]
  |
  v
[Sidecar instantiates known template class by template_id]
[Sidecar loads optimized state from state_blob JSON]
[Sidecar executes module with params]
[LLM calls routed back through IPC -> Rust -> ProviderRegistry]
  |
  v
[Return structured result via IPC]
  |
  v
Worker receives result
```

Sidecar service modes:

  • Execute: `{type: "execute", template_id, state_blob, params}` -> instantiate template, load state, run, return result
  • Optimize: `{type: "optimize", template_id, training_data}` -> run DSPy compile(), return new state_blob JSON
  • Manifest: `{type: "manifest"}` -> return list of available templates with their signatures
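On the Rust side, these message shapes could be modeled as a tagged serde enum. A sketch, with field types assumed (the wire format above only names the fields):

```rust
use serde::{Deserialize, Serialize};

/// Sketch of the sidecar IPC request envelope for the three service modes.
/// Only the shape comes from the modes above; types are illustrative.
#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum SidecarRequest {
    Execute {
        template_id: String,
        state_blob: String,            // serialized DSPy state (JSON)
        params: serde_json::Value,
    },
    Optimize {
        template_id: String,
        training_data: serde_json::Value,
    },
    Manifest,
}
```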

Skill activation is split into two phases to minimize latency and token cost on every message.

### Phase 1: Deterministic Metadata Match (Context Pipeline)


The Context Pipeline performs embedding/keyword match against the user message and the skill catalog — a lightweight index of skill names, descriptions, tags, intent patterns, and trigger phrases. Full artifact content is NOT loaded.

The pipeline returns ranked skill recommendations as part of the ContextPackage. This phase is deterministic (no LLM call) and completes in under 100ms.

  • Input: user_message + skill_catalog_metadata (embedding/keyword index)
  • Output: SkillRecommendation[] (ranked by relevance score)
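The recommendation entry might look like this sketch; field names are assumptions consistent with the ranked output described above:

```rust
/// Sketch of a Phase 1 recommendation. Names are illustrative, not the
/// actual type in the crate.
pub struct SkillRecommendation {
    pub skill_id: String,
    pub name: String,
    /// Combined embedding/keyword relevance score used for ranking.
    pub relevance_score: f32,
    /// Which trigger phrase or intent pattern matched, if any.
    pub matched_pattern: Option<String>,
}
```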

### Phase 2: Artifact Selection (Orchestrator)


The Orchestrator receives the pipeline’s recommendations in its ContextPackage. It decides whether to activate a skill and, if so, which artifacts to load.

Artifact loading is lazy: only the artifacts needed for the current execution step are loaded from agents.db. A multi-artifact skill may load its description first, then load prompt_template or code_template or dspy_module artifacts one at a time during execution.

  • Input: ContextPackage (with skill_recommendations) + conversation history
  • Output: Decision (direct response | activate skill S with artifacts [A1, A2, ...] | research | clarify)
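The decision could be modeled as an enum mirroring the output shape above; variant and field names here are assumptions:

```rust
/// Sketch of the Orchestrator's Phase 2 decision.
pub enum OrchestratorDecision {
    DirectResponse,
    ActivateSkill {
        skill_id: String,
        artifact_ids: Vec<String>, // loaded lazily, one step at a time
    },
    Research,
    Clarify,
}
```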

There is no ExecutionMode enum. The skill system dispatches to the correct handler based on each artifact’s kind field. A single skill execution may invoke multiple handlers in sequence (e.g., load description, render prompt_template, execute code_template).

| Artifact Kind | Handler | Executor | Output |
| --- | --- | --- | --- |
| description | Context injection | N/A | Injected into Orchestrator/Worker context |
| prompt_template | PromptEngine::render_str() | LLM provider | LLM response text |
| code_template | Template render + RlmExecutor::execute() | Wasmtime (CPython) | Structured result from inklings.submit() |
| dspy_module | run_skill host function | Python sidecar | Structured result from inklings.submit() |

When a skill contains a dspy_module artifact, the dispatch logic checks whether the DSPy sidecar is available:

  1. Sidecar available: Execute the dspy_module artifact directly.
  2. Sidecar unavailable: Look up fallback_artifact from the artifact’s metadata. Execute the referenced prompt_template artifact instead.
  3. No fallback specified: Return an error indicating the skill requires the Python sidecar.

This ensures skills work across all deployment environments, with DSPy-optimized execution where available and prompt-based fallback where not.
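A minimal sketch of this decision path, assuming hypothetical helpers (`sidecar_available`, `fallback_artifact_name`, `find_artifact`) and illustrative error variants:

```rust
/// Sketch of the three-step fallback logic above; not the actual API.
async fn dispatch_with_fallback(
    dispatcher: &ArtifactDispatcher,
    skill: &Skill,
    artifact: &SkillArtifact,
    ctx: &ExecutionContext,
) -> Result<ArtifactOutput, SkillError> {
    // 1. Sidecar available (or not a dspy_module): dispatch directly.
    if artifact.kind != ArtifactKind::DspyModule || ctx.sidecar_available() {
        return dispatcher.dispatch(artifact, ctx).await;
    }
    match artifact.fallback_artifact_name() {
        // 2. Execute the referenced prompt_template artifact instead.
        Some(name) => {
            let fallback = skill
                .find_artifact(&name)
                .ok_or(SkillError::FallbackNotFound)?;
            dispatcher.dispatch(fallback, ctx).await
        }
        // 3. No fallback specified: the skill requires the sidecar.
        None => Err(SkillError::SidecarRequired),
    }
}
```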

```rust
#[async_trait]
pub trait ArtifactHandler: Send + Sync {
    /// The artifact kind this handler processes.
    fn kind(&self) -> ArtifactKind;

    /// Execute the artifact with the given context.
    /// Returns the handler's output (LLM response, RLM result, etc.).
    async fn execute(
        &self,
        artifact: &SkillArtifact,
        context: &ExecutionContext,
    ) -> Result<ArtifactOutput, SkillError>;
}

pub struct ArtifactDispatcher {
    handlers: HashMap<ArtifactKind, Arc<dyn ArtifactHandler>>,
}

impl ArtifactDispatcher {
    /// Dispatch an artifact to its registered handler.
    /// For dspy_module, checks sidecar availability and falls back if needed.
    pub async fn dispatch(
        &self,
        artifact: &SkillArtifact,
        context: &ExecutionContext,
    ) -> Result<ArtifactOutput, SkillError> {
        let handler = self
            .handlers
            .get(&artifact.kind)
            .ok_or(SkillError::NoHandler(artifact.kind))?;
        handler.execute(artifact, context).await
    }
}
```

Skills are stored in agents.db at two levels, mirroring the memory architecture:

| Scope | Database | Contains |
| --- | --- | --- |
| Account | ~/.inklings/agents.db | System skills, community skills, user account-level skills |
| Workspace | {workspace}/agents.db | Workspace-specific user skills, workspace overrides |

Both databases use the identical skills + skill_artifacts schema.

When the Context Pipeline queries the skill catalog or the Orchestrator loads artifacts, skills are resolved in this order:

  1. Workspace agents.db — workspace-specific skills take highest priority.
  2. Account agents.db — account-level skills (system, community, user).

If a skill with the same name exists at both levels, the workspace version wins. This allows users to fork a system skill into their workspace for customization without affecting other workspaces.

```rust
#[async_trait]
pub trait SkillStorageRepository: Send + Sync {
    /// List all skills visible at this scope (metadata only, no artifacts).
    async fn list_skills(&self) -> Result<Vec<SkillSummary>, SkillStorageError>;

    /// Get a skill by ID, including all its artifacts.
    async fn get_skill(&self, skill_id: &str) -> Result<Option<Skill>, SkillStorageError>;

    /// Get a skill by name (for resolution-order lookups).
    async fn get_skill_by_name(&self, name: &str) -> Result<Option<Skill>, SkillStorageError>;

    /// Get specific artifacts for a skill, filtered by kind.
    async fn get_artifacts(
        &self,
        skill_id: &str,
        kinds: &[ArtifactKind],
    ) -> Result<Vec<SkillArtifact>, SkillStorageError>;

    /// Get a specific artifact by ID.
    async fn get_artifact(&self, artifact_id: &str) -> Result<Option<SkillArtifact>, SkillStorageError>;

    /// Create a new skill (without artifacts).
    async fn create_skill(&self, skill: &Skill) -> Result<(), SkillStorageError>;

    /// Update skill metadata.
    async fn update_skill(&self, skill: &Skill) -> Result<(), SkillStorageError>;

    /// Delete a skill and all its artifacts (CASCADE).
    async fn delete_skill(&self, skill_id: &str) -> Result<(), SkillStorageError>;

    /// Add an artifact to a skill.
    async fn add_artifact(&self, artifact: &SkillArtifact) -> Result<(), SkillStorageError>;

    /// Update an artifact.
    async fn update_artifact(&self, artifact: &SkillArtifact) -> Result<(), SkillStorageError>;

    /// Delete an artifact.
    async fn delete_artifact(&self, artifact_id: &str) -> Result<(), SkillStorageError>;
}
```

The SkillCatalog aggregates both scopes and presents a unified view:

```rust
pub struct SkillCatalog {
    workspace_repo: Arc<dyn SkillStorageRepository>,
    account_repo: Arc<dyn SkillStorageRepository>,
}

impl SkillCatalog {
    /// Build the merged catalog with workspace-wins resolution.
    pub async fn list_all(&self) -> Result<Vec<SkillSummary>, SkillStorageError> {
        let mut workspace_skills = self.workspace_repo.list_skills().await?;
        let account_skills = self.account_repo.list_skills().await?;

        let workspace_names: HashSet<_> = workspace_skills
            .iter()
            .map(|s| s.name.clone())
            .collect();

        // Add account skills that don't have a workspace override.
        for skill in account_skills {
            if !workspace_names.contains(&skill.name) {
                workspace_skills.push(skill);
            }
        }

        Ok(workspace_skills)
    }
}
```
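A hypothetical call site, assuming the two repositories are constructed in the same module (the fields are private) and that SkillSummary exposes name and version:

```rust
// Merged view: a workspace skill shadows an account skill of the same name.
let catalog = SkillCatalog { workspace_repo, account_repo };
let skills = catalog.list_all().await?;
for skill in &skills {
    println!("{} (v{})", skill.name, skill.version);
}
```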

System skills do not ship as compiled-in strings. They ship as seed data in the initial agents.db schema migration and are refreshed from the cloud when connected. This avoids binary size bloat and enables skill updates without app releases.

| Channel | Mechanism | Offline Behavior |
| --- | --- | --- |
| System skills | Seeded in agents.db migration + cloud refresh | Available from seed data |
| Community skills | Marketplace catalog (cloud) | Cached locally after first download |
| User skills | Local only (created by Skill Composer or manual import) | Always available |

Community and system skills sync from the marketplace using ETags for efficient version checking:

  1. On app launch (or periodic background check), query marketplace for updated skills.
  2. Compare sync_etag with server response.
  3. If changed, download updated skill package and upsert into account-level agents.db.
  4. Workspace-level overrides are never modified by marketplace sync.
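A sketch of the per-skill check in steps 2-3. `MarketplaceClient`, its methods, and `SyncError` are hypothetical stand-ins; only the ETag comparison and account-scope upsert follow the text (assuming SkillSummary carries the marketplace_id and sync_etag columns from the schema):

```rust
/// Sketch of steps 2-3 of the marketplace sync loop above.
async fn refresh_if_changed(
    summary: &SkillSummary,
    marketplace: &MarketplaceClient,
    account_repo: &dyn SkillStorageRepository,
) -> Result<(), SyncError> {
    // Local-only skills (marketplace_id IS NULL) are never synced.
    let Some(marketplace_id) = &summary.marketplace_id else {
        return Ok(());
    };
    let server_etag = marketplace.fetch_etag(marketplace_id).await?;
    if summary.sync_etag.as_deref() != Some(server_etag.as_str()) {
        // Changed upstream: download and upsert into the account scope only;
        // workspace-level overrides are never touched.
        let package = marketplace.download(marketplace_id).await?;
        account_repo.update_skill(&package.skill).await?;
    }
    Ok(())
}
```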

Skill sync between devices uses the existing Supabase sync infrastructure. Only the workspace owner syncs skills to the cloud — this prevents skill duplication across collaborators. Non-owners receive system and community skills through the marketplace channel.

The PromptEngine is simplified from a struct-heavy rendering pipeline to a single render_str function that accepts flexible context.

```rust
pub struct PromptEngine {
    engine: minijinja::Environment<'static>,
}

impl PromptEngine {
    /// Render a template string with the given context values.
    /// Context is a flat key-value map -- the caller assembles it from whatever
    /// sources are relevant (user query, page content, memory, etc.).
    pub fn render_str(
        &self,
        template: &str,
        context: &HashMap<String, serde_json::Value>,
    ) -> Result<String, PromptEngineError>;
}
```

Key change: There is no rigid PromptContext struct. The caller assembles a HashMap<String, Value> from whatever context is relevant to the current execution step. This eliminates the need to define and maintain a struct that anticipates every possible context shape.
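For instance, a caller rendering the Check Prompt from the example skill might assemble context like this (`prompt_engine` and `template` are assumed to be in scope):

```rust
use std::collections::HashMap;
use serde_json::json;

// Hypothetical caller-side context assembly for a prompt_template render.
let mut ctx: HashMap<String, serde_json::Value> = HashMap::new();
ctx.insert("entity_name".into(), json!("Elara"));
ctx.insert("claims".into(), json!([
    { "page_name": "Elara", "text": "Born in the year 412." },
    { "page_name": "Timeline", "text": "Elara was born in 415." },
]));

let prompt = prompt_engine.render_str(template, &ctx)?;
```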

When a prompt_template artifact has multiple rows with different model_variant values, the PromptEngine selects the best match:

  1. Exact match on model family (e.g., "anthropic" for Claude models).
  2. Fall back to model_variant = NULL (the default variant).
  3. If no default exists, use the first variant by ordinal.

This enables per-model prompt optimization without branching logic in the skill definition.
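The selection order reduces to a short chain of lookups. A sketch, assuming `variants` holds all rows for one artifact name, already sorted by ordinal:

```rust
/// Sketch of the three-step variant selection described above.
fn select_variant<'a>(
    variants: &'a [SkillArtifact],
    model_family: &str,
) -> Option<&'a SkillArtifact> {
    variants
        .iter()
        .find(|a| a.model_variant.as_deref() == Some(model_family)) // 1. exact family match
        .or_else(|| variants.iter().find(|a| a.model_variant.is_none())) // 2. NULL default
        .or_else(|| variants.first()) // 3. first variant by ordinal
}
```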

Every artifact execution is recorded as a trace entry for debugging, cost tracking, and optimization feedback.

```sql
CREATE TABLE skill_execution_traces (
    id TEXT PRIMARY KEY,            -- UUID
    session_id TEXT NOT NULL,       -- Conversation session ID
    skill_id TEXT NOT NULL,         -- Which skill was activated
    artifact_id TEXT NOT NULL,      -- Which specific artifact was executed
    artifact_kind TEXT NOT NULL,    -- Artifact kind for fast filtering

    -- Execution details
    rendered_input TEXT,            -- Rendered template (after variable substitution)
    raw_output TEXT,                -- Raw LLM response or RLM result
    error TEXT,                     -- NULL on success; error message on failure

    -- Timing
    started_at TEXT NOT NULL,
    completed_at TEXT,
    duration_ms INTEGER,

    -- Cost
    input_tokens INTEGER,
    output_tokens INTEGER,
    model_id TEXT,                  -- Which model was used
    estimated_cost REAL,            -- Estimated cost in USD

    -- Assertions
    assertions_run INTEGER DEFAULT 0,
    assertions_passed INTEGER DEFAULT 0,

    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_traces_session_id ON skill_execution_traces(session_id);
CREATE INDEX idx_traces_skill_id ON skill_execution_traces(skill_id);
CREATE INDEX idx_traces_artifact_id ON skill_execution_traces(artifact_id);
```

Each artifact execution produces its own trace row. A single skill activation that executes 3 artifacts (description + prompt_template + code_template) produces 3 trace rows, all sharing the same session_id and skill_id but with different artifact_id values.

This enables:

  • Per-artifact cost tracking — which artifacts consume the most tokens?
  • Per-artifact timing — which artifacts are bottlenecks?
  • Per-artifact assertion results — which artifacts fail validation?
  • DSPy optimization feedback — traces feed back into offline prompt optimization.
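Per-artifact cost tracking, for example, reduces to a GROUP BY over the trace table. A sketch using rusqlite, where only the table and column names come from the schema above:

```rust
use rusqlite::Connection;

/// Total estimated cost per artifact for one skill, most expensive first.
fn cost_by_artifact(conn: &Connection, skill_id: &str) -> rusqlite::Result<Vec<(String, f64)>> {
    let mut stmt = conn.prepare(
        "SELECT artifact_id, COALESCE(SUM(estimated_cost), 0.0) AS cost
         FROM skill_execution_traces
         WHERE skill_id = ?1
         GROUP BY artifact_id
         ORDER BY cost DESC",
    )?;
    let rows = stmt.query_map([skill_id], |row| Ok((row.get(0)?, row.get(1)?)))?;
    rows.collect()
}
```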

Assertions validate artifact outputs at execution time. They are defined per-artifact and checked after each artifact execution.

| Type | Check | Example |
| --- | --- | --- |
| Structural | JSON schema, required fields, format constraints | “Output must contain a contradictions array” |
| Semantic | LLM-evaluated quality checks | “Output must reference specific page names from the workspace” |
| Length | Token/character bounds | “Output must be between 100 and 2000 characters” |
| Pattern | Regex match/no-match | “Output must not contain markdown code fences” |

Assertions are stored in the artifact’s metadata JSON field:

```json
{
  "assertions": [
    {
      "type": "structural",
      "check": "json_schema",
      "schema": {
        "type": "object",
        "required": ["contradictions"],
        "properties": {
          "contradictions": { "type": "array" }
        }
      }
    },
    {
      "type": "length",
      "check": "char_range",
      "min": 100,
      "max": 5000
    },
    {
      "type": "semantic",
      "check": "llm_eval",
      "prompt": "Does this output reference specific page names from the workspace? Answer yes or no.",
      "expected": "yes"
    }
  ]
}
```
```rust
#[async_trait]
pub trait Assertion: Send + Sync {
    /// Human-readable description of what this assertion checks.
    fn description(&self) -> &str;

    /// Evaluate the assertion against an artifact output.
    /// Returns Ok(()) on pass, Err with explanation on failure.
    async fn evaluate(
        &self,
        output: &ArtifactOutput,
        context: &ExecutionContext,
    ) -> Result<(), AssertionFailure>;
}

pub struct AssertionFailure {
    pub assertion_description: String,
    pub explanation: String,
    /// Whether this is a hard failure (abort) or soft failure (warn + continue).
    pub severity: AssertionSeverity,
}

#[derive(Debug, Clone, Copy)]
pub enum AssertionSeverity {
    /// Abort skill execution on failure.
    Hard,
    /// Log warning but continue execution.
    Soft,
}
```
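As an illustration, the `char_range` check from the JSON above could be implemented against this trait. A sketch, assuming a hypothetical `ArtifactOutput::as_text` accessor:

```rust
use async_trait::async_trait;

/// Sketch of a `length` assertion (`char_range` above).
pub struct CharRangeAssertion {
    pub min: usize,
    pub max: usize,
}

#[async_trait]
impl Assertion for CharRangeAssertion {
    fn description(&self) -> &str {
        "output length is within the configured character range"
    }

    async fn evaluate(
        &self,
        output: &ArtifactOutput,
        _context: &ExecutionContext,
    ) -> Result<(), AssertionFailure> {
        let len = output.as_text().chars().count();
        if (self.min..=self.max).contains(&len) {
            Ok(())
        } else {
            Err(AssertionFailure {
                assertion_description: self.description().to_string(),
                explanation: format!("output is {len} chars; expected {}..={}", self.min, self.max),
                severity: AssertionSeverity::Soft, // warn + continue
            })
        }
    }
}
```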

The assertion framework produces feedback compatible with DSPy’s offline optimization pipeline:

  • Each assertion evaluation produces a binary pass/fail signal.
  • Trace rows record assertions_run and assertions_passed counts.
  • Exported traces can be formatted as DSPy evaluation datasets for prompt optimization.

The Skill Composer agent (see Process Model) is the primary interface for creating and modifying skills.

  1. User initiates: “Create a skill that checks timeline consistency.”
  2. Orchestrator dispatches Skill Composer with the creation request.
  3. Skill Composer generates an initial skill package:
    • description artifact (what the skill does)
    • prompt_template artifact (the main analysis prompt)
    • Optional code_template artifact (batch analysis script)
  4. Skill Composer validates the package (schema compliance, template syntax).
  5. Orchestrator presents the proposed skill to the user.
  6. User provides feedback (“add a dspy_module for optimized execution”).
  7. Orchestrator routes back to Skill Composer for iterative refinement.
  8. Skill Composer adds/modifies artifacts, re-validates.
  9. Repeat until the user approves.
  10. Skill is saved to the appropriate scope (workspace or account agents.db).

When the Python sidecar is available, the Skill Composer can optimize prompt_template artifacts using DSPy:

  1. Skill Composer generates a dspy_module artifact wrapping the prompt logic.
  2. DSPy compiles the module against assertion-based evaluation metrics.
  3. Optimized state is stored in the state_blob column.
  4. The prompt_template artifact remains as fallback.

Users can also create skills by importing JSON packages directly (e.g., shared by another user or exported from a different workspace). The import path validates the package schema before persisting.
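A sketch of that import path, where `SkillPackage`, `validate`, and `SkillImportError` are hypothetical; only the order of operations (validate before persisting) comes from the text:

```rust
/// Parse, validate, then persist an imported skill package.
pub async fn import_skill_package(
    json: &str,
    repo: &dyn SkillStorageRepository,
) -> Result<(), SkillImportError> {
    let package: SkillPackage = serde_json::from_str(json)?; // schema-shaped parse
    package.validate()?;                                     // reject malformed packages early
    repo.create_skill(&package.skill).await?;
    for artifact in &package.artifacts {
        repo.add_artifact(artifact).await?;
    }
    Ok(())
}
```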
