
Agent Core System

Crate: crates/infrastructure/agent-core/


The agent core system provides the execution environment, process model, and memory architecture for AI agents in Inklings. It is built around three core principles:

  1. Context as explorable data, not consumed tokens — Inspired by Recursive Language Models (RLMs), agent knowledge lives in queryable external stores rather than in the LLM context window. The agent retrieves what it needs on demand rather than carrying everything as prompt tokens.

  2. The workspace is the knowledge base — The PKM workspace itself (pages, blocks, tags, references, CRDT history) is the richest knowledge source. Agent memory supplements but never duplicates it.

  3. Context compression at every layer — No agent type holds raw workspace content in its conversation history. Raw data flows through specialist agents and returns as structured summaries. The Orchestrator’s context contains conversation + decisions, never page bodies or search result dumps.

Hub-and-spoke model: Only the Orchestrator communicates with task and specialist agents. Workers and Researchers never interact with specialists directly. This keeps the communication graph simple and the Orchestrator as the single coordination point.

Context Pipeline: Every user message triggers the Context Pipeline (deterministic infrastructure, not an agent) which performs skill search + memory retrieval + workspace metadata assembly (~100ms, 0 tokens). A Refinement Gate (single Cheap LLM call, ~500ms) accepts or refines the selection before the Orchestrator decides on an action.

The agent process model uses four agent types plus the Context Pipeline (infrastructure), organized around a team metaphor. Each type has a distinct role, model class, tool filter, and spawn permission. See Process Model for the full specification including Rust type definitions, configuration defaults, and migration checklist.

| Process Type | Role | Model Class | User Perception |
| --- | --- | --- | --- |
| Orchestrator | User-facing coordinator, delegates work | Frontier | “The agent I’m talking to” |
| Researcher | Read-only investigation, structured findings | Frontier | “Someone went to look into that” |
| Worker | Task execution with scoped writes | Fast | “Someone went to do that task” |
| Skill Composer | Skill authoring and refinement | Frontier | “It’s helping me build a skill” |
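
The authoritative Rust definitions live in Process Model; the following is a minimal sketch of how the table above could map onto types, with all names illustrative rather than the crate's actual API:

```rust
// Illustrative sketch only; the authoritative definitions live in
// crates/infrastructure/agent-core (see Process Model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProcessType {
    Orchestrator,
    Researcher,
    Worker,
    SkillComposer,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelClass {
    Frontier, // top-tier reasoning
    Fast,     // throughput-oriented
    Cheap,    // single cheap decisions
}

impl ProcessType {
    /// Default model class, per the table above.
    pub fn default_model_class(self) -> ModelClass {
        match self {
            ProcessType::Worker => ModelClass::Fast,
            _ => ModelClass::Frontier,
        }
    }

    /// Hub-and-spoke: only the Orchestrator may spawn sub-processes.
    pub fn can_spawn(self) -> bool {
        matches!(self, ProcessType::Orchestrator)
    }
}
```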

Context Pipeline (infrastructure, not an agent type):

| Component | Role | Model Class |
| --- | --- | --- |
| Context Pipeline | Skill search + memory retrieval + workspace metadata | None (deterministic) |
| Refinement Gate | Accept/refine context selection | Cheap |

Orchestrator

The user’s primary interface. Receives messages, orchestrates task agents and the Context Pipeline, and synthesizes results. Never blocks on execution — heavy work is delegated immediately. The only type that can spawn sub-processes, and the hub in the hub-and-spoke model.

Researcher

An isolated, read-only investigation process. Gathers information, analyzes structure, and searches history. Reports structured findings back to the Orchestrator, storing intermediate results in the session scratchpad for the Orchestrator to query rather than dumping full results into the conversation context.

Worker

Focused task execution with scoped write access. Creates pages, reorganizes subtrees, applies edits, and runs skills. Integrates with the RLM executor for code_template artifact execution. Can be fire-and-forget or interactive.

Skill Composer

Invoked for skill creation and modification workflows. Generates multi-artifact skill packages through iterative refinement with the user. Uses a frontier model for creative prompt engineering.

Context Pipeline

The Context Pipeline is deterministic infrastructure, not an agent type. It handles context assembly as a mechanical pipeline:

  1. Skill search — match user intent against skill catalog metadata
  2. Memory retrieval — query 4-tier memory hierarchy with channel-scoped filtering
  3. Workspace metadata — gather relevant workspace context (recent activity, active page, etc.)

The pipeline sees the index (metadata, embeddings, tags); the Researcher reads the documents (full page content). Three verbs, three owners: select (pipeline), execute (Worker), investigate (Researcher).

A Refinement Gate (single Cheap LLM call) follows the pipeline to accept or refine the assembled context before it reaches the Orchestrator.
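
Putting the pipeline and gate together, a minimal sketch of the per-message flow (all types and function names here are hypothetical stand-ins, not the crate's real interfaces):

```rust
// Illustrative sketch; every name here is hypothetical, not the crate's API.

struct AssembledContext {
    skills: Vec<String>,   // skill catalog matches (metadata only, no bodies)
    memories: Vec<String>, // channel-scoped hits from the 4-tier memory store
    metadata: String,      // workspace metadata: active page, recent activity
}

enum GateDecision {
    Accept,                   // draft selection passes through unchanged
    Refine(AssembledContext), // gate returns an adjusted selection
}

/// Deterministic assembly (~100 ms, 0 tokens), then one Cheap-class LLM
/// review (~500 ms) before anything reaches the Orchestrator.
fn assemble_context(
    search_skills: impl Fn(&str) -> Vec<String>,
    retrieve_memories: impl Fn(&str) -> Vec<String>,
    workspace_metadata: impl Fn() -> String,
    gate_review: impl Fn(&AssembledContext) -> GateDecision,
    user_message: &str,
) -> AssembledContext {
    let draft = AssembledContext {
        skills: search_skills(user_message),
        memories: retrieve_memories(user_message),
        metadata: workspace_metadata(),
    };
    match gate_review(&draft) {
        GateDecision::Accept => draft,
        GateDecision::Refine(revised) => revised,
    }
}
```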

Supported providers: Anthropic (Claude), OpenAI (GPT), xAI (Grok), OpenRouter (100+ models via single API key or OAuth PKCE), Ollama (local, keyless).

| Process Type | Default Model Class | Rationale |
| --- | --- | --- |
| Orchestrator | Frontier | User-facing quality matters most |
| Researcher | Frontier | Analysis quality drives research value |
| Worker | Fast | Throughput over polish |
| Skill Composer | Frontier | Creative skill authoring requires top-tier reasoning |
| Refinement Gate | Cheap | Single accept/refine decision per message |
| Consolidation | Cheap | Background memory management at scale |

OpenRouter enables access to frontier models from multiple providers (Anthropic, OpenAI, Google, Meta, etc.) without separate API keys for each. A single OpenRouter key or OAuth connection covers the full model catalog.

The agent memory system uses a 4-tier hierarchy (Conversation -> Channel -> Workspace -> Account) with per-tier configurable decay rates via DecayConfig. Full design documented in Agent Memory System.

Key characteristics:

  • Source provenance — a source: String field records which agent type or actor produced each memory (replaces the former ownership concept)
  • Per-tier decay — DecayConfig with configurable rates per scope; DecayCalculator applies tier-aware decay
  • Channel-scoped retrieval — a channel_id parameter enables topical isolation in search queries
  • Embedding backfill — EmbeddingBackfillTask runs as a ScheduledTask (30-minute interval, batches of 50)
  • Two-layer dedup — RRF cosine threshold (0.9) + text-similarity fallback for FTS-only mode
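
As a rough sketch of tier-aware decay, assuming an exponential form in the spirit of Park et al.'s Generative Agents (the actual DecayConfig fields and formula are specified in Agent Memory System and may differ):

```rust
/// Hypothetical sketch of per-tier decay; the real DecayConfig and
/// DecayCalculator live in the memory module and may differ.
#[derive(Clone, Copy)]
enum MemoryTier {
    Conversation,
    Channel,
    Workspace,
    Account,
}

struct DecayConfig {
    // Per-tier decay rate (per day); higher means the tier forgets faster.
    conversation: f64,
    channel: f64,
    workspace: f64,
    account: f64,
}

impl DecayConfig {
    fn rate(&self, tier: MemoryTier) -> f64 {
        match tier {
            MemoryTier::Conversation => self.conversation,
            MemoryTier::Channel => self.channel,
            MemoryTier::Workspace => self.workspace,
            MemoryTier::Account => self.account,
        }
    }
}

/// relevance(t) = base_relevance * exp(-rate * age_days)
/// (assumed exponential form, after Park et al.)
fn decayed_relevance(cfg: &DecayConfig, tier: MemoryTier, base: f64, age_days: f64) -> f64 {
    base * (-cfg.rate(tier) * age_days).exp()
}
```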

Physical storage uses agents.db at both the account and workspace level:

```
{tauri_data_dir}/
+-- agents.db                      # Account-scoped agent memory + skills
|   +-- memories                   # Account-tier memories (preferences, behaviors)
|   +-- skills                     # System + community + user account-level skills
|   +-- skill_artifacts            # Artifacts for account-level skills
|   +-- skill_execution_traces     # Execution trace log
|
+-- workspaces/
    +-- {workspace}/
        +-- inklings.db                  # Workspace data (existing)
        +-- agents.db                    # Workspace-scoped agent memory + skills
            +-- memories                 # All workspace/channel/conversation memories
            +-- channels                 # Channel definitions + metadata
            +-- conversations            # Conversation records with channel assignment
            +-- skills                   # Workspace-specific skills (override account)
            +-- skill_artifacts          # Artifacts for workspace-level skills
            +-- skill_execution_traces   # Per-workspace execution traces
            +-- scheduled_activities     # Scheduled background activities
```

Workspace deletion cleanly removes all workspace-scoped agent memory with no orphan-cleanup step. This mirrors the existing SQLite-per-workspace pattern.
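
A small sketch of how the two scopes could resolve on disk, following the tree above (the helper and its names are hypothetical):

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve the agents.db for a given scope.
/// The directory layout follows the tree above.
enum AgentDbScope<'a> {
    Account,
    Workspace(&'a str),
}

fn agents_db_path(tauri_data_dir: &Path, scope: AgentDbScope) -> PathBuf {
    match scope {
        AgentDbScope::Account => tauri_data_dir.join("agents.db"),
        AgentDbScope::Workspace(ws) => tauri_data_dir
            .join("workspaces")
            .join(ws)
            .join("agents.db"),
    }
}

// Deleting {tauri_data_dir}/workspaces/{workspace}/ removes the
// workspace-scoped agents.db with it -- no orphan cleanup pass needed.
```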

Session continuity follows the Orient -> Work -> Persist lifecycle. The Context Pipeline handles Orient (deterministic context assembly + Refinement Gate); the Orchestrator drives the Work phase; mechanical extraction handles Persist. See Agent Memory System — Session Lifecycle for full details.

Three complementary mechanisms maintain context across session boundaries:

1. Memory Hierarchy (Primary)

Accumulated knowledge lives in the 4-tier memory hierarchy, queryable on demand. The agent retrieves relevant prior context via the retrieval pipeline rather than carrying a compressed summary in the context window.

2. Session Orientation Document (Secondary)


At conversation start, the Context Pipeline produces an OrientationDocument — structured markdown loaded into the system prompt containing relevant memories, recent activity, and workspace context. When a session ends, a mechanical extraction pass reviews the conversation summary and persists any final observations (the Persist phase). Background consolidation runs as a best-effort scheduled task.

3. Session State Serialization

Full session state is serialized at breakpoints between LLM calls. This covers explicit pause/resume and app restart. The orientation document covers cases where serialized state is unavailable.
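
A sketch of what breakpoint serialization might look like, assuming serde and hypothetical field names:

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical shape of serialized session state; the real struct is
/// defined by the agent harness.
#[derive(Serialize, Deserialize)]
struct SessionState {
    conversation_id: String,
    /// Conversation so far: decisions and summaries, never raw page
    /// bodies (per the context-compression principle).
    messages: Vec<String>,
    /// Delegations the Orchestrator has issued but not yet collected.
    pending_tasks: Vec<String>,
    /// Scratchpad entries Researchers populated this session.
    scratchpad_keys: Vec<String>,
}

/// Serialize at a breakpoint between LLM calls; restore on explicit
/// resume or app restart. When no serialized state exists, the
/// OrientationDocument covers re-orientation instead.
fn checkpoint(state: &SessionState) -> serde_json::Result<String> {
    serde_json::to_string(state)
}
```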

Consolidation is a background scheduled task with best-effort catch-up — missed schedules are deduped and run on next startup. There is no daemon or service worker; consolidation is paired with the “run in background” app setting.

Pipeline:

  1. Score — Calculate relevance for all short-term memories using the per-tier decay formula
  2. Promote — Move short-term memories to long-term if relevance > 0.7 and access_count > 3
  3. Prune — Remove memories with relevance below 0.01
  4. Dedup — Merge memories with embedding cosine similarity > 0.95
  5. Cap enforcement — Enforce 10,000 memories per scope
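
A minimal sketch of one consolidation pass over a single scope, using the thresholds above (data access through agents.db is elided; all names are illustrative):

```rust
/// Illustrative sketch of one consolidation pass over a single scope.
/// Reads/writes against agents.db are elided; thresholds match the
/// pipeline above.
struct Memory {
    relevance: f64,   // recomputed in the Score step via per-tier decay
    access_count: u32,
    long_term: bool,
    embedding: Vec<f32>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm(a) * norm(b))
}

fn consolidate(memories: &mut Vec<Memory>) {
    // 1. Score is assumed to have run already: `relevance` is fresh.
    // 2. Promote: short-term -> long-term past the relevance/usage bar.
    for m in memories.iter_mut() {
        if !m.long_term && m.relevance > 0.7 && m.access_count > 3 {
            m.long_term = true;
        }
    }
    // 3. Prune memories that have decayed below the floor.
    memories.retain(|m| m.relevance >= 0.01);
    // 4. Dedup: drop the later of any pair whose embeddings are
    //    near-identical (a stand-in for a real merge).
    let mut keep = vec![true; memories.len()];
    for i in 0..memories.len() {
        for j in (i + 1)..memories.len() {
            if keep[i] && keep[j] && cosine(&memories[i].embedding, &memories[j].embedding) > 0.95 {
                keep[j] = false;
            }
        }
    }
    let mut flags = keep.into_iter();
    memories.retain(|_| flags.next().unwrap());
    // 5. Cap enforcement: keep the 10,000 most relevant per scope.
    memories.sort_by(|a, b| b.relevance.total_cmp(&a.relevance));
    memories.truncate(10_000);
}
```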

The agent harness includes an embedded RLM executor for workspace-scale analysis tasks.

| Property | Value |
| --- | --- |
| Binary size impact | ~15-20 MB (CPython.wasm + Wasmtime runtime) |
| Cold start | ~50-200 ms (hidden by pre-warming during LLM inference wait) |
| Sandboxing | WASM process boundary — fuel metering, memory limits, epoch interruption, no I/O |
| Language | Python 3.12+ (full stdlib available, selectively exposed via with_module()) |
| Security model | Process-level isolation — language-level escapes are irrelevant |
```rust
// Compose an RLM executor: expose selected stdlib modules plus the
// `inklings` host-function module, then cap memory and fuel.
let vm = RlmExecutor::new()
    .with_module("json", stdlib::json)
    .with_module("math", stdlib::math)
    .with_module("collections", stdlib::collections)
    .with_module("inklings", host_functions::make_module)
    .memory_limit(64 * MB)
    .fuel_limit(1_000_000)
    .build()?;

// Run a Python script inside the sandboxed VM.
let result = vm.execute(script).await?;
```

Most agent interactions use normal tool calls against the MCP server. The RLM activates for workspace-scale operations and for code_template artifact execution:

  • “Analyze all pages for consistency issues”
  • “Find all references to this concept across the workspace”
  • “How has my thesis evolved across these 50 pages?”
  • “Check for contradictions between these character descriptions”

The Worker agent manages RLM lifecycle: pre-warms instances during LLM inference latency, executes scripts, and collects results.
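
A sketch of that overlap, assuming tokio and a hypothetical request_llm_completion call alongside the builder shown earlier:

```rust
// Hypothetical sketch: hide the RLM cold start (~50-200 ms) behind LLM
// inference latency. `request_llm_completion` is an illustrative
// stand-in for the Worker's model call.
async fn run_code_template(prompt: String, script: String) -> anyhow::Result<String> {
    let llm_call = request_llm_completion(prompt); // seconds of latency
    let prewarm = async {
        RlmExecutor::new()
            .memory_limit(64 * MB)
            .fuel_limit(1_000_000)
            .build()
    };

    // Drive both concurrently; the VM is warm by the time the model replies.
    let (completion, vm) = tokio::join!(llm_call, prewarm);
    let (_completion, vm) = (completion?, vm?);

    // Execute the code_template script in the sandbox and collect results.
    Ok(vm.execute(script).await?)
}
```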

Skills are multi-artifact packages that bundle descriptions, prompt templates, code templates, and examples into reusable agent capabilities. The previous ExecutionMode enum (FreeForm/Templated/Blueprint) is replaced by artifact-kind dispatch — each artifact declares its kind, and the system routes to the appropriate handler. See Skill System for the full specification.

  • Artifact kinds: description, approach, prompt_template, code_template, example
  • Two-phase activation: Phase 1 (Context Pipeline) matches skill metadata cheaply. Phase 2 (Orchestrator) selects and loads specific artifacts. Full content is never loaded until needed.
  • Dual-scope storage: Skills live in agents.db at both account and workspace levels. Workspace skills override account skills by name.
  • Marketplace distribution: System skills ship as seed data and refresh from cloud. Community skills download from marketplace. User skills are local-only.
  • Execution traces: Per-artifact execution recording for cost tracking, timing, and DSPy optimization feedback.
  • Assertion framework: Per-artifact structural and semantic validation.
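
A minimal sketch of artifact-kind dispatch (variants follow the list above; the action names are hypothetical):

```rust
/// The five artifact kinds listed above. Kind-based dispatch replaces
/// the former ExecutionMode enum; action names are hypothetical.
enum ArtifactKind {
    Description,    // matched cheaply in Phase 1 (Context Pipeline)
    Approach,       // strategy notes loaded by the Orchestrator
    PromptTemplate, // rendered into an agent prompt
    CodeTemplate,   // executed by the Worker via the RLM executor
    Example,        // few-shot material for prompt assembly
}

enum ArtifactAction<'a> {
    LoadIntoContext(&'a str),
    RenderPrompt(&'a str),
    ExecuteInRlm(&'a str),
    AttachExample(&'a str),
}

fn dispatch(kind: ArtifactKind, content: &str) -> ArtifactAction<'_> {
    match kind {
        ArtifactKind::Description | ArtifactKind::Approach => {
            ArtifactAction::LoadIntoContext(content)
        }
        ArtifactKind::PromptTemplate => ArtifactAction::RenderPrompt(content),
        ArtifactKind::CodeTemplate => ArtifactAction::ExecuteInRlm(content),
        ArtifactKind::Example => ArtifactAction::AttachExample(content),
    }
}
```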
| Skill Type | Storage | Editable | Gating |
| --- | --- | --- | --- |
| System skills (Inklings-provided) | Seeded in agents.db + cloud-refreshed | Viewable, not editable | Subscription tier |
| Community skills (marketplace) | Cloud catalog, cached locally | Fork and edit | Account required |
| User skills | Local (workspace or account level) | Full control | None |

All external skills are cached locally in agents.db for offline use.

  • Zhang, Kraska, Khattab. “Recursive Language Models.” arXiv 2512.24601 (2025).
  • Letta (formerly MemGPT) — Tiered memory architecture with cognitive triage
  • LangMem SDK — LLM-driven memory consolidation with semantic/episodic/procedural typing
  • Park et al. “Generative Agents” (2023) — Exponential decay + importance + relevance retrieval
  • DSPy — Prompt optimization and assertion-guided validation
  • Wasmtime — WebAssembly runtime for RLM sandbox
  • RustPython — API design inspiration for RLM executor module composition
