Agent Core System
Crate: crates/infrastructure/agent-core/
Overview
The agent core system provides the execution environment, process model, and memory architecture for AI agents in Inklings. It is built around three core principles:
- Context as explorable data, not consumed tokens — Inspired by Recursive Language Models (RLMs), agent knowledge lives in queryable external stores rather than in the LLM context window. The agent retrieves what it needs on demand rather than carrying everything as prompt tokens.
- The workspace is the knowledge base — The PKM workspace itself (pages, blocks, tags, references, CRDT history) is the richest knowledge source. Agent memory supplements but never duplicates it.
- Context compression at every layer — No agent type holds raw workspace content in its conversation history. Raw data flows through specialist agents and returns as structured summaries. The Orchestrator’s context contains conversation + decisions, never page bodies or search result dumps.
Overview Diagram
Hub-and-spoke model: Only the Orchestrator communicates with task and specialist agents. Workers and Researchers never interact with specialists directly. This keeps the communication graph simple and the Orchestrator as the single coordination point.
Context Pipeline: Every user message triggers the Context Pipeline (deterministic infrastructure, not an agent) which performs skill search + memory retrieval + workspace metadata assembly (~100ms, 0 tokens). A Refinement Gate (single Cheap LLM call, ~500ms) accepts or refines the selection before the Orchestrator decides on an action.
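The two-stage flow just described can be sketched in plain Rust. All type and function names below are illustrative assumptions for this sketch, not the real agent-core API:

```rust
// Hypothetical sketch of the deterministic Context Pipeline followed by the
// Refinement Gate. Names are illustrative, not the crate's actual types.

#[derive(Debug, Clone, PartialEq)]
pub struct ContextSelection {
    pub skill_ids: Vec<String>,  // matched against skill catalog metadata
    pub memory_ids: Vec<String>, // channel-scoped retrieval from the memory tiers
}

pub enum GateDecision {
    Accept,                   // selection passes through unchanged
    Refine(ContextSelection), // the single Cheap-LLM call swapped in a new selection
}

/// Deterministic assembly: metadata/index lookups only, no LLM calls, ~0 tokens.
pub fn assemble_context(intent: &str, catalog: &[(&str, &str)]) -> ContextSelection {
    let skill_ids = catalog
        .iter()
        .filter(|(_, desc)| desc.contains(intent)) // stand-in for real skill matching
        .map(|(id, _)| id.to_string())
        .collect();
    ContextSelection { skill_ids, memory_ids: Vec::new() }
}

/// The Refinement Gate either accepts the selection or replaces it.
pub fn apply_gate(selection: ContextSelection, decision: GateDecision) -> ContextSelection {
    match decision {
        GateDecision::Accept => selection,
        GateDecision::Refine(refined) => refined,
    }
}

fn main() {
    let catalog = [("summarize-page", "summarize a page"), ("tag-cleanup", "clean up tags")];
    let selection = assemble_context("summarize", &catalog);
    let selection = apply_gate(selection, GateDecision::Accept);
    println!("{:?}", selection.skill_ids); // ["summarize-page"]
}
```

The point of the split: everything before the gate is cheap and repeatable, so only one small LLM call sits between the user's message and the Orchestrator.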
Process Model: The Team Metaphor
The agent process model uses four agent types plus the Context Pipeline (infrastructure), organized around a team metaphor. Each type has a distinct role, model class, tool filter, and spawn permission. See Process Model for the full specification including Rust type definitions, configuration defaults, and migration checklist.
| Process Type | Role | Model Class | User Perception |
|---|---|---|---|
| Orchestrator | User-facing coordinator, delegates work | Frontier | “The agent I’m talking to” |
| Researcher | Read-only investigation, structured findings | Frontier | “Someone went to look into that” |
| Worker | Task execution with scoped writes | Fast | “Someone went to do that task” |
| Skill Composer | Skill authoring and refinement | Frontier | “It’s helping me build a skill” |
Context Pipeline (infrastructure, not an agent type):
| Component | Role | Model Class |
|---|---|---|
| Context Pipeline | Skill search + memory retrieval + workspace metadata | None (deterministic) |
| Refinement Gate | Accept/refine context selection | Cheap |
Orchestrator
The user’s primary interface. Receives messages, orchestrates task agents and the Context Pipeline, and synthesizes results. Never blocks on execution — heavy work is delegated immediately. Only type that can spawn sub-processes. Hub-and-spoke coordinator.
Researcher
Isolated, read-only investigation process. Gathers information, analyzes structure, searches history. Reports structured findings back to the Orchestrator. Stores intermediate results in the session scratchpad for the Orchestrator to query rather than dumping full results into the conversation context.
Worker
Focused task execution with scoped write access. Creates pages, reorganizes subtrees, applies edits, runs skills.
Integrates with the RLM executor for `code_template` artifact execution. Can be fire-and-forget or interactive.
Skill Composer
Invoked for skill creation and modification workflows. Generates multi-artifact skill packages through iterative refinement with the user. Uses a frontier model for creative prompt engineering.
Context Pipeline (Infrastructure)
The Context Pipeline is deterministic infrastructure, not an agent type. It handles context assembly as a mechanical pipeline:
- Skill search — match user intent against skill catalog metadata
- Memory retrieval — query 4-tier memory hierarchy with channel-scoped filtering
- Workspace metadata — gather relevant workspace context (recent activity, active page, etc.)
The pipeline sees the index (metadata, embeddings, tags); the Researcher reads the documents (full page content). Three verbs, three owners: select (pipeline), execute (Worker), investigate (Researcher).
A Refinement Gate (single Cheap LLM call) follows the pipeline to accept or refine the assembled context before it reaches the Orchestrator.
Multi-Provider Routing
Supported providers: Anthropic (Claude), OpenAI (GPT), xAI (Grok), OpenRouter (100+ models via single API key or OAuth PKCE), Ollama (local, keyless).
| Process Type | Default Model Class | Rationale |
|---|---|---|
| Orchestrator | Frontier | User-facing quality matters most |
| Researcher | Frontier | Analysis quality drives research value |
| Worker | Fast | Throughput over polish |
| Skill Composer | Frontier | Creative skill authoring requires top-tier reasoning |
| Refinement Gate | Cheap | Single accept/refine decision per message |
| Consolidation | Cheap | Background memory management at scale |
OpenRouter enables access to frontier models from multiple providers (Anthropic, OpenAI, Google, Meta, etc.) without separate API keys for each. A single OpenRouter key or OAuth connection covers the full model catalog.
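The defaults in the table above reduce to a small routing function. The enum and function below are a sketch using hypothetical names, not the crate's real definitions:

```rust
// Sketch of default model-class routing per process type, mirroring the
// routing table above. Names are illustrative, not the real agent-core types.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ModelClass {
    Frontier, // user-facing quality, analysis, creative authoring
    Fast,     // throughput over polish
    Cheap,    // single-decision and background tasks
}

#[derive(Debug, Clone, Copy)]
pub enum ProcessType {
    Orchestrator,
    Researcher,
    Worker,
    SkillComposer,
    RefinementGate,
    Consolidation,
}

pub fn default_model_class(process: ProcessType) -> ModelClass {
    match process {
        ProcessType::Orchestrator
        | ProcessType::Researcher
        | ProcessType::SkillComposer => ModelClass::Frontier,
        ProcessType::Worker => ModelClass::Fast,
        ProcessType::RefinementGate | ProcessType::Consolidation => ModelClass::Cheap,
    }
}

fn main() {
    assert_eq!(default_model_class(ProcessType::Worker), ModelClass::Fast);
    assert_eq!(default_model_class(ProcessType::RefinementGate), ModelClass::Cheap);
}
```

A benefit of an exhaustive match here is that adding a new process type forces a model-class decision at compile time.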
Memory Architecture
The agent memory system uses a 4-tier hierarchy (Conversation -> Channel -> Workspace -> Account) with per-tier configurable decay rates via `DecayConfig`. Full design documented in Agent Memory System.
Key characteristics:
- Source provenance — `source: String` field records which agent type or actor produced each memory (replaces the former ownership concept)
- Per-tier decay — `DecayConfig` with configurable rates per scope; `DecayCalculator` applies tier-aware decay
- Channel-scoped retrieval — `channel_id` parameter enables topical isolation in search queries
- Embedding backfill — `EmbeddingBackfillTask` as a `ScheduledTask` (30-min interval, batch of 50)
- Two-layer dedup — RRF cosine threshold (0.9) + text-similarity fallback for FTS-only mode
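A minimal sketch of what per-tier decay could look like, assuming an exponential decay form (in the spirit of the Generative Agents retrieval model) and hypothetical per-tier rates; the real `DecayConfig` and `DecayCalculator` defaults live in the memory crate:

```rust
// Illustrative per-tier decay. The exponential form and the rate values are
// assumptions for this sketch, not the documented defaults.

#[derive(Debug, Clone, Copy)]
pub enum MemoryScope {
    Conversation,
    Channel,
    Workspace,
    Account,
}

pub struct DecayConfig {
    pub rate_per_hour: f64, // configurable per tier
}

impl DecayConfig {
    /// Hypothetical defaults: narrower scopes fade faster.
    pub fn for_scope(scope: MemoryScope) -> Self {
        let rate_per_hour = match scope {
            MemoryScope::Conversation => 0.05,
            MemoryScope::Channel => 0.01,
            MemoryScope::Workspace => 0.005,
            MemoryScope::Account => 0.001,
        };
        DecayConfig { rate_per_hour }
    }

    /// relevance(t) = relevance(0) * exp(-rate * hours_elapsed)
    pub fn decayed(&self, initial_relevance: f64, hours_elapsed: f64) -> f64 {
        initial_relevance * (-self.rate_per_hour * hours_elapsed).exp()
    }
}

fn main() {
    let conversation = DecayConfig::for_scope(MemoryScope::Conversation);
    let account = DecayConfig::for_scope(MemoryScope::Account);
    // After a day, a conversation-tier memory has faded far more than an
    // account-tier one.
    assert!(conversation.decayed(1.0, 24.0) < account.decayed(1.0, 24.0));
}
```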
Storage Architecture
Physical storage uses `agents.db` at both the account and workspace level:
```
{tauri_data_dir}/
+-- agents.db                     # Account-scoped agent memory + skills
|   +-- memories                  # Account-tier memories (preferences, behaviors)
|   +-- skills                    # System + community + user account-level skills
|   +-- skill_artifacts           # Artifacts for account-level skills
|   +-- skill_execution_traces    # Execution trace log
|
+-- workspaces/
    +-- {workspace}/
        +-- inklings.db           # Workspace data (existing)
        +-- agents.db             # Workspace-scoped agent memory + skills
            +-- memories          # All workspace/channel/conversation memories
            +-- channels          # Channel definitions + metadata
            +-- conversations     # Conversation records with channel assignment
            +-- skills            # Workspace-specific skills (override account)
            +-- skill_artifacts   # Artifacts for workspace-level skills
            +-- skill_execution_traces  # Per-workspace execution traces
            +-- scheduled_activities    # Scheduled background activities
```

Workspace deletion cleanly removes all workspace-scoped agent memory without orphan cleanup. Mirrors the existing SQLite-per-workspace pattern.
Session Continuity
Session continuity follows the Orient -> Work -> Persist lifecycle. The Context Pipeline handles Orient (deterministic context assembly + Refinement Gate); the Orchestrator drives the Work phase; mechanical extraction handles Persist. See Agent Memory System — Session Lifecycle for full details.
Three complementary mechanisms maintain context across session boundaries:
1. Context Externalization (Primary)
Accumulated knowledge lives in the 4-tier memory hierarchy, queryable on demand. The agent retrieves relevant prior context via the retrieval pipeline rather than carrying a compressed summary in the context window.
2. Session Orientation Document (Secondary)
At conversation start, the Context Pipeline produces an `OrientationDocument` — structured markdown loaded into the system prompt with relevant memories, recent activity, and workspace context. When a session ends, mechanical extraction reviews the conversation summary for final observation extraction (Persist phase). Background consolidation runs as a best-effort scheduled task.
3. Session Serialization (Suspend/Resume)
Full session state is serialized at breakpoints between LLM calls. Covers explicit pause/resume and app restart. The orientation document covers cases where serialized state is unavailable.
Consolidation
Consolidation is a background scheduled task with best-effort catch-up — missed schedules are deduped and run on next startup. No daemon or service worker; paired with the “run in background” app setting.
Pipeline:
- Score — Calculate relevance for all short-term memories using the per-tier decay formula
- Promote — Move short-term memories to long-term if relevance > 0.7 and access_count > 3
- Prune — Remove memories with relevance below 0.01
- Dedup — Merge memories with embedding cosine similarity > 0.95
- Cap enforcement — Enforce 10,000 memories per scope
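The thresholds above can be read as a single pass over a scope's memories. This sketch uses illustrative struct and function names and omits the embedding-based dedup step; only the numeric policy comes from the pipeline description:

```rust
// Sketch of the consolidation pass. Names are illustrative; the thresholds
// (promote at relevance > 0.7 and access_count > 3, prune below 0.01,
// cap per scope) come from the pipeline above.

#[derive(Debug, Clone)]
pub struct Memory {
    pub relevance: f64,
    pub access_count: u32,
    pub long_term: bool,
}

pub fn consolidate(mut memories: Vec<Memory>, cap: usize) -> Vec<Memory> {
    // Promote: short-term -> long-term when relevance and usage are both high.
    for m in memories.iter_mut() {
        if !m.long_term && m.relevance > 0.7 && m.access_count > 3 {
            m.long_term = true;
        }
    }
    // Prune: drop memories whose relevance has decayed below 0.01.
    memories.retain(|m| m.relevance >= 0.01);
    // Cap enforcement: keep only the most relevant `cap` memories
    // (10,000 per scope in the doc).
    memories.sort_by(|a, b| b.relevance.partial_cmp(&a.relevance).unwrap());
    memories.truncate(cap);
    memories
}

fn main() {
    let out = consolidate(
        vec![
            Memory { relevance: 0.9, access_count: 5, long_term: false },   // promoted
            Memory { relevance: 0.005, access_count: 1, long_term: false }, // pruned
        ],
        10_000,
    );
    assert_eq!(out.len(), 1);
    assert!(out[0].long_term);
}
```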
RLM Execution Environment
The agent harness includes an embedded RLM executor for workspace-scale analysis tasks.
Runtime: CPython in Wasmtime
| Property | Value |
|---|---|
| Binary size impact | ~15-20 MB (CPython.wasm + Wasmtime runtime) |
| Cold start | ~50-200 ms (hidden by pre-warming during LLM inference wait) |
| Sandboxing | WASM process boundary — fuel metering, memory limits, epoch interruption, no I/O |
| Language | Python 3.12+ (full stdlib available, selectively exposed via with_module()) |
| Security model | Process-level isolation — language-level escapes are irrelevant |
Consumer API
```rust
let vm = RlmExecutor::new()
    .with_module("json", stdlib::json)
    .with_module("math", stdlib::math)
    .with_module("collections", stdlib::collections)
    .with_module("inklings", host_functions::make_module)
    .memory_limit(64 * MB)
    .fuel_limit(1_000_000)
    .build()?;

let result = vm.execute(script).await?;
```

When RLM Activates
Most agent interactions use normal tool calls against the MCP server. The RLM activates for workspace-scale operations and for `code_template` artifact execution:
- “Analyze all pages for consistency issues”
- “Find all references to this concept across the workspace”
- “How has my thesis evolved across these 50 pages?”
- “Check for contradictions between these character descriptions”
The Worker agent manages RLM lifecycle: pre-warms instances during LLM inference latency, executes scripts, and collects results.
Skill System
Skills are multi-artifact packages that bundle descriptions, prompt templates, code templates, and examples into reusable agent capabilities. The previous `ExecutionMode` enum (FreeForm/Templated/Blueprint) is replaced by artifact-kind dispatch — each artifact declares its kind, and the system routes to the appropriate handler. See Skill System for the full specification.
Key Concepts
- Artifact kinds: `description`, `approach`, `prompt_template`, `code_template`, `example`
- Two-phase activation: Phase 1 (Context Pipeline) matches skill metadata cheaply. Phase 2 (Orchestrator) selects and loads specific artifacts. Full content is never loaded until needed.
- Dual-scope storage: Skills live in `agents.db` at both account and workspace levels. Workspace skills override account skills by name.
- Marketplace distribution: System skills ship as seed data and refresh from cloud. Community skills download from marketplace. User skills are local-only.
- Execution traces: Per-artifact execution recording for cost tracking, timing, and DSPy optimization feedback.
- Assertion framework: Per-artifact structural and semantic validation.
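Artifact-kind dispatch can be pictured as an exhaustive match over the kinds listed above. The handler descriptions in the arms are assumptions for this sketch, except `code_template`, which the RLM section ties to the Worker's executor:

```rust
// Sketch of artifact-kind dispatch, which replaces the old ExecutionMode enum.
// Each artifact declares its kind; routing is a plain exhaustive match.
// Handler descriptions are illustrative.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArtifactKind {
    Description,
    Approach,
    PromptTemplate,
    CodeTemplate,
    Example,
}

pub fn dispatch(kind: ArtifactKind) -> &'static str {
    match kind {
        ArtifactKind::Description | ArtifactKind::Approach => "loaded as context for the agent",
        ArtifactKind::PromptTemplate => "rendered and sent as an LLM prompt",
        ArtifactKind::CodeTemplate => "executed by the Worker's RLM executor",
        ArtifactKind::Example => "included as few-shot material",
    }
}

fn main() {
    assert_eq!(dispatch(ArtifactKind::CodeTemplate), "executed by the Worker's RLM executor");
}
```

Because the match is exhaustive, adding a new artifact kind forces every dispatch site to handle it, which is the compile-time advantage over a single mode field.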
Storage Model
| Skill Type | Storage | Editable | Gating |
|---|---|---|---|
| System skills (Inklings-provided) | Seeded in agents.db + cloud-refreshed | Viewable, not editable | Subscription tier |
| Community skills (marketplace) | Cloud catalog, cached locally | Fork and edit | Account required |
| User skills | Local (workspace or account level) | Full control | None |
All external skills are cached locally in `agents.db` for offline use.
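The name-based override between the two scopes reduces to a two-level lookup. This map-based sketch is illustrative; the real resolution runs against the `agents.db` tables:

```rust
// Sketch of dual-scope skill resolution: a workspace skill shadows an
// account skill with the same name. HashMaps stand in for the agents.db tables.

use std::collections::HashMap;

pub fn resolve_skill<'a>(
    name: &str,
    workspace: &'a HashMap<String, String>, // skill name -> definition
    account: &'a HashMap<String, String>,
) -> Option<&'a String> {
    workspace.get(name).or_else(|| account.get(name))
}

fn main() {
    let mut account = HashMap::new();
    account.insert("summarize".to_string(), "account version".to_string());
    account.insert("outline".to_string(), "account outline".to_string());

    let mut workspace = HashMap::new();
    workspace.insert("summarize".to_string(), "workspace version".to_string());

    // Same name in both scopes: the workspace copy wins.
    assert_eq!(
        resolve_skill("summarize", &workspace, &account).map(String::as_str),
        Some("workspace version")
    );
    // Present only at account level: lookup falls through.
    assert_eq!(
        resolve_skill("outline", &workspace, &account).map(String::as_str),
        Some("account outline")
    );
}
```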
Related Documents
- Process Model — Agent process model specification
- Skill System — Multi-artifact skill package specification
- Scheduling System — Autonomous background activities
- Agent Memory System — 4-tier hierarchy, decay model, channels, consolidation
- LLM System — Multi-provider abstraction and routing
- MCP System — In-process MCP server
Research References
- Zhang, Kraska, Khattab. “Recursive Language Models.” arXiv 2512.24601 (2025).
- Letta (formerly MemGPT) — Tiered memory architecture with cognitive triage
- LangMem SDK — LLM-driven memory consolidation with semantic/episodic/procedural typing
- Park et al. “Generative Agents” (2023) — Exponential decay + importance + relevance retrieval
- DSPy — Prompt optimization and assertion-guided validation
- Wasmtime — WebAssembly runtime for RLM sandbox
- RustPython — API design inspiration for RLM executor module composition