
Agent Core System

Crate: crates/infrastructure/agent-core/


The agent core system provides the execution environment, process model, and memory architecture for AI agents in Inklings. It is built around three core principles:

  1. Context as explorable data, not consumed tokens — Inspired by Recursive Language Models (RLMs), agent knowledge lives in queryable external stores rather than in the LLM context window. The agent retrieves what it needs on demand rather than carrying everything as prompt tokens.

  2. The workspace is the knowledge base — The PKM workspace itself (pages, blocks, tags, references, CRDT history) is the richest knowledge source. Agent memory supplements but never duplicates it.

  3. Context compression at every layer — No agent type holds raw workspace content in its conversation history. Raw data flows through specialist agents and returns as structured summaries. The Orchestrator’s context contains conversation + decisions, never page bodies or search result dumps.

Hub-and-spoke model: Only the Orchestrator communicates with task and specialist agents. Workers and Researchers never interact with specialists directly. This keeps the communication graph simple and the Orchestrator as the single coordination point.

Context Pipeline: Every user message triggers the Context Pipeline (deterministic infrastructure, not an agent) which performs skill search + memory retrieval + workspace metadata assembly (~100ms, 0 tokens). A Refinement Gate (single Cheap LLM call, ~500ms) accepts or refines the selection before the Orchestrator decides on an action.

The agent process model uses four agent types plus the Context Pipeline (infrastructure), organized around a team metaphor. Each type has a distinct role, model class, tool filter, and spawn permission. See Process Model for the full specification including Rust type definitions, configuration defaults, and migration checklist.

| Process Type | Role | Model Class | User Perception |
| --- | --- | --- | --- |
| Orchestrator | User-facing coordinator, delegates work | Frontier | “The agent I’m talking to” |
| Researcher | Read-only investigation, structured findings | Frontier | “Someone went to look into that” |
| Worker | Task execution with scoped writes | Fast | “Someone went to do that task” |
| Skill Composer | Skill authoring and refinement | Frontier | “It’s helping me build a skill” |
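
The authoritative Rust definitions live in Process Model; the following is a minimal sketch of how the table above could map onto types, with all names illustrative rather than the crate's actual API:

```rust
// Illustrative sketch only; the authoritative definitions live in
// crates/infrastructure/agent-core (see Process Model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProcessType {
    Orchestrator,
    Researcher,
    Worker,
    SkillComposer,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelClass {
    Frontier, // top-tier reasoning
    Fast,     // throughput-oriented
    Cheap,    // single cheap decisions
}

impl ProcessType {
    /// Default model class, per the table above.
    pub fn default_model_class(self) -> ModelClass {
        match self {
            ProcessType::Worker => ModelClass::Fast,
            _ => ModelClass::Frontier,
        }
    }

    /// Hub-and-spoke: only the Orchestrator may spawn sub-processes.
    pub fn can_spawn(self) -> bool {
        matches!(self, ProcessType::Orchestrator)
    }
}
```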

Context Pipeline (infrastructure, not an agent type):

| Component | Role | Model Class |
| --- | --- | --- |
| Context Pipeline | Skill search + memory retrieval + workspace metadata | None (deterministic) |
| Refinement Gate | Accept/refine context selection | Cheap |

Orchestrator

The user’s primary interface. Receives messages, orchestrates task agents and the Context Pipeline, and synthesizes results. Never blocks on execution — heavy work is delegated immediately. The only type that can spawn sub-processes, and the hub in the hub-and-spoke model.

Researcher

An isolated, read-only investigation process. Gathers information, analyzes structure, and searches history. Reports structured findings back to the Orchestrator, storing intermediate results in the session scratchpad for the Orchestrator to query rather than dumping full results into the conversation context.

Worker

Focused task execution with scoped write access. Creates pages, reorganizes subtrees, applies edits, and runs skills. Integrates with the RLM executor for code_template artifact execution. Can be fire-and-forget or interactive.

Skill Composer

Invoked for skill creation and modification workflows. Generates multi-artifact skill packages through iterative refinement with the user. Uses a frontier model for creative prompt engineering.

Context Pipeline

The Context Pipeline is deterministic infrastructure, not an agent type. It handles context assembly as a mechanical pipeline:

  1. Skill search — match user intent against skill catalog metadata
  2. Memory retrieval — query 4-tier memory hierarchy with channel-scoped filtering
  3. Workspace metadata — gather relevant workspace context (recent activity, active page, etc.)

The pipeline sees the index (metadata, embeddings, tags); the Researcher reads the documents (full page content). Three verbs, three owners: select (pipeline), execute (Worker), investigate (Researcher).

A Refinement Gate (single Cheap LLM call) follows the pipeline to accept or refine the assembled context before it reaches the Orchestrator.
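
Putting the pipeline and gate together, a minimal sketch of the per-message flow (all types and function names here are hypothetical stand-ins, not the crate's real interfaces):

```rust
// Illustrative sketch; every name here is hypothetical, not the crate's API.

struct AssembledContext {
    skills: Vec<String>,   // skill catalog matches (metadata only, no bodies)
    memories: Vec<String>, // channel-scoped hits from the 4-tier memory store
    metadata: String,      // workspace metadata: active page, recent activity
}

enum GateDecision {
    Accept,                   // draft selection passes through unchanged
    Refine(AssembledContext), // gate returns an adjusted selection
}

/// Deterministic assembly (~100 ms, 0 tokens), then one Cheap-class LLM
/// review (~500 ms) before anything reaches the Orchestrator.
fn assemble_context(
    search_skills: impl Fn(&str) -> Vec<String>,
    retrieve_memories: impl Fn(&str) -> Vec<String>,
    workspace_metadata: impl Fn() -> String,
    gate_review: impl Fn(&AssembledContext) -> GateDecision,
    user_message: &str,
) -> AssembledContext {
    let draft = AssembledContext {
        skills: search_skills(user_message),
        memories: retrieve_memories(user_message),
        metadata: workspace_metadata(),
    };
    match gate_review(&draft) {
        GateDecision::Accept => draft,
        GateDecision::Refine(revised) => revised,
    }
}
```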

Supported providers: Anthropic (Claude), OpenAI (GPT), xAI (Grok), OpenRouter (100+ models via single API key or OAuth PKCE), Ollama (local, keyless).

| Process Type | Default Model Class | Rationale |
| --- | --- | --- |
| Orchestrator | Frontier | User-facing quality matters most |
| Researcher | Frontier | Analysis quality drives research value |
| Worker | Fast | Throughput over polish |
| Skill Composer | Frontier | Creative skill authoring requires top-tier reasoning |
| Refinement Gate | Cheap | Single accept/refine decision per message |
| Consolidation | Cheap | Background memory management at scale |

OpenRouter enables access to frontier models from multiple providers (Anthropic, OpenAI, Google, Meta, etc.) without separate API keys for each. A single OpenRouter key or OAuth connection covers the full model catalog.

The agent memory system uses a 4-tier hierarchy (Conversation -> Channel -> Workspace -> Account) with per-tier configurable decay rates via DecayConfig. Full design documented in Agent Memory System.

Key characteristics:

  • Source provenance — a source: String field records which agent type or actor produced each memory (replaces the former ownership concept)
  • Per-tier decay — DecayConfig with configurable rates per scope; DecayCalculator applies tier-aware decay
  • Channel-scoped retrieval — a channel_id parameter enables topical isolation in search queries
  • Embedding backfill — EmbeddingBackfillTask runs as a ScheduledTask (30-minute interval, batches of 50)
  • Two-layer dedup — RRF cosine threshold (0.9) + text-similarity fallback for FTS-only mode
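
As a rough sketch of tier-aware decay, assuming an exponential form in the spirit of Park et al.'s Generative Agents (the actual DecayConfig fields and formula are specified in Agent Memory System and may differ):

```rust
/// Hypothetical sketch of per-tier decay; the real DecayConfig and
/// DecayCalculator live in the memory module and may differ.
#[derive(Clone, Copy)]
enum MemoryTier {
    Conversation,
    Channel,
    Workspace,
    Account,
}

struct DecayConfig {
    // Per-tier decay rate (per day); higher means the tier forgets faster.
    conversation: f64,
    channel: f64,
    workspace: f64,
    account: f64,
}

impl DecayConfig {
    fn rate(&self, tier: MemoryTier) -> f64 {
        match tier {
            MemoryTier::Conversation => self.conversation,
            MemoryTier::Channel => self.channel,
            MemoryTier::Workspace => self.workspace,
            MemoryTier::Account => self.account,
        }
    }
}

/// relevance(t) = base_relevance * exp(-rate * age_days)
/// (assumed exponential form, after Park et al.)
fn decayed_relevance(cfg: &DecayConfig, tier: MemoryTier, base: f64, age_days: f64) -> f64 {
    base * (-cfg.rate(tier) * age_days).exp()
}
```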

Physical storage uses agents.db at both the account and workspace level:

```
{tauri_data_dir}/
+-- agents.db                      # Account-scoped agent memory + skills
|   +-- memories                   # Account-tier memories (preferences, behaviors)
|   +-- skills                     # System + community + user account-level skills
|   +-- skill_artifacts            # Artifacts for account-level skills
|   +-- skill_execution_traces     # Execution trace log
|
+-- workspaces/
    +-- {workspace}/
        +-- inklings.db                  # Workspace data (existing)
        +-- agents.db                    # Workspace-scoped agent memory + skills
            +-- memories                 # All workspace/channel/conversation memories
            +-- channels                 # Channel definitions + metadata
            +-- conversations            # Conversation records with channel assignment
            +-- skills                   # Workspace-specific skills (override account)
            +-- skill_artifacts          # Artifacts for workspace-level skills
            +-- skill_execution_traces   # Per-workspace execution traces
            +-- scheduled_activities     # Scheduled background activities
```

Workspace deletion cleanly removes all workspace-scoped agent memory with no orphan-cleanup step. This mirrors the existing SQLite-per-workspace pattern.
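
A small sketch of how the two scopes could resolve on disk, following the tree above (the helper and its names are hypothetical):

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve the agents.db for a given scope.
/// The directory layout follows the tree above.
enum AgentDbScope<'a> {
    Account,
    Workspace(&'a str),
}

fn agents_db_path(tauri_data_dir: &Path, scope: AgentDbScope) -> PathBuf {
    match scope {
        AgentDbScope::Account => tauri_data_dir.join("agents.db"),
        AgentDbScope::Workspace(ws) => tauri_data_dir
            .join("workspaces")
            .join(ws)
            .join("agents.db"),
    }
}

// Deleting {tauri_data_dir}/workspaces/{workspace}/ removes the
// workspace-scoped agents.db with it -- no orphan cleanup pass needed.
```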

Session continuity follows the Orient -> Work -> Persist lifecycle. The Context Pipeline handles Orient (deterministic context assembly + Refinement Gate); the Orchestrator drives the Work phase; mechanical extraction handles Persist. See Agent Memory System — Session Lifecycle for full details.

Three complementary mechanisms maintain context across session boundaries:

1. Memory Hierarchy (Primary)

Accumulated knowledge lives in the 4-tier memory hierarchy, queryable on demand. The agent retrieves relevant prior context via the retrieval pipeline rather than carrying a compressed summary in the context window.

2. Session Orientation Document (Secondary)


At conversation start, the Context Pipeline produces an OrientationDocument — structured markdown loaded into the system prompt containing relevant memories, recent activity, and workspace context. When a session ends, a mechanical extraction pass reviews the conversation summary and persists any final observations (the Persist phase). Background consolidation runs as a best-effort scheduled task.

3. Session State Serialization

Full session state is serialized at breakpoints between LLM calls. This covers explicit pause/resume and app restart. The orientation document covers cases where serialized state is unavailable.
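
A sketch of what breakpoint serialization might look like, assuming serde and hypothetical field names:

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical shape of serialized session state; the real struct is
/// defined by the agent harness.
#[derive(Serialize, Deserialize)]
struct SessionState {
    conversation_id: String,
    /// Conversation so far: decisions and summaries, never raw page
    /// bodies (per the context-compression principle).
    messages: Vec<String>,
    /// Delegations the Orchestrator has issued but not yet collected.
    pending_tasks: Vec<String>,
    /// Scratchpad entries Researchers populated this session.
    scratchpad_keys: Vec<String>,
}

/// Serialize at a breakpoint between LLM calls; restore on explicit
/// resume or app restart. When no serialized state exists, the
/// OrientationDocument covers re-orientation instead.
fn checkpoint(state: &SessionState) -> serde_json::Result<String> {
    serde_json::to_string(state)
}
```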

Consolidation is a background scheduled task with best-effort catch-up — missed schedules are deduped and run on next startup. There is no daemon or service worker; consolidation is paired with the “run in background” app setting.

Pipeline:

  1. Score — Calculate relevance for all short-term memories using the per-tier decay formula
  2. Promote — Move short-term memories to long-term if relevance > 0.7 and access_count > 3
  3. Prune — Remove memories with relevance below 0.01
  4. Dedup — Merge memories with embedding cosine similarity > 0.95
  5. Cap enforcement — Enforce 10,000 memories per scope
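
A minimal sketch of one consolidation pass over a single scope, using the thresholds above (data access through agents.db is elided; all names are illustrative):

```rust
/// Illustrative sketch of one consolidation pass over a single scope.
/// Reads/writes against agents.db are elided; thresholds match the
/// pipeline above.
struct Memory {
    relevance: f64,   // recomputed in the Score step via per-tier decay
    access_count: u32,
    long_term: bool,
    embedding: Vec<f32>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm(a) * norm(b))
}

fn consolidate(memories: &mut Vec<Memory>) {
    // 1. Score is assumed to have run already: `relevance` is fresh.
    // 2. Promote: short-term -> long-term past the relevance/usage bar.
    for m in memories.iter_mut() {
        if !m.long_term && m.relevance > 0.7 && m.access_count > 3 {
            m.long_term = true;
        }
    }
    // 3. Prune memories that have decayed below the floor.
    memories.retain(|m| m.relevance >= 0.01);
    // 4. Dedup: drop the later of any pair whose embeddings are
    //    near-identical (a stand-in for a real merge).
    let mut keep = vec![true; memories.len()];
    for i in 0..memories.len() {
        for j in (i + 1)..memories.len() {
            if keep[i] && keep[j] && cosine(&memories[i].embedding, &memories[j].embedding) > 0.95 {
                keep[j] = false;
            }
        }
    }
    let mut flags = keep.into_iter();
    memories.retain(|_| flags.next().unwrap());
    // 5. Cap enforcement: keep the 10,000 most relevant per scope.
    memories.sort_by(|a, b| b.relevance.total_cmp(&a.relevance));
    memories.truncate(10_000);
}
```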

The agent harness includes an embedded RLM executor for workspace-scale analysis tasks.

| Property | Value |
| --- | --- |
| Binary size impact | ~15-20 MB (CPython.wasm + Wasmtime runtime) |
| Cold start | ~50-200 ms (hidden by pre-warming during LLM inference wait) |
| Sandboxing | WASM process boundary — fuel metering, memory limits, epoch interruption, no I/O |
| Language | Python 3.12+ (full stdlib available, selectively exposed via with_module()) |
| Security model | Process-level isolation — language-level escapes are irrelevant |
```rust
// Compose an RLM executor: expose selected stdlib modules plus the
// `inklings` host-function module, then cap memory and fuel.
let vm = RlmExecutor::new()
    .with_module("json", stdlib::json)
    .with_module("math", stdlib::math)
    .with_module("collections", stdlib::collections)
    .with_module("inklings", host_functions::make_module)
    .memory_limit(64 * MB)
    .fuel_limit(1_000_000)
    .build()?;

// Run a Python script inside the sandboxed VM.
let result = vm.execute(script).await?;
```

Most agent interactions use normal tool calls against the MCP server. The RLM activates for workspace-scale operations and for code_template artifact execution:

  • “Analyze all pages for consistency issues”
  • “Find all references to this concept across the workspace”
  • “How has my thesis evolved across these 50 pages?”
  • “Check for contradictions between these character descriptions”

The Worker agent manages RLM lifecycle: pre-warms instances during LLM inference latency, executes scripts, and collects results.
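
A sketch of that overlap, assuming tokio and a hypothetical request_llm_completion call alongside the builder shown earlier:

```rust
// Hypothetical sketch: hide the RLM cold start (~50-200 ms) behind LLM
// inference latency. `request_llm_completion` is an illustrative
// stand-in for the Worker's model call.
async fn run_code_template(prompt: String, script: String) -> anyhow::Result<String> {
    let llm_call = request_llm_completion(prompt); // seconds of latency
    let prewarm = async {
        RlmExecutor::new()
            .memory_limit(64 * MB)
            .fuel_limit(1_000_000)
            .build()
    };

    // Drive both concurrently; the VM is warm by the time the model replies.
    let (completion, vm) = tokio::join!(llm_call, prewarm);
    let (_completion, vm) = (completion?, vm?);

    // Execute the code_template script in the sandbox and collect results.
    Ok(vm.execute(script).await?)
}
```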

Skills are multi-artifact packages that bundle descriptions, prompt templates, code templates, and examples into reusable agent capabilities. The previous ExecutionMode enum (FreeForm/Templated/Blueprint) is replaced by artifact-kind dispatch — each artifact declares its kind, and the system routes to the appropriate handler. See Skill System for the full specification.

  • Artifact kinds: description, approach, prompt_template, code_template, example
  • Two-phase activation: Phase 1 (Context Pipeline) matches skill metadata cheaply. Phase 2 (Orchestrator) selects and loads specific artifacts. Full content is never loaded until needed.
  • Dual-scope storage: Skills live in agents.db at both account and workspace levels. Workspace skills override account skills by name.
  • Marketplace distribution: System skills ship as seed data and refresh from cloud. Community skills download from marketplace. User skills are local-only.
  • Execution traces: Per-artifact execution recording for cost tracking, timing, and DSPy optimization feedback.
  • Assertion framework: Per-artifact structural and semantic validation.
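
A minimal sketch of artifact-kind dispatch (variants follow the list above; the action names are hypothetical):

```rust
/// The five artifact kinds listed above. Kind-based dispatch replaces
/// the former ExecutionMode enum; action names are hypothetical.
enum ArtifactKind {
    Description,    // matched cheaply in Phase 1 (Context Pipeline)
    Approach,       // strategy notes loaded by the Orchestrator
    PromptTemplate, // rendered into an agent prompt
    CodeTemplate,   // executed by the Worker via the RLM executor
    Example,        // few-shot material for prompt assembly
}

enum ArtifactAction<'a> {
    LoadIntoContext(&'a str),
    RenderPrompt(&'a str),
    ExecuteInRlm(&'a str),
    AttachExample(&'a str),
}

fn dispatch(kind: ArtifactKind, content: &str) -> ArtifactAction<'_> {
    match kind {
        ArtifactKind::Description | ArtifactKind::Approach => {
            ArtifactAction::LoadIntoContext(content)
        }
        ArtifactKind::PromptTemplate => ArtifactAction::RenderPrompt(content),
        ArtifactKind::CodeTemplate => ArtifactAction::ExecuteInRlm(content),
        ArtifactKind::Example => ArtifactAction::AttachExample(content),
    }
}
```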
| Skill Type | Storage | Editable | Gating |
| --- | --- | --- | --- |
| System skills (Inklings-provided) | Seeded in agents.db + cloud-refreshed | Viewable, not editable | Subscription tier |
| Community skills (marketplace) | Cloud catalog, cached locally | Fork and edit | Account required |
| User skills | Local (workspace or account level) | Full control | None |

All external skills are cached locally in agents.db for offline use.

  • Zhang, Kraska, Khattab. “Recursive Language Models.” arXiv 2512.24601 (2025).
  • Letta (formerly MemGPT) — Tiered memory architecture with cognitive triage
  • LangMem SDK — LLM-driven memory consolidation with semantic/episodic/procedural typing
  • Park et al. “Generative Agents” (2023) — Exponential decay + importance + relevance retrieval
  • DSPy — Prompt optimization and assertion-guided validation
  • Wasmtime — WebAssembly runtime for RLM sandbox
  • RustPython — API design inspiration for RLM executor module composition
