Skip to content
Documentation GitHub
Platform

Task Runner System

Status: Implemented (event-driven + scheduled runtime live; World Agent task kinds scheduled for E7) Reference epics: INK-834 ADRs: ADR-016 Depends on: Tokio runtime Crate: crates/infrastructure/task-runner/


The task runner is the workspace’s background execution surface. It runs two kinds of work:

  • Event-driven tasks — react to bounded channels of application events (e.g. page writes produce embedding events)
  • Scheduled tasks — run on fixed intervals (e.g. history collapse, autonomous World Agent tasks)

A single TaskRunner instance is created per workspace and manages all background work for that workspace’s lifetime. Every subsystem that needs background processing without blocking the UI thread — embeddings, history collapse, scheduled agent runs — registers its task here.

The scheduler stays in Rust. It does not execute agent logic itself. When a scheduled task is agent-kind, the runner IPC-triggers the Python sidecar against a thread_id, and LangGraph inside the sidecar resumes from the last checkpoint for that thread (see ADR-016 and the process model). This division keeps timers, backpressure, and lifecycle management in Rust and keeps agent graph execution in the one place it is defined.


TaskRunner (per workspace)
├── Event-driven tasks
│ └── EmbeddingTask crates/application/src/embedding/
│ Reacts to page saves, indexes workspace content
└── Scheduled tasks
├── HistoryCollapseTask apps/desktop/src-tauri/src/history_collapse.rs
│ Prunes event_log entries beyond retention window (24h interval)
└── WorldAgentScheduledTask crates/infrastructure/task-runner/src/world_agent/
One task-runner entry per registered autonomous-task category;
invokes the sidecar over IPC against the agent's thread_id
  1. Workspace open — a new TaskRunner is created in open_workspace_async(). Tasks are registered before start() is called.
  2. Running — the runner spawns a supervisor tokio task that owns all child task join handles. Each task runs in its own tokio task with structured tracing spans.
  3. Workspace close — the runner is taken from AppState.managers.task_runner and shutdown_and_wait() is called with a bounded timeout (5 seconds). The CancellationToken signals all tasks to stop.
  4. App exit — the same shutdown sequence runs in the Tauri on_event(CloseRequested) handler. If the runner is dropped without explicit shutdown, the Drop impl cancels all tasks (fire-and-forget).
AppState
└── managers: ManagerBundle
└── task_runner: Mutex<Option<TaskRunner>>
Some(_) when workspace is open
None when no workspace is active

The get_task_runner_health Tauri command exposes per-task health snapshots to the frontend for diagnostics.


Event-driven tasks implement the EventDrivenTask trait:

pub trait EventDrivenTask: Send + Sync + 'static {
type Event: Send + 'static + Eq + Hash;
fn name(&self) -> &'static str;
fn channel_capacity(&self) -> usize; // default: 256
fn debounce_interval(&self) -> Option<Duration>; // default: None
fn handle_batch(&self, events: Vec<Self::Event>)
-> impl Future<Output = Result<(), TaskError>> + Send;
}

Key behaviors:

  • Bounded channel with backpressure — each task gets an mpsc::channel with configurable capacity. EventTaskHandle::try_send() returns TaskSendError::Full when at capacity; events are dropped rather than blocking the caller.
  • Batch processing — events drain from the channel and are delivered as a Vec to handle_batch(). This amortizes overhead for bursty workloads.
  • Debounce with deduplication — when debounce_interval() returns Some(duration), events accumulate in a HashSet (deduplicating by Eq + Hash) and flush only after the debounce timer expires. The timer resets on each new event; the Sleep future is allocated once and reset in-place via Sleep::reset().
  • Lock-free queue depth — an AtomicUsize counter tracks queue depth on the hot try_send path without acquiring the health mutex.
  • Status broadcasting — a watch::channel provides TaskStatus updates consumers can subscribe to (e.g. the embedding status watch in the frontend).
TaskEvent typeDebounceCapacityPurpose
EmbeddingTaskEmbeddingEventConfigurable256Index page content into embedding vectors

Scheduled tasks implement the ScheduledTask trait:

pub trait ScheduledTask: Send + Sync + 'static {
fn name(&self) -> &'static str;
fn interval(&self) -> Duration;
fn initial_delay(&self) -> Option<Duration>; // default: None
fn jitter_percent(&self) -> Option<u8>; // default: None
fn run(&self) -> impl Future<Output = Result<(), TaskError>> + Send;
}

Key behaviors:

  • Fixed interval — uses tokio::time::interval_at for periodic execution.
  • Initial delay — optional delay before the first tick, useful for letting the application stabilize after workspace open.
  • Jitter — optional deterministic jitter (percentage of interval) applied to the first tick only. Prevents thundering-herd when multiple scheduled tasks share an interval. The jitter is deterministic per task name (FNV-1a hash), so restarts produce the same offset.
TaskIntervalPurpose
HistoryCollapseTask24 hoursPrune event_log entries older than event_log_retention_days
WorldAgentScheduledTaskPer categoryInvoke the Python sidecar against the agent’s thread_id for a registered autonomous category

The World Agent participates in the task runner as a scheduled-task kind — not as a process the runner owns, but as a per-category task whose run() implementation IPC-triggers the Python sidecar. This is the concrete wiring behind the scheduling system’s category-based scope grants.

One WorldAgentScheduledTask entry is registered per enabled autonomous-task category (per PL-C). Categories and their defaults:

CategoryDefault cadencePurpose
structural_maintenance24 hoursLink cleanup, type drift, orphan surveys
revalidation_processingHourlyWalk the re-validation queue, produce candidate fixes
content_proposalsDailySurface new candidate pages / derivation links
external_importsOpt-inPoll configured external sources (default: ask)
canonical_modificationsDailyPropose retirements, supersessions (submit as candidate)

Cadences are configurable per workspace. A category that is scope-revoked is de-registered from the runner; re-granting it re-registers.

The WorldAgentScheduledTask::run() body is narrow on purpose. It does not invoke the LangGraph graph. It:

  1. Resolves the World Agent’s PermissionGuard for this workspace (via CapabilityResolver — see permission-system).
  2. Looks up the thread_id associated with (workspace_id, category). Each category has a stable per-workspace thread so resumption is idempotent: one in-flight run per category per workspace.
  3. Builds an IPC trigger message containing { thread_id, category, scope_summary, invocation_reason: "scheduled" }.
  4. Sends the trigger to the Python sidecar over the agent-event IPC channel (see process-model).
  5. Awaits completion of that trigger (bounded timeout) and records task health, then yields.

On receipt, the Python sidecar’s LangGraph runtime resumes the graph at thread_id. If a previous invocation checkpointed mid-run (long-running task, interrupt for permission, etc.), the checkpointer’s stored state for that thread_id is loaded and the graph continues from the last node. If the thread is fresh or previously completed, a new traversal begins with the category’s scope summary as input. See agent-core-system for the checkpointer semantics.

The World Agent’s writes during the run traverse the submit boundary exactly as interactive writes do — the category-grant controls scope (what the agent is authorized to do while under this task), not the write path. Event-log entries from the run are tagged event_source = 'autonomous_task' with source_ref = <category> so they are filterable downstream (see event-log-system §event source).

Keeping the timer, the scope resolution, and the IPC trigger in Rust — and keeping the graph execution in the sidecar — lets each side own what it is good at:

  • The task runner already handles supervision, jitter, health, backpressure, and workspace-scoped lifecycle. Agent tasks get those for free.
  • The sidecar already owns LangGraph, checkpointing, subagent dispatch, interrupt semantics, and the full tool surface. Re-implementing any of that Rust-side would be duplication.
  • The thread_id is the continuity key. It survives sidecar restart, workspace reopen, and app relaunch. Every trigger against the same (workspace, category) hits the same conversation the runtime already remembers.

Every registered task maintains a TaskHealth snapshot:

pub struct TaskHealth {
pub name: &'static str,
pub status: TaskStatus, // Idle | Running | Queued(usize) | Error(String)
pub last_run: Option<SystemTime>,
pub error_count: u64,
pub queue_depth: usize,
}

TaskRunner::runner_health() returns an aggregate TaskRunnerHealth with task count, active count, and error count. This is exposed to the frontend via the get_task_runner_health Tauri command, which serializes health data as TaskHealthSnapshot (Unix-millisecond timestamps for cross-language compatibility).

World Agent scheduled tasks participate in health tracking normally: an IPC trigger timing out, a sidecar panic, or a non-zero agent-run error propagates back as a TaskError and increments the task’s error count. The author’s control plane surfaces per-category health for the World Agent so the operator can see which categories are healthy and which are stuck.


Task errors use the TaskError enum:

  • ExecutionFailed { message } — generic task body failure.
  • ChannelClosed — the event channel was closed unexpectedly.
  • Timeout { duration_ms } — the task exceeded its allowed execution time.

Errors increment error_count and set status to TaskStatus::Error(message). The task loop continues running after errors — a single failure does not terminate the task. If a tokio join handle fails (panic), the supervisor logs the error and increments the error count.

For WorldAgentScheduledTask, IPC-side failures (sidecar unreachable, trigger timeout) are ExecutionFailed at this layer. Agent-run failures inside the sidecar (graph-level error, capability denial, submit-boundary conflict that could not be reconciled) report to the runner as successful completion at the IPC level but are visible in the event log and the agent’s own error surfaces. This keeps the task runner’s health semantics focused on “did the trigger land?” rather than duplicating the agent’s own error reporting.


The TaskRunner uses a CancellationToken shared across all tasks. On shutdown:

  1. cancel() is called, signaling all tokio::select! loops to exit.
  2. The supervisor awaits all task join handles.
  3. A oneshot channel signals completion back to the caller via shutdown_and_wait().

The Tauri integration wraps this in a 5-second timeout. If the timeout elapses, the runner is dropped (which calls cancel() again via Drop), and in-flight work is abandoned.

A WorldAgentScheduledTask that is cancelled mid-trigger does not terminate the sidecar’s in-flight run. The sidecar’s LangGraph checkpointer records progress durably; when the next trigger arrives (on workspace reopen or scheduled re-entry), resumption picks up from the last checkpoint. Cancellation here means “stop waiting” on the Rust side, not “kill the agent run.”


Was this page helpful?