Task Runner System
- Status: Implemented (event-driven + scheduled runtime live; World Agent task kinds scheduled for E7)
- Reference epics: INK-834
- ADRs: ADR-016
- Depends on: Tokio runtime
- Crate: crates/infrastructure/task-runner/
Overview
The task runner is the workspace’s background execution surface. It runs two kinds of work:
- Event-driven tasks — react to bounded channels of application events (e.g. page writes produce embedding events)
- Scheduled tasks — run on fixed intervals (e.g. history collapse, autonomous World Agent tasks)
A single TaskRunner instance is created per workspace and manages all background work for that workspace’s lifetime. Every subsystem that needs background processing without blocking the UI thread — embeddings, history collapse, scheduled agent runs — registers its task here.
The scheduler stays in Rust. It does not execute agent logic itself. When a scheduled task is agent-kind, the runner IPC-triggers the Python sidecar against a thread_id, and LangGraph inside the sidecar resumes from the last checkpoint for that thread (see ADR-016 and the process model). This division keeps timers, backpressure, and lifecycle management in Rust and keeps agent graph execution in the one place it is defined.
Architecture
```
TaskRunner (per workspace)
├── Event-driven tasks
│   └── EmbeddingTask              crates/application/src/embedding/
│       Reacts to page saves, indexes workspace content
│
└── Scheduled tasks
    ├── HistoryCollapseTask        apps/desktop/src-tauri/src/history_collapse.rs
    │   Prunes event_log entries beyond retention window (24h interval)
    │
    └── WorldAgentScheduledTask    crates/infrastructure/task-runner/src/world_agent/
        One task-runner entry per registered autonomous-task category;
        invokes the sidecar over IPC against the agent's thread_id
```

Lifecycle
- Workspace open: a new `TaskRunner` is created in `open_workspace_async()`. Tasks are registered before `start()` is called.
- Running: the runner spawns a supervisor tokio task that owns all child task join handles. Each task runs in its own tokio task with structured tracing spans.
- Workspace close: the runner is taken from `AppState.managers.task_runner` and `shutdown_and_wait()` is called with a bounded timeout (5 seconds). The `CancellationToken` signals all tasks to stop.
- App exit: the same shutdown sequence runs in the Tauri `on_event(CloseRequested)` handler. If the runner is dropped without explicit shutdown, the `Drop` impl cancels all tasks (fire-and-forget).
Integration with AppState
```
AppState
└── managers: ManagerBundle
    └── task_runner: Mutex<Option<TaskRunner>>
        Some(_) when workspace is open
        None when no workspace is active
```

The `get_task_runner_health` Tauri command exposes per-task health snapshots to the frontend for diagnostics.
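The `Mutex<Option<TaskRunner>>` slot described above lends itself to a take-on-close pattern. The following is a minimal sketch of that pattern only: `TaskRunner` is reduced to a stub, and `open_workspace`, `close_workspace`, and `has_runner` are hypothetical helper names chosen for illustration, not the application's real commands.

```rust
use std::sync::Mutex;

/// Stub standing in for the real runner; only the slot handling is shown.
#[derive(Debug)]
pub struct TaskRunner {
    pub workspace: String,
}

pub struct ManagerBundle {
    pub task_runner: Mutex<Option<TaskRunner>>,
}

pub struct AppState {
    pub managers: ManagerBundle,
}

impl AppState {
    /// Starts with no active workspace, so the slot is `None`.
    pub fn new() -> Self {
        Self {
            managers: ManagerBundle {
                task_runner: Mutex::new(None),
            },
        }
    }

    /// Hypothetical helper: install a runner when a workspace opens.
    /// Returns the previous runner, if any, so the caller can shut it down.
    pub fn open_workspace(&self, workspace: &str) -> Option<TaskRunner> {
        let mut slot = self.managers.task_runner.lock().unwrap();
        slot.replace(TaskRunner {
            workspace: workspace.to_string(),
        })
    }

    /// Hypothetical helper: take the runner out of the slot, leaving `None`.
    /// The caller owns the runner afterwards and can await its shutdown
    /// without holding the mutex.
    pub fn close_workspace(&self) -> Option<TaskRunner> {
        self.managers.task_runner.lock().unwrap().take()
    }

    pub fn has_runner(&self) -> bool {
        self.managers.task_runner.lock().unwrap().is_some()
    }
}
```

Taking the runner out of the `Option` before shutting it down keeps the lock scope short: the mutex is released as soon as `take()` returns, so a slow shutdown does not block other readers of `AppState`.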
Event-driven tasks
Event-driven tasks implement the `EventDrivenTask` trait:

```rust
pub trait EventDrivenTask: Send + Sync + 'static {
    type Event: Send + 'static + Eq + Hash;

    fn name(&self) -> &'static str;
    fn channel_capacity(&self) -> usize; // default: 256
    fn debounce_interval(&self) -> Option<Duration>; // default: None
    fn handle_batch(&self, events: Vec<Self::Event>)
        -> impl Future<Output = Result<(), TaskError>> + Send;
}
```

Key behaviors:
- Bounded channel with backpressure: each task gets an `mpsc::channel` with configurable capacity. `EventTaskHandle::try_send()` returns `TaskSendError::Full` when at capacity; events are dropped rather than blocking the caller.
- Batch processing: events drain from the channel and are delivered as a `Vec` to `handle_batch()`. This amortizes overhead for bursty workloads.
- Debounce with deduplication: when `debounce_interval()` returns `Some(duration)`, events accumulate in a `HashSet` (deduplicating by `Eq + Hash`) and flush only after the debounce timer expires. The timer resets on each new event; the `Sleep` future is allocated once and reset in-place via `Sleep::reset()`.
- Lock-free queue depth: an `AtomicUsize` counter tracks queue depth on the hot `try_send` path without acquiring the health mutex.
- Status broadcasting: a `watch::channel` provides `TaskStatus` updates that consumers can subscribe to (e.g. the embedding status watch in the frontend).
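To make the drop-on-full, lock-free depth, and deduplication behaviors above concrete, here is a small standard-library sketch. It is an approximation under stated assumptions: `std::sync::mpsc::sync_channel` stands in for the Tokio `mpsc::channel`, the debounce timer is omitted, and `EventQueue`, `bounded`, `drain_deduplicated`, and the `Closed` variant are hypothetical names that do not come from the crate.

```rust
use std::collections::HashSet;
use std::hash::Hash;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Arc;

#[derive(Debug, PartialEq, Eq)]
pub enum TaskSendError {
    Full,
    Closed, // hypothetical variant for a dropped receiver
}

/// Sender half: never blocks, and tracks depth without taking a lock.
pub struct EventTaskHandle<E> {
    tx: SyncSender<E>,
    depth: Arc<AtomicUsize>,
}

impl<E> EventTaskHandle<E> {
    pub fn try_send(&self, event: E) -> Result<(), TaskSendError> {
        // Count first, then undo on failure, so the counter never underflows
        // if the consumer drains between the send and the increment.
        self.depth.fetch_add(1, Ordering::Relaxed);
        match self.tx.try_send(event) {
            Ok(()) => Ok(()),
            Err(err) => {
                self.depth.fetch_sub(1, Ordering::Relaxed);
                match err {
                    // At capacity: the event is dropped, the caller is not blocked.
                    TrySendError::Full(_) => Err(TaskSendError::Full),
                    TrySendError::Disconnected(_) => Err(TaskSendError::Closed),
                }
            }
        }
    }

    pub fn queue_depth(&self) -> usize {
        self.depth.load(Ordering::Relaxed)
    }
}

/// Receiver half (hypothetical name).
pub struct EventQueue<E> {
    rx: Receiver<E>,
    depth: Arc<AtomicUsize>,
}

impl<E: Eq + Hash> EventQueue<E> {
    /// Drain everything currently queued, collapsing duplicates by `Eq + Hash`.
    pub fn drain_deduplicated(&self) -> HashSet<E> {
        let mut batch = HashSet::new();
        while let Ok(event) = self.rx.try_recv() {
            self.depth.fetch_sub(1, Ordering::Relaxed);
            batch.insert(event);
        }
        batch
    }
}

/// Build a connected handle/queue pair with a fixed capacity (must be > 0).
pub fn bounded<E>(capacity: usize) -> (EventTaskHandle<E>, EventQueue<E>) {
    let (tx, rx) = sync_channel(capacity);
    let depth = Arc::new(AtomicUsize::new(0));
    (
        EventTaskHandle { tx, depth: Arc::clone(&depth) },
        EventQueue { rx, depth },
    )
}
```

With a capacity of 2, two saves of the same page fill the queue, a third send is rejected with `Full`, and the drained batch contains a single entry because the duplicates collapse in the `HashSet`.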
Current event-driven tasks
| Task | Event type | Debounce | Capacity | Purpose |
|---|---|---|---|---|
| `EmbeddingTask` | `EmbeddingEvent` | Configurable | 256 | Index page content into embedding vectors |
Scheduled tasks
Scheduled tasks implement the `ScheduledTask` trait:

```rust
pub trait ScheduledTask: Send + Sync + 'static {
    fn name(&self) -> &'static str;
    fn interval(&self) -> Duration;
    fn initial_delay(&self) -> Option<Duration>; // default: None
    fn jitter_percent(&self) -> Option<u8>; // default: None
    fn run(&self) -> impl Future<Output = Result<(), TaskError>> + Send;
}
```

Key behaviors:
- Fixed interval: uses `tokio::time::interval_at` for periodic execution.
- Initial delay: optional delay before the first tick, useful for letting the application stabilize after workspace open.
- Jitter: optional deterministic jitter (percentage of interval) applied to the first tick only. Prevents thundering-herd when multiple scheduled tasks share an interval. The jitter is deterministic per task name (FNV-1a hash), so restarts produce the same offset.
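The deterministic jitter described above can be sketched as follows. The hash is standard 64-bit FNV-1a; the mapping from hash to offset (hash modulo the maximum jitter in milliseconds) and the function name `first_tick_jitter` are assumptions made for illustration, since the text only states that the offset is derived from the task name.

```rust
use std::time::Duration;

const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    bytes.iter().fold(FNV_OFFSET_BASIS, |hash, &byte| {
        (hash ^ u64::from(byte)).wrapping_mul(FNV_PRIME)
    })
}

/// Assumed mapping: offset = hash mod (interval * percent / 100), in milliseconds.
/// The same task name always yields the same offset, so restarts do not reshuffle
/// the schedule.
pub fn first_tick_jitter(
    task_name: &str,
    interval: Duration,
    jitter_percent: Option<u8>,
) -> Duration {
    let percent = u128::from(jitter_percent.unwrap_or(0).min(100));
    let max_jitter_ms = interval.as_millis() * percent / 100;
    if max_jitter_ms == 0 {
        return Duration::ZERO;
    }
    // The remainder is at most the u64 hash value, so the cast cannot truncate.
    let offset_ms = u128::from(fnv1a_64(task_name.as_bytes())) % max_jitter_ms;
    Duration::from_millis(offset_ms as u64)
}
```

Two tasks that share a 24-hour interval but have different names will generally land on different offsets within the jitter window, which is what spreads their first ticks apart.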
Current scheduled tasks
| Task | Interval | Purpose |
|---|---|---|
| `HistoryCollapseTask` | 24 hours | Prune `event_log` entries older than `event_log_retention_days` |
| `WorldAgentScheduledTask` | Per category | Invoke the Python sidecar against the agent’s `thread_id` for a registered autonomous category |
World Agent task kinds
The World Agent participates in the task runner as a scheduled-task kind: not as a process the runner owns, but as a per-category task whose `run()` implementation IPC-triggers the Python sidecar. This is the concrete wiring behind the scheduling system’s category-based scope grants.
Registration
One `WorldAgentScheduledTask` entry is registered per enabled autonomous-task category (per PL-C). Categories and their defaults:
| Category | Default cadence | Purpose |
|---|---|---|
structural_maintenance | 24 hours | Link cleanup, type drift, orphan surveys |
revalidation_processing | Hourly | Walk the re-validation queue, produce candidate fixes |
content_proposals | Daily | Surface new candidate pages / derivation links |
external_imports | Opt-in | Poll configured external sources (default: ask) |
canonical_modifications | Daily | Propose retirements, supersessions (submit as candidate) |
Cadences are configurable per workspace. A category that is scope-revoked is de-registered from the runner; re-granting it re-registers.
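The grant/revoke behavior can be pictured as a small registry keyed by category. This is a sketch under assumptions, not the crate's API: `Category`, `CategoryRegistry`, `grant`, and `revoke` are hypothetical names, "Daily" is read as 24 hours, and the opt-in category is modeled as having no default cadence until the workspace supplies one.

```rust
use std::collections::HashMap;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Category {
    StructuralMaintenance,
    RevalidationProcessing,
    ContentProposals,
    ExternalImports,
    CanonicalModifications,
}

impl Category {
    /// Default cadence per the table above; `None` models the opt-in category.
    pub fn default_cadence(self) -> Option<Duration> {
        const HOUR: u64 = 60 * 60;
        match self {
            Category::RevalidationProcessing => Some(Duration::from_secs(HOUR)),
            Category::ExternalImports => None,
            Category::StructuralMaintenance
            | Category::ContentProposals
            | Category::CanonicalModifications => Some(Duration::from_secs(24 * HOUR)),
        }
    }
}

/// Hypothetical registry: one entry per category currently registered with the runner.
#[derive(Default)]
pub struct CategoryRegistry {
    registered: HashMap<Category, Duration>,
}

impl CategoryRegistry {
    /// Granting scope (re-)registers the category. A per-workspace override wins
    /// over the default. Returns `false` when no cadence is available.
    pub fn grant(&mut self, category: Category, workspace_override: Option<Duration>) -> bool {
        match workspace_override.or_else(|| category.default_cadence()) {
            Some(cadence) => {
                self.registered.insert(category, cadence);
                true
            }
            None => false,
        }
    }

    /// Revoking scope de-registers the category. Returns whether it was registered.
    pub fn revoke(&mut self, category: Category) -> bool {
        self.registered.remove(&category).is_some()
    }

    pub fn cadence(&self, category: Category) -> Option<Duration> {
        self.registered.get(&category).copied()
    }
}
```

Because registration is driven entirely by the grant state, a revoked category simply has no timer; there is no separate "disabled" flag to keep in sync.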
What run() does
The `WorldAgentScheduledTask::run()` body is narrow on purpose. It does not invoke the LangGraph graph. It:
- Resolves the World Agent’s `PermissionGuard` for this workspace (via `CapabilityResolver`; see permission-system).
- Looks up the `thread_id` associated with `(workspace_id, category)`. Each category has a stable per-workspace thread so resumption is idempotent: one in-flight run per category per workspace.
- Builds an IPC trigger message containing `{ thread_id, category, scope_summary, invocation_reason: "scheduled" }`.
- Sends the trigger to the Python sidecar over the agent-event IPC channel (see process-model).
- Awaits completion of that trigger (bounded timeout) and records task health, then yields.
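The steps above can be sketched in synchronous form. Everything here is illustrative: `IpcTrigger` mirrors the four fields named in the text, but `SidecarChannel`, `RecordingChannel`, `run_once`, and the `thread_id_for` string format are hypothetical, and permission resolution and the bounded timeout are left out. In particular, the real system looks the `thread_id` up; deriving it from a format string is only a stand-in for "stable per `(workspace_id, category)`".

```rust
/// The trigger payload, with the four fields named in the text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IpcTrigger {
    pub thread_id: String,
    pub category: String,
    pub scope_summary: String,
    pub invocation_reason: &'static str,
}

/// Hypothetical stand-in for the agent-event IPC channel.
pub trait SidecarChannel {
    fn send_trigger(&mut self, trigger: &IpcTrigger) -> Result<(), String>;
}

/// Assumed stand-in for the thread lookup: the same inputs always give the same id.
pub fn thread_id_for(workspace_id: &str, category: &str) -> String {
    format!("world-agent:{}:{}", workspace_id, category)
}

/// One scheduled tick: build the trigger, send it, and report the outcome.
/// No graph logic runs on this side.
pub fn run_once(
    channel: &mut impl SidecarChannel,
    workspace_id: &str,
    category: &str,
    scope_summary: &str,
) -> Result<IpcTrigger, String> {
    let trigger = IpcTrigger {
        thread_id: thread_id_for(workspace_id, category),
        category: category.to_string(),
        scope_summary: scope_summary.to_string(),
        invocation_reason: "scheduled",
    };
    channel.send_trigger(&trigger)?;
    Ok(trigger)
}

/// Test double that records every trigger it receives.
#[derive(Default)]
pub struct RecordingChannel {
    pub sent: Vec<IpcTrigger>,
}

impl SidecarChannel for RecordingChannel {
    fn send_trigger(&mut self, trigger: &IpcTrigger) -> Result<(), String> {
        self.sent.push(trigger.clone());
        Ok(())
    }
}
```

Two ticks for the same workspace and category carry the same `thread_id`, which is what lets the sidecar treat the second trigger as a resumption rather than a new conversation.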
What happens on the sidecar side
On receipt, the Python sidecar’s LangGraph runtime resumes the graph at `thread_id`. If a previous invocation checkpointed mid-run (long-running task, interrupt for permission, etc.), the checkpointer’s stored state for that `thread_id` is loaded and the graph continues from the last node. If the thread is fresh or previously completed, a new traversal begins with the category’s scope summary as input. See agent-core-system for the checkpointer semantics.
The World Agent’s writes during the run traverse the submit boundary exactly as interactive writes do — the category-grant controls scope (what the agent is authorized to do while under this task), not the write path. Event-log entries from the run are tagged event_source = 'autonomous_task' with source_ref = <category> so they are filterable downstream (see event-log-system §event source).
Why this split
Keeping the timer, the scope resolution, and the IPC trigger in Rust, and keeping the graph execution in the sidecar, lets each side own what it is good at:
- The task runner already handles supervision, jitter, health, backpressure, and workspace-scoped lifecycle. Agent tasks get those for free.
- The sidecar already owns LangGraph, checkpointing, subagent dispatch, interrupt semantics, and the full tool surface. Re-implementing any of that Rust-side would be duplication.
- The `thread_id` is the continuity key. It survives sidecar restart, workspace reopen, and app relaunch. Every trigger against the same `(workspace, category)` hits the same conversation the runtime already remembers.
Health monitoring
Every registered task maintains a `TaskHealth` snapshot:

```rust
pub struct TaskHealth {
    pub name: &'static str,
    pub status: TaskStatus, // Idle | Running | Queued(usize) | Error(String)
    pub last_run: Option<SystemTime>,
    pub error_count: u64,
    pub queue_depth: usize,
}
```

`TaskRunner::runner_health()` returns an aggregate `TaskRunnerHealth` with task count, active count, and error count. This is exposed to the frontend via the `get_task_runner_health` Tauri command, which serializes health data as `TaskHealthSnapshot` (Unix-millisecond timestamps for cross-language compatibility).
World Agent scheduled tasks participate in health tracking normally: an IPC trigger timing out, a sidecar panic, or a non-zero agent-run error propagates back as a TaskError and increments the task’s error count. The author’s control plane surfaces per-category health for the World Agent so the operator can see which categories are healthy and which are stuck.
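As a rough illustration of the Unix-millisecond conversion mentioned above, the sketch below turns a `TaskHealth` into a frontend-friendly snapshot. The field layout of `TaskHealthSnapshot`, the status strings, and the `snapshot` and `unix_millis` helpers are assumptions; only `TaskHealth` and the `TaskStatus` variants are taken from the text.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TaskStatus {
    Idle,
    Running,
    Queued(usize),
    Error(String),
}

pub struct TaskHealth {
    pub name: &'static str,
    pub status: TaskStatus,
    pub last_run: Option<SystemTime>,
    pub error_count: u64,
    pub queue_depth: usize,
}

/// Hypothetical wire shape: plain strings and integers only.
#[derive(Debug, PartialEq, Eq)]
pub struct TaskHealthSnapshot {
    pub name: String,
    pub status: String,
    pub last_run_unix_ms: Option<u64>,
    pub error_count: u64,
    pub queue_depth: usize,
}

/// Milliseconds since the Unix epoch, or `None` for timestamps before it.
pub fn unix_millis(time: SystemTime) -> Option<u64> {
    let elapsed = time.duration_since(UNIX_EPOCH).ok()?;
    // u64 milliseconds cover hundreds of millions of years, so the cast is safe here.
    Some(elapsed.as_millis() as u64)
}

pub fn snapshot(health: &TaskHealth) -> TaskHealthSnapshot {
    let status = match &health.status {
        TaskStatus::Idle => "idle".to_string(),
        TaskStatus::Running => "running".to_string(),
        TaskStatus::Queued(n) => format!("queued({})", n),
        TaskStatus::Error(message) => format!("error: {}", message),
    };
    TaskHealthSnapshot {
        name: health.name.to_string(),
        status,
        last_run_unix_ms: health.last_run.and_then(unix_millis),
        error_count: health.error_count,
        queue_depth: health.queue_depth,
    }
}

/// Small helper for building example timestamps.
pub fn epoch_plus_millis(ms: u64) -> SystemTime {
    UNIX_EPOCH + Duration::from_millis(ms)
}
```

An integer millisecond timestamp avoids depending on how `SystemTime` would be serialized and maps directly onto `new Date(ms)` on the frontend side.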
Error handling
Task errors use the `TaskError` enum:
- `ExecutionFailed { message }`: generic task body failure.
- `ChannelClosed`: the event channel was closed unexpectedly.
- `Timeout { duration_ms }`: the task exceeded its allowed execution time.
Errors increment error_count and set status to TaskStatus::Error(message). The task loop continues running after errors — a single failure does not terminate the task. If a tokio join handle fails (panic), the supervisor logs the error and increments the error count.
For WorldAgentScheduledTask, IPC-side failures (sidecar unreachable, trigger timeout) are ExecutionFailed at this layer. Agent-run failures inside the sidecar (graph-level error, capability denial, submit-boundary conflict that could not be reconciled) report to the runner as successful completion at the IPC level but are visible in the event log and the agent’s own error surfaces. This keeps the task runner’s health semantics focused on “did the trigger land?” rather than duplicating the agent’s own error reporting.
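The "record the error and keep looping" behavior described in this section can be sketched as below. The three `TaskError` variants come from the text; the field types, the `Display` messages, the reduced `TaskStatus`, and the `Health`, `record`, and `run_ticks` names are assumptions, as is resetting the status to `Idle` after a successful run.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TaskError {
    ExecutionFailed { message: String },
    ChannelClosed,
    Timeout { duration_ms: u64 },
}

impl fmt::Display for TaskError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TaskError::ExecutionFailed { message } => write!(f, "execution failed: {}", message),
            TaskError::ChannelClosed => write!(f, "event channel closed"),
            TaskError::Timeout { duration_ms } => write!(f, "timed out after {} ms", duration_ms),
        }
    }
}

/// Reduced status for this sketch; the real enum has more variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TaskStatus {
    Idle,
    Error(String),
}

#[derive(Debug)]
pub struct Health {
    pub status: TaskStatus,
    pub error_count: u64,
}

/// Fold one run's outcome into the health record.
pub fn record(health: &mut Health, outcome: Result<(), TaskError>) {
    match outcome {
        // Assumption: a successful run clears the error status.
        Ok(()) => health.status = TaskStatus::Idle,
        Err(err) => {
            health.error_count += 1;
            health.status = TaskStatus::Error(err.to_string());
        }
    }
}

/// The loop does not stop on failure: every outcome is recorded and the next
/// tick still runs. Returns how many ticks were processed.
pub fn run_ticks(health: &mut Health, outcomes: Vec<Result<(), TaskError>>) -> usize {
    let mut ticks = 0;
    for outcome in outcomes {
        record(health, outcome);
        ticks += 1;
    }
    ticks
}
```

Keeping `error_count` monotonic while the status reflects only the latest run lets a diagnostics view distinguish "failing right now" from "has failed at some point".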
Shutdown behavior
The `TaskRunner` uses a `CancellationToken` shared across all tasks. On shutdown:
- `cancel()` is called, signaling all `tokio::select!` loops to exit.
- The supervisor awaits all task join handles.
- A `oneshot` channel signals completion back to the caller via `shutdown_and_wait()`.
The Tauri integration wraps this in a 5-second timeout. If the timeout elapses, the runner is dropped (which calls cancel() again via Drop), and in-flight work is abandoned.
A WorldAgentScheduledTask that is cancelled mid-trigger does not terminate the sidecar’s in-flight run. The sidecar’s LangGraph checkpointer records progress durably; when the next trigger arrives (on workspace reopen or scheduled re-entry), resumption picks up from the last checkpoint. Cancellation here means “stop waiting” on the Rust side, not “kill the agent run.”
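The cancel, join, and signal sequence with a bounded wait can be approximated with the standard library alone. This is an analogy rather than the runner's code: OS threads stand in for tokio tasks, an `AtomicBool` for the `CancellationToken`, and `recv_timeout` on an `mpsc` channel for the `oneshot` plus timeout. `MiniRunner` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread::{self, JoinHandle};
use std::time::Duration;

pub struct MiniRunner {
    cancel: Arc<AtomicBool>,
    workers: Vec<JoinHandle<()>>,
}

impl MiniRunner {
    /// Spawn `task_count` workers that poll the shared cancel flag every `tick`.
    pub fn start(task_count: usize, tick: Duration) -> Self {
        let cancel = Arc::new(AtomicBool::new(false));
        let workers = (0..task_count)
            .map(|_| {
                let cancel = Arc::clone(&cancel);
                thread::spawn(move || {
                    while !cancel.load(Ordering::Acquire) {
                        thread::sleep(tick); // stand-in for one unit of work
                    }
                })
            })
            .collect();
        Self { cancel, workers }
    }

    /// Returns `true` if every worker stopped within `timeout`. On `false`, the
    /// caller stops waiting; workers may still be finishing in the background.
    pub fn shutdown_and_wait(self, timeout: Duration) -> bool {
        // Step 1: signal every worker loop to exit.
        self.cancel.store(true, Ordering::Release);

        let (done_tx, done_rx) = mpsc::channel();
        let workers = self.workers;
        thread::spawn(move || {
            // Step 2: the supervisor joins all worker handles.
            for worker in workers {
                let _ = worker.join();
            }
            // Step 3: report completion; the receiver may already have given up.
            let _ = done_tx.send(());
        });

        // Bounded wait, analogous to the 5-second timeout in the Tauri integration.
        done_rx.recv_timeout(timeout).is_ok()
    }
}
```

As in the text, a timeout here means the caller stops waiting; it does not forcibly terminate work that is still in flight.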
Related
- Scheduling system: category taxonomy, scope grants, author-facing controls
- World Agent: the agent as participant
- Agent core system: checkpointing, threads, resumption semantics
- Process model: Rust↔sidecar IPC boundary
- Event log system: consumer of `event_source = 'autonomous_task'` events produced by scheduled agent runs
- Permission system: how each scheduled agent invocation resolves its capability set
- Embedding system: primary event-driven consumer
- ADR-016: World Runtime on LangGraph