LLM System
Status: Design landing
Reference epics: INK-825, INK-849
ADRs: ADR-016, ADR-022
The LLM system is how the agent talks to language models. The organizing pattern is “LangGraph orchestrates; DSPy executes”: LangGraph owns the graph, checkpointing, interrupts, and subgraph dispatch; DSPy owns the LLM call inside a node. From LangGraph’s side a node is typed-in, typed-out — DSPy is an implementation detail within the node boundary.
DSPy as the LLM programming layer
DSPy provides three constructs that every LLM-calling node in the sidecar uses:
- Signatures declare each LLM task’s I/O contract: input fields, output fields, and a task description. They replace hand-crafted prompt strings. A Signature describes what a node does; DSPy renders the prompt from it.
- Modules compose Signatures into multi-step LLM programs wired inside a LangGraph node. A Module is the node’s LLM logic; it may chain calls, apply chain-of-thought, or branch on intermediate outputs.
- Adapters render structured outputs against the declared output schema. Structured internal nodes use Adapters to parse model responses cleanly; free-text output fields do not require one.
DSPy is the sanctioned LLM SDK in the sidecar. Sidecar code does not import provider SDKs (anthropic, openai, google-genai, or their equivalents) or LiteLLM directly. LLM access flows through DSPy. LiteLLM enters the dependency graph only as DSPy’s transitive backend, version-pinned with a cooling period; direct imports of LiteLLM from sidecar code are disallowed.
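The page does not say how the import rule is enforced. One possible approach, sketched here purely as an assumption, is a small static check over sidecar source files. The forbidden_imports helper and the FORBIDDEN_PREFIXES list are hypothetical names; the prefixes mirror the packages named above.

```python
import ast

# Assumed module prefixes for the packages the rule names.
FORBIDDEN_PREFIXES = ("anthropic", "openai", "google.genai", "litellm")


def _is_forbidden(dotted: str) -> bool:
    """True if the dotted module name is a forbidden package or inside one."""
    return any(dotted == p or dotted.startswith(p + ".") for p in FORBIDDEN_PREFIXES)


def forbidden_imports(source: str) -> list[str]:
    """Return the disallowed module names imported by the given source text."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits.extend(a.name for a in node.names if _is_forbidden(a.name))
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if _is_forbidden(node.module):
                hits.append(node.module)
            else:
                # Catches forms such as "from google import genai".
                qualified = [f"{node.module}.{a.name}" for a in node.names]
                hits.extend(q for q in qualified if _is_forbidden(q))
    return hits
```

A check like this could run in CI over the sidecar package; importing dspy itself passes, while a direct import of LiteLLM or a provider SDK is reported.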
The node boundary
DSPy lives strictly inside the node boundary. LangGraph never references DSPy. The node’s contract with LangGraph is typed state in, typed state out; that contract is all LangGraph sees.
Replacing DSPy in a node is a per-node change. Orchestration, node contracts, and IPC are all DSPy-agnostic. The blast radius of a DSPy change is one node.
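The boundary can be pictured with a small sketch in plain Python. The state types, make_planner_node, and stub_executor are hypothetical; in the sidecar the executor would presumably wrap a DSPy Module, but nothing outside the factory needs to know that.

```python
from typing import Callable, TypedDict


class PlannerInput(TypedDict):
    request: str


class PlannerOutput(TypedDict):
    plan: list[str]


# The executor is the only seam where an LLM layer would appear.
PlanExecutor = Callable[[str], list[str]]


def make_planner_node(execute: PlanExecutor) -> Callable[[PlannerInput], PlannerOutput]:
    """Build a node whose public contract is typed state in, typed state out."""

    def planner_node(state: PlannerInput) -> PlannerOutput:
        return {"plan": execute(state["request"])}

    return planner_node


def stub_executor(request: str) -> list[str]:
    """Stand-in for the node's LLM logic; swapping it changes only this node."""
    return [f"handle: {request}"]
```

Replacing stub_executor with a different implementation leaves planner_node's signature, and therefore everything the orchestrator sees, unchanged.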
Client management and credentials
dspy.LM wraps LiteLLM and provides the configured language-model client. The sidecar maintains one dspy.LM instance per provider, shared across threads. Provider credentials come from the Tauri host: the host stores keys in the OS keychain, passes them to the sidecar at startup through the IPC surface, and the sidecar configures dspy.LM with them. There is no key material on disk inside the sidecar.
A local endpoint (Ollama, LM Studio, or any openai-compatible local server) is configured the same way: host-side settings describe the endpoint; the sidecar constructs a dspy.LM pointed at it through LiteLLM’s local-provider support.
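A sketch of the one-client-per-provider arrangement, under stated assumptions: ProviderConfig, ClientRegistry, and the factory parameter are hypothetical names, and the factory stands in for whatever constructs the real client (in the sidecar, presumably a call along the lines of dspy.LM(config.model, api_key=..., api_base=...)).

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ProviderConfig:
    model: str
    api_key: Optional[str] = None   # received over IPC, held in memory only
    api_base: Optional[str] = None  # set for local endpoints


class ClientRegistry:
    """Lazily builds one client per provider and shares it across threads."""

    def __init__(self, factory: Callable[[ProviderConfig], Any]) -> None:
        self._factory = factory
        self._configs: dict[str, ProviderConfig] = {}
        self._clients: dict[str, Any] = {}
        self._lock = threading.Lock()

    def configure(self, provider: str, config: ProviderConfig) -> None:
        with self._lock:
            self._configs[provider] = config
            # Drop any client built from an older configuration.
            self._clients.pop(provider, None)

    def get(self, provider: str) -> Any:
        with self._lock:
            if provider not in self._clients:
                self._clients[provider] = self._factory(self._configs[provider])
            return self._clients[provider]
```

A hosted provider and a local endpoint differ only in the config passed in: the former carries an api_key, the latter an api_base such as a local server URL.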
Tool calls
LLM tool use and MCP tool use are not the same thing, and this system does not conflate them.
- LLM tool use is a feature of provider APIs: the model produces a structured request to call a tool and consumes a structured response.
- MCP tool use is the agent’s actual tool-calling surface (see MCP System).
The bridge lives inside the planner node. When the node wants to offer the model a tool, it:
- Reads the tool’s schema from the MCP registry.
- Passes it to DSPy as part of the Module’s configured tools.
- Reads the model’s tool-call response through DSPy’s output fields.
- Dispatches the actual call through the MCP client.
- Threads the tool result back into the next DSPy call.
Provider-specific tool-call shapes are handled by DSPy via LiteLLM’s provider normalization, rather than by hand in each node.
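The bridge steps above can be sketched as a loop in plain Python. Everything here is a stand-in: ToolCall, ModelTurn, run_planner, and the model_step and mcp_call callables are hypothetical, and the max_steps limit and unknown-tool handling are assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any]


@dataclass
class ModelTurn:
    answer: Optional[str] = None
    tool_call: Optional[ToolCall] = None


# (request, tool schemas, tool results so far) -> the model's next turn
ModelStep = Callable[[str, list[dict[str, Any]], list[str]], ModelTurn]
McpCall = Callable[[str, dict[str, Any]], str]


def run_planner(
    request: str,
    registry: dict[str, dict[str, Any]],
    model_step: ModelStep,
    mcp_call: McpCall,
    max_steps: int = 5,
) -> str:
    schemas = list(registry.values())  # read tool schemas from the registry
    results: list[str] = []
    for _ in range(max_steps):
        # Offer the schemas to the model and read its structured response.
        turn = model_step(request, schemas, results)
        if turn.tool_call is None:
            return turn.answer or ""
        call = turn.tool_call
        if call.name not in registry:
            results.append(f"error: unknown tool {call.name}")
            continue
        # Dispatch through the MCP client, then thread the result back in.
        results.append(mcp_call(call.name, call.arguments))
    raise RuntimeError("planner exceeded max_steps without a final answer")
```

The model never executes anything itself: it only requests a call, and the node decides whether and how to dispatch it.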
Streaming
Run and agent streaming (node outputs, lifecycle events, interrupts) remains LangGraph’s responsibility over the ADR-016 agent-event IPC. DSPy does not touch that surface.
Token-level LLM streaming passes through DSPy via dspy.streamify and StreamListener, which surface token deltas from named output fields. The user-facing conversational-response Signature is therefore shaped with a free-text answer field, which streams cleanly. Structured internal nodes finalize at parse time and do not require token streaming.
Multimodal and attachments
DSPy’s typed multimodal input fields (dspy.Image, audio where supported), backed by LiteLLM, carry image and attachment input across providers. Attachments are modeled explicitly into Signatures as typed input fields, not passed ambiently.
A verification spike — one streaming free-text Signature and one image-input Signature — precedes the broad rollout, to confirm both surfaces work correctly across the providers Inklings supports.
Model selection
A turn does not commit to a single model. Different nodes use different dspy.LM instances, and the same node can select a different model on different runs. Selection is made at module configuration time, based on:
- The workspace’s configured default for the node’s role (planner, composer, classifier, summarizer).
- Author overrides (“use Claude Opus for this planning turn”).
- The node’s own rules (e.g., the prompt-injection classifier uses a small, fast model).
There is no global “current model” state on the agent. Two concurrent threads can be running on different models.
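A small sketch of selection as a pure function. The page lists the three inputs but not their precedence, so the order used here (node rule, then author override, then workspace default) is an assumption, as are the select_model name and the placeholder model identifiers.

```python
from typing import Optional


def select_model(
    role: str,
    workspace_defaults: dict[str, str],
    author_override: Optional[str] = None,
    node_rule: Optional[str] = None,
) -> str:
    """Resolve the model for one node run without any global state."""
    # Assumed precedence: a node's own rule wins, then the author's
    # override for this turn, then the workspace default for the role.
    if node_rule is not None:
        return node_rule
    if author_override is not None:
        return author_override
    return workspace_defaults[role]


# Placeholder identifiers, not real model names.
DEFAULTS = {"planner": "example/large", "classifier": "example/small"}
```

Since the function takes everything it needs as arguments, two concurrent threads can resolve different models without interfering with each other.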
Prompt caching and context management
DSPy’s LiteLLM backend surfaces cache-control hints for providers that expose them explicitly (Anthropic). Providers that manage caching implicitly benefit from it automatically through LiteLLM. The context node (see Agent Core System) assembles a memory slice shaped to fit the selected model’s context window; nothing in the LLM system elides or summarizes on the fly.
Reasoning and extended thinking
Provider-specific reasoning features (Anthropic extended thinking, OpenAI reasoning effort, Google thinking budgets) are accessible through LiteLLM’s provider-parameter pass-through from within DSPy Modules. The planner node is the typical consumer. Reasoning traces are not stored as conversation content; they are consumed inside the node and discarded, except for any summary the node chooses to record.
Optimizers
DSPy’s Optimizer layer (BootstrapFewShot, MIPRO, and their peers) is deferred. The v0.4.0 agent ships with hand-authored Signatures and Modules. Optimizers are DSPy’s fastest-moving surface and are not on the critical path. When they are needed, Signatures provide the hook: Optimizers tune Signatures in place without re-architecting anything.
What the LLM system is not
- Not a provider abstraction of Inklings’ own making. DSPy and LiteLLM handle provider normalization; sidecar code does not replicate it.
- Not a tool executor. Tools are executed through the MCP System. The LLM system mediates the model’s request for a tool and the model’s consumption of the result.
- Not a memory layer. Memory belongs to the Agent Memory System. The LLM system receives a context slice; it does not compose one.
- Not the DSPy skill-compilation pipeline. The dspy_module skill artifact, state_blob, optimize/execute/manifest sidecar daemon, and untrusted-module_code execution were removed (INK-835) and are not revived. DSPy here is first-party sidecar code using DSPy Signatures and Modules, version-controlled like any other sidecar code.
Why this shape
An earlier design called for calling provider SDKs directly inside graph nodes with no LLM programming layer. That approach kept provider features unobscured but produced hand-crafted prompt strings, ad-hoc structured-output parsing, and per-node prompt drift as the agent’s behavior grew.
DSPy addresses those problems declaratively: Signatures replace prompt strings; Adapters handle structured-output parsing; the provider boundary is LiteLLM’s concern rather than each node’s. The containment at the node seam (LangGraph never references DSPy) keeps the dependency risk bounded: DSPy is fast-moving and research-rooted, and isolating it to the node-execution seam means orchestration and IPC are unaffected by its evolution.
What this page does not do
- It does not describe the agent graph, node composition, or subagent dispatch. See Agent Core System.
- It does not describe tool execution. See MCP System.
- It does not describe memory or context assembly. See Agent Memory System.
- It does not describe prompt-injection classification between tool output and planner input. See Prompt-Injection Boundary.
- It does not describe the sidecar process or IPC to the host. See Process Model.