LLM System

Overview

The LLM system provides a multi-provider abstraction for interacting with large language models. It handles API key management, provider registration, request construction, streaming responses, and rate limit tracking. It is built on the rig-core crate, which supplies the underlying provider implementations.

The system supports three kinds of provider:

Cloud providers via BYOK (Bring Your Own Key): users configure their own API keys, which are stored securely in the OS keychain.
Ollama for local model inference, with no API key required.
OpenRouter as an aggregator gateway, giving access to 100+ models from multiple providers via a single API key or an OAuth PKCE flow.

Key Design Decisions

1. Multi-Provider from Day One

Five providers are supported: Anthropic (Claude), OpenAI (GPT), xAI (Grok), Ollama (local models), and OpenRouter (aggregator). Cloud providers (Anthropic, OpenAI, xAI, OpenRouter) use ProviderRegistry, built from OS keychain keys. Ollama uses OllamaProvider directly without keychain involvement.

OpenRouter provides access to 100+ models from multiple underlying providers via a single API key. It uses Rig’s native rig::providers::openrouter module — not the OpenAI-compatible wrapper.

2. Keychain-Based Key Storage

API keys are stored in the OS keychain (macOS: Security.framework), never in the settings JSON file. The KeyStore trait abstracts key retrieval:

#[async_trait]
pub trait KeyStore: Send + Sync {
    async fn get_key(&self, provider: ProviderKind) -> Option<String>;
}

KeychainKeyStore in the Tauri framework layer implements both:

application::settings::KeychainStore (sync, for Tauri command operations)
infrastructure_llm::KeyStore (async, for provider registry construction)

This dual-trait pattern lets a single concrete adapter serve two distinct layer boundaries.
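
The dual-trait pattern can be illustrated with a minimal sketch. Several things here are assumptions made for the example: both traits are synchronous (the real `KeyStore` is async), the `KeychainStore` method names `set_key` and `delete_key` are hypothetical, and a `Mutex<HashMap>` stands in for the OS keychain.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// The five providers named in this document.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ProviderKind {
    Anthropic,
    OpenAi,
    Xai,
    Ollama,
    OpenRouter,
}

/// Stand-in for `application::settings::KeychainStore`.
/// The method names are assumptions, not the real trait surface.
pub trait KeychainStore {
    fn set_key(&self, provider: ProviderKind, key: &str);
    fn delete_key(&self, provider: ProviderKind);
}

/// Stand-in for `infrastructure_llm::KeyStore`, made synchronous for brevity.
pub trait KeyStore: Send + Sync {
    fn get_key(&self, provider: ProviderKind) -> Option<String>;
}

/// One concrete adapter. A `Mutex<HashMap>` plays the role of the OS keychain.
#[derive(Default)]
pub struct KeychainKeyStore {
    entries: Mutex<HashMap<ProviderKind, String>>,
}

// Write side, used by the settings layer.
impl KeychainStore for KeychainKeyStore {
    fn set_key(&self, provider: ProviderKind, key: &str) {
        self.entries.lock().unwrap().insert(provider, key.to_string());
    }

    fn delete_key(&self, provider: ProviderKind) {
        self.entries.lock().unwrap().remove(&provider);
    }
}

// Read side, used when the provider registry is built.
impl KeyStore for KeychainKeyStore {
    fn get_key(&self, provider: ProviderKind) -> Option<String> {
        self.entries.lock().unwrap().get(&provider).cloned()
    }
}
```

Each layer depends only on the trait it needs, so the settings commands never see `KeyStore` and the registry never sees `KeychainStore`, even though both are backed by the same value.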

3. Rig as Thin Abstraction

The rig-core crate provides provider-specific API adapters. We use it strictly as a thin LLM abstraction (CompletionModel + provider routing) — the agent loop, tool system, and session management are all custom.

4. Stub Fallback

When no API key is configured, build_llm_provider() returns a StubLlmProvider, which fails every call with LlmError::Provider. This avoids threading Option<Arc<dyn LlmProvider>> through the codebase: the harness always holds a provider, though it may be non-functional.

The StubLlmProvider is feature-gated behind test-support to keep it out of the library’s default surface.
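
A minimal sketch of the stub fallback follows. It is simplified under stated assumptions: the `LlmProvider` trait is reduced to a single synchronous `complete` method, `LlmError::Provider` is given a `String` payload, and `EchoProvider` is a hypothetical placeholder for the registry-backed provider.

```rust
use std::sync::Arc;

/// Reduced error type; the `String` payload is an assumption.
#[derive(Debug, PartialEq)]
pub enum LlmError {
    Provider(String),
}

/// Reduced, synchronous stand-in for the real `LlmProvider` trait.
pub trait LlmProvider: Send + Sync {
    fn complete(&self, prompt: &str) -> Result<String, LlmError>;
}

/// Always fails, so callers never need to handle a missing provider.
pub struct StubLlmProvider;

impl LlmProvider for StubLlmProvider {
    fn complete(&self, _prompt: &str) -> Result<String, LlmError> {
        Err(LlmError::Provider("no API key configured".to_string()))
    }
}

/// Hypothetical placeholder for the registry-backed provider.
pub struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn complete(&self, prompt: &str) -> Result<String, LlmError> {
        Ok(format!("echo: {prompt}"))
    }
}

/// Both branches return the same trait object type, so no `Option` is needed.
pub fn build_llm_provider(api_key_configured: bool) -> Arc<dyn LlmProvider> {
    if api_key_configured {
        Arc::new(EchoProvider)
    } else {
        Arc::new(StubLlmProvider)
    }
}
```

The cost of this design is that a missing key surfaces as a runtime error on the first call rather than as a type-level absence, which is why the error message should make the cause obvious.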

5. Ollama — Keyless Local Provider

Ollama runs locally and requires no API key. The OllamaProvider wraps rig-core’s Ollama adapter and is registered directly without consulting the KeyStore. Key design elements:

No keychain involvement: OllamaHttpClient talks directly to the Ollama HTTP API (http://localhost:11434 by default, overridable via AgentSettings.ollama_url)
Hardware-aware recommendations: SystemCapabilities::detect() reads system RAM via the sysinfo crate and classifies the machine into HardwareTier (Low/Medium/High). The static RecommendedModel catalog (6 models) maps tiers to appropriate model sizes
Static curated catalog: Model recommendations are compiled into the binary — no network call required to display suggestions
Streaming model pulls: Model download uses the Ollama /api/pull endpoint with NDJSON streaming, surfaced through pull_ollama_model Tauri command with progress events
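
The hardware-aware recommendation step could look roughly like the sketch below. The RAM thresholds (16 GiB and 32 GiB), the three-entry catalog, the `min_tier` field, and the placeholder model names are all assumptions for illustration; the real catalog has six entries, and real detection reads RAM through the sysinfo crate rather than taking it as a parameter.

```rust
/// Tiers are ordered so that `Low < Medium < High`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum HardwareTier {
    Low,
    Medium,
    High,
}

/// Simplified catalog entry; the real `RecommendedModel` may carry more fields.
pub struct RecommendedModel {
    pub name: &'static str,
    pub min_tier: HardwareTier,
}

pub const GIB: u64 = 1024 * 1024 * 1024;

/// Classifies a machine by total RAM. The thresholds are placeholders.
pub fn classify(total_ram_bytes: u64) -> HardwareTier {
    let gib = total_ram_bytes / GIB;
    if gib < 16 {
        HardwareTier::Low
    } else if gib < 32 {
        HardwareTier::Medium
    } else {
        HardwareTier::High
    }
}

/// Compiled into the binary, so no network call is needed to list suggestions.
/// The names are placeholders, not real model identifiers.
pub static CATALOG: [RecommendedModel; 3] = [
    RecommendedModel { name: "small-placeholder", min_tier: HardwareTier::Low },
    RecommendedModel { name: "medium-placeholder", min_tier: HardwareTier::Medium },
    RecommendedModel { name: "large-placeholder", min_tier: HardwareTier::High },
];

/// Returns every catalog entry the given tier can run.
pub fn recommend(tier: HardwareTier) -> Vec<&'static str> {
    CATALOG
        .iter()
        .filter(|model| model.min_tier <= tier)
        .map(|model| model.name)
        .collect()
}
```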

6. OpenRouter — OAuth PKCE or Manual Key

OpenRouter is a model aggregator supporting 100+ models from providers including Anthropic, OpenAI, Google, Meta, and others — all accessible via a single API key. Two authentication paths are supported:

OAuth PKCE flow (recommended): The user clicks “Connect with OpenRouter” in the agent settings UI. The app generates a PKCE code verifier/challenge, opens the OpenRouter authorization URL in the default browser, receives the callback via the inklings:// deep link scheme, and completes the token exchange. The resulting access token is stored in the OS keychain.
Manual key entry: Standard set_api_key flow used by all other cloud providers.

OpenRouter defaults to "openai/gpt-4o" as the initial model. Users can change the model string to any OpenRouter-supported model identifier.

The OAuth PKCE flow is implemented in apps/desktop/src-tauri/src/openrouter_auth.rs (framework layer), not in the infrastructure-llm crate.
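
One step of the PKCE flow, reading the authorization code from the deep-link callback, can be sketched with the standard library alone. This assumes the code arrives as a `code` query parameter; the callback path `auth/callback` used in the usage note and the function name `extract_auth_code` are hypothetical, and the sketch does no percent-decoding.

```rust
/// Extracts the `code` query parameter from an `inklings://` callback URL.
/// Returns `None` for other schemes, a missing query, or an empty code.
pub fn extract_auth_code(callback_url: &str) -> Option<String> {
    // Only accept URLs on the app's own deep-link scheme.
    let rest = callback_url.strip_prefix("inklings://")?;

    // Drop any fragment, then keep what follows the first '?'.
    let without_fragment = rest.split('#').next()?;
    let (_, query) = without_fragment.split_once('?')?;

    // Scan `key=value` pairs for a non-empty `code`.
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, value)| *key == "code" && !value.is_empty())
        .map(|(_, value)| value.to_string())
}
```

For example, `extract_auth_code("inklings://auth/callback?code=abc123")` yields `Some("abc123")`, while a URL on any other scheme yields `None`. The extracted code would then be sent, together with the original code verifier, in the token exchange.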

Architecture

Provider Construction Flow

1. User saves API key via UI
   → validate_api_key (async, hits provider API)
   → set_api_key (stores in keychain, sets api_key_configured=true)

   OR (OpenRouter only):
   → start_openrouter_auth (PKCE initiation, opens browser)
   → complete_openrouter_auth (token exchange, stores in keychain, sets api_key_configured=true)

2. Agent starts (start_agent command)
   → build_llm_provider() checks api_key_configured flag
   → If true: ProviderRegistry::from_keys(&key_store)
     → Iterates Anthropic/OpenAI/xAI/OpenRouter, queries keychain for each
     → Builds LlmModel per found key
     → Wraps in RegistryProvider adapter → Arc<dyn LlmProvider>
   → If false: StubLlmProvider → Arc<dyn LlmProvider>

3. Agent loop calls provider.complete() / provider.stream()
   → RegistryProvider delegates to ProviderRegistry
   → Registry selects model by provider kind
   → Model calls rig-core provider API
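
Step 2 of the flow above, building the registry from whichever keys exist, can be sketched as follows. This is a simplified model under stated assumptions: `KeyStore` is synchronous here, `LlmModel` is a placeholder that records only its provider (the real type wraps a rig-core completion model), and the `model` accessor name is hypothetical.

```rust
use std::collections::HashMap;

/// Cloud providers only; Ollama bypasses the registry.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ProviderKind {
    Anthropic,
    OpenAi,
    Xai,
    OpenRouter,
}

/// Synchronous stand-in for the async `KeyStore` trait.
pub trait KeyStore {
    fn get_key(&self, provider: ProviderKind) -> Option<String>;
}

/// Placeholder for `LlmModel`; a real one would hold a client built from the key.
#[derive(Debug)]
pub struct LlmModel {
    pub provider: ProviderKind,
}

pub struct ProviderRegistry {
    models: HashMap<ProviderKind, LlmModel>,
}

impl ProviderRegistry {
    const CLOUD: [ProviderKind; 4] = [
        ProviderKind::Anthropic,
        ProviderKind::OpenAi,
        ProviderKind::Xai,
        ProviderKind::OpenRouter,
    ];

    /// Queries the store once per cloud provider and keeps those with a key.
    pub fn from_keys(store: &dyn KeyStore) -> Self {
        let models = Self::CLOUD
            .iter()
            .filter_map(|&kind| {
                store
                    .get_key(kind)
                    .map(|_api_key| (kind, LlmModel { provider: kind }))
            })
            .collect();
        Self { models }
    }

    /// Selects the configured model for a provider kind, if any.
    pub fn model(&self, kind: ProviderKind) -> Option<&LlmModel> {
        self.models.get(&kind)
    }
}

/// Test double in the spirit of `InMemoryKeyStore`.
pub struct InMemoryKeyStore(pub HashMap<ProviderKind, String>);

impl KeyStore for InMemoryKeyStore {
    fn get_key(&self, provider: ProviderKind) -> Option<String> {
        self.0.get(&provider).cloned()
    }
}
```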

Module Map

Module	Responsibility
`provider.rs`	`ProviderKind` — type alias for the `domain::AgentProvider` enum
`registry.rs`	`ProviderRegistry` — maps provider kinds to configured `LlmModel` instances
`model.rs`	`LlmModel` — wraps rig-core completion model with provider metadata
`request.rs`	`LlmRequest`, `LlmMessage`, `ToolDefinition`, `CacheHint`
`response.rs`	`LlmResponse`, `ContentBlock`, `StopReason`, `TokenUsage`
`streaming.rs`	`LlmStream` and `StreamChunk` for streaming completions
`key_store.rs`	`KeyStore` trait + `InMemoryKeyStore` for testing
`rate_limit.rs`	`RateLimitTracker` — per-provider rate limit state
`validation.rs`	`validate_api_key()` — lightweight API call to verify key validity
`providers/`	Provider-specific adapters (anthropic, openai, xai) + shared `common.rs`
`providers/ollama.rs`	`OllamaProvider` — rig-core Ollama adapter for completions
`providers/openrouter.rs`	`OpenRouterProvider` — wraps `rig::providers::openrouter::Client` (native Rig module, not OpenAI-compat wrapper)
`ollama_client.rs`	`OllamaHttpClient` — health checks, model listing, streaming model pulls
`hardware.rs`	`SystemCapabilities` detection and `HardwareTier` classification
`stub.rs`	`StubLlmProvider` — feature-gated provider that fails every call with `LlmError::Provider`

Prompt Caching

The request model supports CacheHint annotations on messages for providers that support prompt caching (Anthropic). System prompts and tool definitions can be marked as cacheable, reducing token costs for repeated interactions.

Rate Limiting

RateLimitTracker maintains per-provider rate limit state extracted from API response headers. When a provider returns rate limit headers, the tracker records:

Remaining requests/tokens
Reset timestamps

The agent loop can query the tracker before sending requests to avoid hitting limits.
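
A tracker along these lines could be sketched as below. The field names, the `record` and `wait_before_send` methods, and the use of `Instant` for reset times are assumptions; header parsing is omitted because header names differ between providers.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ProviderKind {
    Anthropic,
    OpenAi,
}

/// Values a caller has already parsed from rate limit response headers.
#[derive(Clone, Copy, Debug)]
pub struct RateLimitState {
    pub remaining_requests: u32,
    pub remaining_tokens: u32,
    pub resets_at: Instant,
}

#[derive(Default)]
pub struct RateLimitTracker {
    state: HashMap<ProviderKind, RateLimitState>,
}

impl RateLimitTracker {
    /// Overwrites the stored state with the latest response's values.
    pub fn record(&mut self, provider: ProviderKind, state: RateLimitState) {
        self.state.insert(provider, state);
    }

    /// Returns how long to wait before sending, or `None` if a request may go now.
    /// Providers with no recorded state are never throttled.
    pub fn wait_before_send(&self, provider: ProviderKind, now: Instant) -> Option<Duration> {
        let state = self.state.get(&provider)?;
        let exhausted = state.remaining_requests == 0 || state.remaining_tokens == 0;
        if exhausted && state.resets_at > now {
            Some(state.resets_at - now)
        } else {
            None
        }
    }
}
```

Passing `now` in as a parameter, rather than calling `Instant::now()` inside the method, keeps the check deterministic under test.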

Error Handling

Error	Variant	Handling
Invalid API key	`LlmError::AuthFailure`	Returned during validation; UI shows error
Rate limited	`LlmError::RateLimited`	Returned during validation or at runtime
Provider error	`LlmError::Provider`	Generic provider-specific error
Network error	`LlmError::Network`	Connection failure
Unsupported provider	`LlmError::UnsupportedProvider`	No keys found for any provider

Connection Points

System	Relationship
User Settings	`KeychainStore` trait for key CRUD; `api_key_configured` flag; `ollama_url` for custom endpoint
Agent Core	`RegistryProvider` adapter bridges `ProviderRegistry` → `LlmProvider` trait
Agent Harness	`build_llm_provider()` constructs the provider at harness startup
Domain	`AgentProvider` enum (Anthropic, OpenAi, Xai, Ollama, OpenRouter); `OllamaStatus`, `OllamaModelInfo`, `PullProgress`, `RecommendedModel`, `HardwareTier`, `ModelCategory`, `SystemCapabilities` in `crates/domain/src/ollama.rs`
OpenRouter OAuth	`openrouter_auth.rs` in `src-tauri` handles PKCE flow; `inklings://` deep link scheme receives OAuth callback
Ollama	`OllamaHttpClient` for health (`/api/tags`), model listing, and streaming pulls (`/api/pull`)

Testing Strategy

Unit tests: InMemoryKeyStore for testing key retrieval without OS keychain
Integration tests: Provider construction with mock key stores
No live API tests: Validation and provider calls are not tested against real endpoints in CI

cargo test -p infrastructure-llm

The LLM system is required by Agent Core and Agent Harness.
