# LLM System

## Overview
The LLM system provides a multi-provider abstraction for interacting with large language models. It handles API key management, provider registration, request construction, streaming responses, and rate limit tracking. It is built on the rig-core crate for the underlying provider implementations.

The system supports three access paths: cloud providers via BYOK (Bring Your Own Key), where users configure their own API keys stored securely in the OS keychain; Ollama, for local model inference with no API key required; and OpenRouter, an aggregator gateway providing access to 100+ models from multiple providers via a single API key or an OAuth PKCE flow.
## Key Design Decisions

### 1. Multi-Provider from Day One
Five providers are supported: Anthropic (Claude), OpenAI (GPT), xAI (Grok), Ollama (local models), and OpenRouter (aggregator). The cloud providers (Anthropic, OpenAI, xAI, OpenRouter) go through `ProviderRegistry`, which is built from OS keychain keys. Ollama uses `OllamaProvider` directly, with no keychain involvement.

OpenRouter provides access to 100+ models from multiple underlying providers via a single API key. It uses Rig's native `rig::providers::openrouter` module, not the OpenAI-compatible wrapper.
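OpenRouter model construction presumably follows Rig's standard provider pattern. A minimal sketch, assuming the usual `Client::new` / `completion_model` surface (exact signatures vary across rig-core versions):

```rust
use rig::providers::openrouter;

// Build a completion model through the native OpenRouter client; any
// upstream model is addressed by its OpenRouter identifier.
fn openrouter_model(api_key: &str) -> openrouter::CompletionModel {
    let client = openrouter::Client::new(api_key);
    client.completion_model("openai/gpt-4o")
}
```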
### 2. Keychain-Based Key Storage

API keys are stored in the OS keychain (macOS: Security.framework), never in the settings JSON file. The `KeyStore` trait abstracts key retrieval:
```rust
#[async_trait]
pub trait KeyStore: Send + Sync {
    async fn get_key(&self, provider: ProviderKind) -> Option<String>;
}
```

`KeychainKeyStore` in the Tauri framework layer implements both:

- `application::settings::KeychainStore` (sync, for Tauri command operations)
- `infrastructure_llm::KeyStore` (async, for provider registry construction)
This dual-trait pattern lets a single concrete adapter serve two distinct layer boundaries.
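A self-contained sketch of the pattern; the sync trait's method name, the enum, and the keychain read are stand-ins, with only `KeyStore::get_key` matching the trait above:

```rust
use async_trait::async_trait;

#[derive(Clone, Copy)]
pub enum ProviderKind { Anthropic, OpenAi, Xai, Ollama, OpenRouter }

// Shape of the sync boundary (application::settings::KeychainStore).
pub trait KeychainStore: Send + Sync {
    fn get(&self, provider: ProviderKind) -> Option<String>;
}

// The async boundary (infrastructure_llm::KeyStore), as defined above.
#[async_trait]
pub trait KeyStore: Send + Sync {
    async fn get_key(&self, provider: ProviderKind) -> Option<String>;
}

// One concrete adapter implements both, so each layer depends only on
// the boundary it owns.
pub struct KeychainKeyStore;

impl KeychainKeyStore {
    fn read(&self, _provider: ProviderKind) -> Option<String> {
        None // real implementation calls the OS keychain here
    }
}

impl KeychainStore for KeychainKeyStore {
    fn get(&self, provider: ProviderKind) -> Option<String> {
        self.read(provider)
    }
}

#[async_trait]
impl KeyStore for KeychainKeyStore {
    async fn get_key(&self, provider: ProviderKind) -> Option<String> {
        self.read(provider)
    }
}
```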
### 3. Rig as Thin Abstraction

The rig-core crate provides provider-specific API adapters. We use it strictly as a thin LLM abstraction (`CompletionModel` plus provider routing); the agent loop, tool system, and session management are all custom.
### 4. Stub Fallback

When no API key is configured, `build_llm_provider()` returns a `StubLlmProvider` that returns `LlmError::Provider` for every call. This avoids `Option<Arc<dyn LlmProvider>>` throughout the codebase: the harness always has a provider, but it may be non-functional.
The `StubLlmProvider` is feature-gated behind `test-support` to keep it out of the library's default surface.
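Shape-wise, the stub is roughly the following; the request, response, and error types here are stand-ins, and the real `LlmProvider` trait also exposes streaming:

```rust
use async_trait::async_trait;
use std::sync::Arc;

pub struct LlmRequest;
pub struct LlmResponse;
pub enum LlmError { Provider(String) }

#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn complete(&self, request: LlmRequest) -> Result<LlmResponse, LlmError>;
}

// Always-present fallback: every call fails with LlmError::Provider, so
// callers never need Option<Arc<dyn LlmProvider>>.
pub struct StubLlmProvider;

#[async_trait]
impl LlmProvider for StubLlmProvider {
    async fn complete(&self, _request: LlmRequest) -> Result<LlmResponse, LlmError> {
        Err(LlmError::Provider("no API key configured".into()))
    }
}

fn provider_for_harness() -> Arc<dyn LlmProvider> {
    Arc::new(StubLlmProvider) // the harness always gets *some* provider
}
```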
### 5. Ollama — Keyless Local Provider

Ollama runs locally and requires no API key. The `OllamaProvider` wraps rig-core's Ollama adapter and is registered directly without consulting the `KeyStore`. Key design elements (a tier-detection sketch follows this list):

- **No keychain involvement:** `OllamaHttpClient` talks directly to the Ollama HTTP API (`http://localhost:11434` by default, overridable via `AgentSettings.ollama_url`)
- **Hardware-aware recommendations:** `SystemCapabilities::detect()` reads system RAM via the `sysinfo` crate and classifies the machine into a `HardwareTier` (Low/Medium/High). The static `RecommendedModel` catalog (6 models) maps tiers to appropriate model sizes
- **Static curated catalog:** model recommendations are compiled into the binary, so no network call is required to display suggestions
- **Streaming model pulls:** model download uses the Ollama `/api/pull` endpoint with NDJSON streaming, surfaced through the `pull_ollama_model` Tauri command with progress events
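A minimal sketch of the tier detection using the `sysinfo` crate; the cutoffs and enum shape are illustrative assumptions, not the app's actual values:

```rust
use sysinfo::System;

#[derive(Debug, Clone, Copy)]
pub enum HardwareTier { Low, Medium, High }

// Rough shape of SystemCapabilities::detect(): read total RAM and bucket
// the machine into a tier the recommendation catalog can index by.
pub fn detect_tier() -> HardwareTier {
    let sys = System::new_all();
    // sysinfo 0.30+ reports total_memory() in bytes.
    let ram_gb = sys.total_memory() / (1024 * 1024 * 1024);
    match ram_gb {
        0..=8 => HardwareTier::Low,
        9..=16 => HardwareTier::Medium,
        _ => HardwareTier::High,
    }
}
```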
### 6. OpenRouter — OAuth PKCE or Manual Key

OpenRouter is a model aggregator supporting 100+ models from providers including Anthropic, OpenAI, Google, Meta, and others, all accessible via a single API key. Two authentication paths are supported:
- **OAuth PKCE flow (recommended):** the user clicks “Connect with OpenRouter” in the agent settings UI. The app generates a PKCE code verifier/challenge, opens the OpenRouter authorization URL in the default browser, receives the callback via the `inklings://` deep link scheme, and completes the token exchange. The resulting access token is stored in the OS keychain.
- **Manual key entry:** the standard `set_api_key` flow used by all other cloud providers.
OpenRouter defaults to `"openai/gpt-4o"` as the initial model. Users can change the model string to any OpenRouter-supported model identifier.
The OAuth PKCE flow is implemented in `apps/desktop/src-tauri/src/openrouter_auth.rs` (framework layer), not in the infrastructure-llm crate.
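For orientation, a condensed sketch of the PKCE setup. The crate choices (`rand`, `sha2`, `base64`) and the callback path are assumptions; the authorization URL parameters follow OpenRouter's documented PKCE flow:

```rust
use base64::{engine::general_purpose::URL_SAFE_NO_PAD, Engine};
use rand::RngCore;
use sha2::{Digest, Sha256};

// Generate a random PKCE code verifier and its S256 challenge.
fn pkce_pair() -> (String, String) {
    let mut bytes = [0u8; 32];
    rand::thread_rng().fill_bytes(&mut bytes);
    let verifier = URL_SAFE_NO_PAD.encode(bytes);
    let challenge = URL_SAFE_NO_PAD.encode(Sha256::digest(verifier.as_bytes()));
    (verifier, challenge)
}

// The challenge is embedded in the URL opened in the browser; the verifier
// is held back and sent during the token exchange. The deep-link callback
// path below is hypothetical.
fn authorize_url(challenge: &str) -> String {
    format!(
        "https://openrouter.ai/auth?callback_url=inklings://openrouter/callback\
         &code_challenge={challenge}&code_challenge_method=S256"
    )
}
```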
## Architecture

### Provider Construction Flow
Section titled “Provider Construction Flow”1. User saves API key via UI → validate_api_key (async, hits provider API) → set_api_key (stores in keychain, sets api_key_configured=true)
OR (OpenRouter only): → start_openrouter_auth (PKCE initiation, opens browser) → complete_openrouter_auth (token exchange, stores in keychain, sets api_key_configured=true)
2. Agent starts (start_agent command) → build_llm_provider() checks api_key_configured flag → If true: ProviderRegistry::from_keys(&key_store) → Iterates Anthropic/OpenAI/xAI/OpenRouter, queries keychain for each → Builds LlmModel per found key → Wraps in RegistryProvider adapter → Arc<dyn LlmProvider> → If false: StubLlmProvider → Arc<dyn LlmProvider>
3. **Agent loop calls** `provider.complete()` / `provider.stream()` → `RegistryProvider` delegates to `ProviderRegistry` → the registry selects a model by provider kind → the model calls the rig-core provider API.
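The step-2 selection reduces to a small branch. A sketch using the names above, assuming `RegistryProvider::new` and an async `from_keys`; not the literal implementation:

```rust
use std::sync::Arc;

async fn build_llm_provider(
    api_key_configured: bool,
    key_store: &dyn KeyStore,
) -> Arc<dyn LlmProvider> {
    if api_key_configured {
        // One LlmModel per provider that has a key in the keychain.
        let registry = ProviderRegistry::from_keys(key_store).await;
        Arc::new(RegistryProvider::new(registry))
    } else {
        // No keys: the always-failing stub keeps the return type non-optional.
        Arc::new(StubLlmProvider)
    }
}
```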
### Module Map

| Module | Responsibility |
|---|---|
| `provider.rs` | `ProviderKind` enum (type alias for `domain::AgentProvider`) |
| `registry.rs` | `ProviderRegistry` — maps provider kinds to configured `LlmModel` instances |
| `model.rs` | `LlmModel` — wraps a rig-core completion model with provider metadata |
| `request.rs` | `LlmRequest`, `LlmMessage`, `ToolDefinition`, `CacheHint` |
| `response.rs` | `LlmResponse`, `ContentBlock`, `StopReason`, `TokenUsage` |
| `streaming.rs` | `LlmStream` and `StreamChunk` for streaming completions |
| `key_store.rs` | `KeyStore` trait + `InMemoryKeyStore` for testing |
| `rate_limit.rs` | `RateLimitTracker` — per-provider rate limit state |
| `validation.rs` | `validate_api_key()` — lightweight API call to verify key validity |
| `providers/` | Provider-specific adapters (anthropic, openai, xai) + shared `common.rs` |
| `providers/ollama.rs` | `OllamaProvider` — rig-core Ollama adapter for completions |
| `providers/openrouter.rs` | `OpenRouterProvider` — wraps `rig::providers::openrouter::Client` (native Rig module, not the OpenAI-compat wrapper) |
| `ollama_client.rs` | `OllamaHttpClient` — health checks, model listing, streaming model pulls |
| `hardware.rs` | `SystemCapabilities` detection and `HardwareTier` classification |
| `stub.rs` | `StubLlmProvider` — feature-gated no-op provider |
## Prompt Caching

The request model supports `CacheHint` annotations on messages for providers that support prompt caching (currently Anthropic). System prompts and tool definitions can be marked as cacheable, reducing token costs for repeated interactions.
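As a hypothetical illustration of where hints attach (the builder methods and `CacheHint` variant are assumptions about the API's shape, not its actual surface):

```rust
// Mark the large, stable parts of the request as cacheable; Anthropic then
// bills cache reads at a reduced rate on subsequent turns.
let request = LlmRequest::builder()
    .message(LlmMessage::system(SYSTEM_PROMPT).cache_hint(CacheHint::Cacheable))
    .message(LlmMessage::user("Refactor this function."))
    .tools(tool_definitions) // tool definitions can carry a hint as well
    .build();
```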
## Rate Limiting

`RateLimitTracker` maintains per-provider rate limit state extracted from API response headers. When a provider returns rate limit headers, the tracker records:
- Remaining requests/tokens
- Reset timestamps
The agent loop can query the tracker before sending requests to avoid hitting limits.
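A self-contained sketch of that state and the pre-flight check; field and method names are illustrative:

```rust
use std::collections::HashMap;
use std::time::SystemTime;

#[derive(PartialEq, Eq, Hash)]
pub enum ProviderKind { Anthropic, OpenAi, Xai, Ollama, OpenRouter }

#[derive(Default)]
pub struct RateLimitState {
    pub remaining_requests: Option<u32>,
    pub remaining_tokens: Option<u32>,
    pub resets_at: Option<SystemTime>,
}

#[derive(Default)]
pub struct RateLimitTracker {
    state: HashMap<ProviderKind, RateLimitState>,
}

impl RateLimitTracker {
    // Record values parsed from a provider's rate-limit response headers.
    pub fn record(&mut self, provider: ProviderKind, state: RateLimitState) {
        self.state.insert(provider, state);
    }

    // Pre-flight check the agent loop can run before dispatching a request.
    pub fn should_throttle(&self, provider: &ProviderKind, now: SystemTime) -> bool {
        self.state.get(provider).is_some_and(|s| {
            s.remaining_requests == Some(0) && s.resets_at.map_or(false, |t| t > now)
        })
    }
}
```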
## Error Handling

| Error | Variant | Handling |
|---|---|---|
| Invalid API key | `LlmError::AuthFailure` | Returned during validation; UI shows the error |
| Rate limited | `LlmError::RateLimited` | Returned during validation or at runtime |
| Provider error | `LlmError::Provider` | Generic provider-specific error |
| Network error | `LlmError::Network` | Connection failure |
| Unsupported provider | `LlmError::UnsupportedProvider` | No keys found for any provider |
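Read together, the variants map onto an enum along these lines; the payload types are assumptions and the real definition may differ:

```rust
use thiserror::Error;

#[derive(Debug, Error)]
pub enum LlmError {
    #[error("authentication failed: {0}")]
    AuthFailure(String),
    #[error("rate limited (retry after {retry_after_secs:?} seconds)")]
    RateLimited { retry_after_secs: Option<u64> },
    #[error("provider error: {0}")]
    Provider(String),
    #[error("network error: {0}")]
    Network(String),
    #[error("no configured provider for {0}")]
    UnsupportedProvider(String),
}
```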
## Connection Points

| System | Relationship |
|---|---|
| User Settings | `KeychainStore` trait for key CRUD; `api_key_configured` flag; `ollama_url` for a custom endpoint |
| Agent Core | `RegistryProvider` adapter bridges `ProviderRegistry` → `LlmProvider` trait |
| Agent Harness | `build_llm_provider()` constructs the provider at harness startup |
| Domain | `AgentProvider` enum (Anthropic, OpenAi, Xai, Ollama, OpenRouter); `OllamaStatus`, `OllamaModelInfo`, `PullProgress`, `RecommendedModel`, `HardwareTier`, `ModelCategory`, `SystemCapabilities` in `crates/domain/src/ollama.rs` |
| OpenRouter OAuth | `openrouter_auth.rs` in src-tauri handles the PKCE flow; the `inklings://` deep link scheme receives the OAuth callback |
| Ollama | `OllamaHttpClient` for health (`/api/tags`), model listing, and streaming pulls (`/api/pull`) |
## Testing Strategy

- **Unit tests:** `InMemoryKeyStore` for key retrieval without the OS keychain (example test shape after this list)
- **Integration tests:** provider construction with mock key stores
- **No live API tests:** validation and provider calls are not tested against real endpoints in CI
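A representative unit-test shape. `set_key` and `model_for` are hypothetical method names; `InMemoryKeyStore` and `ProviderRegistry::from_keys` come from the module map:

```rust
#[tokio::test]
async fn registry_only_builds_models_for_configured_keys() {
    // Hypothetical setter; the real InMemoryKeyStore API may differ.
    let store = InMemoryKeyStore::new();
    store.set_key(ProviderKind::Anthropic, "sk-test".into()).await;

    let registry = ProviderRegistry::from_keys(&store).await;

    // Hypothetical accessor: only the keyed provider should get an LlmModel.
    assert!(registry.model_for(ProviderKind::Anthropic).is_some());
    assert!(registry.model_for(ProviderKind::OpenAi).is_none());
}
```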
Run the crate's tests with:

```sh
cargo test -p infrastructure-llm
```

The LLM system is required by Agent Core and Agent Harness.