Embedding System
Status: Implemented
Depends On: Page System
Overview
The Embedding System generates and stores dense vector embeddings for workspace pages, enabling semantic similarity
search alongside the existing FTS5 keyword search. When a user types a natural-language query (3+ words, no boolean
operators), the SearchRouter automatically embeds the query and merges FTS5 and semantic results via Reciprocal Rank
Fusion (RRF). Keyword queries (short, quoted, or containing FTS5 operators) bypass embedding entirely and go directly to
FTS5.
Embeddings are computed locally using the snowflake-arctic-embed-m-v2.0 sentence transformer model (fp16), run via
ONNX Runtime. No data leaves the device.
Architecture
```
Framework (Tauri)
└── EmbeddingManager            apps/desktop/src-tauri/src/embedding.rs
    Tokio background worker, bounded channel (256), debounce (2s)

Application
├── EmbeddingPipeline           crates/application/src/embedding/pipeline.rs
│   Orchestrates: provider + repos → generate → store
├── EmbeddingProvider (trait)   crates/application/src/embedding/provider.rs
├── EmbeddingRepository (trait) crates/application/src/embedding/services.rs
└── SearchRouter                crates/application/src/search/search_router.rs
    Intent classification, FTS5 + semantic dispatch, RRF merge

Infrastructure
├── OnnxEmbeddingProvider       crates/infrastructure/onnx/src/provider.rs
│   snowflake-arctic-embed-m-v2.0 (fp16), ONNX Runtime, mean pooling, L2 normalize
└── SqliteEmbeddingRepository   crates/infrastructure/sqlite/src/workspace/embedding_repository.rs
    page_embeddings table, f32 BLOB, brute-force cosine similarity
```

Dependencies flow inward: Framework → Infrastructure → Application → Domain. The application layer sees only traits
(EmbeddingProvider, EmbeddingRepository); ONNX and SQLite details are confined to the infrastructure crates.
Pipeline
Incremental (after page save)

1. A Tauri command saves page content → calls EmbeddingManager::queue_embed_page(page_id).
2. The manager sends an EmbedPage command to the background worker channel (bounded, 256 capacity).
3. The worker deduplicates via a HashSet and resets a 2-second debounce timer.
4. After the debounce fires, process_pages() dispatches to tokio::task::spawn_blocking.
5. EmbeddingPipeline::embed_page() loads the page via PageRepository, concatenates title + block text, calls EmbeddingProvider::embed(), then calls EmbeddingRepository::upsert().
6. Pages with empty text after trimming are silently skipped — no embedding is stored.
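The worker's dedup-and-debounce behavior can be sketched synchronously (a simplified sketch: the real worker is an async Tokio task, and `DebounceQueue` is a hypothetical name, not a type from the codebase):

```rust
use std::collections::HashSet;

/// Simplified sketch of the worker's pending set. Re-queuing a page that is
/// already pending is a no-op; draining returns each pending page exactly once.
struct DebounceQueue {
    pending: HashSet<String>,
}

impl DebounceQueue {
    fn new() -> Self {
        Self { pending: HashSet::new() }
    }

    /// Returns true if the page was newly queued (the 2-second debounce timer
    /// would be reset in the real worker), false if it was already pending.
    fn enqueue(&mut self, page_id: &str) -> bool {
        self.pending.insert(page_id.to_string())
    }

    /// Called when the debounce timer fires: take every pending page.
    fn drain(&mut self) -> Vec<String> {
        self.pending.drain().collect()
    }
}
```

Because the set deduplicates, a burst of saves to the same page results in a single embedding pass after the debounce window closes.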
Bulk indexing (workspace open / model upgrade)

1. EmbeddingManager::trigger_reindex() sends IndexWorkspace to the worker.
2. Any pending debounced page embeds are flushed first.
3. run_bulk_index() dispatches to spawn_blocking.
4. EmbeddingPipeline::index_workspace() loops:
   - Calls EmbeddingRepository::get_stale_pages(), up to 100 at a time.
   - Breaks them into batches of 16 (configurable via PipelineConfig::batch_size).
   - Calls EmbeddingProvider::embed_batch() for each chunk.
   - Writes all results in a single transaction via upsert_batch().
   - Reports (completed, total) progress via a callback, forwarded to the watch channel as IndexingStatus::Indexing { completed, total }.
5. When no more stale pages remain, the status transitions to IndexingStatus::Idle.
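The batching and progress-reporting shape of that loop can be sketched as follows (`index_stale_pages` is a hypothetical name, and the embedding and storage calls are elided):

```rust
const BATCH_SIZE: usize = 16;

/// Sketch of the bulk-index loop: process stale pages in batches of 16 and
/// report (completed, total) after each batch, mirroring
/// IndexingStatus::Indexing { completed, total }.
fn index_stale_pages(stale_page_ids: &[&str], mut report: impl FnMut(usize, usize)) {
    let total = stale_page_ids.len();
    let mut completed = 0;
    for batch in stale_page_ids.chunks(BATCH_SIZE) {
        // The real pipeline calls embed_batch() on the chunk, then writes the
        // results with upsert_batch() in a single transaction.
        completed += batch.len();
        report(completed, total);
    }
}
```

With 40 stale pages this yields batches of 16, 16, and 8, reporting (16, 40), (32, 40), and (40, 40) in turn.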
Model Details
| Property | Value |
|---|---|
| Model | snowflake-arctic-embed-m-v2.0 |
| Source | HuggingFace Snowflake/snowflake-arctic-embed-m-v2.0 |
| Format | ONNX fp16 (model_fp16.onnx, saved as model.onnx) |
| Runtime | ONNX Runtime via the ort crate |
| Output dimensions | 768 |
| Max input tokens | 512 (truncated, padded at construction time) |
| Post-processing | Mean pooling over token embeddings (masked), then L2 normalization |
| Model size | ~613 MB (fp16) |
The model and tokenizer are not bundled in the repository. Download them with:
```
./tools/dev/download-embedding-model.sh
```

Files are placed in .data/models/ and symlinked into crates/infrastructure/onnx/models/ for unit tests. SHA-256
checksums are verified on download when configured.
The ONNX session is initialized once with GraphOptimizationLevel::Level3 and intra_threads=1. Both the session and
tokenizer are wrapped in Mutex for interior mutability across the Send + Sync trait bound.
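The post-processing steps (masked mean pooling, then L2 normalization) can be sketched in plain Rust, independent of ONNX Runtime; the token embeddings and attention mask are assumed inputs, and this is not the exact provider code:

```rust
/// Masked mean pooling over token embeddings, then L2 normalization.
/// `token_embeddings` is [seq_len][dim]; `attention_mask` marks real
/// tokens (1) versus padding (0).
fn pool_and_normalize(token_embeddings: &[Vec<f32>], attention_mask: &[u32]) -> Vec<f32> {
    let dim = token_embeddings[0].len();
    let mut pooled = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for (tok, &mask) in token_embeddings.iter().zip(attention_mask) {
        if mask == 1 {
            for (p, &v) in pooled.iter_mut().zip(tok) {
                *p += v;
            }
            count += 1.0;
        }
    }
    // Mean over non-padding tokens only.
    for p in pooled.iter_mut() {
        *p /= count.max(1.0);
    }
    // L2 normalize so cosine similarity reduces to a dot product.
    let norm = pooled.iter().map(|v| v * v).sum::<f32>().sqrt().max(f32::EPSILON);
    pooled.iter_mut().for_each(|v| *v /= norm);
    pooled
}
```

Normalizing at write time means downstream similarity scores stay in a predictable range regardless of input length.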
Storage Format
Embeddings are stored in the page_embeddings table inside the workspace SQLite database
({workspace}/.inklings/inklings.db):

```
page_embeddings (
  page_id       TEXT PRIMARY KEY,  -- UUID, foreign key to pages.id
  model_id      TEXT NOT NULL,     -- e.g. "snowflake-arctic-embed-m-v2.0"
  model_version TEXT NOT NULL,     -- e.g. "1.0.0"
  embedding     BLOB NOT NULL,     -- 768 f32 values, little-endian (3072 bytes)
  updated_at    TEXT NOT NULL      -- ISO 8601 timestamp
)
```

The embedding vector is stored as a raw little-endian f32 byte BLOB (768 floats × 4 bytes = 3072 bytes). Conversion
functions f32_slice_to_bytes and bytes_to_f32_vec in the repository handle serialization. There is no dependency on
the sqlite-vec extension — similarity is computed in Rust.
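The conversion functions can be sketched as follows; this is consistent with the stated format (little-endian f32), though not necessarily the exact repository code:

```rust
/// Serialize an f32 slice to a little-endian byte BLOB
/// (768 floats -> 3072 bytes).
fn f32_slice_to_bytes(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Deserialize a little-endian byte BLOB back into f32 values.
/// Trailing bytes that do not form a full 4-byte chunk are ignored.
fn bytes_to_f32_vec(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

Fixing the byte order explicitly keeps the database portable across architectures instead of depending on host endianness.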
Stale Detection
A page’s embedding is considered stale if any of the following is true:

- page_embeddings.page_id IS NULL — no embedding exists yet.
- page_embeddings.model_id != current_model_id — the model has changed.
- page_embeddings.model_version != current_model_version — the model was updated.
- pages.updated_at > page_embeddings.updated_at — the page was modified after the embedding was computed.
The SQL query (get_stale_pages) uses a LEFT JOIN with these conditions. Soft-deleted pages (is_deleted = 1) are
excluded.
After a model upgrade, delete_all() can be called to wipe all embeddings, then trigger_reindex() rebuilds from
scratch.
Search Integration
SearchRouter

SearchRouter (crates/application/src/search/search_router.rs) is the single entry point for all page search. It is
constructed with a PageRepository, an EmbeddingRepository, and an optional EmbeddingProvider, and it is cached in
AppState as search_router.
```
SearchRouter::search(workspace_path, query, limit)
│
├─ classify_intent(query)
│    Keyword  → FTS5 only
│    Semantic → FTS5 + semantic + RRF merge
│
├─ Always: PageRepository::search() (FTS5)
│
└─ If Semantic AND provider present:
     EmbeddingProvider::embed(query)      → query vector
     EmbeddingRepository::query_similar() → Vec<SimilarPage>
     merge_rrf(fts_results, semantic_results, limit)
```

If the embedding provider is None (model not yet downloaded or loaded), all queries fall back to FTS5 only. If
embedding or similarity search fails, the router logs a warning and returns FTS5 results.
Intent Classification
SearchRouter::classify_intent(query) uses rule-based heuristics:

| Condition | Intent |
|---|---|
| Empty or whitespace | Keyword |
| Surrounded by quotes ("..." or '...') | Keyword |
| Contains FTS5 operators (AND, OR, NOT, NEAR; case-sensitive) | Keyword |
| Contains a date pattern (YYYY-MM-DD or YYYY/MM/DD) | Keyword |
| 1–2 words | Keyword |
| Single hyphenated lowercase token (my-page-slug) | Keyword |
| 3+ words, no special patterns | Semantic |
Known limitation: natural-language use of the words AND/OR/NOT (e.g., “Pros AND Cons”) is classified as Keyword
because the detector treats uppercase AND/OR/NOT/NEAR as FTS5 boolean operators.
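These heuristics can be sketched as follows (a simplified sketch, not the exact implementation; the `SearchIntent` enum name and `looks_like_date` helper are assumptions):

```rust
#[derive(Debug, PartialEq)]
enum SearchIntent {
    Keyword,
    Semantic,
}

fn classify_intent(query: &str) -> SearchIntent {
    let q = query.trim();
    if q.is_empty() {
        return SearchIntent::Keyword;
    }
    // Quoted queries are exact-match keyword searches.
    if q.len() >= 2
        && ((q.starts_with('"') && q.ends_with('"'))
            || (q.starts_with('\'') && q.ends_with('\'')))
    {
        return SearchIntent::Keyword;
    }
    let words: Vec<&str> = q.split_whitespace().collect();
    // Uppercase FTS5 boolean operators (case-sensitive by design).
    if words.iter().any(|w| matches!(*w, "AND" | "OR" | "NOT" | "NEAR")) {
        return SearchIntent::Keyword;
    }
    // Date patterns like 2024-01-31 or 2024/01/31.
    if words.iter().any(|w| looks_like_date(w)) {
        return SearchIntent::Keyword;
    }
    // Single hyphenated lowercase token, e.g. a page slug.
    if words.len() == 1 && q.contains('-') && q.chars().all(|c| c.is_ascii_lowercase() || c == '-') {
        return SearchIntent::Keyword;
    }
    if words.len() <= 2 {
        SearchIntent::Keyword
    } else {
        SearchIntent::Semantic
    }
}

fn looks_like_date(w: &str) -> bool {
    let b: Vec<char> = w.chars().collect();
    b.len() == 10
        && b[..4].iter().all(|c| c.is_ascii_digit())
        && (b[4] == '-' || b[4] == '/')
        && b[5..7].iter().all(|c| c.is_ascii_digit())
        && b[7] == b[4]
        && b[8..10].iter().all(|c| c.is_ascii_digit())
}
```

The operator check runs before the word count, which is exactly why "Pros AND Cons" lands in the Keyword branch despite having three words.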
Similarity Query
SqliteEmbeddingRepository::query_similar() performs a brute-force scan:

1. Loads all non-deleted page embeddings (capped at 10,000 rows).
2. Computes cosine similarity between the query vector and each stored vector in Rust.
3. Filters out results below MIN_SIMILARITY_THRESHOLD (0.3).
4. Sorts by descending score and truncates to limit.
This is O(n) over stored embeddings. The 10,000-row cap prevents unbounded memory usage; a warning is logged if the cap is reached. For workspace-scale datasets (< 10k pages) this is adequate.
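The scan can be sketched as follows (a simplified sketch: the real repository reads BLOBs from SQLite, and this signature is an assumption):

```rust
const MIN_SIMILARITY_THRESHOLD: f32 = 0.3;

/// Cosine similarity between two vectors of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Brute-force similarity query: score every stored embedding, drop results
/// below the threshold, sort descending, truncate to `limit`.
fn query_similar(query: &[f32], stored: &[(String, Vec<f32>)], limit: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = stored
        .iter()
        .map(|(id, emb)| (id.clone(), cosine_similarity(query, emb)))
        .filter(|(_, s)| *s >= MIN_SIMILARITY_THRESHOLD)
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(limit);
    scored
}
```

Since stored embeddings are L2-normalized at write time, the norm computation is effectively a safeguard; the score reduces to a dot product.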
RRF Merge
When both FTS5 and semantic results are available, they are merged using Reciprocal Rank Fusion (k = 60):

```
RRF score = Σ 1 / (60 + rank_i)
```

Each page accumulates contributions from every list it appears in, so pages appearing in both lists are boosted above
pages in only one. The final score field on merged results contains the RRF score (range ~0.01–0.03), not a cosine
similarity or BM25 score. The FTS5 snippet is preserved when available; semantic results have no snippet.
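The merge can be sketched as follows (a simplified sketch over lists of page IDs; the real merge_rrf carries snippets and richer result types):

```rust
use std::collections::HashMap;

const RRF_K: f32 = 60.0;

/// Reciprocal Rank Fusion: each page scores the sum of 1 / (k + rank) over
/// every list it appears in, with 1-based ranks. Pages in both lists
/// accumulate both contributions and float to the top.
fn merge_rrf(fts: &[&str], semantic: &[&str], limit: usize) -> Vec<(String, f32)> {
    let mut scores: HashMap<String, f32> = HashMap::new();
    for list in [fts, semantic] {
        for (rank, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (RRF_K + (rank as f32 + 1.0));
        }
    }
    let mut merged: Vec<(String, f32)> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged.truncate(limit);
    merged
}
```

With fts = [a, b] and semantic = [b, c], page b scores 1/61 + 1/62 ≈ 0.033 and outranks a (1/61) and c (1/62), which matches the ~0.01–0.03 score range noted above.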
Key Code Paths
| Scenario | Entry Point |
|---|---|
| Page saved → embed queued | EmbeddingManager::queue_embed_page() in apps/desktop/src-tauri/src/embedding.rs:106 |
| Workspace opened → bulk index | EmbeddingManager::trigger_reindex() in apps/desktop/src-tauri/src/embedding.rs:120 |
| Search query dispatched | SearchRouter::search() in crates/application/src/search/search_router.rs:141 |
| Intent classified | SearchRouter::classify_intent() in crates/application/src/search/search_router.rs:94 |
| Single page embedded | EmbeddingPipeline::embed_page() in crates/application/src/embedding/pipeline.rs:82 |
| Stale pages detected | SqliteEmbeddingRepository::get_stale_pages() in crates/infrastructure/sqlite/src/workspace/embedding_repository.rs:282 |
| Similarity query | SqliteEmbeddingRepository::query_similar() in crates/infrastructure/sqlite/src/workspace/embedding_repository.rs:168 |
| ONNX inference + pooling | OnnxEmbeddingProvider::run_inference() in crates/infrastructure/onnx/src/provider.rs:111 |
Related ADRs
- ADR-002: SQLite as the workspace storage backend. The page_embeddings table lives in the same inklings.db database as pages.
- ADR-006: Block content is stored as Loro CRDT BLOBs. The embedding pipeline uses the materialized Block.content string (domain layer), not the raw CRDT binary.
- ADR-007: MCP writes route through the Tauri app — embedding side effects (queue_embed_page) are triggered from the same Tauri command layer as user edits.
Enables semantic search. See also: Event Log System, Search System.