Embedding System
Status: Implemented
Depends On: Page System
Overview
The Embedding System generates and stores dense vector embeddings for workspace pages, enabling semantic similarity
search alongside the existing FTS5 keyword search. When a user types a natural-language query (3+ words, no boolean
operators), the SearchRouter automatically embeds the query and merges FTS5 and semantic results via Reciprocal Rank
Fusion (RRF). Keyword queries (short, quoted, or containing FTS5 operators) bypass embedding entirely and go directly to
FTS5.
Embeddings are computed locally using the snowflake-arctic-embed-m-v2.0 sentence transformer model (fp16), run via
ONNX Runtime. No data leaves the device.
Architecture
```
Framework (Tauri)
└── EmbeddingManager            apps/desktop/src-tauri/src/embedding.rs
    Tokio background worker, bounded channel (256), debounce (2s)

Application
├── EmbeddingPipeline           crates/application/src/embedding/pipeline.rs
│   Orchestrates: provider + repos → generate → store
├── EmbeddingProvider (trait)   crates/application/src/embedding/provider.rs
├── EmbeddingRepository (trait) crates/application/src/embedding/services.rs
└── SearchRouter                crates/application/src/search/search_router.rs
    Intent classification, FTS5 + semantic dispatch, RRF merge

Infrastructure
├── OnnxEmbeddingProvider       crates/infrastructure/onnx/src/provider.rs
│   snowflake-arctic-embed-m-v2.0 (fp16), ONNX Runtime, mean pooling, L2 normalize
└── SqliteEmbeddingRepository   crates/infrastructure/sqlite/src/workspace/embedding_repository.rs
    page_embeddings table, f32 BLOB, brute-force cosine similarity
```

Dependencies flow inward: Framework → Infrastructure → Application → Domain. The application layer sees only traits
(EmbeddingProvider, EmbeddingRepository); ONNX and SQLite details are confined to the infrastructure crates.
Pipeline
Incremental (after page save)

1. A Tauri command saves page content → calls EmbeddingManager::queue_embed_page(page_id).
2. The manager sends an EmbedPage command to the background worker channel (bounded, 256 capacity).
3. The worker deduplicates via a HashSet and resets a 2-second debounce timer.
4. After the debounce fires, process_pages() dispatches to tokio::task::spawn_blocking.
5. EmbeddingPipeline::embed_page() loads the page via PageRepository, concatenates title + block text, calls EmbeddingProvider::embed(), then calls EmbeddingRepository::upsert().
6. Pages with empty text after trimming are silently skipped — no embedding is stored.
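The worker's dedup-and-debounce behavior can be sketched synchronously (a simplified sketch: the real worker is an async Tokio task, and `DebounceQueue` is a hypothetical name, not a type from the codebase):

```rust
use std::collections::HashSet;

/// Simplified sketch of the worker's pending set. Re-queuing a page that is
/// already pending is a no-op; draining returns each pending page exactly once.
struct DebounceQueue {
    pending: HashSet<String>,
}

impl DebounceQueue {
    fn new() -> Self {
        Self { pending: HashSet::new() }
    }

    /// Returns true if the page was newly queued (the 2-second debounce timer
    /// would be reset in the real worker), false if it was already pending.
    fn enqueue(&mut self, page_id: &str) -> bool {
        self.pending.insert(page_id.to_string())
    }

    /// Called when the debounce timer fires: take every pending page.
    fn drain(&mut self) -> Vec<String> {
        self.pending.drain().collect()
    }
}
```

Because the set deduplicates, a burst of saves to the same page results in a single embedding pass after the debounce window closes.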
Bulk indexing (workspace open / model upgrade)

1. EmbeddingManager::trigger_reindex() sends IndexWorkspace to the worker.
2. Any pending debounced page embeds are flushed first.
3. run_bulk_index() dispatches to spawn_blocking.
4. EmbeddingPipeline::index_workspace() loops:
   - Calls EmbeddingRepository::get_stale_pages(), up to 100 at a time.
   - Breaks them into batches of 16 (configurable via PipelineConfig::batch_size).
   - Calls EmbeddingProvider::embed_batch() for each chunk.
   - Writes all results in a single transaction via upsert_batch().
   - Reports (completed, total) progress via a callback, forwarded to the watch channel as IndexingStatus::Indexing { completed, total }.
5. When no more stale pages remain, the status transitions to IndexingStatus::Idle.
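The batching and progress-reporting shape of that loop can be sketched as follows (`index_stale_pages` is a hypothetical name, and the embedding and storage calls are elided):

```rust
const BATCH_SIZE: usize = 16;

/// Sketch of the bulk-index loop: process stale pages in batches of 16 and
/// report (completed, total) after each batch, mirroring
/// IndexingStatus::Indexing { completed, total }.
fn index_stale_pages(stale_page_ids: &[&str], mut report: impl FnMut(usize, usize)) {
    let total = stale_page_ids.len();
    let mut completed = 0;
    for batch in stale_page_ids.chunks(BATCH_SIZE) {
        // The real pipeline calls embed_batch() on the chunk, then writes the
        // results with upsert_batch() in a single transaction.
        completed += batch.len();
        report(completed, total);
    }
}
```

With 40 stale pages this yields batches of 16, 16, and 8, reporting (16, 40), (32, 40), and (40, 40) in turn.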
Model Details
| Property | Value |
|---|---|
| Model | snowflake-arctic-embed-m-v2.0 |
| Source | HuggingFace Snowflake/snowflake-arctic-embed-m-v2.0 |
| Format | ONNX fp16 (model_fp16.onnx, saved as model.onnx) |
| Runtime | ONNX Runtime via the ort crate |
| Output dimensions | 768 |
| Max input tokens | 512 (truncated, padded at construction time) |
| Post-processing | Mean pooling over token embeddings (masked), then L2 normalization |
| Model size | ~613 MB (fp16) |
The model and tokenizer are not bundled in the repository. Download them with:
```
./tools/dev/download-embedding-model.sh
```

Files are placed in .data/models/ and symlinked into crates/infrastructure/onnx/models/ for unit tests. SHA-256
checksums are verified on download when configured.
The ONNX session is initialized once with GraphOptimizationLevel::Level3 and intra_threads=1. Both the session and
tokenizer are wrapped in Mutex for interior mutability across the Send + Sync trait bound.
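The post-processing steps (masked mean pooling, then L2 normalization) can be sketched in plain Rust, independent of ONNX Runtime; the token embeddings and attention mask are assumed inputs, and this is not the exact provider code:

```rust
/// Masked mean pooling over token embeddings, then L2 normalization.
/// `token_embeddings` is [seq_len][dim]; `attention_mask` marks real
/// tokens (1) versus padding (0).
fn pool_and_normalize(token_embeddings: &[Vec<f32>], attention_mask: &[u32]) -> Vec<f32> {
    let dim = token_embeddings[0].len();
    let mut pooled = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for (tok, &mask) in token_embeddings.iter().zip(attention_mask) {
        if mask == 1 {
            for (p, &v) in pooled.iter_mut().zip(tok) {
                *p += v;
            }
            count += 1.0;
        }
    }
    // Mean over non-padding tokens only.
    for p in pooled.iter_mut() {
        *p /= count.max(1.0);
    }
    // L2 normalize so cosine similarity reduces to a dot product.
    let norm = pooled.iter().map(|v| v * v).sum::<f32>().sqrt().max(f32::EPSILON);
    pooled.iter_mut().for_each(|v| *v /= norm);
    pooled
}
```

Normalizing at write time means downstream similarity scores stay in a predictable range regardless of input length.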
Storage Format
Embeddings are stored in the page_embeddings table inside the workspace SQLite database
({workspace}/.inklings/inklings.db):

```
page_embeddings (
  page_id       TEXT PRIMARY KEY,  -- UUID, foreign key to pages.id
  model_id      TEXT NOT NULL,     -- e.g. "snowflake-arctic-embed-m-v2.0"
  model_version TEXT NOT NULL,     -- e.g. "1.0.0"
  embedding     BLOB NOT NULL,     -- 768 f32 values, little-endian (3072 bytes)
  updated_at    TEXT NOT NULL      -- ISO 8601 timestamp
)
```

The embedding vector is stored as a raw little-endian f32 byte BLOB (768 floats × 4 bytes = 3072 bytes). Conversion
functions f32_slice_to_bytes and bytes_to_f32_vec in the repository handle serialization. There is no dependency on
the sqlite-vec extension — similarity is computed in Rust.
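The conversion functions can be sketched as follows; this is consistent with the stated format (little-endian f32), though not necessarily the exact repository code:

```rust
/// Serialize an f32 slice to a little-endian byte BLOB
/// (768 floats -> 3072 bytes).
fn f32_slice_to_bytes(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Deserialize a little-endian byte BLOB back into f32 values.
/// Trailing bytes that do not form a full 4-byte chunk are ignored.
fn bytes_to_f32_vec(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

Fixing the byte order explicitly keeps the database portable across architectures instead of depending on host endianness.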
Stale Detection
A page’s embedding is considered stale if any of the following is true:

- page_embeddings.page_id IS NULL — no embedding exists yet.
- page_embeddings.model_id != current_model_id — the model has changed.
- page_embeddings.model_version != current_model_version — the model was updated.
- pages.updated_at > page_embeddings.updated_at — the page was modified after the embedding was computed.
The SQL query (get_stale_pages) uses a LEFT JOIN with these conditions. Soft-deleted pages (is_deleted = 1) are
excluded.
After a model upgrade, delete_all() can be called to wipe all embeddings, then trigger_reindex() rebuilds from
scratch.
Search Integration
SearchRouter

SearchRouter (crates/application/src/search/search_router.rs) is the single entry point for all page search. It is
constructed with a PageRepository, an EmbeddingRepository, and an optional EmbeddingProvider, and it is cached in
AppState as search_router.
```
SearchRouter::search(workspace_path, query, limit)
│
├─ classify_intent(query)
│    Keyword  → FTS5 only
│    Semantic → FTS5 + semantic + RRF merge
│
├─ Always: PageRepository::search() (FTS5)
│
└─ If Semantic AND provider present:
     EmbeddingProvider::embed(query)      → query vector
     EmbeddingRepository::query_similar() → Vec<SimilarPage>
     merge_rrf(fts_results, semantic_results, limit)
```

If the embedding provider is None (model not yet downloaded or loaded), all queries fall back to FTS5 only. If
embedding or similarity search fails, the router logs a warning and returns FTS5 results.
Intent Classification
SearchRouter::classify_intent(query) uses rule-based heuristics:

| Condition | Intent |
|---|---|
| Empty or whitespace | Keyword |
| Surrounded by quotes ("..." or '...') | Keyword |
| Contains FTS5 operators (AND, OR, NOT, NEAR; case-sensitive) | Keyword |
| Contains a date pattern (YYYY-MM-DD or YYYY/MM/DD) | Keyword |
| 1–2 words | Keyword |
| Single hyphenated lowercase token (my-page-slug) | Keyword |
| 3+ words, no special patterns | Semantic |
Known limitation: natural-language use of the words AND/OR/NOT (e.g., “Pros AND Cons”) is classified as Keyword
because the detector treats uppercase AND/OR/NOT/NEAR as FTS5 boolean operators.
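These heuristics can be sketched as follows (a simplified sketch, not the exact implementation; the `SearchIntent` enum name and `looks_like_date` helper are assumptions):

```rust
#[derive(Debug, PartialEq)]
enum SearchIntent {
    Keyword,
    Semantic,
}

fn classify_intent(query: &str) -> SearchIntent {
    let q = query.trim();
    if q.is_empty() {
        return SearchIntent::Keyword;
    }
    // Quoted queries are exact-match keyword searches.
    if q.len() >= 2
        && ((q.starts_with('"') && q.ends_with('"'))
            || (q.starts_with('\'') && q.ends_with('\'')))
    {
        return SearchIntent::Keyword;
    }
    let words: Vec<&str> = q.split_whitespace().collect();
    // Uppercase FTS5 boolean operators (case-sensitive by design).
    if words.iter().any(|w| matches!(*w, "AND" | "OR" | "NOT" | "NEAR")) {
        return SearchIntent::Keyword;
    }
    // Date patterns like 2024-01-31 or 2024/01/31.
    if words.iter().any(|w| looks_like_date(w)) {
        return SearchIntent::Keyword;
    }
    // Single hyphenated lowercase token, e.g. a page slug.
    if words.len() == 1 && q.contains('-') && q.chars().all(|c| c.is_ascii_lowercase() || c == '-') {
        return SearchIntent::Keyword;
    }
    if words.len() <= 2 {
        SearchIntent::Keyword
    } else {
        SearchIntent::Semantic
    }
}

fn looks_like_date(w: &str) -> bool {
    let b: Vec<char> = w.chars().collect();
    b.len() == 10
        && b[..4].iter().all(|c| c.is_ascii_digit())
        && (b[4] == '-' || b[4] == '/')
        && b[5..7].iter().all(|c| c.is_ascii_digit())
        && b[7] == b[4]
        && b[8..10].iter().all(|c| c.is_ascii_digit())
}
```

The operator check runs before the word count, which is exactly why "Pros AND Cons" lands in the Keyword branch despite having three words.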
Similarity Query
SqliteEmbeddingRepository::query_similar() performs a brute-force scan:

1. Loads all non-deleted page embeddings (capped at 10,000 rows).
2. Computes cosine similarity between the query vector and each stored vector in Rust.
3. Filters out results below MIN_SIMILARITY_THRESHOLD (0.3).
4. Sorts by descending score and truncates to limit.
This is O(n) over stored embeddings. The 10,000-row cap prevents unbounded memory usage; a warning is logged if the cap is reached. For workspace-scale datasets (< 10k pages) this is adequate.
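The scan can be sketched as follows (a simplified sketch: the real repository reads BLOBs from SQLite, and this signature is an assumption):

```rust
const MIN_SIMILARITY_THRESHOLD: f32 = 0.3;

/// Cosine similarity between two vectors of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Brute-force similarity query: score every stored embedding, drop results
/// below the threshold, sort descending, truncate to `limit`.
fn query_similar(query: &[f32], stored: &[(String, Vec<f32>)], limit: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = stored
        .iter()
        .map(|(id, emb)| (id.clone(), cosine_similarity(query, emb)))
        .filter(|(_, s)| *s >= MIN_SIMILARITY_THRESHOLD)
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(limit);
    scored
}
```

Since stored embeddings are L2-normalized at write time, the norm computation is effectively a safeguard; the score reduces to a dot product.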
RRF Merge
When both FTS5 and semantic results are available, they are merged using Reciprocal Rank Fusion (k = 60):

```
RRF score = Σ 1 / (60 + rank_i)
```

Each page accumulates contributions from every list it appears in, so pages appearing in both lists are boosted above
pages in only one. The final score field on merged results contains the RRF score (range ~0.01–0.03), not a cosine
similarity or BM25 score. The FTS5 snippet is preserved when available; semantic results have no snippet.
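The merge can be sketched as follows (a simplified sketch over lists of page IDs; the real merge_rrf carries snippets and richer result types):

```rust
use std::collections::HashMap;

const RRF_K: f32 = 60.0;

/// Reciprocal Rank Fusion: each page scores the sum of 1 / (k + rank) over
/// every list it appears in, with 1-based ranks. Pages in both lists
/// accumulate both contributions and float to the top.
fn merge_rrf(fts: &[&str], semantic: &[&str], limit: usize) -> Vec<(String, f32)> {
    let mut scores: HashMap<String, f32> = HashMap::new();
    for list in [fts, semantic] {
        for (rank, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (RRF_K + (rank as f32 + 1.0));
        }
    }
    let mut merged: Vec<(String, f32)> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged.truncate(limit);
    merged
}
```

With fts = [a, b] and semantic = [b, c], page b scores 1/61 + 1/62 ≈ 0.033 and outranks a (1/61) and c (1/62), which matches the ~0.01–0.03 score range noted above.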
Key Code Paths
| Scenario | Entry Point |
|---|---|
| Page saved → embed queued | EmbeddingManager::queue_embed_page() in apps/desktop/src-tauri/src/embedding.rs:106 |
| Workspace opened → bulk index | EmbeddingManager::trigger_reindex() in apps/desktop/src-tauri/src/embedding.rs:120 |
| Search query dispatched | SearchRouter::search() in crates/application/src/search/search_router.rs:141 |
| Intent classified | SearchRouter::classify_intent() in crates/application/src/search/search_router.rs:94 |
| Single page embedded | EmbeddingPipeline::embed_page() in crates/application/src/embedding/pipeline.rs:82 |
| Stale pages detected | SqliteEmbeddingRepository::get_stale_pages() in crates/infrastructure/sqlite/src/workspace/embedding_repository.rs:282 |
| Similarity query | SqliteEmbeddingRepository::query_similar() in crates/infrastructure/sqlite/src/workspace/embedding_repository.rs:168 |
| ONNX inference + pooling | OnnxEmbeddingProvider::run_inference() in crates/infrastructure/onnx/src/provider.rs:111 |
Related ADRs
- ADR-002: SQLite as the workspace storage backend. The page_embeddings table lives in the same inklings.db database as pages.
- ADR-006: Block content is stored as Loro CRDT BLOBs. The embedding pipeline uses the materialized Block.content string (domain layer), not the raw CRDT binary.
- ADR-007: MCP writes route through the Tauri app — embedding side effects (queue_embed_page) are triggered from the same Tauri command layer as user edits.
Enables semantic search. See also: Event Log System, Search System.