
Embedding System

Status: Implemented · Depends On: Page System


The Embedding System generates and stores dense vector embeddings for workspace pages, enabling semantic similarity search alongside the existing FTS5 keyword search. When a user types a natural-language query (3+ words, no boolean operators), the SearchRouter automatically embeds the query and merges FTS5 and semantic results via Reciprocal Rank Fusion (RRF). Keyword queries (short, quoted, or containing FTS5 operators) bypass embedding entirely and go directly to FTS5.

Embeddings are computed locally using the snowflake-arctic-embed-m-v2.0 sentence transformer model (fp16), run via ONNX Runtime. No data leaves the device.


Framework (Tauri)
└── EmbeddingManager apps/desktop/src-tauri/src/embedding.rs
│ Tokio background worker, bounded channel (256), debounce (2s)
Application
├── EmbeddingPipeline crates/application/src/embedding/pipeline.rs
│ Orchestrates: provider + repos → generate → store
├── EmbeddingProvider (trait) crates/application/src/embedding/provider.rs
├── EmbeddingRepository (trait) crates/application/src/embedding/services.rs
└── SearchRouter crates/application/src/search/search_router.rs
Intent classification, FTS5 + semantic dispatch, RRF merge
Infrastructure
├── OnnxEmbeddingProvider crates/infrastructure/onnx/src/provider.rs
│ snowflake-arctic-embed-m-v2.0 (fp16), ONNX Runtime, mean pooling, L2 normalize
└── SqliteEmbeddingRepository crates/infrastructure/sqlite/src/workspace/embedding_repository.rs
page_embeddings table, f32 BLOB, brute-force cosine similarity

Dependencies flow inward: Framework → Infrastructure → Application → Domain. The application layer sees only traits (EmbeddingProvider, EmbeddingRepository); ONNX and SQLite details are confined to the infrastructure crates.


  1. Tauri command saves page content → calls EmbeddingManager::queue_embed_page(page_id).
  2. Manager sends EmbedPage command to the background worker channel (bounded, 256 capacity).
  3. Worker deduplicates via HashSet and resets a 2-second debounce timer.
  4. After the debounce fires, process_pages() dispatches to tokio::task::spawn_blocking.
  5. EmbeddingPipeline::embed_page() loads the page via PageRepository, concatenates title + block text, calls EmbeddingProvider::embed(), then calls EmbeddingRepository::upsert().
  6. Pages with empty text after trimming are silently skipped — no embedding stored.
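Steps 5 and 6 can be sketched as a small function. The trait and method names mirror the document, but the simplified signatures (and the `Ok(false)` "skipped" return) are assumptions for illustration; the real pipeline works through `PageRepository` and async plumbing.

```rust
// Hedged sketch of EmbeddingPipeline::embed_page (steps 5-6).
// Signatures are simplified assumptions, not the real trait definitions.
pub trait EmbeddingProvider {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;
}

pub trait EmbeddingRepository {
    fn upsert(&self, page_id: &str, embedding: &[f32]) -> Result<(), String>;
}

pub fn embed_page(
    provider: &dyn EmbeddingProvider,
    repo: &dyn EmbeddingRepository,
    page_id: &str,
    title: &str,
    block_text: &str,
) -> Result<bool, String> {
    // Concatenate title + block text, as the pipeline does.
    let text = format!("{}\n{}", title, block_text);
    // Pages with empty text after trimming are silently skipped (step 6).
    if text.trim().is_empty() {
        return Ok(false); // no embedding stored
    }
    let embedding = provider.embed(&text)?;
    repo.upsert(page_id, &embedding)?;
    Ok(true)
}
```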

Bulk indexing (workspace open / model upgrade)

  1. EmbeddingManager::trigger_reindex() sends IndexWorkspace to the worker.
  2. Any pending debounced page embeds are flushed first.
  3. run_bulk_index() dispatches to spawn_blocking.
  4. EmbeddingPipeline::index_workspace() loops:
    • Calls EmbeddingRepository::get_stale_pages() up to 100 at a time.
    • Breaks into batches of 16 (configurable via PipelineConfig::batch_size).
    • Calls EmbeddingProvider::embed_batch() for each chunk.
    • Writes all results in a single transaction via upsert_batch().
    • Reports (completed, total) progress via a callback → forwarded to the watch channel as IndexingStatus::Indexing { completed, total }.
  5. When no more stale pages remain, status transitions to IndexingStatus::Idle.
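One iteration of the loop in step 4 (a single `get_stale_pages()` batch) can be sketched as follows. The repositories and the ONNX provider are abstracted as closures for illustration, and the `(completed, total)` bookkeeping shown here is an assumption about how progress is tallied:

```rust
// Hedged sketch of the chunking + progress arithmetic inside
// EmbeddingPipeline::index_workspace, for one batch of stale pages.
pub fn index_batches(
    stale_pages: Vec<String>,
    batch_size: usize, // PipelineConfig::batch_size, default 16
    mut embed_batch: impl FnMut(&[String]) -> Vec<Vec<f32>>,
    mut upsert_batch: impl FnMut(Vec<Vec<f32>>),
    mut report: impl FnMut(usize, usize),
) {
    let total = stale_pages.len();
    let mut completed = 0;
    for chunk in stale_pages.chunks(batch_size) {
        let embeddings = embed_batch(chunk); // one provider call per chunk
        upsert_batch(embeddings);            // one transaction per chunk
        completed += chunk.len();
        report(completed, total);            // -> IndexingStatus::Indexing { completed, total }
    }
}
```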

| Property          | Value                                                              |
| ----------------- | ------------------------------------------------------------------ |
| Model             | snowflake-arctic-embed-m-v2.0                                      |
| Source            | HuggingFace Snowflake/snowflake-arctic-embed-m-v2.0                |
| Format            | ONNX fp16 (model_fp16.onnx, saved as model.onnx)                   |
| Runtime           | ONNX Runtime via the ort crate                                     |
| Output dimensions | 768                                                                |
| Max input tokens  | 512 (truncated, padded at construction time)                       |
| Post-processing   | Mean pooling over token embeddings (masked), then L2 normalization |
| Model size        | ~613 MB (fp16)                                                     |

The model and tokenizer are not bundled in the repository. Download them with:

./tools/dev/download-embedding-model.sh

Files are placed in .data/models/ and symlinked into crates/infrastructure/onnx/models/ for unit tests. SHA-256 checksums are verified on download when configured.

The ONNX session is initialized once with GraphOptimizationLevel::Level3 and intra_threads=1. Both the session and tokenizer are wrapped in Mutex for interior mutability across the Send + Sync trait bound.
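The post-processing step (masked mean pooling followed by L2 normalization) can be sketched in plain Rust. The function name and flat `Vec` shapes are illustrative assumptions; the real code operates on ONNX Runtime tensors inside `OnnxEmbeddingProvider::run_inference`.

```rust
// Hedged sketch of the post-processing: masked mean pooling over token
// embeddings, then L2 normalization to unit length.
pub fn pool_and_normalize(token_embeddings: &[Vec<f32>], attention_mask: &[u8]) -> Vec<f32> {
    let dim = token_embeddings.first().map_or(0, |t| t.len());
    let mut pooled = vec![0.0f32; dim];
    let mut count = 0.0f32;
    // Mean pooling: average only the tokens the attention mask marks as real.
    for (token, &mask) in token_embeddings.iter().zip(attention_mask) {
        if mask == 1 {
            for (p, &v) in pooled.iter_mut().zip(token) {
                *p += v;
            }
            count += 1.0;
        }
    }
    if count > 0.0 {
        for p in pooled.iter_mut() {
            *p /= count;
        }
    }
    // L2 normalization: unit-length vectors make dot product equal cosine similarity.
    let norm = pooled.iter().map(|v| v * v).sum::<f32>().sqrt();
    if norm > 0.0 {
        for p in pooled.iter_mut() {
            *p /= norm;
        }
    }
    pooled
}
```

Normalizing at embed time is what lets the similarity scan compare vectors with a simple dot product.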


Embeddings are stored in the page_embeddings table inside the workspace SQLite database ({workspace}/.inklings/inklings.db):

page_embeddings (
page_id TEXT PRIMARY KEY, -- UUID, foreign key to pages.id
model_id TEXT NOT NULL, -- e.g. "snowflake-arctic-embed-m-v2.0"
model_version TEXT NOT NULL, -- e.g. "1.0.0"
embedding BLOB NOT NULL, -- 768 f32 values, little-endian (3072 bytes)
updated_at TEXT NOT NULL -- ISO 8601 timestamp
)

The embedding vector is stored as a raw little-endian f32 byte BLOB (768 floats × 4 bytes = 3072 bytes). Conversion functions f32_slice_to_bytes and bytes_to_f32_vec in the repository handle serialization. There is no dependency on the sqlite-vec extension — similarity is computed in Rust.
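The document names the two conversion helpers; plausible little-endian implementations look like this (a sketch, not the repository's exact code):

```rust
// Hedged sketch of the BLOB serialization helpers named in the text.
// 768 floats x 4 bytes = 3072 bytes, little-endian.
pub fn f32_slice_to_bytes(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

pub fn bytes_to_f32_vec(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}
```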


A page’s embedding is considered stale if any of the following is true:

  • page_embeddings.page_id IS NULL — no embedding exists yet.
  • page_embeddings.model_id != current_model_id — model has changed.
  • page_embeddings.model_version != current_model_version — model was updated.
  • pages.updated_at > page_embeddings.updated_at — page was modified after the embedding was computed.

The SQL query (get_stale_pages) uses a LEFT JOIN with these conditions. Soft-deleted pages (is_deleted = 1) are excluded.
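The four conditions can be restated as a Rust predicate. The real check is the single SQL `LEFT JOIN` in `get_stale_pages`; the struct and field names below are assumptions for illustration only.

```rust
// Hedged Rust restatement of the staleness rules; the authoritative
// version is SQL. Field names here are illustrative assumptions.
pub struct EmbeddingRow {
    pub model_id: String,
    pub model_version: String,
    pub updated_at: String, // ISO 8601 timestamps sort lexicographically
}

pub fn is_stale(
    page_updated_at: &str,
    row: Option<&EmbeddingRow>,
    current_model_id: &str,
    current_model_version: &str,
) -> bool {
    match row {
        None => true, // no embedding exists yet (LEFT JOIN produced NULL)
        Some(e) => {
            e.model_id != current_model_id
                || e.model_version != current_model_version
                // page modified after the embedding was computed
                || page_updated_at > e.updated_at.as_str()
        }
    }
}
```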

After a model upgrade, delete_all() can be called to wipe all embeddings, then trigger_reindex() rebuilds from scratch.


SearchRouter (crates/application/src/search/search_router.rs) is the single entry point for all page search. It is constructed with a PageRepository, an EmbeddingRepository, and an optional EmbeddingProvider. It is cached in AppState as search_router.

SearchRouter::search(workspace_path, query, limit)
├─ classify_intent(query)
│ Keyword → FTS5 only
│ Semantic → FTS5 + semantic + RRF merge
├─ Always: PageRepository::search() (FTS5)
└─ If Semantic AND provider present:
EmbeddingProvider::embed(query) → query vector
EmbeddingRepository::query_similar() → Vec<SimilarPage>
merge_rrf(fts_results, semantic_results, limit)

If the embedding provider is None (model not yet downloaded or loaded), all queries fall back to FTS5 only. If embedding or similarity search fails, the router logs a warning and returns FTS5 results.

SearchRouter::classify_intent(query) uses rule-based heuristics:

| Condition                                                    | Intent   |
| ------------------------------------------------------------ | -------- |
| Empty or whitespace                                          | Keyword  |
| Surrounded by quotes ("..." or '...')                        | Keyword  |
| Contains FTS5 operators (AND, OR, NOT, NEAR) (case-sensitive) | Keyword  |
| Contains a date pattern (YYYY-MM-DD or YYYY/MM/DD)           | Keyword  |
| 1–2 words                                                    | Keyword  |
| Single hyphenated lowercase token (my-page-slug)             | Keyword  |
| 3+ words, no special patterns                                | Semantic |

Known limitation: natural-language use of the words AND/OR/NOT (e.g., “Pros AND Cons”) is classified as Keyword because the detector treats uppercase AND/OR/NOT/NEAR as FTS5 boolean operators.
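The heuristics in the table above can be sketched as a rule chain. The exact rule order and the loose date check below are assumptions; the real `classify_intent` lives in `search_router.rs`.

```rust
#[derive(Debug, PartialEq)]
pub enum Intent {
    Keyword,
    Semantic,
}

// Hedged sketch of the rule-based intent classifier from the table above.
pub fn classify_intent(query: &str) -> Intent {
    let q = query.trim();
    if q.is_empty() {
        return Intent::Keyword; // empty or whitespace
    }
    // Surrounded by quotes -> exact-phrase keyword search.
    let quoted = (q.starts_with('"') && q.ends_with('"') && q.len() >= 2)
        || (q.starts_with('\'') && q.ends_with('\'') && q.len() >= 2);
    if quoted {
        return Intent::Keyword;
    }
    let words: Vec<&str> = q.split_whitespace().collect();
    // Uppercase FTS5 boolean operators (case-sensitive match).
    if words.iter().any(|w| matches!(*w, "AND" | "OR" | "NOT" | "NEAR")) {
        return Intent::Keyword;
    }
    // Date patterns: YYYY-MM-DD or YYYY/MM/DD.
    let has_date = words.iter().any(|w| {
        w.len() == 10
            && w.chars().enumerate().all(|(i, c)| match i {
                4 | 7 => c == '-' || c == '/',
                _ => c.is_ascii_digit(),
            })
    });
    // 1-2 words (which also covers a single hyphenated slug) -> keyword.
    if has_date || words.len() <= 2 {
        return Intent::Keyword;
    }
    Intent::Semantic
}
```

The known limitation falls directly out of the operator rule: "Pros AND Cons" contains an uppercase `AND`, so it short-circuits to `Keyword`.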

SqliteEmbeddingRepository::query_similar() performs a brute-force scan:

  • Loads all non-deleted page embeddings (capped at 10,000 rows).
  • Computes cosine similarity between the query vector and each stored vector in Rust.
  • Filters results below MIN_SIMILARITY_THRESHOLD (0.3).
  • Sorts by descending score and truncates to limit.

This is O(n) over stored embeddings. The 10,000-row cap prevents unbounded memory usage; a warning is logged if the cap is reached. For workspace-scale datasets (< 10k pages) this is adequate.
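The scoring loop can be sketched as follows. Since stored vectors are L2-normalized at embed time, cosine similarity reduces to a dot product; a full cosine is shown here for clarity. Function names other than `query_similar`'s described behavior are illustrative.

```rust
// Hedged sketch of the brute-force similarity scan in query_similar.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Scan all rows, filter by threshold, sort descending, truncate to limit.
pub fn top_similar(query: &[f32], stored: &[(String, Vec<f32>)], limit: usize) -> Vec<(String, f32)> {
    const MIN_SIMILARITY_THRESHOLD: f32 = 0.3;
    let mut scored: Vec<(String, f32)> = stored
        .iter()
        .map(|(id, v)| (id.clone(), cosine_similarity(query, v)))
        .filter(|(_, s)| *s >= MIN_SIMILARITY_THRESHOLD)
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(limit);
    scored
}
```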

When both FTS5 and semantic results are available, they are merged using Reciprocal Rank Fusion (RRF) with k = 60:

RRF score = Σ 1 / (60 + rank_i)

where rank_i is the page's rank in result list i (FTS5 or semantic).

Each page accumulates contributions from every list it appears in. Pages appearing in both lists are boosted above pages in only one. The final score field on merged results contains the RRF score (range ~0.01–0.03), not a cosine similarity or BM25 score. The FTS5 snippet is preserved when available; semantic results have no snippet.
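A minimal sketch of the merge, assuming 1-based ranks and ignoring snippet carry-over (the real `merge_rrf` also preserves the FTS5 snippet):

```rust
use std::collections::HashMap;

// Hedged sketch of RRF merging with k = 60. Result IDs stand in for the
// real merged-result structs; rank is assumed to start at 1.
pub fn merge_rrf(fts: &[String], semantic: &[String], limit: usize) -> Vec<(String, f64)> {
    const K: f64 = 60.0;
    let mut scores: HashMap<String, f64> = HashMap::new();
    // Each list contributes 1 / (k + rank); pages in both lists accumulate twice.
    for list in [fts, semantic] {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (K + (i + 1) as f64);
        }
    }
    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged.truncate(limit);
    merged
}
```

With k = 60, a rank-1 hit in one list scores 1/61 ≈ 0.0164, and a page ranked near the top of both lists lands around 0.03, matching the ~0.01–0.03 range noted above.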


| Scenario                       | Entry Point                                                                                                    |
| ------------------------------ | -------------------------------------------------------------------------------------------------------------- |
| Page saved → embed queued      | EmbeddingManager::queue_embed_page() in apps/desktop/src-tauri/src/embedding.rs:106                            |
| Workspace opened → bulk index  | EmbeddingManager::trigger_reindex() in apps/desktop/src-tauri/src/embedding.rs:120                             |
| Search query dispatched        | SearchRouter::search() in crates/application/src/search/search_router.rs:141                                   |
| Intent classified              | SearchRouter::classify_intent() in crates/application/src/search/search_router.rs:94                           |
| Single page embedded           | EmbeddingPipeline::embed_page() in crates/application/src/embedding/pipeline.rs:82                             |
| Stale pages detected           | SqliteEmbeddingRepository::get_stale_pages() in crates/infrastructure/sqlite/src/workspace/embedding_repository.rs:282 |
| Similarity query               | SqliteEmbeddingRepository::query_similar() in crates/infrastructure/sqlite/src/workspace/embedding_repository.rs:168   |
| ONNX inference + pooling       | OnnxEmbeddingProvider::run_inference() in crates/infrastructure/onnx/src/provider.rs:111                       |

  • ADR-002: SQLite as the workspace storage backend. The page_embeddings table lives in the same inklings.db database as pages.
  • ADR-006: Block content is stored as Loro CRDT BLOBs. The embedding pipeline uses the materialized Block.content string (domain layer), not the raw CRDT binary.
  • ADR-007: MCP writes route through the Tauri app — embedding side effects (queue_embed_page) are triggered from the same Tauri command layer as user edits.

Enables semantic search. See also: Event Log System, Search System.
