Embedding
How page content is vectorized and indexed for semantic search, covering both incremental and bulk paths.
Overview
Step-by-Step Details
1. Trigger: Incremental Embed After Save
Code: apps/desktop/src-tauri/src/side_effects.rs, apps/desktop/src-tauri/src/embedding.rs
After SaveBlockContentUseCase succeeds, WriteEffectCoordinator::on_block_content_saved calls:
```rust
task.events.try_send(EmbeddingEvent::EmbedPage { page_id })
```

`try_send` is non-blocking. If the channel is full (capacity 256), the event is silently dropped; the next save will re-queue it. This prevents backpressure from slow embedding from blocking the write path.
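The drop-on-full behavior can be sketched with std's bounded channel standing in for the tokio sender. `EmbeddingEvent` and `send_or_drop` here are minimal stand-ins for illustration, not the real types:

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

// Hypothetical stand-in for the event enum described above.
#[derive(Debug, PartialEq)]
enum EmbeddingEvent {
    EmbedPage { page_id: u64 },
}

// Returns true if the event was queued, false if it was dropped
// because the bounded channel was full (mirroring try_send).
fn send_or_drop(tx: &SyncSender<EmbeddingEvent>, page_id: u64) -> bool {
    match tx.try_send(EmbeddingEvent::EmbedPage { page_id }) {
        Ok(()) => true,
        // Dropped silently; the next save re-queues the page.
        Err(TrySendError::Full(_)) => false,
        Err(TrySendError::Disconnected(_)) => false,
    }
}
```

Because the sender never blocks, a slow embedding consumer cannot stall the save path; the worst case is a deferred embed.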
2. Trigger: Bulk Workspace Index
Code: apps/desktop/src-tauri/src/embedding.rs
On workspace open or after a model upgrade, `EmbeddingEvent::IndexWorkspace` is sent to the same bounded channel. The `EmbeddingTask` processes individual pages first: if both individual page events and an `IndexWorkspace` event appear in the same batch, the workspace index runs after all individual pages are processed.
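The within-batch ordering can be sketched as a partition over a drained batch; `order_batch` is an illustrative name, not the real `handle_batch` logic:

```rust
#[derive(Debug, Clone, PartialEq)]
enum EmbeddingEvent {
    EmbedPage { page_id: u64 },
    IndexWorkspace,
}

// Orders a drained batch so individual pages run before a
// workspace index. Illustrative sketch only.
fn order_batch(batch: Vec<EmbeddingEvent>) -> Vec<EmbeddingEvent> {
    let (pages, bulk): (Vec<EmbeddingEvent>, Vec<EmbeddingEvent>) = batch
        .into_iter()
        .partition(|e| matches!(e, EmbeddingEvent::EmbedPage { .. }));
    pages.into_iter().chain(bulk).collect()
}
```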
3. Debounce and Deduplication
Code: apps/desktop/src-tauri/src/embedding.rs — EmbeddingTask::handle_batch
The EmbeddingTask accumulates events with a 2-second debounce window. Within a batch, page IDs from EmbedPage events
are deduplicated via HashSet — if a page is saved 10 times within the debounce window, it is embedded only once.
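The debounce-and-dedupe behavior can be sketched with a small accumulator; `DebouncedBatch` and its methods are hypothetical names for illustration:

```rust
use std::collections::HashSet;
use std::mem;
use std::time::{Duration, Instant};

// Minimal sketch of a debounced, deduplicating accumulator.
struct DebouncedBatch {
    window: Duration,
    deadline: Option<Instant>,
    pages: HashSet<u64>,
}

impl DebouncedBatch {
    fn new(window: Duration) -> Self {
        Self { window, deadline: None, pages: HashSet::new() }
    }

    // Each event extends the window and dedupes the page ID.
    fn push(&mut self, page_id: u64, now: Instant) {
        self.deadline = Some(now + self.window);
        self.pages.insert(page_id);
    }

    // Once the window elapses, the batch is drained and each
    // page is embedded exactly once.
    fn drain_if_due(&mut self, now: Instant) -> Option<HashSet<u64>> {
        match self.deadline {
            Some(d) if now >= d => {
                self.deadline = None;
                Some(mem::take(&mut self.pages))
            }
            _ => None,
        }
    }
}
```

Ten saves of the same page inside the window leave one entry in the drained set, so the expensive embedding runs once.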
4. spawn_blocking for CPU Work
ONNX inference is CPU-bound and can take 10-50ms per page. Running it on a Tokio async thread would block the executor.
EmbeddingTask uses tokio::task::spawn_blocking to move all embedding work to a dedicated thread pool thread.
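`tokio::task::spawn_blocking` hands the closure to a dedicated blocking thread pool and returns a handle the async task can await. A std-only analogue of the same idea (offload CPU work, then collect the result) looks like this; `run_blocking` is illustrative, not the tokio API:

```rust
use std::thread;

// Runs CPU-bound work off the calling thread and joins the result,
// mirroring the intent of spawn_blocking without an async runtime.
fn run_blocking<T: Send + 'static>(work: impl FnOnce() -> T + Send + 'static) -> T {
    thread::spawn(work).join().expect("blocking task panicked")
}
```

In the real task the handle is awaited rather than joined, so the async executor stays free while inference runs.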
5. EmbeddingPipeline: Load Text Content
Code: crates/application/src/embedding/pipeline.rs — EmbeddingPipeline::embed_page
The pipeline loads page content via PageRepository::get_text_content(page_id), which returns (title, Vec<String>) —
the page title and block text contents. This method intentionally avoids loading content_loro BLOBs; only the
materialized text columns are needed for embedding.
Text is assembled as:

```
{title}\n\n{block_text_1}\n\n{block_text_2}\n...
```

Pages with empty assembled text are skipped (no embedding stored).
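The assembly step can be sketched as follows; `assemble_text` is a hypothetical helper, and skipping empty blocks is an assumption of this sketch:

```rust
// Joins title and non-empty block texts with blank lines; returns
// None when nothing remains, so no embedding is stored.
fn assemble_text(title: &str, blocks: &[String]) -> Option<String> {
    let mut parts: Vec<&str> = Vec::new();
    if !title.trim().is_empty() {
        parts.push(title);
    }
    parts.extend(
        blocks.iter().map(|s| s.as_str()).filter(|s| !s.trim().is_empty()),
    );
    if parts.is_empty() {
        None // page skipped: no embedding stored
    } else {
        Some(parts.join("\n\n"))
    }
}
```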
6. ONNX Inference: Tokenize -> Infer -> Pool -> Normalize
Code: crates/infrastructure/onnx/src/provider.rs
OnnxEmbeddingProvider uses the snowflake-arctic-embed-m-v2.0 model with 768-dimensional output and a 512-token
context window.
Pipeline for a single text:

- Tokenize: the HuggingFace `tokenizers` crate truncates to 512 tokens and pads for batch alignment
- ONNX inference: the session runs with `GraphOptimizationLevel::Level3` and 1 intra-op thread; the Mutex ensures single-threaded session access, so there is no parallelism per call
- Pool: if the model outputs `[batch, seq_len, hidden_size]` (token-level), mean pooling over non-masked tokens is applied; if it outputs `[batch, hidden_size]` (pre-pooled), pooling is skipped
- L2 normalize: the pooled vector is divided by its L2 norm so cosine similarity can be computed as a dot product
For batch indexing, embed_batch tokenizes all texts together and runs a single ONNX session call.
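The pool and normalize steps can be sketched in plain Rust over a `[seq_len][hidden]` matrix; this is a minimal sketch of the math, while the real provider operates on ONNX tensors:

```rust
// Mean-pools token embeddings over non-masked positions, then
// L2-normalizes. `mask[i]` is 1 for real tokens, 0 for padding.
fn pool_and_normalize(tokens: &[Vec<f32>], mask: &[u32]) -> Vec<f32> {
    let hidden = tokens[0].len();
    let mut pooled = vec![0.0f32; hidden];
    let mut count = 0.0f32;
    for (tok, &m) in tokens.iter().zip(mask) {
        if m == 1 {
            for (p, v) in pooled.iter_mut().zip(tok) {
                *p += v;
            }
            count += 1.0;
        }
    }
    for p in pooled.iter_mut() {
        *p /= count.max(1.0);
    }
    // L2 normalize so cosine similarity reduces to a dot product.
    let norm = pooled.iter().map(|v| v * v).sum::<f32>().sqrt();
    if norm > 0.0 {
        for p in pooled.iter_mut() {
            *p /= norm;
        }
    }
    pooled
}
```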
7. Upsert to SQLite
Code: crates/infrastructure/sqlite/src/workspace/embedding_repository.rs
The 768-dimensional Vec<f32> is serialized as a raw byte BLOB (f32 little-endian) and upserted into the
page_embeddings table with (page_id, model_id, model_version) as the key. Existing rows are updated on conflict.
For bulk indexing, upsert_batch wraps all inserts in a single transaction.
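The raw little-endian byte layout can be illustrated with a round-trip pair; the helper names are hypothetical:

```rust
// Serializes a vector to the f32 little-endian BLOB layout.
fn vec_to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|f| f.to_le_bytes()).collect()
}

// Deserializes the BLOB back; each embedding dimension is 4 bytes,
// so a 768-dim vector occupies 3072 bytes.
fn blob_to_vec(b: &[u8]) -> Vec<f32> {
    b.chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```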
8. Bulk Index: Stale Page Loop
Code: crates/application/src/embedding/pipeline.rs — EmbeddingPipeline::index_workspace
The pipeline fetches stale pages in batches of 100 (pages missing an embedding for the current
model_id+model_version). Each batch is chunked into groups of 16 (batch_size) for ONNX batch inference. Progress
is reported via a FnMut(completed, total) callback published to a watch::Sender<IndexingStatus> channel visible to
the UI.
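The chunking and progress reporting can be sketched as below; the function names are illustrative, and the real loop fetches stale pages from the repository and awaits ONNX inference rather than calling a closure:

```rust
// Chunks stale pages into inference batches and reports progress
// after each chunk, mirroring the FnMut(completed, total) callback.
fn index_stale_pages(
    stale: &[u64],
    batch_size: usize,
    mut embed_chunk: impl FnMut(&[u64]),
    mut progress: impl FnMut(usize, usize),
) {
    let total = stale.len();
    let mut completed = 0;
    for chunk in stale.chunks(batch_size) {
        embed_chunk(chunk); // stand-in for ONNX batch inference
        completed += chunk.len();
        progress(completed, total);
    }
}
```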
9. Search Integration
Code: crates/application/src/search/search_router.rs
At search time, SearchRouter::classify_intent determines whether to run semantic search (queries of 3+
natural-language words). For semantic queries:
- The query text is embedded via the same `OnnxEmbeddingProvider` (5-15ms)
- All non-deleted page embeddings are loaded from SQLite (capped at 10,000 rows)
- Cosine similarity is computed in Rust as a dot product (vectors are pre-normalized)
- Results below `MIN_SIMILARITY_THRESHOLD` (0.3) are filtered out
- Semantic results are merged with FTS5 BM25 results via Reciprocal Rank Fusion (k=60)
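Reciprocal Rank Fusion scores each document by summing 1/(k + rank) across the result lists it appears in, with rank 1-based and k = 60. A sketch of the merge, not the exact SearchRouter code:

```rust
use std::collections::HashMap;

// Merges two ranked ID lists with Reciprocal Rank Fusion.
// Documents in both lists accumulate score from each.
fn rrf_merge(fts: &[u64], semantic: &[u64], k: f64) -> Vec<u64> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in [fts, semantic] {
        for (i, &id) in list.iter().enumerate() {
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut ranked: Vec<(u64, f64)> = scores.into_iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked.into_iter().map(|(id, _)| id).collect()
}
```

A document appearing in both lists outranks one appearing in only a single list at a similar position, which is why RRF blends lexical and semantic signals without score calibration.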
Error Handling
| Failure | Behavior |
|---|---|
| Channel full (`try_send` fails) | Event dropped; page embedded on next save |
| Page not found in `get_text_content` | `warn!` logged; page skipped; no embedding stored |
| Empty page content | Embedding skipped silently; no row upserted |
| ONNX model not downloaded | Provider not constructed; SearchRouter falls back to FTS5 only |
| ONNX inference error | `PipelineError::Embedding` logged; page skipped during bulk; query falls back to FTS5 |
| `spawn_blocking` panics | `error!` logged; `TaskError` returned; task continues processing next batch |
| SQLite upsert failure | `PipelineError::Repository` logged; page skipped |
| Semantic search failure at query time | `warn!` logged; FTS5 results returned; no error to user |
Related
- Embedding System — model download, ONNX provider configuration, and the `page_embeddings` schema
- Search System — SearchRouter intent classification and RRF merge details
- Write Path — `WriteEffectCoordinator` triggers the embedding pipeline after every block save
- Search Data Flow — full search query flow from frontend to ranked results