
Embedding

How page content is vectorized and indexed for semantic search, covering both incremental and bulk paths.



Code: apps/desktop/src-tauri/src/side_effects.rs, apps/desktop/src-tauri/src/embedding.rs

After SaveBlockContentUseCase succeeds, WriteEffectCoordinator::on_block_content_saved calls:

task.events.try_send(EmbeddingEvent::EmbedPage { page_id })

try_send is non-blocking. If the channel is full (capacity 256), the event is silently dropped — the next save will re-queue it. This keeps slow embedding from ever applying backpressure to the write path.
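
A minimal sketch of this fire-and-forget send, assuming a tokio::sync::mpsc bounded channel; the event enum shape and the helper function are illustrative, not the codebase's actual types:

```rust
use tokio::sync::mpsc;

enum EmbeddingEvent {
    EmbedPage { page_id: u64 }, // page_id type is an assumption
    IndexWorkspace,
}

fn notify_page_saved(events: &mpsc::Sender<EmbeddingEvent>, page_id: u64) {
    // try_send never awaits: a full channel returns an error instead of
    // applying backpressure to the caller.
    if let Err(mpsc::error::TrySendError::Full(_)) =
        events.try_send(EmbeddingEvent::EmbedPage { page_id })
    {
        // Dropped on purpose: the next save of this page re-queues it.
    }
}
```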

Code: apps/desktop/src-tauri/src/embedding.rs

On workspace open or after a model upgrade, EmbeddingEvent::IndexWorkspace is sent to the same bounded channel. The EmbeddingTask gives individual pages priority: if both EmbedPage events and an IndexWorkspace event appear in the same batch, the workspace index runs only after all individual pages are processed.

Code: apps/desktop/src-tauri/src/embedding.rs (EmbeddingTask::handle_batch)

The EmbeddingTask accumulates events with a 2-second debounce window. Within a batch, page IDs from EmbedPage events are deduplicated via HashSet — if a page is saved 10 times within the debounce window, it is embedded only once.
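A sketch of the batch loop under those rules, reusing the event enum from the sketch above; the function names and loop structure are assumptions, while the 2-second debounce, HashSet dedup, and pages-before-workspace ordering come from the text:

```rust
use std::collections::HashSet;
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;

async fn run_task(mut events: mpsc::Receiver<EmbeddingEvent>) {
    loop {
        // Block until the first event of a batch arrives.
        let Some(first) = events.recv().await else { return };
        let mut batch = vec![first];
        // Debounce: keep draining until 2 seconds pass with no new event.
        while let Ok(Some(event)) = timeout(Duration::from_secs(2), events.recv()).await {
            batch.push(event);
        }
        handle_batch(batch).await;
    }
}

async fn handle_batch(batch: Vec<EmbeddingEvent>) {
    let mut pages = HashSet::new();
    let mut index_workspace = false;
    for event in batch {
        match event {
            // The HashSet dedups repeated saves of the same page.
            EmbeddingEvent::EmbedPage { page_id } => {
                pages.insert(page_id);
            }
            EmbeddingEvent::IndexWorkspace => index_workspace = true,
        }
    }
    for page_id in pages {
        let _ = page_id; // embed_page(page_id).await
    }
    if index_workspace {
        // The bulk pass runs last, after every individual page in the batch.
        // index_workspace_pass().await
    }
}
```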

ONNX inference is CPU-bound and can take 10-50ms per page. Running it on a Tokio async thread would block the executor. EmbeddingTask uses tokio::task::spawn_blocking to move all embedding work to a dedicated thread pool thread.
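A sketch of that handoff; tokio::task::spawn_blocking is the real API, while run_inference is a hypothetical stand-in for the tokenizer and ONNX session:

```rust
use tokio::task;

async fn embed_page(page_id: u64) -> Result<Vec<f32>, task::JoinError> {
    // CPU-bound ONNX work moves to the blocking thread pool so it never
    // stalls an async worker thread for 10-50ms.
    task::spawn_blocking(move || run_inference(page_id)).await
}

// Hypothetical stand-in for the real tokenize + ONNX session call.
fn run_inference(_page_id: u64) -> Vec<f32> {
    vec![0.0; 768]
}
```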

Code: crates/application/src/embedding/pipeline.rs (EmbeddingPipeline::embed_page)

The pipeline loads page content via PageRepository::get_text_content(page_id), which returns (title, Vec<String>) — the page title and block text contents. This method intentionally avoids loading content_loro BLOBs; only the materialized text columns are needed for embedding.

Text is assembled as:

{title}\n\n{block_text_1}\n\n{block_text_2}\n...

Pages with empty assembled text are skipped (no embedding stored).
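A sketch of the assembly, assuming the (title, Vec<String>) shape returned by get_text_content; returning None models the skip for empty pages, and the function name is illustrative:

```rust
/// Assembles "{title}\n\n{block_text_1}\n\n{block_text_2}..." and returns
/// None for empty pages so the caller can skip embedding entirely.
fn assemble_text(title: &str, blocks: &[String]) -> Option<String> {
    let text = std::iter::once(title.to_string())
        .chain(blocks.iter().cloned())
        .collect::<Vec<_>>()
        .join("\n\n");
    if text.trim().is_empty() { None } else { Some(text) }
}
```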

6. ONNX Inference: Tokenize -> Infer -> Pool -> Normalize


Code: crates/infrastructure/onnx/src/provider.rs

OnnxEmbeddingProvider uses the snowflake-arctic-embed-m-v2.0 model with 768-dimensional output and a 512-token context window.

Pipeline for a single text:

  1. Tokenize: HuggingFace tokenizers crate truncates to 512 tokens and pads for batch alignment
  2. ONNX inference: the Session runs with GraphOptimizationLevel::Level3 and a single intra-op thread (no parallelism per call — the Mutex ensures single-threaded session access)
  3. Pool: If the model outputs [batch, seq_len, hidden_size] (token-level), mean pooling over non-masked tokens is applied. If the model outputs [batch, hidden_size] (pre-pooled), pooling is skipped
  4. L2 normalize: The pooled vector is divided by its L2 norm so cosine similarity can be computed as a dot product

For batch indexing, embed_batch tokenizes all texts together and runs a single ONNX session call.
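The pooling and normalization steps in plain Rust, as a sketch over raw slices rather than the real ONNX output tensor; the shapes and mask convention here are assumptions:

```rust
/// Mean-pools token-level output (seq_len x hidden) over non-masked tokens,
/// then L2-normalizes so cosine similarity becomes a plain dot product.
fn pool_and_normalize(token_embeddings: &[Vec<f32>], attention_mask: &[u32]) -> Vec<f32> {
    let hidden = token_embeddings.first().map_or(0, Vec::len);
    let mut pooled = vec![0.0f32; hidden];
    let mut count = 0.0f32;
    for (token, &mask) in token_embeddings.iter().zip(attention_mask) {
        if mask == 1 {
            for (p, v) in pooled.iter_mut().zip(token) {
                *p += v;
            }
            count += 1.0;
        }
    }
    for p in pooled.iter_mut() {
        *p /= count.max(1.0);
    }
    // L2 normalize: divide by the Euclidean norm (guarded against zero).
    let norm = pooled.iter().map(|v| v * v).sum::<f32>().sqrt().max(f32::EPSILON);
    for p in pooled.iter_mut() {
        *p /= norm;
    }
    pooled
}
```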

Code: crates/infrastructure/sqlite/src/workspace/embedding_repository.rs

The 768-dimensional Vec<f32> is serialized as a raw byte BLOB (f32 little-endian) and upserted into the page_embeddings table with (page_id, model_id, model_version) as the key. Existing rows are updated on conflict.

For bulk indexing, upsert_batch wraps all inserts in a single transaction.
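A sketch of the serialization and upsert, assuming the rusqlite crate; the table name and conflict key come from the text, while the embedding column name, ID types, and exact SQL are illustrative:

```rust
use rusqlite::{params, Connection};

/// Serializes the 768-dim vector as little-endian f32 bytes and upserts it
/// keyed on (page_id, model_id, model_version).
fn upsert_embedding(
    conn: &Connection,
    page_id: &str,
    model_id: &str,
    model_version: i64,
    embedding: &[f32],
) -> rusqlite::Result<()> {
    let blob: Vec<u8> = embedding.iter().flat_map(|v| v.to_le_bytes()).collect();
    conn.execute(
        "INSERT INTO page_embeddings (page_id, model_id, model_version, embedding)
         VALUES (?1, ?2, ?3, ?4)
         ON CONFLICT (page_id, model_id, model_version)
         DO UPDATE SET embedding = excluded.embedding",
        params![page_id, model_id, model_version, blob],
    )?;
    Ok(())
}
```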

Code: crates/application/src/embedding/pipeline.rs (EmbeddingPipeline::index_workspace)

The pipeline fetches stale pages in batches of 100 (pages missing an embedding for the current model_id+model_version). Each batch is chunked into groups of 16 (batch_size) for ONNX batch inference. Progress is reported via an FnMut(completed, total) callback that publishes to a watch::Sender<IndexingStatus> channel visible to the UI.
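A sketch of the chunking and progress reporting; the 16-page chunks and watch::Sender<IndexingStatus> come from the text, while the struct fields and the direct send (standing in for the FnMut callback) are assumptions:

```rust
use tokio::sync::watch;

#[derive(Clone)]
struct IndexingStatus {
    completed: usize,
    total: usize,
}

fn index_stale_pages(stale_pages: &[String], status: &watch::Sender<IndexingStatus>) {
    let total = stale_pages.len();
    let mut completed = 0;
    // Each 16-page chunk becomes one ONNX batch-inference call.
    for chunk in stale_pages.chunks(16) {
        // embed_batch(chunk) and upsert_batch(...) would run here.
        completed += chunk.len();
        // A send error only means no UI subscriber is listening; ignore it.
        let _ = status.send(IndexingStatus { completed, total });
    }
}
```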

Code: crates/application/src/search/search_router.rs

At search time, SearchRouter::classify_intent determines whether to run semantic search (queries of 3+ natural-language words). For semantic queries:

  1. The query text is embedded via the same OnnxEmbeddingProvider (5-15ms)
  2. All non-deleted page embeddings are loaded from SQLite (capped at 10,000 rows)
  3. Cosine similarity is computed in Rust as a dot product (vectors are pre-normalized)
  4. Results below MIN_SIMILARITY_THRESHOLD (0.3) are filtered out
  5. Semantic results are merged with FTS5 BM25 results via Reciprocal Rank Fusion (k=60)
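
Steps 3 and 5 as a sketch: cosine over pre-normalized vectors reduces to a dot product, and Reciprocal Rank Fusion with k=60 scores each page by summed reciprocal ranks across the two result lists (the threshold filter from step 4 would sit between them). Function names and ID types are illustrative:

```rust
use std::collections::HashMap;

/// Cosine similarity reduces to a dot product for L2-normalized vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Reciprocal Rank Fusion over two rank-ordered lists of page IDs:
/// score(page) = sum over lists of 1 / (k + rank), with rank starting at 1.
fn rrf_merge(semantic: &[String], fts: &[String]) -> Vec<(String, f64)> {
    const K: f64 = 60.0;
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in [semantic, fts] {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(id.clone()).or_default() += 1.0 / (K + (i + 1) as f64);
        }
    }
    let mut merged: Vec<_> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}
```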

| Failure | Behavior |
| --- | --- |
| Channel full (try_send fails) | Event dropped; page embedded on next save |
| Page not found in get_text_content | warn! logged; page skipped; no embedding stored |
| Empty page content | Embedding skipped silently; no row upserted |
| ONNX model not downloaded | Provider not constructed; SearchRouter falls back to FTS5 only |
| ONNX inference error | PipelineError::Embedding logged; page skipped during bulk; query falls back to FTS5 |
| spawn_blocking panics | error! logged; TaskError returned; task continues processing next batch |
| SQLite upsert failure | PipelineError::Repository logged; page skipped |
| Semantic search failure at query time | warn! logged; FTS5 results returned; no error to user |

  • Embedding System — Model download, ONNX provider configuration, and page_embeddings schema
  • Search System — SearchRouter intent classification and RRF merge details
  • Write Path — WriteEffectCoordinator triggers the embedding pipeline after every block save
  • Search Data Flow — Full search query flow from frontend to ranked results
