Search
How a search query is classified, executed in parallel, and merged into ranked results.
Overview
Step-by-Step Details
1. Frontend Query
The user types in the search input. The React component debounces input and calls:
    invoke<SearchResult[]>("search_pages", { query, limit: 20 })
2. Tauri Command
Code: apps/desktop/src-tauri/src/commands/search.rs
The search_pages command validates the query string, resolves the workspace and permission guard (requires SearchUse
capability), then delegates to the SearchRouter cached in AppState:
    let router = state.search_router.lock();
    let results = router.search(&guard, &query, limit)?;
The router is constructed once during workspace open (start_embedding_pipeline) and shared as an Arc-wrapped instance. It is also shared with the MCP server for MCP search tool calls.
3. Intent Classification
Code: crates/application/src/search/search_router.rs (SearchRouter::classify_intent)
classify_intent applies rule-based heuristics with no ML model:
| Condition | Intent |
|---|---|
| Empty or whitespace | Keyword |
| Surrounded by quotes ("..." or '...') | Keyword |
| Contains FTS5 operators (AND, OR, NOT, NEAR) | Keyword |
| Contains a date pattern (YYYY-MM-DD or YYYY/MM/DD) | Keyword |
| 1-2 words | Keyword |
| Single hyphenated lowercase token (my-page-slug) | Keyword |
| 3+ words, none of the above | Semantic |
Classification is synchronous and executes in microseconds. It runs before any I/O.
Known limitation: natural-language queries that contain an uppercase AND, OR, or NOT (e.g., “Pros AND Cons”) are classified as Keyword because the detector treats those words as FTS5 boolean operators.
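The rules in the table above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the code in search_router.rs: the SearchIntent enum, the contains_date helper, and the order in which the rules are checked are all hypothetical. Note that in this sketch the hyphenated-slug rule is subsumed by the 1-2 word rule, because a slug is a single token.

```rust
/// Hypothetical intent type; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchIntent {
    Keyword,
    Semantic,
}

/// Returns true if `text` contains YYYY-MM-DD or YYYY/MM/DD (digits only,
/// same separator in both positions). Hypothetical helper.
fn contains_date(text: &str) -> bool {
    text.as_bytes().windows(10).any(|w| {
        let sep = w[4];
        (sep == b'-' || sep == b'/')
            && w[7] == sep
            && w[..4].iter().all(u8::is_ascii_digit)
            && w[5..7].iter().all(u8::is_ascii_digit)
            && w[8..].iter().all(u8::is_ascii_digit)
    })
}

/// Rule-based classification mirroring the table; no I/O, no ML model.
pub fn classify_intent(query: &str) -> SearchIntent {
    let q = query.trim();
    if q.is_empty() {
        return SearchIntent::Keyword;
    }
    // Quoted phrase: "..." or '...'
    let quoted = q.len() >= 2
        && ((q.starts_with('"') && q.ends_with('"'))
            || (q.starts_with('\'') && q.ends_with('\'')));
    if quoted {
        return SearchIntent::Keyword;
    }
    let words: Vec<&str> = q.split_whitespace().collect();
    // Uppercase FTS5 operators (this is the source of the known limitation).
    if words.iter().any(|w| matches!(*w, "AND" | "OR" | "NOT" | "NEAR")) {
        return SearchIntent::Keyword;
    }
    if contains_date(q) {
        return SearchIntent::Keyword;
    }
    // 1-2 words, which also covers a single token such as my-page-slug.
    if words.len() <= 2 {
        return SearchIntent::Keyword;
    }
    SearchIntent::Semantic
}
```

Because every check is a string scan over the trimmed query, the function stays synchronous and allocation-light, which is consistent with running it before any I/O.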
4. FTS5 Search (always runs)
Code: crates/infrastructure/sqlite/src/workspace/page/search.rs (SqlitePageRepository::search)
FTS5 search runs for every query regardless of intent. The pages_fts virtual table uses a 3-column contentless index:
    SELECT pages.id, pages.slug, pages.title, pages.page_type,
           bm25(pages_fts, 10.0, 1.0, 5.0) AS score,
           COALESCE(snippet(...), ...) AS snippet
    FROM pages_fts
    JOIN pages ON pages.id = pages_fts.rowid
    WHERE pages_fts MATCH ? AND pages.is_deleted = 0
    ORDER BY score
    LIMIT ?
BM25 weights: title=10.0, content=1.0, tags=5.0. The tokenizer is unicode61 (default), which treats non-alphanumeric characters as separators.
SQLite's bm25() returns negative scores, with better matches being more negative, so ORDER BY score puts the best matches first. The application takes the absolute value before returning.
5. Semantic Search (Semantic intent only)
If the query is classified as Semantic and an embedding provider is available:
5a. Query Embedding
Code: crates/infrastructure/onnx/src/provider.rs (OnnxEmbeddingProvider::embed)
The query string is tokenized and run through ONNX Runtime with the snowflake-arctic-embed-m-v2.0 model, producing a 768-dimensional Vec<f32>. Model inference takes approximately 5-15 ms and dominates the total search latency.
If embedding fails at runtime, SearchRouter logs a warn! and returns FTS5 results only — search never errors out due
to an embedding failure.
5b. Similarity Search
Code: crates/infrastructure/sqlite/src/workspace/embedding_repository.rs (SqliteEmbeddingRepository::query_similar)
- Load all non-deleted page embeddings from the page_embeddings table (capped at 10,000 rows).
- Compute cosine similarity in Rust against the query vector (not in SQL; there is no sqlite-vec dependency).
- Filter out results below MIN_SIMILARITY_THRESHOLD (0.3).
- Sort descending by score and truncate to limit.
This is O(n) over stored embeddings. The 10,000-row cap prevents unbounded memory usage. At workspace-scale datasets (fewer than 10k pages) this is adequate without an ANN index.
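The brute-force scan described above might look roughly like the following. It is a sketch under assumptions rather than the repository's code: embeddings are passed in as an in-memory slice instead of being loaded from page_embeddings, page ids are modelled as u64 stand-ins for the real UUIDs, and the function signatures are hypothetical.

```rust
/// Mirrors the 0.3 cutoff described above; the constant's real type is an assumption.
const MIN_SIMILARITY_THRESHOLD: f32 = 0.3;

/// Cosine similarity of two vectors; returns 0.0 for mismatched lengths
/// or zero-length vectors instead of dividing by zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// O(n) scan: score every stored embedding, drop weak matches,
/// sort best-first, and keep at most `limit` results.
pub fn query_similar(
    query: &[f32],
    stored: &[(u64, Vec<f32>)], // (page id stand-in, embedding)
    limit: usize,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = stored
        .iter()
        .map(|(id, embedding)| (*id, cosine_similarity(query, embedding)))
        .filter(|(_, score)| *score >= MIN_SIMILARITY_THRESHOLD)
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```

With 768-dimensional vectors and at most 10,000 rows, each query performs on the order of a few million multiply-adds, which is why a linear scan remains practical at this scale.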
6. RRF Merge
Code: crates/application/src/search/search_router.rs (merge_rrf)
When both FTS5 and semantic results are available, they are merged using Reciprocal Rank Fusion with k=60 (the standard value from the original RRF research paper):
    RRF score for page P = sum over all lists L of: 1 / (60 + rank_of_P_in_L)
Pages that appear in both lists accumulate contributions from both and generally rank above pages found in only one list. The FTS5 snippet is preserved for any page that has one (semantic results carry no snippet).
After merging, score on each SearchResult is the RRF score (typically 0.01-0.03), not a BM25 or cosine value.
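To make the formula concrete, here is a minimal sketch of Reciprocal Rank Fusion over ranked lists of page ids. It is illustrative only and is not the actual merge_rrf: ids are plain string slices, ranks are assumed to be 1-based, ties are broken by id to keep the output deterministic, and snippet handling is omitted.

```rust
use std::collections::HashMap;

/// RRF constant k = 60, as described above.
const RRF_K: f64 = 60.0;

/// Fuses several best-first ranked lists into one list of (id, RRF score),
/// sorted by descending score. Ranks are assumed to start at 1.
pub fn merge_rrf(lists: &[Vec<&str>], limit: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in lists {
        for (index, id) in list.iter().enumerate() {
            let rank = (index + 1) as f64;
            // Each list a page appears in contributes 1 / (k + rank).
            *scores.entry(*id).or_insert(0.0) += 1.0 / (RRF_K + rank);
        }
    }
    let mut merged: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest score first; equal scores fall back to id order.
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    merged.truncate(limit);
    merged
}
```

Under these assumptions, a page ranked first in both lists scores 2/61 (about 0.033) and a page ranked first in only one list scores 1/61 (about 0.016), which matches the typical score range quoted above.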
7. Results Returned
The command returns Vec<SearchResult> to the frontend via Tauri IPC. Each result contains:
| Field | Source |
|---|---|
| id | Page UUID |
| slug | Page slug |
| title | Page title |
| snippet | Highlighted excerpt from FTS5 (empty for semantic-only results) |
| score | RRF score (hybrid) or BM25 absolute value (keyword-only) |
| page_type | Page type from the pages table |
Graceful Degradation
| Scenario | Behavior |
|---|---|
| No embedding model downloaded | FTS5 only; no error to user |
| Embedding provider fails to embed query | FTS5 only; warn! logged |
| Similarity search fails | FTS5 only; warn! logged |
| No SearchUse capability | InvalidOperation error |
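The first two rows of the table amount to treating the query embedding as optional. The sketch below shows one way to express that fallback. It is an assumption-laden illustration, not the router's code: embed_or_fall_back is a hypothetical name, the provider is modelled as a closure rather than the EmbeddingProvider trait, the error type is a plain String, and eprintln! stands in for the warn! log call.

```rust
/// Returns the query vector if semantic search can proceed, or None when the
/// caller should return FTS5 results only. Never propagates an error.
pub fn embed_or_fall_back<F>(query: &str, provider: Option<F>) -> Option<Vec<f32>>
where
    F: Fn(&str) -> Result<Vec<f32>, String>,
{
    // No provider (for example, model not downloaded): FTS5 only, silently.
    let embed = provider?;
    match embed(query) {
        Ok(vector) => Some(vector),
        Err(err) => {
            // Stand-in for the warn! call mentioned in the table.
            eprintln!("warn: query embedding failed, using FTS5 results only: {err}");
            None
        }
    }
}
```

A caller would run the FTS5 query unconditionally and only attempt similarity search and RRF merging when this function returns Some.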
SearchRouter Construction
The SearchRouter is generic over its dependencies:
    pub struct SearchRouter<PR: PageRepository, ER: EmbeddingRepository, EP: EmbeddingProvider> {
        page_repo: Arc<PR>,
        embedding_repo: Arc<ER>,
        embedding_provider: Option<Arc<EP>>, // None if model not loaded
    }
It is constructed during workspace open and stored in AppState. The same instance is shared with McpState, so MCP search tool calls use the same router with no duplication.
Related
- Search System: Full search system reference including FTS5 configuration and intent classification details
- Embedding System: ONNX embedding provider, model download, and background embedding pipeline
- MCP System: The search MCP tool delegates to the same SearchRouter
- Tag System: Tags are indexed as the third FTS5 column with BM25 weight 5.0