Skip to content
Documentation GitHub
Database Issues

SQLite Queue Full-Table Scan Anti-Pattern

SQLite Queue Full-Table Scan Anti-Pattern

Problem

When marking individual queue entries as “synced” after successful push/pull, the code loaded the entire pending queue into memory and filtered client-side to find matching IDs. This turned an O(1) lookup into O(n) per item, and O(n^2) per sync cycle.

Symptoms:

  • Sync cycle time grows quadratically with queue depth
  • After prolonged offline period (1000+ pending changes), sync becomes visibly slow
  • Memory usage spikes during sync as entire queue is loaded per mark operation

Investigation

How it manifested

CompositePageSyncOperations::mark_metadata_synced was called once per successfully pushed metadata field:

fn mark_metadata_synced(&self, workspace_path: &Path, page_id: &str, field: &str) -> SyncResult<()> {
let pending = self.metadata_queue.dequeue_metadata(&db, usize::MAX); // Load ALL pending
let ids: Vec<i64> = pending.iter()
.filter(|e| e.page_id == page_id && e.field == field) // Filter in memory
.map(|e| e.id)
.collect();
self.metadata_queue.mark_metadata_synced(&ids, &db)?;
Ok(())
}

With 50 pending metadata changes, pushing them one-by-one meant 50 full-table scans.

Why it happened

The queue repository initially only had enqueue, dequeue(limit), and mark_synced(ids). There was no method to look up IDs by natural key (page_id + field). The implementer worked around the missing API by loading everything and filtering.

Root Cause

Missing targeted query method on the queue repository trait. The repository API was designed for batch dequeue but not for targeted lookup by business key.

Solution

1. Add targeted lookup methods to queue traits

// In queue_services.rs
pub trait MetadataSyncQueueRepository: Send + Sync {
// ... existing methods ...
fn find_pending_ids_by_key(
&self,
workspace_path: &Path,
page_id: &str,
field: &str,
) -> SyncResult<Vec<i64>>;
}
pub trait DeletionSyncQueueRepository: Send + Sync {
// ... existing methods ...
fn find_pending_ids_by_page(
&self,
workspace_path: &Path,
page_id: &str,
) -> SyncResult<Vec<i64>>;
}

2. Implement with direct SQL

fn find_pending_ids_by_key(&self, workspace_path: &Path, page_id: &str, field: &str) -> SyncResult<Vec<i64>> {
let db = Self::get_db(workspace_path)?;
let mut stmt = db.prepare(
"SELECT id FROM metadata_sync_queue WHERE page_id = ?1 AND field = ?2 AND synced_at IS NULL"
)?;
let ids = stmt.query_map(params![page_id, field], |row| row.get(0))?
.collect::<Result<Vec<i64>, _>>()?;
Ok(ids)
}

3. Add supporting indexes

CREATE INDEX IF NOT EXISTS idx_metadata_sync_queue_page_field
ON metadata_sync_queue(page_id, field) WHERE synced_at IS NULL;
CREATE INDEX IF NOT EXISTS idx_deletion_sync_queue_page
ON deletion_sync_queue(page_id) WHERE synced_at IS NULL;

4. Update mark_synced to use targeted lookup

fn mark_metadata_synced(&self, workspace_path: &Path, page_id: &str, field: &str) -> SyncResult<()> {
let ids = self.metadata_queue.find_pending_ids_by_key(workspace_path, page_id, field)?;
if !ids.is_empty() {
self.metadata_queue.mark_metadata_synced(&ids, workspace_path)?;
}
Ok(())
}

Prevention

Best Practices

  • Design queue APIs with lookup-by-key from the start. Any queue that supports “mark item X as processed” needs a way to find X by its natural key, not just by internal ID.
  • Never load unbounded data to filter in memory. If you need a subset, query for just that subset.
  • Add indexes for WHERE clauses in queue tables. Partial indexes (WHERE synced_at IS NULL) are especially valuable for queues where most rows are already processed.
  • Profile sync cycle time with realistic queue depths (100+, 1000+ entries) during development.

Warning Signs

  • dequeue_*(usize::MAX) or equivalent “load everything” calls
  • In-memory .filter() on database query results
  • Queue operations that slow down linearly with queue depth
  • A mark_synced method that doesn’t accept the natural key directly

References

  • INK-190: Fix O(n^2) mark_synced queue scanning
  • INK-200: Add missing indexes on sync queue tables
  • Commits: ea37b73, 02f2458

Was this page helpful?