SQLite Multi-Database Checkpoint Patterns
Note: The checkpoint system described in this document was removed in Wave 2 (replaced by event log + named bookmarks). The multi-database transaction patterns documented here remain valid as institutional knowledge for any future multi-DB operations. See the Event Log System documentation in
apps/codex/src/content/docs/systems/platform/event-log-system.mdxfor the replacement.
title: “SQLite Multi-Database Checkpoint Patterns” category: architecture created: 2026-02-12 symptoms:
-
Orphaned snapshot files after crash during checkpoint creation
-
Partial page metadata restore on error
-
Checkpoint delete leaves inconsistent metadata tags:
-
sqlite
-
checkpoint
-
transactions
-
file-swap
-
durability
-
multi-database related:
-
apps/codex/src/content/docs/systems/platform/event-log-system.mdx
-
docs/solutions/patterns/atomic-settings-persistence.md
-
docs/ADR/002-sqlite-storage.md
SQLite Multi-Database Checkpoint Patterns
Problem
The checkpoint system operates across two separate SQLite databases (inklings.db for live data, metadata.db for
checkpoint index) plus snapshot .db files on disk. Multi-step operations spanning these resources need explicit
transaction and ordering strategies to maintain consistency.
Symptoms when not handled:
-
Orphaned snapshot files (backup succeeds, metadata insert fails)
-
Partial metadata restore (some fields applied, error on later field)
-
Metadata references deleted files after incorrect delete ordering
Root Cause
Each failure mode stems from the same root: multi-resource operations without atomic semantics. SQLite transactions only cover a single database connection, so operations spanning multiple databases or the filesystem need explicit choreography.
Solution
Pattern 1: Transaction + Cleanup for Create Operations
When creating a resource that spans filesystem + database:
// 1. Create the file first (can be cleaned up)let snapshot_path = create_snapshot(workspace_path)?;
// 2. Write metadata in a transactionlet mut meta_conn = open_metadata_db(workspace_path)?;let result = (|| -> Result<()> { let tx = meta_conn.transaction()?; tx.execute("INSERT INTO checkpoints ...", params![...])?; tx.commit()?; Ok(())})();
// 3. On metadata failure, clean up the fileif let Err(e) = result { let _ = std::fs::remove_file(&snapshot_path); return Err(e);}Principle: File creation is the “tentative” operation. Metadata commit is the “confirm” operation. Orphaned files are harmless; orphaned metadata is confusing.
Pattern 2: Transaction Wrapping for Multi-Update Restores
When applying multiple related changes to the same database:
let tx = conn.transaction()?;for (field, value) in &field_values { let sql: &str = match field.as_str() { "title" => "UPDATE pages SET title = ?1 WHERE id = ?2", "slug" => "UPDATE pages SET slug = ?1 WHERE id = ?2", // Static SQL per field — never interpolate column names _ => continue, }; tx.execute(sql, params![value, page_id])?;}tx.commit()?; // All fields restore or none doPrinciple: Use static SQL per field (never format!("{field}")) and wrap in a transaction for atomic
apply-or-rollback semantics.
Pattern 3: Metadata-First Delete Ordering
When deleting a resource spanning metadata + filesystem:
// 1. Delete metadata FIRST (this is the "authoritative" delete)meta_conn.execute("DELETE FROM checkpoints WHERE id = ?1", [id])?;
// 2. Delete file SECOND (best-effort; orphan is harmless)if snapshot_path.exists() { if let Err(e) = std::fs::remove_file(&snapshot_path) { tracing::warn!(error = %e, "File delete failed; reconciliation will clean up"); }}Principle: Delete the metadata reference first. If the file delete fails, the file becomes an orphan — invisible to the application and cleanable by reconciliation. The reverse (file first) leaves metadata pointing at a missing file.
Pattern 4: Process Coordination for File Swap
When replacing a live database file that other processes may be reading:
// SAFETY: The caller MUST ensure no other processes hold open// connections to inklings.db; rename during active reads risks// stale FDs or silent data corruption.
// 1. Flush WAL — ensure all pending writes are in the main DB file{ let conn = Connection::open(&db_path)?; conn.pragma_update(None, "wal_checkpoint", "TRUNCATE")?; // Connection drops here — no open handles during file swap}
// ── File swap: no database connections may be open past this point ──
// 2. Copy snapshot to temp file (same filesystem for atomic rename)std::fs::copy(&snapshot_path, &temp_path)?;
// 3. Atomic rename: temp file → inklings.dbstd::fs::rename(&temp_path, &db_path)?;
// 4. Clean up stale WAL/SHM from the replaced databasePrinciple: WAL flush → drop all connections → file swap. The ordering is critical — any connection open during the rename sees a stale file descriptor.
Pattern 5: Reconciliation as Safety Net
Design the system to detect and repair drift between metadata and filesystem:
fn reconcile_metadata(workspace_path: &Path) -> ReconciliationReport { // 1. Remove orphaned metadata (entries with no matching .db file) // 2. Discover unindexed snapshots (files with no metadata entry) // — read self-describing _checkpoint_meta from each snapshot // 3. Update stale entries (size mismatch between metadata and file)}Run reconciliation on workspace open (background, fire-and-forget). This makes the system self-healing — any crash or partial failure during create/delete/restore is eventually repaired.
Principle: metadata.db is a cache, not source of truth. Self-describing snapshots (each .db contains a
_checkpoint_meta table) enable lossless reconstruction.
Prevention
Best Practices
- Always identify multi-resource boundaries (DB + filesystem, DB + DB) before implementing
- Use transactions for multi-row updates within a single database
- Use cleanup-on-failure for cross-resource operations
- Order deletes: metadata first, filesystem second
- Document process synchronization contracts with SAFETY comments
- Build reconciliation as a first-class operation, not an afterthought
Warning Signs
- Any operation that writes to both a database and the filesystem
- Any file rename/swap where other processes may hold open file descriptors
- Any query that constructs SQL dynamically from database values (use static SQL instead)
- Any multi-row update without a wrapping transaction
References
- Feature branch:
feat/ink-55-checkpoint-system - Commits:
b95b31c..dd92eb8(8 commits) - System doc:
apps/codex/src/content/docs/systems/platform/event-log-system.mdx - Related:
docs/solutions/patterns/atomic-settings-persistence.md
Permission Guard: Single Source of Truth Pattern Next
Sync Engine Safety: Cursor Advancement, Poison Pill Recovery, and Self-Update Prevention
Was this page helpful?
Thanks for your feedback!