Architecture

SQLite Multi-Database Checkpoint Patterns

Note: The checkpoint system described in this document was removed in Wave 2 (replaced by event log + named bookmarks). The multi-database transaction patterns documented here remain valid as institutional knowledge for any future multi-DB operations. See the Event Log System documentation in apps/codex/src/content/docs/systems/platform/event-log-system.mdx for the replacement.

title: “SQLite Multi-Database Checkpoint Patterns” category: architecture created: 2026-02-12 symptoms:

Orphaned snapshot files after crash during checkpoint creation
Partial page metadata restore on error
Checkpoint delete leaves inconsistent metadata tags:
sqlite
checkpoint
transactions
file-swap
durability
multi-database related:
apps/codex/src/content/docs/systems/platform/event-log-system.mdx
docs/solutions/patterns/atomic-settings-persistence.md
docs/ADR/002-sqlite-storage.md

SQLite Multi-Database Checkpoint Patterns

Problem

The checkpoint system operates across two separate SQLite databases (inklings.db for live data, metadata.db for checkpoint index) plus snapshot .db files on disk. Multi-step operations spanning these resources need explicit transaction and ordering strategies to maintain consistency.

Symptoms when not handled:

Orphaned snapshot files (backup succeeds, metadata insert fails)
Partial metadata restore (some fields applied, error on later field)
Metadata references deleted files after incorrect delete ordering

Root Cause

Each failure mode stems from the same root: multi-resource operations without atomic semantics. SQLite transactions only cover a single database connection, so operations spanning multiple databases or the filesystem need explicit choreography.

Solution

Pattern 1: Transaction + Cleanup for Create Operations

When creating a resource that spans filesystem + database:

// 1. Create the file first (can be cleaned up)
let snapshot_path = create_snapshot(workspace_path)?;

// 2. Write metadata in a transaction
let mut meta_conn = open_metadata_db(workspace_path)?;
let result = (|| -> Result<()> {
    let tx = meta_conn.transaction()?;
    tx.execute("INSERT INTO checkpoints ...", params![...])?;
    tx.commit()?;
    Ok(())
})();

// 3. On metadata failure, clean up the file
if let Err(e) = result {
    let _ = std::fs::remove_file(&snapshot_path);
    return Err(e);
}

Principle: File creation is the “tentative” operation. Metadata commit is the “confirm” operation. Orphaned files are harmless; orphaned metadata is confusing.

Pattern 2: Transaction Wrapping for Multi-Update Restores

When applying multiple related changes to the same database:

let tx = conn.transaction()?;
for (field, value) in &field_values {
    let sql: &str = match field.as_str() {
        "title" => "UPDATE pages SET title = ?1 WHERE id = ?2",
        "slug" => "UPDATE pages SET slug = ?1 WHERE id = ?2",
        // Static SQL per field — never interpolate column names
        _ => continue,
    };
    tx.execute(sql, params![value, page_id])?;
}
tx.commit()?;  // All fields restore or none do

Principle: Use static SQL per field (never format!("{field}")) and wrap in a transaction for atomic apply-or-rollback semantics.

Pattern 3: Metadata-First Delete Ordering

When deleting a resource spanning metadata + filesystem:

// 1. Delete metadata FIRST (this is the "authoritative" delete)
meta_conn.execute("DELETE FROM checkpoints WHERE id = ?1", [id])?;

// 2. Delete file SECOND (best-effort; orphan is harmless)
if snapshot_path.exists() {
    if let Err(e) = std::fs::remove_file(&snapshot_path) {
        tracing::warn!(error = %e, "File delete failed; reconciliation will clean up");
    }
}

Principle: Delete the metadata reference first. If the file delete fails, the file becomes an orphan — invisible to the application and cleanable by reconciliation. The reverse (file first) leaves metadata pointing at a missing file.

Pattern 4: Process Coordination for File Swap

When replacing a live database file that other processes may be reading:

// SAFETY: The caller MUST ensure no other processes hold open
// connections to inklings.db; rename during active reads risks
// stale FDs or silent data corruption.

// 1. Flush WAL — ensure all pending writes are in the main DB file
{
    let conn = Connection::open(&db_path)?;
    conn.pragma_update(None, "wal_checkpoint", "TRUNCATE")?;
    // Connection drops here — no open handles during file swap
}

// ── File swap: no database connections may be open past this point ──

// 2. Copy snapshot to temp file (same filesystem for atomic rename)
std::fs::copy(&snapshot_path, &temp_path)?;

// 3. Atomic rename: temp file → inklings.db
std::fs::rename(&temp_path, &db_path)?;

// 4. Clean up stale WAL/SHM from the replaced database

Principle: WAL flush → drop all connections → file swap. The ordering is critical — any connection open during the rename sees a stale file descriptor.

Pattern 5: Reconciliation as Safety Net

Design the system to detect and repair drift between metadata and filesystem:

fn reconcile_metadata(workspace_path: &Path) -> ReconciliationReport {
    // 1. Remove orphaned metadata (entries with no matching .db file)
    // 2. Discover unindexed snapshots (files with no metadata entry)
    //    — read self-describing _checkpoint_meta from each snapshot
    // 3. Update stale entries (size mismatch between metadata and file)
}

Run reconciliation on workspace open (background, fire-and-forget). This makes the system self-healing — any crash or partial failure during create/delete/restore is eventually repaired.

Principle: metadata.db is a cache, not source of truth. Self-describing snapshots (each .db contains a _checkpoint_meta table) enable lossless reconstruction.

Prevention

Best Practices

Always identify multi-resource boundaries (DB + filesystem, DB + DB) before implementing
Use transactions for multi-row updates within a single database
Use cleanup-on-failure for cross-resource operations
Order deletes: metadata first, filesystem second
Document process synchronization contracts with SAFETY comments
Build reconciliation as a first-class operation, not an afterthought

Warning Signs

Any operation that writes to both a database and the filesystem
Any file rename/swap where other processes may hold open file descriptors
Any query that constructs SQL dynamically from database values (use static SQL instead)
Any multi-row update without a wrapping transaction

References

Feature branch: feat/ink-55-checkpoint-system
Commits: b95b31c..dd92eb8 (8 commits)
System doc: apps/codex/src/content/docs/systems/platform/event-log-system.mdx
Related: docs/solutions/patterns/atomic-settings-persistence.md

Previous
Permission Guard: Single Source of Truth Pattern Next
Sync Engine Safety: Cursor Advancement, Poison Pill Recovery, and Self-Update Prevention

Was this page helpful?