Skip to content
Documentation GitHub
Architecture

SQLite Multi-Database Checkpoint Patterns

Note: The checkpoint system described in this document was removed in Wave 2 (replaced by event log + named bookmarks). The multi-database transaction patterns documented here remain valid as institutional knowledge for any future multi-DB operations. See the Event Log System documentation in apps/codex/src/content/docs/systems/platform/event-log-system.mdx for the replacement.


title: “SQLite Multi-Database Checkpoint Patterns” category: architecture created: 2026-02-12 symptoms:

  • Orphaned snapshot files after crash during checkpoint creation

  • Partial page metadata restore on error

  • Checkpoint delete leaves inconsistent metadata tags:

  • sqlite

  • checkpoint

  • transactions

  • file-swap

  • durability

  • multi-database related:

  • apps/codex/src/content/docs/systems/platform/event-log-system.mdx

  • docs/solutions/patterns/atomic-settings-persistence.md

  • docs/ADR/002-sqlite-storage.md


SQLite Multi-Database Checkpoint Patterns

Problem

The checkpoint system operates across two separate SQLite databases (inklings.db for live data, metadata.db for checkpoint index) plus snapshot .db files on disk. Multi-step operations spanning these resources need explicit transaction and ordering strategies to maintain consistency.

Symptoms when not handled:

  • Orphaned snapshot files (backup succeeds, metadata insert fails)

  • Partial metadata restore (some fields applied, error on later field)

  • Metadata references deleted files after incorrect delete ordering

Root Cause

Each failure mode stems from the same root: multi-resource operations without atomic semantics. SQLite transactions only cover a single database connection, so operations spanning multiple databases or the filesystem need explicit choreography.

Solution

Pattern 1: Transaction + Cleanup for Create Operations

When creating a resource that spans filesystem + database:

// 1. Create the file first (can be cleaned up)
let snapshot_path = create_snapshot(workspace_path)?;
// 2. Write metadata in a transaction
let mut meta_conn = open_metadata_db(workspace_path)?;
let result = (|| -> Result<()> {
let tx = meta_conn.transaction()?;
tx.execute("INSERT INTO checkpoints ...", params![...])?;
tx.commit()?;
Ok(())
})();
// 3. On metadata failure, clean up the file
if let Err(e) = result {
let _ = std::fs::remove_file(&snapshot_path);
return Err(e);
}

Principle: File creation is the “tentative” operation. Metadata commit is the “confirm” operation. Orphaned files are harmless; orphaned metadata is confusing.

Pattern 2: Transaction Wrapping for Multi-Update Restores

When applying multiple related changes to the same database:

let tx = conn.transaction()?;
for (field, value) in &field_values {
let sql: &str = match field.as_str() {
"title" => "UPDATE pages SET title = ?1 WHERE id = ?2",
"slug" => "UPDATE pages SET slug = ?1 WHERE id = ?2",
// Static SQL per field — never interpolate column names
_ => continue,
};
tx.execute(sql, params![value, page_id])?;
}
tx.commit()?; // All fields restore or none do

Principle: Use static SQL per field (never format!("{field}")) and wrap in a transaction for atomic apply-or-rollback semantics.

Pattern 3: Metadata-First Delete Ordering

When deleting a resource spanning metadata + filesystem:

// 1. Delete metadata FIRST (this is the "authoritative" delete)
meta_conn.execute("DELETE FROM checkpoints WHERE id = ?1", [id])?;
// 2. Delete file SECOND (best-effort; orphan is harmless)
if snapshot_path.exists() {
if let Err(e) = std::fs::remove_file(&snapshot_path) {
tracing::warn!(error = %e, "File delete failed; reconciliation will clean up");
}
}

Principle: Delete the metadata reference first. If the file delete fails, the file becomes an orphan — invisible to the application and cleanable by reconciliation. The reverse (file first) leaves metadata pointing at a missing file.

Pattern 4: Process Coordination for File Swap

When replacing a live database file that other processes may be reading:

// SAFETY: The caller MUST ensure no other processes hold open
// connections to inklings.db; rename during active reads risks
// stale FDs or silent data corruption.
// 1. Flush WAL — ensure all pending writes are in the main DB file
{
let conn = Connection::open(&db_path)?;
conn.pragma_update(None, "wal_checkpoint", "TRUNCATE")?;
// Connection drops here — no open handles during file swap
}
// ── File swap: no database connections may be open past this point ──
// 2. Copy snapshot to temp file (same filesystem for atomic rename)
std::fs::copy(&snapshot_path, &temp_path)?;
// 3. Atomic rename: temp file → inklings.db
std::fs::rename(&temp_path, &db_path)?;
// 4. Clean up stale WAL/SHM from the replaced database

Principle: WAL flush → drop all connections → file swap. The ordering is critical — any connection open during the rename sees a stale file descriptor.

Pattern 5: Reconciliation as Safety Net

Design the system to detect and repair drift between metadata and filesystem:

fn reconcile_metadata(workspace_path: &Path) -> ReconciliationReport {
// 1. Remove orphaned metadata (entries with no matching .db file)
// 2. Discover unindexed snapshots (files with no metadata entry)
// — read self-describing _checkpoint_meta from each snapshot
// 3. Update stale entries (size mismatch between metadata and file)
}

Run reconciliation on workspace open (background, fire-and-forget). This makes the system self-healing — any crash or partial failure during create/delete/restore is eventually repaired.

Principle: metadata.db is a cache, not source of truth. Self-describing snapshots (each .db contains a _checkpoint_meta table) enable lossless reconstruction.

Prevention

Best Practices

  • Always identify multi-resource boundaries (DB + filesystem, DB + DB) before implementing
  • Use transactions for multi-row updates within a single database
  • Use cleanup-on-failure for cross-resource operations
  • Order deletes: metadata first, filesystem second
  • Document process synchronization contracts with SAFETY comments
  • Build reconciliation as a first-class operation, not an afterthought

Warning Signs

  • Any operation that writes to both a database and the filesystem
  • Any file rename/swap where other processes may hold open file descriptors
  • Any query that constructs SQL dynamically from database values (use static SQL instead)
  • Any multi-row update without a wrapping transaction

References

  • Feature branch: feat/ink-55-checkpoint-system
  • Commits: b95b31c..dd92eb8 (8 commits)
  • System doc: apps/codex/src/content/docs/systems/platform/event-log-system.mdx
  • Related: docs/solutions/patterns/atomic-settings-persistence.md

Was this page helpful?