Skip to content
Documentation GitHub
Database Issues

SQLite Post-Migration Rust Backfill for Application-Generated Values

SQLite Post-Migration Rust Backfill for Application-Generated Values

Problem

Schema V002 adds ref_code TEXT NOT NULL DEFAULT '' columns to four tables. The ref_code values must be 11-char base62 nanoids generated by Rust’s nanoid crate. Pure SQL cannot generate these — hex(randomblob(N)) produces hex strings (wrong alphabet, wrong length), and SQLite has no built-in nanoid function.

Symptoms:

  • hex(randomblob(8)) placeholder UPDATE in migration produces 16-char hex strings, not 11-char base62
  • Unique constraint on ref_code would reject duplicate placeholders if multiple rows exist
  • Format mismatch between migration-generated and application-generated ref_codes

Root Cause

Application-layer identifier formats (nanoid with custom alphabet) cannot be replicated in pure SQL. The migration schema DDL and the data backfill have different capability requirements: DDL is SQL-native, but value generation needs the application runtime.

Solution

Split the migration into two phases:

Phase 1: Schema DDL (pure SQL, in migration file)

-- Add column with empty default (no constraint violation on existing rows)
ALTER TABLE pages ADD COLUMN ref_code TEXT NOT NULL DEFAULT '';
-- Partial unique index ignores empty strings during backfill window
CREATE UNIQUE INDEX idx_pages_ref_code ON pages(ref_code) WHERE ref_code != '';

Phase 2: Rust post-migration hook (after migrations.to_latest())

fn run_migrations(conn: &Connection) -> MigrationResult<()> {
let migrations = Migrations::new(vec![/* ... */]);
migrations.to_latest(conn)?;
// Post-migration backfill: generate proper ref_codes for empty rows
backfill_ref_codes(conn)?;
Ok(())
}
fn backfill_ref_codes(conn: &Connection) -> MigrationResult<()> {
for table in &["pages", "blocks", "bookmarks", "attachments"] {
backfill_ref_codes_for_table(conn, table)?;
}
Ok(())
}
fn backfill_ref_codes_for_table(conn: &Connection, table: &str) -> MigrationResult<()> {
let sql = format!("SELECT id FROM {} WHERE ref_code = ''", table);
let mut stmt = conn.prepare(&sql)?;
let ids: Vec<String> = stmt
.query_map([], |row| row.get::<_, String>(0))?
.collect::<Result<Vec<_>, _>>()?;
if ids.is_empty() {
return Ok(());
}
let update_sql = format!("UPDATE {} SET ref_code = ?1 WHERE id = ?2", table);
for id in &ids {
let ref_code = RefCode::generate().to_string();
conn.execute(&update_sql, params![ref_code, id])?;
}
Ok(())
}

Implementation Notes

  • The backfill is idempotent: only updates rows where ref_code = ''
  • Runs on every app startup (inside run_migrations), but no-ops after first backfill
  • The partial unique index (WHERE ref_code != '') prevents the empty default from conflicting across rows while still enforcing uniqueness for real values
  • Each ref_code is generated individually to ensure uniqueness (collision probability negligible at 65.5 bits entropy per value)

Prevention

Best Practices

  • If SQL can’t generate the value, don’t fake it in SQL. Use a post-migration hook in the application language.
  • Use DEFAULT '' + partial unique index as a two-phase pattern: DDL creates the column safely, application backfill populates it correctly.
  • Keep the backfill idempotent with a WHERE column = '' or WHERE column IS NULL guard so it’s safe to run on every startup.

When to Use This Pattern

  • Custom identifier formats (nanoid, ULID, KSUID with specific alphabets)
  • Values derived from application configuration (not available in SQL context)
  • CRDT snapshots or binary blobs that need application-layer serialization
  • Any value where the generation algorithm lives in application code

When NOT to Use This Pattern

  • Values SQL can generate natively (UUIDs via lower(hex(randomblob(16))), timestamps, sequential IDs)
  • Simple defaults (booleans, empty strings, zero values)

References

  • crates/infrastructure/sqlite/src/migrations/mod.rs — backfill implementation
  • crates/domain/src/identifiers.rs — RefCode::generate()
  • docs/solutions/database-issues/sql-migration-column-evolution-audit.md — related column audit pattern

Was this page helpful?