
Prevention Strategies: FK Constraint Import Failure

Problem Summary

When batch-inserting pages with parent-child relationships, SQLite’s Foreign Key (FK) constraints require that parent records exist before child records can reference them. Inserting pages in arbitrary order causes constraint violations that fail the entire batch operation.

Example of Failure:

INSERT INTO pages (slug, parent_slug, ...) VALUES ('child', 'parent', ...)
-- ERROR: FOREIGN KEY constraint failed
-- The parent page with slug='parent' doesn't exist yet
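The ordering requirement can be reproduced without a database. Below is a toy in-memory model of the constraint (the `PagesTable` type and its `insert` method are illustrative, not the project's API): a child row is rejected unless its parent slug was inserted first, mirroring the SQLite failure above.

```rust
use std::collections::HashSet;

/// Toy model of an FK-constrained table: a child row is rejected
/// unless its parent slug was inserted first.
struct PagesTable {
    slugs: HashSet<String>,
}

impl PagesTable {
    fn new() -> Self {
        PagesTable { slugs: HashSet::new() }
    }

    fn insert(&mut self, slug: &str, parent: Option<&str>) -> Result<(), String> {
        if let Some(p) = parent {
            if !self.slugs.contains(p) {
                return Err(format!("FOREIGN KEY constraint failed: parent '{}' missing", p));
            }
        }
        self.slugs.insert(slug.to_string());
        Ok(())
    }
}

fn main() {
    let mut table = PagesTable::new();
    // Child before parent: fails, just like the SQLite batch
    assert!(table.insert("child", Some("parent")).is_err());
    // Parent first, then child: succeeds
    assert!(table.insert("parent", None).is_ok());
    assert!(table.insert("child", Some("parent")).is_ok());
}
```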

Prevention Strategy 1: Topological Sorting

How It Works

Topological sorting reorders the batch so parents are always inserted before their children, satisfying FK constraints.

Algorithm:

  1. Pre-compute all page slugs in the batch
  2. Iterate until all pages are ordered
  3. For each iteration, insert any page whose parent is:
    • None (root page), OR
    • Already in the ordered results, OR
    • Not in this batch (treat as external/root)
  4. If no progress is made, detect cycles and insert remaining as roots
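As a sanity check, the steps above can be exercised on plain `(slug, parent)` pairs before involving the project's `Page` type; `order_pairs` below is a self-contained sketch of the same loop.

```rust
use std::collections::HashSet;

/// Orders (slug, parent) pairs so parents precede children.
/// Parents outside the batch are treated as roots; on a cycle,
/// the leftover pages are appended as roots.
fn order_pairs(pages: &[(&str, Option<&str>)]) -> Vec<usize> {
    let batch: HashSet<&str> = pages.iter().map(|(s, _)| *s).collect();
    let mut done: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    let mut remaining: Vec<usize> = (0..pages.len()).collect();
    while !remaining.is_empty() {
        let before = remaining.len();
        remaining.retain(|&i| {
            let ok = match pages[i].1 {
                None => true,
                Some(p) => done.contains(p) || !batch.contains(p),
            };
            if ok {
                done.insert(pages[i].0);
                out.push(i);
            }
            !ok
        });
        if remaining.len() == before {
            // No progress: a cycle; append what's left as roots
            out.extend(remaining.drain(..));
        }
    }
    out
}

fn main() {
    // Child listed first; the sort must put "parent" (index 1) before it
    let pages = [("child", Some("parent")), ("parent", None)];
    assert_eq!(order_pairs(&pages), vec![1, 0]);
}
```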

Implementation Example

/// Returns indices in topologically sorted order (parents before children)
fn topological_sort_indices(pages: &[(Page, Option<String>)]) -> Vec<usize> {
    let slugs: Vec<String> = pages.iter().map(|(p, _)| p.slug()).collect();
    let batch_slugs: HashSet<&str> = slugs.iter().map(|s| s.as_str()).collect();
    let mut sorted_indices: Vec<usize> = Vec::with_capacity(pages.len());
    let mut remaining: Vec<usize> = (0..pages.len()).collect();
    let mut inserted_slugs: HashSet<&str> = HashSet::new();
    while !remaining.is_empty() {
        let before_len = remaining.len();
        let mut still_remaining = Vec::new();
        for idx in remaining {
            let parent_slug = pages[idx].1.as_deref();
            let can_insert = match parent_slug {
                None => true,
                Some(ps) => inserted_slugs.contains(ps) || !batch_slugs.contains(ps),
            };
            if can_insert {
                inserted_slugs.insert(&slugs[idx]);
                sorted_indices.push(idx);
            } else {
                still_remaining.push(idx);
            }
        }
        remaining = still_remaining;
        // Cycle detection: if no progress, insert remaining as roots
        if remaining.len() == before_len && !remaining.is_empty() {
            sorted_indices.append(&mut remaining);
        }
    }
    sorted_indices
}

When to Use

✓ Default choice for batch operations ✓ Guaranteed to work for acyclic hierarchies ✓ Minimal performance overhead (O(n²) worst case) ✓ Easy to test and understand

Advantages

  • Prevents FK constraint failures entirely
  • Handles external parent references gracefully
  • Detects cycles and recovers gracefully
  • No changes needed to schema

Disadvantages

  • Requires knowing all relationships upfront
  • Cannot be used with streaming inserts
  • Deeper hierarchies require more sorting passes

Prevention Strategy 2: Deferred FK Constraints

How It Works

SQLite allows deferring FK constraint checking until transaction commit, enabling inserts in any order.

Implementation:

BEGIN TRANSACTION;
PRAGMA defer_foreign_keys = ON;
-- Insert in any order now
INSERT INTO pages (slug, parent_slug, ...) VALUES ('child', 'parent', ...);
INSERT INTO pages (slug, parent_slug, ...) VALUES ('parent', NULL, ...);
COMMIT;
-- All FK constraints checked here

Rust Implementation

fn batch_insert_deferred(conn: &mut Connection, pages: &[(Page, Option<String>)])
    -> Result<usize>
{
    // Defer FK checks and wrap all inserts in one transaction; without
    // the transaction each insert autocommits and the pragma has no effect
    let tx = conn.transaction()?;
    tx.pragma_update(None, "defer_foreign_keys", "ON")?;
    let mut count = 0;
    for (page, parent_slug) in pages {
        tx.execute(
            "INSERT INTO pages (id, slug, parent_slug, ...) VALUES (?, ?, ?, ...)",
            params![page.id, page.slug(), parent_slug, ...],
        )?;
        count += 1;
    }
    // FK constraints are checked here, at commit
    tx.commit()?;
    Ok(count)
}

When to Use

✓ Input order doesn’t matter ✓ Streaming inserts possible ✓ Large batches with many relationships ✗ Requires session-level pragma support

Advantages

  • Insert in any order
  • Better for streaming data
  • Single-pass operation
  • Lower memory footprint

Disadvantages

  • Requires pragmas (dialect-specific)
  • Errors only appear at commit
  • Harder to debug which insert failed
  • Cannot identify problem record at insertion time

Compatibility Note

Deferred FK constraints are not typically used in sqlite-rust because:

  • Connection lifecycle doesn’t guarantee pragma persistence
  • Errors at transaction commit are harder to handle
  • Testing becomes more complex
  • Topological sort is more portable

Prevention Strategy 3: Explicit Transaction Ordering

How It Works

Wrap topological sorting with explicit transaction management and conflict handling.

Implementation:

fn create_batch_safe(
    &self,
    workspace_path: &Path,
    pages: &[(Page, Option<String>)],
) -> PageResult<BatchImportResult> {
    let db = Self::open_db(workspace_path)?;
    let sorted_indices = Self::topological_sort_indices(pages);
    let mut created = vec![];
    let mut failed = vec![];
    let mut skipped = 0;
    db.with_transaction(|conn| {
        for (attempt, idx) in sorted_indices.iter().enumerate() {
            let (page, parent_slug) = &pages[*idx];
            let slug = page.slug();
            // Check for existing pages
            let exists = check_exists(conn, &slug)?;
            if exists {
                skipped += 1;
                continue;
            }
            // Verify parent exists (if specified)
            if let Some(ps) = parent_slug {
                let parent_exists = check_exists(conn, ps)?;
                if !parent_exists {
                    failed.push(ImportError {
                        file: slug.clone(),
                        reason: format!("Parent '{}' not found", ps),
                        attempt,
                    });
                    continue;
                }
            }
            // Insert with explicit error handling
            match insert_page(conn, page, parent_slug) {
                Ok(_) => created.push(slug),
                Err(e) => failed.push(ImportError {
                    file: slug,
                    reason: e.to_string(),
                    attempt,
                }),
            }
        }
        Ok(())
    })?;
    Ok(BatchImportResult { created, failed, skipped })
}

When to Use

✓ Need detailed error reporting ✓ Want to continue on partial failures ✓ Must track which pages failed and why ✓ Audit trail required

Advantages

  • Continues despite some failures
  • Detailed error information per page
  • Can implement retry logic
  • Transparent failure attribution

Disadvantages

  • More complex code
  • Partial success can leave inconsistent state
  • Higher memory for tracking failures
  • Requires careful rollback strategy

Prevention Strategy 4: Hierarchical Validation

How It Works

Validate the entire hierarchy before any inserts to catch issues early.

Implementation:

#[derive(Debug, Clone)]
pub struct HierarchyIssue {
    pub slug: String,
    pub issue: HierarchyError,
}

#[derive(Debug, Clone)]
pub enum HierarchyError {
    Cycle(Vec<String>),   // slug1 -> slug2 -> slug1
    OrphanParent(String), // parent doesn't exist in DB or batch
    Duplicate(String),    // slug appears twice in batch
    InvalidSlug(String),  // empty or invalid slug format
}

fn validate_hierarchy(
    pages: &[(Page, Option<String>)],
    existing_parents: &HashSet<String>,
) -> Result<Vec<HierarchyIssue>, String> {
    let mut issues = vec![];
    // Pre-compute every slug in the batch so that a parent listed
    // *after* its child is still resolved, not flagged as an orphan
    let all_batch_slugs: HashSet<String> =
        pages.iter().map(|(p, _)| p.slug()).collect();
    let mut seen_slugs = HashSet::new();
    let mut graph: HashMap<String, Vec<String>> = HashMap::new();
    // Check for invalid slugs, duplicates, and orphans; build the graph
    for (page, parent_slug) in pages {
        let slug = page.slug();
        if slug.is_empty()
            || !slug.chars().all(|c| c.is_alphanumeric() || c == '-' || c == '/')
        {
            issues.push(HierarchyIssue {
                slug: slug.clone(),
                issue: HierarchyError::InvalidSlug(slug),
            });
            continue;
        }
        if !seen_slugs.insert(slug.clone()) {
            issues.push(HierarchyIssue {
                slug: slug.clone(),
                issue: HierarchyError::Duplicate(slug),
            });
            continue;
        }
        if let Some(ps) = parent_slug {
            if !all_batch_slugs.contains(ps) && !existing_parents.contains(ps) {
                issues.push(HierarchyIssue {
                    slug: slug.clone(),
                    issue: HierarchyError::OrphanParent(ps.clone()),
                });
            }
            graph.entry(ps.clone()).or_default().push(slug.clone());
        }
    }
    // Detect cycles using DFS
    let mut visited = HashSet::new();
    let mut rec_stack = HashSet::new();
    for slug in seen_slugs.iter() {
        if detect_cycle(slug, &graph, &mut visited, &mut rec_stack).is_some() {
            if let Some(cycle) = find_cycle_path(slug, &graph) {
                issues.push(HierarchyIssue {
                    slug: slug.clone(),
                    issue: HierarchyError::Cycle(cycle),
                });
            }
        }
    }
    Ok(issues)
}
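The `detect_cycle` and `find_cycle_path` helpers are referenced but not shown above. A self-contained DFS sketch of the same idea (the `find_cycle` name and signature here are illustrative, not the project's API): nodes on the current recursion path form the stack, and revisiting one of them yields the cycle path.

```rust
use std::collections::{HashMap, HashSet};

/// Depth-first search over a parent -> children adjacency map,
/// returning the path of one cycle if any is reachable from `start`.
fn find_cycle(
    start: &str,
    graph: &HashMap<String, Vec<String>>,
) -> Option<Vec<String>> {
    fn dfs(
        node: &str,
        graph: &HashMap<String, Vec<String>>,
        stack: &mut Vec<String>,
        visited: &mut HashSet<String>,
    ) -> Option<Vec<String>> {
        if let Some(pos) = stack.iter().position(|s| s == node) {
            // Node already on the current path: the tail of the stack is the cycle
            let mut cycle = stack[pos..].to_vec();
            cycle.push(node.to_string());
            return Some(cycle);
        }
        if !visited.insert(node.to_string()) {
            return None; // fully explored earlier, no cycle through here
        }
        stack.push(node.to_string());
        if let Some(children) = graph.get(node) {
            for child in children {
                if let Some(c) = dfs(child, graph, stack, visited) {
                    return Some(c);
                }
            }
        }
        stack.pop();
        None
    }
    dfs(start, graph, &mut Vec::new(), &mut HashSet::new())
}

fn main() {
    let mut g: HashMap<String, Vec<String>> = HashMap::new();
    g.insert("a".into(), vec!["b".into()]);
    g.insert("b".into(), vec!["a".into()]); // a -> b -> a
    let cycle = find_cycle("a", &g).expect("cycle expected");
    assert_eq!(cycle, vec!["a".to_string(), "b".to_string(), "a".to_string()]);
}
```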

When to Use

✓ Large imports where issues are common ✓ Need to report all problems at once ✓ Can show preview before committing ✓ User-facing import workflow

Advantages

  • Find all issues before any inserts
  • Can show user a preview of problems
  • Prevents partial failures
  • Better UX with detailed feedback

Disadvantages

  • Pre-processing overhead
  • Need access to existing database state
  • More complex code
  • Duplication with sort validation

Warning Signs and Risk Indicators

Code-Level Warning Signs

  1. Unordered Batch Inserts

    // BAD: No ordering
    for page in pages {
        insert_page(&conn, page)?;
    }
    // GOOD: Topological sort
    let sorted = topological_sort(&pages);
    for idx in sorted { ... }
  2. Missing Parent Verification

    // BAD: No check
    conn.execute(
        "INSERT INTO pages (slug, parent_slug) VALUES (?, ?)",
        params![slug, parent_slug],
    )?;
    // GOOD: Verify parent
    if parent_slug.is_some() {
        parent_exists_check()?;
    }
  3. Ignoring FK Constraint Errors

    // BAD: Swallow errors
    let _ = insert_page(&conn, page);
    // GOOD: Handle and report
    match insert_page(&conn, page) {
        Ok(_) => { /* success */ },
        Err(e) if e.is_fk_constraint() => {
            return Err(FkConstraintError { parent_slug });
        }
    }
  4. Mixing Insert Strategies

    // BAD: Inconsistent ordering
    insert_parent()?;
    for child in children {
        insert_child(&child)?; // Works if order is right
    }
    insert_another_parent()?; // May fail
    // GOOD: Consistent strategy throughout
    let sorted = topological_sort(all_pages);
    for page in sorted { insert_page(page)?; }

Test-Level Warning Signs

  1. Tests Only Cover Happy Path

    • ✗ Always inserts in child-to-parent order
    • ✗ Never tests out-of-order inserts
    • ✓ Test with multiple orderings
    • ✓ Test with gaps in hierarchy
  2. No Negative Tests

    • ✗ Don’t test circular parents
    • ✗ Don’t test missing parents
    • ✓ Add regression tests for reported issues
  3. Insufficient Coverage of Edge Cases

    • ✗ Only test flat hierarchies
    • ✗ Only test 2-3 levels deep
    • ✓ Test 5+ levels
    • ✓ Test wide hierarchies (many siblings)

Test Cases for Prevention

Unit Test: Topological Sort

#[test]
fn test_topological_sort_out_of_order() {
    // CRITICAL: Test non-ideal input order
    let grandchild = Page::new("GC");
    let child = Page::new("C");
    let parent = Page::new("P");
    let pages = vec![
        (grandchild, Some("c".to_string())), // 0: references child
        (child, Some("p".to_string())),      // 1: references parent
        (parent, None),                      // 2: root
    ];
    let sorted = topological_sort_indices(&pages);
    // Verify ordering: parent before child before grandchild
    assert_eq!(sorted[0], 2); // parent first
    assert_eq!(sorted[1], 1); // child second
    assert_eq!(sorted[2], 0); // grandchild last
}

#[test]
fn test_topological_sort_with_external_parent() {
    // CRITICAL: Parent not in batch
    let orphan = Page::new("Orphan");
    let root = Page::new("Root");
    let pages = vec![
        (orphan, Some("nonexistent".to_string())),
        (root, None),
    ];
    let sorted = topological_sort_indices(&pages);
    // Both should be treated as roots, order doesn't matter
    assert_eq!(sorted.len(), 2);
}

#[test]
fn test_topological_sort_wide_hierarchy() {
    // CRITICAL: Test performance and correctness with many siblings
    let parent = Page::new("Parent");
    let children: Vec<_> = (0..100)
        .map(|i| (Page::new(&format!("Child{}", i)), Some("parent".to_string())))
        .collect();
    let mut pages = vec![(parent, None)];
    pages.extend(children);
    let sorted = topological_sort_indices(&pages);
    // Parent must be first
    assert_eq!(sorted[0], 0);
    // All children after parent
    for &idx in &sorted[1..] {
        assert!(idx > 0);
    }
}

#[test]
fn test_topological_sort_deep_hierarchy() {
    // CRITICAL: Test deep nesting (5+ levels)
    let root = Page::new("L0");
    let l1 = Page::new("L1");
    let l2 = Page::new("L2");
    let l3 = Page::new("L3");
    let l4 = Page::new("L4");
    let l5 = Page::new("L5");
    // REVERSE order: deepest first
    let pages = vec![
        (l5, Some("l4".to_string())), // 0
        (l4, Some("l3".to_string())), // 1
        (l3, Some("l2".to_string())), // 2
        (l2, Some("l1".to_string())), // 3
        (l1, Some("l0".to_string())), // 4
        (root, None),                 // 5
    ];
    let sorted = topological_sort_indices(&pages);
    // Root (5) must come before L1 (4), L1 (4) before L2 (3), etc.
    let positions: HashMap<usize, usize> = sorted
        .iter()
        .enumerate()
        .map(|(i, &idx)| (idx, i))
        .collect();
    assert!(positions[&5] < positions[&4]); // root before l1
    assert!(positions[&4] < positions[&3]); // l1 before l2
    assert!(positions[&3] < positions[&2]); // l2 before l3
    assert!(positions[&2] < positions[&1]); // l3 before l4
    assert!(positions[&1] < positions[&0]); // l4 before l5
}

Integration Test: Batch Insert

#[test]
fn test_batch_insert_handles_out_of_order() {
    let (workspace, repo) = setup_workspace();
    // Create in wrong order: child, parent
    let child_page = Page::new("Child");
    let parent_page = Page::new("Parent");
    let pages = vec![
        (child_page, Some("parent".to_string())),
        (parent_page, None),
    ];
    // Should succeed despite wrong order
    let created = repo.create_batch(workspace.path(), &pages).unwrap();
    assert_eq!(created, 2);
    // Verify both exist
    assert!(repo.exists_by_slug(workspace.path(), "parent"));
    assert!(repo.exists_by_slug(workspace.path(), "child"));
    // Verify the relationship: parent should be "parent" (checked via tree structure)
    let _child = repo.get_by_slug(workspace.path(), "child").unwrap();
    let tree = repo.list_tree(workspace.path()).unwrap();
    assert_eq!(tree[0].slug, "parent");
    assert_eq!(tree[0].children[0].slug, "child");
}

#[test]
fn test_batch_insert_cycle_detection() {
    let (workspace, repo) = setup_workspace();
    // Create circular reference: A -> B -> A
    // (This shouldn't be possible with FK constraints, but test handling)
    let page_a = Page::new("A");
    let page_b = Page::new("B");
    let pages = vec![
        (page_a, Some("b".to_string())), // A's parent is B
        (page_b, Some("a".to_string())), // B's parent is A (CYCLE!)
    ];
    // Should handle gracefully (treat one as root)
    let result = repo.create_batch(workspace.path(), &pages);
    // Either succeeds (with one as root) or returns a clear error
    match result {
        Ok(created) => assert!(created >= 1), // At least one inserted
        Err(e) => {
            // Error message should mention the cycle or the constraint
            assert!(e.to_string().to_lowercase().contains("cycle")
                || e.to_string().to_lowercase().contains("constraint"));
        }
    }
}

#[test]
fn test_batch_insert_partial_external_parents() {
    let (workspace, repo) = setup_workspace();
    // Create some pages first
    let root = Page::new("Root");
    repo.create(workspace.path(), &root).unwrap();
    // Now batch import with mixed internal and external parents
    let child_of_root = Page::new("ChildOfRoot");
    let orphan = Page::new("Orphan");
    let pages = vec![
        (orphan, Some("nonexistent".to_string())), // parent doesn't exist
        (child_of_root, Some("root".to_string())), // parent exists
    ];
    let created = repo.create_batch(workspace.path(), &pages).unwrap();
    // The orphan is treated as a root despite its invalid parent
    assert!(created >= 1);
    // Both should exist
    assert!(repo.exists_by_slug(workspace.path(), "orphan"));
    assert!(repo.exists_by_slug(workspace.path(), "child-of-root"));
}

Regression Test: Import Scenario

#[test]
fn test_import_obsidian_vault_hierarchy() {
    // REAL SCENARIO: Obsidian vault with nested folders
    // Structure:
    //   ProjectA/
    //     Overview.md
    //     Design/
    //       Architecture.md
    //       Components/
    //         Service.md
    //     Implementation/
    //       Database.md
    let (workspace, importer) = setup_importer();
    let (temp_source, _) = create_obsidian_fixture();
    let params = ImportParams::in_place(temp_source.path());
    let result = importer.execute(&params).unwrap();
    // All files should be imported despite directory order
    assert_eq!(result.imported_count, 5);
    assert!(result.pages.iter().any(|p| p.slug == "overview"));
    assert!(result.pages.iter().any(|p| p.slug.contains("architecture")));
    assert!(result.pages.iter().any(|p| p.slug.contains("database")));
}

Best Practices Checklist

Before Writing Batch Insert Code

  • Understand the full hierarchy (depth, branching factor)
  • Decide on ordering strategy (topological sort vs deferred FK)
  • Plan for external parent references
  • Design error handling and recovery
  • Plan audit/logging of inserted items

When Implementing

  • Add topological sort if sorting is needed
  • Verify parents exist before insert (or validate upfront)
  • Use transactions to ensure atomicity
  • Log/track each insert for error reporting
  • Handle FK constraint errors explicitly
  • Consider partial failures and recovery

When Testing

  • Test with pages in reverse order
  • Test with gaps in hierarchy (parent not in batch)
  • Test with deep nesting (5+ levels)
  • Test with wide hierarchies (100+ siblings)
  • Test with duplicate slugs
  • Test with invalid slug characters
  • Test cancellation during import
  • Test concurrent imports (if applicable)

When Code Reviewing

  • Is hierarchy ordered correctly?
  • Are FK constraint errors handled?
  • Are there tests for out-of-order input?
  • Is the error message helpful?
  • Can the import be retried safely?
  • Is the transaction scope appropriate?

Summary

The topological sort strategy is recommended as the default:

  • ✓ Prevents FK constraint failures completely
  • ✓ Works with hierarchies of any depth
  • ✓ Handles external parent references
  • ✓ Minimal code complexity
  • ✓ Excellent test coverage available

For streaming or deferred scenarios, consider deferred FK constraints, but with careful error handling at commit time.

Always include tests that specifically exercise out-of-order insertion to prevent regressions.


Quick Diagnosis Flowchart

FK Constraint Error Occurs
[Does error mention parent_slug?]
YES → Go to "Parent Lookup"
NO → Go to "Other Constraint"
Parent Lookup:
[Is parent page in this batch?]
YES → Pages inserted in wrong order → Add Topological Sort
NO → [Does parent exist in database?]
YES → Verify parent_slug value matches
NO → Treat as root or return error
Other Constraint:
[Check error message]
"unique constraint" → Duplicate slug
"not null constraint" → Missing required field
Other → Check schema changes
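The branches above can also be expressed in code. A hedged sketch (the `Diagnosis` enum and `diagnose` helper are illustrative, not part of the project; the matched strings follow SQLite's usual error wording):

```rust
/// Illustrative diagnosis categories for batch-insert failures.
#[derive(Debug, PartialEq)]
enum Diagnosis {
    WrongInsertOrder,     // parent is in the batch: add topological sort
    MissingParent,        // parent in neither batch nor database
    DuplicateSlug,
    MissingRequiredField,
    Unknown,
}

fn diagnose(error: &str, parent_in_batch: bool, parent_in_db: bool) -> Diagnosis {
    let msg = error.to_lowercase();
    if msg.contains("foreign key") {
        if parent_in_batch {
            Diagnosis::WrongInsertOrder
        } else if parent_in_db {
            // Parent exists: likely a mismatched parent_slug value
            Diagnosis::Unknown
        } else {
            Diagnosis::MissingParent
        }
    } else if msg.contains("unique constraint") {
        Diagnosis::DuplicateSlug
    } else if msg.contains("not null constraint") {
        Diagnosis::MissingRequiredField
    } else {
        Diagnosis::Unknown
    }
}

fn main() {
    assert_eq!(
        diagnose("FOREIGN KEY constraint failed", true, false),
        Diagnosis::WrongInsertOrder
    );
    assert_eq!(
        diagnose("UNIQUE constraint failed: pages.slug", false, false),
        Diagnosis::DuplicateSlug
    );
}
```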
| Approach           | Speed       | Memory | Test Difficulty |
|--------------------|-------------|--------|-----------------|
| Topological Sort   | O(n²) worst | O(n)   | Easy            |
| Deferred FK        | O(n)        | O(n)   | Harder          |
| Single-insert loop | O(n)        | O(1)   | Medium          |

For most workloads (< 10,000 pages): Topological Sort is preferred. For very large imports: Profile before optimizing.
