Prevention Strategies: FK Constraint Import Failure
Problem Summary
When batch-inserting pages with parent-child relationships, SQLite’s Foreign Key (FK) constraints require that parent records exist before child records can reference them. Inserting pages in arbitrary order causes constraint violations that fail the entire batch operation.
Example of Failure:
INSERT INTO pages (slug, parent_slug, ...) VALUES ('child', 'parent', ...)-- ERROR: FOREIGN KEY constraint failed-- The parent page with slug='parent' doesn't exist yetPrevention Strategy 1: Topological Sorting
How It Works
Topological sorting reorders the batch so parents are always inserted before their children, satisfying FK constraints.
Algorithm:
- Pre-compute all page slugs in the batch
- Iterate until all pages are ordered
- For each iteration, insert any page whose parent is:
- None (root page), OR
- Already in the ordered results, OR
- Not in this batch (treat as external/root)
- If no progress is made, detect cycles and insert remaining as roots
Implementation Example
/// Returns indices in topologically sorted order (parents before children)fn topological_sort_indices(pages: &[(Page, Option<String>)]) -> Vec<usize> { let slugs: Vec<String> = pages.iter().map(|(p, _)| p.slug()).collect(); let batch_slugs: HashSet<&str> = slugs.iter().map(|s| s.as_str()).collect();
let mut sorted_indices: Vec<usize> = Vec::with_capacity(pages.len()); let mut remaining: Vec<usize> = (0..pages.len()).collect(); let mut inserted_slugs: HashSet<&str> = HashSet::new();
while !remaining.is_empty() { let before_len = remaining.len(); let mut still_remaining = Vec::new();
for idx in remaining { let parent_slug = pages[idx].1.as_deref(); let can_insert = match parent_slug { None => true, Some(ps) => inserted_slugs.contains(ps) || !batch_slugs.contains(ps), };
if can_insert { inserted_slugs.insert(&slugs[idx]); sorted_indices.push(idx); } else { still_remaining.push(idx); } } remaining = still_remaining;
// Cycle detection: if no progress, insert remaining as roots if remaining.len() == before_len && !remaining.is_empty() { sorted_indices.append(&mut remaining); } }
sorted_indices}When to Use
✓ Default choice for batch operations ✓ Guaranteed to work for acyclic hierarchies ✓ Minimal performance overhead O(n²) worst case ✓ Easy to test and understand
Advantages
- Prevents FK constraint failures entirely
- Handles external parent references gracefully
- Detects cycles and recovers gracefully
- No changes needed to schema
Disadvantages
- Requires knowing all relationships upfront
- Cannot be used with streaming inserts
- Complex hierarchies have more iterations
Prevention Strategy 2: Deferred FK Constraints
How It Works
SQLite allows deferring FK constraint checking until transaction commit, enabling inserts in any order.
Implementation:
BEGIN TRANSACTION;PRAGMA defer_foreign_keys = ON;
-- Insert in any order nowINSERT INTO pages (slug, parent_slug, ...) VALUES ('child', 'parent', ...);INSERT INTO pages (slug, parent_slug, ...) VALUES ('parent', NULL, ...);
COMMIT;-- All FK constraints checked hereRust Implementation
fn batch_insert_deferred(conn: &Connection, pages: &[(Page, Option<String>)]) -> Result<usize>{ conn.pragma_update(None, "defer_foreign_keys", "ON")?;
let mut count = 0; for (page, parent_slug) in pages { conn.execute( "INSERT INTO pages (id, slug, parent_slug, ...) VALUES (?, ?, ?, ...)", params![page.id, page.slug(), parent_slug, ...], )?; count += 1; }
// Constraints checked on commit Ok(count)}When to Use
✓ Input order doesn’t matter ✓ Streaming inserts possible ✓ Large batches with many relationships ✗ Requires session-level pragma support
Advantages
- Insert in any order
- Better for streaming data
- Single-pass operation
- Lower memory footprint
Disadvantages
- Requires pragmas (dialect-specific)
- Errors only appear at commit
- Harder to debug which insert failed
- Cannot identify problem record at insertion time
Compatibility Note
Deferred FK constraints are not typically used in sqlite-rust because:
- Connection lifecycle doesn’t guarantee pragma persistence
- Errors at transaction commit are harder to handle
- Testing becomes more complex
- Topological sort is more portable
Prevention Strategy 3: Explicit Transaction Ordering
How It Works
Wrap topological sorting with explicit transaction management and conflict handling.
Implementation:
fn create_batch_safe( &self, workspace_path: &Path, pages: &[(Page, Option<String>)],) -> PageResult<BatchImportResult> { let db = Self::open_db(workspace_path)?; let sorted_indices = Self::topological_sort_indices(pages);
let mut created = vec![]; let mut failed = vec![]; let mut skipped = 0;
db.with_transaction(|conn| { for (attempt, idx) in sorted_indices.iter().enumerate() { let (page, parent_slug) = &pages[*idx]; let slug = page.slug();
// Check for existing pages let exists = check_exists(conn, &slug)?; if exists { skipped += 1; continue; }
// Verify parent exists (if specified) if let Some(ps) = parent_slug { let parent_exists = check_exists(conn, ps)?; if !parent_exists { failed.push(ImportError { file: slug.clone(), reason: format!("Parent '{}' not found", ps), attempt, }); continue; } }
// Insert with explicit error handling match insert_page(conn, page, parent_slug) { Ok(_) => created.push(slug), Err(e) => failed.push(ImportError { file: slug, reason: e.to_string(), attempt, }), } } Ok(()) })?;
Ok(BatchImportResult { created, failed, skipped })}When to Use
✓ Need detailed error reporting ✓ Want to continue on partial failures ✓ Must track which pages failed and why ✓ Audit trail required
Advantages
- Continues despite some failures
- Detailed error information per page
- Can implement retry logic
- Transparent failure attribution
Disadvantages
- More complex code
- Partial success can leave inconsistent state
- Higher memory for tracking failures
- Requires careful rollback strategy
Prevention Strategy 4: Hierarchical Validation
How It Works
Validate the entire hierarchy before any inserts to catch issues early.
Implementation:
#[derive(Debug, Clone)]pub struct HierarchyIssue { pub slug: String, pub issue: HierarchyError,}
#[derive(Debug, Clone)]pub enum HierarchyError { Cycle(Vec<String>), // slug1 -> slug2 -> slug1 OrphanParent(String), // parent doesn't exist in DB or batch Duplicate(String), // slug appears twice in batch InvalidSlug(String), // empty or invalid slug format}
fn validate_hierarchy( pages: &[(Page, Option<String>)], existing_parents: &HashSet<String>,) -> Result<Vec<HierarchyIssue>, String> { let mut issues = vec![]; let mut batch_slugs = HashSet::new(); let mut graph: HashMap<String, Vec<String>> = HashMap::new();
// Check for duplicates and build graph for (page, parent_slug) in pages { let slug = page.slug();
if !slug.chars().all(|c| c.is_alphanumeric() || c == '-' || c == '/') { issues.push(HierarchyIssue { slug: slug.clone(), issue: HierarchyError::InvalidSlug(slug), }); continue; }
if batch_slugs.contains(&slug) { issues.push(HierarchyIssue { slug: slug.clone(), issue: HierarchyError::Duplicate(slug), }); continue; } batch_slugs.insert(slug.clone());
if let Some(ps) = parent_slug { if !batch_slugs.contains(ps) && !existing_parents.contains(ps) { issues.push(HierarchyIssue { slug: slug.clone(), issue: HierarchyError::OrphanParent(ps.clone()), }); } graph.entry(ps.clone()).or_default().push(slug.clone()); } }
// Detect cycles using DFS let mut visited = HashSet::new(); let mut rec_stack = HashSet::new();
for slug in batch_slugs.iter() { if detect_cycle(slug, &graph, &mut visited, &mut rec_stack).is_some() { if let Some(cycle) = find_cycle_path(slug, &graph) { issues.push(HierarchyIssue { slug: slug.clone(), issue: HierarchyError::Cycle(cycle), }); } } }
Ok(issues)}When to Use
✓ Large imports where issues are common ✓ Need to report all problems at once ✓ Can show preview before committing ✓ User-facing import workflow
Advantages
- Find all issues before any inserts
- Can show user a preview of problems
- Prevents partial failures
- Better UX with detailed feedback
Disadvantages
- Pre-processing overhead
- Need access to existing database state
- More complex code
- Duplication with sort validation
Warning Signs and Risk Indicators
Code-Level Warning Signs
-
Unordered Batch Inserts
// BAD: No orderingfor page in pages {insert_page(&conn, page)?;}// GOOD: Topological sortlet sorted = topological_sort(&pages);for idx in sorted { ... } -
Missing Parent Verification
// BAD: No checkconn.execute("INSERT INTO pages (slug, parent_slug) VALUES (?, ?)",params![slug, parent_slug],)?;// GOOD: Verify parentif parent_slug.is_some() {parent_exists_check()?;} -
Ignoring FK Constraint Errors
// BAD: Swallow errorslet _ = insert_page(&conn, page);// GOOD: Handle and reportmatch insert_page(&conn, page) {Ok(_) => { /* success */ },Err(e) if e.is_fk_constraint() => {return Err(FkConstraintError { parent_slug });}} -
Mixing Insert Strategies
// BAD: Inconsistent orderinginsert_parent()?;for child in children {insert_child(&child)?; // Works if order is right}insert_another_parent()?; // May fail// GOOD: Consistent strategy throughoutlet sorted = topological_sort(all_pages);for page in sorted { insert_page(page)?; }
Test-Level Warning Signs
-
Tests Only Cover Happy Path
- ✗ Always inserts in child-to-parent order
- ✗ Never tests out-of-order inserts
- ✓ Test with multiple orderings
- ✓ Test with gaps in hierarchy
-
No Negative Tests
- ✗ Don’t test circular parents
- ✗ Don’t test missing parents
- ✓ Add regression tests for reported issues
-
Insufficient Coverage of Edge Cases
- ✗ Only test flat hierarchies
- ✗ Only test 2-3 levels deep
- ✓ Test 5+ levels
- ✓ Test wide hierarchies (many siblings)
Test Cases for Prevention
Unit Test: Topological Sort
#[test]fn test_topological_sort_out_of_order() { // CRITICAL: Test non-ideal input order let grandchild = Page::new("GC"); let child = Page::new("C"); let parent = Page::new("P");
let pages = vec![ (grandchild, Some("c".to_string())), // 0: references child (child, Some("p".to_string())), // 1: references parent (parent, None), // 2: root ];
let sorted = topological_sort_indices(&pages);
// Verify ordering: parent before child before grandchild assert_eq!(sorted[0], 2); // parent first assert_eq!(sorted[1], 1); // child second assert_eq!(sorted[2], 0); // grandchild last}
#[test]fn test_topological_sort_with_external_parent() { // CRITICAL: Parent not in batch let orphan = Page::new("Orphan"); let root = Page::new("Root");
let pages = vec![ (orphan, Some("nonexistent".to_string())), (root, None), ];
let sorted = topological_sort_indices(&pages);
// Both should be treated as roots, order doesn't matter assert_eq!(sorted.len(), 2);}
#[test]fn test_topological_sort_wide_hierarchy() { // CRITICAL: Test performance and correctness with many siblings let parent = Page::new("Parent"); let children: Vec<_> = (0..100) .map(|i| (Page::new(&format!("Child{}", i)), Some("parent".to_string()))) .collect();
let mut pages = vec![(parent, None)]; pages.extend(children);
let sorted = topological_sort_indices(&pages);
// Parent must be first assert_eq!(sorted[0], 0); // All children after parent for &idx in &sorted[1..] { assert!(idx > 0); }}
#[test]fn test_topological_sort_deep_hierarchy() { // CRITICAL: Test deep nesting (5+ levels) let root = Page::new("L0"); let l1 = Page::new("L1"); let l2 = Page::new("L2"); let l3 = Page::new("L3"); let l4 = Page::new("L4"); let l5 = Page::new("L5");
// REVERSE order: deepest first let pages = vec![ (l5, Some("l4".to_string())), // 0 (l4, Some("l3".to_string())), // 1 (l3, Some("l2".to_string())), // 2 (l2, Some("l1".to_string())), // 3 (l1, Some("l0".to_string())), // 4 (root, None), // 5 ];
let sorted = topological_sort_indices(&pages);
// Root (5) must come before L1 (4) // L1 (4) must come before L2 (3), etc. let positions: HashMap<usize, usize> = sorted .iter() .enumerate() .map(|(i, &idx)| (idx, i)) .collect();
assert!(positions[&5] < positions[&4]); // root before l1 assert!(positions[&4] < positions[&3]); // l1 before l2 assert!(positions[&3] < positions[&2]); // l2 before l3 assert!(positions[&2] < positions[&1]); // l3 before l4 assert!(positions[&1] < positions[&0]); // l4 before l5}Integration Test: Batch Insert
#[test]fn test_batch_insert_handles_out_of_order() { let (workspace, repo) = setup_workspace();
// Create in wrong order: child, parent let child_page = Page::new("Child"); let parent_page = Page::new("Parent");
let pages = vec![ (child_page, Some("parent".to_string())), (parent_page, None), ];
// Should succeed despite wrong order let created = repo.create_batch(workspace.path(), &pages).unwrap(); assert_eq!(created, 2);
// Verify both exist assert!(repo.exists_by_slug(workspace.path(), "parent")); assert!(repo.exists_by_slug(workspace.path(), "child"));
// Verify relationship let child = repo.get_by_slug(workspace.path(), "child").unwrap(); // Parent should be "parent" (verify via tree structure) let tree = repo.list_tree(workspace.path()).unwrap(); assert_eq!(tree[0].slug, "parent"); assert_eq!(tree[0].children[0].slug, "child");}
#[test]fn test_batch_insert_cycle_detection() { let (workspace, repo) = setup_workspace();
// Create circular reference: A -> B -> A // (This shouldn't be possible with FK constraints, but test handling) let page_a = Page::new("A"); let page_b = Page::new("B");
let pages = vec![ (page_a, Some("b".to_string())), // A's parent is B (page_b, Some("a".to_string())), // B's parent is A (CYCLE!) ];
// Should handle gracefully (treat one as root) let result = repo.create_batch(workspace.path(), &pages);
// Either succeeds (with one as root) or returns clear error match result { Ok(created) => assert!(created >= 1), // At least one inserted Err(e) => { // Error message should mention cycle assert!(e.to_string().to_lowercase().contains("cycle") || e.to_string().to_lowercase().contains("constraint")); } }}
#[test]fn test_batch_insert_partial_external_parents() { let (workspace, repo) = setup_workspace();
// Create some pages first let root = Page::new("Root"); repo.create(workspace.path(), &root).unwrap();
// Now batch import with mixed: internal and external parents let child_of_root = Page::new("ChildOfRoot"); let orphan = Page::new("Orphan");
let pages = vec![ (orphan, Some("nonexistent".to_string())), // parent doesn't exist (child_of_root, Some("root".to_string())), // parent exists ];
let created = repo.create_batch(workspace.path(), &pages).unwrap();
// orphan is treated as root despite invalid parent assert!(created >= 1);
// Both should exist assert!(repo.exists_by_slug(workspace.path(), "orphan")); assert!(repo.exists_by_slug(workspace.path(), "child-of-root"));}Regression Test: Import Scenario
#[test]fn test_import_obsidian_vault_hierarchy() { // REAL SCENARIO: Obsidian vault with nested folders // Structure: // - ProjectA/ // - Overview.md // - Design/ // - Architecture.md // - Components/ // - Service.md // - Implementation/ // - Database.md
let (workspace, importer) = setup_importer(); let (temp_source, _) = create_obsidian_fixture();
let params = ImportParams::in_place(temp_source.path()); let result = importer.execute(¶ms).unwrap();
// All files should be imported despite directory order assert_eq!(result.imported_count, 5); assert!(result.pages.iter().any(|p| p.slug == "overview")); assert!(result.pages.iter().any(|p| p.slug.contains("architecture"))); assert!(result.pages.iter().any(|p| p.slug.contains("database")));}Best Practices Checklist
Before Writing Batch Insert Code
- Understand the full hierarchy (depth, branching factor)
- Decide on ordering strategy (topological sort vs deferred FK)
- Plan for external parent references
- Design error handling and recovery
- Plan audit/logging of inserted items
When Implementing
- Add topological sort if sorting is needed
- Verify parents exist before insert (or validate upfront)
- Use transactions to ensure atomicity
- Log/track each insert for error reporting
- Handle FK constraint errors explicitly
- Consider partial failures and recovery
When Testing
- Test with pages in reverse order
- Test with gaps in hierarchy (parent not in batch)
- Test with deep nesting (5+ levels)
- Test with wide hierarchies (100+ siblings)
- Test with duplicate slugs
- Test with invalid slug characters
- Test cancellation during import
- Test concurrent imports (if applicable)
When Code Reviewing
- Is hierarchy ordered correctly?
- Are FK constraint errors handled?
- Are there tests for out-of-order input?
- Is the error message helpful?
- Can the import be retried safely?
- Is the transaction scope appropriate?
Summary
The topological sort strategy is recommended as the default:
- ✓ Prevents FK constraint failures completely
- ✓ Works with hierarchies of any depth
- ✓ Handles external parent references
- ✓ Minimal code complexity
- ✓ Excellent test coverage available
For streaming or deferred scenarios, consider deferred FK constraints, but with careful error handling at commit time.
Always include tests that specifically exercise out-of-order insertion to prevent regressions.
Quick Diagnosis Flowchart
FK Constraint Error Occurs ↓[Does error mention parent_slug?] YES → Go to "Parent Lookup" NO → Go to "Other Constraint"
Parent Lookup: ↓[Is parent page in this batch?] YES → Pages inserted in wrong order → Add Topological Sort NO → [Does parent exist in database?] YES → Verify parent_slug value matches NO → Treat as root or return error
Other Constraint: ↓[Check error message] "unique constraint" → Duplicate slug "not null constraint" → Missing required field Other → Check schema changes| Approach | Speed | Memory | Test Difficulty |
|---|---|---|---|
| Topological Sort | O(n²) worst | O(n) | Easy |
| Deferred FK | O(n) | O(n) | Harder |
| Single-insert loop | O(n) | O(1) | Medium |
For most workloads (< 10,000 pages): Topological Sort is preferred. For very large imports: Profile before optimizing.
Rust Clippy Idioms Catalog Next
SQL Migration Column Evolution — Silent Data Loss on INSERT
Was this page helpful?
Thanks for your feedback!