
Prevention Strategies: FK Constraint Import Failure

Problem Summary

When batch-inserting pages with parent-child relationships, SQLite’s Foreign Key (FK) constraints require that parent records exist before child records can reference them. Inserting pages in arbitrary order causes constraint violations that fail the entire batch operation.

Example of Failure:

INSERT INTO pages (slug, parent_slug, ...) VALUES ('child', 'parent', ...)
-- ERROR: FOREIGN KEY constraint failed
-- The parent page with slug='parent' doesn't exist yet
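The ordering requirement can be reproduced without a database. Below is a toy in-memory model of the constraint (the `PagesTable` type and its `insert` method are illustrative, not the project's API): a child row is rejected unless its parent slug was inserted first, mirroring the SQLite failure above.

```rust
use std::collections::HashSet;

/// Toy model of an FK-constrained table: a child row is rejected
/// unless its parent slug was inserted first.
struct PagesTable {
    slugs: HashSet<String>,
}

impl PagesTable {
    fn new() -> Self {
        PagesTable { slugs: HashSet::new() }
    }

    fn insert(&mut self, slug: &str, parent: Option<&str>) -> Result<(), String> {
        if let Some(p) = parent {
            if !self.slugs.contains(p) {
                return Err(format!("FOREIGN KEY constraint failed: parent '{}' missing", p));
            }
        }
        self.slugs.insert(slug.to_string());
        Ok(())
    }
}

fn main() {
    let mut table = PagesTable::new();
    // Child before parent: fails, just like the SQLite batch
    assert!(table.insert("child", Some("parent")).is_err());
    // Parent first, then child: succeeds
    assert!(table.insert("parent", None).is_ok());
    assert!(table.insert("child", Some("parent")).is_ok());
}
```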

Prevention Strategy 1: Topological Sorting

How It Works

Topological sorting reorders the batch so parents are always inserted before their children, satisfying FK constraints.

Algorithm:

  1. Pre-compute all page slugs in the batch
  2. Iterate until all pages are ordered
  3. For each iteration, insert any page whose parent is:
    • None (root page), OR
    • Already in the ordered results, OR
    • Not in this batch (treat as external/root)
  4. If no progress is made, detect cycles and insert remaining as roots
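As a sanity check, the steps above can be exercised on plain `(slug, parent)` pairs before involving the project's `Page` type; `order_pairs` below is a self-contained sketch of the same loop.

```rust
use std::collections::HashSet;

/// Orders (slug, parent) pairs so parents precede children.
/// Parents outside the batch are treated as roots; on a cycle,
/// the leftover pages are appended as roots.
fn order_pairs(pages: &[(&str, Option<&str>)]) -> Vec<usize> {
    let batch: HashSet<&str> = pages.iter().map(|(s, _)| *s).collect();
    let mut done: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    let mut remaining: Vec<usize> = (0..pages.len()).collect();
    while !remaining.is_empty() {
        let before = remaining.len();
        remaining.retain(|&i| {
            let ok = match pages[i].1 {
                None => true,
                Some(p) => done.contains(p) || !batch.contains(p),
            };
            if ok {
                done.insert(pages[i].0);
                out.push(i);
            }
            !ok
        });
        if remaining.len() == before {
            // No progress: a cycle; append what's left as roots
            out.extend(remaining.drain(..));
        }
    }
    out
}

fn main() {
    // Child listed first; the sort must put "parent" (index 1) before it
    let pages = [("child", Some("parent")), ("parent", None)];
    assert_eq!(order_pairs(&pages), vec![1, 0]);
}
```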

Implementation Example

/// Returns indices in topologically sorted order (parents before children)
fn topological_sort_indices(pages: &[(Page, Option<String>)]) -> Vec<usize> {
    let slugs: Vec<String> = pages.iter().map(|(p, _)| p.slug()).collect();
    let batch_slugs: HashSet<&str> = slugs.iter().map(|s| s.as_str()).collect();
    let mut sorted_indices: Vec<usize> = Vec::with_capacity(pages.len());
    let mut remaining: Vec<usize> = (0..pages.len()).collect();
    let mut inserted_slugs: HashSet<&str> = HashSet::new();
    while !remaining.is_empty() {
        let before_len = remaining.len();
        let mut still_remaining = Vec::new();
        for idx in remaining {
            let parent_slug = pages[idx].1.as_deref();
            let can_insert = match parent_slug {
                None => true,
                Some(ps) => inserted_slugs.contains(ps) || !batch_slugs.contains(ps),
            };
            if can_insert {
                inserted_slugs.insert(&slugs[idx]);
                sorted_indices.push(idx);
            } else {
                still_remaining.push(idx);
            }
        }
        remaining = still_remaining;
        // Cycle detection: if no progress, insert remaining as roots
        if remaining.len() == before_len && !remaining.is_empty() {
            sorted_indices.append(&mut remaining);
        }
    }
    sorted_indices
}

When to Use

✓ Default choice for batch operations ✓ Guaranteed to work for acyclic hierarchies ✓ Minimal performance overhead (O(n²) worst case) ✓ Easy to test and understand

Advantages

  • Prevents FK constraint failures entirely
  • Handles external parent references gracefully
  • Detects cycles and recovers gracefully
  • No changes needed to schema

Disadvantages

  • Requires knowing all relationships upfront
  • Cannot be used with streaming inserts
  • Deeper hierarchies require more sorting passes

Prevention Strategy 2: Deferred FK Constraints

How It Works

SQLite allows deferring FK constraint checking until transaction commit, enabling inserts in any order.

Implementation:

BEGIN TRANSACTION;
PRAGMA defer_foreign_keys = ON;
-- Insert in any order now
INSERT INTO pages (slug, parent_slug, ...) VALUES ('child', 'parent', ...);
INSERT INTO pages (slug, parent_slug, ...) VALUES ('parent', NULL, ...);
COMMIT;
-- All FK constraints checked here

Rust Implementation

fn batch_insert_deferred(conn: &mut Connection, pages: &[(Page, Option<String>)])
    -> Result<usize>
{
    // Defer FK checks and wrap all inserts in one transaction; without
    // the transaction each insert autocommits and the pragma has no effect
    let tx = conn.transaction()?;
    tx.pragma_update(None, "defer_foreign_keys", "ON")?;
    let mut count = 0;
    for (page, parent_slug) in pages {
        tx.execute(
            "INSERT INTO pages (id, slug, parent_slug, ...) VALUES (?, ?, ?, ...)",
            params![page.id, page.slug(), parent_slug, ...],
        )?;
        count += 1;
    }
    // FK constraints are checked here, at commit
    tx.commit()?;
    Ok(count)
}

When to Use

✓ Input order doesn’t matter ✓ Streaming inserts possible ✓ Large batches with many relationships ✗ Requires session-level pragma support

Advantages

  • Insert in any order
  • Better for streaming data
  • Single-pass operation
  • Lower memory footprint

Disadvantages

  • Requires pragmas (dialect-specific)
  • Errors only appear at commit
  • Harder to debug which insert failed
  • Cannot identify problem record at insertion time

Compatibility Note

Deferred FK constraints are not typically used in sqlite-rust because:

  • Connection lifecycle doesn’t guarantee pragma persistence
  • Errors at transaction commit are harder to handle
  • Testing becomes more complex
  • Topological sort is more portable

Prevention Strategy 3: Explicit Transaction Ordering

How It Works

Wrap topological sorting with explicit transaction management and conflict handling.

Implementation:

fn create_batch_safe(
    &self,
    workspace_path: &Path,
    pages: &[(Page, Option<String>)],
) -> PageResult<BatchImportResult> {
    let db = Self::open_db(workspace_path)?;
    let sorted_indices = Self::topological_sort_indices(pages);
    let mut created = vec![];
    let mut failed = vec![];
    let mut skipped = 0;
    db.with_transaction(|conn| {
        for (attempt, idx) in sorted_indices.iter().enumerate() {
            let (page, parent_slug) = &pages[*idx];
            let slug = page.slug();
            // Check for existing pages
            let exists = check_exists(conn, &slug)?;
            if exists {
                skipped += 1;
                continue;
            }
            // Verify parent exists (if specified)
            if let Some(ps) = parent_slug {
                let parent_exists = check_exists(conn, ps)?;
                if !parent_exists {
                    failed.push(ImportError {
                        file: slug.clone(),
                        reason: format!("Parent '{}' not found", ps),
                        attempt,
                    });
                    continue;
                }
            }
            // Insert with explicit error handling
            match insert_page(conn, page, parent_slug) {
                Ok(_) => created.push(slug),
                Err(e) => failed.push(ImportError {
                    file: slug,
                    reason: e.to_string(),
                    attempt,
                }),
            }
        }
        Ok(())
    })?;
    Ok(BatchImportResult { created, failed, skipped })
}

When to Use

✓ Need detailed error reporting ✓ Want to continue on partial failures ✓ Must track which pages failed and why ✓ Audit trail required

Advantages

  • Continues despite some failures
  • Detailed error information per page
  • Can implement retry logic
  • Transparent failure attribution

Disadvantages

  • More complex code
  • Partial success can leave inconsistent state
  • Higher memory for tracking failures
  • Requires careful rollback strategy

Prevention Strategy 4: Hierarchical Validation

How It Works

Validate the entire hierarchy before any inserts to catch issues early.

Implementation:

#[derive(Debug, Clone)]
pub struct HierarchyIssue {
    pub slug: String,
    pub issue: HierarchyError,
}

#[derive(Debug, Clone)]
pub enum HierarchyError {
    Cycle(Vec<String>),   // slug1 -> slug2 -> slug1
    OrphanParent(String), // parent doesn't exist in DB or batch
    Duplicate(String),    // slug appears twice in batch
    InvalidSlug(String),  // empty or invalid slug format
}

fn validate_hierarchy(
    pages: &[(Page, Option<String>)],
    existing_parents: &HashSet<String>,
) -> Result<Vec<HierarchyIssue>, String> {
    let mut issues = vec![];
    // Pre-compute every slug in the batch so that a parent listed
    // *after* its child is still resolved, not flagged as an orphan
    let all_batch_slugs: HashSet<String> =
        pages.iter().map(|(p, _)| p.slug()).collect();
    let mut seen_slugs = HashSet::new();
    let mut graph: HashMap<String, Vec<String>> = HashMap::new();
    // Check for invalid slugs, duplicates, and orphans; build the graph
    for (page, parent_slug) in pages {
        let slug = page.slug();
        if slug.is_empty()
            || !slug.chars().all(|c| c.is_alphanumeric() || c == '-' || c == '/')
        {
            issues.push(HierarchyIssue {
                slug: slug.clone(),
                issue: HierarchyError::InvalidSlug(slug),
            });
            continue;
        }
        if !seen_slugs.insert(slug.clone()) {
            issues.push(HierarchyIssue {
                slug: slug.clone(),
                issue: HierarchyError::Duplicate(slug),
            });
            continue;
        }
        if let Some(ps) = parent_slug {
            if !all_batch_slugs.contains(ps) && !existing_parents.contains(ps) {
                issues.push(HierarchyIssue {
                    slug: slug.clone(),
                    issue: HierarchyError::OrphanParent(ps.clone()),
                });
            }
            graph.entry(ps.clone()).or_default().push(slug.clone());
        }
    }
    // Detect cycles using DFS
    let mut visited = HashSet::new();
    let mut rec_stack = HashSet::new();
    for slug in seen_slugs.iter() {
        if detect_cycle(slug, &graph, &mut visited, &mut rec_stack).is_some() {
            if let Some(cycle) = find_cycle_path(slug, &graph) {
                issues.push(HierarchyIssue {
                    slug: slug.clone(),
                    issue: HierarchyError::Cycle(cycle),
                });
            }
        }
    }
    Ok(issues)
}
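The `detect_cycle` and `find_cycle_path` helpers are referenced but not shown above. A self-contained DFS sketch of the same idea (the `find_cycle` name and signature here are illustrative, not the project's API): nodes on the current recursion path form the stack, and revisiting one of them yields the cycle path.

```rust
use std::collections::{HashMap, HashSet};

/// Depth-first search over a parent -> children adjacency map,
/// returning the path of one cycle if any is reachable from `start`.
fn find_cycle(
    start: &str,
    graph: &HashMap<String, Vec<String>>,
) -> Option<Vec<String>> {
    fn dfs(
        node: &str,
        graph: &HashMap<String, Vec<String>>,
        stack: &mut Vec<String>,
        visited: &mut HashSet<String>,
    ) -> Option<Vec<String>> {
        if let Some(pos) = stack.iter().position(|s| s == node) {
            // Node already on the current path: the tail of the stack is the cycle
            let mut cycle = stack[pos..].to_vec();
            cycle.push(node.to_string());
            return Some(cycle);
        }
        if !visited.insert(node.to_string()) {
            return None; // fully explored earlier, no cycle through here
        }
        stack.push(node.to_string());
        if let Some(children) = graph.get(node) {
            for child in children {
                if let Some(c) = dfs(child, graph, stack, visited) {
                    return Some(c);
                }
            }
        }
        stack.pop();
        None
    }
    dfs(start, graph, &mut Vec::new(), &mut HashSet::new())
}

fn main() {
    let mut g: HashMap<String, Vec<String>> = HashMap::new();
    g.insert("a".into(), vec!["b".into()]);
    g.insert("b".into(), vec!["a".into()]); // a -> b -> a
    let cycle = find_cycle("a", &g).expect("cycle expected");
    assert_eq!(cycle, vec!["a".to_string(), "b".to_string(), "a".to_string()]);
}
```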

When to Use

✓ Large imports where issues are common ✓ Need to report all problems at once ✓ Can show preview before committing ✓ User-facing import workflow

Advantages

  • Find all issues before any inserts
  • Can show user a preview of problems
  • Prevents partial failures
  • Better UX with detailed feedback

Disadvantages

  • Pre-processing overhead
  • Need access to existing database state
  • More complex code
  • Duplication with sort validation

Warning Signs and Risk Indicators

Code-Level Warning Signs

  1. Unordered Batch Inserts

    // BAD: No ordering
    for page in pages {
        insert_page(&conn, page)?;
    }
    // GOOD: Topological sort
    let sorted = topological_sort(&pages);
    for idx in sorted { ... }
  2. Missing Parent Verification

    // BAD: No check
    conn.execute(
        "INSERT INTO pages (slug, parent_slug) VALUES (?, ?)",
        params![slug, parent_slug],
    )?;
    // GOOD: Verify parent
    if parent_slug.is_some() {
        parent_exists_check()?;
    }
  3. Ignoring FK Constraint Errors

    // BAD: Swallow errors
    let _ = insert_page(&conn, page);
    // GOOD: Handle and report
    match insert_page(&conn, page) {
        Ok(_) => { /* success */ },
        Err(e) if e.is_fk_constraint() => {
            return Err(FkConstraintError { parent_slug });
        }
    }
  4. Mixing Insert Strategies

    // BAD: Inconsistent ordering
    insert_parent()?;
    for child in children {
        insert_child(&child)?; // Works if order is right
    }
    insert_another_parent()?; // May fail
    // GOOD: Consistent strategy throughout
    let sorted = topological_sort(all_pages);
    for page in sorted { insert_page(page)?; }

Test-Level Warning Signs

  1. Tests Only Cover Happy Path

    • ✗ Always inserts in child-to-parent order
    • ✗ Never tests out-of-order inserts
    • ✓ Test with multiple orderings
    • ✓ Test with gaps in hierarchy
  2. No Negative Tests

    • ✗ Don’t test circular parents
    • ✗ Don’t test missing parents
    • ✓ Add regression tests for reported issues
  3. Insufficient Coverage of Edge Cases

    • ✗ Only test flat hierarchies
    • ✗ Only test 2-3 levels deep
    • ✓ Test 5+ levels
    • ✓ Test wide hierarchies (many siblings)

Test Cases for Prevention

Unit Test: Topological Sort

#[test]
fn test_topological_sort_out_of_order() {
    // CRITICAL: Test non-ideal input order
    let grandchild = Page::new("GC");
    let child = Page::new("C");
    let parent = Page::new("P");
    let pages = vec![
        (grandchild, Some("c".to_string())), // 0: references child
        (child, Some("p".to_string())),      // 1: references parent
        (parent, None),                      // 2: root
    ];
    let sorted = topological_sort_indices(&pages);
    // Verify ordering: parent before child before grandchild
    assert_eq!(sorted[0], 2); // parent first
    assert_eq!(sorted[1], 1); // child second
    assert_eq!(sorted[2], 0); // grandchild last
}

#[test]
fn test_topological_sort_with_external_parent() {
    // CRITICAL: Parent not in batch
    let orphan = Page::new("Orphan");
    let root = Page::new("Root");
    let pages = vec![
        (orphan, Some("nonexistent".to_string())),
        (root, None),
    ];
    let sorted = topological_sort_indices(&pages);
    // Both should be treated as roots, order doesn't matter
    assert_eq!(sorted.len(), 2);
}

#[test]
fn test_topological_sort_wide_hierarchy() {
    // CRITICAL: Test performance and correctness with many siblings
    let parent = Page::new("Parent");
    let children: Vec<_> = (0..100)
        .map(|i| (Page::new(&format!("Child{}", i)), Some("parent".to_string())))
        .collect();
    let mut pages = vec![(parent, None)];
    pages.extend(children);
    let sorted = topological_sort_indices(&pages);
    // Parent must be first
    assert_eq!(sorted[0], 0);
    // All children after parent
    for &idx in &sorted[1..] {
        assert!(idx > 0);
    }
}

#[test]
fn test_topological_sort_deep_hierarchy() {
    // CRITICAL: Test deep nesting (5+ levels)
    let root = Page::new("L0");
    let l1 = Page::new("L1");
    let l2 = Page::new("L2");
    let l3 = Page::new("L3");
    let l4 = Page::new("L4");
    let l5 = Page::new("L5");
    // REVERSE order: deepest first
    let pages = vec![
        (l5, Some("l4".to_string())), // 0
        (l4, Some("l3".to_string())), // 1
        (l3, Some("l2".to_string())), // 2
        (l2, Some("l1".to_string())), // 3
        (l1, Some("l0".to_string())), // 4
        (root, None),                 // 5
    ];
    let sorted = topological_sort_indices(&pages);
    // Root (5) must come before L1 (4), L1 (4) before L2 (3), etc.
    let positions: HashMap<usize, usize> = sorted
        .iter()
        .enumerate()
        .map(|(i, &idx)| (idx, i))
        .collect();
    assert!(positions[&5] < positions[&4]); // root before l1
    assert!(positions[&4] < positions[&3]); // l1 before l2
    assert!(positions[&3] < positions[&2]); // l2 before l3
    assert!(positions[&2] < positions[&1]); // l3 before l4
    assert!(positions[&1] < positions[&0]); // l4 before l5
}

Integration Test: Batch Insert

#[test]
fn test_batch_insert_handles_out_of_order() {
    let (workspace, repo) = setup_workspace();
    // Create in wrong order: child, parent
    let child_page = Page::new("Child");
    let parent_page = Page::new("Parent");
    let pages = vec![
        (child_page, Some("parent".to_string())),
        (parent_page, None),
    ];
    // Should succeed despite wrong order
    let created = repo.create_batch(workspace.path(), &pages).unwrap();
    assert_eq!(created, 2);
    // Verify both exist
    assert!(repo.exists_by_slug(workspace.path(), "parent"));
    assert!(repo.exists_by_slug(workspace.path(), "child"));
    // Verify the relationship: parent should be "parent" (checked via tree structure)
    let _child = repo.get_by_slug(workspace.path(), "child").unwrap();
    let tree = repo.list_tree(workspace.path()).unwrap();
    assert_eq!(tree[0].slug, "parent");
    assert_eq!(tree[0].children[0].slug, "child");
}

#[test]
fn test_batch_insert_cycle_detection() {
    let (workspace, repo) = setup_workspace();
    // Create circular reference: A -> B -> A
    // (This shouldn't be possible with FK constraints, but test handling)
    let page_a = Page::new("A");
    let page_b = Page::new("B");
    let pages = vec![
        (page_a, Some("b".to_string())), // A's parent is B
        (page_b, Some("a".to_string())), // B's parent is A (CYCLE!)
    ];
    // Should handle gracefully (treat one as root)
    let result = repo.create_batch(workspace.path(), &pages);
    // Either succeeds (with one as root) or returns a clear error
    match result {
        Ok(created) => assert!(created >= 1), // At least one inserted
        Err(e) => {
            // Error message should mention the cycle or the constraint
            assert!(e.to_string().to_lowercase().contains("cycle")
                || e.to_string().to_lowercase().contains("constraint"));
        }
    }
}

#[test]
fn test_batch_insert_partial_external_parents() {
    let (workspace, repo) = setup_workspace();
    // Create some pages first
    let root = Page::new("Root");
    repo.create(workspace.path(), &root).unwrap();
    // Now batch import with mixed internal and external parents
    let child_of_root = Page::new("ChildOfRoot");
    let orphan = Page::new("Orphan");
    let pages = vec![
        (orphan, Some("nonexistent".to_string())), // parent doesn't exist
        (child_of_root, Some("root".to_string())), // parent exists
    ];
    let created = repo.create_batch(workspace.path(), &pages).unwrap();
    // The orphan is treated as a root despite its invalid parent
    assert!(created >= 1);
    // Both should exist
    assert!(repo.exists_by_slug(workspace.path(), "orphan"));
    assert!(repo.exists_by_slug(workspace.path(), "child-of-root"));
}

Regression Test: Import Scenario

#[test]
fn test_import_obsidian_vault_hierarchy() {
    // REAL SCENARIO: Obsidian vault with nested folders
    // Structure:
    //   ProjectA/
    //     Overview.md
    //     Design/
    //       Architecture.md
    //       Components/
    //         Service.md
    //     Implementation/
    //       Database.md
    let (workspace, importer) = setup_importer();
    let (temp_source, _) = create_obsidian_fixture();
    let params = ImportParams::in_place(temp_source.path());
    let result = importer.execute(&params).unwrap();
    // All files should be imported despite directory order
    assert_eq!(result.imported_count, 5);
    assert!(result.pages.iter().any(|p| p.slug == "overview"));
    assert!(result.pages.iter().any(|p| p.slug.contains("architecture")));
    assert!(result.pages.iter().any(|p| p.slug.contains("database")));
}

Best Practices Checklist

Before Writing Batch Insert Code

  • Understand the full hierarchy (depth, branching factor)
  • Decide on ordering strategy (topological sort vs deferred FK)
  • Plan for external parent references
  • Design error handling and recovery
  • Plan audit/logging of inserted items

When Implementing

  • Add topological sort if sorting is needed
  • Verify parents exist before insert (or validate upfront)
  • Use transactions to ensure atomicity
  • Log/track each insert for error reporting
  • Handle FK constraint errors explicitly
  • Consider partial failures and recovery

When Testing

  • Test with pages in reverse order
  • Test with gaps in hierarchy (parent not in batch)
  • Test with deep nesting (5+ levels)
  • Test with wide hierarchies (100+ siblings)
  • Test with duplicate slugs
  • Test with invalid slug characters
  • Test cancellation during import
  • Test concurrent imports (if applicable)

When Code Reviewing

  • Is hierarchy ordered correctly?
  • Are FK constraint errors handled?
  • Are there tests for out-of-order input?
  • Is the error message helpful?
  • Can the import be retried safely?
  • Is the transaction scope appropriate?

Summary

The topological sort strategy is recommended as the default:

  • ✓ Prevents FK constraint failures completely
  • ✓ Works with hierarchies of any depth
  • ✓ Handles external parent references
  • ✓ Minimal code complexity
  • ✓ Excellent test coverage available

For streaming or deferred scenarios, consider deferred FK constraints, but with careful error handling at commit time.

Always include tests that specifically exercise out-of-order insertion to prevent regressions.


Quick Diagnosis Flowchart

FK Constraint Error Occurs
[Does error mention parent_slug?]
YES → Go to "Parent Lookup"
NO → Go to "Other Constraint"
Parent Lookup:
[Is parent page in this batch?]
YES → Pages inserted in wrong order → Add Topological Sort
NO → [Does parent exist in database?]
YES → Verify parent_slug value matches
NO → Treat as root or return error
Other Constraint:
[Check error message]
"unique constraint" → Duplicate slug
"not null constraint" → Missing required field
Other → Check schema changes
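The branches above can also be expressed in code. A hedged sketch (the `Diagnosis` enum and `diagnose` helper are illustrative, not part of the project; the matched strings follow SQLite's usual error wording):

```rust
/// Illustrative diagnosis categories for batch-insert failures.
#[derive(Debug, PartialEq)]
enum Diagnosis {
    WrongInsertOrder,     // parent is in the batch: add topological sort
    MissingParent,        // parent in neither batch nor database
    DuplicateSlug,
    MissingRequiredField,
    Unknown,
}

fn diagnose(error: &str, parent_in_batch: bool, parent_in_db: bool) -> Diagnosis {
    let msg = error.to_lowercase();
    if msg.contains("foreign key") {
        if parent_in_batch {
            Diagnosis::WrongInsertOrder
        } else if parent_in_db {
            // Parent exists: likely a mismatched parent_slug value
            Diagnosis::Unknown
        } else {
            Diagnosis::MissingParent
        }
    } else if msg.contains("unique constraint") {
        Diagnosis::DuplicateSlug
    } else if msg.contains("not null constraint") {
        Diagnosis::MissingRequiredField
    } else {
        Diagnosis::Unknown
    }
}

fn main() {
    assert_eq!(
        diagnose("FOREIGN KEY constraint failed", true, false),
        Diagnosis::WrongInsertOrder
    );
    assert_eq!(
        diagnose("UNIQUE constraint failed: pages.slug", false, false),
        Diagnosis::DuplicateSlug
    );
}
```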
| Approach           | Speed       | Memory | Test Difficulty |
|--------------------|-------------|--------|-----------------|
| Topological Sort   | O(n²) worst | O(n)   | Easy            |
| Deferred FK        | O(n)        | O(n)   | Harder          |
| Single-insert loop | O(n)        | O(1)   | Medium          |

For most workloads (< 10,000 pages): Topological Sort is preferred. For very large imports: Profile before optimizing.
