CRDT Binary Must Pass Through Untouched — Never Re-serialize from Materialized Text
Problem
When adding CRDT-based storage (Loro) to a Tauri app, Tauri commands for saving and loading binary content were implemented as:
- Save: receive binary bytes from frontend, discard them, call text_to_loro_bytes(content_text) to create a fresh LoroDoc from plain text, store that
- Load: read BLOB from SQLite, discard it, call text_to_loro_bytes(text_column) to create a fresh LoroDoc, return that
This appeared to work — content round-tripped correctly as visible text. But it silently destroyed the CRDT’s entire operation graph on every save.
Symptoms:
- Content appears to save and load correctly (text matches)
- But LoroDoc.version() resets to a single operation after every save
- Undo history shows one giant “insert” instead of character-level edits
- Future sync (INK-72) would treat every save as a full-document replacement, causing merge conflicts
- Event log and bookmarks would see only coarse save-level snapshots (no operation-level history)
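The symptoms above can be reproduced with a toy model. The sketch below is not Loro: ToyDoc, Insert, and from_text are hypothetical names, and the operation log is a plain vector. It only illustrates why matching text says nothing about whether history survived.

```rust
// Toy stand-in for a CRDT document. NOT Loro: just enough to show the failure mode.
#[derive(Debug, Clone, PartialEq)]
struct Insert {
    pos: usize, // byte offset into the materialized text (ASCII assumed here)
    text: String,
}

#[derive(Debug, Clone, Default, PartialEq)]
struct ToyDoc {
    ops: Vec<Insert>, // the "operation history" that plain text cannot carry
}

impl ToyDoc {
    /// Record one insert operation.
    fn insert(&mut self, pos: usize, text: &str) {
        self.ops.push(Insert { pos, text: text.to_string() });
    }

    /// Materialize the visible text by replaying every operation in order.
    fn to_text(&self) -> String {
        let mut out = String::new();
        for op in &self.ops {
            out.insert_str(op.pos, &op.text);
        }
        out
    }

    /// Hypothetical analogue of text_to_loro_bytes: rebuild a document from text alone.
    /// The result always contains exactly one insert, whatever the original history was.
    fn from_text(text: &str) -> Self {
        let mut doc = ToyDoc::default();
        doc.insert(0, text);
        doc
    }
}
```

A document edited in two steps and a document rebuilt from its text materialize to the same string, yet the rebuilt one holds a single operation. A test that compares only to_text() cannot tell them apart.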
Investigation
Steps Tried
- Manual testing: editing, saving, reloading all worked. Text matched. Problem was invisible to functional testing.
- Code review: identified that save_block_content called text_to_loro_bytes() on the text parameter instead of storing the binary content_bytes parameter. The binary bytes from the frontend were received but never used.
- Traced the data flow: Frontend LoroDoc (with full operation history) -> export snapshot bytes -> Tauri command -> text_to_loro_bytes(text) creates brand new LoroDoc with single insert op -> stored to SQLite. The frontend’s rich CRDT graph was thrown away on every save.
Root Cause
The Tauri command treated the binary LoroDoc snapshot as redundant data alongside the text content, rather than as the primary storage artifact. The text content is a derived projection of the CRDT document — useful for search indexing and export, but not the source of truth.
The mental model error: thinking of the BLOB as “just another encoding of the same text” rather than understanding it carries structural information (operation history, version vectors, author IDs, timestamps) that plain text cannot represent.
Solution
Architecture Rule
The frontend’s CRDT binary snapshot is the source of truth. The backend stores it unchanged. Text is a derived projection for indexing and compatibility.
Code Changes
    // Before (BROKEN: destroys CRDT history)
    #[tauri::command]
    async fn save_block_content(
        slug: String,
        content_bytes: Vec<u8>, // <-- received but IGNORED
        content_text: Option<String>,
        state: State<'_, AppState>,
    ) -> Result<(), CommandError> {
        let text = content_text.unwrap_or_default();
        // BUG: Creates brand-new LoroDoc from text, discarding content_bytes
        let blob = text_to_loro_bytes(&text);
        repo.save_block_content_blob(&path, &slug, &blob, &text)?;
        Ok(())
    }

    // After (CORRECT: preserves CRDT history)
    #[tauri::command]
    async fn save_block_content(
        slug: String,
        content_bytes: Vec<u8>, // <-- stored directly
        content_text: String,
        state: State<'_, AppState>,
    ) -> Result<(), CommandError> {
        if content_bytes.len() > MAX_CONTENT_BYTES {
            return Err(/* size limit error */);
        }
        // Pass the frontend's binary snapshot through untouched
        use_case.execute(&path, &slug, &content_bytes, &content_text)?;
        Ok(())
    }
Implementation Notes
- Two content paths exist by design: the “hot” editor path passes binary BLOB through; the “cold” import path creates new LoroDoc from text (acceptable — external editors don’t participate in the CRDT graph)
- Text column is denormalized: raw_markdown TEXT stays alongside content_loro BLOB for FTS5 indexing and debugging. Updated on every save from the text parameter.
- Fallback on decode failure: if the BLOB is corrupt or NULL (pre-migration data), the repository falls back to creating a LoroDoc from the TEXT column with a tracing::warn!. This is the one place where text_to_loro_bytes is acceptable at read time.
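The fallback rule in the last note can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the repository's actual code: Loaded, load_block_content, and the two closure parameters are hypothetical. is_decodable stands in for a real CRDT import check, and rebuild_from_text stands in for text_to_loro_bytes.

```rust
/// Where the returned document bytes came from (hypothetical type).
#[derive(Debug, PartialEq)]
enum Loaded {
    /// The stored BLOB, returned byte-for-byte.
    Stored(Vec<u8>),
    /// Rebuilt from the TEXT column because the BLOB was NULL or undecodable.
    RebuiltFromText(Vec<u8>),
}

/// Return the stored BLOB untouched when it decodes; rebuild from text only as a last resort.
fn load_block_content(
    blob: Option<Vec<u8>>,
    text: &str,
    is_decodable: impl Fn(&[u8]) -> bool,
    rebuild_from_text: impl Fn(&str) -> Vec<u8>,
) -> Loaded {
    match blob {
        // Normal path: hand the stored bytes back without re-serializing them.
        Some(bytes) if is_decodable(&bytes) => Loaded::Stored(bytes),
        // Fallback path: NULL (pre-migration) or corrupt BLOB.
        other => {
            let reason = if other.is_some() { "corrupt" } else { "NULL" };
            // A real repository would log this through tracing::warn! instead.
            eprintln!("warning: BLOB is {reason}; rebuilding document from text");
            Loaded::RebuiltFromText(rebuild_from_text(text))
        }
    }
}
```

Returning a tagged result rather than bare bytes keeps the lossy path visible to callers and to tests, so a regression that routes every load through the fallback is easy to catch.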
Prevention
Design Principles
- Binary-in, binary-out: Any command that receives a CRDT binary must store/return it without re-serialization. Treat it like a file upload: you wouldn’t re-encode a JPEG from its pixel values.
- Text is a projection, not a source: When a system has both structured binary data and a text representation, establish clearly which is authoritative. In a CRDT system, the binary document is always authoritative.
- Layer the concern correctly: CRDT serialization belongs in the infrastructure layer, not the framework/command layer. Tauri commands should be thin pass-through adapters.
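One way to picture these principles together is a use case that validates size and then forwards the bytes unchanged to a repository port. The sketch below is an assumption-laden illustration: BlockContentRepository, InMemoryRepo, SaveError, and the 1 MiB value of MAX_CONTENT_BYTES are all hypothetical (the source names the constant but not its value), and a HashMap stands in for SQLite.

```rust
use std::collections::HashMap;

// Hypothetical limit; the real value is not given in the source.
const MAX_CONTENT_BYTES: usize = 1024 * 1024;

#[derive(Debug, PartialEq)]
enum SaveError {
    TooLarge { len: usize },
}

/// Infrastructure-side port: stores the CRDT snapshot next to its text projection.
trait BlockContentRepository {
    fn save(&mut self, slug: &str, content_bytes: &[u8], content_text: &str);
    fn load_bytes(&self, slug: &str) -> Option<Vec<u8>>;
}

/// In-memory stand-in for the SQLite-backed repository.
#[derive(Default)]
struct InMemoryRepo {
    rows: HashMap<String, (Vec<u8>, String)>,
}

impl BlockContentRepository for InMemoryRepo {
    fn save(&mut self, slug: &str, content_bytes: &[u8], content_text: &str) {
        // Store the snapshot exactly as received; the text is only a projection.
        self.rows
            .insert(slug.to_string(), (content_bytes.to_vec(), content_text.to_string()));
    }

    fn load_bytes(&self, slug: &str) -> Option<Vec<u8>> {
        self.rows.get(slug).map(|(bytes, _text)| bytes.clone())
    }
}

/// Use case: enforce the size limit, then pass the bytes through untouched.
fn save_block_content(
    repo: &mut impl BlockContentRepository,
    slug: &str,
    content_bytes: &[u8],
    content_text: &str,
) -> Result<(), SaveError> {
    if content_bytes.len() > MAX_CONTENT_BYTES {
        return Err(SaveError::TooLarge { len: content_bytes.len() });
    }
    repo.save(slug, content_bytes, content_text);
    Ok(())
}
```

Because the use case never inspects or rebuilds the snapshot, a byte-for-byte comparison between what was saved and what load_bytes returns is enough to guard the pass-through property in a unit test.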
Warning Signs
- A save command that receives binary data but calls *_to_bytes(text) instead of storing the binary directly
- A load command that reads a BLOB but returns a freshly-created document instead of the stored one
- Tests that only verify text content matches, without checking CRDT metadata (version vectors, operation count)
- The word “re-create” or “re-serialize” near CRDT document handling
Test Strategy
    // Test that verifies CRDT history survives a round-trip
    use loro::{ExportMode, LoroDoc};

    #[test]
    fn test_loro_blob_preserves_operation_history() {
        let doc = LoroDoc::new();
        let text = doc.get_text("content");
        text.insert(0, "hello").unwrap();
        text.insert(5, " world").unwrap();
        // doc now records two separate inserts
        doc.commit();

        let blob = doc.export(ExportMode::Snapshot).unwrap();
        // Store blob, then load it back
        let restored = LoroDoc::new();
        restored.import(&blob).unwrap();

        // Verify: restored doc has the same text and version, not a fresh single-op doc
        assert_eq!(restored.get_text("content").to_string(), "hello world");
        // A version vector maps each peer to a counter, so its len() counts peers,
        // not operations. Compare the vectors themselves: a doc rebuilt from text
        // would carry a different peer and counter.
        assert_eq!(
            restored.oplog_vv(),
            doc.oplog_vv(),
            "restored version vector should match the original"
        );
    }
References
- Commits: b34f6b6 (P1 fix), 2daecac (use case extraction), 4733042 (BLOB repository methods)
- ADR: docs/ADR/006-loro-crdt-block-storage.md
- Loro docs: https://loro.dev/docs
- Issue: INK-73 (Loro CRDT Block Storage Adoption)