
Attachments — File Upload, Storage & Lifecycle


Covers the complete attachment lifecycle: uploading files by path and by raw bytes, retrieving metadata and file content, listing attachments scoped to a page, deleting attachments, content-addressable deduplication via SHA-256, extension allowlist enforcement, storage quota limits, and cleanup of on-disk files and image variant caches. This spec is P2 because attachments are the primary mechanism for embedding non-text content into a workspace — breakage here prevents users from attaching images, PDFs, and documents to their pages.

The attachment system enforces an allowlist of file extensions at the domain layer (ALLOWED_EXTENSIONS), a 10 MB per-file quota and 100 MB workspace quota at the use-case layer, and content-addressable deduplication so identical files share one on-disk copy. Files are stored at {workspace}/attachments/{uuid}.{ext}. Metadata is persisted in SQLite. Sync state begins as local_only and is updated asynchronously.
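The two use-case quotas can be pictured as a single predicate. This is a sketch only. `fitsQuota` and the constant names are hypothetical, and the sketch assumes the workspace quota counts bytes already stored plus the incoming file.

```typescript
// Hypothetical mirrors of the use-case quotas described above (not the real source).
const PER_FILE_LIMIT_BYTES = 10 * 1024 * 1024; // 10 MB per file
const WORKSPACE_LIMIT_BYTES = 100 * 1024 * 1024; // 100 MB per workspace

// True when an incoming file of `sizeBytes` passes both quotas, given
// `workspaceUsedBytes` already stored (assumed accounting model).
function fitsQuota(sizeBytes: number, workspaceUsedBytes: number): boolean {
  const withinPerFile = sizeBytes <= PER_FILE_LIMIT_BYTES;
  const withinWorkspace = workspaceUsedBytes + sizeBytes <= WORKSPACE_LIMIT_BYTES;
  return withinPerFile && withinWorkspace;
}
```

The Notes state that the hash lookup runs before the quota checks. A byte-identical re-upload therefore returns the existing attachment before either quota is evaluated.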

Preconditions

  • HTTP bridge running on port 9990
  • A workspace initialized via initialize_workspace before each scenario
  • Bridge shim injected via playwright.config.ts
  • The HTTP bridge exposes all attachment routes: upload_attachment, upload_attachment_bytes, get_attachment, get_attachment_file, list_attachments, and delete_attachment. All scenarios in this spec are exercisable via the bridge.

Scenarios

Seed: seed.spec.ts

1. Upload an attachment by file path

upload_attachment reads a file from a given path, computes its SHA-256 hash, writes it to the workspace attachments directory, and returns the attachment metadata.

Steps:

  1. Prepare a small PNG file on disk at a known path (e.g., a 1 KB test image).
  2. Create a page titled “Image Owner”.
  3. Call upload_attachment with the file path pointing to the PNG.

Expected: The response is an Attachment object with original_filename: "test.png", file_extension: "png", content_type matching PNG, size_bytes > 0, a non-nil id (UUID), and sync_status: "local_only". A file exists at {workspace}/attachments/{id}.png on disk.
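A small helper for the on-disk assertion in this scenario. It assumes POSIX path separators and that the test process can read the workspace directory directly. `attachmentPath` and `attachmentStored` are hypothetical names.

```typescript
import { existsSync } from "node:fs";
import { posix } from "node:path";

// Builds {workspace}/attachments/{id}.{ext}, the storage convention from the overview.
function attachmentPath(workspace: string, id: string, ext: string): string {
  return posix.join(workspace, "attachments", `${id}.${ext}`);
}

// True when the uploaded file exists where the spec says it should.
function attachmentStored(workspace: string, id: string, ext: string): boolean {
  return existsSync(attachmentPath(workspace, id, ext));
}
```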

2. Upload an attachment via raw bytes

upload_attachment_bytes accepts raw file content directly, bypassing the filesystem path. Used for clipboard paste and drag-drop.

Steps:

  1. Create a page titled “Bytes Upload”.
  2. Call upload_attachment_bytes with file_name: "pasted.png" and non-empty byte array representing a minimal PNG.

Expected: The response is an Attachment object with original_filename: "pasted.png", file_extension: "png", and size_bytes equal to the byte array length. The attachment file exists on disk.
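One way to build the minimal PNG payload for this scenario, sketched under assumptions. The request field names `file_name` and `bytes` and the `number[]` encoding are guesses about the bridge payload, not its documented API. The zero padding up to the 26-byte minimum from the Test Data table is arbitrary, because only the leading signature matters for magic-byte detection.

```typescript
// The 8-byte PNG file signature. It likely satisfies a magic-byte check,
// but the resulting bytes are not a decodable image.
const PNG_SIGNATURE: readonly number[] = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const MIN_TEST_PNG_BYTES = 26; // from the Test Data table

// Hypothetical payload shape for upload_attachment_bytes.
interface UploadBytesPayload {
  file_name: string;
  bytes: number[];
}

function pngPayload(fileName: string): UploadBytesPayload {
  const bytes = [...PNG_SIGNATURE];
  // Zero-pad after the signature; only the leading magic bytes are meaningful here.
  while (bytes.length < MIN_TEST_PNG_BYTES) bytes.push(0);
  return { file_name: fileName, bytes };
}

const payload = pngPayload("pasted.png");
// The response's size_bytes should equal this length.
const expectedSizeBytes = payload.bytes.length;
```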

3. Upload with empty file name is rejected

Providing an empty file_name is caught as a validation error before any disk I/O.

Steps:

  1. Call upload_attachment_bytes with file_name: "" and non-empty bytes.

Expected: A validation error is returned (no attachment is created). The error message indicates the file name cannot be empty.

4. Upload with empty bytes is rejected

Providing a zero-length byte array is caught as a validation error.

Steps:

  1. Call upload_attachment_bytes with file_name: "empty.png" and an empty byte array ([]).

Expected: A validation error is returned. The error message indicates the file must be non-empty (or “greater than zero” for size). No attachment metadata is saved.

5. Upload with a disallowed file extension is rejected

The domain allowlist (ALLOWED_EXTENSIONS) rejects unknown file types at validation time.

Steps:

  1. Call upload_attachment_bytes with file_name: "virus.exe" and non-empty bytes.

Expected: A validation error is returned. The error message indicates the .exe extension is not allowed. No file is written to disk and no metadata is saved.
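Scenarios 3 to 5 exercise the same validation path. Below is a client-side sketch of those checks as this spec describes them, which is useful for choosing inputs. Several parts are assumptions:

- `validateUpload` and `extensionOf` are hypothetical helpers.
- `SAMPLE_ALLOWED` holds only the sampled extensions from Test Data, not the real ALLOWED_EXTENSIONS list.
- The order of the checks is assumed.
- The returned strings are placeholders, not the server's actual messages.

```typescript
// Sampled extensions from the Test Data table; the real allowlist is larger.
const SAMPLE_ALLOWED = new Set(["png", "jpg", "pdf", "docx", "csv"]);

// Text after the last dot, lowercased (case folding is an assumption), or "" if none.
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot < 0 ? "" : fileName.slice(dot + 1).toLowerCase();
}

// Returns a placeholder error string, or null when the input would pass.
function validateUpload(fileName: string, bytes: Uint8Array): string | null {
  if (fileName === "") return "file name cannot be empty"; // scenario 3
  if (bytes.length === 0) return "file must be non-empty"; // scenario 4
  const ext = extensionOf(fileName);
  if (!SAMPLE_ALLOWED.has(ext)) return `.${ext} extension is not allowed`; // scenario 5
  return null;
}
```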

6. Upload with an allowed extension succeeds

All extensions in the domain allowlist are accepted.

Steps:

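  1. For each of the following extensions, upload a minimal byte array with a file_name matching the extension (e.g., "doc.pdf", "data.csv"): png, jpg, pdf, docx, csv. For png and jpg, start the bytes with the matching image signature, because magic-byte validation rejects image uploads whose content does not match the extension.

Expected: Each upload returns an Attachment object with the correct file_extension and no validation error. At minimum, png, pdf, and csv must succeed.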
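Per-extension fixtures that respect the magic-byte rule can be built as follows. The headers are the standard PNG, JPEG, PDF, and ZIP (the container format of .docx) signatures, plus a tiny CSV text. `fixtureFor` is a hypothetical helper. This spec does not say whether a bare header is enough for pdf or docx; per the Notes, a mismatch there would only log a warning.

```typescript
// Leading bytes for each sampled extension.
const HEADERS: Record<string, number[]> = {
  png: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a],
  jpg: [0xff, 0xd8, 0xff, 0xe0],
  pdf: [0x25, 0x50, 0x44, 0x46, 0x2d], // "%PDF-"
  docx: [0x50, 0x4b, 0x03, 0x04], // ZIP local file header "PK\x03\x04"
  csv: [0x61, 0x2c, 0x62, 0x0a], // "a,b\n"
};

interface Fixture {
  file_name: string;
  bytes: Uint8Array;
}

function fixtureFor(ext: string): Fixture {
  const header = HEADERS[ext];
  if (header === undefined) throw new Error(`no fixture for .${ext}`);
  return { file_name: `sample.${ext}`, bytes: Uint8Array.from(header) };
}

const fixtures = Object.keys(HEADERS).map(fixtureFor);
```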

7. SHA-256 deduplication — same content returns the same attachment

Uploading the same file content twice does not create a second on-disk copy. The existing attachment is returned.

Steps:

  1. Call upload_attachment_bytes with file_name: "first.png" and a fixed byte array (e.g., [0x89, 0x50, 0x4E, 0x47, ...]).
  2. Record the returned id from the first upload.
  3. Call upload_attachment_bytes again with file_name: "second.png" and the same byte array.
  4. Record the returned id from the second upload.

Expected: Both calls succeed and return the same id. Only one file exists at {workspace}/attachments/{id}.png. The original_filename field reflects the first upload ("first.png"), not "second.png". This confirms that content-addressable deduplication is working.
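The expected behavior can be modeled in memory to make the contract concrete. `DedupStore` is a hypothetical stand-in for the real repository and get_by_hash lookup. It keeps the property described in the Notes: the hash lookup happens before anything is stored, so a repeat upload returns the first record unchanged.

```typescript
import { createHash, randomUUID } from "node:crypto";

interface StoredAttachment {
  id: string;
  original_filename: string;
  content_hash: string; // lowercase SHA-256 hex
}

// Hypothetical in-memory model of content-addressable deduplication.
class DedupStore {
  private byHash = new Map<string, StoredAttachment>();

  upload(fileName: string, bytes: Uint8Array): StoredAttachment {
    const hash = createHash("sha256").update(bytes).digest("hex");
    const existing = this.byHash.get(hash);
    if (existing) return existing; // dedup hit: no new file, no new record
    const record = { id: randomUUID(), original_filename: fileName, content_hash: hash };
    this.byHash.set(hash, record);
    return record;
  }

  get size(): number {
    return this.byHash.size;
  }
}
```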

8. Get attachment metadata by ID

get_attachment returns the stored metadata for an uploaded attachment.

Steps:

  1. Upload a file and record its id.
  2. Call get_attachment with the id.

Expected: The response matches the metadata returned by the upload: same id, original_filename, file_extension, content_type, size_bytes, and content_hash. The created_at and updated_at timestamps are present and non-empty.
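A field-by-field comparison for this scenario is sketched below. `metadataDiff` and `hasTimestamps` are hypothetical helpers. Values are compared through `JSON.stringify`, on the assumption that content_type may serialize as a structured value. That comparison is sensitive to key order inside nested objects.

```typescript
// Fields that must be identical between the upload response and get_attachment.
const COMPARED_FIELDS = [
  "id", "original_filename", "file_extension", "content_type", "size_bytes", "content_hash",
] as const;

type Metadata = Record<string, unknown>;

// Names of compared fields whose values differ; empty means the metadata matches.
function metadataDiff(uploaded: Metadata, fetched: Metadata): string[] {
  return COMPARED_FIELDS.filter(
    (field) => JSON.stringify(uploaded[field]) !== JSON.stringify(fetched[field]),
  );
}

// created_at and updated_at only need to be present and non-empty here.
function hasTimestamps(meta: Metadata): boolean {
  return ["created_at", "updated_at"].every((field) => {
    const value = meta[field];
    return typeof value === "string" && value.length > 0;
  });
}
```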

9. Get attachment metadata for non-existent ID returns not-found

Requesting metadata for an ID that was never uploaded returns an appropriate error.

Steps:

  1. Generate a random UUID (do not upload any file with this ID).
  2. Call get_attachment with the random UUID.

Expected: A “not found” error is returned. No crash or internal server error occurs.

10. Get attachment file content

get_attachment_file returns the raw bytes of an uploaded file.

Steps:

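  1. Upload a file with known content (e.g., a minimal PNG byte array). Image uploads must carry matching magic bytes, so plain text such as "test content" under a .png name would be rejected at upload.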
  2. Record the id.
  3. Call get_attachment_file with the id.

Expected: The response body contains the original file bytes exactly. The Content-Type header matches the attachment’s MIME type (e.g., image/png). The byte count equals the uploaded size_bytes.
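A byte-for-byte comparison for this scenario. It assumes the test obtains the response body as a `Uint8Array`, for example via `new Uint8Array(await response.arrayBuffer())` with the Fetch API. `bytesEqual` is a hypothetical helper.

```typescript
// True when both buffers have the same length and identical bytes.
function bytesEqual(expected: Uint8Array, actual: Uint8Array): boolean {
  if (expected.length !== actual.length) return false; // also covers the size_bytes check
  for (let i = 0; i < expected.length; i++) {
    if (expected[i] !== actual[i]) return false;
  }
  return true;
}
```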

11. List all attachments in the workspace

list_attachments without a page_id filter returns all attachments in the workspace.

Steps:

  1. Upload three files: "alpha.png", "beta.pdf", "gamma.txt".
  2. Call list_attachments with no page_id argument.

Expected: The response is an array of at least 3 Attachment objects. Each uploaded file appears in the list. The list includes correct metadata for each entry.
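A membership check for the list response is sketched below. Give each of the three files distinct bytes: scenario 7's deduplication would otherwise return a single attachment carrying the first file's name. `missingFromList` is a hypothetical helper, and it assumes each list entry exposes original_filename.

```typescript
interface Listed {
  original_filename: string;
}

// Names from `expected` that do not appear in the list_attachments response.
function missingFromList(listed: Listed[], expected: string[]): string[] {
  const present = new Set(listed.map((entry) => entry.original_filename));
  return expected.filter((fileName) => !present.has(fileName));
}
```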

12. List attachments scoped to a page

list_attachments filtered by page_id returns only attachments referenced by that page.

Steps:

  1. Create two pages: “Page Alpha” and “Page Beta”.
  2. Upload "alpha.png" and associate it with “Page Alpha” (via upsert_references or by embedding in the page content).
  3. Upload "beta.png" and associate it with “Page Beta”.
  4. Call list_attachments with the page_id of “Page Alpha”.

Expected: The response contains only "alpha.png". "beta.png" does not appear. The list is correctly scoped to the specified page.
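The scoping rule can be modeled as a filter over a page-to-attachment reference map. The model is hypothetical, and the real references live in SQLite. "alpha.png" and "beta.png" need different bytes here. Identical content would deduplicate into one attachment referenced by both pages, and it would then legitimately appear for "Page Alpha".

```typescript
// Hypothetical model: page id -> ids of attachments that page references.
type References = Map<string, Set<string>>;

interface PageAttachment {
  id: string;
  original_filename: string;
}

// Only attachments referenced by `pageId`, mirroring the expected filter.
function listForPage(all: PageAttachment[], refs: References, pageId: string): PageAttachment[] {
  const ids = refs.get(pageId) ?? new Set<string>();
  return all.filter((attachment) => ids.has(attachment.id));
}
```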

13. Delete an attachment — metadata and file removed

delete_attachment removes both the SQLite metadata row and the on-disk file.

Steps:

  1. Upload a file and record its id and file_extension.
  2. Verify the file exists at {workspace}/attachments/{id}.{ext}.
  3. Call delete_attachment with the id.
  4. Call get_attachment with the same id.

Expected: The delete succeeds with no error. The subsequent get_attachment call returns a “not found” error. The on-disk file no longer exists at its original path.
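The on-disk part of this check can cover the cached image variants as well as the primary file, using the storage and cache patterns from Test Data. It assumes POSIX paths. The helper names are hypothetical. Cache files may never have been created for a given attachment, so their absence before the delete is not treated as a failure.

```typescript
import { existsSync } from "node:fs";
import { posix } from "node:path";

// Every path that should be gone after delete_attachment.
function pathsRemovedOnDelete(workspace: string, id: string, ext: string): string[] {
  const dir = posix.join(workspace, "attachments");
  return [
    posix.join(dir, `${id}.${ext}`),
    posix.join(dir, ".cache", `${id}_thumb.webp`),
    posix.join(dir, ".cache", `${id}_display.webp`),
  ];
}

// Paths that still exist after the delete; an empty array means cleanup succeeded.
function leftoversAfterDelete(workspace: string, id: string, ext: string): string[] {
  return pathsRemovedOnDelete(workspace, id, ext).filter((p) => existsSync(p));
}
```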

14. Delete an attachment with active page references is blocked

delete_attachment (single delete) is blocked when a page still references the attachment, to prevent dangling references.

Steps:

  1. Upload a file and record its id.
  2. Associate the attachment with a page via upsert_references.
  3. Call delete_attachment with the id.

Expected: The delete returns a validation error indicating the attachment is “referenced by” one or more pages. The attachment metadata and file on disk are unchanged. get_attachment still succeeds.
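The reference guard is modeled below. `GuardedAttachments` is hypothetical. Its `force` flag stands in for the BulkDeleteAttachmentsUseCase override mentioned in the Notes, and the error text is a placeholder that only echoes the "referenced by" wording.

```typescript
// Hypothetical model of the single-delete reference guard.
class GuardedAttachments {
  private ids = new Set<string>();
  private refs = new Map<string, Set<string>>(); // attachment id -> referencing page ids

  add(id: string): void {
    this.ids.add(id);
  }

  reference(id: string, pageId: string): void {
    const pages = this.refs.get(id) ?? new Set<string>();
    pages.add(pageId);
    this.refs.set(id, pages);
  }

  // Throws while any page still references the attachment, unless forced.
  deleteAttachment(id: string, force = false): void {
    const pages = this.refs.get(id);
    if (!force && pages !== undefined && pages.size > 0) {
      throw new Error(`attachment ${id} is referenced by ${pages.size} page(s)`);
    }
    this.ids.delete(id);
    this.refs.delete(id);
  }

  has(id: string): boolean {
    return this.ids.has(id);
  }
}
```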

15. Per-file size limit enforcement

Files exceeding the 10 MB per-file limit (use-case layer) or 16 MB limit (Tauri command layer for bytes upload) are rejected.

Steps:

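  1. Construct a byte array larger than 10 MB (e.g., 11 * 1024 * 1024 bytes of zeros) with file_name: "huge.pdf". At 11 MB the array stays under the 16 MB Tauri limit, so this step exercises the use-case per-file limit.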
  2. Call upload_attachment_bytes with this oversized byte array.

Expected: A quota or validation error is returned. The error message mentions the file size limit (either “per-file limit” or “exceeds maximum”). No file is written to disk.
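The layer expected to reject a given payload size through the upload_attachment_bytes Tauri command is sketched below. The sketch assumes the Tauri check runs before the use-case check and that "exceeding" means strictly greater than the limit; neither detail is confirmed by this spec. `rejectingLayer` is a hypothetical name.

```typescript
const USE_CASE_LIMIT_BYTES = 10 * 1024 * 1024; // per-file limit, use-case layer
const TAURI_BYTES_LIMIT = 16 * 1024 * 1024; // upload_attachment_bytes, Tauri command layer

type RejectingLayer = "tauri-command" | "use-case" | null;

// Which layer should reject a payload of `sizeBytes`, or null if none should.
function rejectingLayer(sizeBytes: number): RejectingLayer {
  if (sizeBytes > TAURI_BYTES_LIMIT) return "tauri-command";
  if (sizeBytes > USE_CASE_LIMIT_BYTES) return "use-case";
  return null;
}
```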

16. Attachment metadata includes all expected fields

The Attachment struct returned from any upload command includes all required metadata fields.

Steps:

  1. Upload a file "metadata-check.png" with a minimal PNG byte array.
  2. Inspect all fields of the returned Attachment.

Expected: The returned object contains: id (non-nil UUID), original_filename ("metadata-check.png"), file_extension ("png"), content_type (Png variant), size_bytes (> 0), content_hash (64-character hex SHA-256 string), created_at (ISO timestamp), updated_at (ISO timestamp), and sync_status ("local_only").
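A shape check for this scenario is sketched below. `shapeProblems` is hypothetical, and content_type is only checked for presence because its serialized form is not specified here. The timestamp check uses `Date.parse`, which is looser than a strict ISO 8601 validator. The content_hash pattern follows the Notes (lowercase, 64 hex characters).

```typescript
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NIL_UUID = "00000000-0000-0000-0000-000000000000";
const SHA256_HEX_RE = /^[0-9a-f]{64}$/;

function isTimestamp(value: unknown): boolean {
  return typeof value === "string" && !isNaN(Date.parse(value));
}

// Names of fields that fail scenario 16 for a .png upload; empty means the shape passes.
function shapeProblems(a: Record<string, unknown>, expectedName: string): string[] {
  const problems: string[] = [];
  const id = a["id"];
  if (typeof id !== "string" || !UUID_RE.test(id) || id === NIL_UUID) problems.push("id");
  if (a["original_filename"] !== expectedName) problems.push("original_filename");
  if (a["file_extension"] !== "png") problems.push("file_extension");
  if (a["content_type"] === undefined) problems.push("content_type");
  const size = a["size_bytes"];
  if (typeof size !== "number" || size <= 0) problems.push("size_bytes");
  const hash = a["content_hash"];
  if (typeof hash !== "string" || !SHA256_HEX_RE.test(hash)) problems.push("content_hash");
  if (!isTimestamp(a["created_at"])) problems.push("created_at");
  if (!isTimestamp(a["updated_at"])) problems.push("updated_at");
  if (a["sync_status"] !== "local_only") problems.push("sync_status");
  return problems;
}
```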

17. Attachment persists after page navigation

Uploaded attachments are not lost when the user navigates away from the page and returns.

Steps:

  1. Upload a file and record its id.
  2. Navigate to a different page.
  3. Navigate back to the original page.
  4. Call get_attachment with the recorded id.

Expected: The attachment is still retrievable with all original metadata intact. No data was lost during navigation.

Test Data

| Key | Value | Notes |
|---|---|---|
| test_png_name | test.png | Filename for basic upload scenarios |
| test_png_bytes | minimal PNG magic bytes | Smallest valid PNG (26 bytes minimum) |
| disallowed_extension | exe | Must be rejected by allowlist |
| allowed_extensions | png, jpg, pdf, docx, csv | Sampling of allowed extensions for coverage |
| dedup_content | fixed byte array (same for both uploads) | Identical bytes to trigger SHA-256 dedup |
| per_file_limit_bytes | 10 * 1024 * 1024 (10 MB) | Use-case level limit |
| tauri_bytes_limit | 16 * 1024 * 1024 (16 MB) | Tauri command layer limit for upload_attachment_bytes |
| storage_path_pattern | {workspace}/attachments/{uuid}.{ext} | On-disk file naming convention |
| cache_thumb_pattern | {workspace}/attachments/.cache/{uuid}_thumb.webp | Thumbnail cache path cleaned on delete |
| cache_display_pattern | {workspace}/attachments/.cache/{uuid}_display.webp | Display cache path cleaned on delete |
| default_sync_status | local_only | Initial sync status for all uploaded attachments |

Notes

  • The attachment commands (upload_attachment, upload_attachment_bytes, get_attachment, get_attachment_file, list_attachments, delete_attachment) are Tauri IPC commands invoked from the React frontend. If the HTTP bridge router in use does not expose them, E2E tests for these scenarios must run against the full desktop app rather than the bridge.
  • upload_attachment validates that the file path exists and is a regular file before reading. Path traversal characters are rejected by validate_ipc_path at the Tauri command layer.
  • SHA-256 deduplication is content-addressable: the get_by_hash lookup runs before file write and quota checks. If a hash match is found, the existing Attachment is returned immediately without writing a new file.
  • delete_attachment (single) is blocked by active page references. BulkDeleteAttachmentsUseCase with force: true overrides this guard. Single-delete tests should verify that references are cleared before expecting a successful delete.
  • When an attachment is deleted, both the primary file ({uuid}.{ext}) and any cached image variants ({uuid}_thumb.webp, {uuid}_display.webp in the .cache/ subdirectory) are removed.
  • The domain validation runs at creation time via Attachment::new. Infrastructure reads use Attachment::from_parts (bypasses validation). This means a file stored with an extension that later becomes disallowed will still be readable.
  • Magic-byte validation applies to image content types only. If the file extension says PNG but the bytes are JPEG magic, the upload is rejected. Non-image files (PDFs, office docs) only log a warning on mismatch.
  • The content_hash field is a lowercase SHA-256 hex digest (64 characters). Tests checking this field should verify the length and character class, not an exact value.
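The magic-byte rule above can be illustrated with a small model. It recognizes only three signatures (PNG, JPEG, PDF) and is not the real detector, which may recognize more types. `magicVerdict` is a hypothetical name. A mismatch yields "reject" for image extensions and "warn" for everything else, and extensions without a modeled signature pass through.

```typescript
type Verdict = "ok" | "reject" | "warn";

// Leading signatures modeled here; the real detector may know many more.
const SIGNATURES: Record<string, number[]> = {
  png: [0x89, 0x50, 0x4e, 0x47],
  jpg: [0xff, 0xd8, 0xff],
  pdf: [0x25, 0x50, 0x44, 0x46], // "%PDF"
};
const IMAGE_EXTENSIONS = new Set(["png", "jpg"]);

function magicVerdict(ext: string, bytes: Uint8Array): Verdict {
  const sig = SIGNATURES[ext];
  if (sig === undefined) return "ok"; // no signature modeled for this extension
  const matches = sig.every((b, i) => bytes[i] === b);
  if (matches) return "ok";
  return IMAGE_EXTENSIONS.has(ext) ? "reject" : "warn"; // images are strict, others only warn
}
```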
