Colibri supports uploading EPUB, MOBI, and PDF ebooks with automatic metadata extraction and enrichment from external sources.
Supported Formats
Colibri can read and store the following ebook formats:
| Format | Extension | Metadata Support | Cover Extraction | Features |
|---|---|---|---|---|
| EPUB | .epub | Full support | Yes | Best format, native support for all metadata fields including series, authors, publisher, description, and subjects |
| MOBI | .mobi, .azw, .azw3 | Full support | Yes | Full metadata including Amazon-specific fields, creator information, and cover images |
| PDF | .pdf | Limited | Sometimes | Basic metadata (title, author, subject); cover extraction from the first page |
Format Recommendations
- EPUB: Recommended for fiction and non-fiction. Best metadata support and smallest file sizes.
- MOBI/AZW: Good for Kindle books. Full metadata support but larger files.
- PDF: Suitable for technical books and documents, but limited metadata extraction.
Upload Methods
Single Upload
Upload one book at a time with immediate metadata review:
- Click the Upload button in the sidebar (or press Cmd+U/Ctrl+U)
- Drag and drop a file onto the upload area, or click Browse to select a file
- Wait for the file to upload and metadata to be extracted (usually 2-5 seconds)
- Review the extracted metadata:
  - Title
  - Authors/Contributors
  - Cover image
  - Publisher and publication date
  - Description/Synopsis
  - ISBN and other identifiers
  - Language
  - Series information (if available)
- Edit any fields as needed
- Click Enrich Metadata to fetch additional data from external sources (optional)
- Click Save to add the book to your library
When to use single upload:
- You want to carefully review each book’s metadata
- You’re uploading rare or unusual books that may need manual metadata corrections
- You want to immediately add the book to specific collections
Bulk Upload
Upload multiple books at once for faster library building:
- Click the Upload button in the sidebar
- Select multiple files at once:
  - Drag and drop: Select multiple files in your file explorer and drag them to the upload area
  - File browser: Hold Cmd (Mac) or Ctrl (Windows/Linux) while selecting multiple files
- Files are added to the upload queue and processed automatically
- Monitor progress in the Queue Status Widget (bottom-right corner), which shows:
  - Total files in the queue
  - The file currently being processed
  - Processing status (extracting, enriching, saving)
  - Any errors encountered
- Each book is processed sequentially to avoid overwhelming the database
- Review and edit metadata for each book as it completes processing
- Click Save on each book to add it to your library
Queue Processing:
- Files are processed one at a time to ensure data integrity
- Average processing time: 5-10 seconds per book (depending on enrichment)
- Queue persists across page refreshes
- You can continue browsing while uploads process in the background
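Processing one file at a time amounts to awaiting each upload inside a loop, so the next file does not start until the previous one has finished or failed. The sketch below is a minimal illustration under assumptions: `processQueue`, `QueueItem`, and the `processFile` callback (which would perform the extracting, enriching, and saving stages) are hypothetical names, not Colibri's API. Errors are recorded per item, so one bad file does not stop the rest of the queue.

```typescript
export type Stage =
  | "queued" | "uploading" | "extracting" | "enriching" | "saving" | "done" | "failed";

// Hypothetical queue entry; Colibri's real queue state may differ.
export interface QueueItem {
  fileName: string;
  stage: Stage;
  error?: string;
}

// Processes items strictly one after another. `processFile` is a hypothetical
// callback that reports its progress through `setStage`.
export async function processQueue(
  items: QueueItem[],
  processFile: (item: QueueItem, setStage: (stage: Stage) => void) => Promise<void>,
): Promise<void> {
  for (const item of items) {
    if (item.stage === "done") continue; // already handled, e.g. before a page refresh
    try {
      await processFile(item, (stage) => {
        item.stage = stage;
      });
      item.stage = "done";
    } catch (error) {
      // Record the failure and move on to the next file.
      item.stage = "failed";
      item.error = error instanceof Error ? error.message : String(error);
    }
  }
}
```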
When to use bulk upload:
- You’re importing an existing library
- You have many books with good embedded metadata
- You trust the automatic enrichment process
Upload from CLI
Advanced users can upload books using the Colibri CLI:
```shell
# Upload a single book
colibri works add ./path/to/book.epub

# Upload all books in a directory
colibri works import ./path/to/books/
```

See the CLI Documentation for more details.
Automatic Metadata Extraction
When you upload a book, Colibri automatically extracts embedded metadata from the ebook file:
Core Metadata
- Title: Main title and subtitle (if available)
- Sort Title: Automatically generated for proper alphabetical sorting (e.g., “Great Gatsby, The” for “The Great Gatsby”)
- Authors: Creator names with roles (author, editor, translator, illustrator, etc.)
- Publisher: Publishing company name
- Publication Date: Original publication date or edition date
- Language: ISO 639-1 language code (e.g., “en” for English, “de” for German)
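The sort-title rule can be illustrated with a small function. This is a minimal sketch that assumes only the English leading articles "A", "An", and "The" are moved to the end. `toSortTitle` is a hypothetical name, not Colibri's actual function, and the real implementation may handle other languages and edge cases.

```typescript
// Assumption: only English leading articles are considered.
const LEADING_ARTICLES = ["the", "an", "a"];

// Hypothetical helper: "The Great Gatsby" -> "Great Gatsby, The".
export function toSortTitle(title: string): string {
  const trimmed = title.trim();
  const spaceIndex = trimmed.search(/\s/);
  if (spaceIndex === -1) {
    return trimmed; // single-word titles are left unchanged
  }
  const firstWord = trimmed.slice(0, spaceIndex);
  const rest = trimmed.slice(spaceIndex).trim();
  if (LEADING_ARTICLES.includes(firstWord.toLowerCase())) {
    return `${rest}, ${firstWord}`; // move the article to the end
  }
  return trimmed;
}
```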
Identifiers
- ISBN: ISBN-10 and/or ISBN-13 (automatically validated and normalized)
- ASIN: Amazon Standard Identification Number (for Kindle books)
- DOI: Digital Object Identifier (for academic publications)
- LCCN: Library of Congress Control Number
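ISBN validation relies on the standard check digits: a weighted sum modulo 11 for ISBN-10, and alternating weights of 1 and 3 modulo 10 for ISBN-13. The sketch below assumes that "normalized" means a hyphen-free ISBN-13, with ISBN-10 values converted by prefixing 978 and recomputing the check digit. That is an assumption, and Colibri may store both forms. `normalizeIsbn` is a hypothetical helper.

```typescript
// Hypothetical helper: returns a hyphen-free ISBN-13, or null if the input is invalid.
export function normalizeIsbn(raw: string): string | null {
  // Drop an optional "ISBN"/"ISBN-10:"/"ISBN-13:" prefix, then hyphens and spaces.
  const isbn = raw.replace(/^ISBN(?:-1[03])?:?/i, "").replace(/[\s-]/g, "").toUpperCase();

  if (/^\d{9}[\dX]$/.test(isbn)) {
    if (!isValidIsbn10(isbn)) return null;
    // ISBN-10 -> ISBN-13: prefix "978", drop the old check digit, compute a new one.
    const body = "978" + isbn.slice(0, 9);
    return body + isbn13CheckDigit(body);
  }
  if (/^\d{13}$/.test(isbn)) {
    return isbn13CheckDigit(isbn.slice(0, 12)) === isbn[12] ? isbn : null;
  }
  return null;
}

function isValidIsbn10(isbn: string): boolean {
  let sum = 0;
  for (let i = 0; i < 10; i++) {
    const digit = isbn[i] === "X" ? 10 : Number(isbn[i]); // "X" stands for 10
    sum += digit * (10 - i); // weights 10, 9, ..., 1
  }
  return sum % 11 === 0;
}

function isbn13CheckDigit(first12: string): string {
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    sum += Number(first12[i]) * (i % 2 === 0 ? 1 : 3); // alternating weights 1 and 3
  }
  return String((10 - (sum % 10)) % 10);
}
```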
Content Metadata
- Description: Book synopsis or summary
- Subjects/Tags: Categories, genres, and topics (automatically parsed from BISAC codes if present)
- Series: Series name and position/volume number
- Page Count: Number of pages (for EPUB and MOBI)
Visual Elements
- Cover Image: Extracted from the ebook file, with thumbnails generated automatically
- Blurhash: Low-resolution preview for fast loading
Format-Specific Extraction
EPUB:
- Reads metadata from the content.opf package file
- Extracts Dublin Core metadata
- Parses Calibre metadata extensions
- Supports series information from meta tags
MOBI:
- Reads metadata from EXTH records
- Extracts Amazon-specific fields
- Supports creator information and contributor roles
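EXTH records use a simple type-length-value layout. The block starts with the magic "EXTH", a header length, and a record count, all big-endian. Each record holds a 4-byte type, a 4-byte length that includes its own 8-byte header, and the data. The sketch below assumes you already have the bytes of the EXTH block, so locating it inside the MOBI header is omitted. It also assumes the book's text encoding is UTF-8. The record-type numbers follow community documentation such as the MobileRead wiki. `readExthRecords` is a hypothetical name, not Colibri's parser.

```typescript
// Text-valued EXTH record types, as listed in community format documentation.
const EXTH_TEXT_FIELDS: Record<number, string> = {
  100: "author",
  101: "publisher",
  103: "description",
  104: "isbn",
  105: "subject",
  106: "publishingDate",
  113: "asin",
  503: "updatedTitle",
  524: "language",
};

// Parses an EXTH block starting at offset 0 of `bytes`. DataView reads
// big-endian integers by default, which matches the format.
export function readExthRecords(bytes: Uint8Array): Map<string, string[]> {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const magic = Array.from(bytes.subarray(0, 4), (b) => String.fromCharCode(b)).join("");
  if (magic !== "EXTH") throw new Error("Not an EXTH block");

  const recordCount = view.getUint32(8); // bytes 4-7 hold the header length
  const decoder = new TextDecoder("utf-8"); // assumption: the book declares UTF-8
  const fields = new Map<string, string[]>();

  let offset = 12; // first record follows the 12-byte block header
  for (let i = 0; i < recordCount; i++) {
    const type = view.getUint32(offset);
    const length = view.getUint32(offset + 4); // includes the 8-byte record header
    if (length < 8) throw new Error(`Malformed EXTH record at offset ${offset}`);
    const name = EXTH_TEXT_FIELDS[type];
    if (name !== undefined) {
      const value = decoder.decode(bytes.subarray(offset + 8, offset + length));
      // Records such as author (100) and subject (105) may repeat.
      fields.set(name, [...(fields.get(name) ?? []), value]);
    }
    offset += length;
  }
  return fields;
}
```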
PDF:
- Reads metadata from PDF Info dictionary
- Limited to basic fields (title, author, subject, keywords)
- Attempts to extract cover from first page
Metadata Enrichment
After extracting embedded metadata, Colibri can automatically enrich it with data from external sources.
What Gets Enriched?
Enrichment fills in missing fields and improves existing metadata:
- Missing authors: Find creator names from ISBN lookup
- Missing publisher: Discover publisher information
- Missing description: Add synopsis from book databases
- Missing subjects: Add genre tags and categories
- Missing series: Detect series relationships
- Cover image: Find higher-quality covers
- Publication details: Add precise publication dates
- Identifiers: Discover additional ISBNs and identifiers
Enrichment Process
Trigger enrichment:
- Automatically during upload (if enabled in settings)
- Manually by clicking Enrich Metadata on the upload form
- Later from the book detail page by clicking Fetch Metadata
Provider selection:
- Colibri queries multiple metadata providers in parallel
- Default providers: Open Library, WikiData, Library of Congress
- Additional providers (requires API keys): Google Books, ISBNdb, Springer
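Querying providers in parallel means one slow or failing provider does not hold up the others. The sketch below is a minimal illustration that assumes a hypothetical `MetadataProvider` interface with a `lookupByIsbn` method, which is not Colibri's actual provider API. Every provider is called concurrently, each call is bounded by a timeout whose default is an assumed value, and failures are collected with `Promise.allSettled` rather than aborting the whole lookup.

```typescript
// Hypothetical provider interface; Colibri's real provider API may differ.
export interface MetadataProvider {
  name: string;
  lookupByIsbn(isbn: string): Promise<Record<string, string>>;
}

export interface ProviderResult {
  provider: string;
  fields: Record<string, string>;
}

// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error: unknown) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Queries all providers concurrently; a slow or failing provider only adds an error entry.
export async function queryProviders(
  providers: MetadataProvider[],
  isbn: string,
  timeoutMs = 10_000, // assumed value, not a documented Colibri default
): Promise<{ results: ProviderResult[]; errors: string[] }> {
  const settled = await Promise.allSettled(
    providers.map(async (provider): Promise<ProviderResult> => {
      try {
        const fields = await withTimeout(provider.lookupByIsbn(isbn), timeoutMs);
        return { provider: provider.name, fields };
      } catch (error) {
        const reason = error instanceof Error ? error.message : String(error);
        throw new Error(`${provider.name}: ${reason}`); // keep track of which provider failed
      }
    }),
  );

  const results: ProviderResult[] = [];
  const errors: string[] = [];
  for (const outcome of settled) {
    if (outcome.status === "fulfilled") results.push(outcome.value);
    else errors.push(outcome.reason instanceof Error ? outcome.reason.message : String(outcome.reason));
  }
  return { results, errors };
}
```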
Data aggregation:
- Results from all providers are collected
- Confidence scores are calculated for each field
- Conflicts are detected and resolved automatically
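Per-field aggregation can be pictured as a vote: every provider proposes values, and each proposal carries that provider's confidence. The sketch below makes several assumptions. Each provider has a single confidence between 0 and 1. Values that differ only in case or surrounding whitespace count as agreement. The value with the highest summed confidence wins, and a field is flagged as a conflict when providers proposed more than one distinct value. Colibri's actual scoring may differ, and `aggregate` is a hypothetical name.

```typescript
// Hypothetical input: the fields one provider returned plus an assumed 0..1 confidence.
export interface Candidate {
  source: string;
  confidence: number;
  fields: Record<string, string>;
}

interface Vote {
  value: string;
  score: number; // summed confidence of every source proposing this value
  sources: string[];
}

export interface AggregatedField extends Vote {
  conflict: boolean; // true when sources proposed more than one distinct value
}

export function aggregate(candidates: Candidate[]): Record<string, AggregatedField> {
  // field name -> normalized value -> accumulated vote
  const votes = new Map<string, Map<string, Vote>>();
  for (const { source, confidence, fields } of candidates) {
    for (const [field, raw] of Object.entries(fields)) {
      const value = raw.trim();
      const key = value.toLowerCase(); // case variants count as agreement
      const perField = votes.get(field) ?? new Map<string, Vote>();
      const vote: Vote = perField.get(key) ?? { value, score: 0, sources: [] };
      vote.score += confidence;
      vote.sources.push(source);
      perField.set(key, vote);
      votes.set(field, perField);
    }
  }

  const result: Record<string, AggregatedField> = {};
  for (const [field, perField] of votes) {
    // The value backed by the most combined confidence wins.
    const ranked = [...perField.values()].sort((a, b) => b.score - a.score);
    const best = ranked[0];
    if (best) result[field] = { ...best, conflict: ranked.length > 1 };
  }
  return result;
}
```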
Preview and apply:
- Review suggested changes before applying
- Each field shows the source and confidence score
- Accept all changes, or selectively apply specific fields
- Manually edit any values before saving
See Metadata Enrichment for detailed information about the enrichment system.
Enrichment Strategies
Conservative (Default):
- Only fills in missing fields
- Does not overwrite existing metadata
- Best for books with good embedded metadata
Aggressive:
- Overwrites existing metadata with higher-confidence data
- Replaces all fields if external sources have better data
- Best for books with poor or missing metadata
Merge:
- Combines embedded and external metadata
- Keeps all unique subjects/tags
- Adds additional authors and identifiers
- Best for comprehensive metadata coverage
Configure enrichment strategy in Settings > Instance > Metadata.
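The three strategies differ only in how an existing value and an enriched value are combined. The sketch below rests on several assumptions. List-valued fields (subjects, authors, identifiers) are arrays, and every other field is a single string. "Higher-confidence data" for Aggressive is simplified to "the enriched value wins whenever one exists". `applyStrategy` and the field names are hypothetical.

```typescript
export type Strategy = "conservative" | "aggressive" | "merge";

// Hypothetical metadata record: list fields are arrays, everything else a string.
export type Metadata = Record<string, string | string[] | undefined>;

function isEmpty(value: string | string[] | undefined): boolean {
  return value === undefined || value.length === 0;
}

export function applyStrategy(existing: Metadata, enriched: Metadata, strategy: Strategy): Metadata {
  const result: Metadata = { ...existing };
  for (const [field, incoming] of Object.entries(enriched)) {
    if (isEmpty(incoming)) continue;
    const current = result[field];
    if (isEmpty(current)) {
      result[field] = incoming; // every strategy fills missing fields
    } else if (strategy === "aggressive") {
      result[field] = incoming; // simplification: enriched data always wins
    } else if (strategy === "merge" && Array.isArray(current) && Array.isArray(incoming)) {
      result[field] = [...new Set([...current, ...incoming])]; // keep all unique entries
    }
    // Conservative (and merge, for single-valued fields) keeps the existing value.
  }
  return result;
}
```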
Duplicate Detection
Colibri automatically detects potential duplicates to prevent importing the same book multiple times.
Detection Methods
1. Exact File Match
- Compares SHA-256 hash of the file content
- Confidence: 100% (identical file)
- Action: Skip upload by default, or prompt to replace
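Exact file matching only needs a content hash: two files are byte-for-byte identical exactly when their SHA-256 digests match (barring an astronomically unlikely collision). The sketch below assumes hashing happens server-side in Node.js. In a browser, the Web Crypto call `crypto.subtle.digest("SHA-256", data)` would serve the same purpose. The function names are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

// Hex-encoded SHA-256 digest of in-memory bytes.
export function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Streams a file through the hash so large PDFs are never fully buffered in memory.
export function sha256OfFile(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash("sha256");
    createReadStream(path)
      .on("error", reject)
      .on("data", (chunk) => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")));
  });
}
```

Comparing the stored digest of each existing file with the digest of a new upload is then a simple string equality check.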
2. ISBN Match
- Compares ISBN-10 or ISBN-13 identifiers
- Confidence: 95%+ (same edition)
- Action: Add as new edition or prompt
- Note: Different formats (EPUB vs MOBI) of the same book will match
3. Fuzzy Title and Author Match
- Compares normalized titles and author names using Levenshtein distance
- Ignores case, punctuation, and common words (a, an, the)
- Confidence: 70-90% (depends on similarity)
- Action: Always prompts for review
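The fuzzy match can be sketched as normalization followed by Levenshtein distance. The distance counts the single-character insertions, deletions, and substitutions needed to turn one string into the other, and dividing by the longer length turns it into a similarity between 0 and 1. How Colibri weights title against author is not documented here. The sketch simply averages the two, which is an assumption, and all function names are hypothetical.

```typescript
// Lowercase, strip punctuation, drop "a", "an", "the", and collapse whitespace.
export function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "")
    .split(/\s+/)
    .filter((word) => word !== "" && !["a", "an", "the"].includes(word))
    .join(" ");
}

// Dynamic-programming Levenshtein distance, keeping one row of the table at a time.
export function levenshtein(a: string, b: string): number {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + cost, // substitution
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Similarity in [0, 1]; 1 means identical after normalization.
export function similarity(a: string, b: string): number {
  const x = normalize(a);
  const y = normalize(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Assumption: title and author similarity are weighted equally.
export function matchConfidence(titleA: string, authorA: string, titleB: string, authorB: string): number {
  return (similarity(titleA, titleB) + similarity(authorA, authorB)) / 2;
}
```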
Duplicate Handling Options
When a duplicate is detected, you can:
Skip Upload
- Discard the new file
- Keep the existing book
- No changes to your library
Replace Existing
- Delete the old file
- Upload the new file
- Keep existing metadata (unless you choose to re-enrich)
- Use when you have a better quality file
Add as New Edition
- Keep both files
- Link them as different editions of the same work
- Useful for different formats (EPUB and PDF) or different editions (original and annotated)
- Share metadata between editions
Import Anyway
- Treat as a completely separate work
- Useful for false positive matches
- Creates a new work entry
Duplicate Detection Settings
Configure detection sensitivity in Settings > Instance > Library:
- Strict: Only exact ISBN or file hash matches
- Normal (Default): ISBN matches and high-confidence title/author matches (85%+)
- Relaxed: Include lower-confidence fuzzy matches (70%+)
- Off: Disable duplicate detection entirely
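The sensitivity levels can be read as thresholds applied to each candidate match. This is a sketch under stated assumptions. The `DuplicateMatch` shape and `shouldReport` name are hypothetical. The thresholds are the percentages listed above, expressed as fractions. "Strict" is taken to ignore fuzzy matches entirely.

```typescript
export type Sensitivity = "strict" | "normal" | "relaxed" | "off";

// Hypothetical description of one candidate duplicate.
export interface DuplicateMatch {
  method: "hash" | "isbn" | "fuzzy";
  confidence: number; // 0..1; only consulted for fuzzy matches
}

// Minimum fuzzy-match confidence per level (null: fuzzy matches are never reported).
const FUZZY_THRESHOLD: Record<Sensitivity, number | null> = {
  strict: null,
  normal: 0.85,
  relaxed: 0.7,
  off: null,
};

export function shouldReport(match: DuplicateMatch, level: Sensitivity): boolean {
  if (level === "off") return false;
  // Exact file-hash and ISBN matches are reported at every enabled level.
  if (match.method === "hash" || match.method === "isbn") return true;
  const threshold = FUZZY_THRESHOLD[level];
  return threshold !== null && match.confidence >= threshold;
}
```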
Queue Status Monitoring
The Queue Status Widget in the bottom-right corner shows real-time upload progress.
Widget States
Idle
- No uploads in progress
- Widget is minimized or hidden
Processing
- Shows current file name and progress
- Displays processing stage:
  - Uploading: Transferring file to server
  - Extracting: Reading embedded metadata
  - Enriching: Querying external sources
  - Saving: Writing to database
- Progress bar shows completion percentage
Error
- Red indicator with error count
- Click to expand and see error details
- Options to retry failed uploads or skip them
Queue Actions
- Expand/Collapse: Click widget to show/hide details
- Pause: Temporarily stop processing
- Resume: Continue processing after pause
- Clear Queue: Remove all pending uploads
- Retry Failed: Attempt to re-process failed uploads
Queue Persistence
- Queue state is saved to browser storage
- Survives page refreshes and browser restarts
- Automatically resumes when you return to Colibri
- Queue is per-user (not shared across devices)
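Persisting the queue in browser storage can be sketched with the Web Storage `getItem`/`setItem` methods, which `window.localStorage` provides. Only serializable state such as file names and stages fits in localStorage. The file contents themselves would need IndexedDB or a re-upload, which this sketch does not cover. The storage key format and all function names are hypothetical. The user ID in the key is an assumption that keeps different accounts on a shared browser apart.

```typescript
// Minimal subset of the Web Storage API (window.localStorage implements it).
export interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

export interface PersistedItem {
  fileName: string;
  stage: string;
}

// Hypothetical key format, scoped per user.
const queueKey = (userId: string) => `colibri:upload-queue:${userId}`;

export function saveQueue(store: KeyValueStore, userId: string, items: PersistedItem[]): void {
  store.setItem(queueKey(userId), JSON.stringify(items));
}

export function loadQueue(store: KeyValueStore, userId: string): PersistedItem[] {
  const raw = store.getItem(queueKey(userId));
  if (raw === null) return []; // nothing saved yet
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as PersistedItem[]) : [];
  } catch {
    return []; // corrupted entry: start with an empty queue
  }
}
```

In a browser, `saveQueue(window.localStorage, userId, items)` would be called whenever the queue changes, and `loadQueue` once on page load to resume.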
Error Handling and Troubleshooting
Common Upload Errors
File Too Large
- Error: “File exceeds maximum size limit”
- Cause: File is larger than 100 MB
- Solution: Use a file compression tool or split into smaller files
- Note: Administrators can adjust the limit in instance settings
Unsupported Format
- Error: “File type not supported”
- Cause: File is not a valid EPUB, MOBI, or PDF
- Solution: Convert the file to a supported format using Calibre or similar tools
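Both checks above can be made without trusting the file extension. PDF files begin with "%PDF-". MOBI, AZW, and AZW3 files are PalmDB containers whose type and creator fields at byte 60 read "BOOKMOBI". The EPUB container specification requires an uncompressed `mimetype` entry first in the ZIP, so "mimetypeapplication/epub+zip" appears at byte 30. The sketch below combines this with the 100 MB limit, which it assumes means 100 MiB. The function names are hypothetical.

```typescript
export type EbookFormat = "epub" | "mobi" | "pdf";

export type UploadCheck =
  | { ok: true; format: EbookFormat }
  | { ok: false; error: string };

// Assumption: the 100 MB default limit is interpreted as 100 MiB.
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024;

function asciiAt(bytes: Uint8Array, offset: number, length: number): string {
  return Array.from(bytes.subarray(offset, offset + length), (b) => String.fromCharCode(b)).join("");
}

// Identifies the format from magic bytes rather than the file extension.
export function detectFormat(bytes: Uint8Array): EbookFormat | null {
  if (asciiAt(bytes, 0, 5) === "%PDF-") return "pdf";
  // PalmDB type + creator at offset 60.
  if (asciiAt(bytes, 60, 8) === "BOOKMOBI") return "mobi";
  // ZIP local file header, then the uncompressed "mimetype" entry at offset 30.
  if (asciiAt(bytes, 0, 4) === "PK\x03\x04" && asciiAt(bytes, 30, 28) === "mimetypeapplication/epub+zip") {
    return "epub";
  }
  return null;
}

export function validateUpload(bytes: Uint8Array): UploadCheck {
  if (bytes.byteLength > MAX_UPLOAD_BYTES) return { ok: false, error: "File exceeds maximum size limit" };
  const format = detectFormat(bytes);
  return format ? { ok: true, format } : { ok: false, error: "File type not supported" };
}
```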
Corrupt File
- Error: “Unable to read file” or “Invalid ebook format”
- Cause: File is damaged or incomplete
- Solution: Re-download the file or obtain a new copy
Metadata Extraction Failed
- Error: “Could not extract metadata”
- Cause: File has malformed metadata or uses an unusual format
- Solution: Upload will continue, but you’ll need to enter metadata manually
Enrichment Timeout
- Error: “Metadata enrichment timed out”
- Cause: External providers are slow or unavailable
- Solution: Skip enrichment and add metadata manually, or retry later
Storage Error
- Error: “Failed to save file to storage”
- Cause: S3 storage is misconfigured or unreachable
- Solution: Contact your administrator to check storage settings
Retry Logic
Colibri automatically retries transient errors:
- Network errors: 3 retries with exponential backoff
- Server errors (5xx): 2 retries
- Client errors (4xx): No retries (requires manual intervention)
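The retry rules can be expressed as a per-error-kind retry budget with exponential backoff, where each wait doubles the previous one. A minimal sketch under assumptions: the base delay is not documented, so 500 ms is a placeholder. This sketch also backs off between server-error retries, although the list above only specifies backoff for network errors. Errors that are not classified are treated like client errors, meaning no retry. `UploadError` and `withRetry` are hypothetical names, and the injectable `sleep` exists only to make the function testable.

```typescript
export type ErrorKind = "network" | "server" | "client";

// Retry budgets from the list above.
const MAX_RETRIES: Record<ErrorKind, number> = { network: 3, server: 2, client: 0 };

// Hypothetical error type that records how a failure was classified.
export class UploadError extends Error {
  readonly kind: ErrorKind;
  constructor(message: string, kind: ErrorKind) {
    super(message);
    this.kind = kind;
  }
}

export async function withRetry<T>(
  operation: () => Promise<T>,
  baseDelayMs = 500, // assumed placeholder; actual delays are not documented
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Unclassified errors are treated like client errors: no retry.
      const kind: ErrorKind = error instanceof UploadError ? error.kind : "client";
      if (attempt >= MAX_RETRIES[kind]) throw error;
      // Exponential backoff: base, 2 x base, 4 x base, ...
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```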
Getting Help
If you encounter persistent upload errors:
- Check the browser console for detailed error messages (F12 → Console tab)
- Try uploading a different file to isolate the issue
- Verify your storage configuration in Settings
- Contact your instance administrator
- Report bugs at https://github.com/colibri-hq/colibri/issues
Format Conversion Tips
Converting to EPUB
If you have books in unsupported formats (or want to standardize supported formats such as AZW on EPUB), convert them to EPUB:
Using Calibre (Recommended):
```shell
# Install Calibre
# Visit https://calibre-ebook.com/

# Convert single file
ebook-convert input.azw output.epub

# Convert with metadata
ebook-convert input.pdf output.epub --authors "Author Name" --title "Book Title"
```

Using Pandoc (for text-based formats):

```shell
# Install Pandoc
# Visit https://pandoc.org/

# Convert Markdown to EPUB
pandoc input.md -o output.epub --metadata title="Book Title"
```

Preserving Metadata During Conversion
- Use Calibre’s metadata editor before converting
- Export metadata from original file format
- Re-embed metadata after conversion
- Manually verify metadata after upload
Best Practices
Before Uploading
- Organize files: Use consistent naming (e.g., “Author - Title.epub”)
- Pre-clean metadata: Use Calibre to fix obvious metadata errors
- Check for duplicates: Search your library before uploading
- Verify file integrity: Ensure files are not corrupt
During Upload
- Use enrichment: Let Colibri fill in missing metadata automatically
- Review carefully: Check that automatic metadata is accurate
- Add to collections: Organize books immediately during upload
- Tag appropriately: Add custom tags for better discoverability
After Upload
- Verify cover images: Ensure covers loaded correctly
- Check series relationships: Verify series order if applicable
- Add reviews: Rate and review books you’ve read
- Share collections: Make your curated lists public (if desired)
Keyboard Shortcuts
Speed up your upload workflow with keyboard shortcuts:
- Cmd/Ctrl + U: Open upload modal
- Cmd/Ctrl + V: Paste files from clipboard (if supported by browser)
- Cmd/Ctrl + Enter: Save and add book to library
- Escape: Cancel upload and close modal
Next Steps
After uploading your books:
- Organize with Collections: Create custom reading lists
- Enrich Metadata: Improve book information
- Rate and Review: Share your opinions
- Search Your Library: Find books in your library