Architecture
System Overview
Receipt OCR App (Next.js on Cloudflare Workers)
├── Storage Brain SDK → Cloudflare R2 (file uploads)
├── Google Cloud Vision → OCR (text extraction)
├── OpenRouter → LLM (classification + chat)
└── @marlinjai/data-table-adapter-d1 → Cloudflare D1 (structured data)
┌──────────────────────────────────────────────────────────────────────┐
│            Receipt OCR App (Next.js / Cloudflare Workers)            │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌────────────┐  ┌──────────────┐  ┌──────────────┐  ┌───────────┐   │
│  │   Upload   │  │  Dashboard   │  │   AI Chat    │  │    API    │   │
│  │    Page    │  │  (4 views)   │  │   Sidebar    │  │  Routes   │   │
│  └─────┬──────┘  └──────┬───────┘  └──────┬───────┘  └─────┬─────┘   │
│        │                │                 │                │         │
│        ▼                ▼                 ▼                ▼         │
│  ┌───────────────────────────────────────────────────────────────┐   │
│  │                        Component Layer                        │   │
│  │ ┌──────────────┐  ┌─────────────┐  ┌────────────────────────┐ │   │
│  │ │   Receipt    │  │ Data Table  │  │      ChatSidebar       │ │   │
│  │ │   Uploader   │  │  (4 views)  │  │ (SSE + tool approval)  │ │   │
│  │ └──────┬───────┘  └──────┬──────┘  └────────────┬───────────┘ │   │
│  └────────┼─────────────────┼──────────────────────┼─────────────┘   │
│           │                 │                      │                 │
└───────────┼─────────────────┼──────────────────────┼─────────────────┘
            │                 │                      │
            ▼                 ▼                      ▼
      ┌───────────┐   ┌───────────────┐      ┌──────────────┐  ┌───────────────┐
      │  Storage  │   │  Data Table   │      │  OpenRouter  │  │ Google Cloud  │
      │ Brain SDK │   │    React +    │      │  (LLM API)   │  │  Vision API   │
      │           │   │  D1 Adapter   │      │              │  │     (OCR)     │
      └─────┬─────┘   └───────┬───────┘      └──────────────┘  └───────────────┘
            │                 │
            ▼                 ▼
      ┌───────────┐   ┌───────────────┐
      │ Cloudflare│   │  Cloudflare   │
      │    R2     │   │      D1       │
      └───────────┘   └───────────────┘
Page Structure
Upload Page (/)
- Drag-and-drop zone for images and PDFs (multi-file selection supported)
- Batch upload queue: files are processed sequentially through upload, OCR, classify, and save phases
- Per-file progress indicators with phase-level detail
- Failed files do not block the remaining queue
- Automatic redirect to dashboard when all files complete
Dashboard (/dashboard)
- Powered by @marlinjai/data-table-react
- 4 pre-configured views:
  - Table -- grouped by Category (default)
  - By Konto -- grouped by SKR03 account number
  - Board -- Kanban-style board grouped by Status
  - Calendar -- date-based view using the Date column
- Column management, multi-row selection, search, filter, pagination
- Inline cell editing
- AI Chat sidebar toggle
Data Flow
Upload Flow (Batch)
Users can select multiple files at once. Each file is added to a queue and processed sequentially through the full pipeline. Failed files do not block subsequent files.
User drops one or more images/PDFs (or clicks to browse)
        │
        ▼
Files added to upload queue (QueueItem[])
        │
        ▼
┌─── For each file in queue (sequential) ────────────────────────┐
│                                                                │
│  Phase 1: Upload to Storage Brain (R2)                         │
│      │                                                         │
│      ▼                                                         │
│  Phase 2: POST /api/ocr with fileId                            │
│      │  Fetch file from Storage Brain →                        │
│      │  send to Google Cloud Vision API                        │
│      │  (images: images:annotate, PDFs: files:annotate)        │
│      ▼                                                         │
│  Return OcrResult { fullText, blocks, confidence }             │
│      │                                                         │
│      ▼                                                         │
│  extractReceiptFields(ocrResult) → heuristic extraction        │
│      │  → vendor, gross, net, taxRate, date, category,         │
│      │    konto, name                                          │
│      ▼                                                         │
│  Phase 3: POST /api/classify-single (AI classification)        │
│      │  → category, konto, zuordnung, confidence, reasoning    │
│      ▼                                                         │
│  Phase 4: Create row in receipts table via D1 adapter          │
│      │                                                         │
│      ▼                                                         │
│  File marked done (or error) → next file begins                │
└────────────────────────────────────────────────────────────────┘
        │
        ▼
All files processed → redirect to dashboard
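The sequential queue with per-file error isolation can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `processQueue`, `PhaseRunner`, and the shape of `QueueItem` are assumptions, and the four phases are injected as plain async functions so the sketch does not depend on any SDK.

```typescript
// Hypothetical types and names; only the phase order comes from the flow above.
type Phase = "upload" | "ocr" | "classify" | "save";
type QueueStatus = "pending" | Phase | "done" | "error";

interface QueueItem {
  fileName: string;
  status: QueueStatus;
  error?: string;
}

// Each phase is injected, so the real upload/OCR/classify/save calls stay out of the sketch.
type PhaseRunner = (item: QueueItem) => Promise<void>;

const PHASES: Phase[] = ["upload", "ocr", "classify", "save"];

async function processQueue(
  queue: QueueItem[],
  runners: Record<Phase, PhaseRunner>,
  onProgress: (item: QueueItem) => void = () => {},
): Promise<QueueItem[]> {
  // Files are handled one at a time, in queue order.
  for (const item of queue) {
    try {
      for (const phase of PHASES) {
        item.status = phase; // phase-level progress for the UI
        onProgress(item);
        await runners[phase](item);
      }
      item.status = "done";
    } catch (err) {
      // A failure marks this file only; the loop continues with the next file.
      item.status = "error";
      item.error = err instanceof Error ? err.message : String(err);
    }
    onProgress(item);
  }
  return queue;
}
```

Catching errors inside the per-file loop, rather than around it, is what keeps one bad file from stopping the rest of the batch.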
AI Chat Flow
User opens chat sidebar → types message
        │
        ▼
POST /api/chat (streaming SSE)
        │  system prompt includes: table schema, select options,
        │  SKR03 mappings, zuordnung options, user rules
        ▼
LLM responds with text + optional tool_calls
        │
        ▼
Frontend receives SSE events:
  ├── text_delta → rendered incrementally
  ├── tool_use → displayed as pending action
  │     │
  │     ├── read-only tool (get_rows, get_columns, get_select_options)
  │     │     → auto-executed, result sent back as tool_result
  │     │
  │     └── write tool (update_cells, bulk_update, create_row, delete_rows)
  │           → requires user approval ("Apply" / "Apply All")
  │           → on approval: executed client-side, result sent back
  │
  └── done → response complete
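The split between auto-executed and approval-gated tools could look something like the sketch below. The tool names are the ones listed in the flow; the `ToolUse` shape, the `ToolDecision` type, and the choice to reject unknown tools are assumptions made for illustration.

```typescript
// Tool names come from the flow above; everything else here is hypothetical.
const READ_ONLY_TOOLS = new Set(["get_rows", "get_columns", "get_select_options"]);
const WRITE_TOOLS = new Set(["update_cells", "bulk_update", "create_row", "delete_rows"]);

interface ToolUse {
  id: string;
  name: string;
  input: unknown;
}

type ToolDecision = "auto_execute" | "needs_approval" | "reject";

function decideToolHandling(tool: ToolUse): ToolDecision {
  if (READ_ONLY_TOOLS.has(tool.name)) return "auto_execute"; // safe to run immediately
  if (WRITE_TOOLS.has(tool.name)) return "needs_approval"; // wait for "Apply" / "Apply All"
  return "reject"; // assumption: tools outside both lists are never executed
}

// Collects the write tools that an "Apply All" action would cover.
function pendingApprovals(tools: ToolUse[]): ToolUse[] {
  return tools.filter((t) => decideToolHandling(t) === "needs_approval");
}
```

Using explicit lists for both groups means a newly added tool is not executed until it has been deliberately placed in one of them.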
Field Extraction Engine
The extraction engine lives in src/lib/extract-receipt-fields.ts (~500 lines). It returns an ExtractionResult with name, vendor, gross, net, taxRate, date, category, and konto.
Amount Extraction (multi-pass)
- Net: looks for lines matching subtotal/netto/before-tax labels
- Tax: looks for tax/VAT/MwSt labels (excluding total lines)
- Gross (4 passes):
  - High-priority: "grand total", "amount due", "balance due"
  - Medium-priority: generic "total" (excluding subtotal/tax lines)
  - EU keywords: "gesamt", "summe", "brutto"
  - Fallback: largest amount found anywhere in the text
- Derivation: if two of the three values (net, tax, gross) are found, the third is calculated from the other two
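The derivation step can be illustrated with a small sketch. It assumes the three values are gross, net, and the tax amount, related by gross = net + tax, and that the tax rate is then tax / net as a rounded percentage. The names `Amounts`, `deriveMissingAmount`, and `taxRatePercent` are hypothetical and not taken from the module.

```typescript
// Assumption: "tax" is the tax amount, so gross = net + tax.
interface Amounts {
  gross?: number;
  net?: number;
  tax?: number;
}

// Round to cents to avoid floating-point noise such as 1.9000000000000004.
const round2 = (n: number): number => Math.round(n * 100) / 100;

function deriveMissingAmount(a: Amounts): Amounts {
  const { gross, net, tax } = a;
  if (gross !== undefined && net !== undefined && tax === undefined) {
    return { gross, net, tax: round2(gross - net) };
  }
  if (gross !== undefined && tax !== undefined && net === undefined) {
    return { gross, tax, net: round2(gross - tax) };
  }
  if (net !== undefined && tax !== undefined && gross === undefined) {
    return { net, tax, gross: round2(net + tax) };
  }
  return a; // fewer than two values known, or nothing missing
}

// Tax rate as a whole-number percentage (e.g. 19 for 19%).
function taxRatePercent(a: Amounts): number | undefined {
  if (a.net === undefined || a.tax === undefined || a.net === 0) return undefined;
  return Math.round((a.tax / a.net) * 100);
}
```

For example, a receipt with gross 119 and net 100 yields tax 19, and `taxRatePercent` then returns 19.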
Vendor Extraction
- Primary: spatial extraction from OCR bounding boxes (topmost non-noise block)
- Fallback: first non-noise line in the first 8 lines of OCR text
- Noise filter: skips pure numbers, addresses, metadata labels, generic headings
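As an illustration of the text-based fallback, the sketch below scans the first 8 lines and returns the first one that does not look like noise. The patterns in `NOISE_PATTERNS` are assumptions invented for this example; the real filter is not shown in this document, and the spatial (bounding-box) pass is not covered here.

```typescript
// Hypothetical noise patterns, loosely following the categories named above.
const NOISE_PATTERNS: RegExp[] = [
  /^[\d\s.,:\/\-+()]+$/, // pure numbers, dates, phone-like strings
  /\b\d{5}\b/, // contains a 5-digit postal code (rough address heuristic)
  /^(date|datum|tel|phone|fax|tax id|ust-?id)\b/i, // metadata labels
  /^(rechnung|quittung|beleg|invoice|receipt)$/i, // generic headings
];

function isNoise(line: string): boolean {
  const t = line.trim();
  return t.length < 2 || NOISE_PATTERNS.some((p) => p.test(t));
}

// Fallback vendor guess: first non-noise line among the first `maxLines` lines.
function vendorFromText(fullText: string, maxLines = 8): string | undefined {
  return fullText
    .split(/\r?\n/)
    .slice(0, maxLines)
    .map((l) => l.trim())
    .find((l) => !isNoise(l));
}
```

Limiting the scan to the top of the receipt reflects the usual layout, where the merchant name is printed before line items and totals.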
Date Extraction
- Priority: labeled dates ("Date:", "Invoice Date:") first
- Formats: ISO (2024-01-15), EU dot (15.01.2024), US slash (01/15/2024), named months (Jan 15, 2024)
- Skips expiry/card dates
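A compact sketch of these rules follows: it normalizes the four listed formats to ISO 8601, tries labeled lines first, and ignores lines that mention expiry or card details. The regular expressions and the names `parseDateToken` and `extractDate` are assumptions for illustration. Slash dates are read as US month/day order, as in the example above.

```typescript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"];
const NAMED_MONTH = new RegExp(`\\b(${MONTHS.join("|")})[a-z]*\\.?\\s+(\\d{1,2}),?\\s+(\\d{4})\\b`, "i");

const pad2 = (n: number): string => String(n).padStart(2, "0");

function toIso(y: number, m: number, d: number): string | undefined {
  if (m < 1 || m > 12 || d < 1 || d > 31) return undefined; // coarse range check only
  return `${y}-${pad2(m)}-${pad2(d)}`;
}

// Finds the first supported date format in a single line.
function parseDateToken(line: string): string | undefined {
  let m: RegExpMatchArray | null;
  if ((m = line.match(/\b(\d{4})-(\d{2})-(\d{2})\b/))) return toIso(+m[1], +m[2], +m[3]); // ISO
  if ((m = line.match(/\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b/))) return toIso(+m[3], +m[2], +m[1]); // EU dot
  if ((m = line.match(/\b(\d{1,2})\/(\d{1,2})\/(\d{4})\b/))) return toIso(+m[3], +m[1], +m[2]); // US slash
  if ((m = line.match(NAMED_MONTH))) {
    return toIso(+m[3], MONTHS.indexOf(m[1].toLowerCase()) + 1, +m[2]); // named month
  }
  return undefined;
}

const LABELED_LINE = /\b(date|datum)\s*:/i; // matches "Date:" and "Invoice Date:"
const SKIPPED_LINE = /\b(exp(iry|ires)?|valid\s+thru|card)\b/i; // hypothetical skip list

function extractDate(fullText: string): string | undefined {
  const lines = fullText.split(/\r?\n/).filter((l) => !SKIPPED_LINE.test(l));
  const labeled = lines.filter((l) => LABELED_LINE.test(l));
  // Labeled lines are tried first, then every remaining line in document order.
  for (const line of [...labeled, ...lines]) {
    const iso = parseDateToken(line);
    if (iso) return iso;
  }
  return undefined;
}
```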
Category Inference (3-pass)
- Vendor lookup: matches vendor name against ~80 known vendors (e.g., "starbucks" -> Bewirtung)
- Keyword scan: matches full OCR text against category keyword patterns
- Item patterns: checks for specific line-item hints (e.g., "cappuccino" -> Bewirtung)
- Falls back to "Sonstige Ausgaben" if no match
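The three passes and the fallback can be sketched as a chain of lookups that stops at the first hit. The tables below are deliberately tiny: only "starbucks" and "cappuccino" come from the text above, and every other entry, along with the function name `inferCategory`, is a made-up placeholder.

```typescript
const FALLBACK_CATEGORY = "Sonstige Ausgaben";

// Pass 1 table: known vendors (the real list reportedly has ~80 entries).
const VENDOR_CATEGORIES: Array<[string, string]> = [
  ["starbucks", "Bewirtung"],
  ["deutsche bahn", "Reisekosten"], // hypothetical entry
];

// Pass 2 table: keywords matched against the full OCR text.
const KEYWORD_CATEGORIES: Array<[RegExp, string]> = [
  [/\b(hotel|flug|bahnticket)\b/, "Reisekosten"], // hypothetical entry
  [/\b(lizenz|subscription)\b/, "Software & Lizenzen"], // hypothetical entry
];

// Pass 3 table: line-item hints.
const ITEM_CATEGORIES: Array<[RegExp, string]> = [[/\bcappuccino\b/, "Bewirtung"]];

function inferCategory(vendor: string | undefined, fullText: string): string {
  const v = (vendor ?? "").toLowerCase();
  const text = fullText.toLowerCase();

  // Pass 1: vendor lookup
  for (const [name, category] of VENDOR_CATEGORIES) {
    if (v.includes(name)) return category;
  }
  // Pass 2: keyword scan over the full text
  for (const [pattern, category] of KEYWORD_CATEGORIES) {
    if (pattern.test(text)) return category;
  }
  // Pass 3: line-item patterns
  for (const [pattern, category] of ITEM_CATEGORIES) {
    if (pattern.test(text)) return category;
  }
  return FALLBACK_CATEGORY;
}
```

Ordering the passes from most to least specific means a known vendor always wins over an incidental keyword elsewhere on the receipt.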
D1 Adapter
The app uses @marlinjai/data-table-adapter-d1 to persist structured data directly in Cloudflare D1. The adapter is initialized in the app layout using the Cloudflare D1 binding:
// src/app/app/layout.tsx
import { D1Adapter } from '@marlinjai/data-table-adapter-d1';
setAdapter(new D1Adapter(env.DB));
The D1 binding (DB) is configured in wrangler.jsonc and the database schema lives in migrations/0001_initial.sql.
Receipt Table Schema
| Column | Type | Description |
|---|---|---|
| Name | text | Composite summary (primary column) |
| Vendor | text | Merchant name (OCR spatial extraction) |
| Gross | number | Total amount incl. tax |
| Net | number | Amount before tax |
| Tax Rate | number | Tax percentage (e.g. 19 for 19%) |
| Date | date | Receipt date (ISO 8601) |
| Category | select | SKR03 expense category (10 options) |
| Konto | text | SKR03 account number (e.g. "4650") |
| Status | select | Pending / Processed / Rejected |
| Confidence | number | OCR or AI classification confidence |
| Receipt Image | url | Link to original file in Storage Brain |
| OCR Text | text | Raw OCR text for AI context |
| Zuordnung | select | Dynamic column: Universitat / Geschaftlich / Privat |
SKR03 Category-to-Konto Mapping
| Category | Konto |
|---|---|
| Bewirtung | 4650 |
| Reisekosten | 4670 |
| Burobedarf | 4930 |
| Software & Lizenzen | 4806 |
| Telefon & Internet | 4920 |
| Hardware & IT | 4855 |
| Miete & Nebenkosten | 4210 |
| Versicherungen | 4360 |
| Fachliteratur | 4940 |
| Sonstige Ausgaben | 4900 |
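In code, the mapping above could be held in a simple lookup. The sketch below copies the ten pairs from the table; the names `SKR03_KONTO` and `kontoFor` are hypothetical, and falling back to the "Sonstige Ausgaben" account for unknown categories is an assumption that mirrors the category fallback.

```typescript
// Category -> Konto pairs copied from the table above.
const SKR03_KONTO = new Map<string, string>([
  ["Bewirtung", "4650"],
  ["Reisekosten", "4670"],
  ["Burobedarf", "4930"],
  ["Software & Lizenzen", "4806"],
  ["Telefon & Internet", "4920"],
  ["Hardware & IT", "4855"],
  ["Miete & Nebenkosten", "4210"],
  ["Versicherungen", "4360"],
  ["Fachliteratur", "4940"],
  ["Sonstige Ausgaben", "4900"],
]);

// Assumption: unknown categories use the "Sonstige Ausgaben" account.
function kontoFor(category: string): string {
  return SKR03_KONTO.get(category) ?? "4900";
}
```

Keeping the account numbers as strings matches the Konto column, which is typed as text in the receipt table.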
API Routes
| Route | Method | Purpose |
|---|---|---|
| /api/ocr | POST | Fetches file from Storage Brain, sends to Google Cloud Vision, returns OcrResult |
| /api/classify-single | POST | LLM classification of a single receipt via OpenRouter |
| /api/chat | POST | Streaming AI chat with tool use (SSE) |
| /api/files/[fileId] | GET | Proxies file downloads from Storage Brain |
Environment Configuration
# Storage Brain (file uploads to R2)
NEXT_PUBLIC_STORAGE_BRAIN_API_KEY=sk_live_...
NEXT_PUBLIC_STORAGE_BRAIN_URL=https://storage-brain-api.marlin-pohl.workers.dev
# Google Cloud Vision (OCR)
GOOGLE_CLOUD_VISION_API_KEY=AIza...
# OpenRouter (AI classification + chat)
OPENROUTER_API_KEY=sk-or-v1-...
# Optional: override AI models
# AI_MODEL=anthropic/claude-sonnet-4-20250514
# AI_CLASSIFY_MODEL=anthropic/claude-sonnet-4-20250514
Database connectivity is handled via the Cloudflare D1 binding (DB) configured in wrangler.jsonc -- no environment variables needed.
Deployment
Target: Cloudflare Workers via @opennextjs/cloudflare
The app is deployed at receipts.lumitra.co. Server-side secrets (GOOGLE_CLOUD_VISION_API_KEY, OPENROUTER_API_KEY) are configured as Cloudflare Workers secrets. Client-side env vars use the NEXT_PUBLIC_ prefix.