Receipt OCR App

Architecture

System Overview

Receipt OCR App (Next.js on Cloudflare Workers)
    ├── Storage Brain SDK           → Cloudflare R2  (file uploads)
    ├── Google Cloud Vision         → OCR            (text extraction)
    ├── OpenRouter                  → LLM            (classification + chat)
    └── @marlinjai/data-table-adapter-d1 → Cloudflare D1  (structured data)
┌──────────────────────────────────────────────────────────────────────┐
│            Receipt OCR App (Next.js / Cloudflare Workers)            │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌────────────┐  ┌──────────────┐   ┌──────────────┐  ┌───────────┐  │
│  │  Upload    │  │  Dashboard   │   │  AI Chat     │  │   API     │  │
│  │  Page      │  │  (4 views)   │   │  Sidebar     │  │  Routes   │  │
│  └─────┬──────┘  └──────┬───────┘   └──────┬───────┘  └─────┬─────┘  │
│        │                │                  │                │        │
│        ▼                ▼                  ▼                ▼        │
│  ┌───────────────────────────────────────────────────────────────┐   │
│  │                        Component Layer                        │   │
│  │  ┌──────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │   │
│  │  │ Receipt      │ │ Data Table  │ │ ChatSidebar             │ │   │
│  │  │ Uploader     │ │ (4 views)   │ │ (SSE + tool approval)   │ │   │
│  │  └──────┬───────┘ └──────┬──────┘ └────────────┬────────────┘ │   │
│  └─────────┼────────────────┼─────────────────────┼──────────────┘   │
│            │                │                     │                  │
└────────────┼────────────────┼─────────────────────┼──────────────────┘
             │                │                     │
      ┌──────┘       ┌────────┘          ┌──────────┘
      ▼              ▼                   ▼
┌───────────┐  ┌───────────────┐  ┌──────────────┐  ┌───────────────┐
│ Storage   │  │ Data Table    │  │ OpenRouter   │  │ Google Cloud  │
│ Brain SDK │  │ React +       │  │ (LLM API)    │  │ Vision API    │
│           │  │ D1 Adapter    │  │              │  │ (OCR)         │
└─────┬─────┘  └───────┬───────┘  └──────────────┘  └───────────────┘
      │                │
      ▼                ▼
┌───────────┐  ┌───────────────┐
│ Cloudflare│  │ Cloudflare    │
│ R2        │  │ D1            │
└───────────┘  └───────────────┘

Page Structure

Upload Page (/)

  • Drag-and-drop zone for images and PDFs (multi-file selection supported)
  • Batch upload queue: files are processed sequentially through upload, OCR, classify, and save phases
  • Per-file progress indicators with phase-level detail
  • Failed files do not block the remaining queue
  • Automatic redirect to dashboard when all files complete

Dashboard (/dashboard)

  • Powered by @marlinjai/data-table-react
  • 4 pre-configured views:
    • Table -- grouped by Category (default)
    • By Konto -- grouped by SKR03 account number
    • Board -- Kanban-style board grouped by Status
    • Calendar -- date-based view using the Date column
  • Column management, multi-row selection, search, filter, pagination
  • Inline cell editing
  • AI Chat sidebar toggle

Data Flow

Upload Flow (Batch)

Users can select multiple files at once. Each file is added to a queue and processed sequentially through the full pipeline. Failed files do not block subsequent files.

User drops one or more images/PDFs (or clicks to browse)
      │
      ▼
Files added to upload queue (QueueItem[])
      │
      ▼
┌─── For each file in queue (sequential) ────────────────────────┐
│                                                                │
│  Phase 1: Upload to Storage Brain (R2)                         │
│        │                                                       │
│        ▼                                                       │
│  Phase 2: POST /api/ocr with fileId                            │
│        │   Fetch file from Storage Brain →                     │
│        │   send to Google Cloud Vision API                     │
│        │   (images: images:annotate, PDFs: files:annotate)     │
│        ▼                                                       │
│  Return OcrResult { fullText, blocks, confidence }             │
│        │                                                       │
│        ▼                                                       │
│  extractReceiptFields(ocrResult): heuristic extraction         │
│        │   → vendor, gross, net, taxRate, date, category,      │
│        │     konto, name                                       │
│        ▼                                                       │
│  Phase 3: POST /api/classify-single (AI classification)        │
│        │   → category, konto, zuordnung, confidence, reasoning │
│        ▼                                                       │
│  Phase 4: Create row in receipts table via D1 adapter          │
│        │                                                       │
│        ▼                                                       │
│  File marked done (or error); next file begins                 │
└────────────────────────────────────────────────────────────────┘
      │
      ▼
All files processed → redirect to dashboard
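The sequential, failure-tolerant loop above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the app's code. `QueueItem`, `Phase`, `Steps` and `processQueue` are hypothetical names. The four injected `steps` functions stand in for the real Storage Brain upload, `/api/ocr`, `/api/classify-single` and D1 calls.

```typescript
// Hypothetical shapes for illustration; the app's real QueueItem may differ.
type Phase = "upload" | "ocr" | "classify" | "save";
type ItemStatus = "queued" | Phase | "done" | "error";

interface QueueItem {
  name: string;
  status: ItemStatus;
  error?: string;
}

// Each step gets the file name and the previous step's output
// (e.g. upload -> fileId, ocr -> OcrResult, classify -> fields, save -> row).
type Step = (fileName: string, input: unknown) => Promise<unknown>;
type Steps = Record<Phase, Step>;

const PHASES: Phase[] = ["upload", "ocr", "classify", "save"];

// Processes files strictly one after another. A thrown error marks only the
// current item as "error"; the loop then continues with the next file.
async function processQueue(
  queue: QueueItem[],
  steps: Steps,
  onChange: (item: QueueItem) => void = () => {},
): Promise<QueueItem[]> {
  for (const item of queue) {
    let value: unknown = undefined;
    try {
      for (const phase of PHASES) {
        item.status = phase; // per-file, phase-level progress
        onChange(item);
        value = await steps[phase](item.name, value);
      }
      item.status = "done";
    } catch (err) {
      item.status = "error";
      item.error = err instanceof Error ? err.message : String(err);
    }
    onChange(item);
  }
  return queue;
}
```

Each file runs inside its own `try`/`catch`, so an error only marks that item as failed and the loop moves on to the next file. This matches the "failed files do not block the remaining queue" behavior described for the upload page.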

AI Chat Flow

User opens chat sidebar → types message
      │
      ▼
POST /api/chat (streaming SSE)
      │   system prompt includes: table schema, select options,
      │   SKR03 mappings, zuordnung options, user rules
      ▼
LLM responds with text + optional tool_calls
      │
      ▼
Frontend receives SSE events:
      ├── text_delta → rendered incrementally
      ├── tool_use  → displayed as pending action
      │       │
      │       ├── read-only tool (get_rows, get_columns, get_select_options)
      │       │       → auto-executed, result sent back as tool_result
      │       │
      │       └── write tool (update_cells, bulk_update, create_row, delete_rows)
      │               → requires user approval ("Apply" / "Apply All")
      │               → on approval: executed client-side, result sent back
      │
      └── done → response complete

Field Extraction Engine

Located at src/lib/extract-receipt-fields.ts (~500 lines). Returns an ExtractionResult with name, vendor, gross, net, taxRate, date, category, and konto.

Amount Extraction (multi-pass)

  1. Net: looks for lines matching subtotal/netto/before-tax labels
  2. Tax: looks for tax/VAT/MwSt labels (excluding total lines)
  3. Gross (4 passes):
    • High-priority: "grand total", "amount due", "balance due"
    • Medium-priority: generic "total" (excluding subtotal/tax lines)
    • EU keywords: "gesamt", "summe", "brutto"
    • Fallback: largest amount found anywhere in the text
  4. Derivation: if two of the three values (net, tax, gross) are found, the third is calculated from gross = net + tax
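A minimal sketch of the gross-amount passes and the two-of-three derivation follows. Several details are illustrative assumptions rather than the contents of `src/lib/extract-receipt-fields.ts`: the regular expressions, the rule of taking the last amount on a matching line, and the helper names (`amountsIn`, `extractGross`, `deriveMissing`).

```typescript
// An amount with exactly two decimals: "12.50", "1,234.56", "12,50", "1.234,56".
// The leading group stops matches from starting inside a longer number, and the
// lookahead rejects date fragments such as "15.01" in "15.01.2024".
const AMOUNT_RE = /(^|[^\d.,])(\d+(?:[.,]\d{3})*[.,]\d{2})(?![.,]?\d)/g;

function parseAmount(raw: string): number {
  // The last separator is the decimal separator; the others are thousands separators.
  const lastSep = Math.max(raw.lastIndexOf("."), raw.lastIndexOf(","));
  const intPart = raw.slice(0, lastSep).replace(/[.,]/g, "");
  return Number(`${intPart}.${raw.slice(lastSep + 1)}`);
}

function amountsIn(line: string): number[] {
  const found: number[] = [];
  AMOUNT_RE.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = AMOUNT_RE.exec(line)) !== null) found.push(parseAmount(m[2]));
  return found;
}

// Gross passes in priority order; the first pass that yields an amount wins.
const GROSS_PASSES: RegExp[] = [
  /grand total|amount due|balance due/i,             // high priority
  /^(?!.*(?:sub\s*total|tax|vat|mwst)).*\btotal\b/i, // generic "total"
  /gesamt|summe|brutto/i,                            // EU keywords
];

function extractGross(lines: string[]): number | undefined {
  for (const pass of GROSS_PASSES) {
    for (const line of lines) {
      if (!pass.test(line)) continue;
      const amounts = amountsIn(line);
      // Assumption: on a total line, the last printed amount is the total.
      if (amounts.length > 0) return amounts[amounts.length - 1];
    }
  }
  // Fallback: the largest amount anywhere in the text.
  let largest: number | undefined;
  for (const line of lines) {
    for (const amount of amountsIn(line)) {
      if (largest === undefined || amount > largest) largest = amount;
    }
  }
  return largest;
}

interface Amounts { net?: number; tax?: number; gross?: number; }

// If exactly two of net/tax/gross are known, derive the third, rounded to cents.
function deriveMissing({ net, tax, gross }: Amounts): Amounts {
  const round = (n: number) => Math.round(n * 100) / 100;
  if (gross === undefined && net !== undefined && tax !== undefined) gross = round(net + tax);
  else if (net === undefined && gross !== undefined && tax !== undefined) net = round(gross - tax);
  else if (tax === undefined && gross !== undefined && net !== undefined) tax = round(gross - net);
  return { net, tax, gross };
}
```

Rounding to cents after each subtraction keeps floating-point noise (e.g. 11.9 - 10 = 1.9000000000000004) out of stored values.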

Vendor Extraction

  • Primary: spatial extraction from OCR bounding boxes (topmost non-noise block)
  • Fallback: first non-noise line in the first 8 lines of OCR text
  • Noise filter: skips pure numbers, addresses, metadata labels, generic headings
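The vendor strategy can be sketched as below. The sketch assumes a simplified OCR block shape that carries only the block text and the top edge (`top`) of its bounding box. The noise patterns and function names are hypothetical and far smaller than a production filter would need.

```typescript
// Hypothetical simplified OCR block: text plus the top edge of its bounding box.
interface OcrBlock { text: string; top: number; }

// Illustrative noise patterns, one per category named in the description.
const NOISE_PATTERNS: RegExp[] = [
  /^[\d\s.,:\/%-]+$/,                             // pure numbers, dates, amounts
  /\b\d{5}\b|(stra(ss|ß)e|str\.)\s*\d+/i,         // postcode or street + house number
  /^(date|datum|tel|fax|ust|vat)\b|^www\./i,      // metadata labels
  /^(receipt|invoice|rechnung|kassenbon|quittung)$/i, // generic headings
];

function isNoise(line: string): boolean {
  const t = line.trim();
  return t.length < 2 || NOISE_PATTERNS.some((re) => re.test(t));
}

function extractVendor(blocks: OcrBlock[], fullText: string): string | undefined {
  // Primary: the topmost non-noise block, using bounding-box positions.
  const candidates = blocks.filter((b) => !isNoise(b.text)).sort((a, b) => a.top - b.top);
  if (candidates.length > 0) return candidates[0].text.trim();
  // Fallback: first non-noise line within the first 8 lines of OCR text.
  const line = fullText.split("\n").slice(0, 8).find((l) => !isNoise(l));
  return line === undefined ? undefined : line.trim();
}
```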

Date Extraction

  • Priority: labeled dates ("Date:", "Invoice Date:") first
  • Formats: ISO (2024-01-15), EU dot (15.01.2024), US slash (01/15/2024), named months (Jan 15, 2024)
  • Skips expiry/card dates
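A sketch of the date rules follows. It assumes dates are normalized to ISO 8601 strings, the format of the Date column. The patterns, the label and expiry regexes, and the function names are illustrative. In this sketch a slash date is always read in US month/day order, so a UK-style `15/01/2024` is rejected rather than guessed.

```typescript
const MONTHS: Record<string, number> = {
  jan: 1, feb: 2, mar: 3, apr: 4, may: 5, jun: 6,
  jul: 7, aug: 8, sep: 9, oct: 10, nov: 11, dec: 12,
};

type Ymd = [number, number, number];

// Patterns in the order listed above; each maps a match to [year, month, day].
const DATE_PATTERNS: Array<[RegExp, (m: RegExpMatchArray) => Ymd]> = [
  [/\b(\d{4})-(\d{2})-(\d{2})\b/, (m) => [+m[1], +m[2], +m[3]]],       // ISO 2024-01-15
  [/\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b/, (m) => [+m[3], +m[2], +m[1]]], // EU 15.01.2024
  [/\b(\d{1,2})\/(\d{1,2})\/(\d{4})\b/, (m) => [+m[3], +m[1], +m[2]]], // US 01/15/2024
  [/\b([a-z]{3})[a-z]*\.? (\d{1,2}),? (\d{4})\b/i,                     // Jan 15, 2024
    (m) => [+m[3], MONTHS[m[1].toLowerCase()] || 0, +m[2]]],
];

const LABEL_RE = /(date|datum)\s*:/i;                    // "Date:", "Invoice Date:", "Datum:"
const SKIP_RE = /expir|valid thru|gültig bis|\bexp\b/i;  // expiry/card dates

function toIso([y, m, d]: Ymd): string | undefined {
  if (m < 1 || m > 12 || d < 1 || d > 31) return undefined;
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${y}-${pad(m)}-${pad(d)}`;
}

function dateInLine(line: string): string | undefined {
  for (const [re, toYmd] of DATE_PATTERNS) {
    const match = line.match(re);
    const iso = match ? toIso(toYmd(match)) : undefined;
    if (iso) return iso;
  }
  return undefined;
}

function extractDate(text: string): string | undefined {
  const lines = text.split("\n").filter((l) => !SKIP_RE.test(l));
  const labeled = lines.filter((l) => LABEL_RE.test(l));
  // Labeled lines are tried first, then every remaining line in order.
  for (const line of labeled.concat(lines)) {
    const iso = dateInLine(line);
    if (iso) return iso;
  }
  return undefined;
}
```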

Category Inference (3-pass)

  1. Vendor lookup: matches vendor name against ~80 known vendors (e.g., "starbucks" -> Bewirtung)
  2. Keyword scan: matches full OCR text against category keyword patterns
  3. Item patterns: checks for specific line-item hints (e.g., "cappuccino" -> Bewirtung)
  4. Fallback: "Sonstige Ausgaben" if none of the three passes matches
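The three passes and the fallback can be sketched as an ordered chain where the first match wins. The tables below are tiny illustrative subsets, not the ~80-vendor list or the module's real patterns. `inferCategory` and `firstMatch` are hypothetical names.

```typescript
// Illustrative subsets only.
const VENDORS: Array<[string, string]> = [
  ["starbucks", "Bewirtung"],
  ["deutsche bahn", "Reisekosten"],
  ["telekom", "Telefon & Internet"],
];
const KEYWORDS: Array<[RegExp, string]> = [
  [/restaurant|bewirtung/i, "Bewirtung"],
  [/hotel|fahrkarte|taxi/i, "Reisekosten"],
  [/software|lizenz|subscription/i, "Software & Lizenzen"],
];
const ITEM_HINTS: Array<[RegExp, string]> = [
  [/cappuccino|latte macchiato|espresso/i, "Bewirtung"],
  [/toner|druckerpapier/i, "Burobedarf"],
];
const FALLBACK_CATEGORY = "Sonstige Ausgaben";

function firstMatch(rules: Array<[RegExp, string]>, text: string): string | undefined {
  for (const [re, category] of rules) if (re.test(text)) return category;
  return undefined;
}

function inferCategory(vendor: string, ocrText: string): string {
  const v = vendor.toLowerCase();
  for (const [name, category] of VENDORS) if (v.includes(name)) return category; // pass 1: vendor
  return (
    firstMatch(KEYWORDS, ocrText) ??   // pass 2: keyword scan of the full OCR text
    firstMatch(ITEM_HINTS, ocrText) ?? // pass 3: line-item hints
    FALLBACK_CATEGORY                  // no pass matched
  );
}
```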

D1 Adapter

The app uses @marlinjai/data-table-adapter-d1 to persist structured data directly in Cloudflare D1. The adapter is initialized in the app layout using the Cloudflare D1 binding:

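// src/app/app/layout.tsx
import { D1Adapter } from '@marlinjai/data-table-adapter-d1';

// `setAdapter` and `env` come from elsewhere in the layout (not shown here);
// `env.DB` is the D1 binding declared in wrangler.jsonc.
setAdapter(new D1Adapter(env.DB));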

The D1 binding (DB) is configured in wrangler.jsonc and the database schema lives in migrations/0001_initial.sql.

Receipt Table Schema

Column         Type    Description
Name           text    Composite summary (primary column)
Vendor         text    Merchant name (OCR spatial extraction)
Gross          number  Total amount incl. tax
Net            number  Amount before tax
Tax Rate       number  Tax percentage (e.g. 19 for 19%)
Date           date    Receipt date (ISO 8601)
Category       select  SKR03 expense category (10 options)
Konto          text    SKR03 account number (e.g. "4650")
Status         select  Pending / Processed / Rejected
Confidence     number  OCR or AI classification confidence
Receipt Image  url     Link to original file in Storage Brain
OCR Text       text    Raw OCR text for AI context
Zuordnung      select  Dynamic column: Universitat / Geschaftlich / Privat

SKR03 Category-to-Konto Mapping

Category              Konto
Bewirtung             4650
Reisekosten           4670
Burobedarf            4930
Software & Lizenzen   4806
Telefon & Internet    4920
Hardware & IT         4855
Miete & Nebenkosten   4210
Versicherungen        4360
Fachliteratur         4940
Sonstige Ausgaben     4900
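Because the mapping is a fixed lookup, it fits in a typed constant. The sketch below transcribes the table above. `kontoFor`, `categoryFor`, and the fallback to 4900 for unknown categories are assumptions for illustration, not documented app behavior.

```typescript
// Transcribed from the Category-to-Konto table.
const SKR03_KONTO = {
  "Bewirtung": "4650",
  "Reisekosten": "4670",
  "Burobedarf": "4930",
  "Software & Lizenzen": "4806",
  "Telefon & Internet": "4920",
  "Hardware & IT": "4855",
  "Miete & Nebenkosten": "4210",
  "Versicherungen": "4360",
  "Fachliteratur": "4940",
  "Sonstige Ausgaben": "4900",
} as const;

type Category = keyof typeof SKR03_KONTO;

// Assumption for this sketch: unknown categories fall back to Sonstige Ausgaben (4900).
function kontoFor(category: string): string {
  return Object.prototype.hasOwnProperty.call(SKR03_KONTO, category)
    ? SKR03_KONTO[category as Category]
    : SKR03_KONTO["Sonstige Ausgaben"];
}

// Reverse lookup, e.g. to check that an AI-suggested konto belongs to a known category.
function categoryFor(konto: string): Category | undefined {
  const categories = Object.keys(SKR03_KONTO) as Category[];
  return categories.find((c) => SKR03_KONTO[c] === konto);
}
```

Konto values are kept as strings, matching the text type of the Konto column.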

API Routes

Route                 Method  Purpose
/api/ocr              POST    Fetches file from Storage Brain, sends it to Google Cloud Vision, returns OcrResult
/api/classify-single  POST    LLM classification of a single receipt via OpenRouter
/api/chat             POST    Streaming AI chat with tool use (SSE)
/api/files/[fileId]   GET     Proxies file downloads from Storage Brain
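For `/api/ocr`, the two Google Cloud Vision endpoints in the upload flow take differently shaped request bodies. `images:annotate` wraps the bytes in `image`, while `files:annotate` (used for PDFs) wraps them in `inputConfig` with a `mimeType`. The helper below only builds the request. Its name, the choice of `DOCUMENT_TEXT_DETECTION` over `TEXT_DETECTION`, and passing the key as a `key` query parameter are assumptions about how the route might do it.

```typescript
// Hypothetical request builder for the OCR route.
interface VisionRequest { url: string; body: unknown; }

const VISION_BASE = "https://vision.googleapis.com/v1";

function buildVisionRequest(base64Content: string, mimeType: string, apiKey: string): VisionRequest {
  // Assumption: dense-text OCR; TEXT_DETECTION is the other text feature type.
  const features = [{ type: "DOCUMENT_TEXT_DETECTION" }];
  const key = encodeURIComponent(apiKey);
  if (mimeType === "application/pdf") {
    // PDFs go through files:annotate with an inputConfig.
    return {
      url: `${VISION_BASE}/files:annotate?key=${key}`,
      body: { requests: [{ inputConfig: { content: base64Content, mimeType }, features }] },
    };
  }
  // Images go through images:annotate with inline base64 content.
  return {
    url: `${VISION_BASE}/images:annotate?key=${key}`,
    body: { requests: [{ image: { content: base64Content }, features }] },
  };
}
```

The route would then send it with `fetch(url, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(body) })`, using the server-side `GOOGLE_CLOUD_VISION_API_KEY` secret.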

Environment Configuration

# Storage Brain (file uploads to R2)
NEXT_PUBLIC_STORAGE_BRAIN_API_KEY=sk_live_...
NEXT_PUBLIC_STORAGE_BRAIN_URL=https://storage-brain-api.marlin-pohl.workers.dev

# Google Cloud Vision (OCR)
GOOGLE_CLOUD_VISION_API_KEY=AIza...

# OpenRouter (AI classification + chat)
OPENROUTER_API_KEY=sk-or-v1-...

# Optional: override AI models
# AI_MODEL=anthropic/claude-sonnet-4-20250514
# AI_CLASSIFY_MODEL=anthropic/claude-sonnet-4-20250514

Database connectivity is handled via the Cloudflare D1 binding (DB) configured in wrangler.jsonc -- no environment variables needed.
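Failing fast when a server secret is missing gives a clearer error than a rejected Vision or OpenRouter call later on. The loader below is a hypothetical sketch: only the variable names come from the list above. `loadServerConfig`, `ServerConfig`, and treating empty strings as unset are assumptions.

```typescript
// Hypothetical config loader for the server-side variables listed above.
interface ServerConfig {
  visionApiKey: string;
  openRouterApiKey: string;
  chatModel?: string;     // AI_MODEL override, if set
  classifyModel?: string; // AI_CLASSIFY_MODEL override, if set
}

const REQUIRED_SECRETS = ["GOOGLE_CLOUD_VISION_API_KEY", "OPENROUTER_API_KEY"];

function loadServerConfig(env: Record<string, string | undefined>): ServerConfig {
  const missing = REQUIRED_SECRETS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing server secrets: ${missing.join(", ")}`);
  }
  return {
    visionApiKey: env["GOOGLE_CLOUD_VISION_API_KEY"] as string,
    openRouterApiKey: env["OPENROUTER_API_KEY"] as string,
    // Empty strings count as "not set", so the app keeps its default model.
    chatModel: env["AI_MODEL"] || undefined,
    classifyModel: env["AI_CLASSIFY_MODEL"] || undefined,
  };
}
```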

Deployment

Target: Cloudflare Workers via @opennextjs/cloudflare

The app is deployed at receipts.lumitra.co. Server-side secrets (GOOGLE_CLOUD_VISION_API_KEY, OPENROUTER_API_KEY) are configured as Cloudflare Workers secrets. Client-side env vars use the NEXT_PUBLIC_ prefix.